
What's New in Statistics for Data Analysis V.29

# INDEX

## Introduction


Statistics for Data Analysis powered by SPSS is a comprehensive solution that addresses all facets of the analytical process from data preparation and management to analysis and reporting.

Statistics for Data Analysis V29 introduces many new features, including a new overview tab in the Data Editor, OLS Regression, better integration with open-source extensions and other enhancements that have been designed to streamline your daily work.


## Data Editor

Alongside Data View and Variable View, a new Overview tab summarizes the characteristics of the data in a dataset or file, including variable types, measurement levels, and missing data. It also lets you drill down into individual variables, with charts and summary statistics chosen according to each variable's measurement level.


## Violin Plot

Violin plots have been added to the Graphboard Template Chooser. These plots are a hybrid of box plots and kernel density plots. Violin plots show peaks in the data and are used to visualize the distribution of scale variables. Unlike a box plot, which shows only summary statistics, a violin plot depicts both the summary statistics and the density of each variable.
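As a rough illustration of why the density half of a violin plot adds information, the following Python sketch estimates a Gaussian kernel density for a small two-peaked sample and compares it with the median a box plot would report. The function name, the sample values, and the bandwidth are illustrative assumptions; they do not describe how the Graphboard templates compute densities.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical sample with two clusters, around 1 and around 5.
sample = [0.9, 1.0, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
density = gaussian_kde(sample, bandwidth=0.5)

median = statistics.median(sample)  # the centre line of a box plot
print("median:", median)
print("density at 1.0:", density(1.0))
print("density at median:", density(median))
print("density at 5.0:", density(5.0))
```

The median falls between the two clusters, where the estimated density is close to zero. A box plot alone would hide this, while the violin outline would show two peaks.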


## New Analyze Procedures


### Linear OLS Alternatives: Lasso, Ridge and Elastic Net

Version 29 of Statistics for Data Analysis powered by SPSS includes three new regression procedures that employ different forms of regularization: Lasso, Ridge and Elastic Net.

All of these techniques are designed to reduce the over-fitting commonly associated with ordinary least squares regression. Generally speaking, they work by penalizing large model coefficients.


Lasso

Often referred to as L1 regularization, the Lasso procedure (Least Absolute Shrinkage and Selection Operator) penalises the absolute size of the coefficients, shrinking those of the least important features towards zero and, in many cases, to exactly zero. It is therefore useful for feature selection, as the weak variables are effectively removed, thus simplifying the final model.

Ridge

L2 regularization, known as Ridge regression, tends to penalize coefficients in a more even manner than L1. As well as creating more generalisable models, it’s commonly employed when dealing with issues of multicollinearity.

Elastic Net

Elastic Net combines the Lasso (L1) and Ridge (L2) penalties, which may result in a more balanced model when either method on its own is sub-optimal.

Trace plot output from the Elastic Net procedure showing the effect on model coefficients using different values of the Alpha hyperparameter.
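The different behaviour of the three penalties can be sketched in Python for the special case of an orthonormal design, where each penalised coefficient has a simple closed form. This is a minimal illustration under that assumption, not the algorithm used by the procedures; the names `l1` and `l2` and the example coefficients are hypothetical and do not correspond to the procedures' own parameters.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clamping at exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def shrink(ols_coefs, l1=0.0, l2=0.0):
    """Penalised coefficients for an orthonormal design (assumption: X'X = I).

    Minimises 0.5*||y - Xb||^2 + l1*sum(|b_j|) + 0.5*l2*sum(b_j^2).
    Under that assumption each coefficient is soft_threshold(b_ols, l1) / (1 + l2).
    """
    return [soft_threshold(b, l1) / (1.0 + l2) for b in ols_coefs]


# Hypothetical ordinary least squares coefficients: two strong, two weak.
ols = [3.0, -1.5, 0.4, -0.2]

lasso = shrink(ols, l1=0.5)           # L1 only: weak coefficients become 0
ridge = shrink(ols, l2=0.5)           # L2 only: all shrink, none reach 0
enet = shrink(ols, l1=0.25, l2=0.25)  # both penalties combined

print("lasso:", lasso)
print("ridge:", ridge)
print("elastic net:", enet)
```

In this simplified setting the L1 penalty sets the two weak coefficients to exactly zero, the L2 penalty scales every coefficient by the same factor, and the combined penalty does some of each.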

## Parametric Accelerated Failure Time (AFT) Models

Statistics for Data Analysis V. 29 adds a new technique to the family of survival analysis procedures.

Unlike the existing Life Tables, Kaplan-Meier and Cox Regression procedures, the new Accelerated Failure Time model is parametric. This means that the dependent variable is assumed to follow a specific distribution.

Parametric models are often considered less flexible than non-parametric models, but if the outcome variable follows a known distribution, this type of procedure can be very useful.

Whereas proportional hazards models assume that the effect of a covariate is to multiply the hazard by a constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate survival time by a constant.

This is useful for researchers studying accelerated failure time who need model effects that are proportional with respect to survival time.

The new procedure supports parametric models based on the Weibull, Exponential, Log-Normal and Log-Logistic distributions.

This new feature requires the Advanced Statistics module.
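The idea that a covariate rescales survival time by a constant can be sketched in Python for a Weibull baseline. This is a minimal illustration with a single covariate; the function names, the parameterisation, and the numeric values are assumptions chosen for the example, not the procedure's own.

```python
import math


def weibull_survival(t, shape, scale):
    """Baseline survival S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def aft_survival(t, x, beta, shape, scale):
    """AFT survival S(t | x) = S0(t / exp(beta * x)).

    The covariate does not multiply the hazard; it stretches or
    compresses the time axis by the factor exp(beta * x).
    """
    return weibull_survival(t / math.exp(beta * x), shape, scale)


def aft_median(x, beta, shape, scale):
    """Median survival time: the baseline median multiplied by exp(beta * x)."""
    return scale * math.log(2.0) ** (1.0 / shape) * math.exp(beta * x)


# Hypothetical values: beta > 0 means the covariate lengthens survival.
shape, scale, beta = 1.5, 10.0, 0.4

median_control = aft_median(0, beta, shape, scale)
median_treated = aft_median(1, beta, shape, scale)

print("median, x=0:", median_control)
print("median, x=1:", median_treated)
print("time ratio:", median_treated / median_control)
```

With these values the median survival time for `x = 1` is the median for `x = 0` multiplied by `exp(0.4)`, and the same factor applies to every other quantile of the survival distribution.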

### Pseudo-R2 measures in Linear Mixed Models and Generalized Linear Mixed Models

The output from Linear Mixed Models and Generalized Linear Mixed Models now includes pseudo-R2 measures and the intra-class correlation coefficient. R2 is a commonly reported fit statistic indicating the proportion of variance explained by a linear model. The intra-class correlation coefficient (ICC) is a related statistic that indicates how much of the variance is explained by a grouping (random) factor in multilevel/hierarchical data.
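For the simplest case, a model with a single random intercept, the ICC can be sketched in Python as the share of total variance attributable to the grouping factor. The function name and the variance values below are illustrative assumptions; the exact formulas used by the procedures for more complex models are not described here.

```python
def intraclass_correlation(group_variance, residual_variance):
    """ICC for a random-intercept model: between-group share of total variance."""
    total = group_variance + residual_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return group_variance / total


# Hypothetical variance components, e.g. pupils (residual) nested in schools (group).
icc = intraclass_correlation(group_variance=2.0, residual_variance=6.0)
print("ICC:", icc)
```

An ICC of 0.25 would mean that a quarter of the variance in the outcome lies between groups rather than within them.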

## Enhancements

Statistics for Data Analysis V. 29 also introduces several enhancements that make the analyst's daily work easier. The most significant ones are listed below.

The toolbar contains two new buttons: Show/Hide all syntax windows and Clear all output.

In addition, a new button in the status bar lets you switch quickly between classic mode (separate output and syntax windows) and workbook mode.

### Search

The Search feature lets users enter terms directly in the toolbar and view the search results in a drop-down pane.

### Dialog Box Functions

Percentiles

A new Percentiles dialog box (Analyze/Descriptive Statistics/Percentiles) gives full access to all percentile functions in the EXAMINE procedure, including quartiles or custom percentiles computed with any of the five available estimation methods, and bootstrapped confidence intervals.
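Percentile estimation methods differ mainly in how they interpolate between ordered observations. The Python sketch below implements one common definition, a weighted average around position (n + 1)p, and checks it against the standard library. Whether it matches any of the five methods in the dialog box exactly is an assumption, and the function name and data are hypothetical.

```python
import statistics


def percentile_weighted_average(data, p):
    """Percentile as a weighted average around the 1-based position (n + 1) * p."""
    xs = sorted(data)
    n = len(xs)
    position = (n + 1) * p
    lower = int(position)        # index of the observation just below
    frac = position - lower      # how far to move toward the next observation
    if lower < 1:
        return xs[0]
    if lower >= n:
        return xs[-1]
    return (1 - frac) * xs[lower - 1] + frac * xs[lower]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quartiles = [percentile_weighted_average(data, p) for p in (0.25, 0.5, 0.75)]
print("quartiles:", quartiles)

# The standard library's "exclusive" method uses the same (n + 1) * p rule.
print("statistics:", statistics.quantiles(data, n=4, method="exclusive"))
```

Other estimation methods place the position differently or round to the nearest observation, so they can return slightly different values for the same data.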

Linear Regression

Three new functions are available in the Linear Regression menu (Analyze/Regression/Linear):

1. From the Statistics submenu you can select Selection Criteria, which adds the model selection criteria statistics to the output.

2. From the Statistics submenu you can obtain the predicted residual sum of squares (PRESS) statistic, a popular cross-validation-type statistic for assessing linear regression.

3. From the Options submenu you can specify the tolerance level (the TOLERANCE keyword on the CRITERIA subcommand) used to handle variables exhibiting near collinearity.
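The PRESS statistic mentioned above can be sketched in Python for a regression with one predictor. The example computes it twice: by refitting the line with each observation left out, and by the usual leverage shortcut applied to the full fit. The function names and the data are illustrative assumptions, not the procedure's implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_leave_one_out(xs, ys):
    """Sum of squared errors when each point is predicted without itself."""
    total = 0.0
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total


def press_from_leverage(xs, ys):
    """Same statistic from one fit: sum of (residual / (1 - leverage)) ** 2."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    intercept, slope = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        residual = y - (intercept + slope * x)
        total += (residual / (1.0 - leverage)) ** 2
    return total


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

press_refit = press_leave_one_out(xs, ys)
press_shortcut = press_from_leverage(xs, ys)
print("PRESS (refitting):", press_refit)
print("PRESS (leverage):", press_shortcut)
```

Both routes give the same value up to rounding. PRESS is never smaller than the ordinary residual sum of squares, because each point is predicted by a model that has not seen it.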

Cox Regression

A new Cox w/ Time-Dep Cov dialog box is available that allows the specification and computation of multiple time-dependent covariates for Cox regression models.

### New Command syntax


SURVREG RECURRENT

Estimates parametric survival models for recurrent events data via incorporation of a shared frailty term. This term is treated as a random component to account for an unobserved effect due to individual or group-level variability.

MIXED

Adds an OUTFILE subcommand with an EBLUPS keyword to export EBLUPs or random effects parameter predictions to datasets or .sav files. If multiple sets of EBLUPs are requested on RANDOM subcommands via SOLUTION keywords, the FILE_SEPARATE keyword can be used with TRUE or FALSE to save predictions in one or multiple data sets or files.

GENLINMIXED

Adds an EBLUPS keyword to the OUTFILE subcommand to export EBLUPs or random effects parameter predictions to datasets or .sav files. If multiple sets of EBLUPs are requested on RANDOM subcommands via SOLUTION keywords, the FILE_SEPARATE keyword can be used with TRUE or FALSE to save predictions in one or multiple data sets or files.
