Creating a dataset

In PyMARE, operations are performed on Dataset objects. Datasets are very lightweight objects that store the data used for meta-analyses, including study-level estimates (y), variances (v), predictors (X), and sample sizes (n).

Start with the necessary imports

from pprint import pprint

import pandas as pd

from pymare import core

Datasets can be created from arrays

The simplest way to create a dataset is to pass in arguments as array-like objects, such as lists or numpy arrays.

y refers to the study-level estimates, v to the variances, X to any study-level regressors, and n to the sample sizes.

Not all Estimators require all of these arguments, so not all need to be used in a given Dataset.

y = [2, 4, 6]
v = [100, 100, 100]
X = [[5, 9], [2, 8], [1, 7]]

dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"])

pprint(vars(dataset))

Out:

{'X': array([[1., 5., 9.],
       [1., 2., 8.],
       [1., 1., 7.]]),
 'X_names': ['intercept', 'X1', 'X7'],
 'n': None,
 'v': array([[100],
       [100],
       [100]]),
 'y': array([[2],
       [4],
       [6]])}
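The output above shows two things: y and v are stored as column vectors with one row per study, and an intercept column of ones is prepended to X by default. A minimal numpy sketch of the equivalent array operations follows. It only illustrates the resulting shapes and is not PyMARE's actual implementation; the variable names y_col and X_with_intercept are made up for this example.

```python
import numpy as np

y = [2, 4, 6]
X = [[5, 9], [2, 8], [1, 7]]

# Reshape the estimates into a column vector, one row per study.
y_col = np.asarray(y).reshape(-1, 1)

# Prepend a column of ones to the predictors to act as the intercept.
# (Illustrative only; assumed to mirror what the Dataset output shows.)
X_arr = np.asarray(X, dtype=float)
X_with_intercept = np.hstack([np.ones((X_arr.shape[0], 1)), X_arr])
```

Passing add_intercept=False, as in the DataFrame example later on this page, leaves X without the extra column.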

Datasets have a to_df() method, which returns the stored data as a pandas DataFrame.

dataset.to_df()
   y    v  intercept   X1   X7
0  2  100        1.0  5.0  9.0
1  4  100        1.0  2.0  8.0
2  6  100        1.0  1.0  7.0


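Datasets can also be created from pandas DataFrames

When a DataFrame is passed as data, the y, v, and X arguments are column names rather than arrays. In the example below, y is not given, and the estimates are read from the column named "y".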

df = pd.DataFrame(
    {
        "y": [2, 4, 6],
        "v_alt": [100, 100, 100],
        "X1": [5, 2, 1],
        "X7": [9, 8, 7],
    }
)

dataset = core.Dataset(v="v_alt", X=["X1", "X7"], data=df, add_intercept=False)

pprint(vars(dataset))

Out:

{'X': array([[5, 9],
       [2, 8],
       [1, 7]]),
 'X_names': ['X1', 'X7'],
 'n': None,
 'v': array([[100],
       [100],
       [100]]),
 'y': array([[2],
       [4],
       [6]])}
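The mapping from column names to arrays can be sketched in plain pandas. The helper extract_columns below is hypothetical and exists only for illustration; it is not part of PyMARE, and its defaults ("y" and "v" as fallback column names) are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "y": [2, 4, 6],
        "v_alt": [100, 100, 100],
        "X1": [5, 2, 1],
        "X7": [9, 8, 7],
    }
)


def extract_columns(data, y="y", v="v", X=None):
    """Hypothetical helper: pull named columns out of a DataFrame as 2D arrays."""
    # Selecting with a list of names keeps the result two-dimensional,
    # so single columns come back as column vectors.
    y_arr = data[[y]].to_numpy()
    v_arr = data[[v]].to_numpy()
    X_arr = data[list(X)].to_numpy() if X is not None else None
    return y_arr, v_arr, X_arr


y_arr, v_arr, X_arr = extract_columns(df, v="v_alt", X=["X1", "X7"])
```

The resulting arrays have the same shapes and values as the y, v, and X entries in the output above.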

Datasets can also contain multiple dependent variables

These variables are analyzed in parallel, with each one treated as independent of the others. Potential correlations between them are not modeled.

This is particularly useful for image-based neuroimaging meta-analyses. For more information about this, see NiMARE.

y = [
    [2, 4, 6],  # Estimates for first study's three outcome variables.
    [3, 2, 1],  # Estimates for second study's three outcome variables.
]
v = [
    [100, 100, 100],  # Estimate variances for first study's three outcome variables.
    [8, 4, 2],  # Estimate variances for second study's three outcome variables.
]
X = [
    [5, 9],  # Predictors for first study. Same across all three outcome variables.
    [2, 8],  # Predictors for second study. Same across all three outcome variables.
]

dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"])

pprint(vars(dataset))

Out:

{'X': array([[1., 5., 9.],
       [1., 2., 8.]]),
 'X_names': ['intercept', 'X1', 'X7'],
 'n': None,
 'v': array([[100, 100, 100],
       [  8,   4,   2]]),
 'y': array([[2, 4, 6],
       [3, 2, 1]])}
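To make "in parallel, but as unrelated variables" concrete, here is a minimal numpy sketch that pools each outcome column separately with an inverse-variance weighted mean. This is a standard intercept-only fixed-effects calculation chosen for illustration. It ignores X and is not a PyMARE estimator; the variable names are made up for this example.

```python
import numpy as np

# Rows are studies, columns are outcome variables.
y = np.array([[2, 4, 6], [3, 2, 1]], dtype=float)
v = np.array([[100, 100, 100], [8, 4, 2]], dtype=float)

# Inverse-variance weights, computed elementwise.
weights = 1.0 / v

# Summing over studies (axis 0) pools every outcome column at once.
pooled = (weights * y).sum(axis=0) / weights.sum(axis=0)
pooled_var = 1.0 / weights.sum(axis=0)

# The same calculation done one column at a time gives identical results,
# because no information is shared between columns.
looped = np.array(
    [np.sum(y[:, j] / v[:, j]) / np.sum(1.0 / v[:, j]) for j in range(y.shape[1])]
)
```

Because each column is handled on its own, the vectorized and looped results agree, which is the sense in which the outcome variables are treated as unrelated.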
