Creating a dataset
In PyMARE, operations are performed on Dataset objects.
Datasets are very lightweight objects that store the data used for
meta-analyses, including study-level estimates (y), variances (v),
predictors (X), and sample sizes (n).
Start with the necessary imports
from pprint import pprint
import pandas as pd
from pymare import core
Datasets can be created from arrays
The simplest way to create a dataset is to pass in arguments as numpy arrays.
y refers to the study-level estimates, v to the variances, X to any study-level regressors, and n to the sample sizes.
Not all Estimators require all of these arguments, so a given Dataset does not need to include all of them.
{'X': array([[1., 5., 9.],
       [1., 2., 8.],
       [1., 1., 7.]]),
 'X_names': ['intercept', 'X1', 'X7'],
 'n': None,
 'v': array([[100],
       [100],
       [100]]),
 'y': array([[2],
       [4],
       [6]])}
Datasets have a to_df() method, which returns the stored data as a pandas DataFrame.
dataset.to_df()
Datasets can also be created from pandas DataFrames
{'X': array([[5, 9],
       [2, 8],
       [1, 7]]),
 'X_names': ['X1', 'X7'],
 'n': None,
 'v': array([[100],
       [100],
       [100]]),
 'y': array([[2],
       [4],
       [6]])}
Datasets can also contain multiple dependent variables
These variables are analyzed in parallel, but as unrelated variables, rather than as potentially correlated ones.
This is particularly useful for image-based neuroimaging meta-analyses. For more information about this, see NiMARE.
y = [
    [2, 4, 6],  # Estimates for first study's three outcome variables.
    [3, 2, 1],  # Estimates for second study's three outcome variables.
]
v = [
    [100, 100, 100],  # Estimate variances for first study's three outcome variables.
    [8, 4, 2],  # Estimate variances for second study's three outcome variables.
]
X = [
    [5, 9],  # Predictors for first study. Same across all three outcome variables.
    [2, 8],  # Predictors for second study. Same across all three outcome variables.
]
dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"])
pprint(vars(dataset))
{'X': array([[1., 5., 9.],
       [1., 2., 8.]]),
 'X_names': ['intercept', 'X1', 'X7'],
 'n': None,
 'v': array([[100, 100, 100],
       [  8,   4,   2]]),
 'y': array([[2, 4, 6],
       [3, 2, 1]])}
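To make "analyzed in parallel, but as unrelated variables" concrete, here is a plain NumPy illustration. It is not PyMARE's estimator code. It computes a simple inverse-variance weighted mean for each outcome column, first one column at a time and then in vectorized form. The helper name weighted_mean is hypothetical.

```python
import numpy as np

# Same values as the example above: rows are studies, columns are outcomes.
y = np.array([[2, 4, 6], [3, 2, 1]], dtype=float)
v = np.array([[100, 100, 100], [8, 4, 2]], dtype=float)


def weighted_mean(estimates, variances):
    """Inverse-variance weighted mean of one outcome across studies."""
    weights = 1.0 / variances
    return np.sum(weights * estimates) / np.sum(weights)


# Looping over columns treats each outcome as its own separate analysis.
per_outcome = np.array([weighted_mean(y[:, j], v[:, j]) for j in range(y.shape[1])])

# The vectorized form gives the same numbers; nothing is shared across columns.
vectorized = np.sum(y / v, axis=0) / np.sum(1.0 / v, axis=0)

print(per_outcome)
print(vectorized)
```

Because each column is handled on its own, any correlation between outcomes within a study plays no role in the result.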