.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/02_meta-analysis/plot_meta-analysis_walkthrough.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_02_meta-analysis_plot_meta-analysis_walkthrough.py: .. _meta_walkthrough: ================================================ Run Estimators on a simulated dataset ================================================ PyMARE implements a range of meta-analytic estimators. In this example, we build a simulated dataset with a known ground truth and use it to compare PyMARE's estimators. .. note:: The variance of the true effect is composed of both between-study variance (:math:`\tau^{2}`) and within-study variance (:math:`\sigma^{2}`). Within-study variance is generally taken from sampling variance values from individual studies (``v``), while between-study variance can be estimated via a number of methods. .. GENERATED FROM PYTHON SOURCE LINES 21-32 .. code-block:: Python # sphinx_gallery_thumbnail_number = 3 import matplotlib.pyplot as plt import numpy as np import seaborn as sns from scipy import stats from pymare import core, estimators from pymare.stats import var_to_ci sns.set_style("whitegrid") .. GENERATED FROM PYTHON SOURCE LINES 33-38 Here we simulate a dataset ----------------------------------------------------------------------------- This is a simple dataset with a one-sample design. We are interested in estimating the true effect size from a set of one-sample studies. .. GENERATED FROM PYTHON SOURCE LINES 38-58 .. code-block:: Python N_STUDIES = 40 BETWEEN_STUDY_VAR = 400 # population variance between_study_sd = np.sqrt(BETWEEN_STUDY_VAR) TRUE_EFFECT = 20 sample_sizes = np.round(np.random.normal(loc=50, scale=20, size=N_STUDIES)).astype(int) within_study_vars = np.random.normal(loc=400, scale=100, size=N_STUDIES) study_means = np.random.normal(loc=TRUE_EFFECT, scale=between_study_sd, size=N_STUDIES) sample_sizes[sample_sizes <= 1] = 2 within_study_vars = np.abs(within_study_vars) # Convert data types and match PyMARE nomenclature y = study_means X = np.ones((N_STUDIES)) v = within_study_vars n = sample_sizes sd = np.sqrt(v * n) z = y / sd p = stats.norm.sf(abs(z)) * 2 .. GENERATED FROM PYTHON SOURCE LINES 59-61 Plot variable distributions ----------------------------------------------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 61-76 .. code-block:: Python fig, axes = plt.subplots(nrows=5, figsize=(6, 5)) sns.distplot(y, ax=axes[0], bins=20) axes[0].set_title("y") sns.distplot(v, ax=axes[1], bins=20) axes[1].set_title("v") sns.distplot(n, ax=axes[2], bins=20) axes[2].set_title("n") sns.distplot(z, ax=axes[3], bins=20) axes[3].set_title("z") sns.distplot(p, ax=axes[4], bins=20) axes[4].set_title("p") for i in range(5): axes[i].set_yticks([]) fig.tight_layout() .. image-sg:: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_001.png :alt: y, v, n, z, p :srcset: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:62: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(y, ax=axes[0], bins=20) /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:64: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(v, ax=axes[1], bins=20) /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:66: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(n, ax=axes[2], bins=20) /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:68: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(z, ax=axes[3], bins=20) /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:70: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(p, ax=axes[4], bins=20) .. GENERATED FROM PYTHON SOURCE LINES 77-81 Plot means and confidence intervals ----------------------------------- Here we can show study-wise mean effect and CIs, along with the true effect and CI corresponding to the between-study variance. .. GENERATED FROM PYTHON SOURCE LINES 81-108 .. code-block:: Python fig, ax = plt.subplots(figsize=(6, 14)) study_ticks = np.arange(N_STUDIES) # Get 95% CI for individual studies lower_bounds, upper_bounds = var_to_ci(y, v, n) ax.scatter(y, study_ticks + 1) for study in study_ticks: ax.plot((lower_bounds[study], upper_bounds[study]), (study + 1, study + 1), color="blue") ax.axvline(0, color="gray", alpha=0.2, linestyle="--", label="Zero") ax.axvline(np.mean(y), color="orange", alpha=0.2, label="Mean of Observed Effects") # Get 95% CI for true effect lower_bound, upper_bound = var_to_ci(TRUE_EFFECT, BETWEEN_STUDY_VAR, 1) ax.scatter((TRUE_EFFECT,), (N_STUDIES + 1,), color="green", label="True Effect") ax.plot( (lower_bound, upper_bound), (N_STUDIES + 1, N_STUDIES + 1), color="green", linewidth=3, label="Between-Study 95% CI", ) ax.set_ylim((0, N_STUDIES + 2)) ax.set_xlabel("Mean (95% CI)") ax.set_ylabel("Study") ax.legend() fig.tight_layout() .. image-sg:: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_002.png :alt: plot meta analysis walkthrough :srcset: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 109-111 Create a Dataset object containing the data -------------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 111-116 .. code-block:: Python dset = core.Dataset(y=y, X=None, v=v, n=n, add_intercept=True) # Here is a dictionary to house results across models results = {} .. GENERATED FROM PYTHON SOURCE LINES 117-141 Fit models ----------------------------------------------------------------------------- When you have ``z`` or ``p``: - :class:`pymare.estimators.StoufferCombinationTest` - :class:`pymare.estimators.FisherCombinationTest` When you have ``y`` and ``v`` and don't want to estimate between-study variance: - :class:`pymare.estimators.WeightedLeastSquares` When you have ``y`` and ``v`` and want to estimate between-study variance: - :class:`pymare.estimators.DerSimonianLaird` - :class:`pymare.estimators.Hedges` - :class:`pymare.estimators.VarianceBasedLikelihoodEstimator` When you have ``y`` and ``n`` and want to estimate between-study variance: - :class:`pymare.estimators.SampleSizeBasedLikelihoodEstimator` When you have ``y`` and ``v`` and want a hierarchical model: - :class:`pymare.estimators.StanMetaRegression` .. GENERATED FROM PYTHON SOURCE LINES 143-148 First, we have "combination models", which combine p and/or z values ````````````````````````````````````````````````````````````````````````````` The two combination models in PyMARE are Stouffer's and Fisher's Tests. Notice that these models don't use :class:`~pymare.core.Dataset` objects. .. GENERATED FROM PYTHON SOURCE LINES 148-159 .. code-block:: Python stouff = estimators.StoufferCombinationTest() stouff.fit(z[:, None]) print("Stouffers") print("p: {}".format(stouff.params_["p"])) print() fisher = estimators.FisherCombinationTest() fisher.fit(z[:, None]) print("Fishers") print("p: {}".format(fisher.params_["p"])) .. rst-class:: sphx-glr-script-out .. code-block:: none Stouffers p: [0.1773233] Fishers p: [0.87163814] .. GENERATED FROM PYTHON SOURCE LINES 160-164 Now we have a fixed effects model ````````````````````````````````````````````````````````````````````````````` This estimator does not attempt to estimate between-study variance. Instead, it takes ``tau2`` (:math:`\tau^{2}`) as an argument. .. GENERATED FROM PYTHON SOURCE LINES 164-171 .. code-block:: Python wls = estimators.WeightedLeastSquares() wls.fit_dataset(dset) wls_summary = wls.summary() results["Weighted Least Squares"] = wls_summary.to_df() print("Weighted Least Squares") print(wls_summary.to_df().T) .. rst-class:: sphx-glr-script-out .. code-block:: none Weighted Least Squares 0 name intercept estimate 19.064697 se 3.217642 z-score 5.925053 p-value 0.0 ci_0.025 12.758235 ci_0.975 25.371158 .. GENERATED FROM PYTHON SOURCE LINES 172-182 Methods that estimate between-study variance ````````````````````````````````````````````````````````````````````````````` The ``DerSimonianLaird``, ``Hedges``, and ``VarianceBasedLikelihoodEstimator`` estimators all estimate between-study variance from the data, and use ``y`` and ``v``. ``DerSimonianLaird`` and ``Hedges`` use relatively simple methods for estimating between-study variance, while ``VarianceBasedLikelihoodEstimator`` can use either maximum-likelihood (ML) or restricted maximum-likelihood (REML) to iteratively estimate it. .. GENERATED FROM PYTHON SOURCE LINES 182-232 .. code-block:: Python dsl = estimators.DerSimonianLaird() dsl.fit_dataset(dset) dsl_summary = dsl.summary() results["DerSimonian-Laird"] = dsl_summary.to_df() print("DerSimonian-Laird") print(dsl_summary.to_df().T) print() hedge = estimators.Hedges() hedge.fit_dataset(dset) hedge_summary = hedge.summary() results["Hedges"] = hedge_summary.to_df() print("Hedges") print(hedge_summary.to_df().T) print() vb_ml = estimators.VarianceBasedLikelihoodEstimator(method="ML") vb_ml.fit_dataset(dset) vb_ml_summary = vb_ml.summary() results["Variance-Based with ML"] = vb_ml_summary.to_df() print("Variance-Based with ML") print(vb_ml_summary.to_df().T) print() vb_reml = estimators.VarianceBasedLikelihoodEstimator(method="REML") vb_reml.fit_dataset(dset) vb_reml_summary = vb_reml.summary() results["Variance-Based with REML"] = vb_reml_summary.to_df() print("Variance-Based with REML") print(vb_reml_summary.to_df().T) print() # The ``SampleSizeBasedLikelihoodEstimator`` estimates between-study variance # using ``y`` and ``n``, but assumes within-study variance is homogenous # across studies. sb_ml = estimators.SampleSizeBasedLikelihoodEstimator(method="ML") sb_ml.fit_dataset(dset) sb_ml_summary = sb_ml.summary() results["Sample Size-Based with ML"] = sb_ml_summary.to_df() print("Sample Size-Based with ML") print(sb_ml_summary.to_df().T) print() sb_reml = estimators.SampleSizeBasedLikelihoodEstimator(method="REML") sb_reml.fit_dataset(dset) sb_reml_summary = sb_reml.summary() results["Sample Size-Based with REML"] = sb_reml_summary.to_df() print("Sample Size-Based with REML") print(sb_reml_summary.to_df().T) .. rst-class:: sphx-glr-script-out .. code-block:: none DerSimonian-Laird 0 name intercept estimate 19.064697 se 3.217642 z-score 5.925053 p-value 0.0 ci_0.025 12.758235 ci_0.975 25.371158 Hedges 0 name intercept estimate 19.064697 se 0.158114 z-score 120.575728 p-value 0.0 ci_0.025 18.754799 ci_0.975 19.374594 Variance-Based with ML 0 name intercept estimate 19.064697 se 3.217642 z-score 5.925053 p-value 0.0 ci_0.025 12.758235 ci_0.975 25.371158 Variance-Based with REML 0 name intercept estimate 19.064697 se 3.217642 z-score 5.925053 p-value 0.0 ci_0.025 12.758235 ci_0.975 25.371158 Sample Size-Based with ML 0 name intercept estimate 19.840907 se 2.792072 z-score 7.10616 p-value 0.0 ci_0.025 14.368547 ci_0.975 25.313267 Sample Size-Based with REML 0 name intercept estimate 19.83974 se 2.827161 z-score 7.017548 p-value 0.0 ci_0.025 14.298606 ci_0.975 25.380874 .. GENERATED FROM PYTHON SOURCE LINES 233-237 What about the Stan estimator? ````````````````````````````````````````````````````````````````````````````` We're going to skip this one here because of how computationally intensive it is. .. GENERATED FROM PYTHON SOURCE LINES 239-241 Let's check out our results! ````````````````````````````````````````````````````````````````````````````` .. GENERATED FROM PYTHON SOURCE LINES 241-265 .. code-block:: Python fig, ax = plt.subplots(figsize=(6, 6)) for i, (estimator_name, summary_df) in enumerate(results.items()): ax.scatter((summary_df.loc[0, "estimate"],), (i + 1,), label=estimator_name) ax.plot( (summary_df.loc[0, "ci_0.025"], summary_df.loc[0, "ci_0.975"]), (i + 1, i + 1), linewidth=3, ) # Get 95% CI for true effect lower_bound, upper_bound = var_to_ci(TRUE_EFFECT, BETWEEN_STUDY_VAR, 1) ax.scatter((TRUE_EFFECT,), (i + 2,), label="True Effect") ax.plot( (lower_bound, upper_bound), (i + 2, i + 2), linewidth=3, label="Between-Study 95% CI", ) ax.set_ylim((0, i + 3)) ax.set_yticklabels([None] + list(results.keys()) + ["True Effect"]) ax.set_xlabel("Mean (95% CI)") fig.tight_layout() .. image-sg:: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_003.png :alt: plot meta analysis walkthrough :srcset: /auto_examples/02_meta-analysis/images/sphx_glr_plot_meta-analysis_walkthrough_003.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/pymare/checkouts/0.0.9/examples/02_meta-analysis/plot_meta-analysis_walkthrough.py:261: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator. ax.set_yticklabels([None] + list(results.keys()) + ["True Effect"]) .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.923 seconds) .. _sphx_glr_download_auto_examples_02_meta-analysis_plot_meta-analysis_walkthrough.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_meta-analysis_walkthrough.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_meta-analysis_walkthrough.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_meta-analysis_walkthrough.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_