6.2. Pruebas T, parte 1#

Para realizar las pruebas t con inferencia Bayesiana el método más sencillo es utilizar pingouin que produce por defecto factores de Bayes (BF) además del valor p frecuentista.

6.2.1. Librerías y datos#

Utilizaremos los palmer penguins que ya hemos revisado en el curso.

import pandas as pd
import pingouin as pg
import seaborn as sns

data = pg.read_dataset('penguins')
data.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 37.8 18.3 174.0 3400.0 female
1 Adelie Biscoe 37.7 18.7 180.0 3600.0 male
2 Adelie Biscoe 35.9 19.2 189.0 3800.0 female
3 Adelie Biscoe 38.2 18.1 185.0 3950.0 male
4 Adelie Biscoe 38.8 17.2 180.0 3800.0 male

6.2.2. Visualización de grupos.#

Probemos la forma más sencilla del análisis comparando body_mass_g de acuerdo a la variable sex.

clean_data = data.dropna(subset=['body_mass_g', 'sex'])

sns.catplot(clean_data, x='sex', y='body_mass_g', hue='species', kind='box')
<seaborn.axisgrid.FacetGrid at 0x1862d9925d0>
../_images/8b252ff5f5135ed6d8a414acaeab1d7b6e6f46207007e18b365cc3607ccdaf94.png
sns.displot(
    clean_data,
    x='body_mass_g',
    hue='sex',
    kind='kde',
    col='species'
)
<seaborn.axisgrid.FacetGrid at 0x18630c1a210>
../_images/c2947ac51055ce90eb73c645bd9c038788deeadb9c76d7f391c01c45cd23ccc8.png

Vemos que efectivamente hay una diferencia en peso corporal con respecto al sexo y a la especie. Probemos ahora el análisis.

Antes de ejecutar el análisis, conviene leer la documentación de la función. Te sugiero leerlo en la página de pingouin, pero también podemos verlo con la función help.

help(pg.ttest)
Help on function ttest in module pingouin.parametric:

ttest(
    x,
    y,
    paired=False,
    alternative='two-sided',
    correction='auto',
    r=0.707,
    confidence=0.95
)
    T-test.

    Parameters
    ----------
    x : array_like
        First set of observations.
    y : array_like or float
        Second set of observations. If ``y`` is a single value, a one-sample
        T-test is computed against that value (= "mu" in the t.test R
        function).
    paired : boolean
        Specify whether the two observations are related (i.e. repeated
        measures) or independent.
    alternative : string
        Defines the alternative hypothesis, or tail of the test. Must be one of
        "two-sided" (default), "greater" or "less". Both "greater" and "less" return one-sided
        p-values. "greater" tests against the alternative hypothesis that the mean of ``x``
        is greater than the mean of ``y``.
    correction : string or boolean
        For unpaired two sample T-tests, specify whether or not to correct for
        unequal variances using Welch separate variances T-test. If 'auto', it
        will automatically uses Welch T-test when the sample sizes are unequal,
        as recommended by Zimmerman 2004.
    r : float
        Cauchy scale factor for computing the Bayes Factor.
        Smaller values of r (e.g. 0.5), may be appropriate when small effect
        sizes are expected a priori; larger values of r are appropriate when
        large effect sizes are expected (Rouder et al 2009).
        The default is 0.707 (= :math:`\sqrt{2} / 2`).
    confidence : float
        Confidence level for the confidence intervals (0.95 = 95%)

        .. versionadded:: 0.3.9

    Returns
    -------
    stats : :py:class:`pandas.DataFrame`

        * ``'T'``: T-value
        * ``'dof'``: degrees of freedom
        * ``'alternative'``: alternative of the test
        * ``'p-val'``: p-value
        * ``'CI95%'``: confidence intervals of the difference in means
        * ``'cohen-d'``: Cohen d effect size
        * ``'BF10'``: Bayes Factor of the alternative hypothesis
        * ``'power'``: achieved power of the test ( = 1 - type II error)

    See also
    --------
    mwu, wilcoxon, anova, rm_anova, pairwise_tests, compute_effsize

    Notes
    -----
    Missing values are automatically removed from the data. If ``x`` and
    ``y`` are paired, the entire row is removed (= listwise deletion).

    The **T-value for unpaired samples** is defined as:

    .. math::

        t = \frac{\overline{x} - \overline{y}}
        {\sqrt{\frac{s^{2}_{x}}{n_{x}} + \frac{s^{2}_{y}}{n_{y}}}}

    where :math:`\overline{x}` and :math:`\overline{y}` are the sample means,
    :math:`n_{x}` and :math:`n_{y}` are the sample sizes, and
    :math:`s^{2}_{x}` and :math:`s^{2}_{y}` are the sample variances.
    The degrees of freedom :math:`v` are :math:`n_x + n_y - 2` when the sample
    sizes are equal. When the sample sizes are unequal or when
    :code:`correction=True`, the Welch–Satterthwaite equation is used to
    approximate the adjusted degrees of freedom:

    .. math::

        v = \frac{(\frac{s^{2}_{x}}{n_{x}} + \frac{s^{2}_{y}}{n_{y}})^{2}}
        {\frac{(\frac{s^{2}_{x}}{n_{x}})^{2}}{(n_{x}-1)} +
        \frac{(\frac{s^{2}_{y}}{n_{y}})^{2}}{(n_{y}-1)}}

    The p-value is then calculated using a T distribution with :math:`v`
    degrees of freedom.

    The T-value for **paired samples** is defined by:

    .. math:: t = \frac{\overline{x}_d}{s_{\overline{x}}}

    where

    .. math:: s_{\overline{x}} = \frac{s_d}{\sqrt n}

    where :math:`\overline{x}_d` is the sample mean of the differences
    between the two paired samples, :math:`n` is the number of observations
    (sample size), :math:`s_d` is the sample standard deviation of the
    differences and :math:`s_{\overline{x}}` is the estimated standard error
    of the mean of the differences. The p-value is then calculated using a
    T-distribution with :math:`n-1` degrees of freedom.

    The scaled Jeffrey-Zellner-Siow (JZS) Bayes Factor is approximated
    using the :py:func:`pingouin.bayesfactor_ttest` function.

    Results have been tested against JASP and the `t.test` R function.

    References
    ----------
    * https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm

    * Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should
      by default use Welch’s t-test instead of Student’s t-test.
      International Review of Social Psychology, 30(1).

    * Zimmerman, D. W. (2004). A note on preliminary tests of equality of
      variances. British Journal of Mathematical and Statistical
      Psychology, 57(1), 173-181.

    * Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., Iverson, G.,
      2009. Bayesian t tests for accepting and rejecting the null
      hypothesis. Psychon. Bull. Rev. 16, 225–237.
      https://doi.org/10.3758/PBR.16.2.225

    Examples
    --------
    1. One-sample T-test.

    >>> from pingouin import ttest
    >>> x = [5.5, 2.4, 6.8, 9.6, 4.2]
    >>> ttest(x, 4).round(2)
              T  dof alternative  p-val         CI95%  cohen-d   BF10  power
    T-test  1.4    4   two-sided   0.23  [2.32, 9.08]     0.62  0.766   0.19

    2. One sided paired T-test.

    >>> pre = [5.5, 2.4, 6.8, 9.6, 4.2]
    >>> post = [6.4, 3.4, 6.4, 11., 4.8]
    >>> ttest(pre, post, paired=True, alternative='less').round(2)
               T  dof alternative  p-val          CI95%  cohen-d   BF10  power
    T-test -2.31    4        less   0.04  [-inf, -0.05]     0.25  3.122   0.12

    Now testing the opposite alternative hypothesis

    >>> ttest(pre, post, paired=True, alternative='greater').round(2)
               T  dof alternative  p-val         CI95%  cohen-d  BF10  power
    T-test -2.31    4     greater   0.96  [-1.35, inf]     0.25  0.32   0.02

    3. Paired T-test with missing values.

    >>> import numpy as np
    >>> pre = [5.5, 2.4, np.nan, 9.6, 4.2]
    >>> post = [6.4, 3.4, 6.4, 11., 4.8]
    >>> ttest(pre, post, paired=True).round(3)
                T  dof alternative  p-val          CI95%  cohen-d   BF10  power
    T-test -5.902    3   two-sided   0.01  [-1.5, -0.45]    0.306  7.169  0.073

    Compare with SciPy

    >>> from scipy.stats import ttest_rel
    >>> np.round(ttest_rel(pre, post, nan_policy="omit"), 3)
    array([-5.902,  0.01 ])

    4. Independent two-sample T-test with equal sample size.

    >>> np.random.seed(123)
    >>> x = np.random.normal(loc=7, size=20)
    >>> y = np.random.normal(loc=4, size=20)
    >>> ttest(x, y)
                   T  dof alternative         p-val         CI95%   cohen-d       BF10  power
    T-test  9.106452   38   two-sided  4.306971e-11  [2.64, 4.15]  2.879713  1.366e+08    1.0

    5. Independent two-sample T-test with unequal sample size. A Welch's T-test is used.

    >>> np.random.seed(123)
    >>> y = np.random.normal(loc=6.5, size=15)
    >>> ttest(x, y)
                   T        dof alternative     p-val          CI95%   cohen-d   BF10     power
    T-test  1.996537  31.567592   two-sided  0.054561  [-0.02, 1.65]  0.673518  1.469  0.481867

    6. However, the Welch's correction can be disabled:

    >>> ttest(x, y, correction=False)
                   T  dof alternative     p-val          CI95%   cohen-d   BF10     power
    T-test  1.971859   33   two-sided  0.057056  [-0.03, 1.66]  0.673518  1.418  0.481867

    Compare with SciPy

    >>> from scipy.stats import ttest_ind
    >>> np.round(ttest_ind(x, y, equal_var=True), 6)  # T value and p-value
    array([1.971859, 0.057056])
condition = (clean_data['sex'] == 'female')
fem = clean_data[condition]['body_mass_g']
male = clean_data[~condition]['body_mass_g'] # operador not

# prueba t

pg.ttest(fem, male)
T dof alternative p-val CI95% cohen-d BF10 power
T-test -8.554537 323.895881 two-sided 4.793891e-16 [-840.58, -526.25] 0.936205 1.142e+13 1.0

El resultado incluye el estadístico t, los grados de libertad, el tipo de hipótesis, el valor p, el intervalo de confianza para la diferencia de medias, el tamaño del efecto (d de Cohen), el factor de Bayes \(BF_{10}\), y el poder post-hoc del análisis.

En este caso, se utilizó un valor por defecto de \(r=0.707\), que corresponde al parámetro de escala de una distribución Cauchy centrada en 0, utilizada como prior sobre el tamaño del efecto bajo la hipótesis alternativa.

El factor r, bajo la distribución Cauchy del tamaño del efecto se define con la siguiente fórmula, el factor por defecto fue descrito por Rouder (2009) y se utiliza en muchas implementaciones en Python, R, JASP, etc. Se calcula de la siguiente forma:

\[ \frac{1}{\sqrt{2}} \approx 0.707 \]

Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., Iverson, G., Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 16, 225–237. https://doi.org/10.3758/PBR.16.2.225

Pero si contáramos con estudios previos que sugirieran un tamaño del efecto diferente, el análisis debería contemplarlo. Analicemos la diferencia global del peso entre los Chinstrap y los Adelie.

condition = (clean_data['sex'] == 'female') & (clean_data['species'] != 'Gentoo')
sns.displot(clean_data[condition], x='body_mass_g', hue='species', kind='kde')
<seaborn.axisgrid.FacetGrid at 0x1863346a990>
../_images/9f271c6679ca8e1bebfe13d4ca211eed13ab6a3a07b845d57e8588e1152a32e0.png

Vemos que la diferencia es minúscula. Veamos cómo cambia el factor de Bayes con diferentes valores de r

chinstrap = clean_data[clean_data['species'] == 'Chinstrap']['body_mass_g']
adelie = clean_data[clean_data['species'] == 'Adelie']['body_mass_g']


result = pd.concat([
    pg.ttest(adelie, chinstrap, r=r).assign(r=r)
    for r in [0.1, 0.5, 0.707, 1]
])

result
T dof alternative p-val CI95% cohen-d BF10 power r
T-test -0.44793 154.032619 two-sided 0.654833 [-145.66, 91.82] 0.06168 0.651 0.070266 0.100
T-test -0.44793 154.032619 two-sided 0.654833 [-145.66, 91.82] 0.06168 0.238 0.070266 0.500
T-test -0.44793 154.032619 two-sided 0.654833 [-145.66, 91.82] 0.06168 0.175 0.070266 0.707
T-test -0.44793 154.032619 two-sided 0.654833 [-145.66, 91.82] 0.06168 0.126 0.070266 1.000

Nota

El código anterior utiliza una comprensión de lista, puedes ver cómo funciona aquí.

Observamos que el valor p no alcanza significación estadística, y que el \(BF_{10}\) proporciona evidencia moderada a fuerte a favor de la hipótesis nula \(H_0: \text{Peso}{\text{Adelie}} = \text{Peso}{\text{Chinstrap}}\), incluso al usar un valor ingenuo de \(r = 0.5\).

6.2.3. Interpretación de \(BF_{10}\)#

../_images/Interpretation-of-Bayes-factors_W640.jpg

Figura 6.1 Tabla de referencia para los diferentes puntos de corte del Factor de Bayes.#

Imagen reproducida bajo los términos de la licencia CC BY 4.0.

Garnett, Claire & Michie, Susan & West, Robert & Brown, Jamie. (2019). Updating the evidence on the effectiveness of the alcohol reduction app, Drink Less: using Bayes factors to analyse trial datasets supplemented with extended recruitment. F1000Research. 8. 114. 10.12688/f1000research.17952.2.

6.2.4. Ejercicios#

  1. Aplica lo aprendido con tus propios datos.

  2. Analiza la diferencia en el ancho del pico de acuerdo al sexo.