Note
Go to the end to download the full example code.
Apply and compare several estimators to conduct a causal mediation analysis#
Establish identifiability#
med_bench implements several estimators for the natural direct and indirect causal effects. Before moving forward with estimation, the investigator should ensure that those causal effects are identified by discussing the plausability of the identification assumptions
SUTVA (Stable Unit Treatment Values Assumption)
Sequential ignorability of the treatment and the mediator(s), by selecting an adequate set of confounding variables that need to be adjusted on
Positivity of the treatment and the mediator.
In this example we will admit those assumptions and simulate a dataset, with the following data generating process, with simple linear models.
We use the function simulate_data
to obtain a full simulated dataset for mediation analysis.
from med_bench.get_simulated_data import simulate_data
from med_bench.estimation.mediation_coefficient_product import CoefficientProduct
from numpy.random import default_rng
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
rg = default_rng(42)
(x, t, m, y, total, theta_1, theta_0,
delta_1, delta_0, p_t, th_p_t_mx) = \
simulate_data(n=500,
rg=rg,
mis_spec_m=False,
mis_spec_y=False,
dim_x=5,
dim_m=1,
seed=5,
type_m='continuous',
sigma_y=0.5,
sigma_m=0.5,
beta_t_factor=0.2,
beta_m_factor=5)
We can check the true values of the effects
print_effects = ('total effect: {:.2f}\n'
'direct effect: {:.2f}\n'
'indirect effect: {:.2f}')
print('True effects')
print(print_effects.format(total, theta_1, delta_0))
True effects
total effect: 1.70
direct effect: 1.20
indirect effect: 0.50
Contrary to the sequential ignorability assumption, the positivity assumption can be checked experimentally (not a guarantee but a good indication). We represent the distribution of \(P(T=1|X, M)\) for the treated and the untreated.
th_df = pd.DataFrame(zip(th_p_t_mx, t.ravel()), columns=['th_p_t_mx', 't'])
sns.displot(data=th_df, x='th_p_t_mx', hue='t')
plt.show()

Apply a baseline causal mediation estimator to your data: the coefficient product#
We se that there are individuals from both treatment groups for all propensity values, which is supporting (bot not proving) the positivity assumption. Let’s now proceed with estimation. We begin with the simple coefficient product approach.
estimator = CoefficientProduct(regularize=False)
estimator.fit(t, m, x, y)
causal_effects = estimator.estimate(t.ravel(), m, x, y.ravel())
print('Estimated effects with the coefficient product')
print(print_effects.format(causal_effects["total_effect"],
causal_effects["direct_effect_treated"],
causal_effects["indirect_effect_control"]))
Nuisance models fitted
Estimated effects with the coefficient product
total effect: 1.78
direct effect: 1.23
indirect effect: 0.54
Comparaison with the other estimators#
upcoming
Total running time of the script: (0 minutes 0.325 seconds)