This package contains simulations for causal inference, estimators for the ATE and the CATE, as well as the code for the experiments described in the paper: *How to select predictive models for causal inference?*
The package code is contained in `caussim`:

- `estimation`: contains CATE and ATE estimators usable with any scikit-learn compatible base estimators, and meta-learners such as TLearner, SLearner, or RLearner (see the T-learner sketch after this list).
- `simulations`: simulations with basis expansions (Nystroem and splines available).
- `experiences`: used to run extensive evaluations of causal metrics on ACIC 2016 and handcrafted simulations.
- `reports`: contains the scripts used to derive the figures and tables presented in the paper. The main results are obtained by launching the `causal_scores_evaluation.py` report script (see the report instructions below).
- `utils.py`: plot utilities.
- `pdistances`: naive implementations of the MMD, Total Variation, and Jensen-Shannon divergences, used to measure population overlap (see the divergence sketch after this list).
- `demos`: contains notebooks used to create the toy example and the risk maps for the 2D simulations.
- `data`: contains utilities to load the semi-simulated datasets (ACIC 2016, ACIC 2018, TWINS). A dedicated README is available in the root `data` folder.
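To make the meta-learner idea concrete, here is a minimal T-learner sketch built directly on scikit-learn. It illustrates the pattern only; it is not caussim's own API (the `SimpleTLearner` name and its signature are invented for this example):

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor


class SimpleTLearner:
    """Naive T-learner: fit one outcome model per treatment arm, then
    estimate the CATE as the difference of the two arms' predictions."""

    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator or GradientBoostingRegressor()

    def fit(self, X, y, a):
        # a is the binary treatment indicator
        self.model_treated_ = clone(self.base_estimator).fit(X[a == 1], y[a == 1])
        self.model_control_ = clone(self.base_estimator).fit(X[a == 0], y[a == 0])
        return self

    def predict_cate(self, X):
        # tau(x) = E[Y | X=x, A=1] - E[Y | X=x, A=0]
        return self.model_treated_.predict(X) - self.model_control_.predict(X)


# Toy usage on synthetic data with true CATE = 1 + X[:, 1]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
a = rng.binomial(1, 0.5, size=500)
y = X[:, 0] + a * (1 + X[:, 1]) + rng.normal(scale=0.1, size=500)
tau_hat = SimpleTLearner().fit(X, y, a).predict_cate(X)
print(tau_hat.mean())  # rough ATE estimate, close to 1 here
```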
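Likewise, the divergences in `pdistances` can be estimated naively on a 1D summary of the populations (for instance the propensity score). A minimal histogram-based sketch using `scipy` — again an illustration of the idea, not caussim's implementation:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def js_divergence_1d(scores_treated, scores_control, n_bins=50):
    """Histogram-based Jensen-Shannon divergence between two 1D samples."""
    lo = min(scores_treated.min(), scores_control.min())
    hi = max(scores_treated.max(), scores_control.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(scores_treated, bins=bins)
    q, _ = np.histogram(scores_control, bins=bins)
    # scipy normalizes p and q and returns the JS *distance* (sqrt of the divergence)
    return jensenshannon(p, q) ** 2


rng = np.random.default_rng(0)
overlap = js_divergence_1d(rng.normal(0.4, 0.1, 1000), rng.normal(0.6, 0.1, 1000))
print(overlap)  # larger values indicate weaker population overlap
```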
Experience outputs are mainly CSV files (one for each sampled dataset). To launch an experience, run `python scripts/experiences/<experience.py>`; it should output the CSVs in a dedicated folder under the corresponding subfolder `data/experiences/<dataset>/<experience_name>`.
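These per-dataset CSVs can then be concatenated for analysis. A minimal sketch, where the `<dataset>/<experience_name>` pair is hypothetical and depends on which experience script you ran:

```python
from pathlib import Path

import pandas as pd

# Hypothetical run: adapt <dataset>/<experience_name> to your own outputs
results_dir = Path("data/experiences/acic_2016/causal_scores_evaluation")
frames = [pd.read_csv(csv_path) for csv_path in sorted(results_dir.glob("*.csv"))]
results = pd.concat(frames, ignore_index=True)  # one CSV per sampled dataset
print(results.shape)
```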
🔎 To replicate the main experience of the paper (Section 5), launch the script `scripts/experiences/causal_scores_evaluation.py`. Make sure that the dataset configuration at the beginning of the file is:

```python
from caussim.experiences.base_config import DATASET_GRID_FULL_EXPES

DATASET_GRID = DATASET_GRID_FULL_EXPES
```
📢 Note that the results of Section 5 are already provided in the Zenodo archive `experiences.zip`.
Report outputs are mainly the figures of the paper. To obtain the results, run `pytest scripts/reports/<report.py>`; it should output the figures in one or several corresponding folders under `figures/`.
The main report type is a pytest function contained in the `reports/causal_scores_evaluation.py` script. For each macro-dataset, it plots the results of running a given set of candidate estimators with a fixed nuisance estimator on several generation processes of the macro-dataset (often hundreds of sampled datasets).
🔎 To replicate the main figure of the paper (Figure 3), launch the script `scripts/reports/_1_r_risk_domination.py`. It may take some time because of the large number of simulation results. Make sure that the appropriate experience results exist: the ones used in the paper are provided in the Zenodo archive `experiences.zip`.
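For context, the R-risk that gives this report its name can be computed, in its feasible form, as the mean squared residual of the R-decomposition (Nie & Wager, 2021). A minimal sketch, assuming the nuisance estimates `m_hat ≈ E[Y|X]` and `e_hat ≈ E[A|X]` and a candidate CATE prediction `tau_pred` are already computed (function name and signature are invented for this example):

```python
import numpy as np


def feasible_r_risk(y, a, m_hat, e_hat, tau_pred):
    """Feasible R-risk of a candidate CATE estimate:
    mean of ((y - m_hat) - (a - e_hat) * tau_pred) ** 2."""
    residual = (y - m_hat) - (a - e_hat) * tau_pred
    return np.mean(residual ** 2)
```

Lower values indicate a candidate whose CATE predictions better explain the residual association between treatment and outcome.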
For example, to run the main report:

```bash
pytest scripts/reports/causal_scores_evaluation.py
```
We recommend the use of poetry and python>=3.9 to manage dependencies. You can install caussim via poetry:

```bash
poetry install
```

or via pip:

```bash
pip install caussim
```

In the pip case, you also need to install the dependencies listed in the `pyproject.toml`:
```toml
python = ">=3.9, <3.11"
python-dotenv = "^0.15.0"
click = "^8.0.1"
yapf = "^0.31.0"
matplotlib = "^3.4.2"
numpy = "^1.20.3"
seaborn = "^0.11.1"
jupytext = "^1.11.5"
rope = "^0.19.0"
scikit-learn = "^1.0"
jedi = "^0.18.0"
tqdm = "^4.62.3"
tabulate = "^0.8.9"
statsmodels = "^0.13.1"
pyarrow = "^6.0.1"
submitit = "^1.4.1"
rpy2 = "^3.4.5"
moepy = "^1.1.4"
```