causalpy.pymc_models
Defines a generic PyMC ModelBuilder class and subclasses for:
- WeightedSumFitter model, for synthetic control experiments
- LinearRegression model
- InstrumentalVariableRegression model, for instrumental variable experiments
- PropensityScore model, for inverse propensity score weighting
Models are intended to be used from inside an experiment class (see PyMC experiments). This is why the examples require some extra manipulation of the input data, often to ensure that y has the correct shape.
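For example, a 1-D outcome vector is typically reshaped into a column vector before being passed to a model (a minimal sketch):
>>> import numpy as np
>>> y = np.arange(5.0)
>>> y = y.reshape((y.shape[0], 1))  # reshape 1-D outcome to a column vector
>>> y.shape
(5, 1)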
- class causalpy.pymc_models.InstrumentalVariableRegression[source]
Custom PyMC model for instrumental variable regression
Example
>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import InstrumentalVariableRegression
>>> N = 10
>>> e1 = np.random.normal(0, 3, N)
>>> e2 = np.random.normal(0, 1, N)
>>> Z = np.random.uniform(0, 1, N)
>>> ## Ensure the endogeneity of the treatment variable
>>> X = -1 + 4 * Z + e2 + 2 * e1
>>> y = 2 + 3 * X + 3 * e1
>>> t = X.reshape(10, 1)
>>> y = y.reshape(10, 1)
>>> Z = np.asarray([[1, Z[i]] for i in range(0, 10)])
>>> X = np.asarray([[1, X[i]] for i in range(0, 10)])
>>> COORDS = {'instruments': ['Intercept', 'Z'], 'covariates': ['Intercept', 'X']}
>>> sample_kwargs = {
...     "tune": 5,
...     "draws": 10,
...     "chains": 2,
...     "cores": 2,
...     "target_accept": 0.95,
...     "progressbar": False,
... }
>>> iv_reg = InstrumentalVariableRegression(sample_kwargs=sample_kwargs)
>>> iv_reg.fit(X, Z, y, t, COORDS, {
...     "mus": [[-2, 4], [0.5, 3]],
...     "sigmas": [1, 1],
...     "eta": 2,
...     "lkj_sd": 1,
... }, None)
Inference data...
- build_model(X, Z, y, t, coords, priors)[source]
Specify the model, with data and priors for both the treatment regression and the focal (outcome) regression
- Parameters:
X – A pandas dataframe used to predict our outcome y
Z – A pandas dataframe used to predict our treatment variable t
y – An array of values representing our focal outcome y
t – An array of values representing the treatment t, whose causal impact we are interested in estimating
coords – A dictionary with the coordinate names for our instruments and covariates
priors – An optional dictionary of priors for the mus and sigmas of both regressions
priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}
- fit(X, Z, y, t, coords, priors, ppc_sampler=None)[source]
Draw samples from the posterior distribution and, optionally, from the prior and posterior predictive distributions. The fit call accepts ppc_sampler values of 'jax', 'pymc', or None. The default is None, so the user can decide whether to spend time sampling the posterior predictive distribution separately.
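For example, continuing the class example above, posterior predictive sampling can be requested at fit time (a sketch reusing the illustrative priors from above):
>>> iv_reg.fit(X, Z, y, t, COORDS, {
...     "mus": [[-2, 4], [0.5, 3]],
...     "sigmas": [1, 1],
...     "eta": 2,
...     "lkj_sd": 1,
... }, ppc_sampler="pymc")
Inference data...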
- sample_predictive_distribution(ppc_sampler='jax')[source]
Sample the multivariate normal posterior predictive likelihood term in the IV class. This can be slow without the JAX sampler's compilation. With the JAX sampler, only the posterior predictive distribution is sampled; with the PyMC sampler, both the prior and posterior predictive distributions are sampled.
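A minimal usage sketch, assuming iv_reg has already been fit as in the class example above:
>>> iv_reg.sample_predictive_distribution(ppc_sampler="pymc")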
- class causalpy.pymc_models.LinearRegression[source]
Custom PyMC model for linear regression
Defines the PyMC model
\[
\begin{aligned}
\beta &\sim \mathrm{Normal}(0, 50) \\
\sigma &\sim \mathrm{HalfNormal}(1) \\
\mu &= X \beta \\
y &\sim \mathrm{Normal}(\mu, \sigma)
\end{aligned}
\]
Example
>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import LinearRegression
>>> rd = cp.load_data("rd")
>>> X = rd[["x", "treated"]]
>>> y = np.asarray(rd["y"]).reshape((rd["y"].shape[0], 1))
>>> lr = LinearRegression(sample_kwargs={"progressbar": False})
>>> lr.fit(X, y, coords={
...     'coeffs': ['x', 'treated'],
...     'obs_indx': np.arange(rd.shape[0])
... },
... )
Inference data...
- class causalpy.pymc_models.ModelBuilder[source]
This is a wrapper around pm.Model that provides a scikit-learn-like API.
Public Methods
build_model: must be implemented by subclasses
fit: populates idata attribute
predict: returns predictions on new data
score: returns Bayesian \(R^2\)
Example
>>> import causalpy as cp
>>> import numpy as np
>>> import pymc as pm
>>> from causalpy.pymc_models import ModelBuilder
>>> class MyToyModel(ModelBuilder):
...     def build_model(self, X, y, coords):
...         with self:
...             X_ = pm.Data(name="X", value=X)
...             y_ = pm.Data(name="y", value=y)
...             beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
...             sigma = pm.HalfNormal("sigma", sigma=1)
...             mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...             pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
>>> rng = np.random.default_rng(seed=42)
>>> X = rng.normal(loc=0, scale=1, size=(20, 2))
>>> y = rng.normal(loc=0, scale=1, size=(20,))
>>> model = MyToyModel(
...     sample_kwargs={
...         "chains": 2,
...         "draws": 2000,
...         "progressbar": False,
...         "random_seed": rng,
...     }
... )
>>> model.fit(X, y)
Inference data...
>>> X_new = rng.normal(loc=0, scale=1, size=(20, 2))
>>> model.predict(X_new)
Inference data...
>>> model.score(X, y)
r2        0.390344
r2_std    0.081135
dtype: float64
- __init__(sample_kwargs=None)[source]
- Parameters:
sample_kwargs (Optional[Dict[str, Any]]) – A dictionary of kwargs that get unpacked and passed to the pymc.sample() function. Defaults to an empty dictionary.
- fit(X, y, coords=None)[source]
Draw samples from posterior, prior predictive, and posterior predictive distributions, placing them in the model’s idata attribute.
- predict(X)[source]
Predict outcomes given input data X
Caution
Results in KeyError if model hasn’t been fit.
- score(X, y)[source]
Score the Bayesian \(R^2\) given inputs X and outputs y.
Caution
The Bayesian \(R^2\) is not the same as the traditional coefficient of determination, https://en.wikipedia.org/wiki/Coefficient_of_determination.
- Return type:
Series
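The Bayesian \(R^2\) reported here corresponds to the draw-wise definition of Gelman et al. (2019): for each posterior draw \(s\), with predictions \(\hat{y}^{(s)}\),
\[
R^2_{(s)} = \frac{\mathrm{Var}\left(\hat{y}^{(s)}\right)}{\mathrm{Var}\left(\hat{y}^{(s)}\right) + \mathrm{Var}\left(y - \hat{y}^{(s)}\right)}
\]
and the returned Series summarises the resulting distribution by its mean (r2) and standard deviation (r2_std).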
- class causalpy.pymc_models.PropensityScore[source]
Custom PyMC model for inverse propensity score models
Defines the PyMC model
\[
\begin{aligned}
\beta &\sim \mathrm{Normal}(0, 1) \\
\mu &= X \beta \\
p &= \mathrm{logit}^{-1}(\mu) \\
t &\sim \mathrm{Bernoulli}(p)
\end{aligned}
\]
Example
>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import PropensityScore
>>> df = cp.load_data('nhefs')
>>> X = df[["age", "race"]]
>>> t = np.asarray(df["trt"])
>>> ps = PropensityScore(sample_kwargs={"progressbar": False})
>>> ps.fit(X, t, coords={
...     'coeffs': ['age', 'race'],
...     'obs_indx': np.arange(df.shape[0])
... },
... )
Inference...
- class causalpy.pymc_models.WeightedSumFitter[source]
Used for synthetic control experiments
Note
Generally, the .fit() method should be used rather than calling .build_model() directly.
Defines the PyMC model:
\[
\begin{aligned}
\sigma &\sim \mathrm{HalfNormal}(1) \\
\beta &\sim \mathrm{Dirichlet}(1, \ldots, 1) \\
\mu &= X \beta \\
y &\sim \mathrm{Normal}(\mu, \sigma)
\end{aligned}
\]
Example
>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import WeightedSumFitter
>>> sc = cp.load_data("sc")
>>> X = sc[['a', 'b', 'c', 'd', 'e', 'f', 'g']]
>>> y = np.asarray(sc['actual']).reshape((sc.shape[0], 1))
>>> wsf = WeightedSumFitter(sample_kwargs={"progressbar": False})
>>> wsf.fit(X, y)
Inference data...