sklift.datasets.fetch_criteo
- sklift.datasets.datasets.fetch_criteo(target_col='visit', treatment_col='treatment', data_home=None, dest_subdir=None, download_if_missing=True, percent10=False, return_X_y_t=False)
Load and return the Criteo Uplift Prediction Dataset (classification).
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.
Major columns:
treatment (binary): treatment
exposure (binary): treatment
visit (binary): target
conversion (binary): target
f0, ... , f11 (float): feature values
Read more in the docs.
- Parameters:
target_col (string, 'visit', 'conversion' or 'all', default='visit') – Selects which column of the dataset is used as the target. If 'all', returns a DataFrame with all target columns.
treatment_col (string, 'treatment', 'exposure' or 'all', default='treatment') – Selects which column of the dataset is used as the treatment. If 'all', returns a DataFrame with all treatment columns.
data_home (string) – Specify a download and cache folder for the datasets.
dest_subdir (string) – The name of the folder in which the dataset is stored.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
percent10 (bool, default=False) – Whether to load only 10 percent of the data.
return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.
- Returns:
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (DataFrame object): Dataset without target and treatment.
target (Series or DataFrame object): Values of the target column(s).
treatment (Series or DataFrame object): Values of the treatment column(s).
DESCR (str): Description of the Criteo dataset.
feature_names (list): Names of the features.
target_name (str or list): Name of the target.
treatment_name (str or list): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type:
Bunch or tuple
Example:
from sklift.datasets import fetch_criteo

dataset = fetch_criteo(target_col='conversion', treatment_col='exposure')
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_criteo(target_col='conversion', treatment_col='exposure', return_X_y_t=True)
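Because the dataset comes from randomized incrementality tests, a simple first check on the returned target and treatment columns is the difference in mean outcome between the treated and control groups. The sketch below is illustrative only and is not part of sklift: `naive_uplift` is a hypothetical helper, and the short lists are made-up stand-ins for the real columns so that the snippet runs without downloading the data.

```python
from statistics import mean


def naive_uplift(target, treatment):
    """Hypothetical helper: mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for y, t in zip(target, treatment) if t == 1]
    control = [y for y, t in zip(target, treatment) if t == 0]
    if not treated or not control:
        raise ValueError("both the treated and the control group must be non-empty")
    return mean(treated) - mean(control)


# Made-up toy values standing in for the binary target and treatment columns.
toy_target = [1, 0, 1, 1, 0, 0, 1, 0]
toy_treatment = [1, 1, 1, 1, 0, 0, 0, 0]

uplift = naive_uplift(toy_target, toy_treatment)
print(uplift)
```

With the real data, the same helper should accept the `target` and `treatment` Series returned when `return_X_y_t=True`, assuming a single target and a single treatment column were selected (not 'all').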
References
Diemert Eustache, Betlei Artem et al. [2018]
[DiemertEustacheBArtemRMR18] Diemert Eustache, Betlei Artem, Christophe Renaudin, and Amini Massih-Reza. A large scale benchmark for uplift modeling. In Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018. ACM, 2018.