Configuration Reference
The config.yml file configures the ConversionFlow model pipeline. This reference describes each parameter and its options.
General Configuration
| Parameter | Type | Description | Default |
|---|---|---|---|
| project_name | str | A descriptive name for identification and organization purposes | None |
Preprocessing Configuration (preprocess)
Scaling (preprocess.scaling)
| Parameter | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Flag to enable or disable feature scaling | |
| method | str | Scaling method to use when scaling is enabled | |
| exclude_from_scaling | list | Feature names to exclude from scaling | |
Available scaling methods:
- standard: Standardization (z-score normalization)
- minmax: Min-Max scaling
- robust: Robust scaling using quartiles
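As a rough illustration of what the three methods compute, the sketch below applies each one to a single numeric column using only the Python standard library. The function name scale_column is hypothetical, and the formulas are the textbook definitions (z-score, min-max onto [0, 1], and median with interquartile range). The pipeline's actual implementation may differ, for example in how it estimates quartiles or treats constant columns.

```python
import statistics


def scale_column(values, method="standard"):
    """Scale a list of numbers with one of the three documented methods.

    Hypothetical helper for illustration; not the pipeline's own code.
    """
    if method == "standard":
        # z-score: subtract the mean, divide by the population std. deviation
        center = statistics.fmean(values)
        spread = statistics.pstdev(values)
    elif method == "minmax":
        # map the observed range onto [0, 1]
        center = min(values)
        spread = max(values) - min(values)
    elif method == "robust":
        # subtract the median, divide by the interquartile range (Q3 - Q1)
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        center = statistics.median(values)
        spread = q3 - q1
    else:
        raise ValueError(f"unknown scaling method: {method!r}")
    if spread == 0:
        # assumption: a constant column is mapped to zeros rather than failing
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]
```

Because the robust variant centers on the median and divides by the interquartile range, a single extreme value shifts the result far less than it would under standard or minmax scaling.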
Missing Values (preprocess.missing_values)
| Parameter | Type | Description | Default |
|---|---|---|---|
| strategy | str | Strategy for handling missing values | |
| fill_value | any | Value used to fill missing values | |
Available strategies:
- fill: Fill missing values with the specified fill_value
- drop: Drop rows containing missing values
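The difference between the two strategies can be sketched on a small list of row dictionaries. This is only an illustration: handle_missing is a hypothetical name, and None stands in for a missing value, whereas the pipeline itself may work on data frames and treat NaN as missing.

```python
def handle_missing(rows, strategy="fill", fill_value=None):
    """Apply a missing-value strategy to a list of row dictionaries.

    Hypothetical helper; None stands in for a missing value here.
    """
    if strategy == "fill":
        # replace every missing entry with fill_value, keeping all rows
        return [
            {key: (fill_value if val is None else val) for key, val in row.items()}
            for row in rows
        ]
    if strategy == "drop":
        # keep only rows that have no missing entry
        return [row for row in rows if all(val is not None for val in row.values())]
    raise ValueError(f"unknown missing-value strategy: {strategy!r}")
```

Note that fill keeps the row count unchanged, while drop can shrink the data set considerably when missing values are spread across many rows.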
Model Configuration (model)
| Parameter | Type | Description | Default |
|---|---|---|---|
| name | str | Name for the model, used for identification | None |
| description | str | Brief description of the model | None |
| nodes | dict | Nodes (variables) in the Bayesian Network | None |
| edges | list | Directed edges (relationships) in the Bayesian Network | None |
Example node definition:
nodes:
  stage1:
    - car_configuration
    - finance_calculator
  stage2:
    - brochure_request
    - contact_request
  final:
    - test_drive
Example edge definition:
edges:
  - [car_configuration, brochure_request]
  - [car_configuration, contact_request]
  - [finance_calculator, brochure_request]
  - [brochure_request, test_drive]
  - [contact_request, test_drive]
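A Bayesian Network must be a directed acyclic graph, so it can help to check the node and edge definitions before fitting anything. The sketch below is one possible check, not part of the pipeline: check_network is a hypothetical name, and the inputs are assumed to be the already parsed nodes dictionary and edges list in the shape shown above. It relies on graphlib from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def check_network(nodes, edges):
    """Return the nodes in a valid topological order, or raise ValueError.

    nodes: dict mapping a stage name to a list of node names
    edges: list of [parent, child] pairs
    """
    declared = {name for stage in nodes.values() for name in stage}
    predecessors = {name: set() for name in declared}
    for parent, child in edges:
        for name in (parent, child):
            if name not in declared:
                raise ValueError(f"edge references undeclared node: {name}")
        predecessors[child].add(parent)
    try:
        # TopologicalSorter expects a mapping {node: its predecessors}
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("edges contain a cycle") from exc
```

For the example definitions above, every other node is an ancestor of test_drive, so any valid ordering places test_drive last.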
Priors Configuration (priors)
| Parameter | Type | Description | Default |
|---|---|---|---|
| sigma_node | float | Standard deviation for the prior distribution of node intercepts | |
| sigma_beta_global | float | Global standard deviation for prior distributions of regression coefficients | |
| beta_distributions | dict | Prior distribution for each directed edge | None |
Example beta distribution definition:
beta_distributions:
  car_configuration_brochure_request:
    dist: HalfCauchy
    parameters:
      loc: 0
      scale: 5
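The example key car_configuration_brochure_request matches the edge [car_configuration, brochure_request], which suggests that keys are formed by joining the parent and child names with an underscore. The sketch below looks up a per-edge prior under that assumption. Both the key convention and the fallback (a zero-mean Normal with standard deviation sigma_beta_global for edges without an entry) are guesses made for illustration, and prior_for_edge is a hypothetical name.

```python
def prior_for_edge(parent, child, priors):
    """Look up the prior specification for one edge in a parsed priors section.

    Assumptions (not confirmed by this reference): keys have the form
    "<parent>_<child>", and edges without an entry fall back to a
    zero-mean Normal whose sigma is sigma_beta_global.
    """
    key = f"{parent}_{child}"
    overrides = priors.get("beta_distributions") or {}
    if key in overrides:
        return overrides[key]
    return {
        "dist": "Normal",
        "parameters": {"mu": 0, "sigma": priors["sigma_beta_global"]},
    }
```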
Inference Configuration (inference)
| Parameter | Type | Description | Default |
|---|---|---|---|
| draws | int | Number of MCMC samples to draw after tuning | |
| tune | int | Number of MCMC samples to discard as warm-up | |
| chains | int | Number of independent MCMC chains to run | |
| random_seed | int | Seed for the random number generator | |
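For a sense of scale, the sample counts implied by these settings can be worked out directly. The sketch below assumes the usual MCMC convention that each chain runs its warm-up iterations first and then its retained draws; sample_budget is a hypothetical name.

```python
def sample_budget(draws, tune, chains):
    """Return (retained, discarded) sample counts across all chains."""
    retained = draws * chains   # samples kept for posterior summaries
    discarded = tune * chains   # warm-up samples thrown away
    return retained, discarded


# With draws=1000, tune=500, chains=4:
print(sample_budget(1000, 500, 4))  # (4000, 2000)
```

Raising chains therefore increases both the retained samples and the warm-up cost, while raising tune only adds to the cost.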
Output Configuration (output)
| Parameter | Type | Description | Default |
|---|---|---|---|
| directory | str | Directory where all output files will be saved | |
| figures.extension | str | File extension for output figures | |
| figures.dpi | int | DPI for output figures | |
Logging Configuration (logging)
| Parameter | Type | Description | Default |
|---|---|---|---|
| level | str | Logging level, determining the verbosity of logs | |
| file | str | Path to the log file where messages will be written | |
Complete Example
project_name: "Toyota Conversion Analysis"
preprocess:
  scaling:
    enabled: true
    method: standard
    exclude_from_scaling: [date, user_id]
  missing_values:
    strategy: fill
    fill_value: 0
model:
  name: "toyota_conversion_model"
  description: "Bayesian Network model for Toyota conversion flow"
  nodes:
    stage1:
      - car_configuration
      - finance_calculator
    stage2:
      - brochure_request
      - contact_request
    final:
      - test_drive
  edges:
    - [car_configuration, brochure_request]
    - [car_configuration, contact_request]
    - [finance_calculator, brochure_request]
    - [brochure_request, test_drive]
    - [contact_request, test_drive]
priors:
  sigma_node: 5.0
  sigma_beta_global: 5.0
  beta_distributions:
    car_configuration_brochure_request:
      dist: HalfCauchy
      parameters:
        loc: 0
        scale: 5
inference:
  draws: 1000
  tune: 500
  chains: 4
  random_seed: 42
output:
  directory: "output/"
  figures:
    extension: "png"
    dpi: 300
logging:
  level: "INFO"
  file: "logs/pipeline.log"
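Once a file like the one above has been parsed into a dictionary (for instance with a YAML library, which is assumed and not shown here), the types and options documented in this reference can be checked mechanically. The following is a minimal sketch of such a check, not the pipeline's own validation: validate_config and the helper names are hypothetical, keys that are absent are skipped on the assumption that the pipeline applies its defaults, and the nesting of extension and dpi under figures follows the example above.

```python
# Documented types, keyed by the path to each parameter.
EXPECTED_TYPES = {
    ("project_name",): str,
    ("preprocess", "scaling", "enabled"): bool,
    ("preprocess", "scaling", "method"): str,
    ("preprocess", "scaling", "exclude_from_scaling"): list,
    ("preprocess", "missing_values", "strategy"): str,
    ("model", "name"): str,
    ("model", "description"): str,
    ("model", "nodes"): dict,
    ("model", "edges"): list,
    ("priors", "sigma_node"): float,
    ("priors", "sigma_beta_global"): float,
    ("priors", "beta_distributions"): dict,
    ("inference", "draws"): int,
    ("inference", "tune"): int,
    ("inference", "chains"): int,
    ("inference", "random_seed"): int,
    ("output", "directory"): str,
    ("output", "figures", "extension"): str,
    ("output", "figures", "dpi"): int,
    ("logging", "level"): str,
    ("logging", "file"): str,
}

# Options listed under "Available scaling methods" and "Available strategies".
ALLOWED_VALUES = {
    ("preprocess", "scaling", "method"): {"standard", "minmax", "robust"},
    ("preprocess", "missing_values", "strategy"): {"fill", "drop"},
}

_MISSING = object()


def lookup(cfg, path):
    """Follow a key path through nested dicts; return _MISSING if absent."""
    node = cfg
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def type_ok(value, expected):
    """Check a value against a documented type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so accept it only for bool fields
        return expected is bool
    if expected is float:
        # a YAML scalar such as 5 parses as int; accept it where a float is expected
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for path, expected in EXPECTED_TYPES.items():
        value = lookup(cfg, path)
        if value is _MISSING or value is None:
            continue  # assumption: absent keys fall back to pipeline defaults
        name = ".".join(path)
        if not type_ok(value, expected):
            problems.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        elif path in ALLOWED_VALUES and value not in ALLOWED_VALUES[path]:
            problems.append(f"{name}: unsupported value {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a user fix several mistakes in a single pass over the file.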