Configuration Reference

The config.yml file configures the ConversionFlow model pipeline. This reference describes each parameter, its type, its default value, and the options it accepts.

General Configuration

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| project_name | str | A descriptive name for identification and organization purposes | None |

Preprocessing Configuration (preprocess)

Scaling (preprocess.scaling)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | bool | Flag to enable or disable feature scaling | true |
| method | str | Scaling method to be used if scaling is enabled | standard |
| exclude_from_scaling | list | Feature names to exclude from the scaling process | [] |

Available scaling methods:

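  • standard: Standardization (z-score normalization); each feature is shifted to zero mean and scaled to unit variance

  • minmax: Min-Max scaling; each feature is rescaled linearly to a fixed range, typically [0, 1]

  • robust: Robust scaling using quartiles; each feature is centered on its median and divided by its interquartile range, which limits the influence of outliers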
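
To make the difference between the three methods concrete, the sketch below applies the textbook definition of each to a small feature with one outlier. It is only an illustration: the function names are hypothetical, and the pipeline's own implementation (for example its quartile convention or target range) may differ.

```python
import statistics

# Illustrative, textbook versions of the three methods; not the pipeline's code.
# Each function divides by a spread measure, so a constant feature would raise
# ZeroDivisionError and needs separate handling.

def standard_scale(values):
    """z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def minmax_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

SCALERS = {"standard": standard_scale, "minmax": minmax_scale, "robust": robust_scale}

visits = [1.0, 2.0, 3.0, 4.0, 100.0]  # one large outlier
for method, scale in SCALERS.items():
    print(method, [round(v, 3) for v in scale(visits)])
```

With the outlier present, minmax squeezes the four ordinary values close to 0, while robust keeps them evenly spaced around the median.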

Missing Values (preprocess.missing_values)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| strategy | str | Strategy for handling missing values | fill |
| fill_value | any | Value to use for filling missing values | 0 |

Available strategies:

  • fill: Fill missing values with the specified fill_value

  • drop: Drop rows containing missing values
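
The two strategies can be sketched on a list of row dictionaries, where None marks a missing value. The function name `handle_missing` and the row representation are assumptions made for illustration; the pipeline presumably operates on its own data structures.

```python
def handle_missing(rows, strategy="fill", fill_value=0):
    """Apply a missing-value strategy to a list of row dicts (None marks a gap)."""
    if strategy == "fill":
        # Replace every missing entry with fill_value; row count is unchanged.
        return [
            {key: fill_value if value is None else value for key, value in row.items()}
            for row in rows
        ]
    if strategy == "drop":
        # Keep only rows that have no missing entries.
        return [row for row in rows if None not in row.values()]
    raise ValueError(f"Unknown missing-value strategy: {strategy!r}")

rows = [
    {"car_configuration": 3, "brochure_request": 1},
    {"car_configuration": None, "brochure_request": 0},
]
filled = handle_missing(rows, strategy="fill", fill_value=0)
dropped = handle_missing(rows, strategy="drop")
print(filled)
print(dropped)
```

Note that fill keeps every row, whereas drop shrinks the dataset, which can matter when missing values are frequent.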

Model Configuration (model)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name for the model, used for identification | None |
| description | str | Brief description of the model | None |
| nodes | dict | Nodes (variables) in the Bayesian Network | None |
| edges | list | Directed edges (relationships) in the Bayesian Network | None |

Example node definition:

nodes:
  stage1:
    - car_configuration
    - finance_calculator
  stage2:
    - brochure_request
    - contact_request
  final:
    - test_drive

Example edge definition:

edges:
  - [car_configuration, brochure_request]
  - [car_configuration, contact_request]
  - [finance_calculator, brochure_request]
  - [brochure_request, test_drive]
  - [contact_request, test_drive]
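
A Bayesian Network must be a directed acyclic graph, so it can be worth checking nodes and edges before fitting. The sketch below is one possible check, not part of the pipeline: `validate_structure` is a hypothetical helper that verifies every edge endpoint is declared under nodes and that the edges contain no cycle. It uses graphlib from the Python standard library (3.9+).

```python
from graphlib import CycleError, TopologicalSorter

nodes = {
    "stage1": ["car_configuration", "finance_calculator"],
    "stage2": ["brochure_request", "contact_request"],
    "final": ["test_drive"],
}
edges = [
    ["car_configuration", "brochure_request"],
    ["car_configuration", "contact_request"],
    ["finance_calculator", "brochure_request"],
    ["brochure_request", "test_drive"],
    ["contact_request", "test_drive"],
]

def validate_structure(nodes, edges):
    """Return a parents-before-children node order, or raise ValueError."""
    declared = {name for stage in nodes.values() for name in stage}
    parents = {name: set() for name in declared}
    for parent, child in edges:
        for name in (parent, child):
            if name not in declared:
                raise ValueError(f"Edge references undeclared node: {name}")
        parents[child].add(parent)
    try:
        # TopologicalSorter takes a mapping of node -> predecessors.
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError(f"Edges contain a cycle: {exc.args[1]}") from exc

order = validate_structure(nodes, edges)
print(order)
```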

Priors Configuration (priors)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sigma_node | float | Standard deviation for the prior distribution of node intercepts | 5.0 |
| sigma_beta_global | float | Global standard deviation for prior distributions of regression coefficients | 5.0 |
| beta_distributions | dict | Prior distribution for each directed edge | None |

Example beta distribution definition:

beta_distributions:
  car_configuration_brochure_request:
    dist: HalfCauchy
    parameters:
      loc: 0
      scale: 5
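
In the example above, the key car_configuration_brochure_request appears to join the parent and child of an edge with an underscore. Assuming that naming pattern holds (it is inferred from the example, not stated as a rule), a small check like the following could catch prior entries that do not correspond to any declared edge. The helper name `unmatched_prior_keys` is hypothetical.

```python
def unmatched_prior_keys(edges, beta_distributions):
    """List beta_distributions keys that match no edge.

    ASSUMPTION: keys follow the pattern "<parent>_<child>", as the example suggests.
    """
    expected = {f"{parent}_{child}" for parent, child in edges}
    return sorted(set(beta_distributions) - expected)

edges = [
    ["car_configuration", "brochure_request"],
    ["brochure_request", "test_drive"],
]
beta_distributions = {
    "car_configuration_brochure_request": {
        "dist": "HalfCauchy",
        "parameters": {"loc": 0, "scale": 5},
    },
    # Deliberate typo ("brochure_requests") to show what the check reports.
    "brochure_requests_test_drive": {
        "dist": "HalfCauchy",
        "parameters": {"loc": 0, "scale": 5},
    },
}
typos = unmatched_prior_keys(edges, beta_distributions)
print(typos)
```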

Inference Configuration (inference)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| draws | int | Number of MCMC samples to draw after tuning | 1000 |
| tune | int | Number of MCMC samples to discard as warm-up | 500 |
| chains | int | Number of independent MCMC chains to run | 4 |
| random_seed | int | Seed for the random number generator | 42 |
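
As a rough guide to how these settings interact, the arithmetic below works out the sampling budget for the defaults. It assumes that draws and tune apply to each chain separately, which is the usual convention in MCMC samplers but is not stated explicitly in this reference.

```python
inference = {"draws": 1000, "tune": 500, "chains": 4, "random_seed": 42}

# ASSUMPTION: draws and tune are counted per chain.
iterations_per_chain = inference["tune"] + inference["draws"]
retained_samples = inference["draws"] * inference["chains"]
total_iterations = iterations_per_chain * inference["chains"]

print(f"iterations per chain: {iterations_per_chain}")
print(f"posterior samples kept: {retained_samples}")
print(f"total iterations computed: {total_iterations}")
```

Under that assumption, the defaults keep 4000 posterior samples out of 6000 computed iterations.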

Output Configuration (output)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| directory | str | Directory where all output files will be saved | output/ |
| figures.extension | str | File extension for output figures | png |
| figures.dpi | int | DPI for output figures | 300 |

Logging Configuration (logging)

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| level | str | Logging level, determining the verbosity of logs | INFO |
| file | str | Path to the log file where messages will be written | logs/pipeline.log |
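
The effect of level and file can be illustrated with Python's standard logging module. This is a sketch under assumptions: the logger name "conversionflow", the message format, and the `configure_logging` helper are hypothetical, and the example writes to a temporary directory rather than logs/pipeline.log so that it leaves nothing behind.

```python
import logging
import tempfile
from pathlib import Path

def configure_logging(level="INFO", file="logs/pipeline.log"):
    """Attach a file handler at the given level (hypothetical helper)."""
    log_path = Path(file)
    log_path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ if needed
    logger = logging.getLogger("conversionflow")  # hypothetical logger name
    logger.setLevel(level.upper())  # setLevel accepts level names such as "INFO"
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "logs" / "pipeline.log"
    logger = configure_logging(level="INFO", file=log_file)
    logger.debug("hidden at INFO level")   # below the threshold, not written
    logger.info("pipeline started")        # at the threshold, written
    contents = log_file.read_text()
    # Release the file before the temporary directory is removed.
    handler = logger.handlers[-1]
    handler.close()
    logger.removeHandler(handler)

print(contents)
```

Setting level to DEBUG would also record the first message; WARNING would suppress both.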

Complete Example

project_name: "Toyota Conversion Analysis"

preprocess:
  scaling:
    enabled: true
    method: standard
    exclude_from_scaling: [date, user_id]
  missing_values:
    strategy: fill
    fill_value: 0

model:
  name: "toyota_conversion_model"
  description: "Bayesian Network model for Toyota conversion flow"
  nodes:
    stage1:
      - car_configuration
      - finance_calculator
    stage2:
      - brochure_request
      - contact_request
    final:
      - test_drive
  edges:
    - [car_configuration, brochure_request]
    - [car_configuration, contact_request]
    - [finance_calculator, brochure_request]
    - [brochure_request, test_drive]
    - [contact_request, test_drive]

priors:
  sigma_node: 5.0
  sigma_beta_global: 5.0
  beta_distributions:
    car_configuration_brochure_request:
      dist: HalfCauchy
      parameters:
        loc: 0
        scale: 5

inference:
  draws: 1000
  tune: 500
  chains: 4
  random_seed: 42

output:
  directory: "output/"
  figures:
    extension: "png"
    dpi: 300

logging:
  level: "INFO"
  file: "logs/pipeline.log"
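
Because every parameter has a documented default, a configuration file presumably only needs to list the values it changes. The sketch below shows one way such defaults could be layered under a partial configuration using a recursive merge. Whether the pipeline merges partial files this way is an assumption; `DEFAULTS` simply restates the tables in this reference, and `with_defaults` is a hypothetical helper.

```python
import copy

# Defaults restated from the tables in this reference.
DEFAULTS = {
    "project_name": None,
    "preprocess": {
        "scaling": {"enabled": True, "method": "standard", "exclude_from_scaling": []},
        "missing_values": {"strategy": "fill", "fill_value": 0},
    },
    "model": {"name": None, "description": None, "nodes": None, "edges": None},
    "priors": {"sigma_node": 5.0, "sigma_beta_global": 5.0, "beta_distributions": None},
    "inference": {"draws": 1000, "tune": 500, "chains": 4, "random_seed": 42},
    "output": {"directory": "output/", "figures": {"extension": "png", "dpi": 300}},
    "logging": {"level": "INFO", "file": "logs/pipeline.log"},
}

def with_defaults(user_config, defaults=DEFAULTS):
    """Return defaults overridden by user_config, merging nested sections."""
    merged = copy.deepcopy(defaults)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])  # merge section by section
        else:
            merged[key] = copy.deepcopy(value)  # scalars, lists, and new keys replace
    return merged

config = with_defaults({
    "project_name": "Demo",
    "inference": {"draws": 2000},
    "output": {"figures": {"dpi": 150}},
})
print(config["inference"])
print(config["output"])
```

Only the overridden values change; sibling keys such as tune or figures.extension keep their defaults, and `DEFAULTS` itself is left unmodified.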