# Configuration Reference

The `config.yml` file is used to configure the ConversionFlow model pipeline. This reference document provides detailed explanations of each parameter and its options.

## General Configuration

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `project_name` | str | A descriptive name for identification and organization purposes | None |

## Preprocessing Configuration (`preprocess`)

### Scaling (`preprocess.scaling`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `enabled` | bool | Flag to enable or disable feature scaling | `true` |
| `method` | str | Scaling method to be used if scaling is enabled | `standard` |
| `exclude_from_scaling` | list | Feature names to exclude from the scaling process | `[]` |

Available scaling methods:

- `standard`: Standardization (z-score normalization)
- `minmax`: Min-Max scaling
- `robust`: Robust scaling using quartiles

### Missing Values (`preprocess.missing_values`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `strategy` | str | Strategy for handling missing values | `fill` |
| `fill_value` | any | Value to use for filling missing values | `0` |

Available strategies:

- `fill`: Fill missing values with the specified `fill_value`
- `drop`: Drop rows containing missing values

## Model Configuration (`model`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `name` | str | Name for the model, used for identification | None |
| `description` | str | Brief description of the model | None |
| `nodes` | dict | Nodes (variables) in the Bayesian Network | None |
| `edges` | list | Directed edges (relationships) in the Bayesian Network | None |

Example node definition:

```yaml
nodes:
  stage1:
    - car_configuration
    - finance_calculator
  stage2:
    - brochure_request
    - contact_request
  final:
    - test_drive
```

Example edge definition:

```yaml
edges:
  - [car_configuration, brochure_request]
  - [car_configuration, contact_request]
  - [finance_calculator, brochure_request]
  - [brochure_request, test_drive]
  - [contact_request, test_drive]
```

## Priors Configuration (`priors`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `sigma_node` | float | Standard deviation for the prior distribution of node intercepts | `5.0` |
| `sigma_beta_global` | float | Global standard deviation for prior distributions of regression coefficients | `5.0` |
| `beta_distributions` | dict | Prior distribution for each directed edge | None |

Example beta distribution definition:

```yaml
beta_distributions:
  car_configuration_brochure_request:
    dist: HalfCauchy
    parameters:
      loc: 0
      scale: 5
```

## Inference Configuration (`inference`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `draws` | int | Number of MCMC samples to draw after tuning | `1000` |
| `tune` | int | Number of MCMC samples to discard as warm-up | `500` |
| `chains` | int | Number of independent MCMC chains to run | `4` |
| `random_seed` | int | Seed for the random number generator | `42` |

## Output Configuration (`output`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `directory` | str | Directory where all output files will be saved | `output/` |
| `figures.extension` | str | File extension for output figures | `png` |
| `figures.dpi` | int | DPI for output figures | `300` |

## Logging Configuration (`logging`)

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `level` | str | Logging level, determining the verbosity of logs | `INFO` |
| `file` | str | Path to the log file where messages will be written | `logs/pipeline.log` |

## Complete Example

```yaml
project_name: "Toyota Conversion Analysis"

preprocess:
  scaling:
    enabled: true
    method: standard
    exclude_from_scaling: [date, user_id]
  missing_values:
    strategy: fill
    fill_value: 0

model:
  name: "toyota_conversion_model"
  description: "Bayesian Network model for Toyota conversion flow"
  nodes:
    stage1:
      - car_configuration
      - finance_calculator
    stage2:
      - brochure_request
      - contact_request
    final:
      - test_drive
  edges:
    - [car_configuration, brochure_request]
    - [car_configuration, contact_request]
    - [finance_calculator, brochure_request]
    - [brochure_request, test_drive]
    - [contact_request, test_drive]

priors:
  sigma_node: 5.0
  sigma_beta_global: 5.0
  beta_distributions:
    car_configuration_brochure_request:
      dist: HalfCauchy
      parameters:
        loc: 0
        scale: 5

inference:
  draws: 1000
  tune: 500
  chains: 4
  random_seed: 42

output:
  directory: "output/"
  figures:
    extension: "png"
    dpi: 300

logging:
  level: "INFO"
  file: "logs/pipeline.log"
```
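
Because `edges` refers to names declared under `nodes`, and a Bayesian Network must be a directed acyclic graph, it can help to sanity-check the `model` section before running the pipeline. The following is a minimal sketch, not part of ConversionFlow itself: `check_model` and `edge_key` are hypothetical helpers, the config is written as a plain Python dict standing in for the parsed YAML, and the `<source>_<target>` key format for `beta_distributions` is an assumption inferred from the `car_configuration_brochure_request` example.

```python
from graphlib import CycleError, TopologicalSorter

# The `model` section of the complete example, written as the plain dict
# a YAML loader would typically produce.
model = {
    "nodes": {
        "stage1": ["car_configuration", "finance_calculator"],
        "stage2": ["brochure_request", "contact_request"],
        "final": ["test_drive"],
    },
    "edges": [
        ["car_configuration", "brochure_request"],
        ["car_configuration", "contact_request"],
        ["finance_calculator", "brochure_request"],
        ["brochure_request", "test_drive"],
        ["contact_request", "test_drive"],
    ],
}


def check_model(model):
    """Return a list of problems found in `nodes`/`edges` (hypothetical helper)."""
    # Flatten the stage groups into one set of declared node names.
    declared = {name for group in model["nodes"].values() for name in group}
    parents = {name: set() for name in declared}
    problems = []

    for edge in model["edges"]:
        if len(edge) != 2:
            problems.append(f"edge {edge} is not a [source, target] pair")
            continue
        source, target = edge
        unknown = [name for name in (source, target) if name not in declared]
        if unknown:
            problems.append(f"edge {edge} uses undeclared node(s): {unknown}")
            continue
        parents[target].add(source)

    try:
        # static_order() raises CycleError if the edges do not form a DAG.
        list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        problems.append(f"edges contain a cycle: {exc.args[1]}")
    return problems


def edge_key(source, target):
    """Build a `beta_distributions` key (assumed `<source>_<target>` format)."""
    return f"{source}_{target}"


problems = check_model(model)
print(problems or "model section looks consistent")
print([edge_key(s, t) for s, t in model["edges"]])
```

An empty list from `check_model` means every edge connects two declared nodes and the graph has no cycles. The keys printed by `edge_key` are the names one would expect to use under `priors.beta_distributions`, if the naming assumption above holds.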