# Estimation Process

## Overview

The Estimation stage constructs and trains a Bayesian Network model that captures the underlying dynamics of the customer journey. The process is broken down into distinct substages.

## Data Loading

The initial substage is Data Loading, where relevant data is ingested into the pipeline. This typically involves reading datasets from specified sources, such as CSV files, into a structured format suitable for subsequent processing. Let $D$ represent the raw dataset, which is loaded into a pandas DataFrame for efficient manipulation and analysis in Python.

## Data Preprocessing

Following data loading, the pipeline proceeds to Data Preprocessing. This substage comprises several transformations:

### Missing Value Handling

Strategies such as imputation or removal of incomplete records are applied to ensure data integrity. Let $D_{MV}$ denote the dataset after missing value handling.

### Feature Scaling

Numerical features are rescaled to a common range, which mitigates issues arising from differing scales and improves the performance of the MCMC sampling. Common scaling techniques include standardisation and Min-Max scaling. Let $D_{scaled}$ represent the dataset after feature scaling.

## Bayesian Network Model Inference

The core substage within Estimation is Bayesian Network Model Inference, which combines probabilistic graphical models with Bayesian statistical methods.
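Before the model is specified, the loading and preprocessing substages described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the column names, the inline CSV, and the choice of median imputation are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for D; a real pipeline would pass a file path.
RAW_CSV = """email,search,purchase
10.0,200.0,5.0
12.0,,6.0
14.0,260.0,7.0
16.0,300.0,
"""


def load_data(source) -> pd.DataFrame:
    """Data Loading: read a CSV source into a DataFrame (D)."""
    return pd.read_csv(source)


def handle_missing(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Missing Value Handling: drop incomplete rows or impute medians (D_MV)."""
    if strategy == "drop":
        return df.dropna().reset_index(drop=True)
    # Median imputation is an assumed default, not a documented pipeline choice.
    return df.fillna(df.median(numeric_only=True))


def scale_features(df: pd.DataFrame, method: str = "standard") -> pd.DataFrame:
    """Feature Scaling: standardise or Min-Max scale numeric columns (D_scaled).

    Constant columns would divide by zero and are not handled in this sketch.
    """
    numeric = df.select_dtypes(include="number")
    out = df.copy()
    if method == "minmax":
        out[numeric.columns] = (numeric - numeric.min()) / (numeric.max() - numeric.min())
    else:
        out[numeric.columns] = (numeric - numeric.mean()) / numeric.std(ddof=0)
    return out


d = load_data(io.StringIO(RAW_CSV))
d_mv = handle_missing(d)
d_scaled = scale_features(d_mv)
```

Keeping each substage as a separate function mirrors the $D \rightarrow D_{MV} \rightarrow D_{scaled}$ notation and makes it easy to swap one strategy for another.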

### Model Structure

We instantiate a Bayesian Network, denoted as $\mathcal{BN} = (G, \Theta)$, where:

- $G = (V, E)$ is the Directed Acyclic Graph (DAG) defining the network structure
- Nodes $V = \{v_1, v_2, \ldots, v_n\}$ represent customer journey touchpoints
- Edges $E$ represent their probabilistic dependencies
- $\Theta$ represents the set of parameters of the Bayesian Network

### Likelihood Function

For each node $a \in V$, we define a likelihood function that models the probability of observing the data for node $a$ given its parameters. A Student's t-distribution is typically employed:

$$P(\text{data}_a \mid \theta_a) = \text{StudentT}(\text{data}_a \mid \mu_a, \sigma_a, \nu)$$

Where:

- $\text{data}_a$ represents the observed data for node $a$
- $\mu_a$ is the location parameter (mean) for node $a$, parameterised as:

  $$\mu_a = \beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in pa(a)} \beta_{aj} P_j(x)$$

  where $pa(a)$ denotes the parents of node $a$ in $G$
- $\sigma_a$ is the scale parameter
- $\nu$ represents the degrees of freedom (typically set to 4)

### Prior Distributions

Prior distributions are assigned to all unknown parameters:

- For regression coefficients: Half-Cauchy priors, which constrain the coefficients to be non-negative
- For scale parameters: Half-Cauchy priors

### MCMC Sampling

The No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo, is employed to sample from the posterior distribution. The samples approximate:

$$P(\theta \mid D_{scaled}) \propto P(D_{scaled} \mid \theta)\, \pi(\theta)$$

where $\pi(\theta)$ is the joint prior over the parameters.

### Output Generation

The final output includes:

- Posterior Samples
- Sample Stats
- Log Likelihood
- Posterior Predictive Samples
- Parameter Summaries

These outputs describe the customer journey dynamics and serve as inputs for the Optimization stage.
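
To make the likelihood, priors, and unnormalised posterior above concrete, the following sketch evaluates them for a single node using only the Python standard library. It does not implement NUTS; it only computes the log of $P(D_{scaled} \mid \theta)\,\pi(\theta)$, which is the quantity a sampler such as NUTS explores. The function names, the dictionary layout of `theta`, the prior scale of 1.0, the default $S = 1$, and the treatment of each $P_j(x)$ as a precomputed number are assumptions made for illustration.

```python
import math


def student_t_logpdf(y, mu, sigma, nu):
    """Log density of a Student's t with location mu, scale sigma, nu degrees of freedom."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def half_cauchy_logpdf(x, scale):
    """Log density of a Half-Cauchy prior; zero probability for negative values."""
    if x < 0:
        return -math.inf
    return math.log(2) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def node_mean(beta0, beta1, x, S, parent_betas, parent_terms):
    """mu_a = beta_a0 + beta_a1 * ln(1 + x_a / S) + sum_j beta_aj * P_j(x)."""
    parents = sum(b * p for b, p in zip(parent_betas, parent_terms))
    return beta0 + beta1 * math.log1p(x / S) + parents


def log_posterior(theta, rows, S=1.0, nu=4.0, prior_scale=1.0):
    """Unnormalised log posterior for one node: log-likelihood plus log-prior.

    rows is a list of (y, x, parent_terms) tuples (hypothetical layout).
    """
    coefs = [theta["beta0"], theta["beta1"], *theta["beta_parents"]]
    log_prior = sum(half_cauchy_logpdf(b, prior_scale) for b in coefs)
    log_prior += half_cauchy_logpdf(theta["sigma"], prior_scale)
    if theta["sigma"] <= 0 or log_prior == -math.inf:
        return -math.inf  # outside the support of the priors
    log_lik = 0.0
    for y, x, parent_terms in rows:
        mu = node_mean(theta["beta0"], theta["beta1"], x, S,
                       theta["beta_parents"], parent_terms)
        log_lik += student_t_logpdf(y, mu, theta["sigma"], nu)
    return log_lik + log_prior


# Made-up observations for one node with a single parent.
rows = [(1.2, 0.5, [0.3]), (1.9, 1.5, [0.6]), (2.4, 3.0, [0.9])]
theta = {"beta0": 0.5, "beta1": 1.0, "beta_parents": [0.8], "sigma": 0.5}
lp = log_posterior(theta, rows)
```

In practice a probabilistic programming library would build this density and run NUTS on it; writing it out by hand mainly shows how the Half-Cauchy priors rule out negative coefficients and scales by returning a log density of negative infinity.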