Estimation Process
Overview
The Estimation stage is dedicated to constructing and training a Bayesian Network model to understand the underlying dynamics of the customer journey. This process is systematically broken down into distinct substages.
Data Loading
The initial substage is Data Loading, where relevant data is ingested into the pipeline. This typically involves reading datasets from specified sources, such as CSV files, into a structured format suitable for subsequent processing.
Let \(D\) represent the raw dataset, which is loaded and transformed into a Pandas DataFrame for efficient manipulation and analysis in Python.
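As a minimal sketch of this substage, the snippet below reads CSV content into a DataFrame with `pandas.read_csv`. The column names and the inline CSV text are hypothetical stand-ins for a real source file; in practice a file path would be passed instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
RAW_CSV = """customer_id,email_opens,site_visits,purchases
1,3,5,1
2,0,2,0
3,7,,2
"""


def load_dataset(source) -> pd.DataFrame:
    """Read a CSV source (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# D is the raw dataset; the empty site_visits field is read as NaN.
D = load_dataset(io.StringIO(RAW_CSV))
```

Keeping the loader as a small function makes it easy to swap the in-memory text for a path such as a CSV file on disk without changing the rest of the pipeline.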
Data Preprocessing
Following data loading, the pipeline proceeds to Data Preprocessing. This substage applies the following transformations:
Missing Value Handling
Strategies such as imputation or removal of incomplete records are applied to ensure data integrity. Let \(D_{MV}\) denote the dataset after missing value handling.
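The two strategies mentioned above could look like the following sketch. The column names are hypothetical, and median imputation is only one possible choice of imputation rule, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value per column.
D = pd.DataFrame({
    "email_opens": [3.0, 0.0, np.nan, 7.0],
    "site_visits": [5.0, 2.0, 4.0, np.nan],
})


def drop_incomplete(df: pd.DataFrame) -> pd.DataFrame:
    """Removal strategy: discard every row that has a missing value."""
    return df.dropna()


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Imputation strategy: fill each column's gaps with that column's median."""
    return df.fillna(df.median(numeric_only=True))


D_MV_dropped = drop_incomplete(D)  # keeps only the two complete rows
D_MV = impute_median(D)            # keeps all four rows, gaps filled
```

Removal is simpler but discards data; imputation keeps every record at the cost of introducing estimated values.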
Feature Scaling
Feature scaling places the numerical features on a comparable scale, which mitigates issues arising from differing scales and improves the performance of the MCMC sampling. Common scaling techniques include standardisation (zero mean, unit variance) and Min-Max scaling (rescaling to the range \([0, 1]\)). Let \(D_{scaled}\) represent the dataset after feature scaling.
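Both scaling techniques can be written in a few lines of pandas. The sketch below uses hypothetical column names and assumes no column is constant, since a constant column would cause a division by zero.

```python
import pandas as pd

# Hypothetical dataset after missing value handling.
D_MV = pd.DataFrame({
    "email_opens": [0.0, 2.0, 4.0],
    "site_visits": [10.0, 20.0, 40.0],
})


def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Standardisation: subtract the column mean, divide by the column std."""
    return (df - df.mean()) / df.std(ddof=0)


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Min-Max scaling: map each column linearly onto [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())


D_scaled = standardise(D_MV)
D_minmax = min_max_scale(D_MV)
```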
Bayesian Network Model Inference
The core substage within Estimation is Bayesian Network Model Inference, a process that employs probabilistic graphical models and Bayesian statistical methodologies.
Model Structure
We instantiate a Bayesian Network, denoted as \(\mathcal{BN} = (G, \Theta)\), where:
\(G = (V, E)\) is the Directed Acyclic Graph (DAG) defining the network structure
Nodes \(V = \{v_1, v_2, \ldots, v_n\}\) represent customer journey touchpoints
Edges \(E\) represent their probabilistic dependencies
\(\Theta\) represents the set of parameters of the Bayesian Network
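One simple way to hold the structure \(G\) in code is a mapping from each node to its set of parents. The sketch below uses hypothetical touchpoint names and the standard-library `graphlib` module to confirm the graph is acyclic and to obtain an ordering in which parents precede children.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical touchpoints: each node maps to the set of its parents in G.
parents = {
    "ad_impression": set(),
    "email_open": {"ad_impression"},
    "site_visit": {"ad_impression", "email_open"},
    "purchase": {"site_visit"},
}


def topological_order(parent_map):
    """Return the nodes with every parent listed before its children.

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    try:
        return list(TopologicalSorter(parent_map).static_order())
    except CycleError as exc:
        raise ValueError("graph is not acyclic") from exc


order = topological_order(parents)
```

A parent-first ordering is convenient later on, because each node's model can then be declared after the nodes it depends on.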
Likelihood Function
For each node \(a \in V\), we define a likelihood function that models the probability of observing the data for node \(a\) given its parameters. A Student’s t-distribution is typically employed:
\[ data_a \sim \text{StudentT}(\nu, \mu_a, \sigma_a) \]
Where:
\(data_a\) represents the observed data for node \(a\)
\(\mu_a\) is the location parameter (mean) for node \(a\), parameterised as a function of the node's parents in \(G\) through the regression coefficients
\(\sigma_a\) is the scale parameter
\(\nu\) represents the degrees of freedom (typically set to 4)
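To make the roles of \(\mu_a\), \(\sigma_a\) and \(\nu\) concrete, the sketch below evaluates the log-density of a location-scale Student's t-distribution using only the standard library. The function names are illustrative, and summing over observations assumes they are conditionally independent given the parameters.

```python
import math


def student_t_logpdf(x, mu, sigma, nu=4.0):
    """Log-density of a Student's t with location mu, scale sigma, nu dof."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def log_likelihood(data, mu, sigma, nu=4.0):
    """Sum of per-observation log-densities for one node.

    data and mu are equal-length sequences: one location per observation.
    """
    return sum(student_t_logpdf(x, m, sigma, nu) for x, m in zip(data, mu))
```

With a small \(\nu\) such as 4, the density has heavier tails than a normal distribution, so isolated extreme observations pull the fit less strongly.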
Prior Distributions
Prior distributions are assigned to all unknown parameters:
For regression coefficients: Half-Cauchy distributions
For scale parameters: Half-Cauchy priors
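For reference, the Half-Cauchy log-density can be written directly from its definition. The sketch below assumes a scale of 1.0 purely for illustration (the prior scales used in practice are not stated here) and uses hypothetical function names.

```python
import math


def half_cauchy_logpdf(x, scale=1.0):
    """Log-density of a Half-Cauchy: 2 / (pi * scale * (1 + (x / scale)^2)), x >= 0."""
    if x < 0:
        return -math.inf  # no probability mass on negative values
    return (
        math.log(2.0)
        - math.log(math.pi)
        - math.log(scale)
        - math.log1p((x / scale) ** 2)
    )


def log_prior(coefficients, sigma, scale=1.0):
    """Joint log-prior under independent Half-Cauchy priors on all parameters."""
    total = sum(half_cauchy_logpdf(b, scale) for b in coefficients)
    return total + half_cauchy_logpdf(sigma, scale)
```

Because the Half-Cauchy has support only on non-negative values, any parameter given this prior is constrained to be non-negative.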
MCMC Sampling
The No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo, is employed to sample from the posterior distribution. The MCMC sampling approximates:
\[ p(\Theta \mid D_{scaled}) \propto p(D_{scaled} \mid \Theta)\, p(\Theta) \]
Output Generation
The final output includes:
Posterior Samples
Sample Stats
Log Likelihood
Posterior Predictive Samples
Parameter Summaries
These outputs provide a comprehensive understanding of the customer journey dynamics and serve as inputs for the Optimization stage.
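As an illustration of what a parameter summary might contain, the sketch below reduces a list of posterior draws for one parameter to a mean, a standard deviation, and an interval. The 94% equal-tailed interval and the function name are assumptions made for this example; other interval types, such as highest-density intervals, are also common.

```python
import statistics


def summarise(samples):
    """Summarise posterior draws for a single parameter.

    Returns the mean, the standard deviation, and a 94% equal-tailed
    interval taken from the 3rd and 97th percentiles.
    """
    # quantiles(n=100) returns 99 cut points; index i is percentile i + 1.
    percentiles = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples),
        "sd": statistics.stdev(samples),
        "lower": percentiles[2],
        "upper": percentiles[96],
    }


# Hypothetical draws: the integers 0..100 stand in for real posterior samples.
summary = summarise(list(range(101)))
```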