Estimation Process

Overview

The Estimation stage constructs and trains a Bayesian Network model of the underlying dynamics of the customer journey. It is broken down into the substages described below.

Data Loading

The initial substage is Data Loading, where relevant data is ingested into the pipeline. This typically involves reading datasets from specified sources, such as CSV files, into a structured format suitable for subsequent processing.

Let \(D\) represent the raw dataset, which is loaded and transformed into a Pandas DataFrame for efficient manipulation and analysis in Python.
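A minimal sketch of this step, assuming a CSV source read with pandas. The column names and the in-memory CSV are hypothetical placeholders standing in for a real file path, and `load_dataset` is an illustrative helper rather than part of the pipeline.

```python
import io

import pandas as pd


def load_dataset(source) -> pd.DataFrame:
    """Read a CSV source (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical in-memory CSV; in practice `source` would be a file path.
raw_csv = io.StringIO(
    "customer_id,email_opens,site_visits,purchases\n"
    "1,3,5,1\n"
    "2,0,2,0\n"
    "3,7,9,2\n"
)

D = load_dataset(raw_csv)
print(D.dtypes)
```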

Data Preprocessing

Following data loading, the pipeline proceeds to Data Preprocessing. This substage applies two transformations:

Missing Value Handling

Strategies such as imputation or removal of incomplete records are applied to ensure data integrity. Let \(D_{MV}\) denote the dataset after missing value handling.
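The two strategies mentioned above can be combined, as in the sketch below. It assumes numeric columns, a removal threshold of 50% missing values per record, and median imputation for the remaining gaps. These choices and the helper name `handle_missing` are illustrative assumptions, not settings taken from the pipeline.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, max_missing_frac: float = 0.5) -> pd.DataFrame:
    """Drop records that are mostly empty, then median-impute the remaining gaps."""
    # Minimum number of observed values a row needs in order to be kept.
    min_present = int(np.ceil((1 - max_missing_frac) * df.shape[1]))
    kept = df.dropna(thresh=min_present)
    # Fill each remaining gap with the median of its column.
    return kept.fillna(kept.median(numeric_only=True))


# Hypothetical raw data with gaps.
D = pd.DataFrame(
    {
        "email_opens": [3.0, np.nan, 7.0, np.nan],
        "site_visits": [5.0, 2.0, np.nan, np.nan],
        "purchases": [1.0, 0.0, 2.0, np.nan],
    }
)

D_MV = handle_missing(D)
print(D_MV)
```

The last record has no observed values and is removed; the other gaps are filled from the column medians of the retained records.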

Feature Scaling

Feature scaling puts the numerical features on a common scale, which mitigates issues arising from differing magnitudes and improves the performance of the MCMC sampling. Common scaling techniques include standardisation and Min-Max scaling. Let \(D_{scaled}\) represent the dataset after feature scaling.
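Both scaling techniques can be sketched in a few lines of pandas. The text does not say which one the pipeline applies, so the example shows each; the data and function names are hypothetical. Note that a constant column would cause a division by zero in either function and would need separate handling.

```python
import pandas as pd


def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score each column: zero mean and unit (population) standard deviation."""
    return (df - df.mean()) / df.std(ddof=0)


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column linearly onto the interval [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())


# Hypothetical dataset after missing value handling.
D_MV = pd.DataFrame(
    {
        "email_opens": [3.0, 5.0, 7.0],
        "site_visits": [5.0, 2.0, 3.5],
    }
)

D_scaled = standardise(D_MV)
D_minmax = min_max_scale(D_MV)
print(D_scaled)
```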

Bayesian Network Model Inference

The core substage within Estimation is Bayesian Network Model Inference, which combines a probabilistic graphical model with Bayesian parameter estimation.

Model Structure

We instantiate a Bayesian Network, denoted as \(\mathcal{BN} = (G, \Theta)\), where:

\(G = (V, E)\) is the Directed Acyclic Graph (DAG) defining the network structure
Nodes \(V = \{v_1, v_2, \ldots, v_n\}\) represent customer journey touchpoints
Edges \(E\) represent their probabilistic dependencies
\(\Theta\) represents the set of parameters of the Bayesian Network
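One way to represent \(G\) in code is a mapping from each node to its parents, checked for acyclicity with a topological sort. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). The touchpoint names are hypothetical, and the representation is an assumption for illustration rather than the pipeline's own data structure.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical touchpoints: each key maps a node to its parents pa(node).
parents = {
    "email_open": [],
    "site_visit": ["email_open"],
    "add_to_cart": ["site_visit"],
    "purchase": ["site_visit", "add_to_cart"],
}


def topological_order(parent_map):
    """Return the nodes with parents before children; fail if G contains a cycle."""
    try:
        return list(TopologicalSorter(parent_map).static_order())
    except CycleError as exc:
        raise ValueError(f"graph is not acyclic: {exc.args[1]}") from exc


# Edge set E as (parent, child) pairs.
edges = [(p, child) for child, ps in parents.items() for p in ps]
order = topological_order(parents)
print(order)
```

Processing nodes in this order guarantees that every parent is available before the node that depends on it.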

Likelihood Function

For each node \(a \in V\), we define a likelihood function that models the probability of observing the data for node \(a\) given its parameters. A Student’s t-distribution is typically employed:

\[P(data_a | \theta_a) = \text{StudentT}(data_a | \mu_a, \sigma_a, \nu)\]

Where:

\(data_a\) represents the observed data for node \(a\)
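\(\mu_a\) is the location parameter (mean) for node \(a\), parameterised as a linear function of a log-transformed input and the contributions of the node's parents \(pa(a)\) in \(G\):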

\[\mu_a = \beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in pa(a)} \beta_{aj} P_j(x)\]

\(\sigma_a\) is the scale parameter
\(\nu\) represents the degrees of freedom (typically set to 4)
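As a rough illustration, the location parameter and the Student's t log-density can be written in plain Python. The text does not define \(x_a\), \(S\), or \(P_j(x)\); this sketch assumes \(x_a\) is a non-negative input for node \(a\), \(S\) is a positive scaling constant, and \(P_j(x)\) is the value contributed by parent \(j\). The function names and argument values are hypothetical.

```python
import math


def location(beta0, beta1, x_a, S, parent_betas, parent_values):
    """mu_a = beta_a0 + beta_a1 * ln(1 + x_a / S) + sum_j beta_aj * P_j(x)."""
    mu = beta0 + beta1 * math.log1p(x_a / S)
    return mu + sum(b * p for b, p in zip(parent_betas, parent_values))


def student_t_logpdf(y, mu, sigma, nu=4.0):
    """Log-density of a Student's t with location mu, scale sigma, and nu degrees of freedom."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


# Hypothetical values for a node with a single parent.
mu_a = location(beta0=1.0, beta1=2.0, x_a=0.0, S=10.0,
                parent_betas=[0.5], parent_values=[4.0])
log_lik = sum(student_t_logpdf(y, mu_a, sigma=1.0) for y in [2.5, 3.0, 3.4])
print(mu_a, log_lik)
```

With \(\nu = 4\) the density has heavier tails than a normal distribution, so isolated outliers pull the fitted location less.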

Prior Distributions

Prior distributions are assigned to all unknown parameters:

For regression coefficients: Half-Cauchy distributions, which restrict the coefficients to non-negative values
For scale parameters: Half-Cauchy priors

MCMC Sampling

The No-U-Turn Sampler (NUTS), an adaptive variant of Hamiltonian Monte Carlo, is employed to draw samples from the posterior distribution, which is proportional to the likelihood multiplied by the prior \(\pi(\theta)\):

\[P(\theta | D_{scaled}) \propto P(D_{scaled} | \theta) \pi(\theta)\]
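The proportionality above is what a sampler evaluates at each step, usually on the log scale. The sketch below computes the unnormalised log posterior for a deliberately simplified case: a single node with an intercept-only location, a Student's t likelihood, and Half-Cauchy priors on both parameters. The prior scale of 1.0, the data values, and the function names are assumptions for illustration. The sketch does not implement NUTS itself; in practice that would come from a probabilistic programming library.

```python
import math


def student_t_logpdf(y, mu, sigma, nu=4.0):
    """Log-density of a Student's t with location mu and scale sigma."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def half_cauchy_logpdf(x, scale=1.0):
    """Log-density of a Half-Cauchy; zero probability for negative values."""
    if x < 0:
        return -math.inf
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)


def log_posterior(beta0, sigma, data, nu=4.0, prior_scale=1.0):
    """Unnormalised log posterior: log-likelihood plus log-prior."""
    if sigma <= 0:
        return -math.inf
    log_prior = (half_cauchy_logpdf(beta0, prior_scale)
                 + half_cauchy_logpdf(sigma, prior_scale))
    if log_prior == -math.inf:
        return log_prior
    log_lik = sum(student_t_logpdf(y, beta0, sigma, nu) for y in data)
    return log_prior + log_lik


# Hypothetical scaled observations for one node.
data = [0.8, 1.1, 0.9, 1.3, 1.0]
print(log_posterior(1.0, 0.2, data), log_posterior(3.0, 0.2, data))
```

Parameter values outside the support of the priors receive a log posterior of negative infinity, so a sampler would never accept them.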

Output Generation

The final output includes:

Posterior Samples
Sample Stats
Log Likelihood
Posterior Predictive Samples
Parameter Summaries

These outputs characterise the posterior uncertainty in the customer journey dynamics and serve as inputs for the Optimization stage.
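Parameter summaries are typically derived from the posterior samples. The sketch below computes a mean, standard deviation, and equal-tailed 90% interval per parameter with NumPy. The parameter names and the randomly generated draws are synthetic placeholders, not model output, and the interval width and the `summarise` helper are assumptions.

```python
import numpy as np


def summarise(samples, interval=0.9):
    """Summarise posterior draws per parameter, pooling all chains."""
    tail = 100 * (1 - interval) / 2
    summary = {}
    for name, draws in samples.items():
        draws = np.asarray(draws, dtype=float).ravel()
        summary[name] = {
            "mean": float(draws.mean()),
            "sd": float(draws.std(ddof=1)),
            "lower": float(np.percentile(draws, tail)),
            "upper": float(np.percentile(draws, 100 - tail)),
        }
    return summary


# Synthetic placeholder draws shaped (chains, draws); not real posterior samples.
rng = np.random.default_rng(0)
posterior = {
    "beta_0": rng.normal(1.0, 0.1, size=(4, 500)),
    "sigma": np.abs(rng.normal(0.5, 0.05, size=(4, 500))),
}

for name, stats in summarise(posterior).items():
    print(name, stats)
```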