Estimation Process

Overview

The Estimation stage builds and fits a Bayesian Network model of the customer journey. It is broken down into the substages described below.

Data Loading

The initial substage is Data Loading, where relevant data is ingested into the pipeline. This typically involves reading datasets from specified sources, such as CSV files, into a structured format suitable for subsequent processing.

Let \(D\) represent the raw dataset, which is loaded and transformed into a Pandas DataFrame for efficient manipulation and analysis in Python.
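As a minimal sketch of this step, assuming the source is a CSV file with a header row, loading could look like the following. The function name load_dataset, the column names, and the in-memory CSV are hypothetical stand-ins for the real source.

```python
import io

import pandas as pd


def load_dataset(source) -> pd.DataFrame:
    """Read a CSV source (file path or file-like object) into a DataFrame."""
    df = pd.read_csv(source)
    if df.empty:
        raise ValueError("loaded dataset contains no rows")
    return df


# Hypothetical in-memory CSV standing in for a real file path.
csv_text = "email,search,purchase\n10,5,1\n20,,3\n30,15,4\n"
D = load_dataset(io.StringIO(csv_text))
```

Empty fields, such as the second value of the search column here, are read as NaN and left for the preprocessing substage to handle.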

Data Preprocessing

Following data loading, the pipeline proceeds to Data Preprocessing. This substage applies the following transformations:

Missing Value Handling

Strategies such as imputation or removal of incomplete records are applied to ensure data integrity. Let \(D_{MV}\) denote the dataset after missing value handling.
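The two strategies named above can be sketched as follows. This is an illustration only: the choice of the column median as the imputed value, the function name handle_missing, and the example data are assumptions, not details given by the pipeline.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Return a copy of df with missing values imputed or dropped."""
    if strategy == "drop":
        # Remove every record that has at least one missing value.
        return df.dropna().reset_index(drop=True)
    if strategy == "median":
        # Replace missing numeric values with the median of their column.
        return df.fillna(df.median(numeric_only=True))
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical raw data with one missing entry.
D = pd.DataFrame({"email": [10.0, 20.0, 30.0], "search": [5.0, np.nan, 15.0]})
D_MV = handle_missing(D, "median")
D_dropped = handle_missing(D, "drop")
```

Imputation keeps all records at the cost of introducing estimated values, whereas removal keeps only observed values but shrinks the dataset.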

Feature Scaling

Numerical features are rescaled to mitigate issues arising from differing scales and to improve the performance of the MCMC sampling. Common scaling techniques include standardisation (zero mean, unit variance) and Min-Max scaling (rescaling each feature to the range [0, 1]). Let \(D_{scaled}\) represent the dataset after feature scaling.
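Both scaling techniques reduce to one line of column-wise arithmetic. The sketch below assumes every column is numeric and non-constant; the function names and the example data are hypothetical.

```python
import pandas as pd


def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column to zero mean and unit (population) variance."""
    # A constant column has zero standard deviation and would yield NaN.
    return (df - df.mean()) / df.std(ddof=0)


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column linearly onto the interval [0, 1]."""
    # A constant column has zero range and would yield NaN.
    return (df - df.min()) / (df.max() - df.min())


# Hypothetical dataset after missing value handling.
D_MV = pd.DataFrame({"email": [10.0, 20.0, 30.0], "search": [5.0, 10.0, 15.0]})
D_scaled = standardise(D_MV)
D_minmax = min_max_scale(D_MV)
```

Whichever technique is chosen, the scaling statistics (means and standard deviations, or minima and maxima) need to be retained if results are later to be mapped back to the original units.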

Bayesian Network Model Inference

The core substage within Estimation is Bayesian Network Model Inference. It specifies the network structure, the likelihood, and the prior distributions, and then samples from the resulting posterior with MCMC.

Model Structure

We instantiate a Bayesian Network, denoted as \(\mathcal{BN} = (G, \Theta)\), where:

  • \(G = (V, E)\) is the Directed Acyclic Graph (DAG) defining the network structure

  • Nodes \(V = \{v_1, v_2, \ldots, v_n\}\) represent customer journey touchpoints

  • Edges \(E\) represent their probabilistic dependencies

  • \(\Theta\) represents the set of parameters of the Bayesian Network
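One simple way to hold the structure \(G\) in code is a mapping from each node to its parents, validated with a topological sort. The sketch below uses the standard-library graphlib module (Python 3.9+); the touchpoint names are hypothetical and the representation is an assumption, not necessarily the one used by the pipeline.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical touchpoints: each key is a node, each value lists its parents.
parents = {
    "email": [],
    "search": ["email"],
    "purchase": ["email", "search"],
}


def topological_order(parent_map):
    """Return the nodes with parents before children; fail if G has a cycle."""
    try:
        return list(TopologicalSorter(parent_map).static_order())
    except CycleError as exc:
        raise ValueError(f"graph is not a DAG, cycle found: {exc.args[1]}") from exc


order = topological_order(parents)
# The edge set E as (parent, child) pairs.
edges = [(p, child) for child, ps in parents.items() for p in ps]
```

Processing nodes in this order guarantees that every parent is available before the nodes that depend on it.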

Likelihood Function

For each node \(a \in V\), we define a likelihood function that models the probability of observing the data for node \(a\) given its parameters. A Student’s t-distribution is typically employed:

\[P(data_a | \theta_a) = \text{StudentT}(data_a | \mu_a, \sigma_a, \nu)\]

Where:

  • \(data_a\) represents the observed data for node \(a\)

  • \(\mu_a\) is the location parameter (mean) for node \(a\), parameterised as follows, where \(pa(a)\) denotes the set of parents of node \(a\) in \(G\):

\[\mu_a = \beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in pa(a)} \beta_{aj} P_j(x)\]

  • \(\sigma_a\) is the scale parameter

  • \(\nu\) represents the degrees of freedom (typically set to 4)
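The location parameter and the Student's t log-density can be evaluated directly with the standard library, as in the sketch below. The function names are hypothetical, and the parent contributions \(P_j(x)\) are passed in as precomputed numbers because their definition is not given here.

```python
import math


def location(beta0, beta1, x, S, parent_coefs, parent_values):
    """mu_a = beta_a0 + beta_a1 * ln(1 + x_a / S) + sum_j beta_aj * P_j(x)."""
    parent_term = sum(b * p for b, p in zip(parent_coefs, parent_values))
    return beta0 + beta1 * math.log1p(x / S) + parent_term


def student_t_logpdf(y, mu, sigma, nu=4.0):
    """Log-density of a Student's t with location mu, scale sigma, nu d.o.f."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def node_log_likelihood(data, mus, sigma, nu=4.0):
    """Sum of log-densities over the observations of one node."""
    return sum(student_t_logpdf(y, m, sigma, nu) for y, m in zip(data, mus))
```

With \(\nu = 4\) the density has heavier tails than a normal distribution, so isolated extreme observations pull the fitted location less strongly.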

Prior Distributions

Prior distributions are assigned to all unknown parameters:

  • For regression coefficients: Half-Cauchy distributions, which have support on \([0, \infty)\) and therefore constrain the coefficients to be non-negative

  • For scale parameters: Half-Cauchy priors
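For reference, a Half-Cauchy distribution with scale \(s\) has density \(2 / (\pi s (1 + (x/s)^2))\) for \(x \ge 0\) and can be sampled by inverting its cumulative distribution function. The sketch below illustrates both; the function names and the scale value of 2.0 are hypothetical, since the prior scales used by the pipeline are not stated.

```python
import math
import random


def half_cauchy_logpdf(x, scale=1.0):
    """Log-density of a Half-Cauchy; minus infinity outside the support."""
    if x < 0:
        return -math.inf
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def half_cauchy_sample(rng, scale=1.0):
    """Draw one value via the inverse CDF: x = scale * tan(pi * u / 2)."""
    return scale * math.tan(0.5 * math.pi * rng.random())


rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [half_cauchy_sample(rng, scale=2.0) for _ in range(1000)]
```

The median of the distribution equals its scale, which gives a quick sanity check on the draws. The mean is undefined because of the heavy tail.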

MCMC Sampling

The No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo, is employed to sample from the posterior distribution. The posterior is proportional to the product of the likelihood and the prior \(\pi(\theta)\):

\[P(\theta | D_{scaled}) \propto P(D_{scaled} | \theta) \pi(\theta)\]
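Assuming the likelihood factorises over the nodes of the graph, as the Bayesian Network structure implies, taking logarithms of the expression above gives the unnormalised log posterior in terms of the per-node Student's t likelihoods:

\[\ln P(\theta | D_{scaled}) = \sum_{a \in V} \ln \text{StudentT}(data_a | \mu_a, \sigma_a, \nu) + \ln \pi(\theta) + \text{const}\]

The constant is the negative log of the normalising term \(P(D_{scaled})\), which does not depend on \(\theta\). Samplers such as NUTS only require the log posterior up to this additive constant, together with its gradient.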

Output Generation

The final output includes:

  • Posterior Samples

  • Sample Stats

  • Log Likelihood

  • Posterior Predictive Samples

  • Parameter Summaries

These outputs characterise the fitted customer journey model and the uncertainty in its parameters, and serve as inputs for the Optimization stage.
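As an illustration of how Parameter Summaries can be derived from Posterior Samples, the sketch below computes a mean, standard deviation, and a central interval from a list of draws for a single parameter. The function name summarise, the 90% interval, and the synthetic draws are assumptions for demonstration; they are not real pipeline output.

```python
import random
import statistics


def summarise(samples, lower=0.05, upper=0.95):
    """Mean, standard deviation, and a central interval for one parameter."""
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q):
        # Nearest-rank quantile; adequate for a large number of draws.
        return ordered[min(n - 1, max(0, round(q * (n - 1))))]

    return {
        "mean": statistics.fmean(samples),
        "sd": statistics.stdev(samples),
        "lower": quantile(lower),
        "upper": quantile(upper),
    }


# Synthetic draws standing in for the posterior samples of one coefficient.
rng = random.Random(0)
posterior_draws = [rng.gauss(0.5, 0.1) for _ in range(2000)]
summary = summarise(posterior_draws)
```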