# Estimation Process

## Overview

The Estimation stage constructs and fits a Bayesian Network model of the customer journey. It is divided into three substages: Data Loading, Data Preprocessing, and Bayesian Network Model Inference.

## Data Loading

The first substage, Data Loading, ingests the relevant data into the pipeline. This typically means reading datasets from specified sources, such as CSV files, into a structured format suitable for subsequent processing.

Let $D$ represent the raw dataset, which is loaded and transformed into a Pandas DataFrame for efficient manipulation and analysis in Python.
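
As a minimal sketch of this step, the snippet below reads CSV data into a DataFrame with `pandas.read_csv`. The column names and the in-memory CSV text are hypothetical stand-ins for a real source file; they are not taken from the pipeline itself.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real source file.
raw_csv = io.StringIO(
    "customer_id,touchpoint,spend\n"
    "1,email,12.5\n"
    "2,search,\n"
    "3,social,7.0\n"
)

# Load the raw dataset D into a DataFrame; the empty field becomes NaN.
D = pd.read_csv(raw_csv)
print(D.dtypes)
```

In practice the `io.StringIO` object would be replaced by a file path or URL passed to `pd.read_csv`.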

## Data Preprocessing

Following data loading, the pipeline proceeds to Data Preprocessing. This substage applies two transformations:

### Missing Value Handling

Incomplete records are either imputed or removed so that later steps receive complete data. Let $D_{MV}$ denote the dataset after missing value handling.
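
The two strategies can be combined, as in the sketch below: numeric gaps are imputed with the column median, and any rows that are still incomplete are dropped. The choice of median imputation and the example columns are assumptions made for illustration; the source does not specify which strategy the pipeline uses.

```python
import pandas as pd

# Hypothetical dataset with one missing categorical and one missing numeric value.
D = pd.DataFrame(
    {
        "touchpoint": ["email", None, "social", "search"],
        "spend": [12.5, 3.0, None, 7.0],
    }
)

D_MV = D.copy()

# Imputation: fill numeric gaps with the column median (an assumed strategy).
numeric_cols = D_MV.select_dtypes(include="number").columns
D_MV[numeric_cols] = D_MV[numeric_cols].fillna(D_MV[numeric_cols].median())

# Removal: drop rows that still contain missing (non-numeric) values.
D_MV = D_MV.dropna().reset_index(drop=True)
print(D_MV)
```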

### Feature Scaling

Numerical features are rescaled to mitigate issues arising from differing scales and to improve the performance of the MCMC sampling. Common techniques include standardisation (zero mean, unit variance) and Min-Max scaling (mapping each feature to $[0, 1]$). Let $D_{scaled}$ represent the dataset after feature scaling.
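
Both techniques are short enough to write directly in pandas. The sketch below uses hypothetical numeric columns and the population standard deviation (`ddof=0`); a constant column would cause a division by zero and would need separate handling.

```python
import pandas as pd

# Hypothetical numeric features after missing value handling.
D_MV = pd.DataFrame({"spend": [10.0, 20.0, 30.0], "visits": [1.0, 3.0, 5.0]})

# Standardisation: subtract the mean and divide by the standard deviation.
D_standardised = (D_MV - D_MV.mean()) / D_MV.std(ddof=0)

# Min-Max scaling: map each column linearly onto [0, 1].
D_minmax = (D_MV - D_MV.min()) / (D_MV.max() - D_MV.min())

print(D_standardised)
print(D_minmax)
```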

## Bayesian Network Model Inference

The core substage within Estimation is Bayesian Network Model Inference, which fits a probabilistic graphical model to $D_{scaled}$ using Bayesian methods.

### Model Structure

We instantiate a Bayesian Network, denoted as $\mathcal{BN} = (G, \Theta)$, where:
- $G = (V, E)$ is the Directed Acyclic Graph (DAG) defining the network structure
- Nodes $V = \{v_1, v_2, \ldots, v_n\}$ represent customer journey touchpoints
- Edges $E$ represent their probabilistic dependencies
- $\Theta$ represents the set of parameters of the Bayesian Network
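
One simple way to represent $G$ in code is a mapping from each node to its set of parents. The sketch below uses hypothetical touchpoint names and the standard-library `graphlib` module (Python 3.9+) to check acyclicity and to obtain an ordering in which parents precede their children.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical touchpoints: parents[v] is the parent set pa(v) of node v.
parents = {
    "display": set(),
    "search": {"display"},
    "email": {"display"},
    "purchase": {"search", "email"},
}


def topological_order(parent_map):
    """Return nodes with parents before children; raise CycleError if not a DAG."""
    return list(TopologicalSorter(parent_map).static_order())


order = topological_order(parents)

# The edge set E, derived from the parent sets as (parent, child) pairs.
edges = sorted((p, child) for child, ps in parents.items() for p in ps)

print(order)
print(edges)
```

Storing parents rather than children is convenient here because each node's likelihood is defined in terms of its parent set.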

### Likelihood Function

For each node $a \in V$, we define a likelihood function that models the probability of observing the data for node $a$ given its parameters. A Student's t-distribution is typically employed:

$$P(\text{data}_a \mid \theta_a) = \text{StudentT}(\text{data}_a \mid \mu_a, \sigma_a, \nu)$$

Where:
- $\text{data}_a$ represents the observed data for node $a$
- $\mu_a$ is the location parameter (mean) for node $a$, parameterised as:

$$\mu_a = \beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in pa(a)} \beta_{aj} P_j(x)$$

- $pa(a)$ is the set of parents of node $a$ in $G$
- $\sigma_a$ is the scale parameter
- $\nu$ represents the degrees of freedom (typically set to 4)
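
The location and likelihood formulas above can be evaluated with the standard library alone. In the sketch below, $S$ is assumed to be a positive scaling constant and the $P_j(x)$ terms are assumed to be numeric values supplied by the parent nodes; the source does not define either, and all numbers are illustrative.

```python
import math


def location(beta0, beta1, x, S, parent_coefs, parent_values):
    """mu_a = beta_a0 + beta_a1 * ln(1 + x_a / S) + sum_j beta_aj * P_j(x)."""
    own_effect = beta1 * math.log1p(x / S)
    parent_effect = sum(b * p for b, p in zip(parent_coefs, parent_values))
    return beta0 + own_effect + parent_effect


def student_t_logpdf(y, mu, sigma, nu=4.0):
    """Log density of a Student's t with location mu, scale sigma, nu d.o.f."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


# Hypothetical values for one node with a single parent.
mu_a = location(beta0=0.5, beta1=2.0, x=30.0, S=10.0,
                parent_coefs=[0.3], parent_values=[2.0])
observations = [3.6, 4.1, 3.9]

# Log likelihood of the node's observations (assumed independent).
log_lik = sum(student_t_logpdf(y, mu_a, sigma=0.5) for y in observations)
print(mu_a, log_lik)
```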

### Prior Distributions

Prior distributions are assigned to all unknown parameters:

- Regression coefficients $\beta$: Half-Cauchy priors, which restrict the coefficients to non-negative values
- Scale parameters $\sigma_a$: Half-Cauchy priors
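
For reference, the Half-Cauchy density with scale $\gamma$ is $p(x) = \frac{2}{\pi \gamma \left(1 + (x/\gamma)^2\right)}$ for $x \ge 0$ and zero otherwise. The sketch below evaluates its log density; the scale value used is a hypothetical choice, since the source does not state the prior scales.

```python
import math


def half_cauchy_logpdf(x, scale=1.0):
    """Log density of a Half-Cauchy(scale); -inf outside the support x >= 0."""
    if x < 0:
        return -math.inf
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


# Negative values have zero prior mass, so they can never be sampled.
print(half_cauchy_logpdf(-0.1))
# The density is highest at zero and decays slowly (heavy tail).
print(half_cauchy_logpdf(0.0), half_cauchy_logpdf(10.0))
```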

### MCMC Sampling

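The No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo, is employed to sample from the posterior distribution. The posterior is known only up to a normalising constant: it is proportional to the likelihood $P(D_{scaled} \mid \theta)$ times the joint prior $\pi(\theta)$: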

$$P(\theta | D_{scaled}) \propto P(D_{scaled} | \theta) \pi(\theta)$$
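
To make the proportionality concrete, the sketch below builds an unnormalised log posterior (log likelihood plus log prior) for a single intercept-only node and samples from it. Note that it uses a plain random-walk Metropolis sampler as a simple stand-in, not NUTS; NUTS targets the same posterior but uses gradient information to propose moves. The data, prior scales, step size, and burn-in length are all hypothetical.

```python
import math
import random


def student_t_logpdf(y, mu, sigma, nu=4.0):
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def half_cauchy_logpdf(x, scale):
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def log_posterior(theta, data):
    """Unnormalised log posterior: log likelihood + log prior."""
    mu, sigma = theta
    if mu < 0 or sigma <= 0:  # outside the Half-Cauchy support
        return -math.inf
    log_prior = half_cauchy_logpdf(mu, 5.0) + half_cauchy_logpdf(sigma, 5.0)
    log_lik = sum(student_t_logpdf(y, mu, sigma) for y in data)
    return log_lik + log_prior


def metropolis(data, n_draws=5000, step=0.2, seed=0):
    """Random-walk Metropolis (a stand-in for NUTS in this sketch)."""
    rng = random.Random(seed)
    theta = (1.0, 1.0)
    current = log_posterior(theta, data)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = (theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        draws.append(theta)
    return draws, accepted / n_draws


data = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0]  # hypothetical scaled observations
draws, accept_rate = metropolis(data)
kept = draws[1000:]  # discard burn-in
mu_mean = sum(mu for mu, _ in kept) / len(kept)
print(mu_mean, accept_rate)
```

Only ratios of posterior densities enter the acceptance step, which is why the normalising constant never has to be computed.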

### Output Generation

The final output includes:
- Posterior Samples
- Sample Stats
- Log Likelihood
- Posterior Predictive Samples
- Parameter Summaries

These outputs characterise the fitted customer journey model, including the uncertainty in its parameters, and serve as inputs for the Optimization stage.
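
As an illustration of how Parameter Summaries can be derived from Posterior Samples, the sketch below computes a mean, standard deviation, and 90% equal-tailed interval for one parameter. The draws are a hypothetical placeholder sequence, and the choice of summary statistics and interval width is an assumption rather than the pipeline's documented output.

```python
import statistics


def summarise(draws):
    """Mean, standard deviation, and 90% equal-tailed interval of the draws."""
    # With n=20 the cut points are the 5%, 10%, ..., 95% quantiles.
    cuts = statistics.quantiles(draws, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "q05": cuts[0],
        "q95": cuts[-1],
    }


# Placeholder draws for a single parameter (not real posterior samples).
posterior_draws = [float(i) for i in range(101)]
summary = summarise(posterior_draws)
print(summary)
```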