Bayesian Networks in ConversionFlow
Conceptual Foundation
Bayesian Networks form the backbone of the ConversionFlow library’s analytical capabilities. A Bayesian Network is a probabilistic graphical model that represents random variables and their conditional dependencies via a directed acyclic graph (DAG). In the context of conversion flow analysis, these networks are particularly well-suited for modeling customer journeys and conversion processes.
Key Principles
Directed Acyclic Graph (DAG) Structure
The structure of a Bayesian Network is a DAG where:
Nodes represent random variables (in our case, touchpoints in the customer journey)
Edges represent conditional dependencies between these variables
Acyclicity means that no directed path leads from a node back to itself, so no variable can be its own ancestor
This structure captures the causal relationships in a customer journey, where interactions with earlier touchpoints influence the likelihood of interactions with later touchpoints.
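As a minimal sketch of this structure, a DAG can be written as a mapping from each node to its parents. The touchpoint names below are hypothetical, and the representation is an illustration using Python's standard `graphlib` module, not ConversionFlow's internal data structure.

```python
from graphlib import TopologicalSorter

# Hypothetical touchpoints; each key maps to the set of its parent touchpoints.
parents = {
    "website_visit": set(),
    "car_configuration": {"website_visit"},
    "test_drive_request": {"website_visit", "car_configuration"},
}

# static_order() yields every node after all of its parents and raises
# graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

A topological order like this is the order in which touchpoints can be evaluated, because every node's parents are available before the node itself.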
Conditional Probability Distributions
Each node in the network is associated with a conditional probability distribution (CPD) that quantifies the effect of parent nodes on the current node. In ConversionFlow, these CPDs are parameterized using:
Intercept terms (\(\beta_{a0}\)) representing baseline propensities
Budget sensitivity coefficients (\(\beta_{a1}\)) quantifying direct budget impact
Parent influence coefficients (\(\beta_{aj}\)) representing causal relationships
Probabilistic Inference
Bayesian Networks allow for various types of probabilistic inference:
Forward inference: Predicting downstream effects given observations of upstream variables
Backward inference: Inferring likely causes given observations of effects
Interventional inference: Predicting outcomes when we intervene on specific variables
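The three inference types can be illustrated on a tiny binary network by brute-force enumeration. This is a sketch under stated assumptions: the variables (`v` for visit, `c` for configuration, `t` for test drive), the edges, and every probability below are invented for illustration, and enumeration is used only because the network is small. It is not how ConversionFlow performs inference.

```python
from itertools import product

# Hypothetical CPDs for the network v -> c, v -> t, c -> t (all binary).
p_v = 0.5
p_c_given_v = {0: 0.1, 1: 0.6}
p_t_given_vc = {(0, 0): 0.02, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.50}

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given 0/1 value."""
    return p if value == 1 else 1.0 - p

def joint(v, c, t, do_c=None):
    """Joint probability; with do_c set, c's CPD is replaced by a point mass."""
    pc = bern(p_c_given_v[v], c) if do_c is None else float(c == do_c)
    return bern(p_v, v) * pc * bern(p_t_given_vc[(v, c)], t)

def prob(query, evidence=None, do_c=None):
    """P(query | evidence), optionally under do(c = do_c), by enumeration."""
    evidence = evidence or {}
    num = den = 0.0
    for v, c, t in product((0, 1), repeat=3):
        state = {"v": v, "c": c, "t": t}
        if any(state[k] != val for k, val in evidence.items()):
            continue
        p = joint(v, c, t, do_c)
        den += p
        if all(state[k] == val for k, val in query.items()):
            num += p
    return num / den

forward = prob({"t": 1}, {"v": 1})    # downstream effect of an observed visit
backward = prob({"v": 1}, {"t": 1})   # likely cause of an observed test drive
observed = prob({"t": 1}, {"c": 1})   # conditioning on seeing a configuration
intervened = prob({"t": 1}, do_c=1)   # forcing a configuration for everyone
print(forward, backward, observed, intervened)
```

In this toy network `observed` and `intervened` differ because `v` influences both `c` and `t`: observing a configuration also makes a prior visit more likely, while intervening on it does not.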
Application in ConversionFlow
Customer Journey Modeling
ConversionFlow uses Bayesian Networks to model the entire customer journey from initial awareness to final conversion. Each node represents a specific touchpoint (e.g., website visit, car configuration, test drive request), and edges represent the influence that one touchpoint has on another.
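The probability of conversion at each touchpoint \(a\) is modeled as:
\[
p_a = \sigma\left(\beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in \mathrm{pa}(a)} \beta_{aj} z_j\right)
\]
Where:
\(\sigma\) is the sigmoid function
\(\beta_{a0}\) is the baseline conversion propensity
\(\beta_{a1}\) is the budget sensitivity
\(\beta_{aj}\) are the parent influence coefficients
\(x_a\) is the budget allocation
\(S\) is a scaling factor
\(z_j\) is the state of parent touchpoint \(j\), with \(\mathrm{pa}(a)\) the set of parents of \(a\) (notation assumed here)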
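The definitions above can be sketched in code. The block assumes the linear predictor is the intercept, plus the budget sensitivity times \(\ln(1 + x_a/S)\), plus the sum of each parent coefficient times that parent's state. The function name, its signature, and the example values are hypothetical and are not part of the ConversionFlow API.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def conversion_probability(beta_0, beta_1, budget, scale, parent_betas, parent_values):
    """Conversion probability for one touchpoint under the assumed form.

    beta_0: baseline propensity, beta_1: budget sensitivity,
    budget: x_a, scale: S, parent_betas / parent_values: one entry per parent.
    """
    predictor = beta_0 + beta_1 * math.log1p(budget / scale)
    predictor += sum(b * z for b, z in zip(parent_betas, parent_values))
    return sigmoid(predictor)

# Hypothetical values: one parent touchpoint that was reached (state 1).
p = conversion_probability(-2.0, 0.8, 5000.0, 1000.0, [1.2], [1])
print(round(p, 4))
```

With zero budget and no active parents the result reduces to `sigmoid(beta_0)`, which is why \(\beta_{a0}\) reads as a baseline propensity.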
Diminishing Returns Modeling
The logarithmic term \(\ln\left(1 + \frac{x_a}{S}\right)\) captures diminishing returns on budget allocation, a crucial aspect of marketing investment. Because the logarithm is concave, each additional unit of budget adds less to the log-odds of conversion than the previous one, so doubling the budget less than doubles its contribution to the model.
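A short numeric check makes the diminishing returns concrete. The scaling factor and budget levels are hypothetical. The block prints how much the logarithmic term grows for each equal budget increment.

```python
import math

S = 1000.0  # hypothetical scaling factor
budgets = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]

# Value of ln(1 + x / S) at each budget level.
terms = [math.log1p(x / S) for x in budgets]

# Gain in the log term for each additional 1000 units of budget.
gains = [b - a for a, b in zip(terms, terms[1:])]
for x, g in zip(budgets[1:], gains):
    print(f"budget {x:>6.0f}: log term gain {g:.3f}")
```

Each successive gain is smaller than the one before it, which is the concavity the model relies on.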
Uncertainty Quantification
By using Bayesian inference (specifically, MCMC sampling), ConversionFlow quantifies uncertainty in all parameter estimates. This provides not just point estimates but entire posterior distributions, allowing for robust decision-making that accounts for uncertainty.
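To show what a posterior distribution adds over a point estimate, here is a minimal random-walk Metropolis sampler for a single intercept. Everything in it is an assumption made for illustration: the data (30 conversions in 100 trials), the Normal prior with standard deviation 2, the proposal width, and the choice of Metropolis itself. The text does not say which MCMC algorithm ConversionFlow uses.

```python
import math
import random

random.seed(0)

conversions, trials = 30, 100  # hypothetical data for one touchpoint

def log_posterior(beta_0: float) -> float:
    """Bernoulli log-likelihood plus an assumed Normal(0, 2) prior (unnormalized)."""
    log_p = -math.log1p(math.exp(-beta_0))      # log sigmoid(beta_0)
    log_not_p = -math.log1p(math.exp(beta_0))   # log (1 - sigmoid(beta_0))
    log_lik = conversions * log_p + (trials - conversions) * log_not_p
    log_prior = -0.5 * (beta_0 / 2.0) ** 2
    return log_lik + log_prior

samples = []
current = 0.0
current_lp = log_posterior(current)
for step in range(6000):
    proposal = current + random.gauss(0.0, 0.3)
    proposal_lp = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio).
    if random.random() < math.exp(min(0.0, proposal_lp - current_lp)):
        current, current_lp = proposal, proposal_lp
    if step >= 1000:  # discard the first 1000 steps as burn-in
        samples.append(current)

# Convert intercept samples to conversion rates and summarize them.
rates = sorted(1.0 / (1.0 + math.exp(-b)) for b in samples)
mean_rate = sum(rates) / len(rates)
lower, upper = rates[int(0.025 * len(rates))], rates[int(0.975 * len(rates))]
print(f"posterior mean rate {mean_rate:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The interval, rather than the mean alone, is what supports decisions that account for uncertainty: a narrow interval justifies acting on the estimate, a wide one suggests collecting more data first.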
Advantages Over Traditional Approaches
Handling Partial Information
Unlike traditional funnel models, Bayesian Networks can handle partial information and missing data naturally, making inferences even when observations are incomplete.
Causal Understanding
When the edges are specified to reflect causal assumptions, the DAG structure encodes causal relationships, providing insights into not just correlations but causal effects between touchpoints.
Integration with Business Logic
The probabilistic nature of Bayesian Networks allows for seamless integration with business logic and domain knowledge through prior distributions and network structure.
Uncertainty-Aware Decisions
The complete posterior distributions provided by Bayesian inference enable uncertainty-aware decision-making, acknowledging the inherent uncertainties in customer behavior.
Limitations and Considerations
Acyclicity Assumption
The DAG structure assumes acyclicity, which may not capture feedback loops in customer journeys (e.g., returning to configuration after a test drive).
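The limitation can be seen directly by adding such a feedback edge to a parent mapping. This is a sketch with hypothetical touchpoint names, using Python's standard `graphlib` module. It only demonstrates that the resulting graph is rejected as a DAG and does not reflect how ConversionFlow validates its structure.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical structure with a feedback edge:
# test_drive_request -> car_configuration, in addition to the forward edge.
parents = {
    "website_visit": set(),
    "car_configuration": {"website_visit", "test_drive_request"},
    "test_drive_request": {"car_configuration"},
}

try:
    list(TopologicalSorter(parents).static_order())
    has_cycle = False
except CycleError as err:
    has_cycle = True
    # The second element of args holds the nodes of the detected cycle.
    print("not a valid DAG:", err.args[1])
```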
Computational Complexity
Inference in Bayesian Networks with many nodes can be computationally intensive, particularly when using MCMC sampling.
Structural Learning Challenges
While ConversionFlow currently uses a predefined network structure, learning the structure from data (structural learning) remains challenging and is an area for future development.