Model Complexity Considerations
Understanding Model Complexity in ConversionFlow
The ConversionFlow library implements Bayesian Network models of varying complexity to capture the intricacies of customer journeys. This document explores the considerations around model complexity, including how to balance complexity with interpretability and computational efficiency.
Dimensions of Model Complexity
Structural Complexity
The structural complexity of a ConversionFlow model is determined by:
Number of Nodes: Each touchpoint in the customer journey adds a node to the network. More nodes allow for more detailed journey mapping but increase computational demands.
Edge Density: The proportion of possible edges that are actually present in the DAG. Sparser networks are easier to interpret and compute, while denser networks capture more complex interdependencies.
Node Types: The current implementation uses primarily binary nodes (whether a touchpoint was engaged with), but future extensions might include:
Continuous nodes (e.g., time spent on a page)
Count nodes (e.g., number of visits)
Categorical nodes (e.g., device type)
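As a rough illustration of the edge density measure described above, the sketch below computes it for two small graphs. The touchpoint names and the `edge_density` helper are hypothetical and not part of ConversionFlow. The sketch assumes density is measured against the maximum number of edges a DAG on n nodes can hold, n(n - 1)/2.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of possible DAG edges that are present.

    A DAG on n nodes can contain at most n * (n - 1) / 2 edges
    (one per unordered pair of nodes), so that is the denominator.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    max_edges = n * (n - 1) // 2
    return len(set(edges)) / max_edges


# Hypothetical touchpoints, listed in a fixed causal order.
touchpoints = ["ad_click", "landing_page", "product_view", "cart", "purchase"]

# Sparse structure: a simple linear funnel with 4 edges.
funnel_edges = list(zip(touchpoints[:-1], touchpoints[1:]))

# Dense structure: every earlier touchpoint points to every later one.
full_edges = list(combinations(touchpoints, 2))

sparse_density = edge_density(touchpoints, funnel_edges)
dense_density = edge_density(touchpoints, full_edges)

print(f"funnel density: {sparse_density:.2f}")  # 4 of 10 possible edges
print(f"full density:   {dense_density:.2f}")   # 10 of 10 possible edges
```

A five-node funnel uses 4 of the 10 possible edges, so there is considerable room between the sparsest and the densest structure even for small journeys.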
Parametric Complexity
Beyond structure, models can vary in parametric complexity:
Prior Distributions: Heavier-tailed priors like the Half-Cauchy (currently used) place more probability on large parameter values than lighter-tailed priors like the Half-Gaussian (also called Half-Normal). This makes them more flexible, but also less regularizing.
Parameter Sharing: Some models might share parameters across similar touchpoints to reduce complexity.
Non-linear Relationships: The logarithmic term for budget effects introduces non-linearity, but additional non-linearities could be introduced for more complex relationships.
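To make the difference between the two prior families concrete, the following sketch compares their tail probabilities using closed-form expressions and only the Python standard library. The scale of 1.0 is an arbitrary choice for illustration and is not taken from ConversionFlow's configuration.

```python
import math


def half_cauchy_tail(x, scale=1.0):
    """P(X > x) for X ~ Half-Cauchy(scale), with x >= 0."""
    return 1.0 - (2.0 / math.pi) * math.atan(x / scale)


def half_normal_tail(x, scale=1.0):
    """P(X > x) for X ~ Half-Normal(scale), with x >= 0."""
    return math.erfc(x / (scale * math.sqrt(2.0)))


# Compare how much prior mass each family leaves beyond a threshold.
for threshold in (1.0, 3.0, 10.0):
    hc = half_cauchy_tail(threshold)
    hn = half_normal_tail(threshold)
    print(f"x > {threshold:>4}: half-cauchy={hc:.4f}  half-normal={hn:.2e}")
```

At the same scale, the Half-Cauchy keeps a noticeable share of its mass far from zero, while the Half-Normal tail vanishes quickly. This is why the Half-Cauchy tolerates occasional large effects and the Half-Normal shrinks them more strongly.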
Computational Complexity
The computational demands of model fitting and inference scale with:
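Number of Parameters: More parameters enlarge the posterior space the sampler must explore, which typically means a longer warm-up and more MCMC draws before the chains converge and reach an adequate effective sample size.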
Graph Connectivity: Highly connected graphs create complex parameter dependencies that slow MCMC sampling.
Data Size: Larger datasets provide more information but increase computation time.
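The link between graph connectivity and parameter count can be sketched with a small counting exercise. The source does not state how ConversionFlow parameterizes each node, so the two schemes below are assumptions for illustration: a logistic form (one intercept plus one coefficient per parent) and a full conditional probability table for binary nodes (one probability per parent configuration). The `count_parameters` helper and node names are hypothetical.

```python
def count_parameters(parents, parameterization="logistic"):
    """Count free parameters for a network of binary nodes.

    parents maps each node to the list of its parent nodes.
    - "logistic": 1 intercept + 1 coefficient per parent.
    - "cpt": one probability per configuration of the parents (2 ** k).
    """
    total = 0
    for node_parents in parents.values():
        k = len(node_parents)
        if parameterization == "logistic":
            total += 1 + k
        elif parameterization == "cpt":
            total += 2 ** k
        else:
            raise ValueError(f"unknown parameterization: {parameterization}")
    return total


order = ["ad_click", "landing_page", "product_view", "cart", "purchase"]

# Sparse funnel: each node depends only on the previous one.
funnel = {node: order[max(i - 1, 0):i] for i, node in enumerate(order)}

# Fully connected DAG: each node depends on all earlier nodes.
dense = {node: order[:i] for i, node in enumerate(order)}

for name, graph in (("funnel", funnel), ("dense", dense)):
    logistic = count_parameters(graph, "logistic")
    cpt = count_parameters(graph, "cpt")
    print(f"{name}: logistic={logistic} parameters, cpt={cpt} parameters")
```

Under these assumptions the funnel needs 9 parameters either way, while the fully connected graph needs 15 in the logistic form and 31 with full tables. The table-based count grows exponentially in the number of parents, which is one reason dense graphs become expensive to sample.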
Model Selection Strategies
Cross-Validation
ConversionFlow implements cross-validation techniques to compare models of different complexity:
Leave-One-Out Cross-Validation (LOO-CV): Implemented via the az.loo() function from ArviZ to estimate out-of-sample predictive accuracy.
Widely Applicable Information Criterion (WAIC): An alternative to LOO-CV that approximates Bayesian cross-validation.
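How to read these values depends on the reporting scale. Recent ArviZ versions report the expected log pointwise predictive density (ELPD) on the log scale by default, where higher is better; on the deviance scale (-2 times ELPD), lower is better. The model with the best score on the chosen scale typically represents the best balance of fit and complexity.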
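To show what the WAIC estimate is made of, here is a minimal hand-rolled sketch on the log (ELPD) scale, where higher is better; on the deviance scale the sign flips and lower is better. It is not how ConversionFlow computes the criterion. In practice ArviZ derives it from the fitted model's pointwise log-likelihood. The toy data and the two sets of posterior draws below are invented for illustration.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_waic(log_lik):
    """Return (elpd_waic, p_waic) from log_lik[draw][observation].

    lppd   = sum over observations of log(mean over draws of the likelihood)
    p_waic = sum over observations of the variance of the log-likelihood
    elpd   = lppd - p_waic
    """
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # sample variance across posterior draws
    return lppd - p_waic, p_waic


def bernoulli_log_lik(outcomes, rates):
    """Pointwise Bernoulli log-likelihood for each posterior draw of the rate."""
    return [
        [math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes]
        for p in rates
    ]


# Invented data: 3 conversions in 10 journeys.
conversions = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Invented posterior draws: one tight posterior, one very diffuse posterior.
tight_draws = [0.25, 0.30, 0.35, 0.28, 0.32]
diffuse_draws = [0.05, 0.30, 0.60, 0.90, 0.15]

elpd_tight, p_tight = elpd_waic(bernoulli_log_lik(conversions, tight_draws))
elpd_diffuse, p_diffuse = elpd_waic(bernoulli_log_lik(conversions, diffuse_draws))

print(f"tight:   elpd_waic={elpd_tight:.2f}, p_waic={p_tight:.2f}")
print(f"diffuse: elpd_waic={elpd_diffuse:.2f}, p_waic={p_diffuse:.2f}")
```

The penalty term p_waic acts as an effective number of parameters: the diffuse posterior is penalized heavily and ends up with the lower ELPD, even though its average fit is similar.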
Posterior Predictive Checks
Beyond formal cross-validation, ConversionFlow employs posterior predictive checks to assess model adequacy:
Graphical Checks: Comparing the distribution of observed data to predictions from the model.
Statistical Discrepancy Measures: Quantifying differences between observed and predicted data.
Specific Test Statistics: Targeting particular aspects of the data that should be captured by the model.
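A check based on a specific test statistic can be sketched as a posterior predictive p-value. The example below is a toy under stated assumptions, not ConversionFlow's implementation: it assumes a single Bernoulli conversion rate with a uniform prior (so the posterior is a Beta distribution), uses invented data, and picks the longest run of non-conversions as the statistic. All function names are hypothetical.

```python
import random


def longest_zero_run(outcomes):
    """Length of the longest consecutive run of non-conversions (zeros)."""
    longest = current = 0
    for value in outcomes:
        current = current + 1 if value == 0 else 0
        longest = max(longest, current)
    return longest


def posterior_predictive_pvalue(observed, posterior_rates, statistic, seed=0):
    """Share of replicated datasets whose statistic is >= the observed one."""
    rng = random.Random(seed)
    t_observed = statistic(observed)
    exceed = 0
    for rate in posterior_rates:
        # Simulate one replicated dataset per posterior draw.
        replicated = [1 if rng.random() < rate else 0 for _ in observed]
        if statistic(replicated) >= t_observed:
            exceed += 1
    return exceed / len(posterior_rates)


# Invented data with strongly clustered conversions.
observed = [1, 1, 1, 1] + [0] * 12 + [1, 1, 1, 1]

# Posterior draws for the rate under a uniform Beta(1, 1) prior.
draw_rng = random.Random(42)
successes = sum(observed)
failures = len(observed) - successes
posterior_rates = [
    draw_rng.betavariate(1 + successes, 1 + failures) for _ in range(2000)
]

p_value = posterior_predictive_pvalue(observed, posterior_rates, longest_zero_run)
print(f"posterior predictive p-value: {p_value:.3f}")
```

A p-value close to 0 or 1 indicates that the model rarely reproduces the observed statistic. Here that would suggest the independence assumption misses the clustering in the data, which is the kind of systematic discrepancy that can justify a richer structure.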
Balancing Complexity and Utility
When to Increase Complexity
Consider a more complex model when:
Simple models show poor predictive performance in cross-validation.
Domain knowledge suggests important relationships are missing.
Posterior predictive checks reveal systematic discrepancies.
The business context requires more detailed insights.
When to Reduce Complexity
Consider a simpler model when:
Parameter estimates show high uncertainty (wide posterior distributions).
MCMC sampling exhibits convergence problems.
The computational cost becomes prohibitive.
The model becomes difficult to interpret and explain to stakeholders.
Practical Recommendations
Based on experience with the ConversionFlow library, we recommend:
Start Simple: Begin with a sparse DAG structure focusing on the most important touchpoints.
Iterative Refinement: Gradually add complexity while monitoring cross-validation metrics.
Domain Knowledge Integration: Use business understanding to guide structural choices rather than relying solely on data-driven approaches.
Sensitivity Analysis: Assess how sensitive optimization recommendations are to changes in model complexity.
Case Study: Evolution of ConversionFlow Models
The ConversionFlow library has evolved through several model iterations:
Initial Model (v1): A simple linear funnel with few touchpoints and standard normal priors.
Intermediate Model (v2): An expanded DAG with more touchpoints and Half-Gaussian priors.
Current Model (v3): A comprehensive DAG with specialized touchpoint types, Half-Cauchy priors, and more sophisticated handling of budget effects.
Each iteration has brought improvements in predictive performance and business utility, but also increased computational demands.
Future Directions in Model Complexity
Future developments may explore:
Hierarchical Models: Allowing for customer segmentation with segment-specific parameters.
Time-Varying Parameters: Capturing how touchpoint effectiveness changes over time.
Structural Learning: Using causal discovery algorithms to learn optimal DAG structures from data.
Neural Network Integration: Combining Bayesian Networks with neural networks for more flexible functional forms.
These advances will need to be carefully balanced against the core values of interpretability and computational tractability that make ConversionFlow valuable for practical marketing decisions.