# Bayesian Networks in ConversionFlow

## Conceptual Foundation

Bayesian Networks form the backbone of the ConversionFlow library's analytical capabilities. A Bayesian Network is a probabilistic graphical model that represents random variables and their conditional dependencies via a directed acyclic graph (DAG). In the context of conversion flow analysis, these networks are particularly well-suited for modeling customer journeys and conversion processes.

## Key Principles

### Directed Acyclic Graph (DAG) Structure

The structure of a Bayesian Network is a DAG where:

- Nodes represent random variables (in our case, touchpoints in the customer journey)
- Edges represent conditional dependencies between these variables
- The acyclic nature ensures that no cycles exist, preventing a variable from being its own ancestor

This structure captures the causal relationships in a customer journey, where interactions with earlier touchpoints influence the likelihood of interactions with later touchpoints.

### Conditional Probability Distributions

Each node in the network is associated with a conditional probability distribution (CPD) that quantifies the effect of parent nodes on the current node.
In ConversionFlow, these CPDs are parameterized using:

- Intercept terms ($\beta_{a0}$) representing baseline propensities
- Budget sensitivity coefficients ($\beta_{a1}$) quantifying direct budget impact
- Parent influence coefficients ($\beta_{aj}$) representing causal relationships

### Probabilistic Inference

Bayesian Networks allow for various types of probabilistic inference:

- Forward inference: Predicting downstream effects given observations of upstream variables
- Backward inference: Inferring likely causes given observations of effects
- Interventional inference: Predicting outcomes when we intervene on specific variables

## Application in ConversionFlow

### Customer Journey Modeling

ConversionFlow uses Bayesian Networks to model the entire customer journey from initial awareness to final conversion. Each node represents a specific touchpoint (e.g., website visit, car configuration, test drive request), and edges represent the influence that one touchpoint has on another.

The probability of conversion at each touchpoint $a$ is modeled as:

$$P_a(x) = \sigma\left(\beta_{a0} + \beta_{a1} \ln\left(1 + \frac{x_a}{S}\right) + \sum_{j \in \mathrm{pa}(a)} \beta_{aj} P_j(x)\right)$$

Where:

- $\sigma$ is the sigmoid function
- $\beta_{a0}$ is the baseline conversion propensity
- $\beta_{a1}$ is the budget sensitivity
- $\beta_{aj}$ are the parent influence coefficients
- $\mathrm{pa}(a)$ is the set of parent nodes of $a$, and $P_j(x)$ is the conversion probability at parent $j$
- $x_a$ is the budget allocation for touchpoint $a$
- $S$ is a scaling factor

### Diminishing Returns Modeling

The logarithmic term $\ln\left(1 + \frac{x_a}{S}\right)$ captures diminishing returns on budget allocation, a crucial aspect of marketing investment. Because the logarithm is concave, each additional unit of budget adds less to the term inside the sigmoid than the one before, reflecting the expectation that doubling the budget does not double the conversion rate.

### Uncertainty Quantification

By using Bayesian inference (specifically, MCMC sampling), ConversionFlow quantifies uncertainty in all parameter estimates.
This provides not just point estimates but entire posterior distributions, allowing for robust decision-making that accounts for uncertainty.

## Advantages Over Traditional Approaches

### Handling Partial Information

Unlike traditional funnel models, Bayesian Networks can handle partial information and missing data naturally, making inferences even when observations are incomplete.

### Causal Understanding

The DAG structure encodes causal relationships, providing insights into not just correlations but actual causal effects between touchpoints.

### Integration with Business Logic

The probabilistic nature of Bayesian Networks allows for seamless integration with business logic and domain knowledge through prior distributions and network structure.

### Uncertainty-Aware Decisions

The complete posterior distributions provided by Bayesian inference enable uncertainty-aware decision-making, acknowledging the inherent uncertainties in customer behavior.

## Limitations and Considerations

### Acyclicity Assumption

The DAG structure assumes acyclicity, which may not capture feedback loops in customer journeys (e.g., returning to configuration after a test drive).

### Computational Complexity

Inference in Bayesian Networks with many nodes can be computationally intensive, particularly when using MCMC sampling.

### Structural Learning Challenges

While ConversionFlow currently uses a predefined network structure, learning the structure from data (structural learning) remains challenging and is an area for future development.
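
With a predefined structure, the conversion probability formula from the Customer Journey Modeling section can be evaluated node by node, visiting parents before children. The following is a minimal sketch of that evaluation for fixed point-estimate coefficients, not ConversionFlow's actual API: the function name `conversion_probabilities`, the three touchpoint names, and every coefficient and budget value are hypothetical and chosen only for illustration.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sigmoid(z: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def conversion_probabilities(parents, beta0, beta1, beta_parent, budget, scale):
    """Evaluate P_a(x) for every node of a predefined DAG.

    parents:     {node: list of parent nodes}
    beta0:       {node: intercept beta_a0}
    beta1:       {node: budget sensitivity beta_a1}
    beta_parent: {(node, parent): parent influence beta_aj}
    budget:      {node: budget allocation x_a}
    scale:       scaling factor S
    """
    probs = {}
    # static_order() yields parents before children and raises
    # graphlib.CycleError if the graph is not acyclic.
    for node in TopologicalSorter(parents).static_order():
        z = beta0[node] + beta1[node] * math.log1p(budget[node] / scale)
        z += sum(beta_parent[(node, p)] * probs[p] for p in parents[node])
        probs[node] = sigmoid(z)
    return probs


# Hypothetical three-touchpoint journey with made-up coefficients.
parents = {
    "visit": [],
    "configure": ["visit"],
    "test_drive": ["visit", "configure"],
}
beta0 = {"visit": -1.0, "configure": -2.0, "test_drive": -3.0}
beta1 = {"visit": 0.8, "configure": 0.5, "test_drive": 0.3}
beta_parent = {
    ("configure", "visit"): 1.5,
    ("test_drive", "visit"): 0.5,
    ("test_drive", "configure"): 2.0,
}
budget = {"visit": 1000.0, "configure": 500.0, "test_drive": 200.0}

probs = conversion_probabilities(
    parents, beta0, beta1, beta_parent, budget, scale=1000.0
)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Re-running the function with a larger budget for `visit` raises the probability at `visit` and, through the positive parent coefficients assumed here, at the downstream touchpoints as well. In a full Bayesian treatment, the same evaluation would be repeated for each posterior sample of the coefficients rather than for a single set of values.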