Research

My substantive research interests are Customer Analytics, Customer Relationship Management, Choice Modeling, Privacy, and Charitable Giving.


My methodological interests are Bayesian Econometrics, Bayesian Machine Learning, Bayesian Nonparametrics, and Deep Generative Models.

Working Papers

Digital Twins: A Generative Approach for Counterfactual Customer Analytics

(Job Market Paper)

This research provides a novel methodology, Digital Marketing Twins, that automatically extracts latent features from individual-level brand survey responses to inform a statistically principled, deep generative model of customer-side brand affinity and firm-side performance factors. The proposed model enables marketers to find drivers of individual-level brand affinity, as opposed to traditionally observed metrics that must be analyzed in aggregate. The framework serves a counterfactual purpose at the customer level: the generative part of the model completes the distribution of survey responses over time and across firms by imputing customer responses in counterfactual regimes, thereby addressing the archetypal missing data problem. The framework is also prescriptive: it supports policy optimization through customer surveys, using Bayesian optimization to efficiently identify "paths of least resistance" among customer responses to service-quality questions, a search that would otherwise have a complexity of O(n^d).

This research applies the Digital Marketing Twins methodology to the competitive landscape of the U.S. wireless telecommunications retail market, leveraging a unique dataset of large-scale quarterly brand surveys from all three major carriers (AT&T, T-Mobile, and Verizon) from 2020 to 2022. It optimizes over the generative model learned from the multi-firm brand surveys to provide marketing policy recommendations that are tailored to individual-level counterfactual responses and to each carrier. Empirically, this approach reveals latent asymmetries in competition in terms of brand affinity. It also shows a nonlinear increase in brand affinity for certain types of drivers, such as satisfaction with network speed, and a nonlinear decrease in brand affinity for customers who report a greater likelihood of changing plans, providers, or devices relative to their current wireless services.
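
To make the O(n^d) figure in the abstract concrete, here is a minimal sketch. It assumes that n is the number of response levels per survey question and d is the number of service-quality questions; that reading and the function names are illustrative assumptions, not taken from the paper. The sketch only counts and enumerates the exhaustive grid of response profiles. It does not implement the paper's Bayesian optimization.

```python
from itertools import product


def exhaustive_grid_size(n_levels: int, n_questions: int) -> int:
    """Number of distinct response profiles an exhaustive search must score.

    Assumption: every question has the same number of response levels.
    """
    return n_levels ** n_questions


def enumerate_profiles(n_levels: int, n_questions: int):
    """Lazily yield every response profile as a tuple of levels 1..n_levels."""
    return product(range(1, n_levels + 1), repeat=n_questions)


# A tiny case that can still be listed in full: 3 levels, 2 questions.
small_grid = list(enumerate_profiles(3, 2))

# A hypothetical survey with 5-point scales and 10 questions.
large_grid_size = exhaustive_grid_size(5, 10)
```

Under these assumptions, even a modest survey of ten 5-point questions yields millions of candidate profiles, which is why a sample-efficient search is attractive compared with scoring every profile.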

Paper status: Preparing for submission

Privacy-Preserving Data Fusion 

with Longxiu Tian and Dana Turjeman

Data fusion combines multiple datasets to make inferences that are more accurate, generalizable, and useful than those made with any single dataset alone. However, data fusion poses a privacy hazard due to the risk of revealing user identities. We propose a privacy-preserving data fusion (PPDF) methodology intended to preserve user-level anonymity while allowing for a robust and expressive data fusion process. PPDF is based on variational autoencoders and normalizing flows, which together enable a highly expressive, nonparametric, Bayesian, generative modeling framework, estimated in adherence to differential privacy, the state-of-the-art theory for privacy preservation.

PPDF does not require the same users to appear across datasets when learning the joint data generating process and explicitly accounts for missingness in each dataset to correct for sample selection. Moreover, PPDF is model-agnostic: it allows for downstream inferences to be made on the fused data without the analyst needing to specify a discriminative model or likelihood a priori. 

We undertake a series of simulations to showcase the quality of our proposed methodology. Then, we fuse a large-scale customer satisfaction survey to the customer relationship management (CRM) database from a leading U.S. telecom carrier. The resulting fusion yields the joint distribution between survey satisfaction outcomes and CRM engagement metrics at the customer level, including the likelihood of leaving the company’s services. Highlighting the importance of correcting selection bias, we illustrate the divergence between the observed survey responses and the imputed distribution over the customer base. Managerially, we find a negative, nonlinear relationship between satisfaction and future account termination across the telecom carrier's customers, which can aid in segmentation, targeting, and proactive churn management. Overall, PPDF will substantially reduce the risk of compromising privacy and anonymity when fusing different datasets.
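
For readers unfamiliar with differential privacy, the idea can be sketched with the textbook Laplace mechanism applied to a bounded mean. This is a generic illustration only: it is not the PPDF estimator, and the function names, the epsilon value, and the simulated 1-to-10 satisfaction scores are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean of bounded values.

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n. That bound is the sensitivity
    used to calibrate the Laplace noise.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# Simulated (not real) satisfaction scores on a 1-to-10 scale.
scores = [rng.randint(1, 10) for _ in range(1000)]
true_mean = sum(scores) / len(scores)
noisy_mean = private_mean(scores, lower=1, upper=10, epsilon=1.0, rng=rng)
```

With many records the added noise is small relative to the mean, while any single respondent's contribution is masked. Training a deep generative model under differential privacy relies on other mechanisms, which this sketch does not attempt to reproduce.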

Paper status: R&R at Marketing Science

Relaxing Functional Form in Choice Models Through Gaussian Processes

with Alan Montgomery

Consumers change their choices as expenditures within a category increase. Traditional choice models usually make restrictive structural assumptions to specify the expenditure elasticity. This imposed functional form of utility strongly influences the range of estimable substitution patterns across goods. Consumers with highly nonlinear preferences may have consumption thresholds at which buying patterns change dramatically when price or budget changes. Understanding these thresholds with a flexible utility-based model could lead to improved pricing and promotion decisions. Using Gaussian process priors on utility functions, we relax the functional form of both the inside goods and the outside good, within the context of constrained utility maximization. We estimate a general direct utility choice model for simultaneous purchases within a product category. We build a hierarchical model by borrowing information from a parametric functional form that constitutes an informative prior at the individual level. Our model captures nonlinear rates of satiation and precise baseline preferences that traditional non-homothetic parametric models fail to capture because they assume a given functional form of utility. The proposed model automatically detects nonlinear patterns of consumption from the data and provides more precise statistical inference.
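
As a rough illustration of a Gaussian process prior centered on a parametric utility, the sketch below draws one utility curve over purchase quantities. Everything specific here is an assumption made for illustration: the logarithmic mean function, the squared-exponential kernel, the hyperparameter values, and the function names are not taken from the paper.

```python
import math
import random


def parametric_mean(x, psi=1.0, gamma=1.0):
    """Hypothetical parametric utility with diminishing returns (satiation)."""
    return psi * math.log(gamma * x + 1.0)


def rbf_kernel(x1, x2, variance=0.25, length_scale=2.0):
    """Squared-exponential covariance between utility values at x1 and x2."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_utility(quantities, rng, jitter=1e-6):
    """Draw one utility curve: parametric mean plus a GP deviation."""
    n = len(quantities)
    cov = [[rbf_kernel(a, b) + (jitter if i == j else 0.0)
            for j, b in enumerate(quantities)]
           for i, a in enumerate(quantities)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated deviation = lower @ z, added to the parametric mean.
    return [parametric_mean(q) + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, q in enumerate(quantities)]


quantities = [float(q) for q in range(11)]
utility_draw = sample_utility(quantities, random.Random(1))
```

Each draw stays close to the parametric curve where the kernel variance is small but is free to bend away from it, which is the sense in which a parametric form can act as an informative prior while the data determine the final shape.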

Paper status: Working draft available upon request.