In this open-source publication, my Lirio colleagues Andrew Starnes and Anton Dereventsov describe a simulator they built to generate an arbitrary number of synthetic datapoints: in other words, imperfect data that might mimic human decision making better than perfect data does.
Abstract: We establish a non-deterministic model that predicts a user's food preferences from their demographic information. Our simulator is based on the NHANES dataset and domain expert knowledge in the form of established behavioral studies. Our model can be used to generate an arbitrary amount of synthetic datapoints that are similar in distribution to the original dataset and align with behavioral science expectations. Such a simulator can be used in a variety of machine learning tasks and especially in applications requiring human behavior prediction. https://arxiv.org/abs/2301.09454
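
To make the idea of a non-deterministic simulator concrete, here is a minimal Python sketch. It is not the authors' model: the age groups, food categories, and probabilities below are all made up for illustration and are not taken from NHANES or the paper. It only shows the general pattern of sampling a preference from a probability distribution conditioned on a demographic attribute, so that two users with the same demographics can still end up with different preferences.

```python
import random
from collections import Counter

# Hypothetical conditional probabilities P(food | age group).
# These values are invented for illustration only; they do NOT come
# from NHANES or from the paper.
PREFERENCE_PROBS = {
    "18-39": {"fruit": 0.2, "vegetables": 0.2, "fast food": 0.6},
    "40-64": {"fruit": 0.3, "vegetables": 0.3, "fast food": 0.4},
    "65+": {"fruit": 0.4, "vegetables": 0.4, "fast food": 0.2},
}


def sample_preference(age_group, rng):
    """Randomly draw one food preference for a user in the given age group."""
    probs = PREFERENCE_PROBS[age_group]
    foods = list(probs)
    weights = list(probs.values())
    # random.Random.choices samples with replacement according to the weights.
    return rng.choices(foods, weights=weights, k=1)[0]


def generate_synthetic(n, rng):
    """Generate n synthetic (age_group, food) datapoints."""
    groups = list(PREFERENCE_PROBS)
    data = []
    for _ in range(n):
        group = rng.choice(groups)  # pick a demographic group uniformly
        data.append((group, sample_preference(group, rng)))
    return data


rng = random.Random(0)  # fixed seed so the run is reproducible
data = generate_synthetic(10_000, rng)

# Tally the sampled preferences for one demographic group.
counts = Counter(food for group, food in data if group == "65+")
print(counts)
```

Because each preference is sampled rather than looked up, the output is noisy at the individual level but should approach the assumed distribution in aggregate, which is the property the abstract describes as being "similar in distribution to the original dataset".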