Optimization Model for Multi-Stakeholder Food Allocation System Using Reinforcement Learning



Summary and goal

  • Developing an RL-based user modeling framework that helps food agencies optimize food allocation for end-users by capturing and shaping user needs and preferences, and by nudging shifting tastes through persuasive strategies.
  • Approaching the problem as a value-aware multistakeholder recommendation system that balances benefits to individual users with community prosociality.
  • Building a Q-learning-based simulation model that learns dynamic preference changes and food availability over time and reproduces them in a way that stays close to reality.
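
The Q-learning component mentioned in the last point can be illustrated with a minimal tabular sketch. This is not the project's implementation: the state and action labels, the helper names (`make_q_table`, `choose_action`, `q_update`), and the hyperparameter values are all assumptions chosen for illustration. Only the one-step update rule itself is standard Q-learning.

```python
import random
from collections import defaultdict

# Assumed hyperparameters; the source does not state the values it uses.
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration probability


def make_q_table():
    """Q-table mapping (state, action) -> estimated return, defaulting to 0.0."""
    return defaultdict(float)


def choose_action(q, state, actions, rng=random):
    """Epsilon-greedy choice among the food items that can be recommended."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions):
    """Apply the standard one-step Q-learning update and return the new value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: a user in state "s0" accepts the recommended item "rice".
q = make_q_table()
q_update(q, "s0", "rice", reward=1.0, next_state="s1", next_actions=["rice", "beans"])
```

In a setting like the one described here, the state would presumably encode the current preference estimates and stock levels, and the reward would come from the combined user and community objective.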

Abstract

The global issue of food insecurity affects millions worldwide, posing challenges in ensuring access to nutritious food. Despite efforts to address this, food waste and loss still impact food security. The food bank system plays a vital role in distributing available resources to those in need, directly impacting food security. However, diverse preferences among recipients pose a challenge under limited resources. Recent technological advancements offer opportunities to tackle food insecurity, often through personalized solutions. While the primary goal is to alleviate food insecurity, recipient satisfaction is crucial. Access to nutritious food is vital for physical and mental well-being, necessitating a balanced and personalized food distribution system. This paper aims to bridge the gap between personalization and prosociality in the AI field by proposing an optimization model that considers both stakeholders—providers and individuals. Using reinforcement learning, our model adapts to changing preferences and food availability over time, ensuring a fairer and more sustainable distribution. Results demonstrate superior performance compared to models focusing solely on users or the community, presenting promising implications for addressing food insecurity and resource allocation issues. Major contributions include a novel multistakeholder framework maximizing user satisfaction and societal benefits, acknowledging dynamic stakeholder contexts, and experimental verification of optimal allocation while fostering prosocial behavior.

Stakeholders and objectives

Users

Users are the individuals to be served through our recommendation system. For users, the aim is to acquire recommendations that align with their preferences and needs. This user-centric perspective emphasizes the importance of enhancing user satisfaction and personalized experiences for food allocation.

User preferences: As users interact with the system, their preferences for food items are continuously captured and refined. These preferences are not static: they evolve over time, shaped by factors such as age, health status, dietary constraints, household status, and willingness to make prosocial choices. The system learns these dynamics from user feedback on recommended food items. This continuous learning process allows the system to provide more personalized and relevant recommendations that match each individual's tastes and situation. A user’s preferences are represented by a set of tuples. In its decision-making modules, an agent models the values and preferences of users in a community, the future state of the world that would follow each action it can perform, and the social experience its user would derive from each of those actions. By accounting for the dynamic nature of user preferences, our model can navigate the complexities of individual preferences and needs.
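
One simple way to picture feedback-driven preference updates is an exponential moving average over accept/decline signals. The sketch below is an assumption made for illustration, not the update rule used in this work: the function names, the neutral starting score of 0.5, the `rate` parameter, and the (item, score) tuple layout are all hypothetical.

```python
def update_preference(prefs, item, feedback, rate=0.2):
    """Return a new preference dict with `item` moved toward the feedback.

    prefs    -- dict mapping item name -> score in [0, 1]
    feedback -- 1.0 if the user accepted the item, 0.0 if they declined
    rate     -- assumed step size; higher values adapt faster
    """
    old = prefs.get(item, 0.5)  # unseen items start at a neutral score
    new = (1.0 - rate) * old + rate * feedback
    updated = dict(prefs)       # leave the caller's dict untouched
    updated[item] = new
    return updated


def as_tuples(prefs):
    """Express preferences as a set of (item, score) tuples."""
    return {(item, round(score, 3)) for item, score in prefs.items()}


# Hypothetical usage: the user accepts apples once, then declines them.
prefs = update_preference({}, "apples", 1.0)
prefs = update_preference(prefs, "apples", 0.0)
```

A richer model would condition the update on the contextual factors listed above, such as dietary constraints or household status. This sketch tracks only item-level feedback.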

Food provider

The provider’s objective centers on allocating available food resources efficiently, ultimately maximizing the overall benefit to society. This includes minimizing food waste, optimizing food distribution, and supporting the broader community’s needs while encouraging users to select food items that align with their preferences.

Provider’s benefit: From the provider's perspective, not all food items are equal. The provider assigns each food item its own value, reflecting the item's importance. These values are influenced by the urgency of consumption, which depends on the available quantity of the item, its perishability, its expiration date, demand, and other resource constraints. Perishable items often carry higher priority because their limited shelf life calls for immediate consumption. Similarly, food items in limited supply may have lower community-benefit values than abundant ones, because they cannot meet minimum user demand and need to be reserved for users with priority needs. Items in large supply may have higher values because they must be consumed faster. These values are dynamic and are updated in real time as allocations are made. Because undistributed food becomes waste, the model prioritizes allocating remaining food before it expires.
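
To make the interaction of these factors concrete, here is a rough scoring sketch. The formula, the equal 0.5/0.5 weights, the scarcity penalty, and every parameter name are assumptions made for illustration. The source does not specify how provider values are computed.

```python
def provider_value(quantity, days_to_expiry, min_demand, max_quantity):
    """Hypothetical provider value in [0, 1] for one food item.

    quantity       -- units currently in stock
    days_to_expiry -- days of shelf life remaining
    min_demand     -- minimum units needed to serve general demand
    max_quantity   -- stock level treated as "fully abundant"
    """
    if quantity <= 0 or days_to_expiry <= 0:
        return 0.0  # nothing usable left to allocate

    # Shorter shelf life -> higher urgency (capped at 1.0).
    perishability = min(1.0, 1.0 / days_to_expiry)
    # Larger stock -> more pressure to move it (capped at 1.0).
    abundance = min(quantity / max_quantity, 1.0)

    value = 0.5 * perishability + 0.5 * abundance
    if quantity < min_demand:
        # Scarce stock is held back for users with priority needs.
        value *= 0.5
    return value


# Hypothetical usage: recompute values after each allocation round.
milk = provider_value(quantity=100, days_to_expiry=1, min_demand=10, max_quantity=100)
canned = provider_value(quantity=5, days_to_expiry=10, min_demand=10, max_quantity=100)
```

Under these assumed weights, the abundant, nearly expired item scores higher than the scarce, long-lived one, which matches the qualitative behavior described above.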

Overall objective

As described in the previous sections, the primary objectives are to maximize user satisfaction and, at the same time, the provider’s benefit. Our model mediates between the two stakeholders to reach an optimal trade-off. To balance these objectives, it uses a weighted sum of user satisfaction and community benefit, controlled by a weighting factor.
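
The weighted sum can be sketched as follows. The convex form (weights `weight` and `1 - weight`), the function names, and the example scores are assumptions. The source states only that a weighting factor combines the two terms.

```python
def combined_score(user_satisfaction, community_benefit, weight):
    """Weighted sum of the two objectives; `weight` in [0, 1] favors the user."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * user_satisfaction + (1.0 - weight) * community_benefit


def best_item(items, weight):
    """Pick the item with the highest combined score.

    items -- dict mapping item name -> (user_satisfaction, community_benefit)
    """
    return max(items, key=lambda name: combined_score(*items[name], weight))


# Hypothetical scores: bread is preferred by the user, milk is urgent for the provider.
items = {"bread": (0.9, 0.2), "milk": (0.4, 0.9)}
user_only = best_item(items, weight=1.0)       # considers only user satisfaction
community_only = best_item(items, weight=0.0)  # considers only community benefit
balanced = best_item(items, weight=0.5)
```

Setting the weight to 1 or 0 recovers the purely user-focused and purely community-focused variants, and intermediate values trade one objective against the other.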


Publications