
COOL-MC: Verifying and Explaining RL Policies for Platelet Inventory Management


Dennis Gross

Abstract (arXiv:2603.02396v1): Platelets expire within five days. Blood banks face uncertain daily demand and must balance ordering decisions between costly wastage from overstocking and life-threatening shortages from understocking. Reinforcement learning (RL) can learn effective ordering policies for this Markov decision process (MDP), but the resulting neural policies remain black boxes, hindering trust and adoption in safety-critical domains. We apply COOL-MC, a tool that combines RL with probabilistic model checking and explainable RL, to verify and explain a trained policy for the MDP on platelet inventory management inspired by Haijema et al. By constructing a policy-induced discrete-time Markov chain (which includes only the states reachable under the trained policy, to reduce memory usage), we verify PCTL properties and provide feature-level explanations. Results show that the trained policy achieves a 2.9% stockout probability and a 1.1% inventory-full (potential wastage) probability within a 200-step horizon, and that it primarily attends to the age distribution of inventory rather than other features such as day of week or pending orders. Action reachability analysis reveals that the policy employs a diverse replenishment strategy, with most order quantities reached quickly, while several are never selected. Counterfactual analysis shows that replacing medium-large orders with smaller ones leaves both safety probabilities nearly unchanged, indicating that these orders are placed in well-buffered inventory states. This first formal verification and explanation of an RL platelet inventory management policy demonstrates COOL-MC's value for transparent, auditable decision-making in safety-critical healthcare supply chain domains.
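The counterfactual analysis described above can be sketched as a policy "patch" that swaps medium-large orders for smaller ones, after which the patched policy is re-verified. The threshold, replacement value, and `base_policy` below are illustrative assumptions, not the paper's actual policy or parameters.

```python
def counterfactual(policy_fn, threshold=4, replacement=2):
    """Wrap a policy so any order >= threshold is replaced with a smaller one.
    The patched policy would then be re-verified against the same PCTL properties."""
    def patched(state):
        a = policy_fn(state)
        return replacement if a >= threshold else a
    return patched

def base_policy(state):
    """Hypothetical base-stock rule: order up to a total inventory of 6."""
    return max(0, 6 - sum(state))

patched = counterfactual(base_policy)
print(base_policy((0, 1, 0)), "->", patched((0, 1, 0)))  # prints: 5 -> 2
```

If re-verification of the patched policy yields nearly the same safety probabilities, as the paper reports, the large orders were placed in inventory states with enough buffer that the exact quantity barely matters.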

Executive Summary

This article presents COOL-MC, a tool that combines reinforcement learning (RL) with probabilistic model checking and explainable RL to verify and explain a trained policy for platelet inventory management. The authors apply COOL-MC to a Markov decision process (MDP) and demonstrate its value for transparent, auditable decision-making in safety-critical healthcare supply chain domains. The trained policy achieves a 2.9% stockout probability and a 1.1% inventory-full probability within a 200-step horizon. The analysis reveals that the policy employs a diverse replenishment strategy and primarily attends to the age distribution of inventory. The study underscores the importance of formal verification and explanation in RL-based decision-making, with direct implications for building transparent, auditable systems in healthcare supply chain management.

Key Points

  • COOL-MC combines RL with probabilistic model checking and explainable RL to verify and explain a trained policy
  • The tool is applied to a Markov decision process (MDP) for platelet inventory management
  • The trained policy achieves a 2.9% stockout probability and a 1.1% inventory-full probability within a 200-step horizon
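The policy-induced Markov chain mentioned in the key points can be sketched as a breadth-first exploration that only visits states reachable under the fixed policy, which is what keeps memory usage down. The tiny inventory dynamics, state encoding, and `policy` function below are illustrative assumptions, not the paper's model.

```python
from collections import deque

# Toy platelet MDP: state = tuple of unit counts by remaining shelf life
# (3 age buckets here for brevity; the paper's state is richer).
MAX_PER_AGE = 2              # assumed small cap so the chain stays tiny
DEMANDS = {0: 0.5, 1: 0.5}   # assumed two-point daily demand distribution

def policy(state):
    """Hypothetical deterministic ordering rule: order up to a base stock of 3."""
    return max(0, 3 - sum(state))

def step(state, order, demand):
    """Issue oldest units first, age the rest, receive the new order."""
    inv = list(state)
    d = demand
    for i in range(len(inv)):            # index 0 = expires next
        used = min(inv[i], d)
        inv[i] -= used
        d -= used
    aged = inv[1:] + [min(order, MAX_PER_AGE)]   # front bucket expires off
    return tuple(min(x, MAX_PER_AGE) for x in aged)

def build_policy_dtmc(initial):
    """BFS over states reachable under the fixed policy only."""
    transitions = {}                     # state -> {next_state: probability}
    frontier, seen = deque([initial]), {initial}
    while frontier:
        s = frontier.popleft()
        a = policy(s)
        dist = {}
        for demand, p in DEMANDS.items():
            t = step(s, a, demand)
            dist[t] = dist.get(t, 0.0) + p
            if t not in seen:
                seen.add(t)
                frontier.append(t)
        transitions[s] = dist
    return transitions

dtmc = build_policy_dtmc((1, 1, 0))
print(len(dtmc), "reachable states")
```

Because the policy fixes the action in every state, the MDP collapses to a discrete-time Markov chain over demand randomness alone, and only the states the policy actually visits need to be stored.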

Merits

Strength in Formal Verification

COOL-MC provides a formal framework for verifying RL policies, ensuring that they meet the desired safety and performance criteria.
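To make the verification step concrete: a bounded-reachability PCTL query such as P=? [ F<=200 "stockout" ] can be evaluated on a DTMC by backward induction over the horizon. The three-state chain below is a hand-built toy, not the policy-induced chain from the paper; a production tool like COOL-MC delegates this to a probabilistic model checker.

```python
# Toy DTMC: state -> {successor: probability}. "stockout" is absorbing.
P = {
    "ok":       {"ok": 0.95, "low": 0.05},
    "low":      {"ok": 0.60, "low": 0.30, "stockout": 0.10},
    "stockout": {"stockout": 1.0},
}

def bounded_reach(prob, target, k):
    """Per-state probability of reaching `target` within k steps (P=? [ F<=k target ])."""
    v = {s: 1.0 if s == target else 0.0 for s in prob}
    for _ in range(k):
        v = {s: 1.0 if s == target else
                sum(p * v[t] for t, p in prob[s].items())
             for s in prob}
    return v

result = bounded_reach(P, "stockout", 200)
print(result["ok"])
```

Each iteration pushes the reachability probabilities one step further back from the target, so after k iterations `v[s]` is exactly the k-step bounded-reachability value from state `s`.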

Explainable Decision-Making

The tool provides feature-level explanations of the trained policy, allowing for transparency and accountability in decision-making processes.
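One common way to obtain feature-level explanations (the paper's exact method may differ) is permutation importance: shuffle one input feature across sampled states and measure how often the policy's chosen action changes. The toy `policy_action`, feature layout, and sampling below are assumptions for illustration only.

```python
import random

# Hypothetical feature vector: [age1, age2, age3, day_of_week, pending_orders].
def policy_action(features):
    """Toy policy that, like the paper's finding, depends only on the age buckets."""
    return max(0, 5 - (features[0] + features[1] + features[2]))

def permutation_importance(policy_fn, states, n_features, seed=0):
    """Fraction of states whose action changes when one feature is shuffled."""
    rng = random.Random(seed)
    base = [policy_fn(s) for s in states]
    scores = []
    for j in range(n_features):
        col = [s[j] for s in states]
        rng.shuffle(col)
        perturbed = [s[:j] + [v] + s[j + 1:] for s, v in zip(states, col)]
        changed = sum(policy_fn(p) != b for p, b in zip(perturbed, base))
        scores.append(changed / len(states))
    return scores

data_rng = random.Random(1)
states = [[data_rng.randint(0, 3) for _ in range(5)] for _ in range(200)]
scores = permutation_importance(policy_action, states, 5)
print(scores)  # age features score > 0; day_of_week and pending score 0
```

A score of zero for `day_of_week` and `pending_orders` mirrors the paper's observation that the trained policy primarily attends to the age distribution of inventory.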

Demerits

Limited Generalizability

The study focuses on a specific domain (platelet inventory management) and may not be directly applicable to other healthcare supply chain domains.

Expert Commentary

The study makes a notable contribution to explainable AI in healthcare, particularly for RL-based decision-making. Applying COOL-MC to an MDP for platelet inventory management demonstrates its value in formally verifying and explaining a trained policy, and it makes a strong case for prioritizing explainability and formal verification in safety-critical domains. The results offer valuable insights for policymakers, healthcare providers, and researchers building transparent, accountable decision-making systems.

Recommendations

  • Future studies should explore the application of COOL-MC to other healthcare supply chain domains to evaluate its generalizability and adaptability.
  • The development of COOL-MC should be further refined to include additional features and improvements, such as handling dynamic environments and multiple stakeholders.
