Heparin therapy is a critical aspect of managing sepsis after abdominal surgery: it can improve microcirculation, protect organ function, and reduce mortality. However, there is no clinical evidence to support decision-making on heparin dosage. This paper proposes a model called SOFA-MDP, which uses Sequential Organ Failure Assessment (SOFA) scores as the states of a Markov decision process (MDP), to investigate clinical policies. Different algorithms yield different value functions, which makes it difficult to determine which value function is more reliable, and ethical restrictions prevent us from testing all policies on patients. To address this issue, we propose two value function assessment methods: action similarity rate and relative gain. We evaluated heparin treatment policies for sepsis patients after abdominal surgery using MIMIC-IV. In the experiments, TD(0) showed the most reliable performance. When the AI policy derived from TD(0) was assessed with the action similarity rate and relative gain, its agreement rates with the “good” physician’s actual treatment were 64.6% and 73.2%, whereas its agreement rates with the “bad” physician’s actual treatment were 44.1% and 35.8%, giving gaps of 20.5 and 37.4 percentage points, respectively. External validation on eICU using the action similarity rate and relative gain yielded agreement rates of 61.5% and 69.1% with the “good” physician’s treatment and 45.2% and 38.3% with the “bad” physician’s treatment, giving gaps of 16.3 and 30.8 percentage points, respectively. In conclusion, the model provides instructive support for clinical decisions, and the evaluation methods accurately distinguish reliable from unreasonable outcomes. Copyright © 2023. Published by Elsevier B.V.
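To illustrate the value-estimation step, the following is a minimal sketch of tabular TD(0) policy evaluation with integer SOFA scores as states. It is not the implementation used in this study: the function name `td0_state_values`, the toy trajectories, the terminal rewards of +1 and -1, the step size, and the discount factor are all illustrative assumptions.

```python
from collections import defaultdict


def td0_state_values(episodes, alpha=0.1, gamma=0.99, sweeps=50):
    """Estimate state values with tabular TD(0).

    episodes: list of trajectories, each a list of
        (state, reward, next_state) tuples; next_state is None
        when the trajectory ends.
    alpha, gamma, sweeps: assumed step size, discount factor, and
        number of passes over the data (not taken from the paper).
    """
    values = defaultdict(float)  # unseen states start at 0.0
    for _ in range(sweeps):
        for episode in episodes:
            for state, reward, next_state in episode:
                # A terminal transition has no bootstrap term.
                bootstrap = 0.0 if next_state is None else values[next_state]
                td_error = reward + gamma * bootstrap - values[state]
                values[state] += alpha * td_error
    return dict(values)


# Hypothetical trajectories: states are SOFA scores, and the only
# non-zero reward is at the end of the stay (+1 or -1, assumed).
episodes = [
    [(8, 0.0, 6), (6, 0.0, 4), (4, 1.0, None)],      # improving patient
    [(8, 0.0, 10), (10, 0.0, 12), (12, -1.0, None)],  # deteriorating patient
]

values = td0_state_values(episodes)
for sofa in sorted(values):
    print(f"SOFA {sofa:2d}: V = {values[sofa]:+.3f}")
```

In this toy setting, lower SOFA scores that precede a favorable outcome receive higher estimated values than higher scores that precede an unfavorable one. A policy could then be read off by choosing, in each state, the action whose expected successor value is largest.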