Most infectious diseases are caused by viruses, fungi, bacteria and parasites. Their ability to easily infect humans and trigger large-scale epidemics makes them a public health concern. Methods for early detection of these diseases have been developed; however, they are hindered by the absence of a unified, interoperable and reusable model. This study seeks to create a holistic and real-time model for swift, preliminary detection of infectious diseases using symptoms and additional clinical data.
In this study, we present a medical knowledge graph (MKG) that leverages multiple data sources to analyse connections between different nodes. Medical ontologies were used to enhance the MKG. We applied various graph algorithms to extract key features. We assessed the performance of multiple machine-learning (ML) techniques for influenza and hepatitis detection and selected the multi-layer perceptron (MLP) and random forest (RF) models because of their superior outcomes. The hyperparameters of both graph-based ML models were automatically fine-tuned.
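As a rough illustration of how graph algorithms can turn a knowledge graph into ML features, the sketch below builds a tiny undirected graph and derives one feature vector per patient node. Everything in it is an assumption made for illustration: the node names, the toy edges, and the choice of degree centrality and Jaccard neighbourhood similarity are hypothetical and are not taken from the study, which does not specify its algorithms here.

```python
from collections import defaultdict

# Hypothetical toy edges; the study's actual MKG schema and data are not shown here.
EDGES = [
    ("patient:1", "symptom:fever"),
    ("patient:1", "symptom:cough"),
    ("patient:2", "symptom:jaundice"),
    ("patient:2", "symptom:fatigue"),
    ("disease:influenza", "symptom:fever"),
    ("disease:influenza", "symptom:cough"),
    ("disease:influenza", "symptom:fatigue"),
    ("disease:hepatitis", "symptom:jaundice"),
    ("disease:hepatitis", "symptom:fatigue"),
]
DISEASES = ["disease:influenza", "disease:hepatitis"]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_centrality(adj, node):
    """Degree of the node divided by the number of other nodes in the graph."""
    return len(adj[node]) / (len(adj) - 1)


def jaccard(adj, a, b):
    """Share of neighbours that nodes a and b have in common."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def patient_features(adj, patient, diseases):
    """Feature vector: [degree centrality, similarity to each disease node]."""
    return [degree_centrality(adj, patient)] + [
        jaccard(adj, patient, d) for d in diseases
    ]


adj = build_adjacency(EDGES)
features = {
    p: patient_features(adj, p, DISEASES) for p in ("patient:1", "patient:2")
}
for patient, vector in features.items():
    print(patient, vector)
```

Vectors of this kind could then be passed to a classifier such as scikit-learn's `MLPClassifier` or `RandomForestClassifier`; whether the study used that library is not stated.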
Both the graph-based MLP and RF models achieved the lowest loss and error rates, along with the highest specificity, accuracy, recall, precision and F1 scores. Their Matthews correlation coefficients were also the best among the models evaluated. When compared with existing ML techniques and findings from the literature, these graph-based ML models showed superior detection accuracy.
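For reference, the evaluation measures named above can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient: ranges from -1 to 1, and 0 means
    # no better than chance. Defined as 0 here when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only, not taken from the study.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the matrix, so it stays informative when the positive and negative classes are imbalanced.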
The graph-based MLP and RF models effectively diagnosed influenza and hepatitis, respectively. This underlines the potential of graph data science in enhancing ML model performance and uncovering concealed relationships in the MKG.