Abstract
Diagnostic processes typically rely on traditional and laborious methods that are prone to human error, resulting in frequent misdiagnosis of diseases. Computational approaches are increasingly being used for more precise diagnosis in clinical pathology, for the diagnosis of genetic and microbial diseases, and for the analysis of clinical chemistry data. These approaches improve the reliability of testing and thereby reduce diagnostic errors. Artificial intelligence (AI)-based computational approaches mostly rely on training sets obtained from patient data stored in clinical databases. However, the use of AI is associated with several ethical issues, including patient privacy and data ownership. Moreover, although AI-based mathematical models can interpret complex clinical data, they are frequently affected by data bias and may report erroneous results based on patient data. In order to improve the reliability of computational approaches in clinical diagnostics, strategies for reducing data bias and analyzing real-life patient data need to be further refined.
Several diagnostic techniques rely on traditional processes that are often laborious and result in misdiagnosis of diseases due to human error. However, the use of computational approaches in clinical diagnostics has revolutionized the field of diagnostic pathology and opened new treatment avenues for patients.1,2 To reduce potential human errors, image-processing methodologies have been developed for the histopathological analysis of human tissue sections; these methodologies have replaced the pathological scoring of disease phenotypes based on the trained eyes of pathologists and the use of a microscope.3 Moreover, these methodologies are continuously being updated to diagnose complex diseases that require a multi-modal approach.4 Complex computer-generated image-processing techniques are also used for image acquisition and processing in diagnostic imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET).5,6 These computational approaches enable proper visualization of organs without the need for invasive surgical techniques, thereby resulting in rapid and reliable disease diagnosis, and consequently increasing patient survival rates.7
Computational approaches for diagnosing genetic disorders are among the diagnostic practices that have undergone major innovation in the last decade. Artificial intelligence (AI) algorithms are being used to generate training sets based on input data for the accurate detection of possible unidentified pathogenic mutations.8 This has led to a faster diagnosis of diseases and has facilitated the study of various disease models for the development of novel therapeutic strategies. However, several ethical issues have been raised regarding the use of AI algorithms in the field of clinical diagnostics; patient privacy, data ownership, and transparent use of patient data are the most critical issues.9 Additionally, the process of widespread approvals from international regulatory agencies for the use of AI algorithms in clinical diagnostics is slow because of the prevalence of data bias, which may result in misdiagnosis. Hence, strategies are continuously being developed to overcome hurdles related to the use of AI algorithms in computational diagnostics approaches.10 Here we review the computational approaches used in clinical diagnostics, especially microbial diagnostics, tissue imaging and analysis, diagnosis of genetic disorders, and analysis of clinical chemistry data.
Computational approaches used in microbial diagnostics
Traditionally, microbial diagnostic techniques involve pathogen cultures in clinical laboratories and pathogen identification using serological tests or on the basis of phenotypic or biochemical traits.11 Although these traditional diagnostic methods have been successful to some extent in providing accurate information on specific pathogenic strains of bacteria and viruses, a limitation is that they underestimate the vast number of pathogens,12 primarily due to the difficulties encountered when culturing different microorganisms. Microorganisms may not be properly identified using traditional diagnostic methods, and it is estimated that more than 99% of pathogens have not been formally identified.13,14
With advancements in gene sequencing and genetic engineering techniques, several molecular biology-based tools have been developed to improve the ease and reliability of microbial identification. The sequencing of conserved genomic entities in the microbial genome has enabled thorough and efficient identification of pathogens. Many of these molecular identification techniques, such as polymerase chain reaction (PCR), real-time PCR, DNA microarray, metagenomics, and next-generation sequencing (NGS), are currently being used when assessing DNA or RNA for the diagnosis of a vast number of microorganisms.15-18
Deoxyribonucleic acid microarray technology is an emerging molecular diagnostic tool for the identification of microbial pathogens when traditional diagnostic methods fail to provide positive results. A DNA microarray chip contains pathogen-specific oligonucleotide sequences that are selected computationally through an extensive BLAST search of candidate sequences against all non-target genomes, so that sequences with similarity to non-target genomes can be identified. The advantage of this technique is its ability to screen for identified as well as unidentified pathogens, which also helps prevent diseases caused by unknown pathogens.19,20
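The probe-selection step described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for a BLAST screen: it uses exact k-mer matching rather than sequence alignment, and the sequences, probe length, and function names are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(target, non_targets, probe_len=8):
    """Keep target k-mers that occur in none of the non-target genomes.

    Exact matching is an assumption made for brevity; an alignment-based
    screen such as BLAST would also reject near-identical sequences.
    """
    background = set()
    for genome in non_targets:
        background |= kmers(genome, probe_len)
    return sorted(kmers(target, probe_len) - background)


# Toy sequences (hypothetical, not real genomes)
target = "ATGCGTACGTTAGC"
non_targets = ["ATGCGTACGGGGGG", "TTTTACGTTAGCAA"]
probes = candidate_probes(target, non_targets)
print(probes)
```

In this toy case, only the k-mers spanning the middle of the target survive, because both ends of the target are shared with one of the non-target sequences.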
Metagenomics has emerged as a powerful approach in the field of clinical diagnostics because it allows the identification of pathogens directly from a microbial community; in other words, it does not require the isolation of pure cultures for sequencing purposes. After the sample and metadata are collected, DNA extraction, library probe construction, sequencing, read processing, and assembly are performed. Databases and other computational tools are then used to analyze the obtained data. Machine learning-based computational models have also been developed to predict the associations between microorganisms and diseases. One such example is the novel bidirectional label propagation human microbe-disease association (NBLPIHMDA) model, a prediction model that uses a disease similarity network and a microbe similarity network to perform bidirectional label propagation in order to establish the associations between microorganisms and diseases.21,22
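The idea of propagating known associations over two similarity networks can be sketched as follows. This is a generic label-propagation sketch under stated assumptions, not the published NBLPIHMDA algorithm: the update rule F <- alpha * S * F + (1 - alpha) * Y, the averaging of the two directions, the parameter values, and the toy matrices are all illustrative choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_normalize(m):
    """Scale each row to sum to 1 (rows of zeros are left as zeros)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]


def propagate(sim, labels, alpha=0.5, iters=50):
    """Iterate F <- alpha * S_norm * F + (1 - alpha) * Y."""
    s = row_normalize(sim)
    f = [row[:] for row in labels]
    for _ in range(iters):
        sf = matmul(s, f)
        f = [[alpha * sf[i][j] + (1 - alpha) * labels[i][j]
              for j in range(len(labels[0]))]
             for i in range(len(labels))]
    return f


def bidirectional_scores(microbe_sim, disease_sim, assoc, alpha=0.5):
    """Average the scores propagated from the microbe and disease sides."""
    from_microbes = propagate(microbe_sim, assoc, alpha)
    from_diseases = transpose(propagate(disease_sim, transpose(assoc), alpha))
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(from_microbes, from_diseases)]


# Toy data (hypothetical): 3 microbes x 2 diseases, 1 = known association
assoc = [[1, 0], [0, 0], [0, 1]]
microbe_sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
disease_sim = [[1.0, 0.2], [0.2, 1.0]]
scores = bidirectional_scores(microbe_sim, disease_sim, assoc)
```

Here the second microbe has no known association, but it scores higher for the first disease than for the second because it is most similar to a microbe already linked to the first disease.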
Next-generation sequencing facilitates the proper identification of microbial pathogens through the acquisition of large amounts of sequencing data. The evolutionary genomic variability within the genomes of microorganisms is harnessed to enable a rapid and multimodal diagnosis of previously unknown pathogens in a clinical setting. This approach is currently being used in viral diagnostics and for the screening of antiviral drug resistance.23,24
Computational approaches used in analysis of clinical chemistry data
The field of metabolomics has been generating increasing amounts of data, and substantial computational power will be required to unravel the complex nature of these data. Several mathematical models have been developed to interpret complex clinical chemistry data related to diseases, such as diabetes, cardiovascular disease, and cancer. For example, in diabetes, the assessment of insulin resistance based on plasma glucose levels measured in time series after the ingestion of an oral dose of glucose has been well defined using predictive mathematical models.25,26 In the field of oncology, a model capable of predicting tumor size based on the levels of circulating tumor biomarkers in the plasma of patients has been developed.27,28 For effective data analysis, it is crucial to determine the type of clinical chemistry data in order to select the best computational approach. Computational modeling and analysis can readily be applied to publicly available data from clinical settings, for example, data on blood cytokines and liver enzymes.29 The diagnostic significance of clinical chemistry data must be determined using large-scale experiments to validate the findings of the selected computational approach. The reproducibility of the data must be validated across other studies with a small margin of error between validation experiments, as large margins of error will lead to modeling errors in terms of data fitting. Proper standardization of the data is also necessary, and the data must be biologically validated across other measurements. The clinical chemistry data used for computational analysis should be easy to interpret and should reflect the disease in order to add relevance to the computational analysis. Finally, the results of the computational analysis should be indicative of the types of biological processes associated with the onset of the disease.30,31
Different types of models can be used to analyze clinical chemistry data. Bayesian networks have been used for the computational analysis of clinical chemistry data through the identification of neural hubs containing essential information on the disease process.32 In addition, compartmental computer modeling has been used to analyze complex clinical chemistry data, for example, to predict tumor size based on the analysis of circulating tumor biomarker levels in the plasma of cancer patients.33,34 Another example is the use of a computational pipeline for the diagnosis of common variable immunodeficiency (CVID). This pipeline is based on an automated machine learning approach for diagnosing CVID using flow cytometry to sort the cells. Automated quality control, data pre-processing, and automated population identification are combined to create a machine learning classifier that distinguishes CVID from other primary antibody deficiencies.35
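The compartmental approach mentioned above can be illustrated with a deliberately simple one-compartment sketch. It assumes that the plasma biomarker level B is shed in proportion to tumor size V and cleared at a first-order rate, dB/dt = k_shed * V - k_elim * B. The model form, parameter values, and function names are hypothetical and are not taken from the cited studies.

```python
def simulate_biomarker(size, k_shed, k_elim, hours, dt=0.01):
    """Euler integration of dB/dt = k_shed * V - k_elim * B with B(0) = 0."""
    level = 0.0
    for _ in range(int(hours / dt)):
        level += dt * (k_shed * size - k_elim * level)
    return level


def size_from_steady_state(level, k_shed, k_elim):
    """Invert the steady state B* = k_shed * V / k_elim to estimate V."""
    return k_elim * level / k_shed


# Hypothetical parameters in arbitrary units, for illustration only
K_SHED, K_ELIM, TRUE_SIZE = 0.05, 0.1, 2.0
plasma_level = simulate_biomarker(TRUE_SIZE, K_SHED, K_ELIM, hours=200)
estimated_size = size_from_steady_state(plasma_level, K_SHED, K_ELIM)
print(round(plasma_level, 4), round(estimated_size, 4))
```

Once the simulated biomarker level has reached steady state, inverting the steady-state relation recovers the tumor size that was used as input. In practice, the rate constants would have to be estimated from patient data, and the fit would need to be validated as discussed above.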
The implementation of a computational model is based on the modeling strategy being used. The model should be validated in a clinically relevant setting and should be able to recapitulate the results observed in a clinical setting based on disease progression. An example of the relevance of computational approaches for the analysis of clinical chemistry data is the computational modeling of lipoprotein profiles, which has been used for the accurate prediction of cardiovascular disease risk.36 In addition, multicell biomarker profiles in inflammation and cancer have been predicted using cytokine profiles based on different responses of cancer cells to different cytokines in laboratory experiments. This computational model has the potential to predict the response of tumor cells to anti-inflammatory treatment and immunotherapy based on the inhibition of pro-inflammatory cytokine secretions in the tumor microenvironment.37
Computational approaches used in the diagnosis of genetic disorders
Genetic disorders have been diagnosed using traditional methods, such as fluorescence in situ hybridization (FISH).38 However, advancements in human genome sequencing have led to the generation of massive amounts of complex data that can only be analyzed using complex computational systems. Next-generation sequencing may be used to sequence the whole genome of a patient having a disease with an unknown genetic cause. Typically, this generates huge amounts of complex data that must be analyzed using a computational approach to enable accurate diagnosis. After sequencing a patient’s genome, research databases are generally used to obtain information about a particular pathogenic genetic variant and to identify if other closely related variants of the gene could be pathogenic.39-41 Databases containing information on human mutations include the Human Gene Mutation Database (HGMD) and the NCBI single nucleotide polymorphism database (dbSNP).42
Advancements in computational approaches for the diagnosis of genetic disorders have led to the identification of several pathogenic genetic mutations. Specifically, these genetic mutations can be directly linked to drastic changes in the levels of proteins, encoded by these genes, which are then reflected as the phenotype of the patient.43 Computational approaches are most important for identifying mutations with unknown pathogenicity.44 An example of this type of genetic variation is a missense mutation. Several algorithms, such as the sorting intolerant from tolerant (SIFT) algorithm, have been developed to evaluate whether a missense mutation is pathogenic.45 The pathogenicity of synonymous mutations can be evaluated by computational analysis of the mRNA structure, as well as by predicting splice variants in the genome.46 Computational tools used to identify splice sites that might be implicated in disease pathogenesis include the Human Splicing Finder and GeneSplicer.46-48
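As a first step before any pathogenicity prediction, a single-nucleotide change within a coding sequence is classified as synonymous, missense, or nonsense. The following Python sketch does only this classification using the standard genetic code; it does not implement SIFT or any splice-site predictor, and the function name is hypothetical.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change in one codon (position is 0, 1, or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"


# GAG (Glu) -> GTG (Val) changes the amino acid
print(classify_substitution("GAG", 1, "T"))
# GAA (Glu) -> GAG (Glu) leaves the amino acid unchanged
print(classify_substitution("GAA", 2, "G"))
```

A missense or synonymous call from such a classifier says nothing about pathogenicity by itself; that assessment is what the algorithms and splice-site tools described above are intended to provide.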
Many studies have been conducted to determine whether the frequency of alleles at specific sites of interest is associated with disease. A genome-wide association study (GWAS) is a computational approach used to identify numerous pathogenic loci.49 Expression quantitative trait loci (eQTLs) are located within the DNA and are correlated with the severity of the genetic disorder. The data used to identify an eQTL are derived from gene mapping, microarray data, and specific pathogenic genotypes. Expression quantitative trait loci can be used to measure gene expression levels and to correlate a particular genetic mutation with the phenotype of an individual.50 However, a major drawback associated with the use of computational analysis for the diagnosis of a genetic disease is the introduction of potential bias when data analysis is based on the functional expression of a pathogenic gene.
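The comparison of allele frequencies that underlies an association study can be sketched for a single locus as a chi-square test on a 2x2 table of allele counts in cases and controls. The counts below are hypothetical, and the sketch covers only one locus; a genome-wide study repeats such a test across many loci and must apply a much stricter significance threshold to account for multiple testing.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 100 cases and 100 controls (200 alleles each)
chi2, p_value = allelic_chi_square(case_alt=60, case_ref=140,
                                   ctrl_alt=30, ctrl_ref=170)
print(round(chi2, 3), p_value)
```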
Computational approaches used in tissue imaging and analysis
Previous approaches for analyzing histopathological tissue sections frequently involved the use of high-powered microscopes and skilled pathologists to diagnose known diseases based on tissue abnormalities identified through visual observation. However, inaccurate visual assessment of stained tissue sections might result in misdiagnoses. In order to overcome these issues, several image-processing technologies have been developed for diagnosis and for predicting the survival rates of cancer patients. For example, image features generated through computer-aided pathological analysis have been used to accurately diagnose breast cancer patients and predict their likelihood of survival.51-53
The use of AI and other computational methods for digital pathological analysis has highlighted their efficacy in revealing unbiased clinical outcomes of patients. For example, an image analysis system called MAGIC has been used to predict prostate cancer recurrence. In this system, tissue images are divided into segments and classified into various histological patterns based on epithelial and nuclear morphology, color, texture, and other cellular morphological features. The generic system (Table 1) used to classify images obtained from the histological staining of tissue sections begins with image refinement. This step involves the removal of tissue background regions that are correlated with transparent regions in the tissue corners. The histogram of the image must then be matched to that of a reference image. The next step involves analysis and classification; this includes descriptors, such as color and texture, followed by other cellular morphological characteristics to obtain a final diagnostic decision.54-57
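The histogram-matching part of the image refinement step can be sketched as follows. This is a minimal illustration on flat lists of gray levels, not the implementation used by MAGIC or any other cited system; the function names and pixel values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf(pixels, levels=256):
    """Cumulative distribution of gray levels in a flat list of pixels."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return [c / len(pixels) for c in accumulate(counts)]


def match_histogram(source, reference, levels=256):
    """Remap source gray levels so that their CDF follows the reference CDF."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    # For each source level, take the first reference level whose CDF
    # is at least as large as the CDF of the source level.
    lookup = [min(bisect_left(ref_cdf, src_cdf[g]), levels - 1)
              for g in range(levels)]
    return [lookup[p] for p in source]


# Toy "images" (hypothetical): a dark source and a brighter reference
source = [10, 10, 20, 30]
reference = [100, 150, 150, 200]
matched = match_histogram(source, reference)
print(matched)
```

After matching, the gray levels of the source fall within the range of the reference, which makes descriptors such as color and texture more comparable across slides stained or scanned under different conditions.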
If detailed anatomical analysis of a specific tissue is required, the computational analysis of images obtained from medical imaging procedures, such as CT, MRI, and PET, is routinely performed. A CT scan uses computer-processed combinations of various X-ray images obtained from different angles. This leads to the acquisition of cross-sectional images of a specific area of a scanned organ, which allows the internal visualization of an organ without performing surgery.58 Computed tomography has several advantages over other diagnostic imaging modalities. This includes its high speed, which in turn facilitates the rapid diagnosis of a pathological condition. However, as this procedure uses radiation for image acquisition, there might be an increased risk of developing cancer due to radiation exposure.58
As a safer alternative to CT, MRI uses strong magnetic fields, magnetic field gradients, and radio waves to generate a cross-sectional image of an organ. Because MRI does not use ionizing radiation, images can be acquired without exposing the patient to unnecessary radiation.59 Positron emission tomography, which is based on the injection of radioactive components, followed by the detection of the injected radiotracer and image reconstruction, enables the acquisition of highly accurate cross-sectional images of organs.60
Study limitations
There are several limitations to the use of computational methods in clinical diagnostics. One of the most prominent limitations stems from data bias based on the type of data used to build the training sets. The processing of images obtained from diagnostic tests, such as CT and MRI, has limitations based on the algorithms used for input data collection, pre-processing, processing, and system assessment. In addition, most algorithms are designed to select a single diagnosis per patient, posing a problem for patients with multiple comorbidities. Furthermore, AI-based algorithms can successfully interpret complex clinical data. However, given the power and complexity of these algorithms, data interpretation can often generate biased and superfluous results, which might be unethical and even discriminatory.
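One simple way to make such bias visible is to report a performance metric separately for each patient subgroup instead of a single pooled value. The sketch below computes sensitivity per subgroup; the record format, group labels, and data are hypothetical and serve only to illustrate the check.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Sensitivity (true-positive rate) per subgroup.

    records: iterable of (group, true_label, predicted_label) with 0/1 labels.
    Groups without any true-positive case are omitted from the result.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, truth, predicted in records:
        if truth == 1:
            if predicted == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Hypothetical predictions for diseased patients in two subgroups
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0),
    ("A", 0, 0), ("B", 0, 1),
]
per_group = sensitivity_by_group(records)
print(per_group)
```

A large gap between subgroups, as in this toy example, would suggest that the training set under-represents one group or that the model has learned features that do not transfer between groups.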
In conclusion, traditional clinical diagnostic methods have been beset with high rates of false-negative and false-positive results. The involvement of many people in performing laborious diagnostic laboratory procedures often results in errors in disease diagnosis. However, recent advancements in clinical diagnostic methods using computational models have enabled the rapid identification of previously unknown pathogenic genetic mutations, in addition to the rapid and accurate stratification of patients based on computer-generated image-processing technologies.
The use of computational analysis to generate training sets for disease prediction is based on inputting patient data from clinical databases that have been subjected to ethical review. This is mostly due to patient privacy issues, personal autonomy, public demand for transparency, and trust in how the data are managed and stored. However, ethical issues are being raised about how academic and commercial entities use data, who has ownership rights to the data, and whether patients can access their data. These concerns are currently being reviewed by several ethical review committees worldwide, and new laws are currently being drafted to address these issues.
The use of AI-based algorithms in clinical diagnostics has not yet been adequately validated, mostly due to machine bias. Therefore, future studies should focus on reducing machine bias by monitoring model performance using real data, selecting an appropriate learning model for the problem, and selecting a dataset that is reflective of the real disease situation.
Footnotes
Disclosure. Authors have no conflict of interests, and the work was not supported or funded by any drug company.
- Copyright: © Saudi Medical Journal
This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.