Basic Artificial Intelligence Techniques

Evaluation of Artificial Intelligence Performance
      With the rapidly expanding use cases for AI, there is a growing need for proper evaluation of developed algorithms.


      Journal: Radiologic Clinics