REVIEW
Computer-aided diagnostic models in
breast cancer screening
Mammography is the most common modality for breast cancer detection and diagnosis and is often
complemented by ultrasound and MRI. However, similarities between early signs of breast cancer and
normal structures in these images make detection and diagnosis of breast cancer a difficult task. To aid
physicians in detection and diagnosis, computer-aided detection and computer-aided diagnostic (CADx)
models have been proposed. A large number of studies have been published for both computer-aided
detection and CADx models in the last 20 years. The purpose of this article is to provide a comprehensive
survey of the CADx models that have been proposed to aid in mammography, ultrasound and MRI
interpretation. We summarize the noteworthy studies according to the screening modality they consider
and describe the type of computer model, input data size, feature selection method, input feature type,
reference standard and performance measures for each study. We also list the limitations of the existing
CADx models and provide several possible future research directions.
Turgay Ayer1, Mehmet US Ayvaci1, Ze Xiu Liu1, Oguzhan Alagoz1,2 & Elizabeth S Burnside†1,3
1Industrial & Systems Engineering
Department, University of Wisconsin,
Madison, WI, USA
2Department of Population Health
Sciences, University of Wisconsin,
Madison, WI, USA
3Department of Biostatistics & Medical
Informatics, University of Wisconsin,
Madison, WI, USA
†Author for correspondence:
Department of Radiology, University
of Wisconsin Medical School, E3/311,
600 Highland Avenue, Madison,
WI 53792-3252, USA
Tel.: +1 608 265 2021
Fax: +1 608 265 1836
eburnside@uwhealth.org
KEYWORDS: breast cancer • computer-aided detection • computer-aided diagnosis • mammography • MRI • ultrasound
Radiological imaging, which often includes
mammography, ultrasound (US) and MRI,
is the most effective means, to date, for early
detection of breast cancer [1]. However, differen-
tiating between benign and malignant findings
is difficult.
Successful breast cancer diagnosis requires sys-
tematic image analysis, characterization and inte-
gration of numerous clinical and mammographic
variables [2], which is a difficult and error-prone
task for physicians. This leads to low positive
predictive value of imaging interpretation [3].
The integration of computer models into the
radiological imaging interpretation process can
increase the accuracy of image interpretation.
There are two broad categories of computer
models in breast cancer diagnosis: computer-
aided detection (CADe) and computer-aided
diagnostic (CADx) models. CADe models are
computerized tools that assist radiologists in
locating and identifying possible abnormalities
in radiologic images, leaving the interpretation of
the abnormality to the radiologist [4]. The poten-
tial for CADe models to improve detection of
cancer has been investigated in several retrospec-
tive studies [5–8] as well as carefully controlled
prospective studies [9–12]. For a review of CADe
studies, the reader is referred to recent review
articles by Hadjiiski et al. [13] and Nishikawa [14].
CADx models are decision aids for radiologists
characterizing findings from radiologic images
(e.g., size, contrast and shape) identified either
by a radiologist or a CADe model [15]. CADx
models have been demonstrated to increase the
accuracy of mammography interpretation in sev-
eral studies. Encouraged by promising results in
mammography interpretation, numerous CADx
models are being developed to help in breast US
and MRI interpretation.
There are two reviews of CADx models, but
neither is comprehensive in nature. The first,
by Elter and Horsch, focuses on CADx models
in mammography interpretation, but not in US
and MRI, and concentrates on technical aspects
of model development rather than more clini-
cally relevant considerations [16]. The second,
by Dorrius and van Ooijen, focuses on MRI
CADx models [17]. Here we provide a compre-
hensive review for mammography, US and MRI
CADx models in breast cancer diagnosis. We
start by summarizing CADx models proposed
for mammography interpretation. We then
describe CADx models in US and MRI. We
conclude by discussing several common limita-
tions of existing research on CADx models and
provide possible future research directions.
Mammography CADx models
Early work involving CADx models in mammo-
graphy interpretation dates back to 1993. A sum-
mary list for primary mammography CADx
models is presented in Table 1.
Table 1. Summary of computer-aided diagnostic models in mammography interpretation.

Study (year)                  Size of dataset (n)   Model    Reader study   AUC         Ref.
Jiang et al. (1996)           107                   ANN      Yes            0.92        [26]
Markopoulos et al. (2001)     240                   ANN      Yes            0.937       [31]
Huo et al. (2002)             110                   ANN      Yes            0.96        [35]
Floyd et al. (2000)           500                   CBR      No             0.83        [37]
Elter et al. (2007)           2100                  DT/CBR   No             0.87/0.89   [38]
Chan et al. (1999)            253                   LDC      Yes            0.91        [34]
Gupta et al. (2006)           115                   LDA      No             0.92        [41]
Wang et al. (1999)            419                   BN       No             0.886       [42]
Chhatwal et al. (2009)        62,219                LR       Yes            0.963       [43]
Burnside et al. (2009)        62,219                BN       Yes            0.960       [44]
Ayer et al. (2010)            62,219                ANN      Yes            0.965       [45]
Bilska-Wolak et al. (2005)    151                   LRbC     No             0.88        [46]

ANN: Artificial neural network; AUC: Area under the curve; BN: Bayesian network; CBR: Case-based reasoning; DT: Decision tree; LDA: Linear discriminant analysis; LDC: Linear discriminant classifier; LR: Logistic regression; LRbC: Likelihood ratio-based classifier.
Early work in CADx research used artificial
neural networks (ANNs) and Bayesian networks
(BNs). The first CADx model was proposed by
Wu et al., who developed an ANN to classify
lesions detected by radiologists as malignant or
benign [18]. They demonstrated that their sim-
ple ANN, which was built using 14 radiologist-
extracted mammography features and trained on
a small set of data, achieved a higher area under the receiver operating characteristic (ROC) curve (AUC) than a group of attending
radiologists without computer aid (0.89 vs 0.84).
Baker et al. later built more complex ANN mod-
els, where the inputs included Breast Imaging
Reporting and Data System (BI-RADS) descrip-
tors as well as variables related to the patient’s
medical history [19]. Their approach was later
extended and evaluated by others [20–23]. Fogel
et al. also built one of the early ANN models
that prospectively examined suspicious masses as
a second opinion to radiologists [24]. Kahn et al.
developed one of the first BN models to classify
mammographic lesions as benign and malignant
[25]. They used radiologist-extracted mammo-
graphy features as the input to their model and
demonstrated that BNs had the potential to help radiologists make diagnostic decisions.
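For illustration, the following minimal sketch (in Python, using scikit-learn) shows the general recipe these early classifiers share: a small feed-forward ANN trained on radiologist-extracted descriptor codes and evaluated by the AUC. The data, the 14-feature input and the network size are stand-in assumptions, not a reconstruction of any published model.

```python
# Minimal sketch (not the authors' code): a small feed-forward network
# trained on radiologist-extracted features, evaluated by ROC AUC.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 14 ordinal features per finding (e.g., coded
# BI-RADS-style descriptors) and a biopsy-confirmed malignant/benign label.
X = rng.integers(0, 5, size=(500, 14)).astype(float)
y = rng.integers(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# A single small hidden layer, in the spirit of the early CADx ANNs.
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ann.fit(X_train, y_train)

auc = roc_auc_score(y_test, ann.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```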
Jiang et al. trained an ANN to differentiate
malignant and benign clustered microcalcifica-
tions [26]. The microcalcifications were initially
identified by the radiologists and eight features
of these microcalcifications were automatically
extracted by an image-processing algorithm.
The training and testing data included 107 cases
(40 malignant) from 53 patients. This retrospec-
tive study only included microcalcifications that
underwent biopsy. Five radiologists participated
in the observer study. ROC analysis was used
to assess performance. The average cumula-
tive AUC values for the ANN and the radiolo-
gists were 0.92 and 0.89, respectively. While
the cumulative AUCs did not differ significantly (p = 0.22), the comparison of
AUCs over the 0.90 sensitivity threshold yielded
statistically significant differences (p < 0.05).
Jiang et al. later extended this model to classify
lesions as malignant or benign for multiple-view
mammograms [27]. They found that the use of
a CADx model decreased the number of biop-
sied benign lesions while increasing the biopsy
recommendations for malignant clusters. In a
follow-up study, Jiang et al. demonstrated that,
in addition to its diagnostic power, their ANN
model had the potential to reduce the variabil-
ity among radiologists in the interpretation of
mammograms [28]. In another study, they com-
pared their CADx model with independent dou-
ble readings on 104 mammograms (46 malig-
nant) containing clustered microcalcifications
and reported more significant improvements in
the ROC performance when the CADx model
was used as compared with the independent
double readings [29]. More recently, Rana et al.
applied the CADx model developed by Jiang
et al. on screen-film mammograms [26,27] to full-
field digital mammograms [30]. They concluded
that their CADx model maintained consistently
high performance in classifying calcifications in
full-field digital mammograms without requir-
ing substantial modifications from its initial
development on screen-film mammograms.
Markopoulos et al. compared three radio-
logists’ diagnostic accuracies with or with-
out computer aid [31]. The computer analysis
utilized an ANN in diagnosis of clustered
microcalcifications on mammograms. This
retrospective study included 240 suspicious
microcalcifications (108 malignant), which
were identified by radiologists and extracted
by an image-processing algorithm. The inputs
to the ANN included eight features of the cal-
cifications. Biopsy was the reference standard.
The AUC of the CADx was 0.937, which was
significantly higher than that of the physician
with the highest performance (AUC = 0.835,
p = 0.012). The authors concluded that CADx
models also have the potential to help improve
the diagnostic accuracy of radiologists.
Huo et al. also used ANNs to classify mass
lesions detected on screen-film mammograms
[32,33]. They automated the feature extraction proc-
ess to reduce the intra-observer variability [28,34].
In a follow-up study, Huo et al. used different
sets of data for training and testing instead of
a single database [35]. Their database included
50 biopsy-proven malignant masses, 50 biopsy-
proven benign masses and ten cysts proved by
fine needle aspiration. The inputs to the ANN
included four characteristics of masses (mar-
gin, sharpness, density and texture) that were
automatically extracted by an image processing
algorithm. When the CADx model was used,
the average AUC of the radiologists increased
from 0.93 to 0.96 (p < 0.001), demonstrating
the generalizability of CADx models to distinct
datasets. More recently, Li et al. converted the
CADx model developed by Huo et al. on screen-
film mammograms to apply to full-field digital
mammograms [36]. They evaluated the per-
formance of this CADx model using the AUC
at various stages of the conversion process and
concluded that CADx models had a potential
to aid physicians in the clinical interpretation
of full-field digital mammograms.
Floyd et al. proposed a case-based reasoning
(CBR) approach, in which the classification is
based on the ratio of the matched malignant
cases to total matches in the database [37]. The
primary advantage of the CBR method over
an ANN is the transparent reasoning process
that leads to the system’s diagnosis. However, a
key limitation of CBR is that a new case might
not have any match in the database. This CBR
analysis included 500 (174 malignant) cases.
Of these 500 cases, 232 were masses alone, 192
were microcalcifications alone and 29 were com-
binations of masses and associated microcalci-
fications. The inputs to the CBR included ten
features from the BI-RADS lexicon (five mass
descriptors and five calcification descriptors) and
a descriptor from clinical data. Biopsy was the
reference standard. Two radiologists were asked
to describe each lesion using the BI-RADS lexi-
con. The input dataset contained both retrospec-
tive (206 cases) and prospective (194 cases) data.
The performance of the CBR model was com-
pared with that of an ANN. While the ANN
slightly outperformed the CBR (AUC = 0.86
vs 0.83, respectively), the study did not report the statistical significance of this difference.
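The ratio-of-matches idea behind CBR can be illustrated with a short sketch; the match rule used here (disagreement on at most one descriptor) is an assumed simplification, not Floyd et al.'s implementation.

```python
# Minimal sketch of the ratio-based case-based reasoning (CBR) idea described
# above: a new case is scored by the fraction of "matching" stored cases
# that proved malignant.
import numpy as np

def cbr_malignancy_score(new_case, case_features, case_labels, max_mismatches=1):
    """Score = matched malignant cases / total matched cases.

    A stored case "matches" if it disagrees with the new case on at most
    `max_mismatches` categorical descriptors (a hypothetical match rule).
    Returns None when no stored case matches, the key limitation noted above.
    """
    mismatches = (case_features != new_case).sum(axis=1)
    matched = mismatches <= max_mismatches
    if not matched.any():
        return None
    return case_labels[matched].mean()

# Toy database of coded BI-RADS-style descriptors (rows = cases).
features = np.array([[1, 2, 0], [1, 2, 1], [0, 3, 1], [1, 2, 0]])
labels = np.array([1, 0, 1, 1])          # 1 = biopsy-proven malignant
print(cbr_malignancy_score(np.array([1, 2, 0]), features, labels))
```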
Elter et al. evaluated two novel CADx
approaches that predicted breast biopsy outcomes
[38]. The study retrospectively analyzed cases that
contained masses or calcifications but not both.
The dataset included 2100 masses (1045 malig-
nant) and 1359 calcifications (610 malignant)
that were extracted from mammograms in a pub-
lic database and double reviewed by radiologists.
The positive cases included histologically proven
cancers, while negative cases were followed up
for a 2-year period. The inputs to the CADx
model included patient age and five features from
the BI-RADS lexicon (two mass descriptors and
three calcification descriptors). Elter et al. used
two types of CADx systems: a decision tree
and a CBR. An ANN was also implemented to
compare its performance to that of the two pro-
posed models. The models were evaluated based
on ROC ana lysis. Contrary to the findings by
Floyd et al. [37], they found that the CBR out-
performed the ANN (AUC = 0.89 vs 88, respec-
tively, p < 0.001), while the ANN performed bet-
ter than the decision tree (AUC = 0.88 vs 0.87,
respectively, p < 0.001). The authors concluded
that both systems could potentially reduce the
number of unnecessary biopsies with more
accurate prediction of breast biopsy outcomes.
However, the differences in AUC performances
were small, raising the possibility that they may
not be clinically significant.
Chan et al. retrospectively evaluated the
effects of a linear discriminant classifier on
radiologists’ characterization of masses [34]. The
dataset included 253 mammograms (127 malig-
nant). Biopsy was the reference standard. The
findings were initially identified by a radiologist
and 41 features of these findings (texture and
morphologic features) extracted by an image-
processing algorithm were used as inputs to the
linear discriminant classifier. Six reading radio-
logists evaluated the mammograms with and
without CADx. The classification performance
was evaluated by ROC analysis. The average
AUC of the reading radiologists without CADx
was 0.87 and improved to 0.91 with CADx
(p < 0.05). Hadjiiski et al. performed similar
studies to evaluate a CADx model and par-
ticularly investigated the extent of increase in
diagnostic accuracy when more mammographic
information was available [39,40]. Specifically,
they evaluated two scenarios: the increase in the
performance of CADx when trained on serial
mammograms [39] and the increase in the per-
formance of CADx when trained with interval
change analysis, which used interval change
information extracted from prior and current
mammograms [40]. For both scenarios, they
reported superior AUCs for the radiologists with
CADx when compared with the radiologists
without CADx (for the first scenario AUC = 0.85
vs 0.79, respectively, p = 0.005; and for the sec-
ond scenario AUC = 0.87 vs 0.83, respectively,
p < 0.05) and, thus, a significant improvement
of the radiologists’ diagnostic accuracy.
Gupta et al. retrospectively studied 115
biopsy-proven masses or calcification lesions
(51 malignant) using a linear discriminant
analysis (LDA)-based CADx model [41]. The
images and case records were obtained from
a public database. This study compared the
performance of the LDA while using differ-
ent descriptors for one mammographic view
and two mammographic views. The attending
radiologists described each abnormality using
BI-RADS descriptors and categories. The inputs
to the CADx model included patient age and
two features from the BI-RADS lexicon (mass
shape and mass margin). While the CADx with
two mammographic views outperformed that
with one mammographic view (AUC = 0.920
vs 0.881, respectively), the difference was not
statistically significant (p = 0.056).
Wang et al. built and evaluated three BNs
[42]. One of the BNs was constructed based on a
total of 13 mammographic features and patients’
characteristics. The other two were hybrid classifiers: one was constructed by averaging the outputs of two subnetworks (one using only mammographic features, the other only non-mammographic features), and the third classifier used logistic regression (LR) to combine the outputs of the same subnetworks. This retrospective study included
419 cases (92 malignant). The verification of
positive cases included biopsy and/or surgical
reports, while negative cases were followed up
for at least a 2-year period. The input features
included four mammographic findings and nine
descriptors from clinical data. The features were
manually extracted by radiologists. The AUC
for the BN that incorporated all 13 features was
0.886 and the AUCs for the BNs that included
only mammographic features and patient char-
acteristics were 0.813 and 0.713, respectively.
The BN that included the full feature set was
significantly better than both of the hybrid BNs
(p < 0.05).
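The two hybrid strategies described above, averaging the sub-model outputs versus combining them with LR, can be sketched as follows; the sub-models, features and data are synthetic stand-ins rather than Wang et al.'s subnetworks.

```python
# Illustrative sketch (not Wang et al.'s networks): combining two sub-model
# outputs either by simple averaging or by a logistic-regression combiner,
# analogous to the two hybrid classifiers described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 600
X_mammo = rng.normal(size=(n, 4))       # stand-in mammographic features
X_clin = rng.normal(size=(n, 9))        # stand-in clinical/history features
y = (X_mammo[:, 0] + 0.5 * X_clin[:, 0] + rng.normal(size=n) > 0).astype(int)

Xm_tr, Xm_te, Xc_tr, Xc_te, y_tr, y_te = train_test_split(
    X_mammo, X_clin, y, test_size=0.3, random_state=1, stratify=y)

# Two probabilistic sub-models standing in for the subnetworks.
m1 = GaussianNB().fit(Xm_tr, y_tr)
m2 = GaussianNB().fit(Xc_tr, y_tr)
p1_tr, p2_tr = m1.predict_proba(Xm_tr)[:, 1], m2.predict_proba(Xc_tr)[:, 1]
p1_te, p2_te = m1.predict_proba(Xm_te)[:, 1], m2.predict_proba(Xc_te)[:, 1]

# Hybrid 1: average the two probabilities.
p_avg = (p1_te + p2_te) / 2

# Hybrid 2: logistic regression on the sub-model outputs.
combiner = LogisticRegression().fit(np.column_stack([p1_tr, p2_tr]), y_tr)
p_lr = combiner.predict_proba(np.column_stack([p1_te, p2_te]))[:, 1]

print(f"Averaging AUC:   {roc_auc_score(y_te, p_avg):.3f}")
print(f"LR-combined AUC: {roc_auc_score(y_te, p_lr):.3f}")
```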
Recently, Chhatwal et al. [43] and Burnside
et al. [44] developed an LR and a BN model, respectively,
based on a consecutive dataset from a breast
imaging practice consisting of 62,219 mammog-
raphy records (510 malignant). The input fea-
tures included 36 variables based on BI-RADS
descriptors for masses, calcifications, breast
density, associated findings and patients’ clini-
cal descriptors. The input dataset was recorded
in the National Mammography Database format, which would allow the use of these models in other
healthcare institutions. Contrary to most stud-
ies in the literature, they included the nonbiop-
sied mammograms in their training dataset and
used cancer registries as the reference standard
instead of the biopsy results. They analyzed the
performance of the CADx models using ROC analysis and concluded that their CADx models performed better than the radiologists in aggregate (AUCs = 0.963 and 0.960 for the LR and BN, respectively, vs 0.939 for the radiologists; p < 0.05). More recently, Ayer et al. developed an ANN model using the same dataset and demonstrated that it achieved a slightly higher AUC (0.965) than the LR and BN models as well as the radiologists [45]. Additionally, Ayer et al. extended the performance analysis of the CADx models from discrimination (classification) to calibration metrics, which assess the ability of a model to accurately predict the cancer risk for individual patients.
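The distinction between discrimination and calibration can be made concrete with the short sketch below; the predicted risks are synthetic, and the Brier score and binned calibration curve shown are common calibration summaries rather than necessarily the metrics used by Ayer et al.

```python
# Minimal sketch of assessing both discrimination (AUC) and calibration for
# a risk model's predicted probabilities; the data here are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
# Hypothetical predicted cancer risks from a CADx model.
p_pred = np.clip(y_true * 0.3 + rng.uniform(0, 0.7, size=1000), 0, 1)

print(f"Discrimination (AUC): {roc_auc_score(y_true, p_pred):.3f}")
print(f"Brier score (lower is better calibrated): "
      f"{brier_score_loss(y_true, p_pred):.3f}")

# Calibration curve: observed malignancy rate vs mean predicted risk per bin.
frac_pos, mean_pred = calibration_curve(y_true, p_pred, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```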
Bilska-Wolak et al. conducted a preclinical
evaluation of a previously developed CADx
model, a likelihood ratio-based classifier, on
a new set of data [46]. The model retrospec-
tively evaluated 151 new and independent
cases (42 malignant). Biopsy was the reference
standard. Suspicious masses were detected and
described by an attending radiologist using
16 different features from the BI-RADS lexi-
con and patient history. The authors evaluated
the CADx model based on ROC analysis and
sensitivity statistics. The average AUC was 0.88.
The model achieved 100% sensitivity at 26%
specificity. The results were compared with an
ANN model created using the same datasets.
The AUC of the ANN was lower than that of the
likelihood ratio-based classifier. Bilska-Wolak
et al. concluded that their CADx model showed
promising results that could reduce the number
of false-positive mammograms.
US CADx models
Ultrasound imaging is an adjunct to diagnostic
mammography, where CADx models could be
used for improving diagnostic accuracy. CADx
models developed for US scans date back to the late 1990s. In this section, we review studies that apply CADx systems to breast sonography, or to a combination of US and mammography, to distinguish malignant from benign lesions. A sum-
mary list for the primary US CADx models is
presented in Table 2.
Giger et al. classified lesions as benign or malignant in a database of 184 digitized US images [47]. Biopsy,
cyst aspiration or image interpretation alone were
used to confirm benign lesions, whereas malig-
nancy was proven at biopsy. The authors utilized
an LDA model to differentiate between benign
and malignant lesions using five computer-
extracted features based on lesion shape and mar-
gin, texture, and posterior acoustic attenuation
(two features). ROC analysis yielded AUCs of 0.94 for the entire database and 0.87 for the database that only included biopsy- and cyst-proven cases. The authors concluded that computerized analysis could improve the specificity of breast sonography.
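As an illustration of this type of pipeline, the sketch below fits a linear discriminant classifier to simulated stand-ins for a handful of computer-extracted lesion features and reports a cross-validated AUC; it is not Giger et al.'s implementation.

```python
# Minimal sketch (assumed, not the study's pipeline): a linear discriminant
# classifier on a few computer-extracted lesion features, scored by AUC.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
# Stand-ins for five computer-extracted features (shape, margin, texture,
# and two posterior-acoustic measures); labels are benign (0) / malignant (1).
X = rng.normal(size=(184, 5))
y = (X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=184) > 0).astype(int)

lda = LinearDiscriminantAnalysis()
scores = cross_val_predict(lda, X, y, cv=5, method="decision_function")
print(f"Cross-validated AUC: {roc_auc_score(y, scores):.3f}")
```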
Chen et al. developed an ANN to classify
malignancies on US images [48]. A physician
manually selected sub-images corresponding to
a suspicious tumor region followed by compu-
terized analysis of intensity variation and tex-
ture information. Texture correlation between
neighboring pixels was used as the input to the
ANN. The training and testing dataset included
140 biopsy-proven breast tumors (52 malig-
nant). The performance was assessed by AUC,
sensitivity and specificity metrics, which yielded
an AUC of 0.956 with 98% sensitivity and 93%
specificity at a threshold level of 0.2. The authors
concluded that their CADx model was useful in
distinguishing benign and malignant cases, yet
also noted that larger datasets could be used to
improve the performance.
Later, Chen et al. improved on a previous
study [48] and devised an ANN model com-
posed of three components: feature extraction,
feature selection, and classification of benign and
malignant lesions [49]. The study used two sets of
biopsy-proven lesions; the first set with 160 dig-
itally stored lesions (69 malignant) and the sec-
ond set with 111 lesions (71 malignant) in hard-
copy images that were obtained with the same
US system. Hard-copy images were digitized
using film scanners. Seven morphologic features
were extracted from each lesion using an image-
processing algorithm. For the given classifier, forward stepwise regression was employed to select the best-performing features. These features were
used as inputs to a two-layer feed-forward ANN.
For the first set, the ANN achieved an AUC of
0.952, 90.6% sensitivity and 86.6% specificity.
For the second set, the ANN achieved an AUC
of 0.982, 96.7% sensitivity and 97.2% specifi-
city. The ANN model trained on each dataset
was demonstrated to be statistically extendible
to other datasets at a 5% significance level. The
authors concluded that their ANN model was an
effective and robust approach for lesion classifi-
cation, performing better than the counterparts
published earlier [47,48].
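The forward stepwise selection step can be sketched as follows; here the criterion is cross-validated AUC and a simple logistic regression stands in for the classifier, both of which are illustrative assumptions rather than the procedure used by Chen et al.

```python
# Illustrative sketch of greedy forward feature selection driven by
# cross-validated AUC; the data and the seven candidate features are
# hypothetical stand-ins for extracted morphologic features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 7))                      # 7 candidate features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=200) > 0).astype(int)

selected, remaining = [], list(range(X.shape[1]))
best_auc = 0.0
while remaining:
    # Score each remaining feature when added to the current subset.
    scores = {
        j: cross_val_score(LogisticRegression(), X[:, selected + [j]], y,
                           cv=5, scoring="roc_auc").mean()
        for j in remaining
    }
    j_best, auc = max(scores.items(), key=lambda kv: kv[1])
    if auc <= best_auc + 1e-3:                     # stop when no real gain
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_auc = auc

print("Selected feature indices:", selected, "CV AUC:", round(best_auc, 3))
```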
Table 2. Summary of computer-aided diagnostic models in ultrasound interpretation.

Study (year)              Size of dataset (n)   Model   Reader study   AUC           Ref.
Giger et al. (1999)       184                   LDA     No             0.94          [47]
Chen et al. (1999)        140                   ANN     No             0.956         [48]
Chen et al. (2003)        160/111               ANN     No             0.952/0.982   [49]
Horsch et al. (2002)      400                   LDA     No             0.87          [50]
Sahiner et al. (2004)     102                   LDA     Yes            0.92          [51]
Drukker et al. (2008)     1046                  BNN     Yes            0.90          [52]
Horsch et al. (2006)      717                   BNN     Yes            0.91          [53]
Sahiner et al. (2009)     67                    LDA     Yes            0.95          [54]

ANN: Artificial neural network; AUC: Area under the curve; BNN: Bayesian neural network; LDA: Linear discriminant analysis.

Horsch et al. explored three aspects of an LDA classifier that was based on automatic segmentation of lesions and automatic extraction of lesion shape, margin, texture and posterior acoustic
behavior [50]. The study was conducted using a
database of 400 cases with 94 malignancies, 124
complex cysts and 182 benign lesions. The refer-
ence standard was either biopsy or aspiration. First,
marginal benefit of adding a feature to the LDA
model was investigated. Second, the performance
of the LDA model in distinguishing carcinomas
from different benign lesions was explored. The
AUC values for the LDA model were 0.93 for dis-
tinguishing carcinomas from complex cysts and
0.72 for differentiating fibrocystic disease from
carcinoma. Finally, eleven independent trials of
training and testing were conducted to validate
the LDA model. Validation resulted in a mean
AUC of 0.87 when computer-extracted features
from automatically delineated lesion margins were
used. There was no statistically significant dif-
ference between the best two- and four-feature
classifiers; therefore, adding features to the LDA
model did not improve the performance.
Sahiner et al. investigated computer vision
techniques to characterize breast tumors on
3D US volumetric images [51]. The dataset was
composed of masses from 102 women who
underwent either biopsy or fine-needle aspira-
tion (56 had malignant masses). Automated
mass segmentation in 2D and 3D, as well as
feature extraction followed by LDA, were imple-
mented to obtain malignancy scores. Stepwise
feature selection was employed to reduce eight
morphologic and 72 texture features into a best-
feature subset. An AUC of 0.87 was achieved
for the 2D-based classifier, while the AUC for
the 3D-based classifier was 0.92. There was no
statistically significant difference between the
two classifiers (p = 0.07). The AUC values of the
four radiologists fell in the range of 0.84 to 0.92.
When the performance of their model was compared with that of the radiologists, the difference was not statistically significant (p = 0.05). However, the
partial AUC for their model was significantly
higher than those of the three radiologists
(p < 0.03, 0.02 and 0.001).
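For readers unfamiliar with partial AUC, the sketch below shows one common formulation that restricts the ROC integral to a low false-positive-rate region (scikit-learn's max_fpr option); the exact range and definition used by Sahiner et al. may differ.

```python
# Hedged sketch of a partial-AUC computation on synthetic scores:
# scikit-learn's `max_fpr` gives the standardized partial AUC over a
# restricted false-positive-rate region.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, size=400)
scores = y_true * 0.8 + rng.normal(scale=0.6, size=400)   # synthetic scores

full_auc = roc_auc_score(y_true, scores)
partial_auc = roc_auc_score(y_true, scores, max_fpr=0.3)  # FPR in [0, 0.3]
print(f"Full AUC: {full_auc:.3f}  Partial (standardized) AUC: {partial_auc:.3f}")
```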
Drukker et al. used various feature segmen-
tation and extraction schemes as inputs to a
Bayesian neural network (BNN) classifier with
five hidden layers [52]. The purpose of the study
was to evaluate a CADx workstation in a realistic
setting representative of clinical diagnostic breast
US practice. Benign or malignant lesions that
were verified at biopsy or aspiration, as well as
those determined through imaging characteristics
on US scans, MR images and mammograms, were
used for the analysis. The authors included non-
biopsied lesions in the dataset to make the series
consecutive, which more accurately reflects clini-
cal practice. The inputs to the network included
lesion descriptors consisting of the depth:width
ratio, radial gradient index, posterior acoustic sig-
nature and autocorrelation texture feature. The
output of the network represented the probabil-
ity of malignancy. The study was conducted on a
patient population of 508 (101 had breast cancer)
with 1046 distinct abnormalities (157 cancer-
ous lesions). Comparing the current radiology
practice with the CADx workstation, the CADx
scheme achieved an AUC of 0.90, corresponding
to 100% sensitivity at 30% specificity, while radi-
ologists performed with 77% specificity for 100%
sensitivity when only nonbiopsied lesions were
included. When only biopsy-proven lesions were
analyzed, computerized lesion characterization
outperformed the radiologists.
In routine clinical practice, radiologists often
combine the results from mammography and US,
if available, when making diagnostic decisions.
Several studies demonstrated that CADx could
be useful in the differentiation of benign findings
from malignant breast masses when sonographic
data are combined with corresponding mammo-
graphic data. Horsch et al. evaluated and com-
pared the performance of five radiologists with
different expertise levels and five imaging fellows
with or without the help of a BNN [53]. The BNN
model utilized a computerized segmentation of
the lesion. Mammographic features used as the
input included spiculation, lesion shape, margin
sharpness, texture and gray level. Sonographic
input features included lesion shape, margin,
texture and posterior acoustic behavior. All fea-
tures were automatically extracted by an image-
processing algorithm. This retrospective study
examined a total of 359 (199 malignant) mam-
mographic and 358 (67 malignant) sonographic
images. Additionally, 97 (39 malignant) multimo-
dality cases (both mammogram and sonogram)
were used for testing purposes only. Biopsy was
the reference standard. The performances of each
radiologist/imaging fellow or pair of observers
were quantified by the AUC, sensitivity and spe-
cificity metrics. Average AUC without BNN was
0.87 and with BNN was 0.92 (p < 0.001). The
sensitivities without and with BNN were 0.88 and
0.93, respectively (p = 0.005). There was no significant difference in specificity without and with the BNN (0.66 vs 0.69, p = 0.20). The authors
concluded that the performance of the radiologists
and imaging fellows increased significantly with
the help of the BNN model.
In another multimodality study, Sahiner et al.
investigated the effect of a multimodal CADx
system (using mammography and US data) in
discriminating between benign and malignant
lesions [54]. The dataset for the study consisted of
13 mammography features (nine morphologic,
three spiculation and one texture) and eight 3D
US features (two morphologic and six texture)
that were extracted from 67 biopsy-proven masses
(35 malignant). Ten experienced readers first gave
a malignancy score based on mammography only,
then re-evaluated based on mammography and
US combined, and were finally allowed to change
their minds given the CADx system’s evaluation
of the mass. The CADx system automatically
extracted the features, which were then fed into
a multimodality classifier (using LDA) to give a
risk score. The results were compared using ROC
curves, which suggested statistically significant
improvement (p = 0.05) when the CADx system
was consulted (average AUC = 0.95) over read-
ers’ assessment of combined mammography and
US without the CADx (average AUC = 0.93).
Sahiner et al. concluded that a CADx system
combining the features from mammography and
US may have the potential to improve radiologist’s
diagnostic decisions [54].
As discussed previously, a variety of sono-
graphic features (texture, margin and shape) are
used to classify benign and malignant lesions.
2D/3D Doppler imaging provides additional
advantages in classification when compared
with grayscale imaging, by demonstrating breast lesion
vascularity. Chang et al. extracted features of
tumor vascularity from 3D power Doppler US
images of 221 lesions (110 benign) and devised
an ANN to classify lesions [55]. The study dem-
onstrated that CADx, using 3D power Doppler
imaging, can aid in the classification of benign
and malignant lesions.
In addition to the aforementioned studies,
there are other works that developed and evalu-
ated CADx systems in differentiating between
benign and malignant lesions. Joo et al. devel-
oped an ANN that was demonstrated to have
potential to increase the specificity of US char-
acterization of breast lesions [56]. Song et al.
compared an LR and an ANN in the context of
differentiating between malignant and benign
masses on breast sonograms from a small dataset
[57]. There was no statistically significant differ-
ence between the performances of the two meth-
ods. Shen et al. investigated the statistical cor-
relation between the computerized sonographic
features, as defined by BI-RADS, and the signs
of malignancy [58]. Chen and Hsiao evaluated
US-based CADx systems by reviewing the meth-
ods used in classification [59]. They suggested the
inclusion of pathologically specific tissue- and
hormone-related features in future CADx sys-
tems. Gruszauskas et al. examined the effect of
image selection on the performance of a breast
US CADx system and concluded that their auto-
mated breast sonography classification scheme
was reliable even with variation in user input [60].
Recently, Cui et al. published a study focusing on
the development of an automated method seg-
menting and characterizing the breast masses on
US images [61]. Their CADx system performed
similarly whether it used automated segmenta-
tion or an experienced radiologist’s segmentation.
In a recent study, Yap et al. designed a survey to
evaluate the benefits of computerized processing
of US images in improving the readers’ perform-
ance of breast cancer detection and classification
[62]. The study demonstrated marginal improve-
ments in classification when computer-processed
US images alongside the originals are used in
distinguishing benign from malignant lesions.
MRI CADx models
Dynamic contrast-enhanced MRI of the breast
has been increasingly used in breast cancer
evaluation and has been demonstrated to have
potential to improve breast cancer diagnosis. The
major advantage of MRI over other modalities is
its ability to depict both morphologic and physio-
logic (kinetic enhancement) information [63].
Despite the advantages of MRI, it is a technology
that is continuously evolving and is not currently
cost effective for screening the general population
[64,65]. Nevertheless, breast MRI is promising in
terms of its high sensitivity, especially for high-
risk young women with dense breasts. However,
its specificity in the detection of breast cancer has been highly variable [17]. As a way of improving specificity, CADx models that aid in discriminating benign from malignant lesions on MRI would be valuable. There are numerous CADx
studies based on breast MRI. Generally, both
morphologic and kinetic (enhancement) features
are used in these studies to predict benign versus
malignant breast lesions. In this section of the
article, we only discuss the recent articles (pub-
lished after 2003) that exemplify distinct aspects
of breast MRI CADx research. A summary list
for the primary MRI CADx models is presented
in Table 3.
Szabó et al. used an ANN to retrospectively
determine the discriminative ability of kinetic,
morphologic and combined MRI features [66].
Inputs to the ANN included four morphologic
and nine kinetic features from 105 biopsy-proven
breast lesions with 75 malignancies. The model
derived from the most relevant input variables,
called the minimal model, resulted in the highest
AUC value (0.771). The model with best kinetic
features had an AUC of 0.743, the model with
all features had an AUC of 0.727 and the model
with qualitative architectural features, called the
morphologic model, had an AUC of 0.678. The
expert radiologists achieved an AUC of 0.799;
therefore, their performance was comparable to that of the minimal model.
Nattkemper et al. analyzed various machine
learning methods using four morphologic and
five kinetic tumor features found on MRI as input
[67]. The investigated methods included k-means
clustering, self-organizing maps, Fisher discrimi-
nant analysis, k-nearest-neighbor classifiers, support vector machines and decision trees. The input dataset
included dynamic contrast-enhanced MRI data
of 74 breast lesions (49 malignant). Biopsy was
the reference standard. Among the investigated
methods, support vector machines achieved the
highest AUC (0.88). They also demonstrated
that, among all the MRI features analyzed, the
wash-out type features extracted by radiologists
improved classification performance the most.
Table 3. Summary of computer-aided diagnostic models in MRI interpretation.

Study (year)                Size of dataset (n)   Model   Reader study   AUC     Ref.
Szabó et al. (2004)         105                   ANN     Yes            0.771   [66]
Nattkemper et al. (2005)    74                    SVM     No             0.88    [67]
Meinel et al. (2007)        80                    ANN     Yes            0.907   [68]
Deurloo et al. (2005)       100                   LR      Yes            0.91    [69]

ANN: Artificial neural network; AUC: Area under the curve; LR: Logistic regression; SVM: Support vector machine.

Meinel et al. developed an MRI CADx system to improve radiologists' performance in classifying breast lesions [68]. An ANN was constructed using 80 biopsy-proven lesions (43 malignant). Inputs to the ANN were the best 13 features
from a set of 42, based on lesion shape, texture
and enhancement kinetics information. The per-
formance was assessed by comparison of AUC
values from five human readers diagnosing the
tumor with and without the help of the CADx
system. When only the first abnormality shown
to the human readers was included, ROC analysis yielded AUCs of 0.907 with ANN assistance and 0.816 without it. The difference was statistically significant (p < 0.011); therefore, Meinel et al. demonstrated that their ANN model improved the performance of human readers.
Deurloo et al. combined radiologists' clinical assessment of clinically and mammographically occult breast lesions with a computer-calculated probability of malignancy for each lesion in an LR model [69]. Inputs to the LR model included
the four best features from a set of six morphologic
and three temporal features. Either biopsy-proven
lesions or lesions showing transient enhancement
were included in the study. The difference between
the performance of the clinical readings (AUC = 0.86) and the computerized analysis (AUC = 0.85) was not statistically significant (p = 0.99). However, the combined model performed significantly better (AUC = 0.91, p = 0.03) than clinical reading without computerized analysis. The results demonstrated how computerized analysis could complement the clinical interpretation of magnetic resonance images.
There are several other studies that addressed
the use of CADx systems in MRI of the
breast. Williams et al. evaluated the sensitiv-
ity of computer-generated kinetic features from
CADstream, the first CADx system for breast
MRI, for 154 biopsy-proven lesions (41 malig-
nant) [70]. The study suggested that computer-
aided classification improved radiologists’ per-
formance. Lehman et al. compared the accuracy
of breast MRI assessments with and without the
same software, CADstream [71]. They concluded
that the software may improve the accuracy of
radiologists’ interpretation; however, the study
was conducted on a small set of 33 lesions (nine
malignant). Nie et al. investigated the feasibil-
ity of quantitative analysis of MRI images [72].
Morphology/texture features of breast lesions
were selected by an ANN and used in the classi-
fication of benign and malignant lesions. Baltzer
et al. investigated the incremental diagnostic
value of complete enhancing lesions using a
CADx model [73]. The study reported improve-
ment in specificity that did not reach statistical significance.
In a different study, Baltzer et al. investigated
both automated and manual measurement meth-
ods to assess contrast enhancement kinetics [74].
They analyzed and compared the evaluation of contrast enhancement via curve-type assessment by radiologists, region-of-interest measurements and CADx. The methods proved diagnostically useful, although no statistically significant difference was found among them.
Future perspective
There have been significant advances in CADx
models in the last 20 years. However, several
issues remain open for future researchers. First
and most notably, almost all of the existing
CADx models are trained and tested on retro-
spectively collected cases that may not represent
the real clinical practice. Large prospective stud-
ies are required to evaluate the performance of
CADx models in real life before employing them
in a clinical setting.
Second, an objective comparative perform-
ance evaluation of the existing CADx models
is difficult because the reported performances
depend on the dataset used in model build-
ing. One approach to a systematic performance
comparison would be to use large, consistent and publicly available datasets for testing purposes. However, although this approach would give some idea of the comparative performance of CADx systems, it would not be definitive, because a CADx model that performs best on one dataset might be outperformed by another CADx model on a different dataset.
Third, a frequently ignored issue in CADx
model development is the clinical interpretability
of the model. Aspects of a CADx model that allow clinical interpretation significantly influence its acceptance by physicians. Most of the existing CADx models
are based on ANNs. Although ANNs are pow-
erful in terms of their predictive abilities, their
parameters do not carry any real-life interpreta-
tion, hence, they are often referred to as ‘black
boxes’. Other models such as LR, BN or CBR
allow direct clinical interpretation. However, the number of such studies is small compared with that of ANN-based models.
Fourth, performance assessment of CADx models is usually limited to discrimination (classification) metrics (e.g., sensitivity, specificity and AUC), whereas the accuracy of risk prediction for individual patients, referred to as calibration, is often ignored. Although discrimi-
nation assesses the ability to correctly distinguish
between benign and malignant abnormalities, it
reveals little about the accuracy of risk prediction for individual patients [75]. However, clinical decision-making usually involves decisions for individual patients under uncertainty; therefore, calibration should be evaluated alongside discrimination.