Abstract
Objectives: To correlate breast imaging-reporting and data system (BI-RADS) category 4 lesions with histopathology results to assess the accuracy of subcategorization.
Methods: A retrospective study was carried out from September 2021 to June 2022. A total of 247 breast lesions categorized as BI-RADS 4 on ultrasound (US) and digital mammography were reviewed. Lesion features were described using BI-RADS terminology, and each lesion was assigned to a subcategory (4A, 4B, or 4C). Pathological analysis was carried out on tissue obtained through US-guided core biopsy. A p-value of <0.05 was considered significant.
Results: Of the 247 lesions, 135 were categorized as subcategory 4A, 68 as 4B, and 44 as 4C. Overall, 41 (16.6%) lesions were malignant and 206 (83.4%) were benign. The mean age of patients with benign versus malignant lesions was 43.18±14.02 versus 51.24±14.15 years (p<0.001). The mean size of benign versus malignant lesions was 1.93±1.65 versus 3.82±3.89 cm (p<0.001). Findings were compared with histopathology, and the positive predictive value fell within the reference range for subcategory 4C (>70%). High reliability was observed between the 2 readers, with a weighted Cohen’s Kappa value of 0.79 (0.73-0.85). Significant disagreements between the 2 readers in the assignment of radiological features were observed for lesion density, shape, echo pattern, vascularity, and borders.
Conclusion: The results of this study contribute to the existing body of knowledge, emphasizing the need for standardized guidelines for the characterization of BI-RADS 4 subcategories and improved diagnostic accuracy in the management of breast lesions.
Breast cancer is a worldwide health challenge and a leading cause of death among women. In 2020, 2.3 million women were diagnosed with breast cancer, and 685,000 died of it globally. In Saudi Arabia, breast cancer is a major health problem, with a cumulative incidence rate of 14.8% and a mortality rate of 8.5% in both genders.1 The incidence rate among women in Saudi Arabia was reported to be 29.7% in 2018.1
The breast imaging-reporting and data system (BI-RADS) is a standardized risk estimation and reporting system for breast pathology encountered in mammography, ultrasound (US), and magnetic resonance imaging (MRI). Its fifth edition was released by the American College of Radiology (ACR) in 2013, incorporating a few changes.2 The BI-RADS facilitates communication between radiologists and referring clinicians through easily understood terminology and international standardization.
The BI-RADS is categorized from 0-6, with each category linked to specific management approaches that play a significant role in the outcome of breast pathology. Suspicious breast lesions are classified as BI-RADS 4, which encompasses a wide range of likelihood of malignancy (>2 to <95%) and should be managed through tissue biopsy. It is further subcategorized into 4A (low suspicion, >2 to ≤10% likelihood of malignancy), 4B (moderate suspicion, >10 to ≤50% likelihood of malignancy), and 4C (high suspicion, >50 to <95% likelihood of malignancy).3
Ultrasound-guided core needle biopsy (CNB) is a primary interventional procedure used for breast pathology. Compared to open biopsy, it offers several advantages, such as lower cost, less invasiveness, less patient discomfort, and shorter procedure time.4 It also demonstrates sensitivity and specificity close to those of open biopsy.5 However, CNB has a false negative rate of approximately 0.1-2.5%.6-8 Radiological-pathological concordance is a cornerstone that significantly affects patient outcomes and prevents delays in management.9 Hence, a thorough radiologist assessment of the lesion and effective communication with the pathologist help identify false negative pathology and minimize radiologic-pathologic discordance. This is particularly important due to the lack of objective criteria for BI-RADS 4 subcategorization and poor inter-observer agreement.10
Our study has 2 objectives. First, to assess the accuracy of the subcategorization of BI-RADS 4 (4A, 4B, and 4C) lesions by comparing them with histopathology results. Second, to determine the inter-observer variability in breast radiologists’ judgment of BI-RADS 4 subcategorization.
Methods
This retrospective study was carried out from September 2021 to June 2022. A total of 3,193 women underwent mammography and US, resulting in the classification of 274 lesions as BI-RADS 4. Of these, 247 lesions were included in this study, and correlations were subsequently established with histopathology reports from the same hospital. Exclusion criteria encompassed lesions without available images (9 patients) or pathology results (17 lesions) as shown in Appendix 1.
The ethical committee of the institutional review board, General Directorate of Health Affairs approved this study (national registration number with NCBE-KACST: H-03-M-84. IRB log No:044-22). Suspicious lesions classified as BI-RADS 4 were independently reviewed by 2 readers and blindly scored as BI-RADS 4A, 4B, or 4C. Lesion features were evaluated based on the ACR BI-RADS lexicon, 5th edition. Mammography assessed density, mass shape (oval, lobular, and irregular), suspicious calcification, asymmetry, architectural distortion, and nipple retraction, while US assessed margin, echo pattern, posterior characteristics, and associated features such as vascularity and skin thickening.
In our institution, mammography and US are routine diagnostic tools for provisional diagnosis in symptomatic women over 35 years old, while tissue biopsy with histology is used for confirmatory diagnosis. US alone is carried out for younger symptomatic women. Screening mammography is carried out for women who are 40 years old or more, with additional US if indicated. Breast radiologists carried out US using Philips EPIQ 7G and GE Healthcare machines equipped with high-frequency linear probes (5-12 and 6-15 MHz). Mammography was performed by skilled technologists (with 16 years of experience) using the Hologic Selenia Dimensions system, capturing mediolateral oblique (MLO) and craniocaudal (CC) views alongside tomosynthesis for all patients. Additional views, such as magnification and compression, were employed as needed. The BI-RADS 4 lesions were sampled under US guidance using a 14-gauge core needle through an automated gun, with at least 3 samples obtained from each lesion.
Hematoxylin and eosin-stained (H&E) sections of the core biopsies were examined. Results were categorized as benign or malignant. In accordance with the BI-RADS ACR criteria, atypical ductal hyperplasia was grouped among benign lesions, while carcinoma in situ was considered a high-risk “cancerous” lesion and grouped with malignancies to calculate the positive predictive value (PPV). Patients with atypical ductal hyperplasia on histology were recommended for complete surgical excision.
Statistical analysis
Data analysis was carried out using the Statistical Package for the Social Sciences, version 27.0 (IBM Corp., Armonk, NY, USA). Descriptive statistics were calculated, including means ± standard deviations (SD), medians (interquartile range [IQR]), and percentages. Weighted Cohen’s Kappa with quadratic weighting was used to estimate the inter-rater reliability between the 2 raters. The percentage positive predictive value (%PPV) for malignancy, with its 95% confidence interval (CI), was estimated for each of the 2 raters. Continuous variables were compared using the t-test, while categorical groups were compared with the Chi-square test, Fisher’s exact test, or the Monte Carlo exact test, as suitable. A p-value of <0.05 was considered significant.
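As an illustration of the inter-rater statistic named above, the following is a minimal Python sketch of Cohen’s Kappa with quadratic weights for an ordered k x k cross-tabulation. It is not the SPSS procedure used in this study. The function name `quadratic_weighted_kappa` and the cell counts in `example_table` are hypothetical: only the row and column totals match the reported subcategory counts of the 2 readers, and the individual cells are invented, so the printed value is not expected to reproduce the reported Kappa.

```python
from typing import Sequence


def quadratic_weighted_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa with quadratic disagreement weights for a k x k table.

    table[i][j] is the number of lesions placed in ordered category i by
    reader 1 and in ordered category j by reader 2 (e.g. 4A < 4B < 4C).
    Assumes k >= 2 and that the readers do not both use a single category
    (otherwise the chance-expected disagreement is zero).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = ((i - j) / (k - 1)) ** 2  # 0 on the diagonal, 1 at the corners
            observed += weight * table[i][j]
            expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - observed / expected


# HYPOTHETICAL cross-tabulation (rows: reader 1, columns: reader 2).
# Row totals (135, 68, 44) and column totals (151, 48, 48) match the
# reported counts; the individual cells are invented for illustration.
example_table = [
    [130, 5, 0],   # reader 1: 4A
    [21, 40, 7],   # reader 1: 4B
    [0, 3, 41],    # reader 1: 4C
]
kappa = quadratic_weighted_kappa(example_table)
print(f"Quadratic-weighted kappa (illustrative data): {kappa:.2f}")
```

Quadratic weights penalize a 4A versus 4C disagreement four times as heavily as a disagreement between adjacent subcategories, which suits an ordered scale such as BI-RADS 4A to 4C.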
Results
The mean patient age across the 247 lesions was 44.51±14.33 years, and the mean lesion size was 2.26±2.29 cm. Mammography was carried out for 197 (79.8%) lesions. Among them, 84 (43.0%) patients had category B breast density, 75 (38.5%) had category C, 28 (14.4%) had category A, and 8 (4.1%) had category D. The left breast was affected more frequently than the right (127 [51.6%] vs. 120 [48.4%]). The upper outer quadrant was the most common location of the lesions (133 [53.8%]), followed by the retro-areolar area (57 [23.1%]), as shown in Table 1.
Of the 247 BI-RADS 4 lesions, the first reader categorized 135 (54.7%) as BI-RADS 4A, 68 (27.5%) as 4B, and 44 (17.8%) as 4C. Meanwhile, the second reader classified 151 (61.1%) as BI-RADS 4A, 48 (19.4%) as 4B, and 48 (19.4%) as 4C, as presented in Table 2.
The imaging characteristics of the lesions are summarized in Table 1. Masses constituted 164 (66.4%) of the cases, mass with calcification accounted for 14 (5.7%), calcification only for 5 (2.0%), asymmetry for 39 (15.8%), architectural distortion for 26 (10.5%), intraductal mass for 37 (15.0%), complex mass for 18 (7.3%), skin thickening for 6 (2.4%), and nipple retraction for 2 (0.8%).
Histopathological diagnoses of the lesions are presented in Figures 1 & 2. Benign lesions comprised 206 (83.4%) and malignant lesions accounted for 41 (16.6%) of the cases. The mean age of the patients with benign lesions was 43.18±14.02 years compared to 51.24±14.15 years for patients with malignant lesions (p<0.001). The mean size of benign versus malignant lesions was 1.93±1.65 versus 3.82±3.89 cm (p<0.001). Benign and malignant lesions did not differ significantly in breast density (p=0.07) or in the affected side of the body (p=0.98). Among the benign cases (n=206), the most common findings were benign breast tissue (n=59 [28.6%]), followed by fibroadenoma (n=41 [19.9%]), fibrocystic changes (n=19 [9.2%]), fibroadenosis (n=16 [7.8%]), usual ductal hyperplasia (n=13 [6.3%]), chronic mastitis (n=13 [6.3%]), fibrosis (n=10 [4.9%]), papilloma (n=9 [4.4%]), adenosis (n=9 [4.4%]), atypical ductal hyperplasia (n=7 [3.4%]), and sclerosis (n=6 [2.9%]) as shown in Figure 1. Among the malignant lesions (n=41), the most common lesions were invasive ductal carcinoma type-2 (n=18 [43.9%]) and invasive ductal carcinoma type-3 (n=15 [36.6%]) as shown in Figure 2.
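The group comparisons of age and lesion size can be checked approximately from the reported summary statistics alone. The Python sketch below assumes a pooled-variance (Student's) t-test and takes the group sizes to be the lesion counts (206 benign, 41 malignant); neither assumption is stated in the text, which specifies only "the t-test". The p-value uses a normal approximation, which is reasonable at 245 degrees of freedom but is not the exact t distribution. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from group summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, df


def approx_two_sided_p(t_stat):
    """Two-sided p-value from a normal approximation (adequate for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Reported summaries: benign (assumed n=206) vs. malignant (assumed n=41).
t_age, df_age = pooled_t_from_summary(43.18, 14.02, 206, 51.24, 14.15, 41)
t_size, df_size = pooled_t_from_summary(1.93, 1.65, 206, 3.82, 3.89, 41)
p_age = approx_two_sided_p(t_age)
p_size = approx_two_sided_p(t_size)

print(f"Age:  t({df_age}) = {t_age:.2f}, approximate p = {p_age:.4f}")
print(f"Size: t({df_size}) = {t_size:.2f}, approximate p = {p_size:.2g}")
```

Because the SDs for lesion size differ markedly between the groups, an unequal-variance (Welch) test is a common alternative and would give a different statistic than the pooled version sketched here.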
Our analysis yielded a cancer PPV of 1.5% for BI-RADS 4A, 7.4% for 4B, and 77.3% for 4C lesions for rater 1, and a cancer PPV of 1.3% for BI-RADS 4A, 8.3% for 4B, and 72.9% for 4C lesions for rater 2, as shown in Table 3.
Substantial reliability was observed between the 2 readers in the subcategorization of the lesions, with a weighted Cohen’s Kappa value of 0.79 (0.73-0.85) as shown in Table 4. Additionally, significant agreement was observed between the 2 readers in subcategorization, particularly in subcategories 4A and 4C (p<0.001).
Significant differences in feature assignment between the 2 readers were observed in mammography for lesion density (p=0.025) and shape (p=0.04), and in US for vascularity of the mass (p=0.01) and lesion borders, including circumscribed (p=0.046), angular (p=0.015), and microlobulated borders (p=0.048), as detailed in Table 5.
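The PPV calculation and its comparison with the ACR reference ranges can be sketched in Python as follows. The subcategory totals for rater 1 are those reported in the Results; the malignant counts per subcategory (2, 5, and 34) are not stated in the text and are back-calculated here from the reported PPVs, so they should be read as an assumption. The Wilson score interval is likewise an assumed choice, since the CI method is not specified, and the names `wilson_interval`, `within_reference`, and `ACR_RANGES` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a proportion (assumed CI method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# ACR BI-RADS likelihood-of-malignancy ranges in percent: (lower, upper).
ACR_RANGES = {"4A": (2, 10), "4B": (10, 50), "4C": (50, 95)}


def within_reference(subcategory, ppv_percent):
    """True if the PPV lies in the reference range for the subcategory."""
    low, high = ACR_RANGES[subcategory]
    if subcategory == "4C":
        return low < ppv_percent < high   # >50 to <95%
    return low < ppv_percent <= high      # >2 to <=10%, >10 to <=50%


# Rater 1: (malignant, total). Totals are reported; malignant counts are
# BACK-CALCULATED from the reported PPVs and are therefore an assumption.
rater1 = {"4A": (2, 135), "4B": (5, 68), "4C": (34, 44)}

results = {}
for sub, (malignant, total) in rater1.items():
    ppv = 100 * malignant / total
    lower, upper = wilson_interval(malignant, total)
    results[sub] = (ppv, 100 * lower, 100 * upper, within_reference(sub, ppv))
    print(f"{sub}: PPV {ppv:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f}%), "
          f"within ACR range: {results[sub][3]}")
```

Under these assumed counts, the sketch flags only subcategory 4C as falling inside its reference range, in line with the comparison reported for both raters.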
Discussion
Our study focused on the analysis and interpretation of various aspects of BI-RADS category 4 lesions. The results revealed valuable insights into lesion categorization, patient demographics, imaging characteristics, histopathological diagnoses, and inter-reader reliability.
Patient demographics revealed an average age of 44.51 years, aligning with previous research that showed a similar age distribution among women with BI-RADS 4 lesions.11 Also, consistent with previous reports, we found a higher prevalence of benign lesions in the younger age group and a higher burden of malignant breast lesions among older women.12,13 Most mammography cases were associated with category B and C breast density. The prevalence of breast density categories indicated a potential association between higher density and risk of breast cancer, a trend consistent with previous studies.14
Our analysis demonstrated cancer PPVs for the BI-RADS 4 subcategories of 1.5% (4A), 7.4% (4B), and 77.3% (4C) for rater 1, and 1.3% (4A), 8.3% (4B), and 72.9% (4C) for rater 2. Our findings were within the ACR 2013 reference range for subcategory 4C but not for 4A and 4B. The results obtained in our study were consistent with the existing literature that suggests a substantially higher PPV for the BI-RADS 4C subcategory.15 However, compared to the same study, the PPVs of subcategories 4A and 4B were significantly lower. The considerably higher PPV for subcategory 4C substantiates its association with a higher likelihood of malignancy, reflecting the importance of accurate subcategorization in guiding clinical decisions. Notably, both rater 1 and rater 2 in our study met the ACR PPV benchmark for subcategory 4C. This indicates that each had a PPV of >50% for lesions assigned to this subcategory. Therefore, such patients can go straight to biopsy without further adjunctive diagnostic tests. Masses that are classified as BI-RADS 4A and 4B often require consistent objective criteria for BI-RADS 4 subcategorization or, in the case of 4A, additional diagnostic tests before being downgraded to BI-RADS 3. A potential improvement that could lead to an increased PPV for BI-RADS categorization is the use of artificial intelligence, by training observers in deep-learning computer-based diagnostic systems.16
The findings highlighted the distribution of BI-RADS 4 lesions between the 2 readers, with variations observed in their subcategorizations. The study assessed inter-rater reliability between rater 1 and rater 2 and found substantial agreement: the weighted Cohen’s Kappa value of 0.79 suggests high consistency in BI-RADS 4 subcategorization. However, the raters differed in feature assignment during imaging characterization, especially regarding lesion density, shape, vascularity, and borders. Despite these significant differences in lesion characterization, the PPV fell within the reference range established by the ACR in 2013 for subcategory 4C. However, the PPV for subcategories 4A and 4B did not align with the reference range. Furthermore, both raters had PPVs for subcategory 4A that lie within the BI-RADS 3 benchmark range (namely, ≤2%), and both had PPVs for subcategory 4B that lie within the BI-RADS 4A benchmark range (namely, >2 to ≤10%). The reasons for these observations in the patient population may be that the overall prevalence of malignant lesions is small, the size of malignant lesions is large, and, therefore, the difference in mean diameter between benign and malignant lesions is large. This indirectly suggests that the majority of the patients studied likely presented clinically with palpable breast lesions, with a relatively small percentage of lesions being detected early through mammography screening. Also, the large and significant discordances in feature assignment between the 2 raters may have contributed to the raters’ inability to achieve the PPV benchmarks for subcategories 4A and 4B.
Therefore, the discrepancy between the 2 raters in the subcategorization of BI-RADS 4A and 4B lesions highlights the potential need for consistent criteria in characterizing BI-RADS 4 lesions. Examples of the disagreements are presented in Appendices 2 and 3. In the first case (Appendix 2), one of the readers classified the lesion as 4C, whereas the other reader classified it as 4B. For the lesion in Appendix 3, the first reader classified it as 4B, and the second reader classified it as 4A.
These differences could be attributed to perception, experience, or interpretation variations. It could also be due to the fact that BI-RADS recommendations are non-strict or non-rigid, which could lead to improper categorization in BI-RADS 4 subcategories.17 Such discrepancies in subcategorization underscore the inherent subjectivity in mammographic interpretation. Comparable studies have documented similar inter-reader variations in BI-RADS assessments.15,18
This study unveiled a diverse range of imaging characteristics in BI-RADS 4 lesions, with masses being the most common presentation (66.4%), followed by other features such as calcifications, architectural distortion, and intraductal masses. This aligns with the characteristic diversity expected within the BI-RADS 4 categories established by ACR guidelines.2 The various imaging characteristics underscore the importance of a comprehensive assessment approach to evaluate lesions for potential malignancy accurately.
Histopathological examination of the lesions showed a predominantly benign nature (83.4%), with benign breast tissue and fibroadenomas being the most frequent findings. These results concur with previous research that reported a higher prevalence of benign lesions within the BI-RADS 4 category.11 However, the notable presence of malignant lesions (16.6%) emphasizes the significance of accurate classification and management of these lesions. Also, there was a marked and significant difference between the mean diameter of benign lesions (1.93 cm) and that of malignant lesions (3.82 cm). The sizes of the lesions, and the marked difference in size between benign and malignant lesions, suggest that the majority of the patients studied might have presented with clinically palpable lesions rather than being detected early through mammography screening or other imaging tests. A previous survey of 1,135 women aged 50 years or older in the study setting revealed that 92% of the women reported never having a mammogram despite the availability of free mammography screening services, and a more recent survey of 3,245 women aged ≥40 years revealed that only 40% of them reported ever having a mammogram.19,20 In addition, several studies in the setting have highlighted substantial perceived barriers to the free mammography screening services among women.21-24
Study limitations
Although our study has provided valuable information regarding the correlation between the different BI-RADS 4 subcategories and histopathology results, some limitations should be addressed. Our study has a retrospective design, and the absence of predefined protocols could introduce various limitations, such as potential selection bias, limitations in reading US images, and the inability to control variables. Our study was carried out exclusively at a single hospital, and this single-center approach could limit the generalizability of our findings to a broader population. Moreover, our study focused primarily on the correlation of BI-RADS 4 subcategorization with immediate histopathological findings. Clinical examination data, such as palpable lump, thickening, and nipple discharge, as well as post-surgical excision and long-term follow-up data, including patient outcomes and progression of identified lesions, could provide a more comprehensive assessment of the accuracy of subcategorization.
In conclusion, this study comprehensively analyzed BI-RADS 4 lesions, encompassing categorization, demographics, imaging characteristics, histopathological diagnoses, and inter-reader reliability. The results contribute to the existing body of knowledge, emphasizing the need for standardized guidelines and improved diagnostic accuracy in managing breast lesions.
Acknowledgment
The authors gratefully acknowledge Research Medics for the English language editing.
Footnotes
Disclosure. Authors have no conflicts of interest, and the work was not supported or funded by any drug company.
- Received May 23, 2024.
- Accepted October 17, 2024.
- Copyright: © Saudi Medical Journal
This is an Open Access journal and articles published are distributed under the terms of the Creative Commons Attribution-NonCommercial License (CC BY-NC). Readers may copy, distribute, and display the work for non-commercial purposes with the proper citation of the original work.