Performance of a point-of-care ultrasound platform for artificial intelligence-enabled assessment of pulmonary B-lines
(2025) In Cardiovascular Ultrasound 23(1).
- Abstract
Background: The incorporation of artificial intelligence (AI) into point-of-care ultrasound (POCUS) platforms has rapidly increased. The number of B-lines present on lung ultrasound (LUS) serves as a useful tool for the assessment of pulmonary congestion. Interpretation, however, requires experience and therefore AI automation has been pursued. This study aimed to test the agreement between the AI software embedded in a major vendor POCUS system and visual expert assessment. Methods: This single-center prospective study included 55 patients hospitalized for various respiratory symptoms, predominantly acutely decompensated heart failure. A 12-zone protocol was used. Two experts in LUS independently categorized B-lines into 0, 1–2, 3–4, and ≥ 5. The intraclass correlation coefficient (ICC) was used to determine agreement. Results: A total of 672 LUS zones were obtained, with 584 (87%) eligible for analysis. Compared with expert reviewers, the AI significantly overcounted the number of B-lines per patient (23.5 vs. 2.8, p < 0.001). A greater proportion of zones with > 5 B-lines was found by the AI than by the reviewers (38% vs. 4%, p < 0.001). The ICC between the AI and reviewers was 0.28 for the total sum of B-lines and 0.37 for the zone-by-zone method. The interreviewer agreement was excellent, with ICCs of 0.92 and 0.91, respectively. Conclusion: This study demonstrated excellent interrater reliability of B-line counts from experts but poor agreement with the AI software embedded in a major vendor system, primarily due to overcounting. Our findings indicate that further development is needed to increase the accuracy of AI tools in LUS.
- author
- Labaf, Ashkan LU; Åhman-Persson, Linda; Husu, Leo Silvén; Smith, J. Gustav LU; Ingvarsson, Annika LU and Evaldsson, Anna Werther LU
- organization
- publishing date
- 2025-12
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Artificial intelligence, B-lines, Lung ultrasound, POCUS
- in
- Cardiovascular Ultrasound
- volume
- 23
- issue
- 1
- article number
- 3
- publisher
- BioMed Central (BMC)
- external identifiers
- pmid:40025516
- scopus:85219638236
- ISSN
- 1476-7120
- DOI
- 10.1186/s12947-025-00338-2
- language
- English
- LU publication?
- yes
- additional info
- Publisher Copyright: © The Author(s) 2025.
- id
- 416e0fbc-2063-4584-9d02-3faa27074721
- date added to LUP
- 2025-03-17 12:39:29
- date last changed
- 2025-07-21 23:36:39
@article{416e0fbc-2063-4584-9d02-3faa27074721,
  abstract  = {{Background: The incorporation of artificial intelligence (AI) into point-of-care ultrasound (POCUS) platforms has rapidly increased. The number of B-lines present on lung ultrasound (LUS) serves as a useful tool for the assessment of pulmonary congestion. Interpretation, however, requires experience and therefore AI automation has been pursued. This study aimed to test the agreement between the AI software embedded in a major vendor POCUS system and visual expert assessment. Methods: This single-center prospective study included 55 patients hospitalized for various respiratory symptoms, predominantly acutely decompensated heart failure. A 12-zone protocol was used. Two experts in LUS independently categorized B-lines into 0, 1–2, 3–4, and ≥ 5. The intraclass correlation coefficient (ICC) was used to determine agreement. Results: A total of 672 LUS zones were obtained, with 584 (87%) eligible for analysis. Compared with expert reviewers, the AI significantly overcounted the number of B-lines per patient (23.5 vs. 2.8, p < 0.001). A greater proportion of zones with > 5 B-lines was found by the AI than by the reviewers (38% vs. 4%, p < 0.001). The ICC between the AI and reviewers was 0.28 for the total sum of B-lines and 0.37 for the zone-by-zone method. The interreviewer agreement was excellent, with ICCs of 0.92 and 0.91, respectively. Conclusion: This study demonstrated excellent interrater reliability of B-line counts from experts but poor agreement with the AI software embedded in a major vendor system, primarily due to overcounting. Our findings indicate that further development is needed to increase the accuracy of AI tools in LUS.}},
  author    = {{Labaf, Ashkan and Åhman-Persson, Linda and Husu, Leo Silvén and Smith, J. Gustav and Ingvarsson, Annika and Evaldsson, Anna Werther}},
  issn      = {{1476-7120}},
  keywords  = {{Artificial intelligence; B-lines; Lung ultrasound; POCUS}},
  language  = {{eng}},
  number    = {{1}},
  publisher = {{BioMed Central (BMC)}},
  journal   = {{Cardiovascular Ultrasound}},
  title     = {{Performance of a point-of-care ultrasound platform for artificial intelligence-enabled assessment of pulmonary B-lines}},
  url       = {{http://dx.doi.org/10.1186/s12947-025-00338-2}},
  doi       = {{10.1186/s12947-025-00338-2}},
  volume    = {{23}},
  year      = {{2025}},
}