Title

Expert-quality Dataset Labeling via Gamified Crowdsourcing on Point-of-Care Lung Ultrasound Data

Location

Hilton Hawaiian Village, Honolulu, Hawaii

Event Website

https://hicss.hawaii.edu/

Start Date

January 3, 2024

End Date

January 6, 2024

Description

Machine learning tools show promise for automating lung ultrasound data interpretation. Building such tools requires labeled training datasets. We tested whether a gamified crowdsourcing approach can produce clinical expert-quality lung ultrasound clip labels. A total of 2,384 lung ultrasound clips were retrospectively collected. Six lung ultrasound experts classified 393 of these clips as having no B-lines, one or more discrete B-lines, or confluent B-lines to create two sets of reference standard labels: a training set and a test set. These sets were used, respectively, to train users on a gamified crowdsourcing platform and to compare the concordance of the resulting crowd labels against the reference standard with the concordance of individual experts. In total, 99,238 crowdsourced opinions were collected from 426 unique users over 8 days. Mean labeling concordance of individual experts relative to the reference standard was 85.0% ± 2.0% (SEM), compared with 87.9% concordance for crowdsourced labels (p = 0.15). Scalable, high-quality labeling approaches such as crowdsourcing may streamline training dataset creation for machine learning model development.
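The abstract does not state how the 99,238 individual crowd opinions were combined into one label per clip before concordance was computed. The sketch below shows one plausible approach, majority voting over per-clip opinions followed by a concordance check against the expert reference standard; all function names, label strings, and toy data are illustrative assumptions, not the study's actual pipeline.

```python
from collections import Counter

# Illustrative sketch only: aggregate crowdsourced opinions into a single label
# per clip by majority vote, then measure concordance against expert reference
# labels. Categories mirror the study's scheme (no B-lines, discrete B-lines,
# confluent B-lines); all identifiers and data here are hypothetical.

def aggregate_crowd_labels(opinions):
    """Map each clip ID to the most frequent crowd label for that clip."""
    return {clip_id: Counter(labels).most_common(1)[0][0]
            for clip_id, labels in opinions.items()}

def concordance(predicted, reference):
    """Fraction of reference-labeled clips whose predicted label matches."""
    matches = sum(predicted[clip_id] == reference[clip_id] for clip_id in reference)
    return matches / len(reference)

# Toy data standing in for crowd opinions and the expert reference standard.
opinions = {
    "clip_001": ["none", "none", "discrete", "none"],
    "clip_002": ["confluent", "confluent", "discrete"],
    "clip_003": ["discrete", "none", "discrete"],
}
reference = {"clip_001": "none", "clip_002": "confluent", "clip_003": "discrete"}

crowd_labels = aggregate_crowd_labels(opinions)
print(f"Crowd-label concordance: {concordance(crowd_labels, reference):.1%}")
```

In practice, a gamified platform might also weight opinions by a user's demonstrated skill on the training set or require a minimum number of opinions per clip before accepting a crowd label; the simple majority vote above is only the most basic aggregation rule.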

https://aisel.aisnet.org/hicss-57/hc/emergency_care/4