Active Learning Techniques for Annotation Efficiency in Detecting Coffee Berry Disease

Björklund, Daniel; Amnemyr, Emma

Björklund, Daniel and Amnemyr, Emma (2024) In Master's Theses in Mathematical Sciences FMAM05 20241
Mathematics (Faculty of Engineering)

Abstract: Arabica coffee production in Africa has declined significantly over the last half century, partly due to Coffee Berry Disease (CBD), caused by the fungus Colletotrichum kahawae. This disease results in substantial economic losses, estimated at USD 350-500 million annually. Recent advances in machine learning (ML) and computer vision offer powerful tools for disease detection. However, annotating data for training ML models is both time-consuming and costly. Active Learning (AL) aims to maximize annotation efficiency by strategically selecting which data points to annotate, so that model performance improves faster for a given annotation effort. This thesis evaluates the impact of using both strong and weak labels in AL for detecting CBD. First, an AL framework was implemented, and four query strategies using only strong annotations were developed and evaluated. One of these strategies, ALCU Soft-Rank, showed promise and appeared to outperform the baselines. This strategy was then extended to determine whether including weak labels could improve performance.
The results indicated that, under the chosen conditions, incorporating weak labels was not beneficial, and the original ALCU Soft-Rank using only strong labels performed best. Further exploration of active learning in this setting, especially with other base models, would be worthwhile.
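The abstract describes active learning only at a high level. As a hedged illustration of the general idea, the sketch below shows a generic pool-based loop with entropy-based uncertainty sampling: train on the labeled set, score every unlabeled image, and send the most uncertain ones to the annotator. This is NOT the thesis's ALCU Soft-Rank strategy, which this summary does not define; the function names, the mean-entropy image score, and the train/predict/annotate callbacks are all assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence

# Generic pool-based active learning with entropy-based uncertainty sampling.
# Illustrative only: NOT the thesis's ALCU Soft-Rank strategy. All names here
# (binary_entropy, image_uncertainty, select_batch, active_learning_loop) are hypothetical.


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a single detection confidence p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp so log() never sees 0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_uncertainty(box_scores: Sequence[float]) -> float:
    """Assumed image score: mean entropy over the detector's box confidences."""
    if not box_scores:
        return 0.0  # assumption: an image with no detections gets the lowest score
    return sum(binary_entropy(p) for p in box_scores) / len(box_scores)


def select_batch(pool_scores: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the `budget` unlabeled image ids with the highest uncertainty."""
    ranked = sorted(
        pool_scores,
        key=lambda img: image_uncertainty(pool_scores[img]),
        reverse=True,
    )
    return ranked[:budget]


def active_learning_loop(
    labeled: Dict[str, object],
    pool: List[str],
    rounds: int,
    budget: int,
    train_fn: Callable[[Dict[str, object]], object],
    predict_fn: Callable[[object, str], Sequence[float]],
    annotate_fn: Callable[[str], object],
) -> Dict[str, object]:
    """Repeatedly train, score the unlabeled pool, and query the most uncertain images."""
    labeled = dict(labeled)  # do not mutate the caller's dict
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break  # everything has been annotated
        model = train_fn(labeled)  # retrain on the current labeled set
        scores = {img: predict_fn(model, img) for img in pool}
        for img in select_batch(scores, budget):
            labeled[img] = annotate_fn(img)  # ask the (human) annotator
            pool.remove(img)
    return labeled
```

With toy confidences {"a": [0.5, 0.5], "b": [0.99], "c": [0.6]}, `select_batch` ranks "a" first and "b" last, because confidences near 0.5 carry the most entropy. Plain uncertainty sampling like this is a common baseline; the thesis compares its own strategies against baselines it does not detail here.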
Popular Abstract (originally in Swedish): Active Learning for Annotation Efficiency in the Detection of Coffee Berry Disease

Two weeks and thirty thousand berries later, I have finally managed to draw boxes around every single berry, all so that my AI model will be as good as possible. My eyes are glazing over and the repetitive work has left me dizzy. If only there were a better way, I think to myself. Or maybe there is? I rush to the search engine and am stunned by what I find... "Active Learning?!"

In recent years, AI has reached the popular media more than ever, and everyone wants to be part of it! Although models are well developed and often available off the shelf, AI models for specific applications usually need some adjustment, which means the models themselves must be adapted. For an AI model to work well, it needs to be trained, just as we humans need to practice if we want to get better at something. Many models are based on so-called deep neural networks, a type of model inspired by how the brain learns. But to learn, such a model needs a lot of information, preferably of high quality; in our case this is called annotated data. In our project, this means, among other things, drawing boxes around berries in images showing different stages of coffee berry disease, so that the model learns what is what. Coffee berry disease (CBD) is a fungal disease that destroys coffee harvests in Africa worth up to USD 300-500 million every year, and with cameras and object detection it can be both tracked and quantified. Through a collaboration with RISE and Mpendakazi Agribusiness in Tanzania, we gained access to images showing different stages of CBD and evaluated active learning principles on this dataset. Active learning addresses exactly the exhausting situation our poor annotator faced in the introduction. It is a tool intended to work out the order in which data should be annotated so that our detection model becomes as good as possible, as cheaply as possible, and as quickly as possible. The idea is that if you want a certain level of performance, or only have a certain budget to spend on annotation, you should be able to actively pick the most useful berries to annotate until either goal is reached. We evaluate this with two different strengths of annotation. The strong annotation is a box drawn around the berry, which includes the disease class the berry belongs to.
The weak annotation is a point annotation, which is quicker to make but gives the model a weaker signal. Unlike the strong annotation, it also carries no class indicating how diseased the berry is. Our results show that there is value in using active learning on our CBD dataset: by applying active learning principles, we can spend less time annotating and still obtain a model with the same performance. However, further development is needed before the weaker point annotation can be considered valuable, even though it is easier to produce. We hope the project continues to develop toward a finished product and that our results have given an overview of active learning and its potential in this application.
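To make the difference between the two annotation types concrete, here is a minimal Python sketch of how they might be represented. The class names, the pixel-coordinate box format, and the `boxes_hit_by_point` helper are assumptions for illustration only; this summary does not describe the thesis's data format or how weak labels were actually used in training.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, (x_min, y_min) to (x_max, y_max)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class StrongLabel:
    """Strong annotation: a box around the berry plus its disease class."""
    box: Box
    disease_class: str  # hypothetical class name, e.g. a CBD stage


@dataclass(frozen=True)
class WeakLabel:
    """Weak annotation: a single point on the berry, with no disease class."""
    x: float
    y: float


def boxes_hit_by_point(predicted: List[Box], label: WeakLabel) -> List[Box]:
    """Hypothetical use of a weak label: return predicted boxes that contain the point."""
    return [b for b in predicted if b.contains(label.x, label.y)]
```

The sketch shows why a point is a weaker signal: it says where a berry is, but gives neither its extent nor its disease class, so anything beyond "this detection covers a berry" has to come from the model itself.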

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9164704

author

Björklund, Daniel and Amnemyr, Emma

supervisor

Karl Åström

organization

Mathematics (Faculty of Engineering)

alternative title

Evaluating the Impact of Strong and Weak Labels for Semi-Supervised Object Detection

course

FMAM05 20241

year

2024

type

H2 - Master's Degree (Two Years)

subject

Technology and Engineering

keywords

active learning, object detection, annotation efficiency, coffee berry disease, semi-supervised learning, computer vision, YOLOv8

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMA-3551-2024

ISSN

1404-6342

other publication id

2024:E57

language

English

id

9164704

date added to LUP

2024-06-24 11:29:17

date last changed

2024-06-24 11:29:17

@misc{9164704,
  abstract     = {{Arabica coffee production in Africa has declined significantly over the last half century, partly due to Coffee Berry Disease (CBD), caused by the fungus Colletotrichum kahawae. This disease results in substantial economic losses, estimated at USD 350-500 million annually. Recent advancements in machine learning (ML) and computer vision offer powerful tools for disease detection. However, annotating data for training ML models is both time-consuming and costly. Active Learning (AL) aims to maximize annotation efficiency by strategically selecting data points for annotation, thereby accelerating model performance improvement. This thesis evaluates the impact of utilizing both strong and weak labels in AL for detecting CBD. Initially, an AL framework was implemented, and four query strategies using only strong annotations were developed and evaluated. One of these strategies, ALCU Soft-Rank, showed promise and appeared to outperform the baselines. This strategy was then further developed to determine whether the inclusion of weak labels could enhance the performance. The results indicated that, under the chosen conditions, incorporating weak labels was not beneficial, and the original ALCU Soft-Rank utilizing only strong labels performed best. Further exploration of active learning in this setting, especially using other base models, would be interesting.}},
  author       = {{Björklund, Daniel and Amnemyr, Emma}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Active Learning Techniques for Annotation Efficiency in Detecting Coffee Berry Disease}},
  year         = {{2024}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
