Estimating the risk of insurance fraud based on tonal analysis
(2022) In Master's Theses in Mathematical Sciences, MASM02 20221, Mathematical Statistics
- Abstract
- Insurance companies use various methods to identify potentially fraudulent claims. As artificial intelligence and machine learning models keep advancing, the industry has great interest in evaluating new methods that combine advanced models with the rich data being gathered. To this end, we evaluated a Long Short-Term Memory (LSTM) network and a residual (ResNet) network, with the purpose of estimating the risk of insurance fraud from acoustic properties of conversations between customers and company representatives.
Furthermore, we drew a connection between identifying conversations that concern fraudulent claims and detecting deceptive speech. With this connection in mind, we simulated deceptive speech by artificially altering the pitch and used the simulated data to evaluate four types of acoustic features: filter bank energies, cepstral coefficients, mel-frequency filter bank energies, and mel-frequency cepstral coefficients (MFCC).
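As a rough illustration of the four feature types named above, the following Python sketch computes them from a power spectrogram. It is not the thesis implementation: the function names, the triangular filter shape, the filter and coefficient counts, and the choice to define the cepstral coefficients as a DCT of the log filter bank energies are all assumptions made for this example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_fft, sample_rate, mel=False):
    """Rows are triangular filters over the n_fft//2 + 1 one-sided FFT bins."""
    f_max = sample_rate / 2.0
    if mel:
        # Filter edges equally spaced on the mel scale
        edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters + 2))
    else:
        # Filter edges equally spaced in Hz
        edges_hz = np.linspace(0.0, f_max, n_filters + 2)
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def acoustic_features(power_spec, sample_rate, n_filters=20, n_ceps=12):
    """power_spec has shape (n_frames, n_fft//2 + 1); returns a dict of four arrays."""
    n_fft = 2 * (power_spec.shape[-1] - 1)
    feats = {}
    for name, mel in (("fbank", False), ("mel_fbank", True)):
        bank = triangular_filterbank(n_filters, n_fft, sample_rate, mel=mel)
        # Small constant avoids log(0) for empty bands
        feats[name] = np.log(power_spec @ bank.T + 1e-10)
    # Assumption: cepstral coefficients are the DCT of the log filter bank energies
    feats["cepstral"] = dct2(feats["fbank"], n_ceps)
    feats["mfcc"] = dct2(feats["mel_fbank"], n_ceps)
    return feats
```

The only difference between the plain and the mel variants in this sketch is how the filter edges are spaced, which makes the two pairs of features directly comparable.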
We found that an LSTM model could be viable with any of the features tried. Additionally, we found that the filter bank energies yielded the best performance, and that they did so because they were computed over a multitaper spectrogram.
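A multitaper spectrogram averages several periodograms of the same frame, each computed with a different orthogonal taper, which lowers the variance of the spectral estimate. The sketch below is a minimal illustration under assumptions: it uses sine tapers, whereas the abstract does not say which taper family the thesis used, and the function names, frame length, hop size and number of tapers are all hypothetical.

```python
import numpy as np

def sine_tapers(frame_len, n_tapers):
    """Orthonormal sine tapers; row k holds the (k+1)-th taper."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def multitaper_spectrogram(signal, frame_len=256, hop=128, n_tapers=4):
    """Average the per-taper periodograms of each frame.

    Returns an array of shape (n_frames, frame_len//2 + 1).
    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    tapers = sine_tapers(frame_len, n_tapers)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] for s in starts])
    # Broadcast to shape (n_frames, n_tapers, frame_len)
    tapered = frames[:, None, :] * tapers[None, :, :]
    spectra = np.abs(np.fft.rfft(tapered, axis=-1)) ** 2
    return spectra.mean(axis=1)
```

The result can be fed to any filter bank in place of a single-window spectrogram, since it has the same layout of frames by frequency bins.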
We did not find any combination of model and feature that could generalize from training data to validation data with respect to real conversations between customers and company representatives.
- Popular Abstract
- Identifying insurance claims that are invalid or even fraudulent is of great importance to insurance companies, but succeeding in doing so is not always straightforward. Insurance companies use different methods for investigating potentially fraudulent claims, some of which include speaking directly to the customer. Some may argue that it would be valuable for insurance companies to be assisted in such conversations by an artificial intelligence that can hint at whether what is being said is typical of fraudulent claims. Such an artificial intelligence could also be beneficial in related areas, such as criminal investigations.
The idea of this project is to look into the possibility of constructing a model that can be used for this purpose. We look at fields that we consider related to the task, namely speaker recognition and lie detection. The reason for considering speaker recognition is that we model acoustic properties of the customer's speech, as opposed to the content of what is being said. From research on lie detection we know that when someone is being deceptive, the pitch of their voice increases. We use this knowledge by simulating pitch alterations in ordinary speech and evaluating various types of acoustic features as input to models that try to identify the simulated alterations.
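One simple way to simulate a pitch alteration is to resample the signal, sketched below in Python. This is an illustrative assumption, not the method used in the thesis, which is not described here: the function name and the semitone parameter are hypothetical, and plain resampling also changes the duration of the speech, which a more careful pitch shifter would avoid.

```python
import numpy as np

def shift_pitch_by_resampling(signal, semitones):
    """Raise (positive) or lower (negative) the pitch by reading the signal
    at a different rate with linear interpolation.

    Note: the output is shorter for upward shifts and longer for downward
    shifts, because tempo changes together with pitch.
    """
    signal = np.asarray(signal, dtype=float)
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the shift
    n_out = int(round(len(signal) / factor))
    read_positions = np.arange(n_out) * factor
    return np.interp(read_positions, np.arange(len(signal)), signal)
```

For example, `shift_pitch_by_resampling(x, 1.0)` raises every frequency in `x` by about 6 percent while shortening the signal by the same ratio.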
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9097486
- author
- Steneld, Henrik LU
- supervisor
- organization
- course
- MASM02 20221
- year
- 2022
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Spectral analysis, Speaker recognition, Tonal analysis, Speaker Diarization, Machine Learning, LSTM, ResNet, Fraud detection
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUNFMS-3112-2022
- ISSN
- 1404-6342
- other publication id
- 2022:E53
- language
- English
- id
- 9097486
- date added to LUP
- 2022-08-17 16:38:57
- date last changed
- 2022-08-18 14:23:23
@misc{9097486,
  abstract     = {{Insurance companies utilize various methods for identifying claims that are of potential fraudulent nature. With the ever progressing field of artificial intelligence and machine learning models, great interest can be found within the industry to evaluate the use of new methods that may arise as a result of new advanced models in combination with the rich data that is being gathered. For this end, we decided to evaluate a Long Short-Term Memory (LSTM) - as well as a residual (ResNet) type of neural network, with the purpose of estimating the risk of insurance fraud based on acoustic properties of conversations between customers and company representatives. Furthermore, we drew a connection between identifying conversations that regard fraudulent claims and detecting deceptive speech. With this connection in mind, we simulated data representing deceptive speech by artificially altering the pitch and used it to evaluate four types of acoustic features: Filter bank energies, cepstral coefficients, mel-frequency filter bank energies, and mel-frequency cepstral coefficients (MFCC). We found that a LSTM model could be viable with either feature tried. Additionally, we found that the filter bank energies yielded the best performance and it did so on the grounds of having been computed over a multitaper spectrogram. We did not find any combination of model and feature that could generalize results from training data onto data used for validation with respect to real conversations between customers and company representatives.}},
  author       = {{Steneld, Henrik}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Estimating the risk of insurance fraud based on tonal analysis}},
  year         = {{2022}},
}