Estimating the risk of insurance fraud based on tonal analysis

Steneld, Henrik

Steneld, Henrik (LU) (2022) In Master's Theses in Mathematical Sciences MASM02 20221
Mathematical Statistics

Abstract: Insurance companies use various methods to identify potentially fraudulent claims. As artificial intelligence and machine learning continue to advance, the industry has shown great interest in evaluating new methods that combine advanced models with the rich data being gathered. To this end, we evaluated two types of neural network, a Long Short-Term Memory (LSTM) network and a residual network (ResNet), for estimating the risk of insurance fraud from the acoustic properties of conversations between customers and company representatives.

Furthermore, we drew a connection between identifying conversations that concern fraudulent claims and detecting deceptive speech. With this connection in mind, we simulated deceptive speech by artificially altering the pitch, and used the simulated data to evaluate four types of acoustic features: filter bank energies, cepstral coefficients, mel-frequency filter bank energies, and mel-frequency cepstral coefficients (MFCC).
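As a rough illustration of the simulation and feature steps described above, the following Python sketch raises the pitch of a test tone and computes log mel-frequency filter bank energies and MFCCs for one frame. It is not the thesis implementation: the resampling-based pitch change (which also shortens the signal), the shift factor, the sample rate, the frame length, the number of filters, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def raise_pitch(signal, factor):
    """Scale all frequencies by `factor` using linear-interpolation resampling.

    Assumption: a simple resampling shift. The output is shorter by `factor`
    when played back at the original sample rate.
    """
    n_out = int(len(signal) / factor)
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x):
    """Unnormalized DCT-II of a 1-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T


def frame_features(frame, bank, n_ceps=13):
    """Return (log mel filter bank energies, MFCCs) for one frame."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_mel = np.log(bank @ power + 1e-10)  # small offset avoids log(0)
    mfcc = dct_ii(log_mel)[:n_ceps]
    return log_mel, mfcc


# Demo on a synthetic 200 Hz tone (stand-in for speech; hypothetical values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)
shifted = raise_pitch(tone, 1.25)  # frequencies scaled by 1.25

bank = mel_filter_bank(n_filters=20, n_fft=512, sample_rate=sample_rate)
log_mel, mfcc = frame_features(shifted[:512], bank)
```

Dropping the mel warping (linearly spaced filters) would give plain filter bank energies, and applying the DCT to a log spectrum without any filter bank would give cepstral coefficients, so the four feature types differ only in which of these two steps are applied.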

We found that an LSTM model could be viable with any of the features tried. Additionally, the filter bank energies yielded the best performance, which we attribute to their having been computed from a multitaper spectrogram.
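To make the idea of filter bank energies computed over a multitaper spectrogram concrete, here is a minimal Python sketch. The abstract does not say which tapers, frame length, or filter shapes were used, so the sine tapers, the rectangular equal-width bands, and every parameter value and function name below are assumptions for illustration only.

```python
import numpy as np


def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers, shape (n_tapers, n_samples).

    Assumption: sine tapers are used here because they have a closed form;
    other taper families (e.g. DPSS) follow the same averaging scheme.
    """
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def multitaper_spectrum(frame, n_tapers=4):
    """Average the periodograms obtained with each taper."""
    tapers = sine_tapers(len(frame), n_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, axis=-1)) ** 2
    return eigenspectra.mean(axis=0)


def multitaper_spectrogram(signal, frame_len, hop, n_tapers=4):
    """Multitaper spectrum of each frame, shape (n_frames, frame_len // 2 + 1)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array(
        [multitaper_spectrum(signal[s:s + frame_len], n_tapers) for s in starts]
    )


def linear_filter_bank_energies(spectrogram, n_filters):
    """Sum the power in `n_filters` adjacent, non-overlapping frequency bands."""
    bands = np.array_split(spectrogram, n_filters, axis=-1)
    return np.stack([band.sum(axis=-1) for band in bands], axis=-1)


# Demo on a synthetic 1000 Hz tone (stand-in for speech; hypothetical values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

spec = multitaper_spectrogram(tone, frame_len=256, hop=128, n_tapers=4)
energies = linear_filter_bank_energies(spec, n_filters=10)
```

Averaging over several orthogonal tapers lowers the variance of the spectral estimate compared with a single-window periodogram, at the cost of some frequency resolution.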

We did not find any combination of model and feature that generalized from training data to validation data for real conversations between customers and company representatives.

Popular Abstract: Identifying insurance claims that are invalid or even fraudulent is of great importance to insurance companies, but succeeding in doing so is not always straightforward. Insurance companies use different methods for investigating potentially fraudulent claims, some of which include speaking directly to the customer. Arguably, it would be valuable for insurance companies to be assisted in such conversations by an artificial intelligence that can hint at whether what is being said is typical of fraudulent claims. Such an artificial intelligence could also be beneficial in related areas, such as criminal investigations.

The idea of this project is to look into the possibility of constructing a model that can be used for this purpose. We look at two fields that we consider related to the task, namely speaker recognition and lie detection. Speaker recognition is relevant because we model the acoustic properties of the customer's speech, as opposed to the content of what is being said. From research on lie detection, we know that the pitch of a person's voice increases when they are being deceptive. We use this knowledge by simulating pitch alterations in ordinary speech and evaluating various types of acoustic features as input to models that try to identify the simulated alterations.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9097486

author: Steneld, Henrik (LU)
supervisor: Maria Sandsten (LU)
organization: Mathematical Statistics
course: MASM02 20221
year: 2022
type: H2 - Master's Degree (Two Years)
subject: Mathematics and Statistics
keywords: Spectral analysis, Speaker recognition, Tonal analysis, Speaker Diarization, Machine Learning, LSTM, ResNet, Fraud detection
publication/series: Master's Theses in Mathematical Sciences
report number: LUNFMS-3112-2022
ISSN: 1404-6342
other publication id: 2022:E53
language: English
id: 9097486
date added to LUP: 2022-08-17 16:38:57
date last changed: 2022-08-18 14:23:23

@misc{9097486,
  abstract     = {{Insurance companies utilize various methods for identifying claims that are of potential fraudulent nature. With the ever progressing field of artificial intelligence and machine learning models, great interest can be found within the industry to evaluate the use of new methods that may arise as a result of new advanced models in combination with the rich data that is being gathered. For this end, we decided to evaluate a Long Short-Term Memory (LSTM) - as well as a residual (ResNet) type of neural network, with the purpose of estimating the risk of insurance fraud based on acoustic properties of conversations between customers and company representatives.

Furthermore, we drew a connection between identifying conversations that regard fraudulent claims and detecting deceptive speech. With this connection in mind, we simulated data representing deceptive speech by artificially altering the pitch and used it to evaluate four types of acoustic features: Filter bank energies, cepstral coefficients, mel-frequency filter bank energies, and mel-frequency cepstral coefficients (MFCC).

We found that an LSTM model could be viable with either feature tried. Additionally, we found that the filter bank energies yielded the best performance and it did so on the grounds of having been computed over a multitaper spectrogram.

We did not find any combination of model and feature that could generalize results from training data onto data used for validation with respect to real conversations between customers and company representatives.}},
  author       = {{Steneld, Henrik}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Estimating the risk of insurance fraud based on tonal analysis}},
  year         = {{2022}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
