Data Analytics for Business and Societal Challenges

Paper Number

2423

Paper Type

short

Description

Fraud is a significant issue for insurers. Previous literature has mainly used supervised learning to detect insurance fraud. However, supervised learning faces significant difficulties in fraud detection, such as very few cases being labeled as fraud and overfitting to the outcomes of pre-existing fraud detection systems, which can lead to overlooking new fraud patterns. Unsupervised learning methods that produce anomaly scores could remedy these difficulties and improve insurance fraud detection systems. However, unsupervised learning must identify anomalies that are conceptually meaningful for fraud. In this paper, we suggest a theoretical framework for choosing features to include in fraud detection models. We evaluate this framework using isolation forests for anomaly detection based on more than 32,000 automobile insurance claims. We further evaluate textual information based on concepts from deception detection in computational linguistics using straightforward clustering methods and state-of-the-art transformers.
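To illustrate what producing anomaly scores with an isolation forest can look like in practice, here is a minimal sketch using scikit-learn's IsolationForest. It is not the authors' implementation: the data are synthetic, and the two feature columns (a claim amount and the number of days between policy start and claim) are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for claim features (hypothetical columns:
# claim amount, days between policy start and claim).
typical_claims = rng.normal(loc=[2000.0, 400.0], scale=[500.0, 100.0], size=(500, 2))
unusual_claims = np.array([[15000.0, 3.0], [12000.0, 5.0]])
X = np.vstack([typical_claims, unusual_claims])

# Fit the forest without labels; random_state fixes the random splits.
forest = IsolationForest(n_estimators=200, random_state=0)
forest.fit(X)

# score_samples returns higher values for more typical points,
# so negate it to get a score where higher means more anomalous.
anomaly_score = -forest.score_samples(X)

# Rank claims from most to least anomalous and keep the top five indices.
most_anomalous = np.argsort(anomaly_score)[::-1][:5]
print(most_anomalous)
```

The ranking only says which claims are unusual with respect to the chosen features. Whether an unusual claim is meaningful for fraud depends on which features go into the model, which is the question the feature selection framework addresses.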

Comments

14-Data

Dec 12th, 12:00 AM

Insurance Fraud and Isolation Forests

