Most studies of spam classification models have focused on improving estimation capability under the assumption that a correct training data set is given. However, most training data are not actually generated by the real email receivers. To explore the effects of Real Email Reviewers (RER) and the variance among Third Party Reviewers (TPR), we define the types of errors that arise in a personalized spam classification model. We experimented with ham and spam mail data classified by the RER and by three groups of TPRs. We found that the discrepancy between RER and TPR is as large as 38%, and that the inconsistency among TPRs is potentially 56%. We also tested using LSTM model outcomes, and the results are similar. These results imply that errors relevant to the real email reviewers should be taken into consideration when developing a personalized AI model for spam classification.
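As a rough illustration of the two quantities reported above, the following minimal sketch computes a label disagreement rate between one reference reviewer and each third-party group, and the share of emails on which the third-party groups do not all agree. The function names, the toy labels, and the definitions themselves are assumptions made for illustration; the abstract does not state how the paper defines discrepancy or inconsistency, and the toy numbers are unrelated to the reported 38% and 56%.

```python
def disagreement_rate(labels_a, labels_b):
    """Fraction of emails on which two reviewers assign different labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must cover the same emails")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    differing = sum(a != b for a, b in zip(labels_a, labels_b))
    return differing / len(labels_a)


def any_disagreement_rate(label_sets):
    """Fraction of emails on which the given reviewers do not all agree."""
    rows = list(zip(*label_sets))
    if not rows:
        raise ValueError("at least one labelled email is required")
    return sum(len(set(row)) > 1 for row in rows) / len(rows)


# Hypothetical toy labels for five emails (not data from the paper).
rer = ["spam", "ham", "spam", "ham", "ham"]
tpr_groups = {
    "tpr_1": ["spam", "spam", "spam", "ham", "ham"],
    "tpr_2": ["ham", "ham", "spam", "ham", "spam"],
    "tpr_3": ["spam", "ham", "ham", "ham", "ham"],
}

# Disagreement between the real reviewer and each third-party group.
rer_vs_tpr = {
    name: disagreement_rate(rer, labels) for name, labels in tpr_groups.items()
}
# Share of emails on which the third-party groups are not unanimous.
tpr_inconsistency = any_disagreement_rate(list(tpr_groups.values()))

print(rer_vs_tpr)         # {'tpr_1': 0.2, 'tpr_2': 0.4, 'tpr_3': 0.2}
print(tpr_inconsistency)  # 0.8
```

A chance-corrected agreement statistic would be a natural alternative to raw disagreement rates; the raw rate is used here only because it maps most directly onto a percentage figure.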
Wang, Fengyao; Lee, Jae Kyu; Huang, Qi; and Dong, Xinpei, "Types of Errors in Personalized Spam Mail Classification" (2022). PACIS 2022 Proceedings. 342.