Abstract

Many organizations maintain identity information for their customers, vendors, and employees. However, compromised identities cannot be retrieved effectively. In this paper we first present a case study of identity problems in a local police department. The study shows that more than half of the sampled suspects have altered identities in the police information system due to deception and errors. We build a taxonomy of identity problems based on our findings. Because of the problems identified, deciding whether two identities match involves some uncertainty. We propose a probability-based multi-layer graphical model to capture this uncertainty. Experiments show that the proposed model performs significantly better than an exact-match search technique. With 20% of the training data labeled, the model with semi-supervised learning achieved performance comparable to that of fully supervised learning.
