Start Date
12-17-2013
Description
Health information systems have greatly increased the availability of medical documents, benefiting healthcare management and research. However, there are growing concerns about privacy when medical documents are shared. Existing approaches to privacy-preserving data sharing deal mostly with structured data. Current privacy techniques for unstructured medical text focus on detecting and removing patient identifiers from the text, which may be inadequate for preserving both privacy and data utility. We propose a novel framework to extract, cluster, de-identify and anonymize patient medical documents. Our framework integrates approaches developed in both the data privacy and health informatics fields. The key novel elements of this framework include (i) a meta-learning approach to extract personal and health information from documents; (ii) a recursive partitioning method to cluster patient documents by medical concept; and (iii) a cluster-level value-enumeration method for anonymization. A prototype system has been implemented and evaluated to demonstrate the effectiveness of the proposed framework.
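To make elements (ii) and (iii) more concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes each document has already been reduced to a set of extracted medical concepts plus one quasi-identifier (`age`). The function names `partition` and `enumerate_values`, the `min_size` parameter, the balanced-split heuristic, and the sample records are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter


def partition(docs, min_size):
    """Recursively split documents by medical concept (assumed heuristic).

    A concept is usable as a split only if both resulting groups keep at
    least `min_size` documents. Among usable concepts, the one giving the
    most balanced split is chosen (ties broken alphabetically).
    """
    counts = Counter(c for d in docs for c in d["concepts"])
    usable = [c for c, n in counts.items()
              if min_size <= n <= len(docs) - min_size]
    if not usable:
        return [docs]  # no admissible split left: this group is a cluster
    split = min(usable, key=lambda c: (abs(2 * counts[c] - len(docs)), c))
    with_c = [d for d in docs if split in d["concepts"]]
    without_c = [d for d in docs if split not in d["concepts"]]
    return partition(with_c, min_size) + partition(without_c, min_size)


def enumerate_values(cluster, fields):
    """Replace each listed field with all distinct values seen in the cluster.

    After this step, every record in the cluster carries the same value
    list for those fields, so they no longer distinguish one record from
    another within the cluster.
    """
    released = []
    for d in cluster:
        masked = dict(d)
        for f in fields:
            masked[f] = sorted({x[f] for x in cluster})
        released.append(masked)
    return released


# Hypothetical sample records (not data from the paper).
docs = [
    {"age": 34, "concepts": {"diabetes", "hypertension"}},
    {"age": 41, "concepts": {"diabetes"}},
    {"age": 58, "concepts": {"asthma"}},
    {"age": 62, "concepts": {"asthma", "hypertension"}},
]

clusters = partition(docs, min_size=2)
released = [enumerate_values(c, ["age"]) for c in clusters]
```

In this sketch, `min_size` plays a role similar to a minimum group size in k-anonymity style methods: no cluster is split further if doing so would leave fewer than `min_size` documents on either side. Grouping by shared medical concept before enumerating values is one way to keep the released value lists narrow, which is the kind of privacy and utility trade-off the abstract refers to.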
Recommended Citation
Li, Xiao-Bai and Qin, Jialun, "A Framework for Privacy-Preserving Medical Document Sharing" (2013). ICIS 2013 Proceedings. 5.
https://aisel.aisnet.org/icis2013/proceedings/HealthcareIS/5