Start Date

12-17-2013

Description

Health information systems have greatly increased the availability of medical documents, benefiting healthcare management and research. However, there are growing privacy concerns about sharing these documents. Existing approaches to privacy-preserving data sharing deal mostly with structured data. Current privacy techniques for unstructured medical text focus on detecting and removing patient identifiers, which may be inadequate for preserving both privacy and data utility. We propose a novel framework to extract, cluster, de-identify, and anonymize patient medical documents. The framework integrates approaches developed in both the data privacy and health informatics fields. Its key novel elements are (i) a meta-learning approach to extract personal and health information from documents; (ii) a recursive partitioning method to cluster patient documents by medical concept; and (iii) a cluster-level value-enumeration method for anonymization. A prototype system has been implemented and evaluated to demonstrate the effectiveness of the proposed framework.
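
The abstract names the clustering and anonymization steps without giving their details, so the following is only a minimal Python sketch of one possible reading, not the authors' implementation. It assumes the extraction step has already produced one record per document. The record fields (`doc_id`, `age`, `zip`, `concepts`), the function names `partition` and `enumerate_values`, the minimum cluster size `k`, and the sample records are all hypothetical and made up for illustration.

```python
from collections import Counter


def partition(docs, k, used=frozenset()):
    """Recursively split docs by medical concept (illustrative assumption).

    A split on a concept is accepted only if both sides keep at least
    k documents; otherwise the current group is returned as one cluster.
    """
    # Count, per unused concept, how many documents mention it.
    counts = Counter(c for d in docs for c in d["concepts"] if c not in used)
    # Deterministic order: most frequent first, ties broken by name.
    for concept, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= k and len(docs) - n >= k:
            with_c = [d for d in docs if concept in d["concepts"]]
            without_c = [d for d in docs if concept not in d["concepts"]]
            seen = used | {concept}
            return partition(with_c, k, seen) + partition(without_c, k, seen)
    return [docs]


def enumerate_values(cluster, attrs):
    """Replace each listed attribute with the sorted list of all values
    that occur in the cluster, so records in the same cluster carry
    identical values for those attributes."""
    pooled = {a: sorted({d[a] for d in cluster}) for a in attrs}
    return [{**d, **pooled} for d in cluster]


# Made-up records standing in for the output of the extraction step.
docs = [
    {"doc_id": 1, "age": 34, "zip": "30322", "concepts": {"diabetes", "hypertension"}},
    {"doc_id": 2, "age": 41, "zip": "30329", "concepts": {"diabetes"}},
    {"doc_id": 3, "age": 58, "zip": "30322", "concepts": {"asthma"}},
    {"doc_id": 4, "age": 63, "zip": "30030", "concepts": {"asthma", "hypertension"}},
]

clusters = partition(docs, k=2)
released = [enumerate_values(c, ["age", "zip"]) for c in clusters]
```

In this sketch every released record in a cluster shares the same enumerated `age` and `zip` lists, while the medical concepts that defined the cluster are left untouched. That is one way to read the trade-off between privacy and data utility that the description raises.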


A Framework for Privacy-Preserving Medical Document Sharing
