Abstract

Digital library collections usually hold resources describing a limited set of topics spanning a wide range of reading levels, requiring complete reading level metadata to filter relevant resources from the collection. In order to suggest the reading level for all resources in the test collection, we propose an SVM-based classification tool which predicts the specific reading level with an F-Measure of 0.70 for all resources, outperforming other classification methods and readability formulas under evaluation. To measure the impact of reading level metadata completeness on retrieval performance, a knowledge based system retrieves documents from three collections containing different reading level completeness: one with complete reading level information generated by the proposed SVM method, one missing all reading level information, and the final collection containing limited, human-expert provided metadata. The dataset with automatically identified complete reading level exceeds the performance of collection-provided reading level metadata for all five sample tasks.

Share

COinS
 

Improving Access to Digital Library Resources by Automatically Generating Complete Reading Level Metadata

Digital library collections usually hold resources describing a limited set of topics spanning a wide range of reading levels, requiring complete reading level metadata to filter relevant resources from the collection. In order to suggest the reading level for all resources in the test collection, we propose an SVM-based classification tool which predicts the specific reading level with an F-Measure of 0.70 for all resources, outperforming other classification methods and readability formulas under evaluation. To measure the impact of reading level metadata completeness on retrieval performance, a knowledge based system retrieves documents from three collections containing different reading level completeness: one with complete reading level information generated by the proposed SVM method, one missing all reading level information, and the final collection containing limited, human-expert provided metadata. The dataset with automatically identified complete reading level exceeds the performance of collection-provided reading level metadata for all five sample tasks.