Abstract

Creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it involves trade-offs between the expressiveness of the wrapper’s language and its safety. In addition, little attention has been paid to the execution of a wrapper in a restricted environment. In this paper we present a new wrapping language, Serrano, that has three goals: (1) the ability to run in a restricted environment, such as a browser extension, (2) extensibility, to balance the trade-offs between the expressiveness of a command set and safety, and (3) processing capabilities that eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and has provided encouraging results.

Recommended Citation

Novella, T. & Holubova, I. (2017). User-friendly and Extensible Web Data Extraction. In Paspallis, N., Raspopoulos, M. Barry, M. Lang, H. Linger, & C. Schneider (Eds.), Information Systems Development: Advances in Methods, Tools and Management (ISD2017 Proceedings). Larnaca, Cyprus: University of Central Lancashire Cyprus. ISBN: 978-9963-2288-3-6. http://aisel.aisnet.org/isd2014/proceedings2017/ISDMethodologies/13.


User-friendly and Extensible Web Data Extraction