Abstract

Public broadcasters find themselves in a difficult situation when it comes to digital offerings. More and more use cases require metadata, e.g. to allow radio editors to search for content pieces, to set up content-based recommendation services, to let users browse by categories or tags, or to optimize content for search engines. Broadcasters need proper metadata to manage digital products and to offer new and timely services. Public broadcasters often already have their own taxonomy of keywords at hand. However, manually distilling metadata, in particular in the form of keywords, can become a bottleneck in operation. Automatic keyword generation, on the other hand, does not always provide the desired accuracy and still requires continuous human effort to train classifiers and monitor their accuracy. Building upon recent word embedding techniques, we present a novel approach that assigns keywords from a taxonomy to documents on the basis of distributed representations of words and documents. The approach does not require annotation by human experts, and we evaluate it on a large dataset from a German nationwide broadcaster. Preliminary results are promising and suggest that keywords can be generated automatically, in an unsupervised way, in the public radio sector.
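To illustrate the general idea of assigning taxonomy keywords via distributed representations, the following is a minimal sketch, not the method evaluated here. It assumes pre-trained word vectors are available as a plain dictionary. As a simplifying stand-in for a learned document representation, it builds both document and keyword vectors by averaging word vectors, then ranks taxonomy keywords by cosine similarity to the document. The function names `mean_vector`, `cosine`, and `assign_keywords`, as well as the `top_k` parameter, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all known tokens; None if no token is known.

    Averaging is an assumed simplification, not necessarily the
    document representation used in the paper.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def assign_keywords(
    doc_tokens: Sequence[str],
    taxonomy: Sequence[str],
    embeddings: Dict[str, Vector],
    top_k: int = 3,
) -> List[str]:
    """Hypothetical unsupervised tagger: rank taxonomy keywords by
    cosine similarity between the keyword vector and the document vector.
    No human-annotated training data is used."""
    doc_vec = mean_vector(doc_tokens, embeddings)
    if doc_vec is None:
        return []
    scored = []
    for keyword in taxonomy:
        # Multi-word keywords are embedded by averaging their lowercased words.
        kw_vec = mean_vector(keyword.lower().split(), embeddings)
        if kw_vec is None:
            continue  # keyword has no known words, so it cannot be scored
        scored.append((cosine(doc_vec, kw_vec), keyword))
    # Highest similarity first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kw for _, kw in scored[:top_k]]


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented purely for illustration.
    toy_embeddings = {
        "tor": [0.9, 0.1],
        "bundesliga": [0.95, 0.05],
        "fussball": [1.0, 0.0],
        "wahl": [0.0, 1.0],
        "parlament": [0.1, 0.9],
        "sport": [0.9, 0.2],
        "politik": [0.05, 1.0],
    }
    print(assign_keywords(["tor", "bundesliga", "fussball"],
                          ["Sport", "Politik"], toy_embeddings, top_k=1))
```

In practice the toy vectors would be replaced by embeddings trained on the broadcaster's own corpus, and a similarity threshold could be added so that weakly related keywords are not assigned at all.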
