In the age of e-business, many companies are faced with massive data sets that must be analysed to gain a competitive edge. These data sets are in many instances incomplete and quite often of poor quality. Although statistical analysis can be used to pre-process these data sets, this technique has its own limitations. In this paper we present a system, LH5, and its underlying model, which can be used to investigate the integrity of existing data and to pre-process the data into clearer data sets to be mined. LH5 is a rule-based system capable of self-learning, and is illustrated using a medical data set.