Abstract

Discovering cluster characteristics and interesting rules that describe smokers’ clusters and the behavioural patterns behind smokers’ quitting intentions is an important task in the development of effective tobacco control systems. In this paper, we attempt to determine the characteristics of smokers’ clusters and simplified rules for predicting smokers’ quitting behaviour, which can provide feedback for building a scientific, evidence-based adaptive tobacco control system. Standard clustering algorithms group data based on their inherent patterns. However, they seldom provide an easy, human-understandable description of the clusters. Likewise, standard decision tree (SDT) based rule discovery depends on decision boundaries in the feature space. This may limit the ability of an SDT to learn intermediate concepts for large, high-dimensional datasets such as tobacco control data. In this paper, we propose a cluster-based rule discovery model (CRDM) that builds conceptual groups from which a set of decision trees (a decision forest) is constructed to find smokers’ quitting rules. We also employ a re-labelling of unsupervised cluster (RLUC) approach to determine the characteristics of the clusters. The RLUC approach re-labels the clustered data and then applies a decision tree to find the characteristics of the smokers’ clusters. Experimental results on the tobacco control data set show that decision rules from the decision forest constructed by CRDM are simpler and predict smokers’ quitting intention more accurately than those from a single decision tree. The RLUC approach finds text-based characteristics of the smokers’ clusters that are easily understandable for policy makers in tobacco control systems.
