LANGUAGE English
SOURCE APCC 2015, Japan, Oct. 1-3,2015.
Published Date:2015-10
ABSTRACT
User behaviour analysis based on traffic log in wireless networks can be beneficial to many fields in real life: not only for commercial purposes, but also for improving network service quality and social management. We cluster users into groups marked by the most frequently visited websites to find their preferences. In this paper, we propose a user behaviour model based on Topic Model from document classification problems. We use the logarithmic TF-IDF (term frequency - inverse document frequency) weighing to form a high-dimensional sparse feature matrix. Then we apply LSA (Latent semantic analysis) to deduce the latent topic distribution and generate a low-dimensional dense feature matrix. K-means++, which is a classic clustering algorithm, is then applied to the dense feature matrix and several interpretable user clusters are found. Moreover, by combining the clustering results with additional demographical information, including age, gender, and financial information, we are able to uncover more realistic implications from the clustering results. Keywords—traffic log, user behaviour modeling, clustering analysis, topic model.