To effectively cluster corpora of ordinary documents and of digital books, clustering algorithms based on the LDA model and on TC_LDA, respectively, are proposed. TC_LDA, an extension of LDA, is a topic model for digital book corpora that jointly models topics from both Texts and Contents. Unlike traditional clustering methods, topic-model-based methods place documents in the same group if they share one or more common topics. Empirical evaluation demonstrates that our approach based on topic analysis can substantially improve clustering results compared with related methods.
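
The idea of grouping documents that share a topic can be illustrated with a minimal sketch. It assumes the per-document topic proportions have already been inferred by a topic model such as LDA; the function name `cluster_by_shared_topics`, the threshold of 0.3, and the sample proportions are hypothetical choices made for illustration and are not taken from the proposed algorithms.

```python
from collections import defaultdict


def cluster_by_shared_topics(doc_topic, threshold=0.3):
    """Group documents under every topic whose proportion meets the threshold.

    doc_topic maps a document id to its list of topic proportions
    (assumed to come from an already fitted topic model).
    A document may land in more than one group.
    """
    clusters = defaultdict(list)
    for doc_id, proportions in doc_topic.items():
        for topic, weight in enumerate(proportions):
            if weight >= threshold:
                clusters[topic].append(doc_id)
    return dict(clusters)


# Hypothetical topic proportions for four documents over three topics.
doc_topic = {
    "d1": [0.70, 0.20, 0.10],
    "d2": [0.45, 0.50, 0.05],
    "d3": [0.05, 0.90, 0.05],
    "d4": [0.10, 0.10, 0.80],
}

clusters = cluster_by_shared_topics(doc_topic)
for topic, docs in sorted(clusters.items()):
    print(topic, docs)
```

In this sketch, document d2 is assigned to two groups because it carries substantial weight on two topics, which is the kind of overlap a hard partition by distance would not express.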