Hyper-classification framework with heterogeneous web features

Document Type

Conference Proceeding

Publication Title

Business Transformation through Innovation and Knowledge Management: An Academic Perspective - Proceedings of the 14th International Business Information Management Association Conference, IBIMA 2010


Web page classification has produced a wide range of findings, reported by different researchers using different algorithms. The field has grown from single-label classification at its origin to multi-label classification today. The demand for better performance and timely results has also driven a shift from supervised to unsupervised approaches, as demonstrated by the many automatic machine learning algorithms available from previous research. The results reported or claimed for these classification algorithms vary with the particular approach and algorithm adopted, and we anticipate that many more such algorithms and variations will appear. In this work we propose a Hyper-Classification Framework that takes a Web page from a given dataset and automatically assigns the best classifying algorithm(s) to it, using the geometry features of the Web page in combination with multiple other Web features. We conducted experiments on a set of Web pages from the Yahoo! Directory, a renowned Web taxonomy maintained by human editors at yahoo.com. Our results show that it is possible to improve both the performance of existing classifiers and their resource consumption over a large-scale Web taxonomy.
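The abstract does not say how the framework chooses a classifier for a page, so the following is only a minimal sketch of the general idea of per-page algorithm selection, not the authors' method. It assumes each page is reduced to a small numeric feature vector (the two features here, a text-block area ratio and a link density, are hypothetical stand-ins for geometry and other Web features) and that a page is routed to the candidate classifier that was correct on the most similar training pages. The names `fit_selector`, `select_classifier`, `text_clf`, and `link_clf` are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def fit_selector(
    pages: List[Tuple[Features, str]],
    classifiers: Dict[str, Classifier],
) -> Dict[str, Tuple[float, ...]]:
    """For each classifier, average the features of the pages it labels correctly."""
    centroids: Dict[str, Tuple[float, ...]] = {}
    for name, clf in classifiers.items():
        hits = [feats for feats, label in pages if clf(feats) == label]
        if hits:  # a classifier with no correct pages gets no centroid
            centroids[name] = tuple(sum(col) / len(hits) for col in zip(*hits))
    return centroids


def select_classifier(feats: Features, centroids: Dict[str, Tuple[float, ...]]) -> str:
    """Return the name of the classifier whose centroid is closest to this page."""
    def sq_dist(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[name]))
    return min(centroids, key=sq_dist)


# Hypothetical candidate classifiers over (text_area_ratio, link_density).
def text_clf(f: Features) -> str:
    return "article" if f[0] > 0.5 else "shopping"


def link_clf(f: Features) -> str:
    return "directory" if f[1] > 0.5 else "article"


classifiers = {"text_clf": text_clf, "link_clf": link_clf}

# Toy labelled pages (made up for illustration, not from the paper's dataset).
training = [
    ((0.9, 0.1), "article"),
    ((0.2, 0.9), "directory"),
    ((0.1, 0.2), "shopping"),
    ((0.8, 0.2), "article"),
]

centroids = fit_selector(training, classifiers)

# Route a new, link-heavy page to a classifier, then classify it.
page = (0.3, 0.95)
chosen = select_classifier(page, centroids)
label = classifiers[chosen](page)
```

In this toy setup the selector is itself a nearest-centroid rule; a real system could replace it with any learned model, and could return several classifiers rather than one.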
