University of Ottawa at TREC 2010 Web Track: Ranking Web Documents Using Meta-Data.
الباحث الأول:
Falah Al-akashi
الباحثين الآخرين:
Prof Dr Diana Inkpen
المجلة:
Proceedings of The Nineteenth Text REtrieval Conference, TREC 2010, Gaithersburg, Maryland, USA, Nov
تاريخ النشر:
None
مختصر البحث:
This paper describes the details of our participation in the Web track of the TREC 2010. Our approach collects information from the meta data and from some html tags of the web pages. Meta data is often used by the authors to describe the main topic…
This paper describes the details of our participation in the Web track of the TREC 2010. Our approach collects information from the meta data and from some html tags of the web pages. Meta data is often used by the authors to describe the main topic or purpose of the webpage. Usually, the index words are collected from the visible data, but our method is preferable, when the webpage contains only multimedia data, such as images or animations; when the webpage contains only URL links or very little text data (e.g., home pages); when the webpage contains several different topics; when documents are repetitive; when we want to reduce the size of the inverted index; and when the weighting of a webpage depends on specific topics, not on word distribution. We indexed only selected sections of each webpage: “URL”, “Title”, “Meta”, “Header”, and “Alt” fields. The “Header” field is the only visible text field that we used.