Analysis and Evaluation of Measures of Retrieval Performance
Award: NSF IIS-0534482
PI: Javed A. Aslam
Institution: Northeastern University
Summary
Search engines and other information retrieval technologies are
critical in the digital age. The goal of this project is to develop
novel paradigms for analyzing and efficiently evaluating retrieval
performance. Two novel frameworks are proposed: (1) an
information-theoretic framework within which one can quantifiably
assess what various measures of retrieval performance are actually
measuring, and (2) a statistical framework within which one can
efficiently estimate these measures. The former provides a
theoretical underpinning for retrieval evaluation and analysis; the
latter provides a practical methodology for efficiently evaluating
search engines on a large scale. Each is intended to foster and
enable research leading to better search engines and other retrieval
technologies.
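
To make the statistical estimation framework concrete, here is a
minimal Python sketch of estimating average precision from a uniform
random sample of relevance judgments. It is an illustrative estimator
only, not the statAP or infAP methods developed in this project, and
the function name and document ids are hypothetical.

    # Illustrative sketch: estimate average precision (AP) when only a
    # uniform random sample of documents has been judged.  This is NOT the
    # project's statAP/infAP estimators; names and ids are hypothetical.
    def estimate_average_precision(ranking, judgments):
        """ranking: document ids in ranked order (best first).
        judgments: dict mapping *judged* document ids to 0/1 relevance;
        unjudged documents are simply absent."""
        contributions = []
        judged_above = 0    # judged documents at earlier ranks
        relevant_above = 0  # judged relevant documents at earlier ranks
        for k, doc in enumerate(ranking, start=1):
            rel = judgments.get(doc)
            if rel == 1:
                # Precision at rank k: this document plus the estimated
                # number of relevant documents among the k-1 ranked above
                # it, extrapolated from the judged documents seen so far.
                if judged_above > 0:
                    est_prec = (1 + (k - 1) * relevant_above / judged_above) / k
                else:
                    est_prec = 1.0 / k
                contributions.append(est_prec)
            if rel is not None:
                judged_above += 1
                relevant_above += rel
        # AP is the mean of precision-at-rank over the relevant documents;
        # under uniform sampling, the mean over the judged relevant
        # documents estimates that quantity.
        return sum(contributions) / len(contributions) if contributions else 0.0

    # Example: a six-document ranking with only three documents judged.
    ranking = ["d3", "d7", "d1", "d9", "d2", "d5"]
    judged = {"d3": 1, "d1": 0, "d2": 1}
    print(estimate_average_precision(ranking, judged))  # 0.8

When every document in the list is judged (and all relevant documents
appear in it), the estimate reduces to the usual average precision;
sampling-based estimators of this kind trade judging effort for a
quantifiable amount of estimation error.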
Current Personnel
Former Personnel
- Evangelos Kanoulas (now a post-doc at the University of Sheffield, UK)
- Emine Yilmaz (now at Microsoft Research, Cambridge, UK)
- Alan Feuer (founded Blossom Software)
- Olen Zubaryeva (pursuing a PhD in Switzerland)
- Carlos Rei
Publications
- Implementing and Evaluating Phrasal Query Suggestions for Proximity Search. Information Systems, 34(8):711-723, December 2009.
- Empirical Justification of the Gain and Discount Function for nDCG. In Proceedings of the Eighteenth ACM Conference on Information and Knowledge Management (CIKM), pages 611-620. ACM Press, November 2009.
- Modeling the Score Distributions of Relevant and Non-relevant Documents. In Proceedings of the 3rd International Conference on Theory in Information Retrieval (ICTIR), pages 152-163. Lecture Notes in Computer Science, Vol. 5766. Springer, September 2009.
- Document Selection Methodologies for Efficient and Effective Learning-to-rank. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 468-475. ACM Press, July 2009.
- If I Had a Million Queries. In Advances in Information Retrieval: 31st European Conference on IR Research (ECIR), pages 288-300. Lecture Notes in Computer Science, Vol. 5478. Springer-Verlag, April 2009.
- Million Query Track 2007 Overview. In The Sixteenth Text REtrieval Conference Proceedings (TREC 2007), pages 85-104. National Institute of Standards and Technology, December 2008. NIST Special Publication SP 500-274.
- The Hedge Algorithm for Metasearch at TREC 2007. In The Sixteenth Text REtrieval Conference Proceedings (TREC 2007). National Institute of Standards and Technology, December 2008. NIST Special Publication SP 500-274.
- Estimating Average Precision When Judgments are Incomplete. Knowledge and Information Systems, 16(2):173-211, August 2008.
- Evaluation Over Thousands of Queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 651-658. ACM Press, July 2008.
- A New Rank Correlation Coefficient for Information Retrieval. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 587-594. ACM Press, July 2008.
- A Simple and Efficient Sampling Method for Estimating AP and NDCG. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 603-610. ACM Press, July 2008.
- Empirical Justification of the Discount Function for nDCG [abstract]. In Proceedings of the SIGIR 2008 Workshop: Beyond Binary Relevance: Preferences, Diversity and Set-Level Judgments, page 6. July 2008.
- Inferring Document Relevance from Incomplete Information. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), pages 633-642. November 2007.
- Evaluation of Phrasal Query Suggestions. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), pages 841-848. November 2007.
- The Hedge Algorithm for Metasearch at TREC 2006. In The Fifteenth Text REtrieval Conference Proceedings (TREC 2006). National Institute of Standards and Technology, September 2007. NIST Special Publication SP 500-272.
- Query Hardness Estimation Using Jensen-Shannon Divergence Among Multiple Scoring Functions. In Advances in Information Retrieval: 29th European Conference on IR Research (ECIR 2007), pages 198-209. Lecture Notes in Computer Science, Vol. 4425. Springer-Verlag, 2007.
- Estimating Average Precision with Incomplete and Imperfect Judgments. In Proceedings of the Fifteenth ACM International Conference on Information and Knowledge Management (CIKM), pages 102-111. ACM Press, November 2006.
- A Statistical Method for System Evaluation Using Incomplete Judgments. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 541-548. ACM Press, August 2006.
- Inferring Document Relevance via Average Precision. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 601-602. ACM Press, August 2006.
statAP at TREC
infAP at TREC
Acknowledgment and Disclaimer
This material is based upon work supported by the National Science
Foundation under Grant No. IIS-0534482. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation (NSF).