
An Intelligent Search Engine for Extracting Documents Relevant to Poorly Defined Criteria

Magda B. Fayek, Hatem M. El-Boghdadi, Mohamed A. Gawad

Abstract


Information retrieval (IR) deals with the representation, storage, and organization of, and access to, information items. Users’ queries to search engines are often poorly formulated and hence do not express exactly what the user is searching for. Such poorly defined criteria result in the retrieval of documents that do not exactly meet user expectations. Many attempts have been made to refine document retrieval through interaction with the user. Most of these attempts provide the user with functionality for editing queries and marking documents. Many users find this functionality too complicated and hardly use it. In this paper we present an intelligent search engine that targets such poorly defined queries and interactively helps users fine-tune their search. The user merely specifies which of the initially retrieved documents are most relevant to the request. The system then uses this relevance feedback to automatically update the search criteria initially submitted by the user, and the search results are updated to improve the selection of documents retrieved. The system adopts RBIR (Ranked Boolean IR), a modified Boolean model that estimates document relevance using keyword weights to rank search results. Its accuracy is comparable with that of the Vector Space model, while its processing overhead remains low. Results show that a remarkable improvement in precision is achieved as early as the first iteration after relevance feedback, especially for very poorly defined criteria and at low recall. As the recall rate increases the improvement in precision drops; however, some improvement remains even at a recall rate of 100%. On average, the performance of RBIR with relevance feedback is always better than that of both vector space and RBIR without feedback. The average improvement ranges between 12% and 60% relative to vector space and 32% and 25% relative to RBIR at low recall rates. As queries become less well defined, the enhancement becomes more pronounced.
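
The feedback loop described above can be illustrated with a minimal Python sketch. It is not the authors' RBIR implementation: the abstract does not give the weighting formula, so the scoring rule (sum of the weights of the query keywords present in a document), the update rule (add a fixed boost scaled by the fraction of user-marked documents containing a term), and the names `rank`, `update_query`, and `boost` are all assumptions chosen for illustration.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def rank(query_weights, docs):
    """Boolean OR match, then rank by the summed weights of matching keywords.

    Assumed scoring rule for illustration only; the paper's RBIR formula
    is not stated in the abstract.
    """
    scored = []
    for doc_id, text in docs.items():
        terms = set(tokenize(text))
        score = sum(w for t, w in query_weights.items() if t in terms)
        if score > 0:  # documents matching no keyword are not retrieved
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def update_query(query_weights, docs, relevant_ids, boost=0.5):
    """Return new keyword weights after the user marks relevant documents.

    Hypothetical update rule: each term gains `boost` times the fraction
    of marked documents that contain it.
    """
    updated = dict(query_weights)
    for doc_id in relevant_ids:
        for term in set(tokenize(docs[doc_id])):
            updated[term] = updated.get(term, 0.0) + boost / len(relevant_ids)
    return updated


# Toy collection: the query "jaguar" is ambiguous (car vs. animal).
docs = {
    "d1": "jaguar speed car engine",
    "d2": "jaguar habitat rainforest cat",
    "d3": "big cat rainforest prey",
    "d4": "car engine repair",
}
query = {"jaguar": 1.0}

initial = rank(query, docs)
# The user marks d2 (the animal) as relevant; the query is updated automatically.
refined_query = update_query(query, docs, relevant_ids=["d2"])
reranked = rank(refined_query, docs)

print("initial: ", initial)
print("reranked:", reranked)
```

In this toy run the user only marks one document and never edits the query. After the update, the marked document moves to the top of the ranking, and a related document that the original one-word query could not match is newly retrieved.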

Keywords


Boolean IR Model, IR Evaluation, Relevance Feedback, Recall-Precision Measure, Vector Space Model.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.