Improving Information Retrieval (IR) Across Multiple Languages (Project)

Duration: 2-4 Months

Internship Type: Undergraduate


Project Overview/Background

  • Information retrieval (IR) involves processing and analysing unstructured data to find the information most relevant to a user's query. Multilingual IR aims to retrieve relevant information from data written in multiple languages. As the name suggests, translation is a key problem in multilingual IR. Multilingual IR also involves an additional step: merging the results of multiple queries (for example, one per language) into a single ranked list.
  • Building a test collection for IR is also a challenge, as it traditionally involves manually judging the relevance of each document to each query. It would be useful to explore methods of evaluating IR without relevance judgements.
  • One approach to IR across documents in multiple languages is to rely on machine translation to search across languages. With this approach, the accuracy of the translation is an important factor in the accuracy of the retrieved results. However, other approaches to multilingual IR compare documents in different languages directly, without using machine translation.
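To illustrate the result-merging step, here is a minimal sketch of Reciprocal Rank Fusion (RRF), one common way to combine ranked lists whose raw scores are not directly comparable (for example, lists from separate per-language indexes). It is shown only as an example technique, not as the method this project prescribes. The function name, the document IDs, and the smoothing constant k = 60 (a value commonly used in practice) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    so only ranks are used and raw retrieval scores are ignored.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical results for one query translated into two languages
    # and run against the same collection.
    list_a = ["d1", "d2", "d3"]
    list_b = ["d3", "d1", "d4"]
    print(reciprocal_rank_fusion([list_a, list_b]))
```

Here "d1" and "d3" are boosted because both lists retrieve them, giving the order d1, d3, d2, d4. Because RRF uses only ranks, it sidesteps the problem of normalising scores across language-specific retrieval systems.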

 

Scope & Deliverables

  • Review the research literature to understand the state of the art in multilingual IR
  • Experiment with existing techniques for IR evaluation without relevance judgements, multilingual query expansion, and multilingual IR
  • Conceive and implement a methodology to score the accuracy/precision of multilingual IR techniques
  • Implement and evaluate the IR techniques
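As one possible starting point for scoring IR techniques without manual relevance judgements, the sketch below builds "pseudo-relevance judgements" by consensus: a document counts as relevant if it appears near the top of at least `min_votes` of the compared systems' rankings. Each system is then scored by precision at k (P@k) against that shared set. This is an illustrative assumption, not the project's methodology. The function names, run names, and default parameters are hypothetical.

```python
from collections import Counter


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def pseudo_relevant_set(system_rankings, depth=10, min_votes=2):
    """Hypothetical judgement-free proxy: a document is treated as relevant
    if it appears in the top `depth` results of at least `min_votes` systems."""
    votes = Counter()
    for ranking in system_rankings.values():
        votes.update(set(ranking[:depth]))  # one vote per system per document
    return {doc_id for doc_id, n in votes.items() if n >= min_votes}


def score_systems(system_rankings, k=5, depth=10, min_votes=2):
    """Score each system by P@k against the shared pseudo-relevant set."""
    pseudo_qrels = pseudo_relevant_set(system_rankings, depth, min_votes)
    return {name: precision_at_k(ranking, pseudo_qrels, k)
            for name, ranking in system_rankings.items()}


if __name__ == "__main__":
    # Hypothetical runs for a single query from three multilingual IR setups.
    runs = {
        "mt_bm25": ["d1", "d2", "d3", "d4"],
        "dense": ["d2", "d1", "d5", "d6"],
        "mt_tfidf": ["d1", "d3", "d7", "d2"],
    }
    print(score_systems(runs, k=4))
```

With these runs, d1, d2 and d3 become pseudo-relevant, so the scores are 0.75, 0.5 and 0.75. Consensus-based scores tend to favour systems that agree with the majority and can penalise a system that finds relevant documents the others miss. They are therefore best treated as a rough signal and checked against a small manually judged sample where possible.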

 

Prerequisites/Skills Required 

  • Proficient in Python programming and debugging
  • Innovative and willing to learn on the job
  • Good communication and presentation skills
  • Strong analytical and troubleshooting skills