Assignment Set 11 (Uploaded on November 8, 2013)
Deadline date: November 29, 2013

Assignment 1:

This is an open-ended assignment on a simplified web search mechanism. You will be supplied with a set of text files, which you must preprocess to extract keywords. You need to:

(i) Drop conjunctions, prepositions, interjections, common verbs, adverbs, etc. from each file. [The more such words you remove from the document, the better.]
(ii) Bring each remaining word to its root form; we call these root forms keywords. For example, from words like "smoking" and "smoked", you have to get the keyword "smoke". [You have to think about how to do this.]
(iii) For each keyword, count the frequency of its occurrence in the document.
(iv) Normalize the frequencies of the keywords.
(v) Index the document by the keywords thus obtained. [You can use hashing as a technique.]

Next, the user will give you a query. Perform steps (i) and (ii) on the query to get a set of keywords. The user can give the query as an "OR" or an "AND" of the words. If the user wants "OR", return all documents that contain at least one of the keywords. If the user wants "AND", return only those documents that contain all of the keywords. Also devise a mechanism for ranking the documents returned for the query.

------------------------------------------------------------------------

At the top of each of your program files, add the following. If you are writing a multi-file program, each file should have it.

/*------------------------------------------------------------------
Name:
Roll Number:
Date of Submission:
Deadline date:
Program description:
Acknowledgements:
--------------------------------------------------------------------*/
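
Steps (i) to (v) of Assignment 1 and the "OR"/"AND" query handling can be sketched roughly as follows. This is only one possible starting point, written in Python for brevity since the assignment does not prescribe an approach. The stop-word list, the suffix-stripping rule, the ranking by summed normalized frequency, and every function name here are illustrative assumptions, not part of the assignment. In particular, the crude stemmer yields the stem "smok" rather than the dictionary root "smoke"; what matters for indexing is that "smoking", "smoked", and "smoke" all map to the same key.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real list would be much larger.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "in", "on", "of", "to",
              "is", "are", "was", "were", "be", "he", "she", "it", "for",
              "with", "very", "oh"}

def stem(word):
    """Crude suffix stripping (assumption); not a full stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def keywords(text):
    """Steps (i) and (ii): drop stop words, then reduce each word to a stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def normalized_frequencies(text):
    """Steps (iii) and (iv): keyword counts divided by the total keyword count."""
    counts = Counter(keywords(text))
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def build_index(docs):
    """Step (v): hash-based inverted index, keyword -> {document: weight}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for keyword, weight in normalized_frequencies(text).items():
            index[keyword][name] = weight
    return index

def search(index, query, mode="OR"):
    """Return matching documents, highest summed weight first."""
    terms = set(keywords(query))
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []
    doc_sets = [set(p) for p in postings]
    if mode == "AND":
        matched = set.intersection(*doc_sets)
    else:
        matched = set.union(*doc_sets)

    def score(doc):
        return sum(p.get(doc, 0.0) for p in postings)

    # Ties are broken alphabetically so the order is deterministic.
    return sorted(matched, key=lambda d: (-score(d), d))

# Hypothetical documents, for illustration only.
docs = {
    "a.txt": "Smoking is bad for health. He smoked cigarettes.",
    "b.txt": "Health and exercise.",
    "c.txt": "The bus was late.",
}
index = build_index(docs)
print(search(index, "smoke health", "OR"))   # ['a.txt', 'b.txt']
print(search(index, "smoke health", "AND"))  # ['a.txt']
```

Summing normalized frequencies is only one simple ranking choice; weighting rare keywords more heavily, or replacing the suffix rule with a proper stemmer, are natural refinements to consider.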