Academic Projects

Assignments

Solving the eight puzzle using Iterative Deepening Search.
The accompanying picture was taken from the University of Minnesota website.
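A minimal Python sketch of iterative deepening for the eight puzzle. It assumes states are 9-tuples in row-major order with 0 as the blank, and a goal of (1, 2, 3, 4, 5, 6, 7, 8, 0). The names `neighbors`, `depth_limited`, and `iddfs` are hypothetical, not taken from the original assignment. Each round runs a depth-limited DFS with a limit one larger than the last, so the first plan found uses the fewest moves.

```python
from typing import Iterator, List, Optional, Set, Tuple

State = Tuple[int, ...]  # 9 tiles in row-major order; 0 marks the blank
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout

# Direction the blank moves -> (row delta, column delta)
MOVES = (("Up", -1, 0), ("Down", 1, 0), ("Left", 0, -1), ("Right", 0, 1))


def neighbors(state: State) -> Iterator[Tuple[str, State]]:
    """Yield (move, next_state) for every legal slide of the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    for move, dr, dc in MOVES:
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = 3 * r + c
            tiles = list(state)
            tiles[i], tiles[j] = tiles[j], tiles[i]
            yield move, tuple(tiles)


def depth_limited(state: State, limit: int, path: List[str],
                  on_path: Set[State]) -> Optional[List[str]]:
    """DFS that never goes deeper than `limit` moves."""
    if state == GOAL:
        return path
    if limit == 0:
        return None
    for move, nxt in neighbors(state):
        if nxt in on_path:  # skip states already on the current path (cycle check)
            continue
        on_path.add(nxt)
        path.append(move)
        found = depth_limited(nxt, limit - 1, path, on_path)
        if found is not None:
            return found
        path.pop()
        on_path.remove(nxt)
    return None


def iddfs(start: State, max_depth: int = 31) -> Optional[List[str]]:
    """Run depth-limited DFS with limits 0, 1, 2, ... and return the first plan found.

    31 is the largest number of moves any solvable 8-puzzle instance needs.
    """
    for limit in range(max_depth + 1):
        found = depth_limited(start, limit, [], {start})
        if found is not None:
            return list(found)
    return None
```

For example, `iddfs((1, 2, 3, 4, 5, 6, 0, 7, 8))` returns `['Right', 'Right']`. Because every round restarts from scratch, memory stays linear in the depth while the answer is still a shortest plan; the price is re-expanding shallow nodes, and an unsolvable start state would run through every limit up to 31, which is very slow.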

Sudoku solving as a constraint satisfaction problem using AC-3.
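A minimal sketch of AC-3 applied to Sudoku, assuming each cell's domain is a Python set and every pair of cells sharing a row, column, or 3x3 box is linked by a binary "not equal" constraint. The helper names are hypothetical. AC-3 on its own only prunes domains: it can finish easy puzzles, but harder ones usually need backtracking search on top, which is not shown here.

```python
from collections import deque
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column), both 0-8
Domains = Dict[Cell, Set[int]]


def peers_of(cell: Cell) -> Set[Cell]:
    """Cells sharing a row, column, or 3x3 box with `cell` (excluding itself)."""
    r, c = cell
    br, bc = 3 * (r // 3), 3 * (c // 3)
    peers = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    peers |= {(i, j) for i in range(br, br + 3) for j in range(bc, bc + 3)}
    peers.discard(cell)
    return peers


def revise(domains: Domains, xi: Cell, xj: Cell) -> bool:
    """Drop values of xi that have no supporting value in xj under xi != xj."""
    removed = False
    for v in list(domains[xi]):
        # v lacks support only when domains[xj] is {v} (or empty)
        if not any(w != v for w in domains[xj]):
            domains[xi].discard(v)
            removed = True
    return removed


def ac3(domains: Domains) -> bool:
    """Enforce arc consistency in place; return False if some domain becomes empty."""
    queue = deque((xi, xj) for xi in domains for xj in peers_of(xi))
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj):
            if not domains[xi]:
                return False
            # xi shrank, so arcs pointing into xi must be rechecked
            for xk in peers_of(xi) - {xj}:
                queue.append((xk, xi))
    return True


def parse(grid: str) -> Domains:
    """Read 81 characters row by row; '0' or '.' marks an empty cell."""
    chars = [ch for ch in grid if ch in "0123456789."]
    assert len(chars) == 81, "expected 81 cells"
    return {
        divmod(k, 9): set(range(1, 10)) if ch in "0." else {int(ch)}
        for k, ch in enumerate(chars)
    }
```

When `ac3` returns True and every domain is a singleton, the puzzle is solved; a False return means the givens are contradictory.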

Generating text using unigram, bigram, and trigram Markov generators.
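A minimal sketch of an n-gram Markov text generator, where n = 1, 2, and 3 give the unigram, bigram, and trigram variants. It assumes whitespace-tokenized input and samples each next word in proportion to how often it followed the previous n - 1 words in the training text; `train` and `generate` are hypothetical names, not the assignment's code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Model = Dict[Tuple[str, ...], List[str]]


def train(tokens: Sequence[str], n: int) -> Model:
    """Map each (n - 1)-word context to every word that followed it (duplicates kept)."""
    model: Model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])  # empty tuple when n == 1
        model[context].append(tokens[i + n - 1])
    return dict(model)


def generate(model: Model, n: int, length: int, rng: random.Random,
             start: Optional[Tuple[str, ...]] = None) -> List[str]:
    """Sample up to `length` words; stops early if a context has no recorded successor."""
    if n == 1:
        # Unigram: every word is drawn independently from the corpus distribution
        return [rng.choice(model[()]) for _ in range(length)]
    out = list(start if start is not None else rng.choice(list(model)))
    while len(out) < length:
        followers = model.get(tuple(out[-(n - 1):]))
        if not followers:
            break  # dead end: this context never appeared with a following word
        out.append(rng.choice(followers))  # duplicates make this frequency-weighted
    return out[:length]
```

Keeping duplicate followers in a plain list makes `rng.choice` sample from the maximum-likelihood distribution without storing explicit probabilities; higher n gives more fluent but more corpus-copied output.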

Shortest-path search using Dijkstra's algorithm and A* search.
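A minimal sketch of how the two searches relate: A* orders its priority queue by g(n) + h(n), and with h(n) = 0 it reduces to Dijkstra's algorithm. It assumes an adjacency-list graph with non-negative edge weights and a consistent heuristic; `a_star` and the graph format are hypothetical, not the assignment's code.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, edge weight), ...]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float] = lambda node: 0.0
           ) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, path) from start to goal, or None if goal is unreachable.

    With the default h(n) = 0 this is Dijkstra's algorithm.
    """
    dist: Dict[str, float] = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    frontier = [(h(start), start)]  # priority = g(n) + h(n)
    closed = set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in closed:
            continue  # outdated entry left behind by a later, cheaper push
        if node == goal:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return dist[goal], path[::-1]
        closed.add(node)
        for nbr, weight in graph.get(node, []):
            cand = dist[node] + weight
            if cand < dist.get(nbr, float("inf")):
                dist[nbr] = cand
                parent[nbr] = node
                heapq.heappush(frontier, (cand + h(nbr), nbr))
    return None
```

Both calls return the same optimal path; a good heuristic only changes how many nodes are expanded before the goal is popped. The closed set is safe here only because the heuristic is assumed consistent.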

Spam filtering with Naive Bayes.

Problem Statement
In this assignment, we were given a standard dataset of 2000 labeled emails with equal numbers of spam and ham messages, along with a labeled validation set of 400 emails, also split evenly between spam and ham. The objective was to implement a standard Naive Bayes classifier for spam detection, with room for improvements to the feature extraction and smoothing parameters.

Algorithm and Result
The final model is a simple unigram model. Each email has its headers removed and is then tokenized by splitting on spaces; no stop-word removal, case normalization, or lemmatization is applied. Word probabilities are smoothed with a Lidstone correction whose parameter was selected manually. As shown in the results, this simple, fast model achieved 99% accuracy on the dev set and 99.1% accuracy under cross-validation, with 0% in-sample training error for both spam and ham.
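A minimal sketch of a unigram Naive Bayes classifier with Lidstone smoothing, P(w | c) = (count(w, c) + alpha) / (N_c + alpha * |V|), where N_c is the number of tokens in class c and |V| is the vocabulary size (alpha = 1 would be Laplace smoothing). Several details are assumptions rather than the project's code: headers are taken to end at the first blank line (the RFC 5322 convention), words unseen in training are ignored, and alpha = 0.1 is only a placeholder for the manually selected value. The names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def strip_headers(raw: str) -> str:
    """Assumption: headers end at the first blank line, as in RFC 5322 messages."""
    _, sep, body = raw.partition("\n\n")
    return body if sep else raw


def tokenize(text: str) -> List[str]:
    # Split on whitespace only: no lowercasing, stop-word removal, or lemmatization
    return text.split()


class NaiveBayes:
    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # Lidstone parameter; 0.1 is a placeholder, tune on the dev set

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(strip_headers(doc)))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        return self

    def log_likelihood(self, word: str, c: str) -> float:
        # Lidstone: (count(w, c) + alpha) / (N_c + alpha * |V|)
        num = self.counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc: str) -> str:
        # Words never seen in training are ignored (a common choice, assumed here)
        tokens = [t for t in tokenize(strip_headers(doc)) if t in self.vocab]
        scores = {c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long emails; the classifier then picks the class with the highest posterior score.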
