Assignments
- Solving the eight puzzle using Iterative Deepening Search.
  The accompanying picture is taken from the University of
  Minnesota website.
- Solving Sudoku as a constraint satisfaction problem using AC-3.
- Generating text with unigram, bigram, and trigram Markov
  generators.
- Finding shortest paths using Dijkstra's algorithm and A* search.
- Spam filtering with Naive Bayes.
Problem Statement
In this assignment, we were given a standard dataset of 2,000
labeled emails, split equally between spam and ham. We were also
given a labeled validation set of 400 emails, again split equally
between spam and ham. The objective was to implement a standard
Naive Bayes spam classifier, with freedom to improve the feature
extraction and to tune the smoothing parameters.
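For reference, a standard Naive Bayes classifier under the multinomial event model (an assumption; the assignment does not say which event model was used) picks the class with the highest log posterior, with word likelihoods smoothed by an additive parameter $\alpha$. The symbols below ($c$, $w_i$, $\alpha$, $V$) are our own notation, not the assignment's.

$$
\hat{c} = \arg\max_{c \in \{\text{spam},\, \text{ham}\}} \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right],
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + \alpha}{\sum_{w' \in V} \operatorname{count}(w', c) + \alpha \lvert V \rvert}
$$

Here $w_1, \dots, w_n$ are the tokens of the email being classified and $V$ is the training vocabulary. Setting $\alpha = 1$ gives Laplace smoothing; Lidstone smoothing allows any $\alpha > 0$, so it is the natural knob to tune, for example against the validation set.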
Algorithm and Result
The final model is a simple unigram model. Emails are tokenized
by splitting on whitespace after the headers are removed; no
stop-word removal, case normalization, or lemmatization is
applied. Word probabilities are smoothed with a Lidstone
correction whose parameter was selected manually. As shown in the
results, this simple, fast model achieved 99% accuracy on the dev
(validation) set and 99.1% accuracy under cross-validation. The
in-sample training error was 0% for both spam and ham.
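The pipeline described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the assignment's actual code: it assumes the header block ends at the first blank line, uses the multinomial event model, skips test tokens never seen in training, and uses hypothetical names (`tokenize`, `NaiveBayes`, `accuracy`, `cross_val_accuracy`) with a placeholder `alpha=0.1` and `k=5`, since neither the hand-tuned smoothing value nor the number of cross-validation folds is stated.

```python
import math
from collections import Counter


def tokenize(raw_email: str) -> list[str]:
    """Strip the header block, then split the body on whitespace.

    Assumption: headers end at the first blank line, as in RFC 5322
    messages. No lowercasing, stop-word removal, or lemmatization.
    """
    _, sep, body = raw_email.partition("\n\n")
    # If no blank line is found, treat the whole text as the body.
    text = body if sep else raw_email
    return text.split()


class NaiveBayes:
    """Unigram (multinomial) Naive Bayes with Lidstone smoothing."""

    def __init__(self, alpha: float = 0.1):
        # Placeholder value; the assignment's hand-tuned alpha is not stated.
        self.alpha = alpha

    def fit(self, emails, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for email, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(email))
        # Vocabulary V: every token seen in any training email.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Class priors P(c) from document frequencies.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # Lidstone: (count(w, c) + alpha) / (total words in c + alpha * |V|)
        num = self.word_counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, email):
        # Assumption: tokens never seen in training are skipped.
        tokens = [w for w in tokenize(email) if w in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(w, c) for w in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def accuracy(model, emails, labels):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(model.predict(e) == y for e, y in zip(emails, labels))
    return correct / len(labels)


def cross_val_accuracy(emails, labels, k=5, alpha=0.1):
    """Mean accuracy over k contiguous folds (emails/labels are lists).

    Assumes the data is already shuffled so every fold mixes both classes.
    """
    n = len(labels)
    scores = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        model = NaiveBayes(alpha).fit(emails[:lo] + emails[hi:], labels[:lo] + labels[hi:])
        scores.append(accuracy(model, emails[lo:hi], labels[lo:hi]))
    return sum(scores) / k
```

A model trained with `NaiveBayes(alpha=...).fit(train_emails, train_labels)` and scored with `accuracy(model, dev_emails, dev_labels)` mirrors the dev-set evaluation (the variable names are illustrative). Working in log space avoids floating-point underflow when multiplying many small word probabilities.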