
Restaurant-Review-Classifier

This is a simple NLP project based on the NLP section of the A-Z Machine Learning course on Udemy.

Steps I followed:

Steps Description

Importing the libraries and dataset

  1. Import numpy, pandas and matplotlib.pyplot.
  2. Import the dataset Restaurant_Reviews.tsv. We use a .tsv (tab-separated values) file here rather than the more common .csv (comma-separated values) file because commas can appear inside the reviews. With a comma as the delimiter, such reviews would be split into the wrong columns. A tab rarely appears inside a review, so it is a safer column separator.
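
As a minimal sketch of the import step, the snippet below reads tab-separated text with pandas. To stay self-contained it reads a small in-memory sample instead of Restaurant_Reviews.tsv; the sample rows and the column names `Review` and `Liked` are assumptions for illustration, not taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for Restaurant_Reviews.tsv.
# The column names "Review" and "Liked" are assumptions.
sample_tsv = (
    "Review\tLiked\n"
    "Wow, loved this place.\t1\n"
    "Crust is not good, sadly.\t0\n"
)

# delimiter="\t" splits columns on tabs, so commas inside a review are kept.
# quoting=3 (csv.QUOTE_NONE) treats double quotes as ordinary characters.
dataset = pd.read_csv(io.StringIO(sample_tsv), delimiter="\t", quoting=3)

print(dataset.shape)
print(dataset["Review"][0])
```

For the real file, the first argument would be the path to Restaurant_Reviews.tsv instead of the `io.StringIO` object.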

Text cleaning or pre-processing

  1. Remove punctuation and numbers: punctuation and numbers don’t help much in processing the given text. If kept, they just increase the size of the bag of words that we create in a later step and decrease the efficiency of the algorithm.
  2. Stemming: reduce each word to its root, so that different forms of a word (eg ‘loved’ and ‘loving’) are counted as one word (‘love’).
  3. Convert each word to lower case: it is useless to keep the same word in different cases (eg ‘good’ and ‘GOOD’) as separate entries.
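
The three cleaning steps above can be sketched as one small function. This is only an illustration: the README does not say which stemmer was used, so NLTK's `PorterStemmer` is an assumption here, and `clean_review` is a hypothetical helper name.

```python
import re

from nltk.stem.porter import PorterStemmer

# Assumption: NLTK's Porter stemmer; the original project may use another one.
stemmer = PorterStemmer()


def clean_review(text):
    """Hypothetical helper: strip non-letters, lower-case, and stem a review."""
    # Replace everything that is not a letter (punctuation, digits) with a space.
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    # Lower-case, then split on whitespace to get the individual words.
    words = letters_only.lower().split()
    # Reduce each word to its root form.
    stemmed = [stemmer.stem(word) for word in words]
    return " ".join(stemmed)


cleaned = clean_review("Loved it!!! 10/10")
print(cleaned)
```

Applying `clean_review` to every review gives the cleaned corpus that the bag of words model is built from.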

Tokenization

Creating bag of words model

  1. Take all the distinct words from the reviews in the dataset, without repetition.
  2. Create one column for each word, so there are going to be many columns.
  3. Each row is a review.
  4. If a word occurs in a review, the number of times it occurs is stored in that review’s row, under the column for that word; otherwise the entry is 0.

For this purpose we need the CountVectorizer class from sklearn.feature_extraction.text.
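
A minimal sketch of the bag of words step, using a made-up three-review corpus (the reviews are assumptions, standing in for the cleaned dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned reviews, standing in for the real corpus.
corpus = ["love place", "crust not good", "not love place"]

cv = CountVectorizer()
# fit_transform learns the vocabulary and returns a sparse count matrix;
# toarray() turns it into a dense array with one row per review.
X = cv.fit_transform(corpus).toarray()

# vocabulary_ maps each word to its column index; sort the words by that index.
columns = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

print(columns)
print(X)
```

Each row of `X` is one review and each column is one distinct word, as described in the list above.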

Splitting Dataset into training set and test set

Fitting to a predictive model

Predicting final result

Accuracy Measurement:
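
The last four steps (splitting, fitting, predicting and measuring accuracy) can be sketched together as follows. The README does not name the model, so the Gaussian Naive Bayes classifier, the 25% test size and the tiny feature matrix below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical bag of words matrix (8 reviews, 3 words) and labels (1 = liked).
X = np.array([
    [1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1],
    [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Hold out 25% of the reviews for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumption: Gaussian Naive Bayes; any scikit-learn classifier fits the same way.
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predict labels for the held-out reviews.
y_pred = classifier.predict(X_test)

# Confusion matrix (rows: true class, columns: predicted class) and accuracy.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
accuracy = accuracy_score(y_test, y_pred)

print(cm)
print(accuracy)
```

With the real dataset, `X` would be the bag of words matrix and `y` the column of labels from the .tsv file.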

Deploying it on Heroku:

Then, I deployed it on the Heroku platform using this GitHub repository. You can visit the app at restaurantreviewclassifier

** If you want to deploy this in your own Heroku account, make sure all the files from this Git repo are uploaded to your GitHub repository. **