Naive Bayes Classification Algorithm

Eda Kılıçaslan
3 min readJun 18, 2021

--

Naive bayes algorithm is one of the classification algorithms which is based on Bayes theorem. If I need to describe Bayes theorem, it depicts the relationship between conditional probability and marginal probability within probability distribution for a random variable.

Image 1
  1. P(A|B) = Probability of A given B had happened
  2. P(B|A) = Probability of B given A had happened
  3. P(A) = Probability of A
  4. P(B) = Probability of B

Naive Bayes algorithm can work unbalanced dataset and it does classification by calculating probability of every situation for every variable and takes the highest probability score.

Naive Bayes Types

  1. GaussianNB : If your datas to be predicted are continuous as real or decimal, you can use it.
  2. BernoulliNB : If your datas to be predicted are binary as 1 or 0, yes or no, you can use it.
  3. MultinomialNB : If your datas to be predicted are nominal as integer numbers or multiclassified categories, you can use it.

I want to talk about advantages and disadvantages of Naive Bayes algorithm.

Advantages

  1. It performs better than Logistic Regression because every feature is accepted as a independent
  2. It processes less data and easy and simple to understand.
  3. It can work continuous and discrete datas.

Disadvantages

  1. Relationships between variables cannot be modeled because every features are independent as i wrote the advantages part.
  2. In Naive Bayes problems, sometimes we have not the sample we want in the dataset and we called this situation ‘Zero Probability’. In this situation, zero probability makes the result zero in every process. To fix this problem, we can give all of datas minimum value.

Python Applications

First I imported some libraries that I will use, i looked short looking my dataset.

Then I splitted dependent and independent variables.

Then I imported sklearn library and some functions to split the variables as train and test. After predicting the confusion matrix gave me this result. The model predicted well except two variables.

The model success is 48 / 50 = 0.96

Thank you for reading :)

Eda

--

--