Evaluate classification models using F1 score.
F1 Score

F1 Score
Evaluation metric for classification algorithms

• F1 score combines precision and recall relative to a specific positive class -The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst at 0
• F1 Score Documentation
In [28]:
# FORMULA
# F1 = 2 * (precision * recall) / (precision + recall)

In [8]:
# imports
import pandas as pd

path = 'titanic_data.csv'


Out[8]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.25 NaN S
In [9]:
# only store numeric data in features
X = X._get_numeric_data()

Out[9]:
PassengerId Survived Pclass Age SibSp Parch Fare
0 1 0 3 22 1 0 7.25
In [11]:
# create response vector y
y = X.Survived

Out[11]:
0    0
1    1
2    1
Name: Survived, dtype: int64
In [21]:
# delete 'Survived', the response vector (Series)
X.drop('Survived', axis=1, inplace=True)

# we drop age for the sake of this example because it contains NaN in some examples
X.drop('Age', axis=1, inplace=True)

In [22]:
# check delete

Out[22]:
PassengerId Pclass SibSp Parch Fare
0 1 3 1 0 7.2500
1 2 1 1 0 71.2833
2 3 3 0 0 7.9250
3 4 1 1 0 53.1000
4 5 3 0 0 8.0500
In [23]:
# imports for classifiers and metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

In [24]:
# train/test split
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [25]:
# Decision Tree Classifier

# instantiate
dtc = DecisionTreeClassifier()

# fit
dtc.fit(X_train, y_train)

# predict
y_pred = dtc.predict(X_test)

# f1 score
score = f1_score(y_pred, y_test)

# print
print "Decision Tree F1 score: {:.2f}".format(score)

Decision Tree F1 score: 0.55

In [27]:
# Gaussian Naive Bayes

# instantiate
gnb = GaussianNB()

# fit
gnb.fit(X_train, y_train)

# predict
y_pred_2 = gnb.predict(X_test)

# f1 score
score_2 = f1_score(y_pred_2, y_test)

# print
print "GaussianNB F1 score: {: .2f}".format(score_2)

GaussianNB F1 score:  0.53

Tags: