
November 2019

Visualize a Decision Tree with Sklearn

Step 1: Install the libraries
sudo apt-get install graphviz

pip install graphviz
pip install pydotplus
pip install scikit-learn
pip install pydot
pip install pandas

Do the imports

import pydotplus
import pandas as pd
from sklearn import tree
from io import StringIO
import pydot
Step 2: Initialize the dataframe
data = [ 
    (0, 5, 0), 
    (1, 6, 0), 
    (2, 7, 1), 
    (3, 8, 1), 
    (4, 9, 1)
]
df = pd.DataFrame(data, index=range(5), columns=['x1','x2','y'])
Step 3: Train the decision tree
x_columns = ['x1','x2']

model = tree.DecisionTreeClassifier()
trained_model = model.fit(df[x_columns], df['y'])
Step 4: Display the decision tree

Two options

Option A: You want to save the decision tree as a file

dotfile = StringIO()

tree.export_graphviz(
    trained_model,  
    out_file        = dotfile,
    feature_names   = x_columns, 
    class_names     = ['[y=0]', '[y=1]'], # Ascending numerical order
    filled          = True,
    rounded         = True
)

(graph,) = pydot.graph_from_dot_data(dotfile.getvalue())
graph.write_png("tree.png")

 

This should generate an image named "tree.png" in your current directory.

Option B: You want to display the decision tree in your Jupyter notebook

from IPython.display import Image

out_file = tree.export_graphviz(
    trained_model,
    feature_names   = x_columns,
    class_names     = ['[y=0]', '[y=1]'],# Ascending numerical order
    filled          = True,
    rounded         = True
)
graph = pydotplus.graph_from_dot_data(out_file)
Image(graph.create_png())

In either case, this is the tree you should get.
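
The shape of that tree can be checked by hand. The sketch below is not scikit-learn's implementation; it is a minimal standard-library version that assumes the default Gini criterion and candidate thresholds at the midpoints between consecutive feature values. The helper names `gini` and `best_split` are made up for this illustration.

```python
from collections import Counter

# Same five rows as the dataframe above: (x1, x2, y)
data = [(0, 5, 0), (1, 6, 0), (2, 7, 1), (3, 8, 1), (4, 9, 1)]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, feature_index):
    """Return (threshold, weighted_gini) of the best split on one feature."""
    values = sorted({row[feature_index] for row in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [row[-1] for row in rows if row[feature_index] <= threshold]
        right = [row[-1] for row in rows if row[feature_index] > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best


root_gini = gini([row[-1] for row in data])
split_x1 = best_split(data, 0)
split_x2 = best_split(data, 1)
print(root_gini, split_x1, split_x2)
```

The root node has an impurity of about 0.48, and a single split (x1 <= 1.5, or equivalently x2 <= 6.5) separates the two classes perfectly, so the tree should have one decision node and two pure leaves. Since both features give an equally good split, which of the two appears in your picture may vary between runs.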

 

References:

https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html

 

From Pandas Dataframe To SQL Table using Psycopg2

For a fully functioning example, please refer to my Jupyter notebook on GitHub.

 

Step 1: Specify the connection parameters

# Replace the database, user and password with your own values
param_dic = {
    "host"      : "localhost",
    "database"  : "worldbankdata",
    "user"      : "myuser",
    "password"  : "Passw0rd"
}

 

Step 2: Connect to the database and insert your dataframe one row at a time

import sys
import psycopg2
import pandas as pd

def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)

    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1) 
    return conn


def single_insert(conn, insert_req):
    """ Execute a single INSERT request """
    cursor = conn.cursor()
    try:
        cursor.execute(insert_req)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    cursor.close()


# Connecting to the database
conn = connect(param_dic)

# Inserting each row
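# `dataframe` is assumed to be an existing pandas DataFrame with these columns
for i in dataframe.index:

    # Format the values of row i into the query (only safe for trusted data)
    query = """
    INSERT into emissions(column1, column2, column3) values('%s',%s,%s);
    """ % (dataframe.loc[i, 'column1'], dataframe.loc[i, 'column2'], dataframe.loc[i, 'column3'])
    single_insert(conn, query)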

# Close the connection
conn.close()
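
Building the query with string formatting breaks as soon as a value contains a quote, and it is open to SQL injection. A possible alternative, sketched below, passes the values separately to `cursor.execute` so the driver does the quoting. It assumes `conn` is an open psycopg2 connection and reuses the `emissions` table and column names from above; `insert_rows` is a hypothetical helper, not part of the original notebook.

```python
INSERT_SQL = "INSERT INTO emissions (column1, column2, column3) VALUES (%s, %s, %s);"


def insert_rows(conn, rows):
    """Insert an iterable of 3-tuples with one parameterized INSERT per row.

    Returns the number of rows inserted; rolls back and re-raises on error.
    """
    cursor = conn.cursor()
    inserted = 0
    try:
        for row in rows:
            # The driver quotes and escapes the values; no string formatting here
            cursor.execute(INSERT_SQL, tuple(row))
            inserted += 1
        # Commit once, so either all rows are stored or none are
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return inserted
```

With a dataframe, the rows could come from `dataframe[['column1', 'column2', 'column3']].itertuples(index=False, name=None)`. Depending on your pandas and psycopg2 versions, values may first need converting to plain Python types.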

 

The full working code is available here.