Welcome to another post of my Pandas2PostgreSQL (and vice-versa) series!
So far, I focused on how to upload dataframes to PostgreSQL tables. This time, it’s the other way around: this post will show you how to get a Pandas dataframe from a PostgreSQL table using Psycopg2.
Step 1: Connect to the database
import psycopg2
import pandas as pd
# Connection parameters, yours will be different
param_dic = {
"host" : "localhost",
"database" : "globaldata",
"user" : "myuser",
"password" : "Passw0rd"
}
def connect(params_dic):
""" Connect to the PostgreSQL database server """
conn = None
try:
# connect to the PostgreSQL server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**params_dic)
except (Exception, psycopg2.DatabaseError) as error:
print(error)
sys.exit(1)
print("Connection successful")
return conn
Step 2: Get your dataframe
def postgresql_to_dataframe(conn, select_query, column_names):
"""
Tranform a SELECT query into a pandas dataframe
"""
cursor = conn.cursor()
try:
cursor.execute(select_query)
except (Exception, psycopg2.DatabaseError) as error:
print("Error: %s" % error)
cursor.close()
return 1
# Naturally we get a list of tupples
tupples = cursor.fetchall()
cursor.close()
# We just need to turn it into a pandas dataframe
df = pd.DataFrame(tupples, columns=column_names)
return df
Step 3: Demo
First, this is how the MonthlyTemp table looks like
>>> SELECT * FROM MonthlyTemp id | source | datetime | mean_temp --------+---------+------------+----------- 860897 | GCAG | 2016-12-06 | 0.7895 860898 | GISTEMP | 2016-12-06 | 0.81 860899 | GCAG | 2016-11-06 | 0.7504 860900 | GISTEMP | 2016-11-06 | 0.93 860901 | GCAG | 2016-10-06 | 0.7292
Demo 1: keeping the original column names
# Connect to the database conn = connect(param_dic) column_names = ["id", "source", "datetime", "mean_temp"] # Execute the "SELECT *" query df = postgresql_to_dataframe(conn, "select * from MonthlyTemp", column_names) df.head()
| id | source | datetime | mean_temp | |
|---|---|---|---|---|
| 0 | 860897 | GCAG | 2016-12-06 | 0.7895 |
| 1 | 860898 | GISTEMP | 2016-12-06 | 0.8100 |
| 2 | 860899 | GCAG | 2016-11-06 | 0.7504 |
| 3 | 860900 | GISTEMP | 2016-11-06 | 0.9300 |
| 4 | 860901 | GCAG | 2016-10-06 | 0.7292 |
Demo 2: selecting some of the columns only, and changing their names
# Connect to the database conn = connect(param_dic) column_names = ["timestamp", "temperature"] # Execute the "SELECT *" query df = postgresql_to_dataframe(conn, "select datetime, mean_temp from MonthlyTemp", column_names) df.head()
| timestamp | temperature | |
|---|---|---|
| 0 | 2016-12-06 | 0.7895 |
| 1 | 2016-12-06 | 0.8100 |
| 2 | 2016-11-06 | 0.7504 |
| 3 | 2016-11-06 | 0.9300 |
| 4 | 2016-10-06 | 0.7292 |
# Close the connection conn.close()