Welcome to another post of my Pandas2PostgreSQL (and vice-versa) series!
So far, I focused on how to upload dataframes to PostgreSQL tables. This time, it’s the other way around: this post will show you how to get a Pandas dataframe from a PostgreSQL table using Psycopg2.
Step 1: Connect to the database
import psycopg2 import pandas as pd # Connection parameters, yours will be different param_dic = { "host" : "localhost", "database" : "globaldata", "user" : "myuser", "password" : "Passw0rd" } def connect(params_dic): """ Connect to the PostgreSQL database server """ conn = None try: # connect to the PostgreSQL server print('Connecting to the PostgreSQL database...') conn = psycopg2.connect(**params_dic) except (Exception, psycopg2.DatabaseError) as error: print(error) sys.exit(1) print("Connection successful") return conn
Step 2: Get your dataframe
def postgresql_to_dataframe(conn, select_query, column_names): """ Tranform a SELECT query into a pandas dataframe """ cursor = conn.cursor() try: cursor.execute(select_query) except (Exception, psycopg2.DatabaseError) as error: print("Error: %s" % error) cursor.close() return 1 # Naturally we get a list of tupples tupples = cursor.fetchall() cursor.close() # We just need to turn it into a pandas dataframe df = pd.DataFrame(tupples, columns=column_names) return df
Step 3: Demo
First, this is how the MonthlyTemp table looks like
>>> SELECT * FROM MonthlyTemp id | source | datetime | mean_temp --------+---------+------------+----------- 860897 | GCAG | 2016-12-06 | 0.7895 860898 | GISTEMP | 2016-12-06 | 0.81 860899 | GCAG | 2016-11-06 | 0.7504 860900 | GISTEMP | 2016-11-06 | 0.93 860901 | GCAG | 2016-10-06 | 0.7292
Demo 1: keeping the original column names
# Connect to the database conn = connect(param_dic) column_names = ["id", "source", "datetime", "mean_temp"] # Execute the "SELECT *" query df = postgresql_to_dataframe(conn, "select * from MonthlyTemp", column_names) df.head()
id | source | datetime | mean_temp | |
---|---|---|---|---|
0 | 860897 | GCAG | 2016-12-06 | 0.7895 |
1 | 860898 | GISTEMP | 2016-12-06 | 0.8100 |
2 | 860899 | GCAG | 2016-11-06 | 0.7504 |
3 | 860900 | GISTEMP | 2016-11-06 | 0.9300 |
4 | 860901 | GCAG | 2016-10-06 | 0.7292 |
Demo 2: selecting some of the columns only, and changing their names
# Connect to the database conn = connect(param_dic) column_names = ["timestamp", "temperature"] # Execute the "SELECT *" query df = postgresql_to_dataframe(conn, "select datetime, mean_temp from MonthlyTemp", column_names) df.head()
timestamp | temperature | |
---|---|---|
0 | 2016-12-06 | 0.7895 |
1 | 2016-12-06 | 0.8100 |
2 | 2016-11-06 | 0.7504 |
3 | 2016-11-06 | 0.9300 |
4 | 2016-10-06 | 0.7292 |
# Close the connection conn.close()