SQLAlchemy ORM conversion to pandas DataFrame

Is there a solution converting a SQLAlchemy <Query object> to a pandas DataFrame?

Pandas has the capability to use pandas.read_sql but this requires use of raw SQL. I have two reasons for wanting to avoid it:

I already have everything using the ORM (a good reason in and of itself) and
I’m using python lists as part of the query, e.g.:

db.session.query(Item).filter(Item.symbol.in_(add_symbols) where Item is my model class and add_symbols is a list). This is the equivalent of SQL SELECT ... from ... WHERE ... IN.

Is anything possible?

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Below should work in most cases:

df = pd.read_sql(query.statement, query.session.bind)

See pandas.read_sql documentation for more information on the parameters.

Method 2

Just to make this more clear for novice pandas programmers, here is a concrete example,

pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement,session.bind)

Here we select a complaint from complaints table (sqlalchemy model is Complaint) with id = 2

Method 3

For completeness sake: As alternative to the Pandas-function read_sql_query(), you can also use the Pandas-DataFrame-function from_records() to convert a structured or record ndarray to DataFrame.
This comes in handy if you e.g. have already executed the query in SQLAlchemy and have the results already available:

import pandas as pd 
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker


SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="7a0a15090e1d081f093a1615191b161215090e">[email protected]</a>:5432/my_database'
engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, echo=False)
db = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base(bind=engine)


class Currency(Base):
    """The `Currency`-table"""
    __tablename__ = "currency"
    __table_args__ = {"schema": "data"}

    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(64), nullable=False)


# Defining the SQLAlchemy-query
currency_query = db.query(Currency).with_entities(Currency.id, Currency.name)

# Getting all the entries via SQLAlchemy
currencies = currency_query.all()

# We provide also the (alternate) column names and set the index here,
# renaming the column `id` to `currency__id`
df_from_records = pd.DataFrame.from_records(currencies
    , index='currency__id'
    , columns=['currency__id', 'name'])
print(df_from_records.head(5))

# Or getting the entries via Pandas instead of SQLAlchemy using the
# aforementioned function `read_sql_query()`. We can set the index-columns here as well
df_from_query = pd.read_sql_query(currency_query.statement, db.bind, index_col='id')
# Renaming the index-column(s) from `id` to `currency__id` needs another statement
df_from_query.index.rename(name='currency__id', inplace=True)
print(df_from_query.head(5))

Method 4

The selected solution didn’t work for me, as I kept getting the error

AttributeError: ‘AnnotatedSelect’ object has no attribute ‘lower’

I found the following worked:

df = pd.read_sql_query(query.statement, engine)

Method 5

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://postgres:<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="3a4a55494e5d485f497a5655595b565255494e">[email protected]</a>:5432/DB', echo=False)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

conn = session.bind

class DailyTrendsTable(Base):

    __tablename__ = 'trends'
    __table_args__ = ({"schema": 'mf_analysis'})

    company_code = Column(DOUBLE_PRECISION, primary_key=True)
    rt_bullish_trending = Column(Integer)
    rt_bearish_trending = Column(Integer)
    rt_bullish_non_trending = Column(Integer)
    rt_bearish_non_trending = Column(Integer)
    gen_date = Column(Date, primary_key=True)

df_query = select([DailyTrendsTable])

df_data = pd.read_sql(rt_daily_query, con = conn)

Method 6

If you want to compile a query with parameters and dialect specific arguments, use something like this:

c = query.statement.compile(query.session.bind)
df = pandas.read_sql(c.string, query.session.bind, params=c.params)

Method 7

Using the 2.0 SQLalchemy syntax (available also in 1.4 with the flag future=True) it looks that pd.read_sql is not implemented yet and it will raise:

NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

This is an open issue that won’t be solved till pandas 2.0, you can find some information about this here and here.

I didn’t find any satisfactory work around, but some people seems to be using two configurations of the engine, one with the flag future False:

engine2 = create_engine(URL_string, echo=False, future=False)

This solution would be OK if you query strings, but using the ORM, the best I could do is a custom function yet to be optimized, but it works:

Conditions = session.query(ExampleTable)
def df_from_sql(query):
    return pd.DataFrame({i:j.__dict__ for i,j in enumerate(query.all())},).T.drop(columns='_sa_instance_state')
df = df_from_sql(ExampleTable)

This solution in any case would be provisional till pd.read_sql has implemented the new syntax.

Method 8

This answer provides a reproducible example using an SQL Alchemy select statement and returning a pandas data frame. It is based on an in memory SQLite database so that anyone can reproduce it without installing a database engine.

import pandas
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table, Column, Text
from sqlalchemy.orm import Session

Define table metadata and create a table

engine = create_engine('sqlite://')
meta = MetaData()
meta.bind = engine
user_table = Table('user', meta,
                   Column("name", Text),
                   Column("full_name", Text))
user_table.create()

Insert some data into the user table

stmt = user_table.insert().values(name='Bob', full_name='Sponge Bob')
with Session(engine) as session:
    result = session.execute(stmt)
    session.commit()

Read the result of a select statement into a pandas data frame

# Select data into a pandas data frame
stmt = user_table.select().where(user_table.c.name == 'Bob')
df = pandas.read_sql_query(stmt, engine)
df
Out:
  name   full_name
0  Bob  Sponge Bob

Method 9

if use SQL query

def generate_df_from_sqlquery(query):
   from pandas import DataFrame
   query = db.session.execute(query)
   df = DataFrame(query.fetchall())
   if len(df) > 0:
      df.columns = query.keys()
   else:
      columns = query.keys()
      df = pd.DataFrame(columns=columns)
return df

profile_df = generate_df_from_sqlquery(profile_query)

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating