Python is one of the most popular programming languages. Thanks to its simple syntax, programmers can develop a wide range of software applications quickly and easily. With Python you can build Data Science and Machine Learning projects as well as general-purpose applications, and when combined with the popular SQL Server Data Platform, it gives software developers and data scientists a powerful development setup that can be used to build just about anything.
Structured Query Language (SQL, often pronounced “sequel”) is how we communicate with a relational database. It is an intuitive and powerful language, and mastering it is rapidly becoming a prerequisite even for career paths outside of software development.
SQL has been one of the most common skill sets required by employers for software developers for years, according to research published by Indeed. That fact is not likely to change any time soon, as businesses have woken up to the reality that the data they capture, generate, and store has tremendous business value to improve performance and efficiency and uncover new paths for growth.
Databases in Python
Basics of Relational Databases
Introduction to Databases
Tables, Columns, Rows, and Relationships are part of the relational model.
Connecting to your database
- MS SQL
Autoloading Tables from a database
SQLAlchemy can be used to automatically load tables from a database using something called reflection. Reflection is the process of reading the database and building the metadata based on that information. It’s the opposite of creating a Table by hand and is very useful for working with existing databases.
To perform reflection, you will first need to import and initialize a MetaData object. MetaData objects contain information about tables stored in a database. During reflection, the MetaData object will be populated with information about the reflected table automatically, so we only need to initialize it before reflecting by calling MetaData().
You will also need to import the Table object from the SQLAlchemy package. Then, you use this Table object to read your table from the engine, autoload the columns, and populate the metadata. This can be done with a single call to Table(): using the Table object in this manner is a lot like passing arguments to a function. For example, to autoload the columns with the engine, you have to specify the keyword arguments autoload=True and autoload_with=engine to Table().
Finally, to view information about the object you just created, you will use the repr() function. For any Python object, repr() returns a text representation of that object. For SQLAlchemy Table objects, it will return the information about that table contained in the metadata.
Viewing Table details
Now you can learn more about the columns and structure of your table. It is important to get an understanding of your database by examining the column names. This can be done by using the .columns attribute and accessing the .keys() method. For example, census.columns.keys() would return a list of column names of the census table.
Following this, we can use the metadata container to find out more details about the reflected table, such as the columns and their types. For example, information about the Table objects is stored in the metadata.tables dictionary, so you can get the metadata of your census table with metadata.tables['census']. This is similar to your use of the repr() function on the census table.
from sqlalchemy import create_engine, MetaData, Table
engine = create_engine('sqlite:///census.sqlite')
metadata = MetaData()
# Reflect the census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)
# Print the column names
print(census.columns.keys())
# Print full metadata of census
print(repr(metadata.tables['census']))
Introduction to SQL: raw SQL vs SQLAlchemy
Selecting data from a Table: raw SQL
We first need to establish a connection to it by using the .connect() method on the engine. This is because the create_engine() function that you have used before returns an instance of an engine, but it does not actually open a connection until an action is called that would require a connection, such as a query.
Using what we just learned about SQL and applying the .execute() method on our connection, we can leverage a raw SQL query to query all the records in our census table. The object returned by the .execute() method is a ResultProxy. On this ResultProxy, we can then use the .fetchall() method to get our results – that is, the ResultSet.
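As a minimal, self-contained sketch of this flow (an in-memory SQLite database with made-up rows stands in for census.sqlite; note that recent SQLAlchemy versions require raw SQL strings to be wrapped in text()):

```python
from sqlalchemy import create_engine, text

# In-memory stand-in for the course's census database (hypothetical rows)
engine = create_engine('sqlite:///:memory:')
connection = engine.connect()          # explicitly open a connection
connection.execute(text(
    'CREATE TABLE census (state TEXT, sex TEXT, age INTEGER)'))
connection.execute(text(
    "INSERT INTO census VALUES ('Illinois', 'M', 0), ('Illinois', 'F', 1)"))

# Raw SQL query: .execute() returns a result proxy...
result_proxy = connection.execute(text('SELECT * FROM census'))
# ...and .fetchall() pulls the actual ResultSet
results = result_proxy.fetchall()
print(results)
```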
Selecting data from a Table with SQLAlchemy
It’s now time to build your first select statement using SQLAlchemy. SQLAlchemy provides a nice “Pythonic” way of interacting with databases. When you used raw SQL in the last exercise, you queried the database directly. When using SQLAlchemy, you will go through a Table object instead, and SQLAlchemy will take care of translating your query to an appropriate SQL statement for you. So rather than dealing with the differences between specific dialects of traditional SQL such as MySQL or PostgreSQL, you can leverage the Pythonic framework of SQLAlchemy to streamline your workflow and more efficiently query your data. For this reason, it is worth learning even if you are already familiar with traditional SQL.
You will also fetch only a few records of the ResultProxy by using .fetchmany() with a size argument specifying the number of records to fetch.
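A sketch of the same query built from the Table object instead of raw SQL (an in-memory stand-in table with made-up rows; written in the select() call form of SQLAlchemy 1.4+, whereas these notes use the older select([...]) list form):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, select

# In-memory stand-in for the census database (hypothetical rows)
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata,
               Column('state', String(30)), Column('age', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(), [
    {'state': 'Illinois', 'age': 0},
    {'state': 'New Jersey', 'age': 1},
    {'state': 'New York', 'age': 2},
])

# Equivalent of SELECT * FROM census, built from the Table object
stmt = select(census)
result_proxy = connection.execute(stmt)
partial = result_proxy.fetchmany(size=2)   # fetch only the first 2 records
print(partial)
```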
Handling a ResultSet
- ResultProxy: the object returned by the .execute() method. It can be used in a variety of ways to get the data returned by the query.
- ResultSet: the actual data asked for in the query when using a fetch method such as .fetchall() on a ResultProxy.
This separation between the ResultSet and ResultProxy allows us to fetch as much or as little data as we desire.
Applying Filtering, Ordering and Grouping to Queries
Filtering and targeting data
Connecting to a PostgreSQL database
There are three components to the connection string in this exercise: the dialect and driver ('postgresql+psycopg2://'), followed by the username and password ('username:password'), followed by the host and port ('@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:1234/'), and finally, the database name ('census'). You will have to pass this string as an argument to create_engine() in order to connect to the database.
# Import create_engine function
from sqlalchemy import create_engine
# Create an engine to the census database
engine = create_engine('postgresql+psycopg2://username:password@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:1234/census')
# Use the .table_names() method on the engine to print the table names
print(engine.table_names())
# ['census', 'state_fact', 'vrska', 'census1', 'data', 'data1', 'employees3', 'users', 'employees', 'employees_2']
In addition to standard Python comparators, we can also use methods such as in_() to create more powerful where() clauses. You can see a full list of expressions in the SQLAlchemy Documentation.
The in_() method, when used on a column, allows us to include records where the value of that column is among a list of possible values. For example, where(census.columns.age.in_([20, 30, 40])) will return only records for people who are exactly 20, 30, or 40 years old.
Along with in_(), you can also use methods like and_() and any_() to create more powerful where() clauses. You might have noticed that we did not use any of the fetch methods to retrieve a ResultSet like in the previous exercises. Indeed, if you are only interested in manipulating one record at a time, you can iterate over the ResultProxy directly!
SQLAlchemy also allows users to use conjunctions such as and_(), or_(), and not_() to build more complex filtering. For example, we can get a set of records for people in New York who are 21 or 37 years old with the following code:
stmt = stmt.where(
    and_(census.columns.state == 'New York',
         or_(census.columns.age == 21,
             census.columns.age == 37)))
Overview of ordering
Ordering by a single column
To sort the result output by a field, we use the .order_by() method. By default, the .order_by() method sorts from lowest to highest on the supplied column.
# Build a query to select the state column: stmt
stmt = select([census.columns.state])
# Order stmt by the state column
stmt = stmt.order_by(census.columns.state)
# Execute the query and store the results: results
results = connection.execute(stmt).fetchall()
# Print the first 10 results
print(results[:10])
# [('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',)]
Ordering in descending order by a single column
You can also use .order_by() to sort from highest to lowest by wrapping a column in the desc() function. Although you haven’t seen this function in action, it generalizes what you have already learned.
Pass desc() (for “descending”) inside an .order_by() with the name of the column you want to sort by. For instance, stmt.order_by(desc(table.columns.column_name)) sorts column_name in descending order.
Ordering by multiple columns
We can pass multiple arguments to the .order_by() method to order by multiple columns. In fact, we can also sort in ascending or descending order for each individual column.
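A sketch of multi-column ordering, assuming a small in-memory stand-in table with made-up rows (select() call form of SQLAlchemy 1.4+):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, select, desc)

# In-memory stand-in table with made-up rows
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata,
               Column('state', String(30)), Column('age', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(), [
    {'state': 'Alaska', 'age': 2},
    {'state': 'Alabama', 'age': 1},
    {'state': 'Alabama', 'age': 3},
])

# Sort by state ascending, then by age descending within each state
stmt = select(census.columns.state, census.columns.age)
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))
results = connection.execute(stmt).fetchall()
print(results)
```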
Counting, summing and grouping data
Counting distinct data
SQLAlchemy’s func module provides access to built-in SQL functions that can make operations like counting and summing faster and more efficient.
We can use func.sum() to get a sum of the pop2008 column of census. If instead you want to count the number of values in pop2008, you can use func.count() in the same way. Furthermore, if you only want to count the distinct values of pop2008, you can combine func.count() with the .distinct() method.
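All three aggregates can be sketched as follows (in-memory stand-in table with made-up values; select() call form of SQLAlchemy 1.4+):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, select, func)

# Stand-in table: three rows, two distinct pop2008 values
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata, Column('pop2008', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(),
                   [{'pop2008': 100}, {'pop2008': 100}, {'pop2008': 50}])

# Sum of all values
total = connection.execute(select(func.sum(census.columns.pop2008))).scalar()
# Count of all values
n_values = connection.execute(select(func.count(census.columns.pop2008))).scalar()
# Count of distinct values only
n_distinct = connection.execute(
    select(func.count(census.columns.pop2008.distinct()))).scalar()
print(total, n_values, n_distinct)   # 250 3 2
```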
Count of records by state
Often, we want to get a count for each record with a particular value in another column. The .group_by() method helps answer this type of query. You can pass a column to the .group_by() method and use it with an aggregate function such as sum() or count(). Much like the .order_by() method, .group_by() can take multiple columns as arguments.
# Import func
from sqlalchemy import func
# Build a query to select the state and count of ages by state: stmt
stmt = select([census.columns.state, func.count(census.columns.age)])
# Group stmt by state
stmt = stmt.group_by(census.columns.state)
# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()
# Print results
print(results)
# Print the keys/column names of the results returned
print(results[0].keys())
# [('Alabama', 172), ('Alaska', 172), ('Arizona', 172), ('Arkansas', 172), ('California', 172), ...,
#  ('Wisconsin', 172), ('Wyoming', 172)]
Determining the population sum by state
To avoid confusing query result column names like count_1, we can use the .label() method to provide a name for the resulting column. It is called on the function expression we are using, and its argument is the name we want the output column to have.
We can pair func.sum() with .group_by() to get a sum of the population by State and use the label() method to name the output.
We can also create the func.sum() expression before using it in the select statement. We do it the same way we would inside the select statement and store it in a variable. Then we use that variable in the select statement where the func.sum() would normally be.
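A sketch combining func.sum(), .label(), and .group_by(), with the sum expression stored in a variable first (made-up data; select() call form of SQLAlchemy 1.4+):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, select, func)

# Stand-in table with made-up rows
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata,
               Column('state', String(30)), Column('pop2008', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(), [
    {'state': 'Illinois', 'pop2008': 200},
    {'state': 'New York', 'pop2008': 100},
    {'state': 'New York', 'pop2008': 50},
])

# Build the labeled sum expression first, then use it in the select
pop2008_sum = func.sum(census.columns.pop2008).label('population')
stmt = select(census.columns.state, pop2008_sum)
stmt = stmt.group_by(census.columns.state)
results = connection.execute(stmt).fetchall()
print(results)
```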
Use pandas and matplotlib to visualize our data
ResultsSets and pandas dataframes
# Import pandas
import pandas as pd
# Create a DataFrame from the results: df
df = pd.DataFrame(results)
# Set column names from the first result row
df.columns = results[0].keys()
# Print the DataFrame
print(df)
Advanced SQLAlchemy Queries
Calculating values in a query
Connecting to a MySQL database
# Import create_engine function
from sqlalchemy import create_engine
# Create an engine to the census database
# (username, password, and hostname are placeholders)
engine = create_engine('mysql+pymysql://username:password@hostname:3306/census')
# Print the table names
print(engine.table_names())
# ['census', 'state_fact']
Calculating a difference between two columns
Often, you’ll need to perform math operations as part of a query, such as if you wanted to calculate the change in population from 2000 to 2008. For math operations on numbers, the operators in SQLAlchemy work the same way as they do in Python.
You can use these operators to perform addition (+), subtraction (-), multiplication (*), division (/), and modulus (%) operations. Note: They behave differently when used with non-numeric column types.
Let’s now find the top 5 states by population growth between 2000 and 2008.
# Column names of the census table:
# ['state', 'sex', 'age', 'pop2000', 'pop2008']
connection.execute(select([census])).fetchmany(3)
# [('Illinois', 'M', 0, 89600, 95012),
#  ('Illinois', 'M', 1, 88445, 91829),
#  ('Illinois', 'M', 2, 88729, 89547)]
# Build query to return state names by population difference from 2008 to 2000: stmt
stmt = select([census.columns.state, (census.columns.pop2008 - census.columns.pop2000).label('pop_change')])
# Append group by for the state: stmt_grouped
stmt_grouped = stmt.group_by(census.columns.state)
# Append order by for pop_change in descending order: stmt_ordered
stmt_ordered = stmt_grouped.order_by(desc('pop_change'))
# Return only 5 results: stmt_top5
stmt_top5 = stmt_ordered.limit(5)
# Use connection to execute stmt_top5 and fetch all results
results = connection.execute(stmt_top5).fetchall()
# Print the state and population change for each record
for result in results:
    print('{}: {}'.format(result.state, result.pop_change))
Determining the overall percentage of women
It’s possible to combine functions and operators in a single select statement as well. These combinations can be exceptionally handy when we want to calculate percentages or averages, and we can also use the case() expression to operate on data that meets specific criteria while not affecting the query as a whole. The case() expression accepts a list of conditions to match and the column to return if the condition matches, followed by an else_ if none of the conditions match. We can wrap this entire expression in any function or math operation we like.
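A sketch of this pattern, computing the percentage of women in 2000 from made-up rows (note: the case() signature below is the SQLAlchemy 1.4+/2.0 tuple form; the 1.x API used in these notes wraps the (condition, value) pairs in a list):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, Float, select, func, case, cast)

# Stand-in table with made-up rows: 30 women, 70 men
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata,
               Column('sex', String(1)), Column('pop2000', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(),
                   [{'sex': 'F', 'pop2000': 30}, {'sex': 'M', 'pop2000': 70}])

# Sum pop2000 only where sex is 'F'; other rows contribute 0
female_pop = func.sum(
    case((census.columns.sex == 'F', census.columns.pop2000), else_=0))
# Cast the overall total to Float to force decimal division
total_pop = cast(func.sum(census.columns.pop2000), Float)
percent_female = connection.execute(
    select(female_pop / total_pop * 100)).scalar()
print(percent_female)   # 30.0
```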
Automatic joins with an established relationship
If you have two tables that already have an established relationship, you can automatically use that relationship by just adding the columns we want from each table to the select statement. Recall that Jason constructed the following query:
stmt = select([census.columns.pop2008, state_fact.columns.abbreviation])
This query joins the census and state_fact tables and selects the pop2008 column from the first and the abbreviation column from the second. It works because the census and state_fact tables have a predefined relationship: the state column of the former corresponds to the name column of the latter.
If you aren’t selecting columns from both tables or the two tables don’t have a defined relationship, you can still use the .join() method on a table to join it with another table and get extra data related to our query.
The .join() method takes as its first argument the table object you want to join in, and as its second argument a condition that indicates how the tables are related.
Finally, you use the .select_from() method on the select statement to wrap the join clause.
For example, the following code joins the census table to the state_fact table such that the state column of the census table corresponded to the name column of the state_fact table.
stmt = stmt.select_from(
    census.join(
        state_fact, census.columns.state == state_fact.columns.name))
More practice with joins
You can reuse the select statement you built in the last exercise; this time, however, let's add a twist: only return a few columns, and use the other table in a group_by() clause.
stmt = select([census.columns.state,
               func.sum(census.columns.pop2008),
               state_fact.columns.census_division_name])
stmt = stmt.select_from(
    census.join(state_fact, census.columns.state == state_fact.columns.name))
stmt = stmt.group_by(state_fact.columns.name)
# [('Texas', 15446707263, 'South Atlantic')]
Working with hierarchical tables
Using alias to handle same table joined queries
Often, you’ll have tables that contain hierarchical data, such as employees and managers who are also employees. For this reason, you may wish to join a table to itself on different columns. The .alias() method, which creates a copy of a table, helps accomplish this task. Because it’s the same table, you only need a where clause to specify the join condition.
Here, you’ll use the .alias() method to build a query to join the employees table against itself to determine to whom everyone reports.
# Columns of the employees table:
# ['id', 'name', 'job', 'mgr', 'hiredate', 'sal', 'comm', 'dept']
# Sample results (manager name, number of reports):
# [('JOHNSON', 6), ('HARDING', 9), ('TAFT', 2), ('HOOVER', 2), ('LINCOLN', 6)]
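A self-contained sketch of such a self-join, using a made-up employees table (the sample output above comes from the course's own data):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select, func)

# Made-up hierarchical table: mgr holds the id of each employee's manager
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
employees = Table('employees', metadata,
                  Column('id', Integer()),
                  Column('name', String(20)),
                  Column('mgr', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(employees.insert(), [
    {'id': 1, 'name': 'JOHNSON', 'mgr': None},
    {'id': 2, 'name': 'GRANT', 'mgr': 1},
    {'id': 3, 'name': 'HARDING', 'mgr': 1},
])

# Alias gives us a second "view" of the same table to act as managers
managers = employees.alias()
stmt = select(managers.columns.name, func.count(employees.columns.id))
# Because it's the same table, a where clause supplies the join condition
stmt = stmt.where(managers.columns.id == employees.columns.mgr)
stmt = stmt.group_by(managers.columns.name)
results = connection.execute(stmt).fetchall()
print(results)   # JOHNSON manages 2 employees
```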
Leveraging functions and group_bys with hierarchical data
It's also common to want to roll up data that lives in a hierarchical table. Rolling up data requires being careful about which alias you use in the group_by clauses and which table you use for the aggregate function.
Dealing with large ResultSets
Working on blocks of records
Sometimes you may need to work with a large ResultProxy when you don't have the memory to load all of the results at once.
To work around that issue, you can get blocks of rows from the ResultProxy by calling the .fetchmany() method inside a loop, passing it the number of records you want per block. When it returns an empty list, there are no more rows left to fetch, and you have processed all the results of the query.
Then you need to use the .close() method to close out the connection to the database.
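Putting the loop together (in-memory stand-in data; the block size of 3 is arbitrary):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, String, select

# Stand-in table with 10 made-up rows
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
census = Table('census', metadata, Column('state', String(30)))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(census.insert(),
                   [{'state': 'Alabama'}] * 6 + [{'state': 'Alaska'}] * 4)

results_proxy = connection.execute(select(census))
state_count = {}
more_results = True
while more_results:
    # Fetch 3 rows at a time instead of loading everything into memory
    partial_results = results_proxy.fetchmany(3)
    if partial_results == []:
        more_results = False
    for row in partial_results:
        state_count[row.state] = state_count.get(row.state, 0) + 1
# Close out the result proxy when done
results_proxy.close()
print(state_count)   # {'Alabama': 6, 'Alaska': 4}
```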
Creating and Manipulating your own Databases
Creating databases and tables
Creating tables with SQLAlchemy
Previously, you used the Table object to reflect a table from an existing database, but what if you wanted to create a new table? You’d still use the Table object; however, you’d need to replace the autoload and autoload_with parameters with Column objects.
The Column object takes a name, a SQLAlchemy type with an optional format, and optional keyword arguments for different constraints.
When defining the table, recall how Jason passed in 255 as the maximum length of a String by using Column('name', String(255)).
After defining the table, you can create the table in the database by using the .create_all() method on metadata and supplying the engine as the only parameter. Go for it!
Constraints and data defaults
You’re now going to practice creating a table with some constraints! Often, you’ll need to make sure that a column is unique, nullable, a positive value, or related to a column in another table. This is where constraints come in.
You can also set a default value for the column if no data is passed to it via the default keyword on the column.
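A sketch of a table definition with a uniqueness constraint, nullability, and defaults (the data table and its columns are hypothetical):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, Float, Boolean)

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
# Hypothetical table: name must be unique, amount may not be NULL,
# count and valid fall back to defaults when no value is supplied
data = Table('data', metadata,
             Column('name', String(255), unique=True),
             Column('count', Integer(), default=1),
             Column('amount', Float(), nullable=False),
             Column('valid', Boolean(), default=False))
# Create the table in the database
metadata.create_all(engine)
print(repr(metadata.tables['data']))
```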
Inserting data into a table
Inserting a single row
There are several ways to perform an insert with SQLAlchemy; however, we are going to focus on the one that follows the same pattern as the select statement.
It uses an insert statement where you specify the table as an argument, and supply the data you wish to insert into the table via the .values() method as keyword arguments.
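For example (reusing the hypothetical data table from the previous section):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, Float, Boolean, insert, select)

# Hypothetical data table, as defined in the constraints example
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
data = Table('data', metadata,
             Column('name', String(255), unique=True),
             Column('count', Integer(), default=1),
             Column('amount', Float()),
             Column('valid', Boolean(), default=False))
metadata.create_all(engine)
connection = engine.connect()

# Build the insert statement and supply the values as keyword arguments
stmt = insert(data).values(name='Anna', count=1, amount=1000.00, valid=True)
result_proxy = connection.execute(stmt)
print(result_proxy.rowcount)   # 1 row was inserted
```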
Inserting multiple records at once
When inserting multiple records at once, you do not use the .values() method. Instead, you’ll want to first build a list of dictionaries that represents the data you want to insert, with keys being the names of the columns.
In the .execute() method, you can pair this list of dictionaries with an insert statement, which will insert all the records in your list of dictionaries.
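A sketch of a bulk insert (same hypothetical data table; note that row counts reported for bulk inserts are driver-dependent):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, Float, Boolean, insert, select)

# Hypothetical data table
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
data = Table('data', metadata,
             Column('name', String(255), unique=True),
             Column('count', Integer(), default=1),
             Column('amount', Float()),
             Column('valid', Boolean(), default=False))
metadata.create_all(engine)
connection = engine.connect()

# One dictionary per record, keyed by column name
values_list = [
    {'name': 'Anna', 'count': 1, 'amount': 1000.00, 'valid': True},
    {'name': 'Taylor', 'count': 1, 'amount': 750.00, 'valid': False},
]
# Pair the bare insert statement with the list of dictionaries
stmt = insert(data)
result_proxy = connection.execute(stmt, values_list)
```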
Loading a CSV into a table
One way to do that would be to read a CSV file line by line, create a dictionary from each line, and then use insert(), like you did in the previous exercise.
But there is a faster way using pandas. You can read a CSV file into a DataFrame using the read_csv() function (this function should be familiar to you, but you can run help(pd.read_csv) in the console to refresh your memory!). Then, you can call the .to_sql() method on the DataFrame to load it into a SQL table in a database. The columns of the DataFrame should match the columns of the SQL table.
.to_sql() has many parameters, but in this exercise we will use the following:
- name is the name of the SQL table (as a string).
- con is the connection to the database that you will use to upload the data.
- if_exists specifies how to behave if the table already exists in the database; possible values are "fail", "replace", and "append".
- index (True or False) specifies whether to write the DataFrame's index as a column.
# Append the data from census_df to the "census" table via connection
census_df.to_sql(name='census', con=connection, if_exists='append', index=False)
Updating data in a database
Updating individual records
The update statement is very similar to an insert statement. For example, you can update all wages in the employees table as follows:
stmt = update(employees).values(wage=100.00)
The update statement also typically uses a where clause to help us determine what data to update. For example, to only update the record for the employee with ID 15, you would append a where clause to the previous statement.
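Sketched out, with a hypothetical employees table holding only an id and a wage:

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, Float, update, select)

# Hypothetical employees table with two made-up rows
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
employees = Table('employees', metadata,
                  Column('id', Integer()), Column('wage', Float()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(employees.insert(),
                   [{'id': 15, 'wage': 80.00}, {'id': 16, 'wage': 90.00}])

# Target only employee 15 by appending a where clause
stmt = update(employees).values(wage=100.00)
stmt = stmt.where(employees.columns.id == 15)
result_proxy = connection.execute(stmt)
print(result_proxy.rowcount)   # 1 record updated
```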
Updating multiple records
By using a where clause that selects more records, you can update multiple records at once.
Unlike inserting, updating multiple records works exactly the same way as updating a single record (as long as you are updating them with the same value).
You can also update records with data from a select statement. This is called a correlated update. It works by defining a select statement that returns the value you want to update the record with and assigning that select statement as the value in update.
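A sketch of a correlated update, using two hypothetical tables where flat_census is missing state names that state_fact can supply (the .scalar_subquery() call is the SQLAlchemy 1.4+ spelling for using a select as a value):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        String, Integer, select, update)

# Hypothetical tables: state_fact maps FIPS codes to state names,
# flat_census has the codes but lacks the names
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
state_fact = Table('state_fact', metadata,
                   Column('name', String(30)), Column('fips_state', Integer()))
flat_census = Table('flat_census', metadata,
                    Column('state_name', String(30)), Column('fips_code', Integer()))
metadata.create_all(engine)
connection = engine.connect()
connection.execute(state_fact.insert(), [{'name': 'New York', 'fips_state': 36}])
connection.execute(flat_census.insert(), [{'state_name': None, 'fips_code': 36}])

# Select statement returning the name matching each row's fips_code
fips_stmt = select(state_fact.columns.name).where(
    state_fact.columns.fips_state == flat_census.columns.fips_code)
# Assign that select as the value in the update: a correlated update
update_stmt = update(flat_census).values(
    state_name=fips_stmt.scalar_subquery())
connection.execute(update_stmt)
print(connection.execute(select(flat_census)).fetchall())
```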
Removing data from a database
Deleting all the records from a table
Often, you'll need to empty a table of all of its records so you can reload the data. You can do this with a delete statement that takes just the table as an argument. For example, to delete all records from the extra_employees table:
delete_stmt = delete(extra_employees)
result_proxy = connection.execute(delete_stmt)
Deleting specific records
By using a where() clause, you can target the delete statement to remove only certain records. For example, to delete all rows from the employees table whose id is 3, use the following delete statement:
delete(employees).where(employees.columns.id == 3)
Deleting a table completely
You’re now going to practice dropping individual tables from a database with the .drop() method, as well as all tables in a database with the .drop_all() method!
Do be careful when deleting tables, as it’s not simple or fast to restore large databases!
Remember, you can check to see if a table exists on an engine with the .exists(engine) method.
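A sketch of both drops on two made-up tables (note: the .exists(engine) check mentioned above is the older 1.x API; newer SQLAlchemy versions removed it in favor of inspect(), which is used here instead):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, String, inspect

# Two made-up tables to drop
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
state_fact = Table('state_fact', metadata, Column('name', String(30)))
census = Table('census', metadata, Column('state', String(30)))
metadata.create_all(engine)

# Drop a single table
state_fact.drop(engine)
print(inspect(engine).has_table('state_fact'))   # False
# Drop every remaining table known to the metadata
metadata.drop_all(engine)
print(inspect(engine).get_table_names())         # []
```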
Putting it all together
Census case study
Set up the engine and metadata
In this exercise, your job is to create an engine to the database that will be used in this chapter. Then, you need to initialize its metadata.
Recall how you did this by leveraging create_engine() and MetaData().
# Import create_engine, MetaData
from sqlalchemy import create_engine, MetaData
# Define an engine to connect to chapter5.sqlite: engine
engine = create_engine('sqlite:///chapter5.sqlite')
# Initialize MetaData: metadata
metadata = MetaData()
Create the table in the database
Having set up the engine and initialized the metadata, you will now define the census table object and then create it in the database using the metadata and engine.
To create it in the database, you will have to use the .create_all() method on the metadata with engine as the argument.
When creating columns of type String(), it’s important to spend some time thinking about what their maximum lengths should be.
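Under those guidelines, the case-study table might be defined like this (the String lengths are judgment calls, and an in-memory database stands in for chapter5.sqlite):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer

# Stand-in for the chapter5.sqlite engine
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
# Define the census table: state names fit in 30 characters, sex in 1
census = Table('census', metadata,
               Column('state', String(30)),
               Column('sex', String(1)),
               Column('age', Integer()),
               Column('pop2000', Integer()),
               Column('pop2008', Integer()))
# Create the table in the database
metadata.create_all(engine)
print(census.columns.keys())
```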
“YOU DON’T HAVE TO BE GREAT TO START, BUT YOU HAVE TO START TO BE GREAT.”