Mark Mikofski -- SQL and relational databases

February 21, 2018 at 5-6:30pm in BIDS, 190 Doe Library

Agenda

  1. Requirements
  2. Objectives
  3. SQL Examples
  4. Relational Databases
  5. Summary

XKCD 327: Exploits of Mom

XKCD 327: Exploits of a Mom

Requirements

To prepare for this tutorial make sure you have the following:

  1. We’re going to use some Python, so make sure you have it installed on a laptop, and of course, don’t forget to bring your laptop to the tutorial.
  2. We’re going to use an example database and a Jupyter notebook with some code examples, so make sure your computer has working internet access. AFAIK anyone can use the Cal AirBears WiFi connection for free.
  3. A willingness to participate, try new things, make mistakes, learn and have fun!

Objectives

At the end of this tutorial you will be able to do the following:

SQL Examples

We’re going to use the examples from code_examples/SQL, so point your browser to this link or clone The Hacker Within - Berkeley and navigate to this folder.

Relational Databases

Wikipedia defines a database as …

An organized collection of data. A relational database, more restrictively, is a collection of schemas, tables, queries, reports, views, and other elements. … the most popular database systems since the 1980s have all supported the relational model - generally associated with the SQL language.

The main difference between a database and a object model like JSON or an simple spreadsheet is the size and complexity, necessitating database management software to quickly create, query, and retrieve data.

The relational database differs from other databases due to its strictly tabular structure consisting of rows of records and columns of fields. E.G.:

primary key text field integer field date field real field boolean field
1 foo 234 2018-02-21T1700Z 5.67E-8 TRUE
2 bar 123 2018-02-21T1830Z 1.6E-19 FALSE

Other databases, called noSQL, have a more flexible structure, allowing nested relations between keys, values, and arrays. Some NoSQL databases are more scalable than relational databases and can handle more data, making them useful for data science. Some examples of NoSQL databases are: CouchDB, MongoDB, Cassandra, AWS DynamoDB, etc.

Schema

The database schema formerly describes the structure of a database. For example the database in the table above could be described as a table with six fields:

  1. a unique non-null field called the primary key.
  2. a text field
  3. an integer field
  4. etc.

SQL - A Structured Query Language

The language used to define the database schema, insert data, and make queries is called SQL or Structured Query Language.

Database Management Software

Database management typically consists of a server and a client. There are several popular relational databases:

Clients and APIs

There are many ways to interface with a SQL database. Most databases come with a command line client, e.g.: psql or a GUI, e.g.: pgAdmin. Most databases also provide an API for programmatically interaction, e.g.: libpq.

Python Bindings

There are Python bindings to most database APIs:

Object Relational Mapping

It also possible to bind the database records directly to objects using object relation mapping (ORM) with software such as Django or SQLAlchemy. The advantage of using an ORM is that instead of using SQL commands, you create objects native to the languange, and the ORM takes care of creating the corresponding schema in the database.

Extra SQL commands

When setting up a SQL database server, eg PostgreSQL, you will also need to create a user, set a password, and create a database. I’ll leave these to the reader to investigate on their own.

Summary

SQL is not glamorous, and it’s been around for a long time, but it’s not that difficult to teach yourself. There are ton of links here and in the code_examples/SQL so I hope this will serve as a good starting point, but there is still so much more to learn. If you have any suggestions, feel free to comment here or please send a PR to The Hacker Within, Berkeley

Thanks!

Share