LS 181 Study Guide

SQL

What is it?

SQL is a language used by Relational Database Management Systems (RDBMSs) to work with relational databases. Relational databases are structured collections of data persisted as a set of relations (aka tables). Put more simply, relational data means working with multiple tables at once. SQL is a special purpose and declarative language: special purpose in that it is used exclusively to work with relational data, and declarative in that the code describes what needs to be accomplished but the server abstracts away the details of the execution.

Why use relational data? Relational data reduces the amount of duplicated data that is persisted, and provides a useful structure to interact with data.

TL;DR:

  • Relation = table
  • relational database = connected tables
  • RDBMS = applications to work with connected tables
  • SQL = language used by RDBMSs
  • special purpose = only used for one thing
  • declarative = execution details abstracted

Joining Tables: Types of Joins

What are joins?

When working with databases, it’s important to be able to write queries that can return data from multiple tables at the same time. That’s where joins come in. You can read data from anywhere within the database’s schema using the word JOIN and a join condition.

Let’s say you have a table containing data about artwork. This table might be related to a table with data about artists. If I want to read some information about the title of the artwork as well as the artist’s name and birth year, I would have to access both tables at once. So I write a query using JOIN and a join condition that links data in the two tables (e.g. artists.id = artwork.artist_id). This query creates a “join table” from the two tables, then selects from said join table. You can also join more than two tables using multiple JOIN statements. Each JOIN statement creates a new virtual table joining the relevant data, and the JOIN statements compound on each other (the second JOIN is joining with the virtual data from the first JOIN, etc.)

Below are some types of joins.

INNER JOIN

What is it? INNER JOIN is the default join type that SQL uses when it see the word JOIN. INNER JOIN creates a join table that contains ONLY the entries where the join condition is met. In other words, the set of all common elements between the two tables.

If my join condition is artists.id = artwork.artist_id, the inner join table will return data from all table entries where the artist id is in both the artists table and the artwork table. It will ignore any entries where artwork.artist_id or artists.id don’t match the join condition.

When should you use it? This is useful when you only want to look at common elements between tables and don’t want to read any null entries.

Example:

SELECT artwork.title, artists.name, artwork.year
FROM artwork
INNER JOIN artists
ON artwork.artist_id = artists.id;
title | name | year
------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888

LEFT OUTER JOIN

What is it? LEFT OUTER JOIN creates a join table containing all of the entries from the table to the left of the JOIN statement, and combines it with the matching rows of any entries from the table to the right of the JOIN statement that match the join conditition.

With the join statement artwork LEFT OUTER JOIN artists ON artists.id = artwork.artist_id, the results set will contain data from every row in artwork, and combine it from any rows in the artists table that match the join condition. It will ignore any entries where artwork.artist_id doesn’t match the join condition, but fill in null values for entries where artists.id doesn’t match the join condition.

When should you use it? Use a left outer join when you want to see every entry from one table, and don’t care if they match the second table or not (I want to see all artwork in my list, whether or not I know the artist, or I want to see the number of artworks for each artist, even if that number is 0).

Example:

SELECT artwork.title, artists.name, artwork.year
FROM artwork
LEFT OUTER JOIN artists
ON artwork.artist_id = artists.id;
title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
untitled watercolor | | 1880

RIGHT OUTER JOIN

What is it? As with LEFT OUTER JOIN, RIGHT OUTER JOIN is concerned with all of the entries from the table to the right of the JOIN keyword, and any matching entries from the table written to the left of the word JOIN.

Take the statement artwork RIGHT OUTER JOIN artists ON artists.id = artwork.artist_id. Here, the results set will contain data from every row in artists, and combine it from any rows in the artwork table that matches the join condition. It will ignore any entries where artists.id doesn’t match the join condition, but fill in null values for entries where artwork.artist_id doesn’t match the join condition.

When should you use it? Use a right outer join when you want to see every entry from the second table, and don’t care if they match the first table or not (I want to see all artists in my list, regardless of if my database contains any artwork associated with them).

Example:

SELECT artwork.title, artists.name, artwork.year
FROM artwork
RIGHT OUTER JOIN artists
ON artwork.artist_id = artists.id;
title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
| Paul Cezánne |
| Edgar Degas |

FULL OUTER JOIN

What is it? FULL OUTER JOIN returns a set containing all data from all entries from the two tables regardless of whether or not they match the join condition. Functions like a combination of left and right join.

Let’s look at the statement artwork FULL OUTER JOIN artists ON artists.id = artwork.artist_id. This resulting join table includes every row from both tables, filling in null values for entries where artists.id or artwork.artist_id doesn’t match the join condition.

When should you use it? Use a full outer join when you want to see all the data from both tables, and don’t care about data that doesn’t match.

Example:

SELECT artwork.title, artists.name, artwork.year
FROM artwork
FULL OUTER JOIN artists
ON artwork.artist_id = artists.id;
title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
untitled watercolor | | 1880
| Paul Cezánne |
| Edgar Degas |

CROSS JOIN

What is it? A CROSS JOIN is the only kind of join that doesn’t use a join condition. It returns the set of all rows in the first table crossed with all rows in the second table, in order words, every possible combination of rows from the two tables. It’s also known as a cartesian join, as it maps all the rows in one table to all the rows in the other. Does not include null data.

When should you use it? When you want to look at combinations and possibilities that aren’t in the database?

Example:

SELECT artwork.title, artists.name, artwork.year
FROM artwork
CROSS JOIN artists
title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Vincent Van Gogh | 1888
untitled watercolor | Vincent Van Gogh | 1880
Cafe Terrace at night | Claude Monet | 1888
Antibes | Claude Monet | 1888
untitled watercolor | Claude Monet | 1880
Cafe Terrace at night | Paul Cezánne | 1888
Antibes | Paul Cezánne | 1888
untitled watercolor | Paul Cezánne | 1880
Cafe Terrace at night | Edgar Degas | 1888
Antibes | Edgar Degas | 1888
untitled watercolor | Edgar Degas | 1880

Sublanguages of SQL: DML, DDL, and DCL

“NAME THAT SUBLANGUAGE!”

DDL

What is it? Data Definition Language is one of the three sublanguages of SQL. It deals with defining and altering a database’s schema: the structure and rules of a database and it stables and columns.

When should I use it? Use it when dealing with a database’s structure.

Examples: CREATE TABLE, ALTER TABLE, DROP TABLE, ADD COLUMN, ALTER COLUMN, DROP COLUMN

CREATE TABLE artists (
id serial PRIMARY KEY
name varchar(256)
/* column definitions include type, name, optional constraints */
)
ALTER TABLE artists
ADD COLUMN birthday date NOT NULL
;
ALTER TABLE artists
ALTER COLUMN birthday
DROP NOT NULL,
ALTER COLUMN name TYPE text
;
ALTER TABLE artists
DROP COLUMN birthday
;
ALTER TABLE artwork
RENAME COLUMN title TO artwork_title
;
ALTER TABLE artwork
RENAME TO artworks
;
ALTER TABLE artwork
ALTER COLUMN title
DROP DEFAULT;
DROP TABLE artwork;

DML

What is it? Data Manipulation language is another of the three sublanguages, and it deals with reading and updating a database’s data — the values stored within the tables/relations.

When should I use it? Use it to add, read, manipulate, or delete data in a relation.

What is CRUD? A set of standard data manipulations for any relational data operation: Create, Read, Update, Delete.

Examples: INSERT, UPDATE, DELETE ,SELECT

SELECT * FROM artists;INSERT INTO artists (name)
VALUES ('Matisse');
UPDATE artists
SET name = 'Anonymous'
WHERE id = 3
;
DELETE FROM artists
WHERE id = 3
;

DCL

What is it? Data Control Language is the third sublanguage of SQL. It is used to deal with security settings that are part of a database’s schema. It grants and revokes access to relations, and defines the rights and roles of individual users.

When should I use it? When you need to limit or grant access to data within a database.

Examples: GRANT, REVOKE

Making Queries: GROUP BY, ORDER BY, WHERE, and HAVING

Below are several common query keywords that allow you to group, sort, and filter data in your relational database.

GROUP BY

What does it do? GROUP BY allows you to group data in your tables based on identical values in a specified column. This allows you to aggregate information about groups of entries that share common elements, like the number of paintings by each artist.

When do I use it? Use GROUP BY when you want to aggregate information about multiple groups and read it at the same time. Used with aggregate functions to get a sum, count, or concatenated string for entries within different groups (number of artworks grouped by year, list of all works created by each artist, total cost of each food type, etc.)

Example:

SELECT artists.name AS artist_name, count(artwork.id) AS num_works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name;
artist_name | num_works
--------------------------+-----------
Vincent Van Gogh | 2
Claude Monet | 1
SELECT artists.name AS artist_name, string_agg(artwork.title, ', ') AS works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name;
artist_name | works
--------------------------+-----------------------------------------
Vincent Van Gogh | Cafe Terrace at night, The Starry Night
Claude Monet | Antibes

Why did I get an error? One common error that occurs with GROUP BY clauses happens when you try to select a column without adding it to your group by clause or performing an aggregate function on it. For instance:

SELECT artists.name, count(artwork.id) AS num_works, artwork.year
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.id
;
ERROR: column "artwork.year" must appear in the GROUP BY clause or be used in an aggregate function/* The above query throws an error. It cannot return the year of an artwork when grouped by artist. */name | num_works | year
--------------------------+--------------+--------------------------
Vincent Van Gogh | 2 | /* what could go here? */
Claude Monet | 1 |

PostgresQL cannot return the year of a single artwork that is grouped by artist, unless incorporated into an aggregate function that works with the date type (min, max, count).

Common aggregate functions:

count() : takes a column name as argument and returns the number of values in that column. (ex. SELECT count(id) FROM artists;)

sum() : takes a column as argument, sums numeric values for all selected rows, returns the total. (ex. SELECT sum(price) FROM artwork;)

min() : takes a column as argument and returns the lowest value in said column for all of the selected rows. It can be used with various data (numeric, date/ time, string, etc) . (ex. SELECT min(year) FROM artwork;)

max() : takes a column as argument and returns the highest value in said column for all of the selected rows. It can also be used with various data (numeric, date/ time, string, etc) . (ex. SELECT max(year) FROM artwork;)

avg() : takes a column name as argument and returns the average/mean of numeric values for the selected rows. (ex. SELECT avg(id) FROM artists;)

string_agg() : takes a column name and delimiter as argument and returns all string values from the selected rows concatenated with the delimiter. (ex. SELECT string_agg(name) FROM artists;)

ORDER BY

What does it do? ORDER BY allows you to sort data in your tables based on the values in a specified column. You can use the keywords ASC and DESC as arguments to determine sort order. You can also sort by multiple column values, or use an ORDER BY clause to sort grouped data. Defaults to ascending order.

When do I use it? When you need to return values in a certain order.

Example:

SELECT title, year FROM artwork
ORDER BY year ASC
;
title | year
--------------------------+--------
untitled watercolor | 1880
Cafe Terrace at night | 1888
Antibes | 1888
SELECT title, year FROM artwork
ORDER BY year ASC, title ASC
;
title | year
--------------------------+--------
untitled watercolor | 1880
Antibes | 1888
Cafe Terrace at night | 1888
SELECT year FROM artwork
GROUP BY year
ORDER BY year DESC
;
year
--------
1888
1880

WHERE

What does it do? WHERE is a keyword used in a SQL query to check each row of data in a relation against a condition and return the data that matches the WHERE condition. WHERE conditions can include various logical expressions and operators to help filter the data correctly.

When do I use it? When you need to filter data.

Example:

SELECT title, year FROM artwork
WHERE year > 1881
;
title | year
--------------------------+-----------
Cafe Terrace at night | 1888
Antibes | 1888

SELECT year, title FROM artwork
WHERE year > 1881
AND title LIKE '%night'
;
year | title
----------+---------------------
1888 | Cafe Terrace at night

Common logical expressions in WHERE conditions:

  • LIKE operator: LIKE is a case sensitive string matching operator. It is used in conditions to checks for a matching substring. A wildcard character, like % or _, is commonly used in the match value of a LIKE expression to look for specific placement of substring. _ stands for single character, and % is any number of characters. For example, SELECT * FROM artists WHERE name LIKE ‘%Monet’ would select any rows where the data in the name column of artists ends with ‘Monet’. SELECT * FROM artists WHERE name LIKE ‘_______Monet’ would select any rows where the data in the name column of artists begins with 6 characters and ends with ‘Monet’. SELECT * FROM artists WHERE name LIKE ‘Claude%’ would select any rows where the data in the name column of artists begins with ‘Claude’.
  • ILIKE operator: ILIKE works in the same way as the LIKE operator, but it is not case sensitive. (SELECT * FROM artists WHERE name LIKE ‘%moNEt’)
  • SIMILAR TO operator: SIMILAR TO also works in the same way as the LIKE operator, but it uses regex instead of a substring string to match the strings. (SELECT * FROM artists WHERE name SIMILAR TO '/Claude\sMonet\b/gm')
  • >,=, <=, =, <>: Operators used to compare values of data and a specified literal (WHERE year > 1881). REMEMBER that you can’t use these operators on NULL, but instead should use the IS NULL or IS NOT NULL operator.
  • AND, OR: Works like && to || in Javascript to combine conditions (WHERE year > 1881 OR title LIKE ‘%night’)

Why did I get an error? If you try to use WHERE on a grouped data, you will get an error. Instead you can use the keyworkd HAVING.

HAVING

What does it do? HAVING is a keyword that allows you to specify a condition for grouped data, and only return grouped data that meets said condition. It works like a WHERE clause, but, unlike WHERE clauses, it can be used along with a GROUP BY condition in a query.

When do I use it? When you want to fiter grouped data.

Example:

SELECT artists.name AS artist_name, count(artwork.id) AS num_works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name --note that artists.name must be used here
HAVING num_works = 2;
/* note also that having is evaluated before select, so you cannot use aliases from the select statement */artist_name | num_works
--------------------------+-----------
Vincent Van Gogh | 2

Data Integrity: Using Constraints

What are constraints?

Constraints are rules used to define what data is allowed in specified columns. Some examples are CHECK, UNIQUE, PRIMARY KEY, NOT NULL, and FOREIGN KEY.

Why use constraints?

By providing rules for what kinds of data are allowed in your relations, you are able to maintain data integrity and quality, and are less likely to encouter data duplication and null values (and therefore anomalies).

CHECK CONSTRAINTS

What does it do? CHECK constraints allow you to specify your own unique rules for data. As each row of data is about to be added to a table, RDBMS’s will verify that the row of data conforms to the check constraint’s condition. It uses conditions like a where clause, but unlike a where clause, constraints are part of a database’s schema (DDL) and control what values are inserted into a table, not just the values queried (DML). If you try to insert or update with an invalid value, an error will be raised.

When do I use it? When you have a specific rule for data that isn’t already a constraint, like if you only want to insert positive numbers, or capitalized words, or email addresses.

Examples:

  • CHECK (spectral_type IN (‘O’, ‘B’, ‘A’, ‘F’, ‘G’, ‘K’, ‘M’))
  • CHECK (mass > 0.0)
  • CHECK (full_name <> ‘’)
  • CHECK (length(director) >= 3 AND position(‘ ‘ in director) > 0)
  • CHECK (year BETWEEN 1900 AND 2100)
  • CHECK (length(title) >= 1)
/* Adding a CHECK constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
title text,
year numeric CHECK(year BETWEEN 1400 AND 2022)
);
/* Adding a CHECK constraint on ALTER */
ALTER TABLE artwork
ADD CHECK (year BETWEEN 1300 AND 2022)
;
ALTER TABLE artwork
ADD CONSTRAINT artwork_year_check CHECK (year BETWEEN 1300 AND 2022);
/* Dropping a CHECK constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_year_check
;
ALTER TABLE artwork
DROP CONSTRAINT artwork_year_check
;

UNIQUE CONSTRAINTS

What does it do? Before adding data to a table, UNIQUE checks that the data has not already been added to that column, ensuring that every value in said column is actually unique. If you try to insert or update with a duplicate value, an error will be raised.

When do I use it? Use UNIQUE when you want to ensure that there are not duplicated values in a given column.

Examples:

/* Adding a UNIQUE constraint on CREATE */
CREATE TABLE artwork (
id serial NOT NULL,
title text UNIQUE
);
/* Adding a UNIQUE constraint on ALTER */
ALTER TABLE artwork
ADD CONSTRAINT id_unique UNIQUE (id)
;
/* Dropping a UNIQUE constraint */
ALTER TABLE artwork
DROP CONSTRAINT id_unique
;

NOT NULL CONSTRAINTS

What does it do? Before adding data to a table, NOT NULL checks that the data is not null, ensuring that every column contains no null values. If you try to insert or update with a null value, an error will be raised.

When do I use it? Use this constraint when it would be damaging to include null values in your table. Null values are risky: they can cause unexpected sorting behavior, make queries more complicated, and don’t transfer well between systems. Use if a column might be used as a primary key, if it is related to another table, if every row should contain this column of data, or generally if it is a foreign key.

Examples:

/* Adding a NOT NULL constraint on CREATE */
CREATE TABLE artwork (
id serial UNIQUE NOT NULL,
title text
);
/* Adding a NULL constraint on ALTER */
ALTER TABLE artwork
ALTER COLUMN title SET NOT NULL;
/* Dropping a NOT NULL constraint */
ALTER TABLE artwork
ALTER COLUMN title DROP NOT NULL;

PRIMARY KEY CONSTRAINTS

What does it do? Primary key constraints function like NOT NULL UNIQUE constraints, and are used to define a column that will be used to uniquely identify rows. There can only be one primary key per table.

When do I use it? Use a primary key constraint to set up a column with unique row indentifiers. Primary keys are often used with serial data types.

Examples:

/* Adding a PRIMARY KEY constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
title text
);
CREATE TABLE artwork (
id serial,
title text,
PRIMARY KEY (id)
);
/* Dropping a PRIMARY KEY constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_pkey
;
/* Adding a PRIMARY KEY constraint on ALTER */
ALTER TABLE artwork
ADD PRIMARY KEY (id)
;

FOREIGN KEY

What does it do? Foreign key constraints set up relationships between tables, by associating a row in one table to a row in another table. Used with the keyword REFERENCES, FOREIGN KEY will ensure that every entry in the foreign key constrained column of one table has a corresponding entry in the table being referenced. This preserves the referential integrity of the database. Unlike PRIMARY KEY, FOREIGN KEY constraints do not set up a NOT NULL or UNIQUE constraint. You can insert null or duplicate data into a foreign key column. Also unlike PRIMARY KEY constraints, you can have a table with multiple foreign key columns. When using a foreign key, you should also set up a delete condition to tell the application what to do with the child table data when the corresponding row in the parent table is deleted.

When do I use it? You use a foreign key when you need to relate two tables to each other in a one-to-one, one-to-many, or many-to-many relationship.

Examples:

/* Adding a FOREIGN KEY constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
artist_id integer REFERENCES artists (id)
ON DELETE CASCADE
);
CREATE TABLE artwork (
id serial PRIMARY KEY,
artist_id integer,
FOREIGN KEY (artist_id) REFERENCES artists (id)
ON DELETE CASCADE
);
/* Adding a FOREIGN KEY constraint on ALTER */
ALTER TABLE artwork
ADD CONSTRAINT artwork_artist_id_fkey
FOREIGN KEY (artist_id) REFERENCES artists (id)
ON DELETE CASCADE
;
/* Dropping a FOREIGN KEY constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_artist_id_fkey
;

Subqueries

What is it?

Subqueries are queries nested within other queries. They can be used instead of JOIN statements to query data from multiple tables at once.

SELECT name FROM artists 
WHERE id NOT IN
(SELECT artist_id FROM artwork
WHERE artist_id IS NOT NULL);
/* list of artists whose work is not in artwork table */
/* note the is not null clause for the NOT IN statement. you can't check id against null values */
name
-----------------
Paul Cezánne
Edgar Degas

Why Use Subqueries?

Subqueries are not always preferable to joins, but they do offer some advantages in readability, and occasionally efficiency (though joins are often faster). Subqueries may be more logical when you only need to retrieve data from one table, but you want to use a conditional on data from another table, as in the above example. You can use the EXPLAIN or ANALYZE keywords to investigate the efficiency of the different query plans. Efficiency varies partially because RDBMSs will execute SQL’s declarative commands differently.

Common Subquery Expressions:

EXISTS/NOT EXISTS: This expression checks if a nested query returns any rows that match the given condition, and returns a boolean. True if at least one row is returned by nested query, false if not.

SELECT 1 WHERE EXISTS
(SELECT id FROM artists
WHERE name = 'Claude Monet');
/* if there are any artists named Claude Monet, it returns 1 */?column?
-----------
1
SELECT title FROM artwork WHERE EXISTS
(SELECT id FROM artists
WHERE name = 'Claude Monet'
AND artists.id = artwork.artist_id);
/* checks each row in artwork, and if there are any artists named Claude Monet whose id matches the artist_id of that row in artwork, the exists statement returns true, and adds that row of artwork to data set */title
-----------
Antibes
SELECT name FROM artists WHERE NOT EXISTS
(SELECT id FROM artwork
WHERE artists.id = artwork.artist_id);
/* checks each row in artists, and if there are any artworks whose artist_id matches the id of that row in artists, the NOT EXISTS statement returns false. */name
-----------------
Paul Cezánne
Edgar Degas

IN / NOT IN: This expression compares a value to every row in a nested query result. If the value is found within a row of subquery result, the expression evaluates to ‘true’.

SELECT name FROM artists WHERE id IN
(SELECT artist_id FROM artwork
WHERE title LIKE 'Ant%');
/* goes through each row of artists table and compares the id value of that row to a table of artist_id's that correspond to a title beginning with 'Ant'. If the id value is in the subquery result, the name of the artist is added to the returned dataset. */name
---------------------
Claude Monet
SELECT name FROM artists
WHERE id NOT IN
(SELECT artist_id FROM artwork
WHERE artist_id IS NOT NULL);
/* list of artists whose work is not in artwork table */
/* note the is not null clause for the NOT IN statement. you can't check id against null values */
name
-----------------
Paul Cezánne
Edgar Degas

ANY/SOME: (used interchangeably) These expressions are used with operators like =, <, >, etc. Returns true if there is any value in the results of the nested query for which the operator expression evaluates to true given the value to the left of the operator.

SELECT name FROM artists WHERE length(name) > ANY
(SELECT length(title) FROM artwork);
/* Returns a set of artist names that have a length greater than any of the artwork titles (here, all of them have a length greater than 'Antibes') */name
------------------
Vincent Van Gogh
Claude Monet
Paul Cezánne
Edgar Degas
SELECT title FROM artwork WHERE artist_id = ANY
(SELECT id FROM artists);
/* Returns a set of artwork titles that have a corresponding artist id in the artists table. Here two of the titles have ids in the table. */title
-----------------------
Cafe Terrace at night
Antibes

ALL: Like ANY and SOME, ALL is used with operators to evaluate a logical expression. ALL evaluates to true if all of the results of the nested query are true when compared to the value to the left of the operator.

SELECT name FROM artists WHERE length(name) > ALL
(SELECT length(title) FROM artwork);
/* Returns a set of artist names that have a length greater than all of the artwork titles (here, none of them have a length greater than 'Cafe Terrace at night') */name
------------------

JOINs rewritten as Subqueries

/* JOIN */SELECT artwork.title
FROM artwork
INNER JOIN artists
ON artwork.artist_id = artists.id;
title
-----------------------
Cafe Terrace at night
Antibes

/* Subqueries */
SELECT title FROM artwork
WHERE EXISTS
(SELECT id FROM artists
WHERE id = artwork.artist_id);
SELECT title FROM artwork
WHERE artist_id IN
(SELECT id FROM artists
WHERE id = artwork.artist_id);
SELECT title FROM artwork
WHERE artist_id = ANY
(SELECT id FROM artists
WHERE id = artwork.artist_id);
/* JOIN */
SELECT artwork.year, artwork.title FROM artwork
INNER JOIN artists ON artwork.artist_id = artists.id
WHERE artwork.year > 1881
AND artists.name LIKE '%Monet'
;
/* Subqueries */
SELECT year, title FROM artwork
WHERE year > 1881 AND
EXISTS (SELECT id FROM artists
WHERE id = artwork.artist_id
AND name LIKE '%Monet');
SELECT year, title FROM artwork
WHERE year > 1881 AND artist_id =
ANY (SELECT id FROM artists
WHERE id = artwork.artist_id
AND name like '%Monet');
SELECT year, title FROM artwork
WHERE year > 1881 AND artist_id IN
(SELECT id FROM artists WHERE id = artwork.artist_id
AND name like '%Monet');

PostgreSQL

What is it?

PostgreSQL is an RDBMS (relational database management system) with a client-server architecture. It allows you to define and interact with databases using SQL.

How it works?

You can access PostgreSQL from your CLI, or from enter an interactive console like psql (sort of like a REPL). psql allows you to use both SQL statements and psql meta-commands to interact with your data.

NOTE: PostgreSQL and other RDBMS’s contain many specific reserved keywords. If you need to use any of these reserved words as indentifiers in your data, you can simply interact with the identifiers using double quotation (“count”, etc.)

Sequences

What Are They?

Sequences are a type of relation used to automatically generate a series of numbers. They are used to generate the number sequences used by the serial type, and in general to create unique row identifiers (like a surrogate key). Numbers in a sequence are unique and can only be used once. If you call the function nextval('artists_id_seq'), you can access the next number in the sequence, but that number can now never be used again. The serial data type automatically creates a sequence to generate identifiers. You can also manually create sequences with the CREATE SEQUENCE command.

/* How to create a sequence */
CREATE SEQUENCE counter;
/* How to access a sequence number */
SELECT nextval('counter'); /* note the single quotations */
/* How to drop a sequence */
DROP SEQUENCE counter;

Column types

Common data types

  • serial: pseudotype that creates auto-incrementing, not null, integer sequences to help identify rows of data in a table.
  • text, varchar, char: string types. varchar specifies a limit to the number of characters, char specifies and exact number of characters and fills in whitespace when not all characters are used, and text can contain an unlimited length of text and can be used for long strings.
  • integer, numeric, decimal, real, float: numeric types. integer specifies a whole number, decimal & numeric have the option to specify artbitrary precision and scale (e.g. numeric(5, 2) can be any number between -999.99 and 999.99), and real is used for floating-point numbers.
  • timestamp, date: date types, often entered in the format ‘YYYY-MM-DD HH:MM:SS’. The difference between them is simply that the date type does not include a time stamp (HH:MM:SS). There is also a timestamp with time zone data type ( timestamptz).
  • boolean: true or false. Displayed by psql as ‘t’ and ‘f’.

How serial pseudotype works? (aka how to create an auto-incrementing column)

The serial type simple creates a new sequence of integers, and sets the default of the column to the next value in the sequence. Serial also adds a NOT NULL constraint to the column, but does not specify a UNIQUE constraint. Every time we add a new row to the column without explicitly inserting a value for the column, the row will default to the auto-incremented next value in the sequence.

How to change a column’s type?

ALTER TABLE artists
ALTER COLUMN name TYPE text
;

Default values

What is a default value?

Default values are values that you can specify when defining a table’s schema, which are inserted into a column when another value is not explicitly given. For instance, I might set a default value for the name column of my artists table as ‘unknown’. When I insert data into my table below, the application will automatically insert ‘unknown’ for any unspecified values in the name column.

title                     |     name            |  year
--------------------------+---------------------+--------
untitled watercolor | unknown | 1880

How does it work?

/* Define a default value with CREATE TABLE */
CREATE TABLE artists2 (
id serial PRIMARY KEY,
name text DEFAULT 'unknown'
);
/* Define a default value with ALTER TABLE */
ALTER TABLE artists2
ALTER COLUMN name SET DEFAULT 'unknown';
/* Drop a default value */
ALTER TABLE artists
ALTER COLUMN name DROP DEFAULT;

Inserting null values: If you explicitly insert null values into a table, they will override the default (as long as there is no not null constraint).

INSERT INTO artists (name) VALUES (null);     name            
---------------------
null

Defaults vs constraints: You can also define a default value for a column that violates a check constraint, BUT it will never let you use the default. throws an error when you try.

ALTER TABLE artists
ADD CONSTRAINT name_check CHECK (name LIKE '% %');
/* adds new constraint to require space between words */
ALTER TABLE artists
ALTER COLUMN name
SET DEFAULT 'unknown';
/* no error thrown even though 'unknown' is one word */
INSERT INTO artists (id) VALUES (8);
/* error thrown. can't insert new entry without specifying value for name, because default violates constraint */
ERROR: new row for relation "artists" violates check constraint "name_check"

Relating Data with Keys

What are keys?

Keys are a special constraint type that helps to identify unique rows in a table, and establish relationships between tables.

PRIMARY KEYs

What are they? As mentioned before, primary keys uniquely identify rows in a table, and help to maintain data integrity by ensuring that all values in the primary key column are unique and not null. It’s common for a primary keys to be a column named id. See “PRIMARY KEY CONSTRAINT” section above.

FOREIGN KEYs

What are they? Foreign keys establish relationships between rows on different tables. You can set a foreign key to reference the primary key of another table. This helps to maintain referential integrity by ensuring that every value in a foreign key column exists in the primary key column of the referenced table. See “FOREIGN KEY CONSTRAINT” section above.

What’s the difference between a foreign key column and a foreign key constraint?

  • The column specifies the relationship between rows. The foreign key column contains data that points to corresponding data in a specific primary key column.
  • The constraint is what enforces the rules about what values are permitted in the foreign key relationship (only values with corresponding data in primary key column of referenced table).

NATURAL KEYs

What are they? Natural keys are values that already exist in a data set, that can be used to uniquely identify rows of data.

Examples: Names, emails, phone numbers…

Disadvantages: Possible null data (e.g. no email address), possible duplicate data (e.g. shared phone number), possible changes to data (e.g. phone number reassigned, last name changed)

SURROGATE KEYs

What are they? Surrogate keys are values that are created in a data set for the sole purpose of uniquely identifying it within that database.

Examples: id column with serial type with auto-incrementing integer values, product ids, order numbers, etc.

Database Diagrams

Schema

What is schema?

Schema is a word used to talk about the structure of the database. Schema can refer to higher level descriptions of relationships between entities (conceptual schema), or more database-specific implementation of structure (physical level), or somewhere in between (logical level). Elements involved in the schema include table, database, and column names, constraints, data types.

Conceptual schema

What is it? Conceptual schema is the highest level of schema abstractions, and is focused on identifying relationships and entities (real world objects aka “nouns” and the data used to model real world objects). Conceptual schema can be described using the entity-relationship model, and diagrammed using various notations including Crow’s Feet (see below). The relationships on the level of conceptual schema usually fall into one of the three standard relationships (one-to-one, one-to-many, and many-to-many).

Conceptual Crow’s Foot Diagram with information about entities, relationship, and modality.

Physical schema

What is it? Physical schema is the lowest level of schema abstraction, and is concerned with the implementation of schema, including data types, keys, constraints, and rules. Physical schema can be diagrammed using tables that are labeled with data types and constraint/primary/foreign key information, and include lines drawn between keys. NOTE: Make sure that foreign keys in one-to-many and many-to-many relationships do not allow NULL entries!

Physical schema diagram, containing information about keys, constraints, identifiers, and data types.

Logical schema

What is it? Logical schema is somewhere between physical and conceptual schema — concerned with the logic of the relationships, but not as many of the implementation details.

Modeling Schema: Cardinality & Modality

Cardinality

What is it? Cardinality refers to the number of objects on each side of the relationship.

  • One-to-one: A user has one username, and each username belongs to one user. On a lower level of abstraction, the foreign key of email would also be its primary key, and it would reference the primary key of user.
  • One-to-many: An artist creates many works of art, but each work of art was created by one artist. On a lower level of abstraction, the foreign key of artwork would correspond with the primary key of artist.
  • Many-to-many: An art collector has many different artworks, and each artwork over time can be in many different collections. On a lower level of abstraction, an additional “table” must be created to store the foreign keys of both the artwork and the art collector (which reference the primary keys of their respective tables). In this way, the extra table can store the many-to-many relationship between the two entities.

Modality

What is it? Modality refers to whether a relationship is required or optional: if each instance of an entity must have a relation to the other entity. Required is notated with a 1, and optional with a 0.

  • Required: Each concert attendee is required to have a ticket. Each artwork is required to have at least one artist.
  • Optional: A ticket of the concert may not have an associated attendee. A work of art may not belong to an art collector.

Representing Cardinality and Modality with crow’s foot notation

One to one:

Each user is required to have one username, and each username is required to be associated with one user.
Each ticket holder must have a ticket. Each ticket may or may not be assigned to one ticket holder.
Each student can have up to one baseline test score. Each baseline test score must correspond to one student.

One to many:

Each artist can make multiple works of art, but artists are not required to have made a work of art. Each work of art is required to be associated with one artist.
Each ticket must correspond to a seat. Seats may be associated with zero, one, or more tickets.
Each Student may or may not have received multiple test scores. Each test score must belong to only one student.

Many to many:

A product can be sold to many customers or to no customers. A customer must buy one or more products.
Events must sell one or more seats. Seats may be associated with zero, one, or multiple events.
Each student may or may not have taken multiple practice tests. Each practice test may or may not have been taken by multiple students.

--

--

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store