LS 181 Study Guide
SQL
What is it?
SQL is a language used by Relational Database Management Systems (RDBMSs) to work with relational databases. Relational databases are structured collections of data persisted as a set of relations (aka tables). Put more simply, relational data means working with multiple tables at once. SQL is a special purpose and declarative language: special purpose in that it is used exclusively to work with relational data, and declarative in that the code describes what needs to be accomplished but the server abstracts away the details of the execution.
Why use relational data? Relational data reduces the amount of duplicated data that is persisted, and provides a useful structure to interact with data.
TL;DR:
- Relation = table
- relational database = connected tables
- RDBMS = applications to work with connected tables
- SQL = language used by RDBMSs
- special purpose = only used for one thing
- declarative = execution details abstracted
Joining Tables: Types of Joins
What are joins?
When working with databases, it’s important to be able to write queries that can return data from multiple tables at the same time. That’s where joins come in. You can read data from anywhere within the database’s schema using the word JOIN
and a join condition.
Let’s say you have a table containing data about artwork. This table might be related to a table with data about artists. If I want to read some information about the title of the artwork as well as the artist’s name and birth year, I would have to access both tables at once. So I write a query using JOIN
and a join condition that links data in the two tables (e.g. artists.id = artwork.artist_id
). This query creates a “join table” from the two tables, then selects from said join table. You can also join more than two tables using multiple JOIN
statements. Each JOIN
statement creates a new virtual table joining the relevant data, and the JOIN
statements compound on each other (the second JOIN
is joining with the virtual data from the first JOIN
, etc.)
Below are some types of joins.
INNER JOIN
What is it? INNER JOIN
is the default join type that SQL uses when it see the word JOIN
. INNER JOIN
creates a join table that contains ONLY the entries where the join condition is met. In other words, the set of all common elements between the two tables.
If my join condition is artists.id = artwork.artist_id
, the inner join table will return data from all table entries where the artist id is in both the artists table and the artwork table. It will ignore any entries where artwork.artist_id
or artists.id
don’t match the join condition.
When should you use it? This is useful when you only want to look at common elements between tables and don’t want to read any null entries.
Example:
SELECT artwork.title, artists.name, artwork.year
FROM artwork
INNER JOIN artists
ON artwork.artist_id = artists.id;title | name | year
------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
LEFT OUTER JOIN
What is it? LEFT OUTER JOIN
creates a join table containing all of the entries from the table to the left of the JOIN
statement, and combines it with the matching rows of any entries from the table to the right of the JOIN
statement that match the join conditition.
With the join statement artwork LEFT OUTER JOIN artists ON artists.id = artwork.artist_id
, the results set will contain data from every row in artwork, and combine it from any rows in the artists table that match the join condition. It will ignore any entries where artwork.artist_id
doesn’t match the join condition, but fill in null values for entries where artists.id
doesn’t match the join condition.
When should you use it? Use a left outer join when you want to see every entry from one table, and don’t care if they match the second table or not (I want to see all artwork in my list, whether or not I know the artist, or I want to see the number of artworks for each artist, even if that number is 0).
Example:
SELECT artwork.title, artists.name, artwork.year
FROM artwork
LEFT OUTER JOIN artists
ON artwork.artist_id = artists.id;title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
untitled watercolor | | 1880
RIGHT OUTER JOIN
What is it? As with LEFT OUTER JOIN
, RIGHT OUTER JOIN
is concerned with all of the entries from the table to the right of the JOIN
keyword, and any matching entries from the table written to the left of the word JOIN
.
Take the statement artwork RIGHT OUTER JOIN artists ON artists.id = artwork.artist_id
. Here, the results set will contain data from every row in artists, and combine it from any rows in the artwork table that matches the join condition. It will ignore any entries where artists.id
doesn’t match the join condition, but fill in null values for entries where artwork.artist_id
doesn’t match the join condition.
When should you use it? Use a right outer join when you want to see every entry from the second table, and don’t care if they match the first table or not (I want to see all artists in my list, regardless of if my database contains any artwork associated with them).
Example:
SELECT artwork.title, artists.name, artwork.year
FROM artwork
RIGHT OUTER JOIN artists
ON artwork.artist_id = artists.id;title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
| Paul Cezánne |
| Edgar Degas |
FULL OUTER JOIN
What is it? FULL OUTER JOIN
returns a set containing all data from all entries from the two tables regardless of whether or not they match the join condition. Functions like a combination of left and right join.
Let’s look at the statement artwork FULL OUTER JOIN artists ON artists.id = artwork.artist_id.
This resulting join table includes every row from both tables, filling in null values for entries where artists.id
or artwork.artist_id
doesn’t match the join condition.
When should you use it? Use a full outer join when you want to see all the data from both tables, and don’t care about data that doesn’t match.
Example:
SELECT artwork.title, artists.name, artwork.year
FROM artwork
FULL OUTER JOIN artists
ON artwork.artist_id = artists.id;title | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Claude Monet | 1888
untitled watercolor | | 1880
| Paul Cezánne |
| Edgar Degas |
CROSS JOIN
What is it? A CROSS JOIN
is the only kind of join that doesn’t use a join condition. It returns the set of all rows in the first table crossed with all rows in the second table, in order words, every possible combination of rows from the two tables. It’s also known as a cartesian join, as it maps all the rows in one table to all the rows in the other. Does not include null data.
When should you use it? When you want to look at combinations and possibilities that aren’t in the database?
Example:
SELECT artwork.title, artists.name, artwork.year
FROM artwork
CROSS JOIN artiststitle | name | year
--------------------------+---------------------+--------
Cafe Terrace at night | Vincent Van Gogh | 1888
Antibes | Vincent Van Gogh | 1888
untitled watercolor | Vincent Van Gogh | 1880
Cafe Terrace at night | Claude Monet | 1888
Antibes | Claude Monet | 1888
untitled watercolor | Claude Monet | 1880
Cafe Terrace at night | Paul Cezánne | 1888
Antibes | Paul Cezánne | 1888
untitled watercolor | Paul Cezánne | 1880
Cafe Terrace at night | Edgar Degas | 1888
Antibes | Edgar Degas | 1888
untitled watercolor | Edgar Degas | 1880
Sublanguages of SQL: DML, DDL, and DCL
“NAME THAT SUBLANGUAGE!”
DDL
What is it? Data Definition Language is one of the three sublanguages of SQL. It deals with defining and altering a database’s schema: the structure and rules of a database and it stables and columns.
When should I use it? Use it when dealing with a database’s structure.
Examples: CREATE TABLE
, ALTER TABLE
, DROP TABLE
, ADD COLUMN
, ALTER COLUMN
, DROP COLUMN
CREATE TABLE artists (
id serial PRIMARY KEY
name varchar(256)
/* column definitions include type, name, optional constraints */
)ALTER TABLE artists
ADD COLUMN birthday date NOT NULL
;ALTER TABLE artists
ALTER COLUMN birthday
DROP NOT NULL,
ALTER COLUMN name TYPE text
;ALTER TABLE artists
DROP COLUMN birthday
;ALTER TABLE artwork
RENAME COLUMN title TO artwork_title
;ALTER TABLE artwork
RENAME TO artworks
;ALTER TABLE artwork
ALTER COLUMN title
DROP DEFAULT;DROP TABLE artwork;
DML
What is it? Data Manipulation language is another of the three sublanguages, and it deals with reading and updating a database’s data — the values stored within the tables/relations.
When should I use it? Use it to add, read, manipulate, or delete data in a relation.
What is CRUD? A set of standard data manipulations for any relational data operation: Create, Read, Update, Delete.
Examples: INSERT
, UPDATE
, DELETE
,SELECT
SELECT * FROM artists;INSERT INTO artists (name)
VALUES ('Matisse');UPDATE artists
SET name = 'Anonymous'
WHERE id = 3
;DELETE FROM artists
WHERE id = 3
;
DCL
What is it? Data Control Language is the third sublanguage of SQL. It is used to deal with security settings that are part of a database’s schema. It grants and revokes access to relations, and defines the rights and roles of individual users.
When should I use it? When you need to limit or grant access to data within a database.
Examples: GRANT
, REVOKE
Making Queries: GROUP BY, ORDER BY, WHERE, and HAVING
Below are several common query keywords that allow you to group, sort, and filter data in your relational database.
GROUP BY
What does it do? GROUP BY
allows you to group data in your tables based on identical values in a specified column. This allows you to aggregate information about groups of entries that share common elements, like the number of paintings by each artist.
When do I use it? Use GROUP BY
when you want to aggregate information about multiple groups and read it at the same time. Used with aggregate functions to get a sum, count, or concatenated string for entries within different groups (number of artworks grouped by year, list of all works created by each artist, total cost of each food type, etc.)
Example:
SELECT artists.name AS artist_name, count(artwork.id) AS num_works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name;artist_name | num_works
--------------------------+-----------
Vincent Van Gogh | 2
Claude Monet | 1
SELECT artists.name AS artist_name, string_agg(artwork.title, ', ') AS works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name;artist_name | works
--------------------------+-----------------------------------------
Vincent Van Gogh | Cafe Terrace at night, The Starry Night
Claude Monet | Antibes
Why did I get an error? One common error that occurs with GROUP BY
clauses happens when you try to select a column without adding it to your group by clause or performing an aggregate function on it. For instance:
SELECT artists.name, count(artwork.id) AS num_works, artwork.year
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.id
;ERROR: column "artwork.year" must appear in the GROUP BY clause or be used in an aggregate function/* The above query throws an error. It cannot return the year of an artwork when grouped by artist. */name | num_works | year
--------------------------+--------------+--------------------------
Vincent Van Gogh | 2 | /* what could go here? */
Claude Monet | 1 |
PostgresQL cannot return the year of a single artwork that is grouped by artist, unless incorporated into an aggregate function that works with the date type (min, max, count).
Common aggregate functions:
count()
: takes a column name as argument and returns the number of values in that column. (ex. SELECT count(id) FROM artists;
)
sum()
: takes a column as argument, sums numeric values for all selected rows, returns the total. (ex. SELECT sum(price) FROM artwork;
)
min()
: takes a column as argument and returns the lowest value in said column for all of the selected rows. It can be used with various data (numeric, date/ time, string, etc) . (ex. SELECT min(year) FROM artwork;
)
max()
: takes a column as argument and returns the highest value in said column for all of the selected rows. It can also be used with various data (numeric, date/ time, string, etc) . (ex. SELECT max(year) FROM artwork;
)
avg()
: takes a column name as argument and returns the average/mean of numeric values for the selected rows. (ex. SELECT avg(id) FROM artists;
)
string_agg()
: takes a column name and delimiter as argument and returns all string values from the selected rows concatenated with the delimiter. (ex. SELECT string_agg(name) FROM artists;
)
ORDER BY
What does it do? ORDER BY
allows you to sort data in your tables based on the values in a specified column. You can use the keywords ASC
and DESC
as arguments to determine sort order. You can also sort by multiple column values, or use an ORDER BY
clause to sort grouped data. Defaults to ascending order.
When do I use it? When you need to return values in a certain order.
Example:
SELECT title, year FROM artwork
ORDER BY year ASC
;title | year
--------------------------+--------
untitled watercolor | 1880
Cafe Terrace at night | 1888
Antibes | 1888
SELECT title, year FROM artwork
ORDER BY year ASC, title ASC
;title | year
--------------------------+--------
untitled watercolor | 1880
Antibes | 1888
Cafe Terrace at night | 1888
SELECT year FROM artwork
GROUP BY year
ORDER BY year DESC
; year
--------
1888
1880
WHERE
What does it do? WHERE
is a keyword used in a SQL query to check each row of data in a relation against a condition and return the data that matches the WHERE
condition. WHERE
conditions can include various logical expressions and operators to help filter the data correctly.
When do I use it? When you need to filter data.
Example:
SELECT title, year FROM artwork
WHERE year > 1881
;title | year
--------------------------+-----------
Cafe Terrace at night | 1888
Antibes | 1888
SELECT year, title FROM artwork
WHERE year > 1881
AND title LIKE '%night'
;year | title
----------+---------------------
1888 | Cafe Terrace at night
Common logical expressions in WHERE
conditions:
LIKE
operator:LIKE
is a case sensitive string matching operator. It is used in conditions to checks for a matching substring. A wildcard character, like%
or_
, is commonly used in the match value of aLIKE
expression to look for specific placement of substring._
stands for single character, and%
is any number of characters. For example,SELECT * FROM artists WHERE name LIKE ‘%Monet’
would select any rows where the data in the name column ofartists
ends with ‘Monet’.SELECT * FROM artists WHERE name LIKE ‘_______Monet’
would select any rows where the data in the name column ofartists
begins with 6 characters and ends with ‘Monet’.SELECT * FROM artists WHERE name LIKE ‘Claude%’
would select any rows where the data in the name column ofartists
begins with ‘Claude’.ILIKE
operator:ILIKE
works in the same way as theLIKE
operator, but it is not case sensitive. (SELECT * FROM artists WHERE name LIKE ‘%moNEt’
)SIMILAR TO
operator: SIMILAR TO also works in the same way as theLIKE
operator, but it uses regex instead of a substring string to match the strings. (SELECT * FROM artists WHERE name SIMILAR TO '/Claude\sMonet\b/gm'
)>,=, <=, =, <>
: Operators used to compare values of data and a specified literal (WHERE year > 1881
). REMEMBER that you can’t use these operators onNULL
, but instead should use theIS NULL
orIS NOT NULL
operator.AND, OR
: Works like&&
to||
in Javascript to combine conditions (WHERE year > 1881 OR title LIKE ‘%night’
)
Why did I get an error? If you try to use WHERE
on a grouped data, you will get an error. Instead you can use the keyworkd HAVING
.
HAVING
What does it do? HAVING
is a keyword that allows you to specify a condition for grouped data, and only return grouped data that meets said condition. It works like a WHERE
clause, but, unlike WHERE
clauses, it can be used along with a GROUP BY
condition in a query.
When do I use it? When you want to fiter grouped data.
Example:
SELECT artists.name AS artist_name, count(artwork.id) AS num_works
FROM artists
INNER JOIN artwork
ON artists.id = artwork.artist_id
GROUP BY artists.name --note that artists.name must be used here
HAVING num_works = 2; /* note also that having is evaluated before select, so you cannot use aliases from the select statement */artist_name | num_works
--------------------------+-----------
Vincent Van Gogh | 2
Data Integrity: Using Constraints
What are constraints?
Constraints are rules used to define what data is allowed in specified columns. Some examples are CHECK
, UNIQUE
, PRIMARY KEY
, NOT NULL
, and FOREIGN KEY
.
Why use constraints?
By providing rules for what kinds of data are allowed in your relations, you are able to maintain data integrity and quality, and are less likely to encouter data duplication and null values (and therefore anomalies).
CHECK CONSTRAINTS
What does it do? CHECK
constraints allow you to specify your own unique rules for data. As each row of data is about to be added to a table, RDBMS’s will verify that the row of data conforms to the check constraint’s condition. It uses conditions like a where
clause, but unlike a where
clause, constraints are part of a database’s schema (DDL) and control what values are inserted into a table, not just the values queried (DML). If you try to insert or update with an invalid value, an error will be raised.
When do I use it? When you have a specific rule for data that isn’t already a constraint, like if you only want to insert positive numbers, or capitalized words, or email addresses.
Examples:
CHECK (spectral_type IN (‘O’, ‘B’, ‘A’, ‘F’, ‘G’, ‘K’, ‘M’))
CHECK (mass > 0.0)
CHECK (full_name <> ‘’)
CHECK (length(director) >= 3 AND position(‘ ‘ in director) > 0)
CHECK (year BETWEEN 1900 AND 2100)
CHECK (length(title) >= 1)
/* Adding a CHECK constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
title text,
year numeric CHECK(year BETWEEN 1400 AND 2022)
);/* Adding a CHECK constraint on ALTER */
ALTER TABLE artwork
ADD CHECK (year BETWEEN 1300 AND 2022)
;ALTER TABLE artwork
ADD CONSTRAINT artwork_year_check CHECK (year BETWEEN 1300 AND 2022);/* Dropping a CHECK constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_year_check
;ALTER TABLE artwork
DROP CONSTRAINT artwork_year_check
;
UNIQUE CONSTRAINTS
What does it do? Before adding data to a table, UNIQUE
checks that the data has not already been added to that column, ensuring that every value in said column is actually unique. If you try to insert or update with a duplicate value, an error will be raised.
When do I use it? Use UNIQUE
when you want to ensure that there are not duplicated values in a given column.
Examples:
/* Adding a UNIQUE constraint on CREATE */
CREATE TABLE artwork (
id serial NOT NULL,
title text UNIQUE
);/* Adding a UNIQUE constraint on ALTER */
ALTER TABLE artwork
ADD CONSTRAINT id_unique UNIQUE (id)
;/* Dropping a UNIQUE constraint */
ALTER TABLE artwork
DROP CONSTRAINT id_unique
;
NOT NULL CONSTRAINTS
What does it do? Before adding data to a table, NOT NULL
checks that the data is not null, ensuring that every column contains no null values. If you try to insert or update with a null value, an error will be raised.
When do I use it? Use this constraint when it would be damaging to include null values in your table. Null values are risky: they can cause unexpected sorting behavior, make queries more complicated, and don’t transfer well between systems. Use if a column might be used as a primary key, if it is related to another table, if every row should contain this column of data, or generally if it is a foreign key.
Examples:
/* Adding a NOT NULL constraint on CREATE */
CREATE TABLE artwork (
id serial UNIQUE NOT NULL,
title text
);/* Adding a NULL constraint on ALTER */
ALTER TABLE artwork
ALTER COLUMN title SET NOT NULL;/* Dropping a NOT NULL constraint */
ALTER TABLE artwork
ALTER COLUMN title DROP NOT NULL;
PRIMARY KEY CONSTRAINTS
What does it do? Primary key constraints function like NOT NULL UNIQUE constraints, and are used to define a column that will be used to uniquely identify rows. There can only be one primary key per table.
When do I use it? Use a primary key constraint to set up a column with unique row indentifiers. Primary keys are often used with serial data types.
Examples:
/* Adding a PRIMARY KEY constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
title text
);CREATE TABLE artwork (
id serial,
title text,
PRIMARY KEY (id)
);/* Dropping a PRIMARY KEY constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_pkey
;/* Adding a PRIMARY KEY constraint on ALTER */
ALTER TABLE artwork
ADD PRIMARY KEY (id)
;
FOREIGN KEY
What does it do? Foreign key constraints set up relationships between tables, by associating a row in one table to a row in another table. Used with the keyword REFERENCES
, FOREIGN KEY
will ensure that every entry in the foreign key constrained column of one table has a corresponding entry in the table being referenced. This preserves the referential integrity of the database. Unlike PRIMARY KEY
, FOREIGN KEY
constraints do not set up a NOT NULL
or UNIQUE
constraint. You can insert null or duplicate data into a foreign key column. Also unlike PRIMARY KEY
constraints, you can have a table with multiple foreign key columns. When using a foreign key, you should also set up a delete condition to tell the application what to do with the child table data when the corresponding row in the parent table is deleted.
When do I use it? You use a foreign key when you need to relate two tables to each other in a one-to-one, one-to-many, or many-to-many relationship.
Examples:
/* Adding a FOREIGN KEY constraint on CREATE */
CREATE TABLE artwork (
id serial PRIMARY KEY,
artist_id integer REFERENCES artists (id)
ON DELETE CASCADE
);CREATE TABLE artwork (
id serial PRIMARY KEY,
artist_id integer,
FOREIGN KEY (artist_id) REFERENCES artists (id)
ON DELETE CASCADE
);/* Adding a FOREIGN KEY constraint on ALTER */
ALTER TABLE artwork
ADD CONSTRAINT artwork_artist_id_fkey
FOREIGN KEY (artist_id) REFERENCES artists (id)
ON DELETE CASCADE
;/* Dropping a FOREIGN KEY constraint */
ALTER TABLE artwork
DROP CONSTRAINT artwork_artist_id_fkey
;
Subqueries
What is it?
Subqueries are queries nested within other queries. They can be used instead of JOIN
statements to query data from multiple tables at once.
SELECT name FROM artists
WHERE id NOT IN
(SELECT artist_id FROM artwork
WHERE artist_id IS NOT NULL);/* list of artists whose work is not in artwork table */
/* note the is not null clause for the NOT IN statement. you can't check id against null values */name
-----------------
Paul Cezánne
Edgar Degas
Why Use Subqueries?
Subqueries are not always preferable to joins, but they do offer some advantages in readability, and occasionally efficiency (though joins are often faster). Subqueries may be more logical when you only need to retrieve data from one table, but you want to use a conditional on data from another table, as in the above example. You can use the EXPLAIN or ANALYZE keywords to investigate the efficiency of the different query plans. Efficiency varies partially because RDBMSs will execute SQL’s declarative commands differently.
Common Subquery Expressions:
EXISTS
/NOT EXISTS
: This expression checks if a nested query returns any rows that match the given condition, and returns a boolean. True if at least one row is returned by nested query, false if not.
SELECT 1 WHERE EXISTS
(SELECT id FROM artists
WHERE name = 'Claude Monet');/* if there are any artists named Claude Monet, it returns 1 */?column?
-----------
1SELECT title FROM artwork WHERE EXISTS
(SELECT id FROM artists
WHERE name = 'Claude Monet'
AND artists.id = artwork.artist_id);/* checks each row in artwork, and if there are any artists named Claude Monet whose id matches the artist_id of that row in artwork, the exists statement returns true, and adds that row of artwork to data set */title
-----------
AntibesSELECT name FROM artists WHERE NOT EXISTS
(SELECT id FROM artwork
WHERE artists.id = artwork.artist_id);/* checks each row in artists, and if there are any artworks whose artist_id matches the id of that row in artists, the NOT EXISTS statement returns false. */name
-----------------
Paul Cezánne
Edgar Degas
IN
/ NOT IN
: This expression compares a value to every row in a nested query result. If the value is found within a row of subquery result, the expression evaluates to ‘true’.
SELECT name FROM artists WHERE id IN
(SELECT artist_id FROM artwork
WHERE title LIKE 'Ant%');/* goes through each row of artists table and compares the id value of that row to a table of artist_id's that correspond to a title beginning with 'Ant'. If the id value is in the subquery result, the name of the artist is added to the returned dataset. */name
---------------------
Claude Monet
SELECT name FROM artists
WHERE id NOT IN
(SELECT artist_id FROM artwork
WHERE artist_id IS NOT NULL);/* list of artists whose work is not in artwork table */
/* note the is not null clause for the NOT IN statement. you can't check id against null values */name
-----------------
Paul Cezánne
Edgar Degas
ANY
/SOME
: (used interchangeably) These expressions are used with operators like =, <, >, etc. Returns true if there is any value in the results of the nested query for which the operator expression evaluates to true given the value to the left of the operator.
SELECT name FROM artists WHERE length(name) > ANY
(SELECT length(title) FROM artwork);/* Returns a set of artist names that have a length greater than any of the artwork titles (here, all of them have a length greater than 'Antibes') */name
------------------
Vincent Van Gogh
Claude Monet
Paul Cezánne
Edgar Degas
SELECT title FROM artwork WHERE artist_id = ANY
(SELECT id FROM artists);/* Returns a set of artwork titles that have a corresponding artist id in the artists table. Here two of the titles have ids in the table. */title
-----------------------
Cafe Terrace at night
Antibes
ALL
: Like ANY and SOME, ALL is used with operators to evaluate a logical expression. ALL evaluates to true if all of the results of the nested query are true when compared to the value to the left of the operator.
SELECT name FROM artists WHERE length(name) > ALL
(SELECT length(title) FROM artwork);/* Returns a set of artist names that have a length greater than all of the artwork titles (here, none of them have a length greater than 'Cafe Terrace at night') */name
------------------
JOIN
s rewritten as Subqueries
/* JOIN */SELECT artwork.title
FROM artwork
INNER JOIN artists
ON artwork.artist_id = artists.id;title
-----------------------
Cafe Terrace at night
Antibes
/* Subqueries */SELECT title FROM artwork
WHERE EXISTS
(SELECT id FROM artists
WHERE id = artwork.artist_id);SELECT title FROM artwork
WHERE artist_id IN
(SELECT id FROM artists
WHERE id = artwork.artist_id);SELECT title FROM artwork
WHERE artist_id = ANY
(SELECT id FROM artists
WHERE id = artwork.artist_id);/* JOIN */
SELECT artwork.year, artwork.title FROM artwork
INNER JOIN artists ON artwork.artist_id = artists.id
WHERE artwork.year > 1881
AND artists.name LIKE '%Monet'
;/* Subqueries */
SELECT year, title FROM artwork
WHERE year > 1881 AND
EXISTS (SELECT id FROM artists
WHERE id = artwork.artist_id
AND name LIKE '%Monet');SELECT year, title FROM artwork
WHERE year > 1881 AND artist_id =
ANY (SELECT id FROM artists
WHERE id = artwork.artist_id
AND name like '%Monet');SELECT year, title FROM artwork
WHERE year > 1881 AND artist_id IN
(SELECT id FROM artists WHERE id = artwork.artist_id
AND name like '%Monet');
PostgreSQL
What is it?
PostgreSQL is an RDBMS (relational database management system) with a client-server architecture. It allows you to define and interact with databases using SQL.
How it works?
You can access PostgreSQL from your CLI, or from enter an interactive console like psql (sort of like a REPL). psql allows you to use both SQL statements and psql meta-commands to interact with your data.
NOTE: PostgreSQL and other RDBMS’s contain many specific reserved keywords. If you need to use any of these reserved words as indentifiers in your data, you can simply interact with the identifiers using double quotation (“count”, etc.)
Sequences
What Are They?
Sequences are a type of relation used to automatically generate a series of numbers. They are used to generate the number sequences used by the serial type, and in general to create unique row identifiers (like a surrogate key). Numbers in a sequence are unique and can only be used once. If you call the function nextval('artists_id_seq')
, you can access the next number in the sequence, but that number can now never be used again. The serial
data type automatically creates a sequence to generate identifiers. You can also manually create sequences with the CREATE SEQUENCE
command.
/* How to create a sequence */
CREATE SEQUENCE counter;/* How to access a sequence number */
SELECT nextval('counter'); /* note the single quotations *//* How to drop a sequence */
DROP SEQUENCE counter;
Column types
Common data types
serial
: pseudotype that creates auto-incrementing, not null, integer sequences to help identify rows of data in a table.text
,varchar
,char
: string types.varchar
specifies a limit to the number of characters,char
specifies and exact number of characters and fills in whitespace when not all characters are used, andtext
can contain an unlimited length of text and can be used for long strings.integer
,numeric
,decimal
,real
,float
: numeric types.integer
specifies a whole number,decimal
&numeric
have the option to specify artbitrary precision and scale (e.g.numeric(5, 2)
can be any number between -999.99 and 999.99), andreal
is used for floating-point numbers.timestamp
,date
: date types, often entered in the format ‘YYYY-MM-DD HH:MM:SS’. The difference between them is simply that thedate
type does not include a time stamp (HH:MM:SS). There is also a timestamp with time zone data type (timestamptz
).boolean
: true or false. Displayed by psql as ‘t’ and ‘f’.
How serial
pseudotype works? (aka how to create an auto-incrementing column)
The serial type simple creates a new sequence of integers, and sets the default of the column to the next value in the sequence. Serial also adds a NOT NULL constraint to the column, but does not specify a UNIQUE constraint. Every time we add a new row to the column without explicitly inserting a value for the column, the row will default to the auto-incremented next value in the sequence.
How to change a column’s type?
ALTER TABLE artists
ALTER COLUMN name TYPE text
;
Default values
What is a default value?
Default values are values that you can specify when defining a table’s schema, which are inserted into a column when another value is not explicitly given. For instance, I might set a default value for the name
column of my artists
table as ‘unknown’. When I insert data into my table below, the application will automatically insert ‘unknown’ for any unspecified values in the name column.
title | name | year
--------------------------+---------------------+--------
untitled watercolor | unknown | 1880
How does it work?
/* Define a default value with CREATE TABLE */
CREATE TABLE artists2 (
id serial PRIMARY KEY,
name text DEFAULT 'unknown'
);/* Define a default value with ALTER TABLE */
ALTER TABLE artists2
ALTER COLUMN name SET DEFAULT 'unknown';/* Drop a default value */
ALTER TABLE artists
ALTER COLUMN name DROP DEFAULT;
Inserting null values: If you explicitly insert null
values into a table, they will override the default (as long as there is no not null
constraint).
INSERT INTO artists (name) VALUES (null); name
---------------------
null
Defaults vs constraints: You can also define a default value for a column that violates a check constraint, BUT it will never let you use the default. throws an error when you try.
ALTER TABLE artists
ADD CONSTRAINT name_check CHECK (name LIKE '% %');
/* adds new constraint to require space between words */ALTER TABLE artists
ALTER COLUMN name
SET DEFAULT 'unknown';
/* no error thrown even though 'unknown' is one word */INSERT INTO artists (id) VALUES (8);
/* error thrown. can't insert new entry without specifying value for name, because default violates constraint */ERROR: new row for relation "artists" violates check constraint "name_check"
Relating Data with Keys
What are keys?
Keys are a special constraint type that helps to identify unique rows in a table, and establish relationships between tables.
PRIMARY KEYs
What are they? As mentioned before, primary keys uniquely identify rows in a table, and help to maintain data integrity by ensuring that all values in the primary key column are unique and not null. It’s common for a primary keys to be a column named id
. See “PRIMARY KEY CONSTRAINT” section above.
FOREIGN KEYs
What are they? Foreign keys establish relationships between rows on different tables. You can set a foreign key to reference the primary key of another table. This helps to maintain referential integrity by ensuring that every value in a foreign key column exists in the primary key column of the referenced table. See “FOREIGN KEY CONSTRAINT” section above.
What’s the difference between a foreign key column and a foreign key constraint?
- The column specifies the relationship between rows. The foreign key column contains data that points to corresponding data in a specific primary key column.
- The constraint is what enforces the rules about what values are permitted in the foreign key relationship (only values with corresponding data in primary key column of referenced table).
NATURAL KEYs
What are they? Natural keys are values that already exist in a data set, that can be used to uniquely identify rows of data.
Examples: Names, emails, phone numbers…
Disadvantages: Possible null data (e.g. no email address), possible duplicate data (e.g. shared phone number), possible changes to data (e.g. phone number reassigned, last name changed)
SURROGATE KEYs
What are they? Surrogate keys are values that are created in a data set for the sole purpose of uniquely identifying it within that database.
Examples: id column with serial type with auto-incrementing integer values, product ids, order numbers, etc.
Database Diagrams
Schema
What is schema?
Schema is a word used to talk about the structure of the database. Schema can refer to higher level descriptions of relationships between entities (conceptual schema), or more database-specific implementation of structure (physical level), or somewhere in between (logical level). Elements involved in the schema include table, database, and column names, constraints, data types.
Conceptual schema
What is it? Conceptual schema is the highest level of schema abstractions, and is focused on identifying relationships and entities (real world objects aka “nouns” and the data used to model real world objects). Conceptual schema can be described using the entity-relationship model, and diagrammed using various notations including Crow’s Feet (see below). The relationships on the level of conceptual schema usually fall into one of the three standard relationships (one-to-one, one-to-many, and many-to-many).
Physical schema
What is it? Physical schema is the lowest level of schema abstraction, and is concerned with the implementation of schema, including data types, keys, constraints, and rules. Physical schema can be diagrammed using tables that are labeled with data types and constraint/primary/foreign key information, and include lines drawn between keys. NOTE: Make sure that foreign keys in one-to-many and many-to-many relationships do not allow NULL entries!
Logical schema
What is it? Logical schema is somewhere between physical and conceptual schema — concerned with the logic of the relationships, but not as many of the implementation details.
Modeling Schema: Cardinality & Modality
Cardinality
What is it? Cardinality refers to the number of objects on each side of the relationship.
- One-to-one: A user has one username, and each username belongs to one user. On a lower level of abstraction, the foreign key of email would also be its primary key, and it would reference the primary key of user.
- One-to-many: An artist creates many works of art, but each work of art was created by one artist. On a lower level of abstraction, the foreign key of artwork would correspond with the primary key of artist.
- Many-to-many: An art collector has many different artworks, and each artwork over time can be in many different collections. On a lower level of abstraction, an additional “table” must be created to store the foreign keys of both the artwork and the art collector (which reference the primary keys of their respective tables). In this way, the extra table can store the many-to-many relationship between the two entities.
Modality
What is it? Modality refers to whether a relationship is required or optional: if each instance of an entity must have a relation to the other entity. Required is notated with a 1, and optional with a 0.
- Required: Each concert attendee is required to have a ticket. Each artwork is required to have at least one artist.
- Optional: A ticket of the concert may not have an associated attendee. A work of art may not belong to an art collector.
Representing Cardinality and Modality with crow’s foot notation
One to one:
One to many:
Many to many: