20  Creating and populating a relational database

Warning

Any changes you make to the code on this page, including your solutions to exercises, are temporary. If you leave or reload the page, all changes will be lost. If you would like to keep your work, copy and paste your code into a separate file or editor where it can be saved permanently.

When designing a database, our goal is to organize information so that it can be stored, retrieved, and analyzed efficiently. Careful table design helps make querying faster, avoids duplicate data, and maintains consistency.

If you plan to design new databases or improve existing ones, it is worth exploring more advanced topics like data modeling (e.g., entity-relationship diagrams, relational model, or the Unified Modeling Language), database normalization, and indexes.

When designing databases, use the following practical guidelines:

  1. Identify the entity types you need to store in the database (an entity is a “thing” or “item” you need to track). Each entity type typically corresponds to a separate table. Choose clear and descriptive names for these tables.

  2. Decide which information to store about each entity type. Each piece of information corresponds to a column. For each column, choose a descriptive name, data type, and any constraints (e.g. mandatory or optional, unique).

  3. Set a primary key for each table (a column or combination of columns that uniquely identifies each row). A common approach is to add an id column with a unique sequential or random identifier.

  4. Identify relationships between entities:

    • One-to-one relationship (each entity in one table can be linked to at most one record in another table): This can be implemented by adding a foreign key column in either table to store the primary key value of the related entity.

    • One-to-many relationship (each entity in one table can be linked to many entities in another table): Implemented by adding a foreign key column on the “many” side.
      Example: In the database used in the last chapter, each employee belongs to one department, so the employee table stores a department_id. If we wanted to store employee IDs in the department table, we would need to store multiple values for each department, which cannot be done efficiently without introducing a new table.

    • Many-to-many relationship (many entities in one table can be linked to many entities in another table): Requires a separate table with two foreign key columns storing the primary key values from both tables. These two columns usually form the primary key of the linking table.

  5. Avoid redundancy. For example, do not include employee and project details in the table that stores which employee works on which project. This would lead to duplicate information and increase the risk of data inconsistencies. Instead, store employee and project details in separate tables, and reference them using their primary key values.

  6. Think about how the data will be used. Considering your queries and reports in advance can help you design tables that are practical and efficient for querying and analysis.

For example, suppose we want to store final grades for students in different courses in a database. We can split the information into three tables:

Exercise. Can we use only student_id or only course_id as the primary key in the grade table?

Once our database design is ready, we can implement it using SQL CREATE TABLE statements:

In the student and course tables, which use a single attribute as the primary key, we included PRIMARY KEY at the end of the specification for the id column. For multi-column primary keys, as in grade, we define the primary key separately, as shown in the CREATE TABLE statement for the grade table.

Let us insert a couple of rows into the student table:

We can look at the contents of the table by running a SELECT query:

If we attempt to insert a new row in the table using id 2 (there already exists a student with this id), we get an error (UNIQUE constraint failed: student.id):

Let us insert two courses to our database: GRA 4142 Data Management and Python Programming and GRA 6227 (we will update the title later). Recall that we have allowed NULL for the course title:

The title for GRA 6227 is Business Optimisation, so let us update the course:

Let us look at all courses again:

If we attempt to insert a grade (e.g. F) for Homer Simpson (id 1) and the course with id 3, we will get an error (FOREIGN KEY constraint failed) since the course does not exist:

The grade is actually for the Data Management and Python Programming course (id 1). Marge took that course too, with a grade of A:

If we attempt to remove Homer Simpson from the student table, we get an error:

This does not work because there are rows in other tables referencing Homer. We could remove these first and then remove Homer, but let us introduce a different approach instead.

When creating the grade table, we can use ON DELETE CASCADE for both foreign keys:

Let us try to remove Homer from the student table again:

This time it worked. What is the content of the student table?

And what is the content of the grade table?

Homer was removed from the student table, and so were all rows from the grade table referencing him.

The first ON DELETE CASCADE ensures that when removing a student, all grade rows referencing that student are removed too. Similarly, all grade rows for a particular course are removed when that course is deleted because of the second ON DELETE CASCADE.

A similar situation occurs when we try to change the primary key for a student. If we want to change Marge Simpson’s id to e.g. 100 (nobody has this id), we will get an error (FOREIGN KEY constraint failed):

The reason is that the grade table contains a row referring to a “student with id 2”; after changing the id to 100, the row would reference a non-existing student.

We can automatically update related rows by adding ON UPDATE CASCADE:

Now let us try again:

The student table looks as expected (Marge now has id 100):

What about the grade table?