Best Practices for Database Schema Design (Beginners)

Three Ways to Plan Database Design

There are many opinions about what is ‘best’. Here we discuss some basic guidelines which may or may not be best for your specific needs.”

-Jim Murphy, President, SQLWatchmen, LLC.

Introduction

In this article, we will discuss a few common design decisions when defining a Schema. We’ll focus on the planning stage. There are many opinions about what is ‘best’. Here we discuss some basic guidelines which may or may not be best for your specific needs.

Therefore, the scope of this brief article is less on the engineering side of database design. It is, instead, focused on some of the top tips for making it easy to work with the data. We will also avoid other intermediate and advanced concepts of database design like natural versus artificial keys and defining the cluster index as part of table design, etc. These additional topics should be researched for a complete schema engineering strategy.

1. Know your usage patterns

There are two primary ways a database is used. One is Online Transaction Processing (OLTP) and the other is Decision Support Systems (DSS). Depending on the expected usage patterns, you may choose to Normalize or Denormalize the data.

Database Design State Simple Description
Unnormalized When a database has not yet been broken apart by business entities* into separate tables (Normalized).
Normalized When a database has been broken into several tables so column values don’t repeat (see description below).
Denormalized When a database was first Normalized, then, for strategic reasons, carefully modified to violate Normalization rules to increase reporting speed.

* A business entity is a unique identity with both state and behavior. An example might be Customer and Order.

Basically, an OLTP database typically has lots of smaller transactions (data modifications). An example might be a banking system where lots of little data modifications occur throughout the day to constantly process debit card or check activity, including managing the account register, security logging, updating the account balance, etc. The goal with these types of databases is to organize the data to avoid duplication of fields and repeating data, and to ensure the purity of the data. So the Customer’s Name is in one column in one table only, and other pieces of data, for example multiple Orders, will ‘link’ to that single Customer row by the use of Primary and Foreign Keys. The process of breaking the data into multiple tables and columns is referred to as Normalization.

Consider the following diagram:

Customer and order tables and their relationships

The keys are listed as follows:

Key Listing

Primary Key (PK) Foreign Key (FK)
Customer.CustomerID
Order.OrderID Order.CustomerID

A DSS database is used more for reports, analytics, data warehouses, etc. Generally, these databases are mostly read-only and have the job of aggregating lots of data and providing summary or detail reports, which include calculations, to decision makers. The data is typically updated more irregularly, perhaps weekly or daily, but could be updated hourly or on some other schedule. The goal here, is to structure the system in such a way that huge amounts of data can be retrieved more quickly as well as to have the formulas and calculations performed before the user requests it (at night, perhaps).

It is hard to find a strictly OLTP or strictly DSS database. Each database typically has a blend of usage, but will favor one type or another.

2. Naming Conventions

I don’t want to dwell on naming conventions because everyone has their own style and may choose to adhere to an ISO standard (ISO-11179, etc.) or not. One very important point is to be consistent in how you name fields. I did want to note a few common tips that you may find helpful.

  1. Avoid using just ID as the PK of each table. It will lead to lots of aliasing when joining other tables and returning multiple IDs from several tables.
  2. Beware of using SQL Server reserved words (User, Date, etc.) in table names, column names and elsewhere. Use of a reserved word will give a syntax error unless you specify [square brackets] around the value, making development slower and the statements longer.
  3. Don’t use hyphens, spaces, quotes, etc. Because they will be invalid or require [square brackets]. e.g. SELECT [category-id] FROM [custom-category] I personally don’t like to use underscores even though they are used in many databases. Although they do not require [square brackets], they do require some light double-pinky acrobatics to type.
  4. Name the tables in the singular, not plural. For example, name the table Customer and Order rather than Customers and Orders. It is obvious that a table contains multiple customers and hopefully not a single row, so the plurality is somewhat redundant and may introduce inconsistency issues with some table names.

. E.g. Categories.CategoryID

  • The table name is inconsistent with other tables in the database because of the …ies. The names may be consistently plural in your database, but the spelling suffix is different between tables. This is removed if the tables and columns are named like Category.CategoryID, which would retain consistency with the rest of the database.

3. Use Constraints

Constraints such as required fields, unique values, allowed values, etc., at the database level can perform additional validation to ensure the integrity of the data. These checks should not be the only place where validation occurs. Validation should be baked into the front end application as well. If the application catches a validation issue, a “pretty” error can be displayed to the end user.

Properly handled validation error

If the application misses it, the database will catch it and render a slightly more obtuse error.

At least this offending data was caught and the database was not defiled. Also, if the constraints are defined at the database level, many modern application generators can pull those ‘requirements’ into the application and render the screen with the correct related data elements, as well as perform proper validation and display user-readable messages, all without needing to code it yourself!

An example would be a field marked as NOT NULL in SQL Server. The application generator can detect this and ensure the necessary User Interface validation is included with a user understandable error like the one below.

Validation error close up

instead of an error that users may find cryptic like this:

Validation error when it is not properly handled

There are many different types of constraints. Allowing NULLs, enforcing uniqueness and allowed value lists are just a few examples.

Conclusion

Hopefully, these tips will get you going down the right path, or at least raise some questions that should be considered in the planning stage, to help create a design that will work for your particular needs.

One thing that helps me ‘prove’ my design, is to use a code generator like Iron Speed Designer to build a throw-away application which allows me to work with the data in a similar way as a user would. Users are not expected to use SQL Server Management Studio are they? Hopefully not! I find it to be a real time saver to quickly build an application and fiddle with the data and see how it all comes together. Then, I correct bad relationships or datatypes, and generate another app. Once I am comfortable with the database, THEN I create the real application.