Essential Database Naming Conventions (and Style)
style
- use lowercase characters
- eliminates question of proper case as well as errors related to case-sensitivity
- speeds typing rate and accuracy
- differentiates table and column names from uppercase SQL keywords
- separate words and prefixes with underlines, never use spaces
- promotes readability (e.g. book_name vs. bookname)
- avoid having to bracket names (e.g. [book name] or `book name`)
- offers greater platform independence
- avoid using numbers
- may be a sign of poor normalization, hinting at the need for a many-to-many relationship
table names
- choose short, unambiguous names, using no more than one or two words
- distinguish tables easily
- facilitates the naming of unique column names as well as lookup and linking tables
- give tables singular names, never plural (update: i still agree with the reasons given for this convention, but most people really like plural table names, so i’ve softened my stance)
- promotes consistency with naming of primary key columns and lookup tables
- ensures alphabetical ordering of a table before its lookup or linking tables
- avoid confusion of english pluralization rules to make database programming easier (e.g. activity becomes activities, box becomes boxes, person becomes people, data remains data, etc.)
- SQL statements read better, e.g. SELECT activity.name vs. SELECT activities.name
- avoid abbreviated, concatenated, or acronym-based names
- promotes self-documenting design
- easier for developer and non-developer to read and understand
- prefix lookup tables with the name of the table they relate to
- groups related tables together (e.g. activity_status, activity_type, etc.)
- prevents naming conflicts between generic lookup tables for different entities
- for a linking (or junction) table, concatenate the names of the two tables being linked in alphabetical order
- orders linking table with a related entity table
- expresses composite purpose of the table
- this should be waived if the linking table has a natural, standard, or obvious name (e.g. “item” in: [order] 1 to M [item] M to 1 [product])
- this must be waived if there are multiple linking tables between the same two tables, (e.g. “student” and “instructor” between the tables “person” and “class”)
- rare problem
- what about a linking table for a generic lookup table?
- this can be avoided by renaming the generic lookup (and removing the prefix)
- suggestion: as a variation on the linking table naming convention above, use a special character (such as a plus-sign “+”, a hyphen “-“, or a double-underline “__”) to separate concatenated table names (e.g. the linking table for activity and activity_type would be activity__activity_type)
column names
the primary key should be the singular form of the table name suffixed with “_id” (update: i now just name all my auto-increment primary keys “id”, however i’ve left the old rationale below)
- allows primary key to be deduced/recalled from the table name alone (e.g. primary key of the product table would be product_id)
- consistent with the name of the foreign key
- prevents having to alias primary keys in programming
prefix the name of every column with the table name, excluding foreign keys (update: i no longer follow this convention because it’s very tedious to type, and because it’s obviated by the use of abbreviated table aliases in SQL)
- prevents using “name”, “order”, “percent”, etc. as column names and clashing with SQL/RDBMS reserved words
- creates near unique column names (e.g. product_name, product_code, product_description, etc., often simplifying query design and SQL coding, recommended for PHP)
- makes the column names consistent with the primary key
- differentiates foreign key columns from columns native to the table
- maintains semantic transparency of column names when using table aliases (e.g. SELECT a.activity_name FROM activity a)
- prevents naming a column the same name as the table
- this can be waived for databases with many tables (30+), tables with many columns (30+), and long and obviously unique column names
- this can be waived if your database programming always refers to columns in the form tablename.columnname
- foreign key columns should have the same name as the primary key to which they refer (update: obviously now that i just name my primary keys “id”, i name my foreign keys the singular form of the table name + “_id”)
- makes the table to which they refer completely obvious
- if there are multiple foreign keys referencing same table, prefix the foreign key column name with an appropriately descriptive adjective (e.g. lead_person_id, technical_person_id, etc. which transparently reference person_id in the person table)
- suffix date-type columns with “_on”, suffix datetime-type columns with “_at”, and prefix boolean-type columns with “is_” or “has_” (update: i now tend to use the “_date” suffix for date-type columns and “_time” for datetime-type columns—even if they are stored as integers)
- prevents confusing with more common text/number data types
Don’t soften your singular views. A table is an entity, thus the term “Entity Diagram”. Entities are singular and relate so.
Link, I blame DHH
It’s true; a fairy dies every time a table name is pluralised.
Each table is a single entity which gets a single name. The name reflects its function, which is to contain multiple records of similar type. The plural name Customers accurately describes a table containing multiple records of type Customer.
A single record contains multiple attributes, but they are dissimilar. The singular name Person describes the single essence of the multiple attributes Name, Birthdate, SSN, etc.
That doesn’t negate Justin’s reasoning, of course. Just explains why people like plural table names so much.
Justin,
Why do you no longer follow column rule 2?
Douglas, you’re asking why I no longer “prefix the name of every field with the table name”? There are a few reasons.
Primarily, it’s just murder to have to type out SQL like that.
Secondly, after I started using plural table names (because that is the standard where I work), it didn’t make sense to use a singular prefix with a plural table name.
Thirdly, I frequently use one or two letter table aliases in my queries that join multiple tables, which is much easier to type than the prefixed table name, over and over and over again.
Note: I’ve clarified my reasoning in the post itself.
Personally I prefer plural table names. I echo the reasons laid out by Douglas in a previous comment, although I believe this is just a nuance.
A consistent convention that effectively communicates the database’s design is the ultimate goal, and your article is an excellent guide.
Thanks for this. I followed this guide when i started developing, because it seemed most thought through to me.
After my third DB with now ~150 tables, i hope i can make a useful suggestion.
lets take orders as the base table. So we have:
– the base table orders
(order_id, order_type_id, date_created, date_changed, price, etc.)
– the a lookup table orders_type. Note the missing s in the end (is this n->1 or 1->n ?!). This is just common sense, because for one order_type there are multiple orders ;)
– the n:m table orders_products. Note both parts are plural. products is another base table.
Now the problem: the third type of tables, where order_id is referenced in a n->1 (or 1->n? no idea ;) ) style. fe, when you want to add multiple rows of notes. Lets say notes is an stand-alone table. Then we have (note_id, order_id, note_text, etc.). In this case, no reference to orders in the table name is needed.
IF we want to connect notes hard for orders (fe, if we want the table notes for other purpose) you can switch to:
– order_notes. Note the missing s in the first part. Like before, this is just common sense: There are multiple notes for one order.
So i came down to 5 different table types:
– base: object1+s
– lookup (n:1 ?): object1+s_object2
– n:m: object1+s_object2+s
– reference (1:n ?) type 1, standalone object: object2+s (object1_id is referenced)
– reference (1:n ?) type 2, non-standalone object: object1_object2+s
I could now safely use some skripts which will automatically (!) join tables and distinguish which type a table is and what to do with this. For instance my func. $order = v(“order”, 12345) will insert the order 12345 in the assoc. array $order with the correct joined order_type automatically!
or $order_note = v(“order_note”, 12345) will create a 2-dimensional array like $order_note[$order_note_id][order_note_text] etc.
last one ;) n:m. $order_product = v(“order product”, 12345) gets 2-dimensional like the one above where complete info for the products is referenced (YES, the product_type ALSO !, so 2 automatic joins). This can be used the other way around with v(“product order”) which will reference the same table because it is named alphabetically but look up for all orders and their order_type. Note, that there is no _ in between, this way the script notes there is a n:m relation.
Of course you can do some more magic with this or just cut it down to 1 or no automatic inclusions, include a where-clause or prepare a listing.
I hope you like this and can feed me some more ideas.
If anybody else has a suggestion mail me at riwale at gmx net
ROFFFLLLL over fairy dies!!! of course my co-worker tells me that only a geek would laugh @ that. when you have an object, do you have many of the same or one distinctive? (i’d hope you’d have one)so, logically, a single thing would not be pluralised. so if i am holding one apple, i would not call it apples or seeds. OR how about a basket holding many apples. would i call it apples or baskets or a BASKET?
I agree 100% but taking your linguistic analogy one step further, if there was a basket of fruit in the corner of a room that happened to contain apples and I pointed at it and asked you “What’s in there?” How would you respond? Probably “apples”. Or “it’s a basket of apples”. In both cases plural.
I have always used plural table names; moreover, I have never seen a production system that used singular names. But after reading the arguments presented here, I am considering changing my practice.
However, the argument for “more grammatical SQL” is erroneous. It is more grammatical in a SELECT clause (if you even use the full table name in the select statement and not a prefix), but it is less grammatical in a WHERE clause, e.g. “FROM activity” instead of “FROM activities”.
These are solid conventions. I use singular table names for simplicity. Obviously a table can hold more than one record so whats the point of a plural table name when it “should be” understood.
style 1. user lowercase
The ANSI SQL standard is that object names are case insensitive. A plague on Microsoft for introducing case sensitive names which when translated to other database can require wrapping them in double-quotes. So pick a case, lower or upper, but please avoid mixed case.
table names 2. Use of the singular form
I have always used the singular form for both table and class names for the simple reason that I got sick and tired (25 years ago) of trying to remember whether a table or class was SOMETHING or SOMETHINGS and decided that if I used a more primitive grammar that only included singulars I would be able to use my memory for more important things.
Thanks for the list, it is very similar to my practice (been programming for 30+ years) but it is good to know that someone else has arrived at very similar conclusions and was nice enough to write it out and blog it.
Mark, thanks for leaving a comment. Can you believe I created this 6 years ago, after working with Microsoft Access for only 3 years. Since then I’ve been working almost exclusively with MySQL, and save for a few adjustments (marked in pink) based on the agreed upon styles of my employers, they’ve held very well.
justin, i hope not to be a interuptor. i’m just a casual access 2003 personal user. this has been one of the most difficult areas for me, i spend much time trying to keep names of things straight and logical.
i do split my databases, my question is: would you prefix table names with a 3 letter acronym of the database it comes from especially if there is multiple back-end databases connected to one front-end?
i’m not a hefty programmer but enjoy learning. i want to use generic relink code – maybe modified a bit – to call a relinking if it was moved using table prefix and wildcard or would i just go for the link string itself?
this discussion again has been immensly helpful.
i can learn a thing or two and it stays with me but when lots of things need to be renamed it’s a big time waster.
thanks
Would you believe there are not too many such articles like this on the web and most of them are m$ sql not standard arrogant ones that promote stuff like uppercase and stuff? thanks for this great article
One more comment: Since an id can indicate a real-world identifier (PASSPORT_ID, TAX_ID, DRIVERS_LICENSE_ID), I use exampletablenm_PK for a primary key column name. When it is referred to in a foreign key, the FK column name becomes exampletablenm_FK.
Hypatia, thanks for the comment. In all my experience of database design, I’ve never once run into a need to name a non-foreign-key column that ended with “_id”. Isn’t that funny? Given how many primary and foreign key’s I have (using “id”) and how few actual columns I’ve run into with that problem, I’d probably bend the column name to prevent confusion, by suffixing it with something like “_num” or “_number”, e.g. passport_num, tax_num, drivers_license_num—I think only the “tax_num” is the most awkward, since it’s really called a Tax ID. But at the same time, in that case, I don’t think I’d have any problem naming it “tax_id” even though it wasn’t a foreign key, given how common its actual name is, and given the definition of the column in the larger application.
Precise, concise, easy to remember. Great stuff!
Thanks!
How about naming of table views? Say you got a customer table and order table, and a table view joining the two?
So far add a prefix “t_” to tables and “v_” for views to distinguish them. Well any other good suggestion?
VCD, I never use views in my primary application at work, so it’s not been an issue. Actually that’s not 100% true. I don’t use views, but we integrated with another system and used views of our tables to create a limited picture of our data. In that case we prefixed the views with “v_” and the consulting firm that wrote the views named them however they saw fit.
Personally I’d suggest prefixing the least used object, instead of both tables and views. If you referenced views mostly, then yeah sure, prefix the tables with “t_” and vice versa. But to prefix both is just painful to have to type over and over again.
Woo a good suggestion! (and too bad I couldn’t think of it -.-) Thanks! :)
My two cents:
Never use reserved words as column names. Obvious problems ensue given columns named “left” and “select”.
Furthermore, it’s generally a bad practice to use reserved words in programming languages. So icksnay on the columns named “type”, “static”, “class” and “var”, too.
Someday when you use a nifty little script to clean up your data, migrate data in or out, or otherwise interact with SQL in a way that does not depend on core model functions in your app stack, you’ll thank me.
I’m self-taught, so I’ve been looking for someone, _anyone_, to just tell me what to do. This is great. Thanks!
Justin,
Great source of information. The conventions you layed out make plenty of sense and really helped to define my set of normaliation guidelines.
Thank you!
Any naming conventions for “private” tables and functions? By “private” I mean tables and functions that are only used by other DB entities and will never be queried or called directly by users of the DB. I am leaning towards an underscore prefix as this is how I prefix private methods in c++.
Foreign Key Fields – what is the best practice when there exists more than one relationship between two entities? For example:
issue (id, title, …)
user (id, name, …)
And now we need to model the relationships “reporter” (user that created the issue) and “assignee” (user that the issue is assigned to).
Obviously, we cannot encode just the name of the referenced entity “user_id”, because we have two relationships.
What information should we encode in the foreign key name then?
a) only the relationship:
– “reporter_id”, “assignee_id”
b) both the relationship and the referenced entity:
– “reporter_user_id”, “assignee_user_id”
– “user_id_reporter”, “user_id_assignee”
Hmm now I realize that this problem can occur even when having only one foreign key, but the name of the relationship (“reporter”) differs from the name of the referenced entity (“user”).
Rusi, actually this exact scenario is what I’ve described under “field/column names”, #3.
In your example, which occurs all the time in my work, I would use:
reporter_user_id
andassignee_user_id
.RE: “3. Foreign key fields”. Extending the minimalist naming conventions for “id”, why not just drop the “_id” suffix? Instead of:
table order (
id SERIAL PRIMARY KEY,
customer_id integer REFERENCES CUSTOMER,
. . .
We would have:
table order (
id SERIAL PRIMARY KEY,
customer integer REFERENCES CUSTOMER,
. . .
I’ve seen this before and it seems more concise. For example, Instead of:
order.customer_id = customer.id
we would have:
order.customer = customer.id
I’m wondering if anyone has had any positive or negative experiences with this naming convention?
Rob, I think suffixing “_id” on foreign key fields helps to differentiate them from other non-referential fields. If you refer to “order.customer_id”, it’s dead clear that that column refers back to the customers primary key. With “order.customer” it could be anything.
Actually, data does not remain data. The singular of data is datum ;) Great post. I wish more DBAs would follow these suggestions.
Some good suggestions here. Often I’m not sure what to do with linked/related tables.
Say we have two tables, “support_topic” and “support_category”. Now a support topic can go in several different categories. So which one of the following would be the correct name for the table that links the two together?
support_topic__support_category
support_topic__category
support_topic_category
support_topic_support_category
Darren, not knowing anything more about the application and how it might otherwise be refactored, I would go with the last option, but concatenated alphabetically: support_category_support_topic (c before t)…
And ‘person’ becomes ‘persons’, but that’s just awful, so we use ‘people’!
“the primary key should be the singular form of the table name suffixed with “_id” (update: i now just name all my auto-increment primary keys “id”)”
From my personal experience its seems good to have the same names (table_name + _id) as PK and FK so you can easily join tables just by using join(table_name + id) clause. At least in Oracle.
Very useful post. Thank you.
I needed a quick hit on good naming conventions. This page was the first hit I chose (though not the first hit on Google) and I find your reasoning makes a great deal of sense to me.
IMO using the id prefix as “id_table_name” is better than using it as “table_name_id”, this because, like most languages, I’m used to read from left to right and because of this I believe is easier to read the first 3 characters of a word instead of reading the last 3 characters, all this obviously to identify a foreign key (I use “id” for all my primary keys).
Anyone has had any positive or negative experiences with this naming convention?
Rob, I would only counter by saying that SQL references database columns using table_name.column_name, which table_name_id is consistent with, as opposed to vice versa. My two cents.
Regarding #2 in “field/column names”: The biggest reason to prefix all column names is to make 100% unique column names across your entire schema. Each table goes with a unique prefix and all columns in the table use it.
You can then search your entire code-base and determine what’s using the column. Need to remove a column? Need to expand the meaning of a column? Need to see what uses the data in the column? It becomes a very simple, and reliable search.
The extra typing pales in comparison to that one benefit. And with unique column names, you don’t have to use table aliases except for a few cases (e.g. self-joins).
I struggled with naming conventions quite a bit. I’ve broken the program I’m writing a couple times when I’ve changed my mind. I’m still not sure I’m entirely satisfied, which is how I ended up here.
I believe I will end up changing the names of my two link tables as I’m about to add a third.
Re: Field names.
I preface all my field names with a 2-5 letter short form of the table name, then an underscore, then the name of the field. No underscores in the name, just to separate the table from the field name.
Foreign keys are something I recently changed, possibly for the worse, but I liked it better.
Ex. clients table. key=cl_ID
accounts table, key=acct_ID, fk=cl_ID
I used to have the cl_ID field in accounts as acct_cl_ID, which was fine since my convention made it obvious that this was a FK (two underscores) but it made things look cluttered.
I’ve read that people prefer it that way so that when performing joins you know which cl_ID you are dealing with, but then I fully qualify this in my SQL statements (ie clients.cl_ID, accounts.cl_ID) so its not a big deal for me. It also made coding a particular routine MUCH simpler since I didn’t have to refer to essentially the same field by 3 different names.
Had a CRAZY Friday night reading up on DB naming conventions. This article makes a lot of sense.
I just updated the conventions and replaced all instances of the word “field” with the more standard “column”.
It is VERY interesting that your “updates” negate most of your rules.
I work on a very large database and I would like to to say that I can almost see that your experience has grown based on your updates.
I disagree with your column #1 update. When you have 27 tables that you are joining, along with multiple sub-queries, it REALLY helps to see more than just “id”.
Keep up the good work.
the_dude, it’s funny, naming primary keys “id” across all of the tables is one of the revisions I like the most.
The table_id versus id debate (for tables with a single-column pk) is interesting. Id has the advantage that it makes it more clear that the column is a primary key (not a foreign key). IMO, it also looks better in queries that require aliasing (select table1.id from table1 left join table2 on …) instead of (select table1.table1_id …). But, a big disadvantage is that it forces the use of “on” instead of “using” when joining tables. I definitely prefer “using”, since it allows me to avoid aliasing (select table1_id from table1 left join table2 using(table1_id)) in a lot of situations. This is useful when building queries from code.
Bryce, interesting, I’d never head of the
USING()
clause, but then it sounds like it’s a fairly recent addition to the SQL standard(s).take a look at Oracle internal table/view naming convention:
ALL_OBJECTS.OBJECT_ID – plural name for a view that contains information on objects and PK column is called OBJECT_ID instead of just ID.
Worth noting that all lower case is also a good idea because of how MySQL treats table names on a Windows server vs a Linux server (restore a mysqldump fom Linux to Windows and your all caps tables are now lower case), AND if you have to port to or from MSSQL.
By historical reasons we are namig tables in capital letters. How to name linking tables?
For example CUSTOMER and PRODUCT. would I call it CUTOMER_PRODUCT ? I’m from Germany. In our language it would also be possible to connect the words, so that a “new” word is created, in english it would be something like “customer of products” or “product of customers” in german we would say “customerproduct” or “productcustomer”. (sounds a bit strange to explain it in english) so I also could call the linking table CUSTOMERPRODUCT.
This would grammatically be better style, but could leed to some missunderstandings I think…