Normalized Data: Why You Should Consider Your Data Structure Before Using MongoDB

      After going over some of the new libraries currently coming out for Node.js, I’ve noticed a growing affinity among them for non-relational databases such as MongoDB. Having looked into non-relational databases in the past, I’ve come to see their upsides and why you should consider using them in certain scenarios.

      When using MongoDB you don’t have to allocate a set size every time you create a new item in a collection, you don’t need to do several joins whenever you want to query the database for your data, and your data is stored as a JSON-like document, which is easy to use and manipulate. The problem begins when your data is not really the large, unrelated, complex object you thought it was in the beginning. Most of the data that we work with on a daily basis tends to be data that can be normalized and easily stored in a SQL database.

      A very good example of this would be users. Several of the newer Node.js libraries use MongoDB to store user data and to help with authentication, and at first glance that looks like a great idea. You can just store one document with a user and all of that user’s information. But once we start looking deeper into the relationships this user might have, the document model starts to become less and less ideal. The user could have friends, who in turn would be users themselves. The users could write messages or comments, or they could belong to different groups. Now what are you going to do? Are you going to save the user data of each friend inside the user document and have that same data saved in the document of the other user? Are you going to have each message contain a copy of all of the data for that user? And how about groups: is each user going to contain a list of all the groups they are a member of, with all of the data from each group? Is the group going to contain a list of all the users that are members of it? What if two users were members of the same group?

      Let’s say that you are OK with the data redundancy and are willing to accept the extra space being used in exchange for the speed benefit of not having to do a SQL join every time you want to get the data out. Let’s explore another issue. What happens when you have to update one user? Now you have to go through your entire database and replace the user data in every place it’s being stored. What happens if you miss one? Now not only do you have repetitive data, but you also have data that doesn’t match. So let’s say that you decide to pull users out and store a user ID reference instead. But now you don’t have all of your data stored as a single document anymore, and you need to get data from more than one place, so how do you do that? It’s around this time that you remember that MongoDB was not built around joins, and you are forced to carry out these separate queries in your application, which is not optimized for this kind of work.
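
      The update problem can be made concrete with a small sketch. It uses plain Python dictionaries as stand-ins for MongoDB documents, so no database or driver is involved, and every name and field in it (alice, bob, friends, friend_ids, the email addresses) is made up for illustration.

```python
# Plain dicts stand in for MongoDB documents; all names and fields are hypothetical.

# Embedded approach: Alice's data is copied wherever she appears.
users = {
    "alice": {"name": "Alice", "email": "alice@old.example", "friends": []},
    "bob": {
        "name": "Bob",
        "email": "bob@example.com",
        # A full copy of Alice's data lives inside Bob's document.
        "friends": [{"name": "Alice", "email": "alice@old.example"}],
    },
}
messages = [
    # Another copy of Alice's data lives inside each of her messages.
    {"text": "Hello!", "author": {"name": "Alice", "email": "alice@old.example"}},
]

# Update Alice's own document and her messages, but forget Bob's friend list.
users["alice"]["email"] = "alice@new.example"
for message in messages:
    if message["author"]["name"] == "Alice":
        message["author"]["email"] = "alice@new.example"

stale_copy = users["bob"]["friends"][0]["email"]
print(stale_copy)  # alice@old.example: the missed copy no longer matches

# Reference approach: store only an id, then resolve it in application code.
users_by_id = {
    1: {"name": "Alice", "email": "alice@new.example"},
    2: {"name": "Bob", "email": "bob@example.com", "friend_ids": [1]},
}


def friends_of(user_id):
    """Resolve friend ids with extra lookups, i.e. a join done by hand."""
    user = users_by_id[user_id]  # first lookup: the user itself
    # second round of lookups: one per referenced friend
    return [users_by_id[friend_id] for friend_id in user.get("friend_ids", [])]


bobs_friends = friends_of(2)
print(bobs_friends[0]["email"])  # alice@new.example
```

      With references there is only one copy of Alice’s email to update, but every read that needs her data now costs an extra lookup that the application has to write and maintain itself.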

      Now let’s take a look at what you would do if you were using SQL instead. You would have created a table for users, one for messages, one for groups, and a few extra helper tables to handle the many-to-many relationships between users and groups, and between users and users. (I am not going to take much time discussing how this data would be properly normalized here, but if you are interested in learning more about that, I recommend reading my blog post on data normalization and playing around with my SQL Helper Application.)

      Now when you go to update the data you only have to update it in one place, and when you go to get the data out you can do a server-side join, which is the kind of query SQL databases have been optimized for.
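
      As a rough sketch of that layout, the example below builds the tables in an in-memory SQLite database using Python’s built-in sqlite3 module. SQLite is only a convenient stand-in for a real database server here, and the table names, column names, and sample rows are assumptions made for illustration, not a finished schema.

```python
import sqlite3

# In-memory SQLite database; every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    -- app_groups rather than groups, which is a keyword in some SQL dialects.
    CREATE TABLE app_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES users(id),
        body TEXT
    );
    -- Helper tables for the many-to-many relationships.
    CREATE TABLE memberships (
        user_id INTEGER REFERENCES users(id),
        group_id INTEGER REFERENCES app_groups(id),
        PRIMARY KEY (user_id, group_id)
    );
    CREATE TABLE friendships (
        user_id INTEGER REFERENCES users(id),
        friend_id INTEGER REFERENCES users(id),
        PRIMARY KEY (user_id, friend_id)
    );
""")

conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [(1, "Alice", "alice@old.example"), (2, "Bob", "bob@example.com")],
)
conn.execute("INSERT INTO app_groups (id, name) VALUES (1, 'Hikers')")
conn.executemany(
    "INSERT INTO memberships (user_id, group_id) VALUES (?, ?)",
    [(1, 1), (2, 1)],
)
conn.execute("INSERT INTO messages (author_id, body) VALUES (1, 'Hello!')")

# The email is stored once, so a single UPDATE changes it everywhere.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("alice@new.example", 1))

# A join brings each message together with its author's current data.
message_rows = conn.execute("""
    SELECT m.body, u.name, u.email
    FROM messages AS m
    JOIN users AS u ON u.id = m.author_id
""").fetchall()

# Joining through the helper table lists the members of a group.
member_rows = conn.execute("""
    SELECT u.name
    FROM memberships AS ms
    JOIN users AS u ON u.id = ms.user_id
    JOIN app_groups AS g ON g.id = ms.group_id
    WHERE g.name = ?
    ORDER BY u.name
""", ("Hikers",)).fetchall()

print(message_rows)
print(member_rows)
```

      Both users belong to the same group without either of them carrying a copy of the group’s data, and the message picks up the new email address without being touched.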

      This is not to say that MongoDB doesn’t have its uses. If all you were doing was storing a user to handle authentication, and that user didn’t have any relationships, then MongoDB would be a very good solution. However, you must first be aware of the relationships between your data and how they will affect your data structures before you choose to go with MongoDB. Despite MongoDB seeming easier to use, especially since you don’t have to learn SQL syntax in order to use it, it is a mistake to use it simply because you think it will be easier to manage than a SQL database.

Paulo Diniz