Big Data and Data Warehousing

Datetime:2017-04-18 05:39:36         Topic: Data Warehouse          Share        Original >>
Here to See The Original Article!!!

I’m annoyed that many people label the normal Data Warehousing & BI stuff as “Big Data”. It is not. For me, Big Data is the “Hadoop stuff” (HDFS). It is is in SQL Server, Oracle or Teradata, it is just a normal database, not Big Data.

Big Data is stored on HDFS (Hadoop Distributed File System), not in RDBMS. Oracle, IBM, Microsoft, SAP, Teradata, all use Hadoop to store Big Data. Big Data is queried using MapReduce.

The reason why Big Data can’t be stored in RDBMS is because the format is not tabular. Sometimes it is 2 columns, sometimes it is 200 columns. Like Twitter data. The second reason is because it is too big. Sensors can make 100 measurements in a second, and in a year it could be Petabytes. Web Logs is another example. Tracking the ask and offer price of every transaction in every stock market is another example. Yes we can put Petabytes into SQL Server or Oracle, into Netezza or Teradata, but not at this speed. Hadoop on the other hand is designed exactly to cope with these kind of speed and volume.

Now the usage. What is Big Data Analytics? Big Data Analytics is when we do analytics on Hadoop Data.

Is Fraud Detection Big Data Analytics? Not always. Fraud Detection can be done on a normal Data Warehouse or a database. Is Machine Learning Big Data? Not always. Machine Learning can be done on a normal Data Warehouse or a database. If the Fraud Detection or the Machine Learning is done on data stored in Hadoop, then it is Big Data Analytics.

Even if it is only 200 GB, if it is stored in Hadoop, it is a Big Data. Even if the data is 5 Petabyte, if it is stored in an SQL Server database, it is not a Big Data, in my opinion.

Even if the data is in tabular format (i.e. columns and rows), if it is stored in Hadoop, it is a Big Data. But if it is stored in an Oracle database, it is not a Big Data.

Every Big Data Architecture that I know uses Hadoop. No companies (or government) implement Big Data on an RDBMS. Or on a non HDFS files. Every single company, every single Big Data case I read implement the Big Data on Hadoop. I may be wrong and would be happy to be corrected. If you know a case which implement Big Data on a non-Hadoop system/architecture, I will grateful if you could let me know, either through comments, or via .


Put your ads here, just $200 per month.