Big data has already become part of our lives. To review what big data is, quoting Wikipedia verbatim, it is “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” As cloud services – especially PaaS solutions – offered us the processing power needed to process the data, we were able to play with the vast amounts of data at hand. The complex analytics and reporting options were simply amazing. Unfortunately, this amazement buried even deeper the two hidden problems that were inherent in big data: the “past predicts the future” fallacy and the traditional IT data center approach.
In IT, our daily tasks involve solving technical problems that usually have a defined answer: if something is not working, we research the error code and the possible solutions; if we are launching applications or architecting systems, we research the guides and best practices and carry on with the implementation. The business side (by this I mean the non-IT part of the enterprise) is also experienced at working with this type of IT, both at the staff level and the investment level. They are not entirely wrong to treat big data the same way: at the very least, big data does involve defining, planning and allocating IT resources, developing to meet the business requirements, and people to be managed, hired, fired or relocated.
For big data, this traditional data center/IT approach is no longer valid. Although you can shuffle the words and put big data in the same context I have outlined above, the vast amounts of data surpass the computing ability of traditional data centers. Big data is not just a doubled load that can be handled by throwing in more processing power and storage space (well, maybe it is, depending on your enterprise’s budget, but is that really viable?). At this point, we need to rethink the hybrid cloud scenarios: again, the enterprise’s own processing power will not be enough for the processing needs. In addition, the current IT staff is concentrated on the data center resources – operating systems, hardware, applications, databases and development – while big data is about more databases, complex analytics and reporting. The short-term solution will be to reshuffle the current IT staff to work with big data, but it is not logical to expect too much from them in the very near future: they will require rigorous training to understand the business (and yes, this may mean less IT and more business, as I have previously discussed). In a successful big data implementation, the big data team will be an intersection between the business and IT; perhaps even more.
Then comes the fallacy of the past predicting the future. In the first class of statistics, you learn that statistics helps us understand the past in order to make better predictions about the future. Statistics is not a tool for knowing the future; it is just an aid to making better predictions. When businesses think about big data, they follow a simple logic: analyzing vast amounts of data is a surefire way to understand what will happen in the future. Since big data is actually big statistics, it inherits the same limitation: you cannot see the future by looking at past events. Although in some limited cases the past does predict the future, generally speaking it does not. If it did, the performance analysts working in financial institutions would have all the world’s income by now.
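To make the point concrete, here is a minimal Python sketch. Every number in it is invented for illustration, and `fit_line` is a hypothetical helper rather than part of any big data product: a straight line is fitted to twelve months of perfectly regular "sales" and then extrapolated into three months in which the market has shifted.

```python
# Hypothetical monthly sales: a steady upward trend for 12 months,
# followed by a market shift that the historical data could not contain.
past = [100 + 5 * month for month in range(12)]  # months 0..11 (invented)
future_actual = [120, 110, 100]                  # months 12..14 (invented)


def fit_line(ys):
    """Ordinary least-squares fit of y = intercept + slope * x for x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(past)

# Extrapolate the fitted trend into the months we have not seen yet.
forecast = [intercept + slope * month for month in range(12, 15)]

# Positive error means the forecast was too optimistic.
errors = [f - a for f, a in zip(forecast, future_actual)]

for month, f, a in zip(range(12, 15), forecast, future_actual):
    print(f"month {month}: forecast {f:.0f}, actual {a}")
```

The line fits the past data perfectly and still overshoots by 40 to 70 units once the pattern changes. More data from the first twelve months would not have helped, because the shift is simply not in that data.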
IT’s role at this point should be to assist the business by helping it keep its sanity. IT has to present the business with cases that show that the complex computations and their presentations are just tools for making more informed decisions about the future. IT, being inherently analytic and objective, is in the best position to help the board double-check its gut feelings.
I perfectly understand that, from the business perspective, whoever succeeds in better predicting the future will be a couple of steps ahead of the competition. Since big data helps businesses better understand the past and helps them predict the future, it has to be invested in. It is just a matter of seeing both sides of the coin and knowing what big data can do and what it cannot.
Featured Image: www.greenbookblog.org