Big data places a heavy load on the data center. The name itself implies that there is a vast amount of data to be managed, and managing that data is not simply a matter of administering databases: it is about capturing, curating, maintaining, storing, searching, sharing, analyzing and presenting it. To meet these requirements, the data center metrics need to be reevaluated.
The first impact on the data center is the change in transactions. Current data centers are focused on online transaction processing. That is, the data center focuses on receiving a request and sending a response: file servers keep data and deliver it when required, DNS servers answer queries as they arrive, web servers serve content on request, and so on. Batch processes are either background tasks executed at lower priority or jobs run after office hours. With big data analytics, batch processes become as important as the online transactions, and they have to run in real time. An example of this is an online retailer: when a prospective customer is engaged in a buying activity, such as selecting an item and placing it in a shopping cart, the system should profile the customer, match that profile against similar customer profiles and make recommendations based on the analytics results. All of this must happen in real time. The processing power, storage I/O and network bandwidth should be more than enough to support the workload. “Enough capacity” will only handle the current load and will prevent the business from exploiting the opportunities that big data brings, because “enough” simply cannot absorb the additional load.
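To make the real-time matching step concrete, here is a minimal sketch of the kind of profile matching described above. The customer names, category counts and the choice of cosine similarity are all illustrative assumptions on my part; a production system would pull features from a low-latency store and use a far richer model.

```python
from math import sqrt

# Hypothetical customer profiles: counts of purchases per product category.
# In a real system these would come from a low-latency feature store.
profiles = {
    "alice": {"books": 5, "electronics": 1},
    "bob":   {"books": 4, "electronics": 2, "garden": 1},
    "carol": {"garden": 6},
}

def cosine(a, b):
    """Cosine similarity between two sparse category-count vectors."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(customer, profiles):
    """Return the other customer whose profile is most similar."""
    target = profiles[customer]
    others = {c: p for c, p in profiles.items() if c != customer}
    return max(others, key=lambda c: cosine(target, others[c]))
```

The point of the sketch is the latency constraint: this lookup has to finish while the shopper is still in the cart, which is exactly why the batch-style analytics behind it must be provisioned like an online workload.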
The second impact will be on the databases. The data has to be captured, stored and curated. That means you cannot simply park the data somewhere and forget about it. Since the quality of big data analytics is only as good as the data it runs on, the data needs to be clean. “Clean data” is data that is complete, accurate and free of duplicates. Cleaning data may seem like a waste of time to an outsider, but it is an important task that directly affects the success of a big data analytics project.
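As a rough illustration of what “complete, accurate and free of duplicates” can mean in practice, here is a minimal sketch. The field names and the accuracy rule are made-up assumptions; real cleaning rules come from your own data model and business logic.

```python
# Minimal sketch of the three "clean data" checks named above:
# completeness, accuracy (here: a simple range check) and de-duplication.
# REQUIRED fields and the age rule are illustrative assumptions.

REQUIRED = ("id", "email", "age")

def is_complete(record):
    """Completeness: every required field is present and non-empty."""
    return all(record.get(f) not in (None, "") for f in REQUIRED)

def is_accurate(record):
    """Stand-in accuracy rule: age must be a plausible integer."""
    return isinstance(record["age"], int) and 0 < record["age"] < 130

def clean(records):
    """Keep only complete, accurate, first-seen records."""
    seen, out = set(), []
    for r in records:
        if not is_complete(r) or not is_accurate(r):
            continue
        if r["id"] in seen:        # drop duplicates by primary key
            continue
        seen.add(r["id"])
        out.append(r)
    return out
```

Even a toy pipeline like this shows why cleaning is not clerical: every rule encodes a judgment about what the analytics downstream can trust.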
The third impact will be on data retention. Depending on your company’s industry and/or its departments, you may be subject to different data retention regulations. Regulators think in terms of being able to access the data throughout the retention period, but it is the IT department’s responsibility to ensure that the data is properly stored (archived where, when and if necessary), is accessible, can be presented upon request, is backed up and is disposed of at the end of the retention period. With big data, retention requirements applied to something already big mean even greater data accumulation, which places an even bigger load on the systems. Although this is a clerical task from IT’s point of view, it should rank higher in the big data case.
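The end-of-retention cleanup can be sketched as a simple sweep over record creation dates. The seven-year period below is an illustrative assumption; the actual period depends on the regulations your industry is subject to.

```python
from datetime import date, timedelta

# Illustrative retention period; real values come from your regulators.
RETENTION = timedelta(days=7 * 365)

def retention_action(created, today=None):
    """Decide what to do with a record created on `created`:
    purge it once its age exceeds the retention period, retain it otherwise."""
    today = today or date.today()
    return "purge" if today - created > RETENTION else "retain"
```

In practice the same sweep is where archiving hooks in: records nearing the end of the period move to cheaper storage, and only the final step deletes them.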
The fourth impact is the business requirements. The requirements of the business have a direct impact on the three items I have just discussed. If the business requirements state that business development will be performing scenario analysis, and the scenarios will use data that goes beyond the legal retention period, then the additional load that comes with those requirements will have to be carefully evaluated. At this point, my experience suggests that the more seriously the enterprise takes big data, the more demanding the business requirements become. If the company is investing in data engineers, in staff training and in implementing big data analytics in day-to-day operations, my rule of thumb is to at least double the requirements and prepare expansion plans. Once the enterprise starts enjoying the returns on big data, expectations will only rise.
The last item on the data center is the IT staff. Currently, big data employees are mainly engineers trained in mathematics, statistics, computer science or similar complex problem-solving disciplines, and your IT staff (and your other employees) are not the people carrying out these duties today. Data engineers, meanwhile, are hard to find and expensive to employ. The result is a shortage of personnel capable of carrying out big data operations, both in the data center and in the enterprise. To overcome the problem, training should be carefully evaluated. When deciding whom to send to training, my experience says it is best to choose the employees who are enthusiastic about the subject: they deliver a higher ROI.
All the items I have mentioned boil down to the company’s budget, which is the biggest constraint. If the enterprise begins by setting down its requirements and priorities, it will be easier to decide what to include in the project given that constraint. The business cases should be backed by the strategy, and CIOs should have a thorough understanding of every element of the budget in order to present it properly to the board.