This is part 2 of a 4-part series on DataOps.
What is DataOps
DataOps is an emerging idea with many perspectives. For this post, I am using the following definition from Hitachi-Vantara:
DataOps is enterprise data management for the AI era. Now you can seamlessly connect your data consumers and creators to rapidly find and use all the value in your data. Data operations is not a product, service or solution. It’s a methodology, a technological and cultural change to improve your organization’s use of data through better data quality, shorter cycle time and superior data management.
The most significant challenge in implementing DataOps is going to be cultural, not technological. The following recommendations address some of the main cultural barriers.
Quid est veritas?
The single source of truth. Philosophers have been pursuing this since philosophy began. Data Warehouses promised to solve the single source of truth problem.
In truth, the pursuit of a single source of truth killed most Data Warehouse initiatives. The effort to get all areas of a company to agree upon the meaning of a data element was significant and a cause for hair loss. The cost of the subsequent data clean up, process changes, and report updates far outweighed any value of having a universal data element in the Data Warehouse. And then came the struggle to get the data out of the warehouse to be used!
The absence of a single source of truth was not the problem. The problem was not understanding how similar KPIs were calculated and why they could produce different results, or that KPIs that should have been the same were using different data and calculations.
It was naïve to believe technology could do what philosophers could not. The single source of truth rests on a fundamental, flawed assumption: it implies there is only one way to do business.
- Leave the pursuit of a single truth to the philosophers.
- Accept there may be many sources of truth for similar things. Counting customers depends on who is counting, when they are counting, and for what purpose. It is more important that the data and calculations are accurate for the needed purpose.
- Cataloging data will generate insight into how the business uses data, so that upstream and downstream impacts can be understood.
- Use Core Data as a construct. Core Data is common and shared. Focus on what reduces wasted time, like having a single identifier for a customer, rather than trying to define every data element.
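To make the customer-counting point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `orders` records, the `customer_id` field standing in for a Core Data identifier, and the three counting rules. It shows how three teams can count "customers" from the same data, get different answers, and still all be accurate for their own purpose.

```python
from datetime import date

# Hypothetical order records. customer_id plays the role of the shared
# Core Data identifier; all other fields are illustrative assumptions.
orders = [
    {"customer_id": "C1", "order_date": date(2020, 1, 15), "channel": "web"},
    {"customer_id": "C1", "order_date": date(2020, 3, 2), "channel": "store"},
    {"customer_id": "C2", "order_date": date(2019, 11, 20), "channel": "web"},
    {"customer_id": "C3", "order_date": date(2020, 2, 10), "channel": "store"},
]


def customers_ever(records):
    """One definition: anyone who has ever placed an order."""
    return {r["customer_id"] for r in records}


def customers_active_since(records, cutoff):
    """Another definition: anyone who ordered on or after the cutoff date."""
    return {r["customer_id"] for r in records if r["order_date"] >= cutoff}


def customers_by_channel(records, channel):
    """A third definition: anyone who ordered through a given channel."""
    return {r["customer_id"] for r in records if r["channel"] == channel}


all_time = customers_ever(orders)
active_2020 = customers_active_since(orders, date(2020, 1, 1))
web = customers_by_channel(orders, "web")

# Three different "customer counts" from the same data.
print(len(all_time), len(active_2020), len(web))

# The shared identifier makes the gap between two counts explainable.
print(all_time - active_2020)
```

None of the three numbers is wrong. Because every definition uses the same identifier, the difference between any two counts can be traced to specific customers instead of being argued about.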
Interested in learning more about DataOps? We recommend the DataOps.NEXT virtual conference by our premier partner Hitachi-Vantara. There are 4 tracks:
- Optimize the Data Fabric
- Build and Manage Data Pipelines
- Improve Data Governance and Agility
- Expand Analytics and Machine Learning
There is no cost to join live or to watch replays for 90 days.
Date: Thu 14 May, 2020
Time: 9.00am EDT / 1.00pm UTC
Where: Online