Data is deceptively complex. It requires a multi-disciplinary approach to fully understand its value. Currently, such knowledge is fragmented across people and departments in an organization. This fragmented understanding results in low utility and lost potential. And data is a terrible asset to waste.
To paraphrase Charlie Munger, “You’ve got to have multiple models in your head. And you’ve got to array your experience—both vicarious and direct—on this latticework of models. And the models have to come from multiple disciplines—because all the wisdom of the world is not to be found in one little academic department. That’s why poetry professors, by and large, are so unwise in a worldly sense. They don’t have enough models in their heads. So you’ve got to have models across a fair array of disciplines.”
The same is true about understanding data. Let’s examine some ideas that have helped me.
Data life cycle & variability:
Raw data is messy and typically not ready for human consumption. It requires processing – not unlike a factory that takes raw material and turns it into finished goods. Some data has a very short time-to-live (TTL), while other data has more enduring value. Some data contains secrets that could give your organization a competitive advantage; other data is mostly noise. In other words, not all data is created equal. It is important to understand this variability and use the appropriate “data factory” for each kind, as sketched below. A one-size-fits-all approach to processing data wastes money and guarantees low returns.
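To make the idea concrete, here is a minimal Python sketch of routing datasets to different “factories” by shelf life and strategic value. The dataset names, thresholds, and pipeline labels are hypothetical illustrations, not a prescribed design.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    ttl_hours: float      # how long the data stays useful
    strategic_value: int  # 1 (mostly noise) .. 5 (competitive secret)

def choose_pipeline(ds: Dataset) -> str:
    """Pick a processing "factory" suited to the data's variability."""
    if ds.ttl_hours < 1:
        return "stream-processing"   # act on it immediately or lose it
    if ds.strategic_value >= 4:
        return "curated-warehouse"   # heavy cleansing, modeling, governance
    return "low-cost-archive"        # cheap storage, minimal processing

for ds in [Dataset("clickstream", 0.25, 2),
           Dataset("customer-contracts", 8760, 5),
           Dataset("debug-logs", 168, 1)]:
    print(ds.name, "->", choose_pipeline(ds))
```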
Technology & Infrastructure:
The technical skills needed to understand how data is created and captured are often regarded by business users as the stuff they don’t need to know. While this is largely true, it prevents a business user from understanding the full life cycle of data. This is similar to the argument, “you don’t need to know how the engine of a car works in order to drive it.”
However, a basic understanding of the engine makes you a better driver and helps you take care of your vehicle. Similarly, business users don’t need an engineering degree, but some understanding of databases, how applications generate data, and the infrastructure that hosts it all is helpful. For example, it is worth knowing why relational databases have dominated but may no longer be sufficient to meet all your needs. What are in-memory databases? And so on.
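As a small illustration of both points, the sketch below uses Python’s built-in sqlite3 module, which can run a relational database entirely in memory. The “orders” table and its contents are hypothetical; the point is simply how an application creates, stores, and retrieves relational data.

```python
import sqlite3

# ":memory:" gives an in-memory database; a file path would give a disk-backed one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# Applications generate data as a side effect of doing business.
conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("Acme Corp", 1250.00))
conn.commit()

for row in conn.execute("SELECT id, customer, amount FROM orders"):
    print(row)  # (1, 'Acme Corp', 1250.0)
conn.close()
```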
Valuation & Finance:
Understanding how to quantify the value of an intangible asset like data is a rare skill. The book “How to Measure Anything” by Douglas W. Hubbard is recommended reading. There is also a newer discipline, called Infonomics, that addresses this issue.
Essentially, if you are not measuring the actual benefits generated from data against its potential, you are in a poor position to recognize and close the gap. It is likely that you are incurring a greater “inventory carrying cost” for your data than the economic value it generates.
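A toy back-of-the-envelope calculation makes the point; every figure below is hypothetical and would come from your own measurements in practice.

```python
# Annual figures for a single dataset (all hypothetical):
storage_cost = 40_000      # infrastructure, backups, licensing
stewardship_cost = 25_000  # quality checks, governance, compliance
realized_value = 50_000    # measured benefit actually generated from the data
potential_value = 180_000  # estimated benefit if the data were fully exploited

carrying_cost = storage_cost + stewardship_cost
net_return = realized_value - carrying_cost
value_gap = potential_value - realized_value

print(f"Carrying cost: ${carrying_cost:,}")  # $65,000
print(f"Net return:    ${net_return:,}")     # $-15,000 – the data costs more than it earns
print(f"Value gap:     ${value_gap:,}")      # $130,000 of unrealized potential
```

Until you measure both sides of this ledger, the gap stays invisible.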
Data models & modeling:
Data consists of two components: a data model and data values. At the risk of oversimplification, a data model is like a blank meeting calendar – there is structure but no content. Data values complete the picture. Most consumers of data think only of data values and forget the structure – the data model – that houses them. Good data models are abstractions of the real world that help define what the data is all about.
The process of data modeling can be abstract and includes defining entities, attributes, and relationships of interest; assembling these into a data structure; optimizing performance; and creating metadata. A basic understanding of data models – how they are created, and how to retrieve and insert data into them – is important. The advent of big data and the appeal of schema-less, non-uniform, or poly-structured data has not diminished the value of this concept. There is always some structure to the data – you just need to find it, as the sketch below shows.
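As a hedged illustration of that last point, this sketch recovers the implicit data model hiding in “schema-less” JSON events by scanning field names and value types. The sample records are hypothetical.

```python
import json
from collections import defaultdict

# Even schema-less events carry an implicit structure.
events = [
    '{"user_id": 17, "action": "login", "ts": "2024-01-05T09:12:00"}',
    '{"user_id": 42, "action": "purchase", "amount": 19.99, "ts": "2024-01-05T09:15:00"}',
]

schema = defaultdict(set)
for raw in events:
    for field, value in json.loads(raw).items():
        schema[field].add(type(value).__name__)

for field, types in schema.items():
    print(f"{field}: {' | '.join(sorted(types))}")
# user_id: int, action: str, ts: str, amount: float (present in some records)
```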
Systems thinking:
An organization behaves as a system regardless of whether it is managed as one. Without a systems view, the onslaught and flow of data appears chaotic. According to Geary Rummler, a systems view of an organization is best understood at three levels: the organization level, the process level, and the job or performer level. Data is created at all three levels, but the bulk of it is generated at the process level. Unlike organizational structures and reporting hierarchies, which are vertical in nature, processes flow horizontally across an organization, cutting through departmental silos. However, most improvement efforts remain focused at the departmental level, ignoring the white space between departments. This causes problems.
You need to think of the organization as an integrated whole. This allows efforts to be aligned at all levels and provides visibility into core business processes. A key reason that data projects (BI/analytics/DW) fail to deliver their projected ROI is a lack of clarity about the business process changes required once the solution is deployed.
In Part 2 we will look at additional concepts that will help you get the most out of your data.