I demand a personal data cloud. And you should too.
The company I work for has lots and lots of different units and teams, and all of them use a multitude of custom applications, databases, SaaS offerings, spreadsheets, data warehouses, ERP systems and OLAP cubes of every possible shape, size and color you can imagine.
And I have no idea where I can find them, what they contain or whom I need to talk to in order to get access. The same is true for everyone else who isn’t directly involved with these specific data sources or processes.
This is really bothering me. Why? Because it makes my job — and I guess a lot of other people’s jobs — way harder than it should be.
Not knowing about existing data means I waste time searching for it, if I even know it exists in the first place. And because there is no useful way of publishing data, my coworkers have little chance to benefit from my findings and the data I create.
Why is it that way? I think it is partly because we are lacking the tools and partly because we are lacking a culture of collaborative data usage and sharing.
Some people think of data as their personal property, which they need to guard. But we need to get rid of this thinking and replace it with an open and welcoming attitude while enabling more widespread and ubiquitous access.
This data is not mine or yours, it is ours, and it is there so we can gain new insights and make better decisions. Closely guarding access to it only creates unnecessary barriers and hinders innovation.
Imagine for a moment a different reality: You could discover datasets relevant to you easily, because everything is curated, indexed, described and annotated with metadata.
You could use the data for whatever project you are working on. Not only that, you would not have to worry about this data being outdated, unlike the Excel sheet you got mailed last week. The same Excel sheet that has been sent from person to person for the last couple of years, and whose original author has long been forgotten.
We should enable easy discoverability, use, remix, combination, discussion, annotation and publishing of data, much as people were able to do with the web in the early years of the internet, or as is done today with code on sites like stackblitz.com, github.com or codepen.io.
Maybe we should see our company data as a landscape, for which you need a good map or a guide, showing you the highlights and important landmarks.
Who could be that guide?
Maybe, similar to how DevOps shifted the role of operations teams away from keeping applications up and running and towards that of a service provider, we could see the business side of our companies shift into a provider role for data, contributing insights and guidance in the vast data landscape.
Because who could be better suited to interpret it than the people whose daily work consists of collecting, processing and generating new data in their respective business context?
Try to visualise this landscape and imagine how you too could build your own little area in there, cultivating data like you would cultivate your own garden. The products of that are again something you’d share with your colleagues, so they can benefit from your work tending to it.
And like a physical garden, this one also needs the right tools.
The good thing is that these tools are available today; we just need to roll them out to everyone. Together with a shift in how we think about data, and in how you think about others using the data you take care of, this would bring real benefits in a short amount of time and would make as yet unimagined applications of it possible in the future.
Unlike the physical world, where things are more or less fixed in place, this data can follow you, can become a digital twin of your doings, something like your own personal data cloud that always hovers nearby, providing you with the knowledge you and others need.
I demand that personal data cloud.
And you should too.