I Want My Data Back! (A Vision Of Data Vaults, APIs, Open Systems And Standards)


Have you ever stopped to think about your data out there on the internet? When I say your data, I really mean data about you: your Twitter followers, your Facebook friends, your eBay reputation. Or data about your company: purchase records, customer lists, customer credit card numbers.

I think a lot about this data, and in fact I worry. I’ll use LinkedIn to illustrate my point (I don’t want to single them out; everything I write below applies to many other companies that have your data). I’ve spent countless hours building up a “persona” on LinkedIn: fine-tuning my profile, searching for contacts and adding them as connections, giving and receiving recommendations and, most recently, endorsements.

My LinkedIn profile is a very, very precious asset to me.

So, what if -

- (God forbid) LinkedIn goes bust and disappears?

- LinkedIn somehow destroys this data (say, due to a bug)?

- My account on LinkedIn is hijacked by hackers and they use my connections for their own purpose?

- LinkedIn is hijacked by hackers who destroy all the data?

My worry is that I depend on LinkedIn, and on an increasing number of other companies (Apple, Google, Facebook and Twitter, to name a few), to make sure my data is safe and not compromised. Most of them I trust as I would trust my bank with my money. So my worry is really about statistics: the more companies hold my data, the more likely it is that one of them will screw up.

There’s an additional topic for consideration: the above companies own some of my data, and they make money by aggregating it with other data and selling it to companies eager to know many details about me. I don’t get any share of that revenue. I’ll write about this in another blog post.

Innotribe’s Digital Asset Grid (DAG) project could help with my worry. The basic assumption behind the project is that you get to control your data: digital assets such as your LinkedIn connections or your company’s purchase records. You get to truly own your data and choose where to store it, or whom to entrust it to. The DAG acts as a global directory, tracking your decisions. You also get to control who can access your data and what they do with it.

I’m a big fan of the DAG project. I was thinking about how LinkedIn (again to pick them as an illustration) would work in a DAG world.

Here’s a list of things that need to happen for this to be possible:

1. Data classification: in the DAG world, LinkedIn would publish several classes of data. To name a few: profile, connections, status, endorsements.

2. Empowerment of the user: LinkedIn would give each user the ability to choose, for each of the classes above, where to store it. The user could choose to leave it in LinkedIn (for example, the status updates, very public and short-lived, can stay), or choose some other digital asset storage provider (for example, the sensitive connections data would go to another storage provider, more trusted by the user).

3. Availability of an open, API-based standard for access to digital asset storage. The API (application programming interface), publicly available on the internet, would allow LinkedIn to fetch my data from the relevant storage provider. The standard covers how to get to the data and how LinkedIn authenticates itself to the storage provider.

4. A thriving economy of storage providers, offering all degrees of service, security, convenience and trust. The key tasks of the providers are to store the data, to check whether any party requesting the data has been authorised to receive it, and to provide the data. The DAG acts as a secure directory on the internet, connecting LinkedIn and all the providers. On top of these very basic services, one can easily imagine many value-added services that could be provided (long-term storage, archival, monetization, cross-referencing with other users, etc.). In fact, I can envision a user building her own storage based on open standards and open source code, for data that she absolutely doesn’t want to store anywhere else but with herself.
The DAG project positions banks as data storage providers – truly a very natural evolution from their mission of today (safekeeping monetary assets) to the mission of tomorrow (safekeeping digital assets).
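
To make the four points above a bit more concrete, here is a minimal sketch in Python of how they could fit together. Everything in it is my own assumption for illustration: the class names (Directory, StorageProvider), the method names and the in-memory dictionaries are hypothetical and are not taken from the DAG project or from any LinkedIn API. A real standard would also need proper authentication, which this sketch reduces to a simple list of grants.

```python
from dataclasses import dataclass, field

# The asset classes named in point 1 (illustrative, not an official list).
ASSET_CLASSES = ("profile", "connections", "status", "endorsements")


@dataclass
class StorageProvider:
    """Hypothetical storage provider: stores assets and checks authorisation."""
    name: str
    assets: dict = field(default_factory=dict)  # (user, asset_class) -> data
    grants: set = field(default_factory=set)    # (user, asset_class, requester)

    def store(self, user, asset_class, data):
        self.assets[(user, asset_class)] = data

    def authorise(self, user, asset_class, requester):
        # The user allows `requester` to read this asset.
        self.grants.add((user, asset_class, requester))

    def fetch(self, user, asset_class, requester):
        # Point 4: check that the requesting party is authorised, then serve.
        if (user, asset_class, requester) not in self.grants:
            raise PermissionError(f"{requester} may not read {user}/{asset_class}")
        return self.assets[(user, asset_class)]


@dataclass
class Directory:
    """Hypothetical stand-in for the DAG: remembers where each asset lives."""
    locations: dict = field(default_factory=dict)

    def register(self, user, asset_class, provider):
        # Point 2: the user chooses a provider per asset class.
        if asset_class not in ASSET_CLASSES:
            raise ValueError(f"unknown asset class: {asset_class}")
        self.locations[(user, asset_class)] = provider

    def locate(self, user, asset_class):
        return self.locations[(user, asset_class)]
```

In this sketch, a user would register her bank as the provider for her connections, grant LinkedIn read access, and LinkedIn would then call locate() on the directory followed by fetch() on the provider. Any party without a grant gets a PermissionError.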

How does all of this change the user experience in LinkedIn? This is the best part: it does not. LinkedIn looks the same to you as it did before. Everything happens behind the scenes. For example, when you log in, LinkedIn fetches your connections from the relevant storage provider (if you have chosen to have this asset stored elsewhere). It then queries the statuses and updates of your connections, in the same way as today. The difference is that fetching this information may mean going to other data storage providers, based on the choices your connections have each made for themselves.
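
The behind-the-scenes flow described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name load_feed, the plain dictionaries standing in for the DAG directory, the storage providers and LinkedIn's local storage, and the convention that an asset missing from the directory was left with LinkedIn are all hypothetical. Authentication is left out to keep the flow visible.

```python
def load_feed(user, directory, providers, local_store):
    """Collect the status of every connection of `user`.

    directory   : (owner, asset_class) -> provider name (stand-in for the DAG)
    providers   : provider name -> {(owner, asset_class): data}
    local_store : assets that users chose to leave with the site itself
    """
    def fetch(owner, asset_class):
        provider_name = directory.get((owner, asset_class))
        if provider_name is None:
            # The owner left this asset where it is today.
            return local_store[(owner, asset_class)]
        # Otherwise go to the provider the owner has chosen.
        return providers[provider_name][(owner, asset_class)]

    connections = fetch(user, "connections")
    # Each connection may have chosen a different provider for her status.
    return {name: fetch(name, "status") for name in connections}


# Example: Alice keeps her connections at a bank, Bob keeps his status at
# his own home server, and Carol left her status with the site.
directory = {("alice", "connections"): "bank", ("bob", "status"): "home"}
providers = {
    "bank": {("alice", "connections"): ["bob", "carol"]},
    "home": {("bob", "status"): "On holiday"},
}
local_store = {("carol", "status"): "Hiring!"}

feed = load_feed("alice", directory, providers, local_store)
```

The caller gets the same feed whichever provider each piece came from, which is why nothing needs to change on the screen.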

The software architect in me sees some issues with this architecture -

- traffic on the internet will increase significantly, as LinkedIn (and all the others who have your data) will have to go through additional steps to get to your data, rather than simply reading it from their own local storage: first ask the DAG to identify the relevant storage provider, and then go to that storage provider to actually get the data. The next step in my thinking will certainly be to try to assess how much more traffic this means.
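
As a first, very rough way to size this, here is a back-of-the-envelope model in Python. It rests entirely on my own assumptions: one login reads the user's connections plus one status per connection, every asset is stored externally, each read costs one directory lookup and one provider fetch, and a fraction of the directory lookups can be answered from a cache. The function name and parameters are hypothetical, and real numbers would have to come from measurement.

```python
def extra_requests(n_connections, directory_hit_rate=0.0):
    """Estimate the extra network requests caused by one login.

    Assumes every asset is stored externally (worst case) and that
    `directory_hit_rate` is the fraction of directory lookups that are
    answered from a cache instead of going to the DAG.
    """
    asset_reads = 1 + n_connections          # own connections + one status each
    directory_lookups = asset_reads * (1.0 - directory_hit_rate)
    provider_fetches = asset_reads           # one fetch per asset read
    return directory_lookups + provider_fetches


# 500 connections, no caching: 501 lookups + 501 fetches.
print(extra_requests(500))        # 1002.0
# Half of the directory lookups served from a cache.
print(extra_requests(500, 0.5))   # 751.5
```

Under these assumptions the extra traffic grows in proportion to the number of asset reads, at most two additional requests per read, and caching the directory answers removes part of it.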

- the architecture does cover data ownership, but doesn’t address the privacy of the data. Each provider such as LinkedIn could “cache” the data that is being transferred (in effect, keeping temporary data for a longer period) and still end up knowing a lot about you. The good thing is that you still end up owning the master copy of the data, a major benefit compared to the current situation.

The architecture is promising – it delivers the basic benefit of user empowerment, and creates a thriving new economy of storage providers. What we urgently need are the open standards.

Who can help?