Taking back our information: Let’s get out of the export/import business & build places of our own for digital information.

Note: this is the first in a series of posts on the theme of “Taking back our information.” The phrase “our information” can mean different things which, in turn, suggest different meanings of “taking back”. Our information – your information, my information, personal information – can be the stuff we keep on our hard drives, in physical filing cabinets and in various software applications and services. In a world where applications and services are, ever more so, web-based, even simple services of disk backup and archival can leave our information vulnerable to compromises of privacy and viral infections. For this kind of personal information – the information we nominally “own” – “taking back” might be followed by “ownership”.
But personal information can have several other senses. (See either of my books, “Keeping Found Things Found” or “The Future of Personal Information Management”, for a more complete description of the 6 senses in which information can be personal.) A 2nd sense of personal information is the information that various agencies, organizations, web services, etc. keep about us. “They” collectively know an astonishing amount about us including whether we pay our credit cards on time, how many rooms we have in our houses, whether we have high blood pressure and what kinds of books we like to buy. “Taking back” for this kind of information is often followed by “control”.
Alas, information about us, once “out there”, is not easily controlled by us. For information about us as well as for information we post (whether about us or a neutral topic like the weather), we might at least seek to take back a greater awareness of how this information is being used, by whom and when. This is one of the primary points behind services ranging from Equifax to Reputation.com to services for “plagiarism detection” (e.g., http://www.dustball.com/cs/plagiarism.checker/). We do a form of this even when we “google” ourselves.
This initial post addresses a more fundamental issue in our efforts to “take back our information”. Whenever we work with information we do so through tools. Though we may not think of it that way, we use tools even for paper-based information. (Paper, pencil and pen are tools.) But the dependency on tools is much more apparent for our digital information and especially so for Web-based applications and services like Facebook or Google Docs that also store our information. Whose information is it? How can we “take back” our information from the very tools we use to create and work with this information?
In our use of software applications, whether desktop-, mobile- or Web-based, we have come to expect at least some support for the export of our information from a given application and a concomitant import of this information into other applications. Such support is not new. But it becomes more important (and trickier) when there is no longer such a thing as a “file” that can be the target of interchange efforts, nor common, widely used file system APIs for access to the data in question. Most popular Web-based applications support APIs for programmatic access to data and these can form the basis for a data interchange between applications.
The principles of data interchange are nicely articulated by Google’s so-called “Data Liberation Front”: “Our team's goal is to make it easier to move data in and out.” To paraphrase, it should take no extra money and relatively little extra time to get data out of our Google accounts and into alternate (“open, interoperable, portable”) formats. This is a noble goal. Obviously, we’d rather have big companies like Google articulate such goals, even if imperfectly realized, than drop all pretense of being “nice”.
The trouble is that the export/import model of personal information management (PIM) never worked well, not even in the olden days when desktop monoliths (think MS Word) ruled the world and most of our information was still stored and sent in paper form. Now consider our modern world. We can carry hundreds of apps and hundreds of gigabytes of information in a pocket device. Seemingly every other Web site we visit doubles as an application or a service. We leave at least a little (often a lot) of our information (and, by proxy, ourselves) at each. Export/import? Data interchange? From where to where? No matter what is done to smooth, shorten and simplify the process, data interchange takes time and is lossy (i.e. information is lost “in translation”). There are at least three reasons for this:

  1. Even as content is exported, structure is often left behind. There is generally no notion of how the structures of an application can be represented separately from the application or how these might translate to the structures that can be built in another application. How, for example, does a tag hierarchy in Evernote map to the flat tagging structure supported in MS OneNote? Or, conversely, how should the “notebooks”, “section groups”, “sections”, “pages” and “sub-pages” of MS OneNote map onto the “notebooks” and the “tags” of Evernote?
  2. The context in which the original information was used and referenced is usually not preserved. External links into the original information are apt to be broken as the information is transferred. Uses of the information in Application A often need to be converted by hand to comparable uses of the information in Application B. For example, bibliographic references may transfer trouble-free (or nearly so) from Endnote to Zotero. But the in-line uses of these references in documents (i.e. citations) do not. Such a conversion was requested as a feature back in 2008 and is still not supported. Context and the preservation of incoming external links to information become ever more important as the quantity of our persistent digital information grows and as we seek to interconnect and integrate this information.
  3. There is a more basic reason for the lossiness of data interchange. Some aspects of our information are and always will be intertwined with the tools we use. Furthermore, this tool-dependent aspect of the information interaction cannot be fully explicated in a way that allows for its preservation separate from those tools.
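
Reason #1 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the `TagTree` type, the `flatten` function and the tag names are all hypothetical, and nothing here uses the actual Evernote or OneNote APIs. It shows that two different tag hierarchies can export to the very same flat set of tags, so the hierarchy cannot be rebuilt from the export alone.

```python
from typing import Dict, List, Set

# Hypothetical tag tree: each parent tag maps to its list of child tags.
TagTree = Dict[str, List[str]]


def flatten(tree: TagTree) -> Set[str]:
    """Export a tag hierarchy to a flat set of tag names.

    This mimics (as an assumption) what a tool with flat tags could keep:
    the names survive, the parent/child relationships do not.
    """
    flat: Set[str] = set()
    for parent, children in tree.items():
        flat.add(parent)
        flat.update(children)
    return flat


# Two different hierarchies built from the same four tag names.
by_project: TagTree = {"work": ["budget", "travel"], "home": []}
by_topic: TagTree = {"travel": ["work", "home"], "budget": []}

# The hierarchies differ, yet their flat exports are identical.
assert by_project != by_topic
assert flatten(by_project) == flatten(by_topic)
```

Because the two exports are indistinguishable, any import back into a hierarchical tool has to guess at the structure or ask us to rebuild it by hand.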

As a simple illustration of reason #3, consider the “thank you” card I received some years ago with paper as its “tool” of delivery. The same message might have been sent using an alternate tool. The sender might have called me instead or emailed me. Each tool, each method of delivery, would have given me a somewhat different experience. The email might have received only cursory attention from me before it was nudged off the screen. The phone call, had it caught me in a busy moment, might even have been a source of mild annoyance. As it was, the paper thank you card stayed on the top of my desk for several weeks and each time I viewed it I was touched by the gesture of the person who sent it.
Now consider a digital information item – perhaps a mostly textual document or, even more so, a picture, a song or a full-motion video. Different tools can present the same item in distinctly different ways. Timing, resolution, color saturation, support for highlighting, navigation, spell-checking, link resolution, word look-ups, fonts, revision marking, and so on are likely to vary to give us distinctly different experiences for the “same” information.
The grass is always greener… As one more problem with the export/import model of PIM, consider what may happen even after we’ve completed the transfer of information to Application B: we may very well look back longingly to the features of the Application A that we left behind. If we troll the discussion boards for any class of application (e.g. note-taking, task management, bibliographic references, collaborative authoring, etc.), we can see many comments of the form “Application A is great for features r and s but Application B is really good for features t and u… If only I could combine the two applications!”
What to do? As our information grows and takes shape on the Web, we can no longer accept a situation where our information must be shunted from tool to tool in order to take advantage of this or that feature.
We look for tools of a different kind. We look for tools that can be applied to our information – its content and structure – where and as it is.
But how? This is a topic for upcoming blog posts in this “Taking back our information” series.


Great article. When I got to the line "But How?", I was excited to see the answer. Like a good author, you left it as a cliffhanger to make sure I would read your next blog post!
