Scripting News: Wednesday, November 03, 2010

Meeting at Library of Congress

I'm on my way to Washington for the second time this week. Last time it was to have a discussion about rebooting the news at the Online News Association. This time I'm going down for a two-day meeting at the Library of Congress to talk about creating an archive of what they refer to as citizen journalism. I don't think that term captures what's going on. It's anachronistic, the way horseless buggy seems today. Who would think you could hitch a horse to a car to get around? But people used to do that. In the future, we'll get our news from the sources, so the question is how to create a record of what the sources are saying.

I'm one of the people who will kick off the meeting by saying how I came to be interested in this subject.

My story will be about various formats that seemed so pervasive that they were safe choices for archiving content. There was a time when the 5.25-inch Apple II floppy disk was ubiquitous. You didn't have to carry a computer with you, because you could be sure there would be an Apple II when you got where you were going. Today, such a disk would be useless. CP/M 8-inch floppies seemed just as safe, as did the hard-shell 3.5-inch disks used by the first Mac. Yet none of them has held up over time, and most of the stuff that was written on computers in the 80s is gone now, unless it was printed out. Printing turns out to be a pretty good way to back up digital content. Or it was. Today we create far too much material to rely on printing as a backup. We're going to have to come up with something else.

I came up against this after I left Berkman, when the RSS 2.0 spec, which was stored on one of their servers, became inaccessible. I was using a CMS I had written, and somehow the app had stopped running. The sysadmin of the Berkman site didn't know how to keep it going. That was a big lesson. If you want content to stick around, you have to take deliberate steps to make sure it survives. And there are some best practices. When I focused on this problem, I was able to arrive at a way to store the spec on a Harvard server such that now, six years later, it's still accessible. Whether it will be available next year is anyone's guess.

Academics have always had this problem. A university employs a scholar, sometimes for a lifetime. He or she creates a body of work that must then be made available to future generations. That's why we have libraries at universities. But lately, as with all kinds of intellectual work, scholarship is being done on computers. So when a professor retires or dies, we are left with an array of electronic files and folders in a variety of formats. What use will they be in the future if the apps that can read them aren't maintained?

This blog and its related sites are another good example. As much as I don't like thinking about it, someday I am going to die. And when that happens, unless someone pays the ISPs, and someone relaunches the servers when they crash, and cleans out the databases when they fill up -- poof -- there goes Dave's online presence.

But we can do a lot better than we are doing. We just have to have the will to do it.

I've written about this many times. I call the topic future-safe archives.

A few bullet-points:

1. I want my content to be just like most of the rest of the content on the net. That way any tools created to preserve other people's stuff will apply to mine.

2. We need long-lived organizations to take part in a system we create to allow people to future-safe their content. Examples include major universities, the US government, insurance companies. The last place we should turn is the tech industry, where entities are decidedly not long-lived. This is probably not a domain for entrepreneurship.

3. If you can afford to pay to future-safe your content, you should. The result is an endowment, which generates annuities that keep the archive running.

4. Rather than converting content later, it would be better if it were created in future-safe form in the first place. That way the professor's archive would already be preserved, from the moment he or she presses Save.

5. The format must be factored for simplicity. Our descendants are going to have to understand it. Let's not embarrass ourselves, or cause them to give up.

6. The format should probably be static HTML.

7. ??
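
To make points 4 and 6 concrete, here is a minimal sketch of what "future-safe from the moment you press Save" could look like if the format is static HTML. It is only an illustration under stated assumptions, not a description of any existing tool: the function names render_static_page and save_post, the archive directory, and the slug are all hypothetical. It uses nothing beyond the Python standard library, and the file it writes has no scripts, stylesheets, or database behind it.

```python
from html import escape
from pathlib import Path


def render_static_page(title: str, paragraphs: list[str]) -> str:
    """Return a complete HTML document for one post (hypothetical helper).

    The page references no scripts, stylesheets, or external resources,
    so any browser or text editor can read it on its own.
    """
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )


def save_post(archive_dir: Path, slug: str, title: str, paragraphs: list[str]) -> Path:
    """Write the post as a plain .html file, so Save itself produces the archival copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"{slug}.html"
    path.write_text(render_static_page(title, paragraphs), encoding="utf-8")
    return path
```

Calling save_post(Path("archive"), "library-of-congress", "Meeting at Library of Congress", ["I'm on my way to Washington."]) would leave a single readable file at archive/library-of-congress.html. The design choice is that nothing has to keep running for the file to stay readable, which is the failure described above with the CMS that stopped.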

11/3/2010; 9:03:41 AM.
