Thursday 7 March 2013

Managing work in progress data


As I've mentioned elsewhere, I've recently become interested in the story of the Confederate commerce raiders during the American Civil War, and I've been using QueryPic to search the Australian and New Zealand newspapers of the time.

Of course, I'm not a historian. I'm a dilettante; in fact, I'm a digital dilettante.

And in the course of searching for newspaper articles, one thing that is amazingly useful is the ability of the Trove newspaper database to create a PDF of the article, and that of Evernote to grab the PDF and upload it to a notebook with my own folksonomy of tags.

Now, of course, Evernote is not the only game in town. Zotero is also a pretty good product, but Evernote is what I use and know.

One of the problems I face in these amateur projects is the 'big heap of everything' problem – when one starts with an idea and clips a few things of interest, the odd jpeg, without any clear idea of what one is doing or even whether it's going to turn into something half serious.

When I was a psychophysiology researcher – yes I was a proper scientist once – I had the same problem – except then it was differentiating the interesting from the relevant, but again it was all down to categorisation and organisation.

I've talked to enough researchers in a range of disciplines to know that this is a common problem.

The problem comes down to the accumulation of material and then its organisation and reorganisation, at which point it becomes a body of evidence to support what we rather grandly these days call 'scholarly outputs'.

In the old days people would file their material in old envelopes, write something relevant on the envelope and if they were very organised write some relevant stuff on an index card and file it. Basically they saved the data and created some metadata around the article.

Resources have of course now gone electronic, and this is where tools like Evernote come in – they allow us to capture and annotate material, and to organise and reorganise it to our heart's content.

So, when we come to repositories or data archives we tend to think of places to put finished outputs, be it a conference paper or a dataset. We don't tend to think of work in progress stuff, like my Evernote notebook of 1860s press cuttings about Raphael Semmes, yet of course it is just this work in progress material that enables scholarly outputs.

Any work in progress storage is necessarily an active filestore as the material is subject to reorganisation – something that has implications for its backup and management.

As a data manager the real question is how to support this activity. As I said, Evernote and Zotero do it well, but should we also be trying, on an institutional basis, to provide some sort of workspace that allows people to accumulate and save material while tagging it?

As Evernote and the like already do a good job, trying to replace them is probably a waste of time and money. But providing a general mechanism that lets users export material to a local archive server once they are happy with it is probably a good thing, as it ensures that the data is backed up and available for reuse.
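To make that export idea a little more concrete, here is a minimal Python sketch of what the packaging step might look like. It assumes the user has already exported their clippings to a local folder; it does not talk to Evernote or Zotero at all. The function name `package_for_archive`, the `tags_by_file` mapping and the `manifest.json` layout are my own inventions for illustration, not part of any existing tool.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_for_archive(export_dir, tags_by_file, bundle_path):
    """Zip an exported folder together with a manifest of tags and checksums.

    export_dir   -- folder of already-exported material (hypothetical layout)
    tags_by_file -- dict mapping a relative file path to a list of user tags
    bundle_path  -- where to write the zip bundle destined for the archive
    """
    export_dir = Path(export_dir)
    bundle_path = Path(bundle_path)
    manifest = []
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for item in sorted(export_dir.rglob("*")):
            # Skip directories, and the bundle itself if it lives in export_dir.
            if not item.is_file() or item.resolve() == bundle_path.resolve():
                continue
            name = item.relative_to(export_dir).as_posix()
            manifest.append({
                "file": name,
                "sha256": sha256_of(item),
                "tags": sorted(tags_by_file.get(name, [])),
            })
            bundle.write(item, arcname=name)
        # The manifest travels with the data so the tags survive the export.
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

The point of the manifest is that the user's own tags and a checksum for each file go into the archive alongside the files themselves, so the material stays findable and verifiable after it leaves the tool it was collected in. Getting the bundle onto the archive server is a separate step and is left out here.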

The other thing is that recently we looked at data management practices in a cohort of beginning Arts and Humanities researchers. Frighteningly, a lot of them were just storing material on their laptops and dumping it out to a USB disk. Some did use Dropbox, but none made much use of Evernote or Zotero.

So as well as helping provide a resource for the organised, we also need to consider what to do with the less organised. Training would help, particularly training focused on managing your data rather than simply backing it up, but again there is a need for a work in progress archive solution.

The question is what to provide and how best to do it – probably some sort of relaxed content management solution would provide a starting point ...

3 comments:

Anonymous said...

The other thing is that recently we looked at data management practices in a cohort of beginning Arts and Humanities researchers. Frighteningly, a lot of them were just storing material on their laptops and dumping it out to a USB disk. Some did use Dropbox, but none made much use of Evernote or Zotero.

I'm by no means starting out--more like finishing--but that describes me very well. I have many backups, because I work on any of three different machines and synch them all by using a friend's Linux box as an FTP dropbox. I've only been introduced to Zotero as a bibliographical engine, for which my experience is that it sucks, or at least doesn't do what I want such an engine to do, which is pick up data I already have in citation format and incorporate it. I have all my citations in a massive Word file I started before this sort of software became commonplace; that file also tells me which folder my (paper, longhand) notes on the citations are in and whether I have a full text somewhere. This is not data I want to have to retype. I do random thinking-in-text in a shareware program called TextPad, but only because I am old enough to remember carrying files round on a floppy disk and still think of Word as 'heavy' when plain text is all one needs. (Also TextPad has a *far* more powerful search-and-replace.)

So, does all this make me a problem user or just an old person?

dgm said...

It makes you normal - it fits with my experience of how a lot of researchers in the Arts and Humanities work - scientists tend to have piles of old notebooks and research diaries ....

Anonymous said...

Frighteningly, a lot of them were just storing material on their laptops and dumping it out to a USB disk.

I guess it was that bit that caught my conscience, because I do that too. I suppose the difference is that the laptop is paired up with a desktop and the USB stick is only a backup in case I get to work or wherever and find the Linux box is unavailable, but... I could save myself a lot of time with a reference manager, though, if I made enough time to make it useful in the first place. Anyway: I shall think out loud at my own blog. Thanks for the reassurance of normality!