Monday 27 December 2010

It really isn’t just the GST, Gerry …

A few weeks ago, I blogged about how some retailers here in Australia were complaining that they were losing sales to overseas online retailers, which were cheaper because they didn’t charge GST, and that Australian customers were legally avoiding GST thanks to Australia’s generous tax-free allowance of $1000 per transaction on goods ordered from overseas.

Well I’ve just had a concrete example of the fact that it’s not just GST. We wanted to buy a high quality A3 inkjet printer for photographic prints.

After checking some websites, including Amazon.co.uk, we settled on the Epson Stylus R2880. Amazon.co.uk does not ship printers to Australia, so we used shopbot.com.au to search for an Australian retailer.

And this is where the prices speak for themselves:

                             Amazon price (GBP)   Amazon price equivalent (AUD)   Shopbot price (AUD)
inc VAT/GST as appropriate   450                  720                             1150
tax free                     383                  613                             1045

Figures assume an exchange rate of GBP1.00 = AUD1.60, a VAT rate of 17.5% in the UK and 10% GST in Australia, rounded to the nearest whole currency unit. Prices as found on 27/12/2010.
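
For the curious, the arithmetic behind the table is simple enough to sketch in a few lines of Python (prices and the assumed exchange rate hard-coded from above):

    # Reproduce the table: strip 17.5% UK VAT off the Amazon price, convert
    # at the assumed rate of GBP1.00 = AUD1.60, and strip 10% GST off the
    # local price, rounding to the nearest whole currency unit.
    GBP_TO_AUD = 1.60
    VAT, GST = 0.175, 0.10

    amazon_inc_vat_gbp = 450
    shopbot_inc_gst_aud = 1150

    amazon_ex_vat_gbp = amazon_inc_vat_gbp / (1 + VAT)
    print(round(amazon_ex_vat_gbp))                  # 383 GBP tax free
    print(round(amazon_inc_vat_gbp * GBP_TO_AUD))    # 720 AUD inc VAT
    print(round(amazon_ex_vat_gbp * GBP_TO_AUD))     # 613 AUD tax free
    print(round(shopbot_inc_gst_aud / (1 + GST)))    # 1045 AUD tax free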

And basically what they tell us is that either Amazon is ridiculously cheap or Australian retailers are stupidly expensive. We did check a few other UK retailers and found that while Amazon were quite cheap, none of the big retailers were much over GBP500/AUD800 inclusive of VAT.

Incidentally the same’s not true of an iPad – the tax free price is more or less the same around the world, if one looks at the mail order Apple store prices.

Given that the printers are manufactured in Asia, I doubt if shipping costs are much of a factor. Basically, even paying GST, it’s much cheaper to buy a printer from overseas. Whether this is due to retailers gouging the market, or importers using an unfair exchange rate is anyone’s guess, but if Apple can do it, why can’t others?

[update 05/01/2011 - without going through the whole rigmarole again we've found that Canon A3 Pixmas are relatively cheap from printersupermarket.com.au, although relatively cheap means a $150 premium over the UK mail order price before taking VAT versus GST into account ...]


Wednesday 22 December 2010

media consumption 2010

Way back in January I worked out that we were spending around $1000 a year on print media, and that if we canned all the print media we'd have enough money to buy an iPad and some content subscriptions.

We are of course possibly unique in that we never bought the iPad, but we did cancel the print media subscriptions - in reality due to my increasing irritation with the Canberra Times and the fact I simply wasn't reading the New Scientist any more, rather than any great urge to worship at the cult of Saint Steven.

And how did it go?

Well, we'd kept the Guardian Weekly and Weekend Australian in the hope of slow and lazy Saturdays when you had the time to read the papers properly.

We've also got to confess to finding ourselves picking up free copies of the Canberra Times from Ziggy's or Wiffen's in the market on Saturday, but we can rationalise that.

What we did find is that we missed a morning paper, even though we just skimmed it. I found myself taking my coffee into the study for 10 minutes to start working through the day's email before driving into work.

So when the Australian came up with a cheap summer deal for home delivery for uni staff I signed up for it, and well, it confirms we're hopelessly addicted to a daily paper. The major difference is that instead of being irritated by the vapidity and superficiality of the CT I now get annoyed by the right wing economic and political stance of the Oz.

J instead merely complains about having the syndicated London Times crossword instead of the Manchester Guardian one, and continues to pine for the Age, which you can't get on subscription in Canberra.

So I guess we're newspaper readers. What we'll do when the Oz summer subscription runs out is anyone's guess.

What is interesting is that at the same time I've basically given up listening to podcasts. Much as I enjoy talk radio, I've been finding it difficult to find time to listen properly - I think I haven't listened to From Our Own Correspondent as a podcast for about six months now, and have only managed one episode of the iPlayer version of I, Claudius.

Reading and TV provide our downtime recreation, and strangely not because of the extra channels that have come with the digital switchover. I guess we're just simply reading more ....

email as a sign of fogeydom ....

According to an article in the New York Times, enjoying using email is a sign you're well on the way to becoming an old fart.

Personally, I'm not convinced. While I sent my first email message some time around 1978, I didn't start using email seriously until 1986 or thereabouts. No one much to email, you see.

However I've used it extensively since then, but apparently I'm now an old bastard for doing so. That may well be the case, but email has for me always had the advantage of asynchronicity - being a store and forward solution, it works well when you deal routinely with people in other time zones, as well as providing a nice little audit trail.

And while I've used instant messaging across timezones, it doesn't work so well when your fellow IM-er is nine timezones away - you kind of need to have someone who's awake to interact with.

So, I don't think that using email is a sign of incipient senility; what it means is that you have a requirement for asynchronous communication, be it with colleagues in different timezones or even just being able to send a message out of hours to Parks and Wildlife about a typo in the rego number on our new National Parks sticker (we bought a new sticker for our new car, and they helpfully transferred the balance from our previous vehicle - in the process of the update, well, Q is next to W on the keyboard ...).

What the story does show is that the iGeneration typically has a small circle of acquaintances, mostly in the same locale, whom they text about parties, meetups, school and suchlike. They use text because it's cheap, and so naturally make the switch to text-like messaging on Facebook.

They need instant response.

An example. If you want to know if someone fancies a beer after work, you're more likely to text them than email them, especially if they're in the next building and you're not sure if they're in this afternoon. On the whole you don't want a reply in three days' time - the moment has passed.

Twitter originally looked like it would turn into a service to broadcast social updates. So rather than SMS half a dozen people about your sudden deep fascination with a middy of VB, you would tweet your followers about the craving. Facebook messaging sans Facebook.

But, interestingly, twitter hasn't turned out like that. While people do use the direct message feature as an SMS replacement (I'm on the train!), it's clear that people are mostly either using it as a curated RSS feed of interesting links, such as my own (@moncur_d), as a status update service (@UoYITservices as an example), or for live blogging events such as press conferences and presentations.

Twitter has turned into a curated broadcast service. You follow Fred because he has a knack of posting interesting things about papyrology; you don't follow Debbie, even though you're friends with her, as she doesn't post stuff you find interesting; and while you don't follow Qantas you always do a search to check for flight delay notices ...

So, in short, the key takeaways are (a) that the communication media used are a reflection of people's lives - as people get older they have more and more professional and non social interactions that require a communications medium that is both asynchronous and traceable, not so much "I'm on the train" and more "I'm on the train and have been looking at your project design and ..." - and (b) that the communications medium used is appropriate to the purpose of the communication.

Friday 17 December 2010

yahoo to close delicious

There are suggestions that Yahoo are to close delicious, the social bookmarking service.

I, for one, would be disappointed, as I use it to bookmark interesting items for either professional or private research.

And this neatly exposes a problem with the use of free online cloud services in the support of academia. They can go away, leaving one with a whole heap of nothing. Just as wikileaks has shown us how the cloud is not content neutral, this shows us that it is not immune from commercial pressures.

So should we stop using the cloud?

No, it's too damn useful for enabling collaboration. And building our own private cloud isn't necessarily the answer - governments can (and do) cut funding as viciously as commercial organisations do.

The answer is (a) to have multiple online stores as far as possible, and (b) to store the content in open formats as much as possible, so that content can be downloaded and reloaded easily. That way we have an escape route if a particular service dies on us, while avoiding the risks of having everything stored on a single machine that dies on us.
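
In the delicious case the escape route is straightforward, because the bookmarks can be pulled down in an open XML format. A minimal Python sketch, assuming the v1 API endpoint delicious has historically exposed, with placeholder credentials:

    # Pull all bookmarks down as XML over HTTP basic auth, so we hold a
    # local copy in an open format. Endpoint and credentials are assumptions.
    import urllib.request

    url = "https://api.del.icio.us/v1/posts/all"
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, "myusername", "mypassword")
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr))

    with opener.open(url) as response:
        with open("delicious-backup.xml", "wb") as backup:
            backup.write(response.read())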

Of course, if like me you're not anally retentive enough to do your own proper backups you will always be at risk. The simplest answer to this is a vendor and platform agnostic dropbox-style service that copies working files between your home and office machines, and also stores them on the web, be it an academic data fabric or a commercial service such as skydrive ...

Wednesday 15 December 2010

Clouds, chrome and wikileaks

Cloud computing is seductive. And useful.

The moment you find yourself wanting to share data with someone else, or with yourself between home and work, you need a location to store it that's accessible by those you're sharing with.

In the old days we stuck our data on a corporate server somewhere inside the firewall, and when we wanted to share data we copied our files to a password protected ftp area. And it worked. And the ftp server in time became a web server, and it continued to work.

But in the meantime something happened. Applications became bigger. Hard disks became bigger, and laptops became more portable, meaning they moved about more including outside the firewall.

And because you couldn't provide an instant always on disk mount across the firewall people started storing documents on their laptops. Good conscientious people always synced them to some sort of central repository, but we're all human.

So organisations started becoming serverless, or more accurately fileserverless. Database servers were always with us but all that unstructured information was on people's laptops.

Not backed up. Not easily shareable.

Cloud computing seemed to be an answer to this. Put your documents on the cloud. Share them as you want. And use the lightweight apps provided when you're using a netbook or other low powered machine (e.g. an iPad) and don't have the editing tools to hand.

And it's truly excellent. No more messing with versions and connections, or finding that the file is on a machine that's powered off. I use and like this a lot.

But ...

One bugbear is security - you're trusting someone else to control access to your data the way you want. This is the nub of Richard Stallman's gripe about Chrome. Like a lot of Stallman's gripes, it's undoubtedly true, but as we can't all have a firewalled, fully patched server in the garage, or the skills or time to maintain it, one has to be practical.

Not being in the habit of storing pornographic images or developing plans to burn down buildings I'm relaxed if the security sometimes gets a little lax. I'm even reasonably relaxed if you saw a pdf of my credit card statement, or bank statement, or phone bill. I'd be angry if you could, but I doubt if much harm could come of it.

Probably all you could tell is that we have a revolving mortgage, we buy food, petrol, books and clothes, make phone calls and have friends in the UK, NZ and the US. The information gained is nothing I wouldn't tell a friend, and I don't think the men in funny shoes could make me into a criminal mastermind on the basis of the online information.

We of course don't keep the user ids and passwords online. We do have an encrypted cd and memory stick of things like that, including scanned passport pages, and a few sentimental documents and pictures, just in case the nature reserve on the hill above us ever caught fire and we had a bushfire emergency. Our escape plan involves grabbing a netbook, cd, memory stick, mobile phone and cat.

I also assume that my doctor and dentist store all my medical data securely.

So, cloud computing is useful and providing one makes a value judgement about the risks, secure. The same goes for the majority of corporate documents online. If you're sensible a security breach is annoying. But then you face the same problem if someone steals your laptop, or a memory stick, or whatever. I remember once having to explain to the bank that there were unencrypted copies of faxes (ok it was a few years ago) with credit card numbers on a laptop that went walkabout. Not a pleasant experience, though the bank were fine about it.

The danger with Chrome, and other cloud-only solutions, is that everything is online, and people might start inadvertently putting things they shouldn't online.

The question, as with the outsourcing of student email, is whether the consequences of a leak are bad, and whether a leak is more likely to happen with an outsourced service than with an internally run service.

The wikileaks saga shows us something else. It shows us that cloud data can be taken offline by the providers. Most commercial usage agreements say, in effect, 'you can't post nasty stuff, and we can take your account offline for a whole lot of reasons'. Now we might agree about not breaching copyright, and not posting live chicken action movies, but basically when we give our data to a service provider, we're saying: look after this, try not to lose it or share it with anyone we don't like, but otherwise - hey, it's cool.

So wikileaks was taken offline due to external pressure from the US government. That's fine. All that happens is that wikileaks is so high profile that half a dozen mirrors spring up in other jurisdictions, and the US government looks foolish.

Now suppose I'm not high profile, but have outspoken views about conserving native forest. This embarrasses the state government, so they get a court order to stop me posting pictures online of a protest where not everything was carried out by the book - for example, people were a little rougher than they might have been removing protesters.

And they then go to flickr and the like and ask them to pull my account. Perhaps they suggest I also have an unnatural interest in chickens. And because I'm unimportant my account gets pulled.

And if I have my own local backup of my cloud data I can find someone else to host it, make CDs of the pictures and hand them out, or whatever.

If I don't and everything's on the cloud I'm just a bitter and twisted loony ...

Tuesday 14 December 2010

2010 - what worked

Last year, I did an end of year post on what worked for me in 2009. Here's what worked for me in 2010:

Windows 7

I was a really reluctant convert to Windows 7. Having been a Linux and OS X user for years I felt kind of dirty going back to Microsoft. But, it's like driving a Holden - they're pretty good these days, and kind of fun ...

Microsoft OneNote

I've tried various notebooking applications over the years, and the only one that used to work for me was Tranglos Keynote. I've found Microsoft OneNote a really good snippet catcher, and being able to add from the web via OneNote Live and sync with your desktop notebook is a killer feature. It certainly helped power my Sighelm obsession.

WikiDot

The other powerhouse in the Sighelm obsession. This is the year I really 'got' the flexibility of being able not only to create, maintain and edit documents but to share the editing.

Nokia E63 push email

I found this absolutely invaluable when travelling, as a way of keeping up and firing off quick email responses when using a laptop was difficult (no free wifi, etc, etc), and unlike some other devices it's nowhere near chatty enough to blow your data budget when travelling.

Ubuntu 10.10

It works, and it's really good. It's a toss up between Ubuntu 10.10 and Windows 7 as to which makes me more productive.

Cloud services

Windows Live Skydrive, Google Docs - all these services that let you create, maintain and store documents remotely have really helped this year, making it easy to build and maintain a portfolio of working documents and backgrounders online, accessible from anywhere. Coupled with OneNote and wikidot, invaluable.

Still delivering...

Cooler e-reader

Still wonderful - light and versatile, with wonderful battery life.

Asus Netbook

Still good, and as my dash to Providence showed: lightweight, reliable, versatile and, coupled with cloud services, highly effective.

Tuesday 7 December 2010

Nennius and data archiving

Nennius has always been neglected in the history of archiving.

As a monk writing some time in the ninth century he put together a History of the Britons based on the sources he could find, some of which are now lost, and yet shamefacedly confessed in the introduction " ...have undertaken to write down some extracts that the stupidity of the British cast out; for the scholars of the island of Britain had no skill, and set down no record in books. I have therefore made a heap of all that I have found ..."

Whether Nennius existed, whether his History of the Britons was his work or the work of several authors, are open questions.

And what has this to do with me?

Well, I've started a new project on a dataset archiving and publication solution, and to accompany the project I've set up a blog as a sort of commonplace book to capture relevant background information. The blog of course needed both a name and a url - so, while the name is fairly boring, the url commemorates that monk (or monks) trying to capture what information he could: http://nennius.wordpress.com

Sunday 5 December 2010

wikileaks

I am not going to comment on the morality of wikileaks' actions, or on the rightness or otherwise of the withdrawal of wikileaks' hosting services and paypal account. We're all adults and we can make up our own minds.

What I am going to say is that governments, good or bad, have until now maintained themselves in part by controlling access to information and dissembling when advantageous. Some more than others; and of course not all governments are bad, in the same way that not all people are bad.

But governments do lie to their people.

To quote one of my great-uncles on why, as an 18 year old, he volunteered for the Luftwaffe on the Eastern Front in the Second World War: 'They told us we were winning'.

(For his pains he ended up being captured during the retreat from Stalingrad and spent the rest of the war in a labour camp in Siberia before being sent home to help build socialism in the GDR.)

Wikileaks has killed secrecy. Just as privacy has diminished with the advent of social networking, so has secrecy. It is simply much more difficult to keep secrets in an online, connected world.

This can be both good and bad. In the same way that twitter has allowed student protesters in the UK to organise and Iranian protesters to get the message out, the advent of these technologies changes the game, and rather than wring our hands we need to adapt and move on.

Friday 3 December 2010

snow and student protests in the age of twitter

An interesting little phenomenon - in this week's UK snow lots of people have posted photos of the snow, including crowded buses, iced trains and snowbound freeways.

The same is happening with the recent UK student protests - not just informally organised via twitter and facebook, but with pictures posted and made available online, giving the lie to any 'official' images of events.

We of course saw a similar phenomenon after the rigged election in Iran, and while we have to be careful to guard against both the picture takers and gallery/collection assemblers selecting images that support a particular view, it is nevertheless an interesting phenomenon. Everyone has a phone, every phone has a camera.

Power may not have fallen into the streets, but control of information is certainly heading that way ...


Evidence of connection ii

I recently railed about the compartmentalised view of history in which societies are viewed as separate entities and the connections between them de-emphasised.

Of course societies have always been connected by trade and the like; one need only look at the spread of lapis lazuli, found only in Afghanistan, around the world.

My original post fired off a minor enthusiasm about whether an Anglo-Saxon cleric called Sighelm ever went to India, and from that I've found a bigger, more interesting puzzle - assuming that he did go all the way to India, how did he get there?

The answer is of course obvious - he followed one of the well established spice trading routes, either via Baghdad and the Gulf, or via Alexandria and the Red Sea, or even, more exotically, by following the silk route to Samarkand and then across the Karakoram and Hindu Kush to India.

All of these were well established routes, and ones which have persisted up to recent times, to the latter half of the twentieth century.

It's only with the divisions of recent times, caused by the advent of Stalin's Soviet Union, the wars in Afghanistan, the Iranian revolution and the war in Iraq, that these traditional trade routes have been disrupted. These long, hard journeys would have seemed perfectly sensible to a nineteenth century Russian or an early twentieth century British traveller - after all, Eric Newby travelled overland to the Hindu Kush, as did Robert Byron to Iran, Afghanistan and Tibet.

And I've always been quietly amazed by the fact that Agatha Christie travelled with her husband, Max Mallowan, to his dig in Nineveh by train. Not because of the length of the journey, but because it was possible - Orient Express to Istanbul, then across Syria via Aleppo, and on to Iraq on the Baghdad railway.

And of course it seemed perfectly sensible to British colonial administrators to govern the Trucial states from India and to use the Indian Rupee as a currency not only in the Gulf, Aden and Oman, but also in the British colonies in East Africa, and when one sees Kenyan security guards in a Dubai shopping mall it seems as if the wheel has turned full circle.

A consequence of the divisions of the last fifty or sixty years is that we have become extraordinarily ignorant of the cultures and history of Central Asia, and of the role of these cultures in mediating the trade between India, China and the west, be it Byzantium, Rome, or late medieval Europe.

As a for instance, a story periodically surfaces that descendants of one of Crassus's lost legions are living in a village in western China. It has been claimed that the population of the Lanzhou area had Caucasian characteristics, and DNA studies do confirm that western DNA markers are present.

Lanzhou is traditionally the endpoint of the silk route through Xinjiang to Urumqi, so other opportunities for irregular unions (and western looking babies) doubtless presented themselves thanks to passing western traders. It's also worth remembering that the original Tokharian population of the area was Caucasian in appearance.

Equally, because the area is not that far from Bactria, it's not impossible that the myth of Crassus's legion had some basis in fact, that some Roman trained soldiers (or their descendants) ended up in Xinjiang, and that the story was perpetuated to explain the occasional western looking baby born in the villages.

The other key thing about these trade routes is that they are persistent. Again an anecdote.

In 2002 I was sitting in a roadside cafe in northern Greece close to the Albanian border. As I sipped my coffee a convoy of old Albanian-registered Mercedes sedans, loaded up with an extraordinary range of domestic paraphernalia and packing cases, drove past heading back towards Albania, while an Iranian truck went past in the opposite direction.

At the time I said something flippant about the Albanian mafia going shopping in Istanbul, and I was probably more than a little right - what I actually saw was a traditional trade route re-establishing itself ...



Tuesday 30 November 2010

Composable Environments

Further to my post last week on composable environments, inspired by Ian Dolphin's recent Sakai presentation at ANU: Ian's slides are now online.

I'm also putting together a set of material on the possible congruences between Sakai 3 and Project Bamboo.

Thursday 25 November 2010

No Gerry, it's not just GST ...

I've periodically done a fairly simple analysis of the comparative costs of buying books in Australia from a physical shop versus a virtual store overseas, most recently last August.

Recently here in Australia we've had a political stoush about whether the $1000 limit for GST-free transactions is too high.

Now, maybe we're unusual but we don't buy a whole lot of big ticket items. From anyone. Sure we bought ourselves a flat screen TV and a data recorder from a store in Canberra back in January, and I bought a laptop online from an Australian vendor but that's about it.

What we do buy online from overseas are books and gizmos (oh and clothes sometimes). And in most of these transactions freight is a significant part of the cost. And the reasons we do this are:
  1. Choice and availability
  2. Cost
Yes, it can be cheaper to buy from overseas, but I fail to see why a book ordered from a second hand bookseller in the US or UK should routinely be half the price of one from a comparable Australian reseller, and that's including freight costs. Or why a book from Amazon or BookDepository is cheaper, sometimes a lot cheaper, than one from the Borders or Dymocks stores in the Canberra Centre.

The same goes for gizmos like KVM switches and SD card readers. Basically, if I can buy a new USB computer mouse from an electronics shop in Hong Kong for $10 including postage, yet pay $25 in an over the counter transaction in Canberra for the same product, it's not just GST that's making the difference in cost.

The sad fact is that a lot of retailers sell a poor range of overpriced items, and that they can't compete.

Retailers might argue that we're only 22 million people on a continent the size of the continental US, but we're not uniformly distributed - in fact three quarters of us live in the big cities of the south east, which should ease rather than complicate logistics.

So, who's up for a bit of competition?

Tuesday 23 November 2010

reputation in a digital world

Just watched a stimulating presentation by Alan Rusbridger of the Guardian on the future of media, and newspapers in particular (the lecture in full is available on the ABC 702 site, and there's a nice summary of what he said about twitter on The Age).

Now what he's saying actually has resonances with what is happening to scholarly communication.

In the age of social media the gates to publication are no longer controlled by a group of older men who are journal publishers and who invite their friends to carry out peer review. That's not to say the system didn't work quite well for the past 150 years, but we all know there have been instances of nepotism and worse.

However, that's not the case anymore. These days are gone. More so in journalism than academia, but they're gone. The genie is out of the bottle.

Anyone can publish anything, and can publish the data and the analysis that they used to back it up.

And if it's interesting it will be picked up and will spread virally - retweeted, linked to, cited and the rest. Without wanting to seem to be waving parts of my anatomy, a couple of posts on this blog have been picked up by a journalist working for the Wall Street Journal, and I've had follow up questions and comments from a range of reasonably reputable people.

Now if you get an email out of the blue asking for your opinion of X on the basis of something you wrote you do a few checks - such as googling the person concerned to see that they are who they say they are - and if they check out you probably put some effort into a more detailed reply than you otherwise might.

In other words you are assigning an implicit reputational score.

The same with twitter. You (usually) only follow people you find interesting (and/or witty). You assume people follow you for the same reason. Twitter's 'who to follow' suggestions work the same way, by suggesting people followed by both those you follow and people who follow you. At best it can be frighteningly good at picking out the twitter personas of people you know professionally. In my experience, while it might suggest people that you don't want to follow for a variety of reasons, it very rarely comes up with oddball suggestions.

And what of course is happening is that you are building a web of trust. Not perfect but no worse than with people you meet at a conference.

So in essence you vouch for people and people vouch for you. A tacit version of ebay's scoring system for sellers and buyers. If 10 people say Fred is a good person to deal with he probably is. You don't know these people but you go on what they say because enough of them say the same thing.

Another such example is Boden, a UK clothes website. Selling clothes, primarily to women, online is difficult, as users want to know how a garment fits and hangs on them, not how it hangs on a skinny model at least ten years younger than the average age of the people using the site. Boden encourages users to post anonymous reviews of products. No one reading a review knows the age or shape of the person who wrote it, but if there are a number of reviews all saying that the material is too shiny or the cut too tight, it's likely that there is a problem. Again the reviews are anonymous, but you go on what they say because enough of them say the same thing - essentially what goes on in opinion polling. Not totally accurate, but close enough.

Translating this to scholarly publication: if we assume that the people who follow or regularly read particular, more academic, blogs are a self selecting population of interested individuals, we can then start to say that if they cite posts, either as links in articles they themselves write or as retweets, it suggests that the article has some worth - just as in the old days one would track citations in the Science Citation Index to decide if a particular paper was worth following up on.

And by examining the cross links, the social graph, one can define the inner community and consequently identify the loonies. Basically they may cite you but no one in the group cites them.

So link analysis allows you to assign weight to posts by people you may not know. And this probably benefits less established scholars, since if they say interesting things they are likely to be picked up on and the set of bidirectional links established. This isn't particularly new - web metrics companies such as Alexa have been using the number of inbound links as a reputational index for a number of years, and Google, as we all know, uses links in its PageRank algorithm as part of its ranking of a site's likely relevance.
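
To make that concrete, here's a toy Python sketch of link based scoring over an invented four-node social graph - a few rounds of PageRank-style iteration are enough to push the uncited loony to the bottom:

    # Invented graph: who links to whom. Everyone cites fred and alice,
    # the loony cites the group, but no one in the group cites the loony.
    links = {
        "fred": ["alice", "me"],
        "alice": ["fred"],
        "me": ["fred", "alice"],
        "loony": ["me", "fred"],
    }
    nodes = list(links)
    score = {n: 1.0 / len(nodes) for n in nodes}
    damping = 0.85
    for _ in range(30):
        new = {}
        for n in nodes:
            # share of each linker's score, split over its outbound links
            inbound = sum(score[m] / len(links[m]) for m in nodes if n in links[m])
            new[n] = (1 - damping) / len(nodes) + damping * inbound
        score = new
    print(sorted(score.items(), key=lambda kv: -kv[1]))  # loony comes last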

And of course it is possible to establish these reputational scores algorithmically. We also lose the distorting effect whereby work published in a journal with a generally higher reputational score is ranked higher than work of equal significance published in a less well regarded journal.

For example if the editor of Nature was to ask me to rework this as an article for publication, I'd probably get a note of congratulation from one of the great and the good of the institution I work for. I doubt however if I would get one if the article ended up in the Australian Journal of Research in Information Science. Of course on the whole Nature chooses wisely and chooses articles of significance.

However, if I write something that is published in an obscure journal, it is likely the article will be missed and treated as being of little significance, purely because of the journal of publication. Certainly bibliometric systems such as Socrates tend to weight publications on the basis of the journal of publication rather than on an assessment of worth.

And this is equally important with dataset citation. There are no journals. If the dataset is a result of experimental or observational work, the likelihood of its reuse will depend on the reputation of the research group that produced it. The same is true of literary corpora. We trust the data more if we trust the people who produced it.

Reputational scoring based on the social graph of the author rather than raw citation rates increases the chance of innovative work being picked up on. And that is surely a good thing.

Monday 22 November 2010

Composable Environments

Interesting seminar from Ian Dolphin of the Sakai Foundation on virtual research environments.

He's promised to make the slides available so I won't provide a blow by blow account, but the nub of the problem is that these environments have got to consider reusability and remixability, so that tools can be added and reused easily by individual groups of researchers.

Definitely resonances there with the work I'm involved in on data reuse with ANDS and collections interoperability (which is really dataset reuse) with Project Bamboo.

And the reason is that if people are going to engage in cross institutional and cross disciplinary research, they need access to datasets sitting in archives, and to be able to share the datasets they generate.

To take a simple example: if one wished to do an analysis of early 19th century squatter settlements, such as the informal one at Orroral Valley, one might want to tie the names of these settlements to a thesaurus of Aboriginal place names and a GIS system, and show that settlements tended to be on grassy paddocks that were also good kangaroo hunting grounds, therefore possibly providing a reason for conflict.

So my view is that it is not just about the composable environment, but about being able to connect data sources easily to these environments, to facilitate reuse not only of tools but of data - which of course means standards for both tools and data.

And as we know that quite a bit of academic work has skipped the fence, with collaborations taking place on Google Docs and wikidot, and with collections of material being hosted on flickr, we need integration with external tools as well, to harvest and ingest external content.

The advantage of this is that it also provides a way of capturing scholarly output, so that people can not only co-operate on research but deposit the results electronically and make both pre-prints and datasets available for review, as well as generating researcher profiles for other purposes.

Potentially powerful, but I don't think we're there yet.

Supplicants of the 802.1x kind

Last week I recounted my near total success getting crunchbang and ubuntu 10.10 not only to live happily together but to connect to our campus network.

The executive summary is that Ubuntu works with our secure network and Crunchbang doesn't, because the Crunchbang supplicant, the software that mediates the connection, doesn't support GTC.
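
(For reference, a working setup boils down to a stanza like this in wpa_supplicant.conf, with GTC as the inner, phase 2, method - a minimal sketch only, with an invented SSID and account:)

    network={
        ssid="uni-secure"                  # invented network name
        key_mgmt=WPA-EAP
        eap=PEAP                           # outer authentication
        identity="u1234567@example.edu.au" # invented account
        password="not-my-real-password"
        phase2="auth=GTC"                  # the inner method Crunchbang's GUI won't offer
    }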

So the logical thing would be to upgrade the supplicant on Crunchbang. Well, there is an alternative, Xsupplicant, that appears as if it should work. It doesn't - it turns out that the configuration tool is broken under Crunchbang, and possibly other configurations, and doesn't let you browse for a certificate.

Now one could get all geeky at this stage and go and edit the raw configuration files to fix it. This is of course stupid as a solution, as you've then got to explain to users how to use an editor, edit system configuration files to which they may not have access, etc etc.

Not sustainable.

Earlier versions of xsupplicant appear to require the same virtuoso editing of configuration files under Crunchbang, so are equally unsustainable.

The other solution I guess would be to add the appropriate repository from the Ubuntu 10.10 distribution and force an upgrade and hope one didn't break any dependencies. This at least would have the merit of being scriptable, and would mean users would still be using the network manager tool they know and love to set things up after running the magic script ...

print on demand ....

I have periodically sounded off about the use of print on demand technology to make available out of print books.

I now have another example - I was searching for a copy of Strabo on AbeBooks, and the search threw up some copies available from BookDepository - all print on demand.


Apart from the arguable lunacy of having a print on demand book shipped from the other side of the planet, it's such a nice example of selling useful low volume texts in paper format.

Of course that raises the question of why it isn't also available as an e-book, but so far an electronic version is proving elusive ...

Friday 19 November 2010

Crunchbang and Ubuntu 10.10 on the same machine

Following on from my success installing Ubuntu 10.10 on a physical machine I thought I'd install Crunchbang 9.04 side by side to see if it played nice.

And it almost did. Nine out of ten - not completely free of errors but pretty good.

Installing was fairly straightforward, but the automatic resizing routine had problems mounting the existing ubuntu swap partition /sda5; if you ignored these, installation was fine. My only problem was that the mousepad was skittish and I ended up plugging in a usb mouse.

On restart Crunchbang had of course made itself the default operating system, but Ubuntu started and ran correctly. Rebooting and going into Crunchbang also worked as did various combinations of restart and power down.

In fact my only real gripe was wireless networking. We have two networks on campus - the first is an older, slower network that doesn't require complex end user configuration: basically, just as in a lot of hotels and airports, you connect to it, open a browser, and in this case just log in.

We also have a second, more secure network that requires a little more finger in the ear stuff. Crunchbang almost worked, detecting the network type and authentication scheme correctly, only to fall at the last hurdle by not providing GTC as an option for inner authentication. (Ubuntu 10.10 provides this and it works well.)

This is kind of important, as our secure network uses the same security model as Eduroam, meaning that one couldn't easily take Crunchbang to another university campus and expect to connect, which reduces Crunchbang's usefulness as a lightweight environment for checking mail etc ...

Thursday 18 November 2010

Authenticating in Academia

Back in August I blogged about a UMich blog service that had a number of means to authenticate user comments but not Shibboleth or a local UMich id.

This actually is quite a serious problem, for if we agree that there is scholarly communication taking place in the blogosphere, we want comments to be authenticated, if only as a hurdle to prevent the comment system from being gummed up with salacious e-mail invitations from loose moralled East European floozies and invitations to buy various types of performance enhancing drugs for which I have no need.

We probably don't wish to restrict comments to people who can authenticate via shibboleth, as there is a range of people who can't provide a shib based id.

Valid reasons include:
  • Their university doesn't yet provide an IdP
  • They work for a non .edu institution, eg .gov or .org
  • They're an adjunct or an affiliate and use a non .edu account for correspondence
So how to solve the problem?

Clearly one needs to provide an authentication mechanism that allows people to authenticate by a range of means, but I didn't have a solution until I came across this email from Bob Morgan on one of the Shibboleth lists:

In my recent talk at the Internet2 Member Meeting I showed some examples of sites accepting both SAML-based federated signon and OpenID (eg NIH). In the same session Russ Yount from CMU talked about their plans for a "social network" proxy/gateway for their environment. As an aside, I observe that sites interested in this kind of thing these days tend not to focus on OpenID per se but on whatever protocol is needed to bring in the sites where the users are (OAuth for Twitter, proprietary for Facebook, etc). This protocol standardization failure creates a market opportunity for services like Janrain and Gigya.

If you'd like I could put you in touch with the people at UW who put together this interface:

https://isds-auth.cirg.washington.edu/distribute-auth-gate/gate.php?req=%2F

which supports UW and ProtectNetwork via InCommon, and Google via OpenID.

Which probably would do the job really nicely, except that we still have the problem of knowing who someone is and more importantly weighting their comments.

The (possible) need to weight comments of course comes from the need to provide some evidence of peer review - in the points means prizes world of contemporary academia, if one wants to have one's blogging considered as evidence of professional esteem one needs to show that one is having some sort of meaningful interchange with one's peers.

However it's not just a matter of counting .edus - after all, there is nothing to stop Professor V. Eminent happening across your work via the Facebook Byzantine Prosopography group and using his Facebook account to post a comment.

So in a federated world how do we assess weight? Or should we just not bother?

Wednesday 17 November 2010

RSS feeds and (re)usability

We all know what an RSS feed is these days - basically it allows the syndication and reuse of content.

And this content can be anything - text, images, data - and as such provides a really easy way of getting the data out of one instrument and into something else. A nice example is the Canberra weather feed, which provides a nice set of structured data that you can poll periodically, extracting the bits you want and displaying them in a nice little application.

This is in fact what iPad and Mac weather widgets do. And that's fine for structured data. As it's structured we can interpret and pick.
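
A minimal Python sketch of that poll-and-pick pattern, using the third party feedparser library and an invented feed url standing in for the real weather feed:

    # Poll an RSS feed every half hour and pick out the fields we want.
    import time
    import feedparser  # third party: pip install feedparser

    FEED_URL = "http://example.org/canberra-weather.rss"  # placeholder url

    while True:
        feed = feedparser.parse(FEED_URL)
        for entry in feed.entries:
            print(entry.title, "-", entry.get("summary", ""))
        time.sleep(30 * 60)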

Then we come to RSS as a substitute for usenet news, or indeed using it for article and document syndication, and then using applications such as Google reader as an aggregator, or newsreader substitute.

And this is where we come to usability, and twitter can teach us a lot here.

One of the virtues of twitter is that it limits you to 140 characters, so, if you share links on a regular basis, a tweet is basically article headline, article source, and a shortened url.

Headlines of course can be misleading, but the format does have the virtue of making it easy to scan a scad of posts and choose the interesting ones.

RSS aggregators (well, now that Bloglines has had a near death experience, we basically mean Google Reader) uncritically display the feed content.

Which is fine. Except if one wants to scan a load of aggregated material in a hurry one only really wants the first paragraph or so.

But of course there are those publications (ok, the Guardian) who put the full text of every article in the feed - great if you're viewing it in a Guardian reader app, not so great if you are using an aggregator. And this is why the Guardian's feed is less usable than, say, the SMH's.

This of course doesn't matter if you are using individual custom apps, as they can be configured to work with the individual feed. However generic readers are different, they have to work with all feeds and rely on the individual feed being sensibly formatted.

For example, a number of blogging solutions allow one to configure the feed to only show the first hundred or so words, the idea being to attract eyeballs to content (hey, this post about the Byzantine spice trade looks cool, let's go read some more).

And this works well in aggregators - enough to scan to see if it's worth following up.

Now, one could of course write an aggregator that read a feed and displayed only the first sentence of an article, or the keywords, or the first hundred words, or whatever. But the trouble with rules like these is that they will break something - like the really key update that's 102 words long, where truncation forces a pointless click through to the article itself.

Configuring the rss content at the originator's end at least means that the originator knows not to put anything more than 100 words in the first paragraph, and to make it punchy (The spice trade played an important part in the sex lives of Byzantine Greeks ...).
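
Either way, the truncation rule itself is trivial - a Python sketch of the first-hundred-words version:

    # Clip an entry to its first hundred words so aggregator users get a
    # scannable teaser rather than the full article.
    def teaser(text, limit=100):
        words = text.split()
        if len(words) <= limit:
            return text
        return " ".join(words[:limit]) + " ..."

    print(teaser("The spice trade played an important part in the sex lives "
                 "of Byzantine Greeks, " + "and so on, " * 60))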

But then there's a view that RSS is only a distribution mechanism, and certainly newspaper publishers like the idea of locking people into individual applications rather than having them read widely, but if one wants to read widely one needs some sort of aggregator like tool.

Google Reader (it's what I use so that's why I use it as an example) is basically no more sophisticated than the Pan Usenet newsreader.

Perhaps what we need in these presentation centric days is something more like paper.li, which aggregates content from twitter feeds into a newspaper-like page.

It of course requires processing time and power, and hence is not realtime. The trick would be to offload processing to the client machine and download all the feeds raw, but then we start getting away from the great advantage of web apps - it's always the same whether you're on mozilla, chrome or ie, and whether you're on a mac, a pc, or some ten year old recycled machine running linux ...

So I'm stuck. Remote and simple is truly platform agnostic, local and sophisticated starts asking questions about os and host restrictions ...

Tuesday 16 November 2010

storage, storage, storage

The key to the digital archiving game is reliable, replicated, persistent storage, i.e. storage where we can put stuff in and be assured that what comes out the other end at a later date is what we put in.

On small archives this is simple to do: you copy the files multiple times, periodically recompute md5 checksums to make sure they still match the originals, and hope to do this often enough that you don't end up with all the copies going bad.
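
The verification step is only a few lines of Python - a minimal sketch, assuming a manifest of checksums recorded at ingest (file and manifest names invented):

    # Recompute each file's md5 and compare it with the checksum recorded
    # when the file was ingested, flagging any silent corruption.
    import hashlib
    import json

    def md5sum(path, blocksize=1 << 20):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(blocksize), b""):
                h.update(block)
        return h.hexdigest()

    with open("manifest.json") as f:
        manifest = json.load(f)  # {"path/to/file": "md5 at ingest"}

    for path, original in manifest.items():
        if md5sum(path) != original:
            print("checksum mismatch:", path)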

Statistically this is unlikely, although as David Rosenthal has recently pointed out, the bigger your archive and the longer you keep it, the greater the chance of spectacular failure. However, for most small academic archives this is less of an issue, principally because the size of the archive and the hardware refresh cycle mean that the problem of disk reliability decreasing with age is a minor one.

Basically, if you replace the hardware every three years you should get twice as much newer and more reliable storage for your dollar. And the size of the archive is such that you can probably even do periodic tape backups as a belt and braces exercise.

Large archives are of course different and have various problems of scale.

However one problem that does happen is vendor change - vendors regularly decide to stop making things. For example, we had a student filestore solution based on Xserves that did replication and the like, and could conceivably have been turned into an archival filesystem.

Apple have, of course, end-of-lifed the Xserve, which means that the solution will need to be migrated to new hardware. Whether this is an opportunity or a challenge I suppose depends on your view of life. And to be fair, we only started with Xserves to provide better AFP support.

Now for student filestore we have a range of options from migrating to Stornext to outsourcing the whole thing.

Archives are different. While student data is as valuable as any other data, it is short lived, meaning that as long as we can move the files reliably once, we probably won't need to move them again.

Archival filestores are, of course, another matter. Even if we only plan to keep the contents for ten years, that's three migrations. If we think about keeping stuff for a lifetime, that's twenty-five migrations, each with its attendant problems and risk of corruption.

Now most migrations go smoothly, and of course you (usually) have a usable backup.

Ninety percent of most reasonably large archival stores is never accessed after the first few years, so there is a temptation to save costs by only actively verifying the more commonly accessed data - which of course means we start to risk silent corruption.

Now I have a lot of photos online of my wife and cat, and I'll probably still occasionally want to look at them twenty five years from now. Can I be assured I can access them? Or the PDFs, or this blog?

And of course these sit with large commercial providers. For small academic archives the problem is worse, as they may hold the only copy and be too resource limited to test and verify, leaving them at the mercy of migration anomalies, especially as these migrations tend to be single point in time changes rather than the evolutionary changes seen in large archives ...

netbooks vs ipads

I've periodically ranted about pad based computers and their (dis)advantages vis-a-vis netbooks.

Well, at last week's eResearch Australasia conference I periodically sneaked a peek at what other delegates were using. Given the sort of audience you get at computing events you'd expect them all to be technophiles, but it was quite interesting:
  1. roughly half the delegates in any presentation used pen and paper to take notes
  2. of the remaining half very few had full sized laptops
  3. there was a roughly 50/50 split between netbook users and ipad users
  4. most of the netbook users seemed to be running windows 7 (based on a small sample size generated by shoulder surfing)
Items 1 and 2 are easy - just as I found when I went to Providence RI for two and a half days earlier this year, netbooks are lighter and easier to deal with than full size computers, and with pervasive wi-fi, being able to access cloud based services such as windows live and the google ecology frees you from the need for grunt. I was one of the pen and paper brigade: I took my office MacBook with me and found that, elegant though it is, it was just too heavy and clunky to balance on my lap - using it in the departure lounge at Brisbane airport to do some work left me with cramp in my left leg. After that the laptop stayed in my hotel room - next time I'm definitely taking a netbook.

Item 3 - I guess it's what you would expect given we're over six months into the ipad frenzy. What's interesting is that, of this highly computer literate audience, not everyone had rolled over - some people clearly preferred having a versatile machine they could type on.

Item 4 was a surprise - I'd have expected more linux users, but then most netbook linux interfaces are dumbed down in an effort not to scare the children, and windows 7 is (a) pretty good and (b) has a massive software base - it's windows that has an app for everything, not Apple. And, having installed Ubuntu 10.10 on a computer, it's good, but compared with both my windows 7 and OS X machines, not better.

And that's a consequence of Microsoft having put the Vista disaster behind them - Vista was unbelievably clunky and XP was distinctly unslick, which made linux an attractive option, especially as you could upgrade those XP machines to Linux without having to buy new hardware.

Times have moved on: most of that hardware has been replaced by natural attrition, and W7 is the product Vista should have been, making it a highly attractive option. And machines running W7 are a lot cheaper than Apple's offerings.

I could imagine a scenario, especially given the current economic climate, where Apple sees MacBook sales cannibalised both by the iPad, among people who just want a content access device, and by W7 machines, among those who want a general purpose computer ...

Monday 15 November 2010

eResearch Australasia 2010

I spent last week at the eResearch Australasia conference on the not quite as sunny as expected Gold Coast.

A lot of the value of conferences comes from talking to people but there were a number of presentations I thought were particularly interesting:

Helen Bailey : e-Dance: pioneering e-research in the arts

Those who know me will know I'm not noted for my tolerance of touchy feely waffle, and I did start out feeling fairly negative about this presentation, but I warmed to it when she began to describe how the methodology they had devised to allow two dancers in separate studios to dance together over the internet had also given them a way of recording dance choreography.

I remember a discussion with Allan Marett one evening in Darwin about recording indigenous dance as part of the NRPIPA. At the time there was no clear solution, but one could see that Helen Bailey's work could be adapted to provide a mechanism for doing this, not just for Australian Aboriginal performance but for indigenous performance worldwide.

Bryan Lawrence: Provenance, metadata and e-infrastructure to support climate science

A very simple, very straightforward, but very interesting presentation on the value of provenance and implied trust when dealing with older datasets.

David Carlin, Jane Mullett : Performing data: the Circus Oz Living Archive

A fascinating and witty paper asking fundamental questions about what one does with this archived stuff in a performing arts context.

René van Horik, Dirk Roorda : Smart migration of file formats: the MIXED framework

Document formats change over time, meaning that we need to store documents in a well known format to be able to read them later. A very interesting solution, especially in the light of Pete Sefton's work on using ePub as an intermediate storage format for text documents.

It's since struck me that the (manifest + contents) model can be extended to cover things like spreadsheet data, by saving the columns and then saving a description of the meaning of the document as part of the saved archive - portable metadata.
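
For instance (a minimal Python sketch, with invented data and field names), the values go in a plain CSV and a small manifest records what each column means, so the pair travels as a self describing archive:

    # Save the columns as plain CSV, plus a manifest describing the meaning
    # of each column - portable metadata. Data and field names are invented.
    import csv
    import json

    rows = [("2010-11-15", 22.4), ("2010-11-16", 24.1)]
    with open("observations.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

    manifest = {
        "contents": "observations.csv",
        "columns": [
            {"name": "date", "meaning": "observation date, ISO 8601"},
            {"name": "temp_c", "meaning": "maximum temperature, Celsius"},
        ],
    }
    with open("manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)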

Andrew Wells : Growing virtual research environments in the fine arts: tricks and traps

An interesting presentation on how what started off essentially as building a digital version of an existing print resource developed a life of its own.

Toby Burrows : Archiving Humanities Data for E-Research: Conceptual and Technical Issues

An interesting presentation from a practising historian that explained what a medieval historian would want out of a solution, rather than what people taking or extending the scientific model of data sharing might think. Especially enlightening in the light of both my personal dabbling with Sighelm and my involvement in Project Bamboo.

Simon Porter, Lance De Vine, Robyn Rebollo : Building an Australian User Community for Vivo: Profiling Research Data for the Australian National Data Service

Vivo is an interesting product, as it allows one to automagically tie together researchers' publication data, research projects and HR information to generate such things as researcher homepages and citation data, and to link directly to content, thus helping to build a research community as well as satisfying funding council reporting obligations.

There were a lot of other papers; some were less fascinating, but none of the ones I went to was a complete dud. And as always there are a few presentations one wishes in retrospect one had gone to:

Anna Gerber, Roger Osborne, Jane Hunter : Visualising Australian Literary Networks

Pauline Mak, Kim Finney, Xiao Ming Fu, Nathan Bindoff, Ming Wang : Building the Polar Information Commons Cloud (on a Shoestring)

And I couldn't go without mentioning the ATSIDA poster session - a really good example of archiving cultural artefacts and a project I'm deeply envious of and wish that I'd done when I was with AIATSIS ....

Friday 5 November 2010

What my Sighelm obsession has taught me

Regular readers will know that I have recently become mildly obsessed with the question of whether an Anglo-Saxon cleric went to India in the late ninth century.

Now while undoubtedly geeky, it's also been extremely valuable as a demonstrator of what can be done purely from the desk.

Almost all the research was done with wikipedia, search engines, online texts, and Google Books, often using it to follow up on wikipedia articles to check references.

I did check my paper copy of the Anglo-Saxon Chronicle for the text of the entries for 883, and searched my copy of Debby Banham's Food and Drink in Anglo-Saxon England for information on the use of pepper, although using Google Books to search Katherine Beckett's book turned out to be more useful. I also bought myself a copy of the Penguin Classics edition of Asser's biography of Alfred as I couldn't find a decent text online.

Otherwise everything else was done with online search. One interesting thing is that Google doesn't always turn up the best answers when searching for obscure items - Bing and Yahoo are sometimes more useful.

The other thing of course is the use of a wiki page to structure, edit and re-edit the document. Google Docs or indeed any competent word processor would have been able to do the task, but being able to have a lightweight living draft open when searching was useful.

Professionally it's been valuable too - it's given me an insight into how easy desk-based research in the humanities is, and how powerful and valuable online resources are.

It used to be a joke that the cheapest subject to support in a university was Maths, as all they needed was chalk, a blackboard and a supply of coffee.

Well, I think I have demonstrated to myself that some humanities research requires little more than a computer and an internet connection. Less flippantly, it's also given me an insight into what a digital humanities workspace, such as that envisaged by Project Bamboo, will have to deliver, and how it will have to provide easy mechanisms for bringing in new and existing resources.

And Sighelm? Do I think he went to India?

I think he probably did, but whether one Sighelm went or two Sighelms went is a different question.

Wednesday 3 November 2010

Sighelm

Yesterday's discussion of whether an Anglo-Saxon bishop went to India so piqued my curiosity that I wasted way too much time trying to track down the evidence, or the lack of it.

I've put together a wiki page on what I've managed to glean - which is not much more than we started out with.

As an aside I've learned a lot more about using Google Books, in combination with various book search sites, to track down online sources, which is kind of useful ...

Tuesday 2 November 2010

Scholarship in the age of the internet

Tenthmedieval and I have been having a little discussion about my recent post 'Evidence of Connection' and the intriguing idea that Alfred sent one Sighelm to Kerala - whether he came back with spices, or whether India was a copyist's mistranscription of Iudea. In this, evidence as to whether Sighelm actually came back with pepper seems crucial, given Kerala's role in the spice trade.

Now, I'm a geek, and couldn't resist googling Sighelm to see what I came up with. The earliest online reference I could find was in Robert Kerr's General History of Voyages from the early nineteenth century.

This reads:

Voyage of Sighelm and Athelstan to India, in the reign of Alfred King of England, in 883.

Though containing no important information, it were unpardonable in an English collection of voyages and travels, to omit the scanty notice which remains on record, respecting a voyage by two Englishmen to India, at so early a period. All that is said of this singular incident in the Saxon Chronicle, is, " In the year 883, Alfred sent Sighelm and Athelstan to Rome, and likewise to the shrine of Saints Thomas and Bartholomew, in India, with the alms which he had vowed." [Bartholomew was the messenger of Christ in India, the extremity of the whole earth.]—The words printed in Italics are added in translating, by the present editor, to complete the obvious sense. Those within brackets, are contained in one MS. Codex of the Saxon Chronicle, in addition to what was considered the most authentic text by Bishop Gibson, and are obviously a note or commentary, afterwards adopted into the text in transcription.

This short, yet clear declaration, of the actual voyage, has been extended by succeeding writers, who attribute the whole merit to Sighelm, omitting all mention of Athelstan, his coadjutor in the holy mission. The first member of the subsequent paraphrase of the Saxon Chronicle, by Harris, though unauthorized, is yet necessarily true, as Alfred could not have sent messengers to a shrine, of which he did not know the existence. For the success of the voyage, the safe return, the promotion of Sighelm, and his bequest, the original record gives no authority, although that is the obvious foundation of the story, to which Asserius has no allusion in his life of Alfred.

" In the year 883, Alfred, King of England, hearing that there existed a Christian church in the Indies, dedicated to the memory of St Thomas and St Bartholomew, dispatched one Sighelm, or Sithelm, a favourite ecclesiastic of his court, to carry his royal alms to that distant shrine. Sighelm successfully executed the honourable commission with which he had been entrusted, and returned in safety into England.

After his return, he was promoted to the bishoprick of Sherburn, or Shireburn, in Dorsetshire; and it is recorded, that he left at his decease, in the treasury of that church, sundry spices and jewels, which he had brought with him from the Indies."

Of this voyage, William of Malmsbury makes twice mention; once in the fourth chapter of his second book, De Gestis Regum Anglorum; and secondly, in the second book of his work entitled De Gestis Pontificum Anglorum, in the chapter devoted to the Bishops of Shireburn, Salisbury, and Winchester; both of which are here added, although the only authority for the story is contained in what has been already given from the Saxon Chronicle.

" King Alfred being addicted to giving of alms, confirmed the privileges which his father had granted to the churches, and sent many gifts beyond seas, to Rome, and to St Thomas in India. His messenger in this business was Sighelm, bishop of Sherburn, who, with great prosperity, which i9 much to be wondered at in this age, penetrated into India ; whence he brought on his return, splendid exotic gems, and aromatic liquors, of which the soil of that region is prolific."

" Sighelm having gone beyond seas, charged with alms from the king, even penetrated, with wonderful prosperity, to Saint Thomas in India, a thing much to be admired in this age; and brought thence, on his return, certain foreign kinds of precious stones which abound in that region ; some of which are yet to be seen in the monuments of his church."

In the foregoing accounts of the voyage of Sighelm, from the first notice in the Saxon Chronicle, through the additions of Malmsbury, and the amplified paraphrase by Harris, we have an instance of the manner in which ingenious men permit themselves to blend their own imaginations with original record, superadding utterly groundless circumstances, and fancied conceptions, to the plain historical facts. Thus a motley rhetorical tissue of real incident and downright fable is imposed upon the world, which each successive author continually improves into deeper falsehood. We have here likewise an instance of the way in which ancient manuscripts, first illustrated by commentaries, became interpolated, by successive transcribers adopting those illustrations into the text;

and how many fabricators of story, first misled by these additaments, and afterwards misleading the public through a vain desire of producing a morsel of eloquence, although continually quoting original and contemporary authorities, have acquired the undeserved fame of excellent historians, while a multitude of the incidents, which they relate, have no foundations whatever in the truth of record. He only, who has diligently and faithfully laboured through original records, and contemporary writers, honestly endeavouring to compose the authentic history of an interesting period, and has carefully compared, in his progress, the flippant worse than inaccuracies of writers he has been taught to consider as masterly historians, can form an adequate estimate of the enormity and frequency of this tendency to romance. The immediate subject of these observations is slight and trivial; but the evil itself is wide-spread and important, and deserves severe reprehension, as many portions of our national history have been strangely disfigured by such indefensible practice"

So Kerr had doubts, but he also gives us clues as to where the embellishments may have come from.

Fascinating though this is, I actually want to make a serious point - I found all this out in three minutes with a Google search. It's not scholarship, but it shows how powerful and useful having resources online and instantly searchable is. It also shows that time spent tracking down resources is really no longer part of scholarship, but that knowing, understanding and analysing things (still) is.

So, desk based research is possible, even easy. What is interesting is that while the mechanics of e-research are simpler, the processes of scholarship remain the same.

This presentation from last month's Educause gives one view - one in which well-regarded blogging and publication in online journals are seen as being as important as print journals, ie one in which we are still talking implicitly about peer review, or more accurately about being able to demonstrate that one's work is held in reasonable regard by other scholars working in the field.

Monday 1 November 2010

Ubuntu 10.10 on a real machine.

Building on my trouble-free installation of Ubuntu 10.10 on a VM, I thought I'd try it on a real machine - a very standard Dell Latitude E3400 laptop which had previously had XP installed on it.

It almost 'just worked' - the automatic partitioner tried to preserve the XP installation but had difficulty doing so; I suspect the presence of a hidden recovery partition caused it a problem. But it failed nicely, offering me the choice of partitioning manually or just blasting everything away and starting over.

Other than that everything just worked, including wireless, and after 20 minutes I had a working Linux laptop - quietly impressive ...

Evidence of connection

Tenthmedieval this morning has a rather wonderful piece on some of the less thoughtful coverage of the discovery of a rather corroded Chinese coin in East Africa.

I think it's fair to say that the areas that are now Somalia, Kenya and so on have been connected for as long as people have sailed ships - there's evidence of Hellenistic and Roman contacts, to say the least. And these trade networks persist across political and religious changes because they are useful. So it's not surprising that the Chinese followed existing trade routes and reached East Africa, in just the same way that Chinese merchants followed the sea cucumber trade and probably ended up in Arnhem Land - certainly the rock art shows that their Makassan trading partners from what is now Sulawesi did.

And if we were to find a record of a kangaroo in China it might be surprising, but not inexplicable.

I increasingly find the connections between cultures fascinating - and why certain trade routes developed the way they did, often due to geography: ocean currents, mountain ranges, the availability of resources and so on.

There is a tendency in western society to think that we discovered these cultures, and that they occupied separate little compartments.

They didn't. The Assyrians traded with India via Dubai and Bahrain. The Greeks went to Afghanistan on the back of Alexander's conquest of Persia and carved some rather nice portraits of the Buddha in very Greek-looking robes. Chinese traders and merchants expanded over large parts of south east Asia. And of course people met and traded. It's why the Staffordshire hoard contains jewels originating from India - not that an Anglo-Saxon warrior went to India (although one might have); more likely they were traded via Byzantium and Dubai, or via Somalia and Egypt.

Renaissance Europe 'discovered' South East Asia and Africa due to trying to cut out the middlemen in the spice trade, and in the course of doing that came across a range of societies previously unknown to them. These societies were of course not unknown to each other.

And this is different from the situation in the Americas where the Amerindian civilisations developed independently, or Australia, which while known to Indonesian fishermen, was viewed as being hostile and valueless. If one of these fishermen had known there were opals out there in the desert, history might have been different

Friday 29 October 2010

Resourcing academic computing

Have just been to a rather interesting presentation on the ARCS data fabric.

The ARCS data fabric is in essence an initiative to build a shared storage cloud in Australia for use by researchers, and to combine this with a grid-based execution environment.

I have previously written about how people tend to assemble their own toolkits of resources, and how this has some odd effects, such as wikidot becoming a wiki provider of last resort in academia. It's also the case that any toolkit of resources should include some offsite storage for key documents, if for no other reason than that hard drives break and computers get stolen. And often an extra benefit of remote storage is a permissions model, meaning that you can share some of the content of your remote store with friends and colleagues, share some with the world, and keep some private - in effect owner:group:world. One example of such a service is Windows Live's SkyDrive, which is bundled with Windows Live accounts and free at the point of delivery.

Now, the interesting thing about the ARCS data fabric is that it uses WebDAV, provides very similar sharing functionality to Windows Live SkyDrive (with the same amount of storage by default), and is currently free at the point of delivery.

Unlike the ARCS service, SkyDrive does not come with a desktop connector, but Gladinet will sell you one that links to a whole range of storage providers, including WebDAV hosts and Windows Live, and allows cross-mounts - which, for example, lets you move content from Google Docs to Windows Live.
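And because WebDAV is really just HTTP with a few extra verbs, the plumbing is pretty minimal. A rough Python sketch of pushing a file to and pulling it back from a WebDAV store - the server URL and credentials are placeholders, and real services will differ in their authentication details:

    # Sketch: basic put/get/list against a WebDAV store. The endpoint and
    # credentials are placeholders - ARCS or SkyDrive specifics will differ.
    import requests

    BASE = 'https://dav.example.edu/mystore'   # hypothetical endpoint
    AUTH = ('username', 'password')

    # Upload - WebDAV uses a plain HTTP PUT for this
    with open('notes.txt', 'rb') as f:
        r = requests.put(BASE + '/notes.txt', data=f, auth=AUTH)
        r.raise_for_status()

    # Download again with an ordinary GET
    r = requests.get(BASE + '/notes.txt', auth=AUTH)
    r.raise_for_status()
    print(r.content.decode())

    # List the collection with PROPFIND, one of WebDAV's own verbs
    r = requests.request('PROPFIND', BASE + '/', auth=AUTH,
                         headers={'Depth': '1'})
    print(r.status_code)   # 207 Multi-Status if the server is happy

Which is exactly why desktop connectors are cheap to build: anything that can speak HTTP can treat the store as a filesystem of sorts.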

One of the great debates we always have in university computing is whether to outsource email, and if we do, whether to choose Microsoft or Google.

Ignoring theology, the differentiator has always been that Google provides better tools in the form of Google Apps, including shared document editing, while Microsoft provides better storage in the form of SkyDrive. It's also true to say that Microsoft has been more Windows-oriented, but with the new Office web applications they are becoming much more agnostic about such things, and their web apps are now as functional and environment-agnostic as Google's.

So, purely for the sake of argument, let us say we outsource email to Microsoft, and use the money we save to licence an appropriate connector to mount SkyDrive on the desktop. Now, we know we don't save a lot of money by getting rid of student email, as students increasingly self-outsource and use a webmail service anyway. And if we don't also outsource staff email we end up having to provide almost as much infrastructure as before.

So there is no great win in outsourcing student email alone. But remember that going to Microsoft gets you SkyDrive. And this opens up a radical opportunity, possibly too radical: stop providing student filestore altogether and tell students to use SkyDrive.

Now, we do save money by not providing student filestore, provided performance is at least as good as the existing student filestore and we trust bigM not to lose any data. Student data lives in the cloud and, ideally, is accessible from any location on any platform, via a browser at a minimum.

The endpoint of this is that we end up providing a service very much like the ARCS data fabric to students - and, by implication, to all members of the university - and even these days 25GB is a reasonable amount of online storage.

So, in this scenario, do we see Microsoft as a suitable provider of storage for researchers' work in progress, and do we then see initiatives such as the ARCS data fabric turned into providers of specialist storage, either for large datasets or to get around legal and jurisdictional problems with sensitive data?

And does that mean we increasingly see a landscape of outsourced filestore?

Thursday 28 October 2010

Microsoft rolls over

Like a lot of people in the technology game I have multiple email addresses and blog sites. One of these is a Microsoft Live site, which I use primarily for email from those people who never update their address books, along with a SkyDrive account which I use primarily as always-on filestore, to access content from wherever I happen to be.

Bundled in with this was a service called Spaces, which included a blog offering - one which, I must say, was not as slick as WordPress or Blogger, and which I hardly ever used.

But I might do in the future - I've just received an email from Microsoft that reads in part:

"Important changes are coming to your Spaces account that affect you and will require you to choose an option that is right for you. We are very excited to announce our collaboration with a premier and innovative blogging service, WordPress.com, to offer you an upgraded blogging experience. We'll help you migrate your current Windows Live Spaces blog to WordPress.com or you can download it to save for later. You should know that On 16th March 2011 your current space will close."

Which is kind of interesting. You could read it as Microsoft saying "we're having difficulty getting uptake with this blog thing, so we're going to can our product and outsource it".

And of course the two biggest providers are Wordpress and Google, which kind of leaves them with only one place to go.

Anyway, I've migrated my Microsoft blog, such as it is, and you can find it at http://moncurdg.wordpress.com/. I wouldn't hurry though...

p-books versus e-books

Last night, for the first time since I fell in love with my e-reader ten weeks or so ago, I started reading a paper book, ie a traditional paperback - in fact a reprint of J R Ackerley's Hindoo Holiday.

And despite being a new edition, it is a reprint of an earlier version - when one looks at the printing it has clearly been photoset from an earlier printed version and photographically enlarged to fit the newer 190x130mm paperback format rather than the original 177x107mm.

What is very noticeable after using an e-reader are the minor printing defects, eg occasional broken ligatures and other imperfections introduced by the printing process. The text is slightly less contrasty and slightly easier on the eye, but at the same time slightly more difficult to read in low light.

But the thing that is most noticeable is how much more comfortable a proper book is to read, purely due to not having to hold the e-reader rigidly in one hand in order to press the iPod-like next page button.

This may of course be easier on other models of e-reader, and a redesign of the next page button, say as a squeezable edge, might solve the problem.

I'm also being unfair - having been an avid reader since the age of eight I have almost half a century's experience of reading traditional books and ten weeks with one particular model of e-reader, and in time one would undoubtedly adapt - try going back to driving a car without power steering, or using a manual typewriter to discover how much one's technique changes over the years.

That said, I'm still more than happy to read books on my e-reader and have a pile of public domain books I want to read. I've also got a pile of paper books to read. It'll be interesting to see how my reading habits change over the next year to eighteen months ...

University Cuts

I've been studiously avoiding commenting on the impact of the UK's Comprehensive Spending Review on university funding, particularly on the Arts and Humanities.

Despite spending eight years studying in one way or another, and twenty years working, in UK universities (including a stint as an AUT branch secretary), I simply do not feel that, after seven years in Australia and being only an occasional visitor to the UK, I have the right to comment. It's like going back to somewhere you used to live - there are inevitably changes, some better, some worse, and some just plain confusing.

However, one thing I do feel strongly about is the move to a 'user pays' model - and Australia is probably closer to that than the UK - which results in a set of imbalances in the system as students start to boycott 'difficult' subjects and subjects seen as unlikely to enhance employment prospects.

And so the hard sciences and the arts courses wither away and we have the rise of business studies and the like. Now, despite my occasional rants on the subject I'm not against business degrees per se. When I first started having to look after projects and purchasing contracts I could definitely have used a whole range of business skills, in budgeting, contract management, contract law, project planning and the like.

I do not, however, feel that Business Studies is a stand-alone subject. Like a number of IT courses, it is an applied, enabling subject - one that allows you to be more effective, not more original or innovative.

Such courses do not give you the depth that comes from studying a complex subject, be it molecular biology or the history and archaeology of the near east.

Studying complex and difficult subjects, where there are no right answers, teaches you to think, assimilate often contradictory and complex information, analyse, present, argue and the rest of it.

Of course, in a user pays environment there is also an expectation that students will get a decent degree at the end of it, and with the expansion of higher education, there are increasing numbers of less able students. The result is of course grade inflation and a drift towards safe and easy subjects.

Elitist? Yes. To continue to develop and innovate societies need to produce thinkers, movers and shakers, and to do that we need to get the best out of people, and to do that we need an environment where people can be stretched and taken in different directions. And to produce that takes money, and a tolerance of apparently useless subjects. Not for what they do, but for what they provide in the way of stimulation ...

Monday 18 October 2010

ipads and excavation

I recently tweeted a link about the use of iPads at Pompeii.

As a veteran of many discussions with archaeologists about what sort of machine works best down a wet hole, the answer always used to be something cheap and disposable, preferably with decent battery life. Linguists doing fieldwork in the NT and PNG have similar problems, as do botanists, anthropologists and the like, but archaeologists always seemed both to be first out of the gate and to come up with the most extreme environments for data capture and recording.

Until digital technology became all pervasive other disciplines tended to stick with analog technologies as they tended to be just that little bit more robust.

Prior to the netbook revolution, the answer to what computer works down a muddy hole always seemed to be second-hand Thinkpads or MacBooks, or, if the budget would stretch to it, Panasonic Toughbooks.

Post the netbook revolution, cheap machines with SSDs seemed to be the way to go, even if they were still prone to damp, dirt and dust getting sucked in. The iPad seems a logical evolution - a touch screen means less risk of keyboards getting clogged, and the sealed design of the iPad, with few if any openings, helps guard against damp and dirt sneaking in.

What would be interesting to know is the attrition rate of netbooks against iPads.

For example, here in Australia an iPad costs a little over $600 and a (non-SSD) Samsung netbook a little under $400. Which basically means you can wreck three netbooks (roughly $1200) for every two iPads (also roughly $1200) you lose.

And of course the netbook is a general purpose computer (which means it can be used for things other than data capture and data entry), and connecting USB devices and printers is a damn sight easier than with an iPad.

And tablets for data entry in hospitals have been around for years; while expensive, these really are rugged and proof against fluids and the like.

So, gee whizzery aside, does the iPad provide a cost effective alternative to data entry for the field sciences, and by implication to the classic field notebook?

Ubuntu 10.10

Last thing Friday I built myself an Ubuntu 10.10 VM on top of VirtualBox running on a Mac.

It just worked - OK, I had to add a couple of personal favourites like KWrite and AbiWord, but they installed neatly and updated the menus correctly. No insertion of fingers in ears or dancing round rowan trees required. Basically a Windows 7 or OS X style experience.

The next stage would, obviously, be to build a real machine, but so far everything looks very good. Nothing to carp about at all ....

Friday 15 October 2010

Docs.com - a first look

I've just had a very quick first look at Docs.com, Microsoft's Facebook authenticated competitor to Google Docs. Basically:

  1. Authentication is via Facebook. There appears to be no obvious way to link docs.com to an existing Windows Live account
  2. Editing feels as responsive as Google Docs, and perhaps a little better than Zoho
  3. Documents can be printed on a local printer
  4. Documents can only be shared directly with existing Facebook friends, but each document has a public url that can be given to other people to let them access it, provided they have a Facebook account. I haven't tested what happens otherwise - my test document is at http://docs.com/8O5U.
  5. Documents can be downloaded and opened locally with Office in a single-click operation. There do not appear to be any other export/download options
  6. The interface is Office 2010-like, with a tabbed structure
Would I use it?

Probably not - personally I'm happy with Google Docs for my lightweight wordprocessing and spreadsheet needs. But given that some of Facebook's 500 million users use Facebook to the exclusion of other services, it's an interesting and useful addition to the Facebook ecology.

I imagine we'll start to see documents created in Docs being submitted as part of student assignments etc in due course ...

Thursday 14 October 2010

Source Code and data archiving

Interesting article (pdf, doi:10.1145/1831407.1831415) in this month's Communications of the ACM on whether scientists should release their source code along with their experimental data for review.

It's my view that they should - large experiments in disciplines such as genomics, astronomy and physics often produce terabytes of data, unmanageable by standard processing techniques, which means that data is often filtered at the instrument level, sometimes by custom-built FPGAs.

And this very simply means that there is a risk of introducing artefacts through errors in the gate array code - the risk of producing chimeras, ie results that aren't actually there, the digital equivalent of cold fusion.

This risk exists in any discipline where data is preprocessed, and here the source code is simply part of the experimental method and hence should be open to review. The same applies to the code used to process the results. Errors can creep in, and not necessarily due to coding errors on the part of the people carrying out the analysis: both the Pentium floating point bug and the VAX G_FLOAT microcode bug could have introduced errors. (In fact the latter was noticed precisely because running the same code on a VAX 8650 gave different results to running it on an 8250.)

And this introduces a whole new problem for archiving:

If we archive the source code can we be sure that it will run identically when recompiled and run under a different operating system and different compiler?

It should, but experience tells us that this won't always be the case. And emulation, while it helps, is probably only part of the answer.
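As a toy illustration of just how fragile bit-for-bit reproducibility is, here is the same arithmetic done in two different evaluation orders in Python - exactly the kind of reordering an optimising compiler is free to make:

    # Toy illustration: floating point addition is not associative, so the
    # order of evaluation - something a compiler may change silently -
    # alters the result.
    a = (1.0 + 1e100) - 1e100   # the 1.0 is swallowed by 1e100: gives 0.0
    b = 1.0 + (1e100 - 1e100)   # the big terms cancel first: gives 1.0
    print(a, b, a == b)         # 0.0 1.0 False

Scale that effect up over a few billion operations in an analysis pipeline and 'recompile and rerun' starts to look rather less like 'reproduce'.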