Monday 30 January 2012

Number Crunching Historians

Only three projects left to cover in my retrospective look at the papers I have published, now that they are all going up on UCL's open access repository. This time, it's time to go back, way back, to 2005, when e-science and cyberinfrastructure were all the rage.

A call came out from the AHRC, looking for workshops on this topic - how can we use e-science technologies in the Arts and Humanities? Now, UCL have one of the best Research Computing facilities in the world, so the question was, how could we apply this facility to humanities research? The biggest data set in the Humanities at the time (that I could think of when writing the grant) was the historical Census data, held at The National Archives, which was digitised by the commercial genealogy firm, Ancestry. We formed a research collaboration to hold three different workshops to look at how useful, possible, or feasible it would be to analyse the historical census data using the high performance computing facilities at UCL.

I have to say that this is one of the most intellectually stimulating projects I have worked on since completing my doctorate, as we grappled with the academic, technical, managerial, and legal issues when attempting to apply HPC and scientific methods to historical data sets. We brought together disparate expertise on history, records management, genealogy, computing science, information studies, and humanities computing, to ascertain how useful or feasible it would be to set up a pilot project, applying e-science methods to the dataset.

However, whereas scientific data tends to be large scale, homogeneous, numeric, and generated (or collected/sampled) automatically, humanities data has a tendency to be fuzzy, small scale, heterogeneous, of varying quality, and transcribed by human researchers, making humanities data difficult (and different) to deal with computationally. The conclusion of the series was that there was neither the quantity nor the quality of information available to allow useful and usable results to be generated, checked, and assessed. Automatic record linkage was the main thing on the wish list from the historians, but this was impossible given the gaps in the historical information. The problems were not technical (we could mount everything on the system and run matches), but methodological (because of inherent issues with census data, the results of any analysis would be problematic).
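To make the record linkage problem a bit more concrete, here is a minimal sketch in Python. This is not what we ran in the workshops: the records are invented, the field names (`name`, `birth_year`, `birthplace`), the two-year tolerance and the 0.85 threshold are all assumptions for illustration, and the string comparison is just the standard library's `difflib`, standing in for whatever a real linkage system would use.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 string similarity, or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(rec_a, rec_b, fields=("name", "birthplace")):
    """Average similarity over the fields present in BOTH records.

    Returns (score, number_of_fields_compared). Missing fields are skipped,
    so a sparse record can score highly on very little evidence.
    """
    scores = []
    for field in fields:
        s = similarity(rec_a.get(field), rec_b.get(field))
        if s is not None:
            scores.append(s)
    # Assumption: stated ages drift between censuses, so allow +/- 2 years.
    year_a, year_b = rec_a.get("birth_year"), rec_b.get("birth_year")
    if year_a is not None and year_b is not None:
        scores.append(1.0 if abs(year_a - year_b) <= 2 else 0.0)
    if not scores:
        return 0.0, 0
    return sum(scores) / len(scores), len(scores)


def candidates(record, later_census, threshold=0.85):
    """List every record in a later census that scores above the threshold."""
    found = []
    for other in later_census:
        score, compared = link_score(record, other)
        if score >= threshold:
            found.append((other, score, compared))
    return found


# Invented example records, not real census entries.
person_1881 = {"name": "Mary Smith", "birth_year": 1860, "birthplace": "Leeds"}
census_1891 = [
    {"name": "Mary Smith", "birth_year": 1861, "birthplace": None},
    {"name": "Mary Smyth", "birth_year": 1859, "birthplace": "Leeds"},
    {"name": "John Brown", "birth_year": 1860, "birthplace": "Leeds"},
]

matches = candidates(person_1881, census_1891)
for other, score, compared in matches:
    print(f"{other['name']}: score {score:.2f} on {compared} field(s)")
```

Running the matches is the easy bit. The snag is that both Marys come back as plausible candidates, one of them on only two fields because the birthplace is missing, and nothing in the data tells you which (if either) is the right person. Scale that up to a whole census and you have the methodological problem in a nutshell.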

Some things to say about this: It would be worth revisiting, soon, the historical data that is available. Crowdsourcing wasn't a terribly well adopted technique at the time - the FreeBMD had just started, for example, transcribing the Civil Registration index of births, marriages and deaths for England and Wales. Since then, there has been a huge uptake and interest in contributing to these resources - what historical data sets exist nowadays that didn't then? What can we use to do useful research, and what of that research can we automate across a large scale?

I still plan on doing something, at some point, with Research Computing at UCL. I'm signed up to their next training course so I can get retrained on how to mount data on the system. I still think we need to think carefully about how and why we need to use this level of computing on humanities data - but if there is anyone out there with a huge data set that needs some number-crunching, you know where to find me if you fancy talking collaboration...

The grant was put in in 2005, the research was done in 2006, the paper accepted in 2007, but even for an online journal there was a bit of a time lag and it didn't come out till 2009 - just goes to show you that online journals don't always publish quicker than print. (I'm one of the general editors of the journal in question, I should say, so it's as much my fault as anyone else's).

So here's the paper. It's one of the ones I'm most proud of, even if the result of the workshop series was "it's never gonna work!":
Terras, M (2009). The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (ReACH) Project. Digital Humanities Quarterly, 3 (4). PDF.

Who ya gonna call? DH Winebuyers

Claire Warwick has blogged about the, erm, administrative snafu that meant no wine appeared after her inaugural. So it's probably ok to post this pic of the team undertaking the wine run to the nearest supermarket (right to left): Me, Simon Mahony, Andy Hudson-Smith and Steven Gray, with Tim Weyrich on camera duty plus wine carrying duty. Also thanks to the Dean, who dealt with payment, and with choosing the vintage of the red.

We like to think no-one noticed. Until we told them. And tweeted. And blogged...

Thursday 26 January 2012

Congratulations, Claire Ross!

Image (c) Matt Clayton/UCL Grant Museum. The QRator project in place - the brain-child of Claire Ross. The skulls are not Claire's, you'll be relieved to hear.

I'm terribly proud and excited to share the news that one of my PhD students, Claire Ross, has won the student category of the UCL Provost’s Awards for Public Engagement.

As Hilary Jackson (UCL Public Engagement Unit) explains:
Claire was nominated for engaging museum visitors with collections at UCL and beyond, using innovative, digital methods and social media applications. The selection panel loved the fact that this subject is plainly not just Claire’s PhD, but her passion. What’s great is that Claire’s work, alongside colleagues on the QRator project (amongst others), has enabled the public to influence what’s going on in UCL’s museums and the university more widely.
It's important to note that this award is competed for by the whole student body currently at UCL - around 18,000 students. It's a real honour for Claire to win, and shows the fantastic work she has been doing in her research, but also for UCL Museums, and UCLDH. Well done Claire, I'm proud to be your supervisor!

Monday 23 January 2012

On that there infographic: some critical discussion

The reception to the infographic I put together on Quantifying Digital Humanities has been very positive. In the first 12 hours of it being online, 2600 people had viewed it. At time of writing, 3665 people have looked at it. Which amuses me - many more people have viewed it than are self-confessed members of the "Digital Humanities" community.

I've had many notes saying "this is great", some suggestions for other sources of information, particularly financial information from other countries, and the spotting of the one typo that managed somehow to get through our rounds of proofreading. It's all good, keep 'em coming.

There was some chat on the twitters about whether this was up to date. Thatcamp, for example, already has over 2000 subscribers. The Australasian Association for Digital Humanities joined ADHO on the 1st of January, as did Centernet, but these are not reflected in the counts here - I had to draw a line somewhere, and let's give them a chance to get their members all signed up before we start counting them, eh? There was also some discussion about the fact that these figures are.... well... so small? Have I shot my discipline in the foot by pointing out that there aren't actually that many of us, self-identifying as Digital Humanists?

I'm also aware of the things that aren't in there. I feature, for example, Digital Classicist, and Digital Medievalist, and Antiquist. Are these part of "Digital Humanities"? Is "Digital Humanities" just the group of people, and events, allied to ADHO and its respective societies? I'm very aware of Digital Classicist and Antiquist - these are the lists I hang out most on, most aligned to my areas of interest. But in the infographic, where is the Association for Computing and History? Where is the Computers and History of Art association? Where is the Digital Resources in the Humanities and Arts conference? (and if anyone has a better, centralised URL for that, do let me know, they suffered a spectacular failure to renew their domain name). Where is the Association for Computational Linguistics? The list goes on and on. What else would you count as Digital Humanities that didn't make it on?

Why didn't I include these different things on the infographic? Partly because no-one from these organisations or domains came forward to give me stats, as I'm not on their radar. Partly because of my own personal bias, I suppose: my Digital Humanities is going to be different from yours. And partly because I did try to focus on the associations which make up ADHO, and their related initiatives. Or things that call themselves "Digital Humanities".

Which leads me to think, there is a lot of stuff out there which is Digital Humanities (the use of computers and computational techniques within the humanities), but only a subset of that calls itself so, or identifies itself with the community. Which is fine. But we at ADHO may have to work at better reaching out to the constituencies we don't currently reach, if we are to be the voice of DH at any level. Or should we not? Should we just continue to work on the type of activity which self-identifies as Digital Humanities?

There are other gaps in the infographic. The funding section - there is a lot more investment in infrastructure worldwide than featured here. But look at the stacks o' cash already listed. We're so bad, as a community, at showing where the successes are, and the use and usefulness of that investment. I'm almost scared to ask, quite frankly, for usage statistics from the projects mentioned, or outcomes, etc. That is the nature of research, I know, but if I am presenting piles of cash as a good thing - look! see! DH exists as it has had money thrown at it! - then surely we should be better versed in saying what the results of all this were?

I started the timeline from 2000, as I had various stats beginning there, but as we all know (we all know, don't we?) DH has a much longer gestation than that. Something I need to mull over.

Overall, it's been great fun putting together this infographic on Digital Humanities. It is, however, an infographic on DH, not the infographic on DH. What would your mapping of DH look like? Are you someone who does "Digital Humanities", or someone who does Digital Humanities? Does this even matter?

Enough navel gazing. Time to get to work.

Friday 20 January 2012

Infographic: Quantifying Digital Humanities

You may remember I gathered some stats about Digital Humanities. Well, I turned them into an infographic, which is available in full technicolour and much higher res than blogger will allow, over at the UCLDH Flickr account.

Wait! You want a print version? Well, find a 300dpi CMYK version here.

I have to say, when I started this, I thought that there must be some software out there that turns statistics into these types of posters pretty easily - there are so many of these about, that have this look about them. I thought it would be a fun thing to do in class - and I've dabbled a fair bit in Photoshop and the like - but it turns out that everyone is hiring graphic designers to turn their stats into posters that look "like infographics". So what the hey, we hired a graphic design firm that specialises in infographics. As a result, this is courtesy of UCL Centre for Digital Humanities, as they paid for the graphic design.

We at UCLDH are going to get some printed up to stick on walls - more about that soon, hopefully, once we figure out the costs.

Just a few words on the process. This was an inclusive, not an exclusive, attempt at trying to pull together available statistics on Digital Humanities. I'm aware there are a lot of things that don't appear on the infographic - major individual projects, for example. But it was the best that I could do, with the information available. I'm still collecting statistics, and interested in anything else that comes to light - I need to dig out the subscription numbers for LLC in the early 2000s, for example - but if you are not represented here, and are narked about it, or would be pleased to be included in any future iterations, let me know. Depending on reception, we may do an updated version of this.

Additionally, I'd love to hear your comments and suggestions on other things we can do in this vein to scope out and promote our field. It's been fun to put together - even if snow in Seattle stopped play for a week or so in the round of final edits with the designers - and after I've done some serious academic work, like write books and stuff, I plan on doing some more of these.

Hope you like it!

Thursday 12 January 2012

Digging Your Scene, Vera

My next couple of papers come from the VERA project: Virtual Environments for Research in Archaeology. VERA was based at the still ongoing dig at Roman Silchester, managed by the Archaeology Department at the University of Reading. The project aimed "to produce a fully-fledged virtual research environment for the archaeological community". Basically, we took the Integrated Archaeological Database system into the trench, by putting wifi over the dig and experimenting with people uploading information right into the database (instead of capturing it with pencil and paper in the usual archaeological way, for digitisation after the dig ended) using Nokia 800s and digital pens. This was in 2007/8.

The project was funded by JISC as part of phase 2 of the Virtual Research Environments programme, and the reason for us at UCL getting involved was their doing: they specified that in this round of the programme, there should be user studies and user testing embedded into the project. Shortly after the call came out, I was chatting with Professor Mike Fulford, who manages the dig, at a conference, and he mentioned something about this and said something like, you don't know any people who do user studies do you? Why yes, yes we do, I said, and UCL promptly signed up to do this. It's the kind of thing that goes to show why you should bother networking...

It was great fun to be involved, and we visited the dig on various occasions, having a researcher on site to look at how users could use tech in the trench. The British weather was a bit of an issue, there were all kinds of shenanigans involved in keeping live wifi over the dig, and the technology was met with both excitement, and Not On My Land opinions. A picture I took on the site, above, even made it all the way to, and got featured in, the Guardian newspaper. What larks.

Since then, of course, the market for handheld technology has changed dramatically - there was no iPhone at the start of this project when we were making decisions about tech, for example. I do also wonder if it is easier now to provide wifi over an external area - you would hope - although British Summertime weather is probably never going to be conducive to having hand-held tech in the trench (we did wonder if we should buy everyone waterproof sombreros, to keep screens in the shade, or to keep the rain off....)

We've also got another stack of papers on this that we simply haven't had the time to finish up, so there may be more to come, although time is ticking on that one, as the tech we are talking about is rapidly changing, and 2008 to 2012 is a long time, in technology circles. The dig at Silchester still continues. I bumped into Mike at a service station half way up the M6 a few months ago, and we said we'd keep our ears peeled for future funding calls to work together again...

And here are the published papers:
Warwick, C., Fisher, C., Terras, M., Baker, M., Clarke, A., Fulford, M., Grove, M., O'Riordan, E. and Rains, M. (2009) iTrench: a study of user reactions to the use of information technology in field archaeology. LLC, 24 (2). PDF.

Terras, M., Warwick, CLH and Fisher, C (2010) Integrating New Technologies into Established Systems: a case study from Roman Silchester. In: Proceedings of the 37th Annual International Conference on Computer Applications and Quantitative Methods in Archaeology (CAA) March 22-26 2009, Williamsburg, Virginia, US. CAA: The Netherlands. PDF.

Friday 6 January 2012

Twitters and research paths

Happy New Year, folks!

Right, let's resume my posting about previous research. I have only 5 more projects to write about (well, that had papers emanating from them, I'll do book chapters after that) and I'm going to do it in reverse chronological order.

Sometimes when you get a new research assistant, it's good to give them a really defined task to get their teeth into, and to also see how they work - can they write? can they do self directed research? What are their strengths, and what's the most useful support you can give them? We hit the jackpot when we employed Claire Ross, to work on the Linksphere project, which was a joint project with Reading University to create a unified system that would provide a single virtual interface for searching across all the repositories and collections at Reading. We were asked to give advice about use and users of museum online catalogues and museum related social media, and the platform they were developing at Reading. Whilst the folks at Reading were starting to program up the interface to be tested, we set Claire a task: analyse the twitter feeds from the various DH conferences held in 2009 (this was at the end of 2009) and see what we can say about the use of twitter. At the time, there were relatively few methodological investigations into how twitter was being used, and what we came up with was really interesting:
Ross, C and Terras, M and Warwick, C and Welsh, A (2011) Enabled backchannel: conference Twitter use by digital humanists. J DOC, 67 (2), 214-237. PDF.
The Linksphere project itself proved to be problematic - the linking of different collections isn't as easy as it sounds, and the programming team will hopefully be publishing more about the difficulties that they faced in cross-collection searching, etc. Never mind, though - we had Claire for a year, and after this study she did some excellent work with the British Museum, looking at use of their Collection Online, and also set up the QRator project, at the Grant Museum, which is looking at how people can add their own interpretation to museum objects, as well as the research necessary on the Linksphere interface. Claire is still with us, now doing PhD research - more about that soon.

The twitter paper got a lot of interest at the time (it was up in pre-publish much sooner than the print version) as it was one of the first to look at the methods for studying tweet streams, although there are now hundreds of papers out there that have looked at the methodologies since, and developed and refined them in ways we couldn't do all the way back in 2009. It does seem like an awfully long way away, even though it's less than 3 years ago. And it is interesting to see that when you do get a great research assistant to work with, you can have spin offs and publications from the process that were not originally in the research plan. The joy of the research path: turny, twisty, unexpected.