Friday, 16 December 2011

Father Busa's Last Christmas Card


2011 was a big year for Digital Humanities, but we also saw the loss of Father Roberto Busa (1913-2011), who pioneered the discipline of Humanities Computing by indexing the works of Saint Thomas Aquinas in the Index Thomisticus.

I've been given permission to share with you the image above, and the story below, about Father Busa's last Christmas card. It came through on email last week and made me well up - and I asked permission from Marco Passarotti (on behalf of the CIRCSE research centre) to share it here so it reached the wider Digital Humanities community. Enjoy the last Christmas card from Father Busa.

Dear friends of father Busa,

For many years, our beloved father Roberto Busa entrusted the painter and caricaturist Marina Molino with illustrating his message of Christmas greetings with a caricature representing him along with St. Thomas.

Over recent years, the drawing (each time suggested by father Busa himself) had one recurring element: father Busa climbing a mountain with increasing difficulty, but with undiminished enthusiasm, towards the ultimate goal, where St. Thomas was waiting for him.

On August 9th, father Busa reached that goal, giving, until his last day, ample and clear testimony of peace, grounded in his faith and certainty of eternal life.

The research center CIRCSE - which pursues the work of its founder - has asked Marina Molino to produce the traditional drawing once again. In a moving caricature, the artist has depicted father Busa, finally arrived at the top of the mountain, meeting his St. Thomas. We send you the drawing as a fond memory of father Busa, with our best wishes for a Merry Christmas and Happy New Year to all who have loved him.

Cari saluti,

With best wishes,

(on behalf of CIRCSE)

Marco Passarotti
Head of the ‘Index Thomisticus’ Treebank
Secretary of CIRCSE
Largo Gemelli, 1
20123 – Milan
Italy

Tel. 02-72342380
e-mail marco.passarotti@unicatt.it
CIRCSE: http://centridiricerca.unicatt.it/circse


Marco tells me they are planning to publish a book in 2012 which collates all of Father Busa's Christmas Cards. I look forward to seeing it.

Wednesday, 14 December 2011

Digitisation Studio Setup

I was asked recently about guidelines for setting up a digitisation suite. I'm a bit rusty on the very latest guidelines for this, so turned to the twitters, where a discussion around #digstudio quickly sprang up, mostly co-ordinated by Simon Tanner, who is an expert in digitisation and frequently advises institutions on their digitisation processes. Now, we all know the fragility of twitter feeds, so I thought I would post the main points here (indeed, it was Simon who suggested I post it here). The bullet points below were all made by Simon - thank you!
  • The JISC Digital Media - Still Images: Setting up a Workspace for Digitising Images document is a good overview of how to set up a digitisation suite http://www.jiscdigitalmedia.ac.uk/stillimages/advice/setting-up-a-workspace-for-digitising-images
  • Use standards: comply with the ISO 3664:2000 "Viewing conditions - Graphic technology & photography"
  • Replace fluorescents in the room. CIE Standard Illuminant D65 tubes correspond roughly to mid-day light in Europe
  • LED lighting is the way to go: small, portable, and offering extremely fine control. The top place to get these is here http://www.cdiny.com/LEDproducts.html
  • However, you might have price issues! Simon suggests procuring a standard LED. There is a good guide here: http://www.diyphotography.net/studio-at-home-introduction-to-led-lighting
  • Complying with the ISO 3664:2000 "Viewing conditions - Graphic technology & photography" standard also allows for good colour management
  • The colour of the workspace should always be NEUTRAL grey - no tones or colours, so it doesn't screw with the camera's white point settings
  • AVOID post-it notes & other colourful things round the working areas - they distract the eye, and mess with calibration
  • The same goes for clothes - avoid overly colourful clothes & shiny things for the operators.
  • Control light: think about whether you can exclude daylight/control light (with curtaining, for instance) for a stable lighting environment
  • Remember: think lots of space & look after the operator!
  • Simon's paper on the Dead Sea Scrolls includes content on the studio: http://bit.ly/vvLW5u (PDF)

@fletcherdurant pointed out that the Federal Agencies Digitization Guidelines Initiative's main set of guidelines gives some (if limited) set-up advice on page 5 http://www.digitizationguidelines.gov/guidelines/digitize-technical.html

@ernestopriego pointed out that the New Opportunities Fund has some guidelines http://www.ukoln.ac.uk/nof/support/help/papers/digitisation_process/ - these were written by ... Simon Tanner!



Tuesday, 13 December 2011

Multi-Spectral Connections

One of Alejandro's many test images, looking at the characteristics of multi-spectral document imaging. Used with Permission.

Another interdisciplinary research project I am currently working on is with Adam Gibson, from the Department of Medical Physics and Bioengineering at UCL. Adam and I started at UCL at the same time, and met on the teacher training course that all new staff have to do. At one point he remarked that he did some work in multi-spectral imaging, to which I replied "oh yeah, we look at documents sometimes using that technique" - although I described our usually laissez-faire approach of sticking things under the filtered lens and seeing what you can see, which turns out to differ greatly from the tried and tested, benchmarked multi-spectral methods used in medical physics to, for example, measure blood flow through the body.

Five years later, Adam won a very prestigious EPSRC Challenging Engineering grant on "Intelligent image acquisition and analysis". He got in touch to ask whether we could use a small slice of the money to investigate the potential crossovers between the medical physics methods of multi-spectral analysis and the approaches we use in document imaging analysis. Can the robust, tested techniques used in medical physics be used in document analysis? Can we benchmark the process of multi-spectral imaging of documentary material?

Our PhD student working on this is Alejandro Giacometti. Alejandro has a background in computer science, and came highly recommended from the MA in Humanities Computing under the supervision of Stan Ruecker at the University of Alberta. Alejandro is really well placed to carry out this research, having the technical background as well as appreciating the Humanities angle. Simon Mahony from UCLDH has also joined us on the team, and he brings his knowledge and expertise in Digital Classics.

Alejandro is now almost halfway through his thesis work - and as well as being really eye-opening, this project is turning out to be so much fun. We are now in the phase of starting to test our hypotheses on real-world examples, and building up our practical expertise in multi-spectral document imaging. It also turns out that we have a lot in common with the work of Tim Weyrich from the UCL Department of Computer Science (with whom I also jointly supervise a student), who uses multi-spectral imaging to model skin surfaces, and with that of Stuart Robson from the UCL Department of Civil, Environmental and Geomatic Engineering (with whom I jointly supervise two students, but shall talk about them at some later point), who is interested in a huge variety of image capture techniques for both industry and heritage. We're talking between teams, departments, and disciplines now, and learning a lot from each other, while cooking up plans for future work.

I feel very lucky to work in an institution such as UCL, which has such diverse expertise - but also such interested colleagues, willing to work together across disciplinary boundaries. It's also great to have such an opportunity for PhD study, which could potentially contribute to many fields.

Monday, 12 December 2011

Me under the spotlight at DISH


Here I am, giving my pitch for Transcribe Bentham in the opening session of DISH. Thanks to Inge Angevaare, of the digitaalduurzaam blog, for sending this, and others, on to me. You can see the size of the crowd here.

Thursday, 8 December 2011

Greetings from DISH


Hullo from Rotterdam, where I am at DISH2011, "the conference about digital strategies for heritage". There are over 550 delegates at the Rotterdam WTC, from a wide range of libraries, archives and museums, predominantly across Europe. It's a really interesting mix of people, and not so many of the usual suspect academics here (one of the keynotes yesterday did a show of hands - who was here from libraries, who from museums, who was a student, who was in industry, and... no show of hands for who was at a university. Nice to be in the minority for once, and to meet lots of heritage professionals!) It's been really enjoyable, so far.

Yesterday was a fairly big day: Transcribe Bentham was one of the 5 international projects nominated for the Digital Heritage Award 2011 (you can see our specific nomination here). I had to give a 3 minute pitch in front of the entire crowd on behalf of the project team, bright lights and all, in the opening plenary session, followed by manning an information booth, above, in all the breaks to solicit votes. You can see the voting system above - people had to place a sticker on our sheet. By the end of the day we had filled quite a few of these - fantastic to have such support, and I talked to a lot of very interesting and interested people about the project. The winner of the award was Digital Koot - well done all! - and a little bird tells me we came a close runner-up. But to be honest, having the opportunity to pitch to such a large audience, and meet so many interesting people, was wonderful, and it was an honour to be nominated. All good fun.

Today I am actually giving the proper paper about Transcribe Bentham! 45 minutes rather than 3. So another big day, but standing up in front of a normal lecture hall in daylight is nothing compared to 3 minutes with the cameras and lights on you, so it will be fine.

I hope to visit DISH again. It's usually difficult for me to travel at this time of year due to the teaching schedule, but it's definitely been a useful conference for me so far. And now to go and tackle another conference day...

Tuesday, 6 December 2011

On Killing Metaphorical Birds with Statistical Stones


All this talk of DH stats (and the many emails and tweets I am firing off to gather up the evidence DH can muster) has both distracted me from posting anything from the back catalogue, and reminded me of a paper I wrote trying to articulate what Digital Humanities is by analysing the conference attendees and abstracts of the ALLC/ACH conference series (now known as Digital Humanities).

It's a bit of a rite of passage that those working in DH attempt, at some time or other, to write a paper on what Digital Humanities - or Humanities Computing, as it then was - actually is. My attempt was prompted by the fact that I had to achieve a Postgraduate Certificate in Learning and Teaching in Higher Education as part of my probation when I first joined UCL. We had to write a 10,000 word dissertation on some aspect of learning and teaching. I have to say, undertaking that course in the first couple of years of an academic role was a bit of a millstone - it used up huge amounts of time just when I was writing whole courses from scratch and trying to turn my thesis into a monograph. I'm not one for wasted effort, so I tried to see if I could write a dissertation that would then become a publication. Killing three birds with one stone, I got the PGCLTHE, a conference paper, and a journal paper out of it. Bingo. To be honest, I would never have written the "what is humanities computing?" paper without having to do a dissertation for my teaching qualification.

I looked at trying to define the scope of our discipline, and therefore what we should be teaching, with the available evidence to hand. This was the conference abstracts from 1996-2005 of ALLC/ACH, plus the archive of postings to Humanist. I then number-crunched them using the usual statistical methods that we teach in DH. Heh heh. Using DH to analyse DH! Again, birds with one stone. Why teach a methodology if you can't use it for your own dissertation? It was a quick win for me.

We can see, from the graph above, that Humanities computing research up to 2005 was pretty much text-o-centric. So we should be teaching that in our programs, goes the theory. Discuss. And you have the paper.

I wonder how much has changed now, actually. It would be fun to do another analysis.

This year I missed DH2011 as I was on maternity leave with the boys. I woke up one morning to find lots of new followers on twitter - always a sign that someone has been talking about you - and discovered that Lisa Spiro had cited this paper and methodology in her Making Sense of 134 DH Syllabi paper. A nice surprise - you never know if what you are working on will ever be useful to anyone else.

And here it is:
Terras, M. (2006). "Disciplined: Using Educational Studies to Analyse 'Humanities Computing'". Literary and Linguistic Computing, Volume 21, 229-246. PDF.


Thursday, 1 December 2011

Imaging the Great Parchment


Image © City of London, London Metropolitan Archives. Used with Permission.

One of the things that I am enjoying most in my current incarnation is the interdisciplinary work I am doing with various doctoral students, scattered across many of UCL’s computational and engineering science faculties. I’m delighted to be working as secondary supervisor, alongside Tim Weyrich of UCL’s Department of Computer Science as primary supervisor, on an EngD project sponsored by London Metropolitan Archives, to aid in reading one of their unique holdings: The Great Parchment Book.

The Great Parchment Book contains a survey of forfeited Irish estates claimed by Charles I in 1639, consolidating all contracts and particulars of all rental lands in the county into one volume. The resulting book holds invaluable information about the County of Londonderry in the early 17th century. The book was apparently passed to the Irish Society in London when it was reconstituted by Charles II in 1662, but a fire in 1786 at Guildhall caused extensive damage to their historical collections, destroying a large proportion of the 17th century material entirely, and causing dramatic ‘shrivelling’ and fire damage to the vellum pages of the Great Parchment Book. 165 folios of the volume survive, in 6 boxes such as the one featured above. This hugely important document has therefore been unavailable to historians since the date of the fire, as the pages cannot be handled because of their state of conservation.

Our task is deceptively simple – can we use image capture and processing techniques to make the rippled, twisted, and buckled text readable? Can we produce digital image surrogates which scholars can use to access the content of the document? As ever, the devil is in the detail. Our EngD student, Kazim Pal, has started (after a one-year taught component of the course) to investigate approaches that can be used to digitally reconstruct the manuscript. We hope to work with the online edition team at King’s College London’s Department of Digital Humanities to produce an online, transcribed edition of the text in time for the 400th anniversary of the 1613 incorporation of the Irish Society by the Corporation of London.

It’s really useful for me to work on the development and application of technologies in this area: it keeps pushing the limits of my understanding of the way we can use imaging with documentary material. I’m learning a huge amount from working with Tim and Kazim, and it is really great fun to step from beyond the textbooks/computer screen back into a real archive, to work on a real document, with real archivists and conservators, on approaches that will potentially benefit many scholars in the future. For me, personally, this is the joy – and excitement! – of Digital Humanities.

Monday, 28 November 2011

Stats and the Digital Humanities


Update 01/12/2011: do read the comments! Lots of good stuff below this post.

So yesterday, I was having a relatively laid back day, rolling about the living room with the bairns, occasionally seeing what was happening on the twitters. A few people were retweeting How To Design Your Own Infographics and I thought "hey, I should make an infographic." I concede that the world does not need any more infographics, but in my line of work it pays to play and tinker with these kinds of things before we start talking about them in class. What should I make an infographic about? I thought. Digital Humanities, of course!

But then you run into the problem... where are the stats about Digital Humanities that could be used for such a thing? There is nothing about it on the (problematic) wikipedia page, and facts and figures aren't terribly close to hand. Now, I may never get round to making that infographic, but it would pay, I think, to start to gather up information about our discipline. Here is a list of some things I have come up with - or at least, could track down pretty quickly. I also asked the twitters, so have credited some people, below, who were quick to answer.

The growth in interest could be charted by the growth in subscriptions to the Humanist discussion list, 2002 onwards, which is available here, although Willard would have to be prodded for the more up-to-date figures, or they could be gleaned from the ADHO minutes from the past few years. (Willard has emailed me stats - the current subscription is 1831 on the list!)

Update: Willard has provided me with the number of posts to Humanist over the years:
Year Messages
1993-4 646
1994-5 489
1995-6 775
1996-7 919
1997-8 727
1998-9 617
1999-2000 576
2000-1 841
2001-2 640
2002-3 668
2003-4 847
2004-5 776
2005-6 765
2006-7 610
2007-8 680

Willard has also supplied the number of members of the list (a quick charting sketch follows the table):
year members
2003 1385
2004 1300
2005 1383
2006 1458
2007 1537
2008 1650
2009 1359
2010 1518
2011 1831
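
If anyone does fancy starting on that infographic, figures like these chart easily. Here is a minimal sketch - my own illustration, nothing official - using Python and matplotlib to plot the membership numbers above:

    # Chart the Humanist membership figures above (a quick sketch, my own
    # illustration - not Willard's data processing).
    import matplotlib.pyplot as plt

    years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]
    members = [1385, 1300, 1383, 1458, 1537, 1650, 1359, 1518, 1831]

    plt.plot(years, members, marker="o")
    plt.title("Humanist discussion list membership, 2003-2011")
    plt.xlabel("Year")
    plt.ylabel("Members")
    plt.tight_layout()
    plt.savefig("humanist_members.png")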

Lou Burnard has also pointed me to a report he made about early use of Humanist, between August 1987 and January 1988. There are some crucial stats there about both individuals and topics in that period. As Lou says... "I leave it to the reader to determine whether anything much has changed".

The number of submissions to the DH annual conference (formerly ALLC/ACH), compared with the acceptance rate, could be gathered from the ADHO minutes. (I'm digging on this. Paul Spence has told me there are just under 400 submissions for DH2012.)
John Unsworth gave me the keys to the kingdom to generate the stats from conftool myself:

Conference   Long papers submitted   Accepted   % accepted
DH2007        90                      68         75.6
DH2008       156                      95         60.9
DH2009       210                     114         54.3
DH2010       231                      86         37.2
DH2011       122                      53         43.4
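
For anyone who wants to recompute or extend those acceptance rates as new figures arrive, the arithmetic is simple - a minimal sketch in Python (my choice of language, nothing official):

    # Recompute the acceptance rates in the table above.
    # Figures are (long papers submitted, accepted) per conference.
    submissions = {
        "DH2007": (90, 68),
        "DH2008": (156, 95),
        "DH2009": (210, 114),
        "DH2010": (231, 86),
        "DH2011": (122, 53),
    }

    for conf, (submitted, accepted) in sorted(submissions.items()):
        rate = 100.0 * accepted / submitted
        print(f"{conf}: {accepted}/{submitted} = {rate:.1f}% accepted")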

DH2011 was the first conference to have short paper formats in addition to the long papers: 57 were submitted. 21 of those were accepted, which was an acceptance rate of 36.8%. (Thanks, Matthew Jockers for the prod on that one).


The number of people on twitter identified as Digital Humanities scholars, in Dan Cohen's comprehensive list. (It currently stands at 359 individual scholars doing Digital Humanities who are on twitter).

Rachel Murphy pointed out that 46 PhD students are currently enrolled in the Digital Arts and Humanities PhD Programme in Ireland.

The number of people subscribing to Literary and Linguistic Computing, which means they are part of at least one membership association tied to the Alliance of Digital Humanities Organisations, and the individual numbers for ALLC, ACH, and SDH/SEMI. Dave Beavan has responded: as of 2 Nov 2011, @LLCjournal subscribers: ACH 89, ALLC 78, SDH-SEMI 36, joint 172. Total is 375! (He will dig out historical figures to show the trajectory.)

Update: here is a table showing the growth in membership of ADHO through subscription to LLC, culled from Dave's and my membership reports:

Type       Nov-07   Nov-08   Nov-09   Nov-10   Nov-11
ACH            73       57       87       76       89
ALLC           84       72       82       84       78
SDH/SEMI      n/a       13       17       39       36
Joint          67       92      121      115      172
Total         224      234      307      314      375
Update: Edward Vanhoutte tells me that as of December 6th, there are now 378 Subscribers to LLC. There are also 3,018 institutional subscriptions.

Edward has also provided some statistics regarding the home countries of authors who submit to the journal:
In 2011, the breakdown of the submitted papers per country shows that although most submissions come from Europe (81) and the US & Canada (27 & 14), Asia is following with 22 submissions. There is potential growth in South America (3), the Arab world (3), Africa (2) and Australia (1).
He also provided an overview of the growth of total submissions to the journal: 2008 (65), 2009 (47), 2010 (41), 2011 (123).

The acceptance rate for papers submitted to LLC in 2010 was 54.84%, down from 63.16% in 2009 and 71.70% in 2008. The acceptance rate for 2011 is 55.10%.

The annual hits and downloads to LLC online are as follows:

Year       Home pages   TOC pages   Abstracts   HTML full-text   PDF full-text   Total full-text
2008           32,413      13,120      93,619            6,109          17,404           23,513
2009           24,793      13,511      92,685            8,226          21,775           30,001
2010           28,644      15,341     101,649            7,476          20,770           28,246
2011 YTD       36,096      15,811     111,759            8,350          20,172           28,522



A list - and $ total - of all the grants that the National Endowment for the Humanities has made through the Office of Digital Humanities. This is via @brettbobley. I'm getting a total of 250 projects, with an outright award of $15,268,130 in total (although I'm doing this on a tiny screen, so do correct me if I'm wrong; I need to see the spreadsheet on a much larger monitor to make sure that is correct!).

wow. that's a lot.

The list - and $ total - of the joint NEH and JISC grants awarded for Digital Humanities projects (via @brettbobley and @alastairdunning). I'm getting 8 projects with an outright award of $966,691 (although again, teeny screen and large spreadsheet issues, do check my working).

There is a list here of 330 projects funded by the AHRC between 1998 and 2004 that had some form of digital output. They don't include the funding amounts in the spreadsheet though (why? scared?) - I contacted them, and Ian Broadbridge, Portfolio Manager for the AHRC, provided me with this information:
A list of AHRC-funded digital research resources is available online at http://www.arts-humanities.net/ahrc. As you can see, it includes a wide range of browsing, sorting and filtering options, and connects to detailed information about each project, including content type and methods used. A report on this site gives some brief statistical information about the costs of the projects involved. All these DRR projects also represent a significant investment of public money, to a total cost of approximately £121.5m across all the years and schemes covered. Taking, once again, Standard Research Grants for the years 2000-2008 as the more reliable basis for illustration, we can see that the average cost of DRR projects is significantly higher than that of all projects in this group taken together: £309,110 as against £232,948. The average costs increased substantially from 2006 because of the move to full-cost funding: in 2008 the respective figures for Standard Research Grants were £413,838 and £324,703; the cash difference remains in line with the overall figures, though the proportional difference is reduced as a result of a larger overhead element. The ICT Methods Network award was £1,037,382. The award amount for the ICT Strategy Projects is £979,364. The award amount for e-Science Workshops is £65,498. The award amount for Research Grants (e-Science) is £2,014,626. The details of the individual value of each grant are obtainable from http://www.ahrc.ac.uk/FundedResearch/BrowseResearch.aspx

There is also a review, by David Robey, of the AHRC ICT Programme here.

The Andrew W. Mellon Foundation funds a large number of projects in the Humanities which have a digital component. Their Scholarly Communication and Information Technology strand of funding paid $30,870,567 to projects in 2010 (gleaned from their grant report). I'm asking about previous years, as their reports work differently across other years.

Lisa Spiro identified 134 courses in Digital Humanities worldwide in her 2011 paper at DH. This can be compared to the 9 institutions offering courses in 1999, from McCarty and Kirschenbaum's overview of Humanities Computing units and institutional resources, which I had to claw from the Wayback Machine. This needs a closer look, to see what comparisons can be made.


Mark Sample has been charting the growth of Digital Humanities sessions at MLA over the past few years: 2010: 27; 2011: 44; 2012: 58/753.

Alastair Dunning suggests we could look at http://arts-humanities.net for some more project information, although I think that takes some digging to get some stats.

There is the uber-cool Network Visualisation of DH2011, created by Elijah Meeks, which could be used to generate some useful stats, such as where everyone who came to DH2011 came from; I have used it above in this post.

Centernet - an international network of Digital Humanities centres - "has over 200 members from about 100 centers in 19 countries". I can't see any easy way to get an exact figure from the listings. I emailed them - there are currently 167 digital humanities centres that are members of Centernet, and "we get another couple every few weeks or so". Neil Fraistat has added "Member centers come from 26 countries and currently 247 folks are on its listserv". Karen Dalziel pointed me to the google map which plots all these centres. It should be relatively easy to export the KML file and datamine it to get the co-ordinates if you want to plot it in any other software (a sketch of how you might do that is below).
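
Here is the kind of thing I mean - a minimal sketch in Python, assuming a standard KML export saved locally (the filename "centres.kml" is hypothetical), which pulls the name and co-ordinates out of each placemark:

    # Extract placemark names and co-ordinates from a KML export.
    # A sketch, assuming a standard KML 2.2 file; "centres.kml" is a
    # hypothetical filename, not an actual Centernet download.
    import xml.etree.ElementTree as ET

    KML_NS = "{http://www.opengis.net/kml/2.2}"

    def extract_placemarks(path):
        """Yield (name, longitude, latitude) for each Placemark in a KML file."""
        tree = ET.parse(path)
        for placemark in tree.iter(KML_NS + "Placemark"):
            name_el = placemark.find(KML_NS + "name")
            coords_el = placemark.find(".//" + KML_NS + "coordinates")
            if coords_el is None or coords_el.text is None:
                continue
            # KML coordinate strings are "longitude,latitude[,altitude]"
            lon, lat = coords_el.text.strip().split(",")[:2]
            name = name_el.text.strip() if name_el is not None and name_el.text else "?"
            yield name, float(lon), float(lat)

    for name, lon, lat in extract_placemarks("centres.kml"):
        print(f"{name}: {lat}, {lon}")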

Kristel Pent has pointed out that there is another list of centres of Digital Humanities on the ALLC pages, and these should be cross-referenced with the list on Centernet.

Bethany Nowviskie has posted over at DH Answers that there were:

28,837 unique visitors to DH Answers in the first year
from 164 countries
969 registered DH Answers users
contributing 1387 posts on 223 topics

And here are the numbers of Humanities Computing / Digital Humanities-related sessions held at MLA -- by count of the ACH: http://ach.org/mla-pages

1996: 34
1997: 34
1998: 45
1999: 42
2000: 59
2001: 55
2002: 44
2003: 36
2004: 39
2005: 37
2006: 48
2007: 56
2008: 65
________
604 panels over 13 annual conventions



Fred Gibbs did a lovely post categorizing Definitions of Digital Humanities, which would make a nice pie chart.

Dave Beavan suggested looking at the numbers of people contributing to the Day of DH, but the server always seems to be down. I'm emailing real people instead.

Peter Organisciak has provided me with the stats on those who registered for the Day of DH over the past few years:
Day of DH 2009: 103 registered (83 participated)
Day of DH 2010: 154 registered
Day of DH 2011: 244 registered
There are some interesting visualisations of the Day of DH up there, too.

James Cummings tells me there are 700 subscribers to the Digital Medievalist discussion list. Further details: DM-L started in 2003. On 15th Jan 2005 there were 306 members; on 28th April 2010, 537 members; on 27th August 2011, 672 members; on 5th December 2011, 700 members.
There are also 584 followers of the twitter account.
James also gave me access to the user statistics of the Digital Medievalist website: in 2011 there were 16,808 visits, from 12,763 unique visitors, with 35,546 pageviews. 25% were returning visitors.

James also gave me access to the stats for the Text Encoding Initiative's website: in 2011 there were 176,469 visits from 107,320 unique visitors, with 537,750 pageviews. 40% were returning visitors.

Syd Bauman tells me there are currently 949 people subscribed to TEI-L.

Gabriel Bodard tells me there are currently 374 subscribers to the Digital Classicist email list.

Leif Isaksen has told me the subscriber counts for the Antiquist discussion list: Aug 2006: 3; Jan 2008: 180; Jan 2010: 264; Dec 2011: 330.
@DHNow has 2676 followers on twitter. @Dancohen has told me that last month the DHNow website had 14.5K visits from 5K unique visitors, and 48K page views.

@DHquarterly has 688 followers on twitter. A quick look at google analytics tells me that in the past 6 months (since google analytics was switched on in DHQ) there have been 23,636 visits from 15,547 unique visitors, who have looked at 52,370 pages in total (an average of just over 2 pages per visit). There have been visits from 137 different countries.

@LLCJournal has 513 followers on twitter.

Boone Gorges provided me with some stats about the code behind Anthologize - code lines, files, and commits: another great way to measure intellectual investment in DH! (It was Bethany Nowviskie who suggested this, actually). There are 61 files, with 8693 lines of code, and 1722 comments. That's some programming. (A naive sketch of this kind of counting is below.)
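
For what it's worth, here is a naive sketch of how you might gather that sort of count for a source tree yourself. This is not how Boone generated his figures (a dedicated tool such as cloc does the job properly), and the directory name and extension list are my assumptions:

    # Naively count files, lines, and single-line comments under a source tree.
    # A sketch only: "anthologize" and the extensions are assumptions, and a
    # proper tool such as cloc handles multi-line comments correctly.
    import os

    def code_stats(root, exts=(".php", ".js", ".css")):
        files = lines = comments = 0
        for dirpath, _dirs, filenames in os.walk(root):
            for fn in filenames:
                if not fn.endswith(exts):
                    continue
                files += 1
                with open(os.path.join(dirpath, fn), errors="ignore") as fh:
                    for line in fh:
                        lines += 1
                        # crude: only catches comments that start a line
                        if line.strip().startswith(("//", "#", "/*", "*")):
                            comments += 1
        return files, lines, comments

    print(code_stats("anthologize"))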

Ray Siemens has provided me stats about the Digital Humanities Summer Institute:
DHSI people stats: 2012 (300+ confirmed so far), 2011 (230), 2010 (180), 2009 (150), 2008 (125), 2007 (115), 2006 (95),  2005 (80), 2004 (75), pre-2004 (35-40 at each offering).   540 twitter followers @DHInstitute, 1429 members on the email announcement list.


Elisabeth Burr has told me that the European Summer School "Culture & Technology" at Leipzig had 85 students in 2009 and 2010, and 21 lecturers, from 21 different countries (Brazil, Burkina Faso, Germany, Finland, France, Great Britain, India, Ireland, Israel, Italy, Canada, Netherlands, Austria, Poland, Serbia, Spain, Turkey, Ukraine, Hungary, USA, Cyprus).

Julianne Nyhan has provided me with stats about the Computers and the Humanities journal (commonly known as CHum), which ran from 1966 to 2004. There were 1244 papers published in total, from 811 single authors and 433 joint authors.

What other facts and figures exist about DH? I'm looking for stats that already "exist" - i.e., don't tell me "count all the projects that say they are DH!" - erm, yeah. Tell me the count, and how you worked it out. Otherwise it's a research project, rather than a "grab the existing stats" thing, which we should, as a community, be able to do, right? Right?

I'll ask the twitters, and DHanswers, but if you tweet me or message me or leave a comment here, I'll update the list, above. Maybe someone will make that infographic...

Update: Desmond Schmidt did an analysis of the jobs posted to Humanist:
"There have been a lot of advertisements for jobs lately on Humanist. So I used the Humanist archive to do a survey of the last 10 years. I counted jobs that had both a digital and a humanities component, were full time, lasted at least 12 months, and were at PostDoc level or higher."

2002: 11
2003: 6
2004: 15
2005: 15
2006: 18
2007: 24
2008: 27 (incomplete - 1/2 year)
2009: 36
2010: 58
2011: 65 so far

Breakdown by country:
US: 133
GB: 65
CA: 35
IE: 18
DE: 13
FR: 8
IL: 3
NO: 2
NL: 2
ES: 2
AT: 1
AU: 1
BE: 1

Normalised by population (a sketch of this calculation follows the list):
IE: 4.0
GB: 1.051779935
CA: 1.038575668
US: 0.433224756
NO: 0.416666667
IL: 0.405405405
DE: 0.158924205
FR: 0.127795527
NL: 0.121212121
AT: 0.119047619
BE: 0.092592593
AU: 0.04587156
ES: 0.043572985
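
The normalised figures appear to be jobs per million inhabitants (e.g. Ireland: 18 jobs over a population of roughly 4.5 million gives 4.0). A sketch of the calculation, with circa-2011 population figures that are my rough assumptions rather than Desmond's exact source:

    # Jobs per million population - my reading of Desmond's normalisation.
    # Population figures (millions, circa 2011) are rough assumptions.
    jobs = {"US": 133, "GB": 65, "CA": 35, "IE": 18, "DE": 13, "FR": 8}
    population_m = {"US": 307.0, "GB": 61.8, "CA": 33.7,
                    "IE": 4.5, "DE": 81.8, "FR": 62.6}

    normalised = {c: jobs[c] / population_m[c] for c in jobs}
    for country, score in sorted(normalised.items(), key=lambda kv: -kv[1]):
        print(f"{country}: {score:.2f} jobs per million")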

You can also see the Digital Humanities job trends from indeed.com, but the percentages are so small I'm not sure it's statistically worth including.

Friday, 25 November 2011

Computer Games and author lists

One of the more unusual titles on my list of publications, one that doesn't seem to fit in with my previous trajectory, is:
Gooding, P. and Terras, M. (2008). "'Grand Theft Archive': a quantitative analysis of the current state of computer game preservation". The International Journal of Digital Curation, 3(2). PDF.
It's an interesting one, this: trying to articulate the extent of the large-scale loss of the early years of gaming history (particularly in the UK), highlighting games' vulnerability. We did a quantitative case study - trying to get hold of copies of known games through every channel possible, highlighting the inadequacies of even the available metadata. There is clearly a PhD study in this, should anyone want to take it forward. I also think it's important work that needs doing sooner rather than later. (For those interested in this area, do also see the Preserving Virtual Worlds project, although they didn't cite our previous work. Grump.)

So that's the topic, but there's an even more interesting point to be made about this paper. I'm second author, because it was the work of one of my Master's students at the time, Paul Gooding. Paul was on the MA in Library and Information Studies at UCL. I usually supervise around 10 student dissertations a year, previously from Electronic Communication and Publishing, and this year I'll mostly be supervising the cohort from the MA/MSc in Digital Humanities, which is running for the first time this session. I really enjoy supervising our master's students - most are really very bright, driven, and dedicated. I don't have to supervise any librarians or archivists, but occasionally I choose to take on a few extra students from these programmes for their dissertation, both to help my colleagues in LIS, and to work on interesting digital topics in those areas. Paul approached me with this topic, we worked up a methodology, and he did the legwork and the write-up. I then encouraged him to submit this to a journal, and I spent a few days turning his master's dissertation into the published paper you see here.

The question is, then: when is it okay to be named as author on work which emanates from a student dissertation? When should you leave it, and say, "this is the student's work"? This is a huge issue in graduate studies, and one I tread very carefully around. I hear tales of, and have had, colleagues who insist on having their name as first author on anything their research groups publish, even when they haven't had anything to do with the research in question. (I called one out on it for being morally wrong: he didn't answer any email from me for a year, which was slightly problematic, given he was senior to me and had to sign off on various things). What makes me want to put my name as co-author on this paper? Why haven't I published more with students? Why are the guidelines on this all so woolly? Why do some colleagues insist on having their names on papers when they haven't been involved in them? Why do some students feel so maligned by their supervisors when the supervisor asks to be included on an author list, even when the supervisor has done huge amounts of work on their project? It's such a touchy subject. And actually, this touchiness carries on throughout interdisciplinary projects: publications and named authors are often the sticking point. (I'd advise anyone to look up Ruecker and Radzikowska's work on project charters: they say, decide all this at the start of a project, not at the end.)

With this paper, I could say, hand on heart, that it would not exist without my continued work and input. The dissertation itself was based on a methodology I devised, and I worked very closely with Paul to undertake the study. The paper itself, whilst based on Paul's dissertation, required rewriting: it would not have got to this stage without my time and effort, and prior knowledge regarding what journals expect and want. I have no qualms, therefore, in having my name as second author on this piece: I did the work. As far as I know, Paul is delighted that this paper got published. But I've supervised a whole lot of stuff - some of which was of publishable quality, some not, some that made it to publication, some not - that I would never, ever, ask to be second author on. I have witnessed at first hand colleagues who do not have my scruples.

After a few years out in the real world, Paul is back with us! He is heading into his second year of PhD study, supervised by Claire Warwick (first supervisor) and myself (secondary supervisor), looking into large-scale digitisation initiatives, particularly doing some user studies on the British Library's digital collections. It's a great project, and I'm glad he's come back to do some further study with us.

I'll continue to tread carefully around author names and publication, though, particularly when graduate student work is involved.

Incidentally, the journal that this is published in, The International Journal of Digital Curation, is open access - all articles are available for free. It's a good read.

Monday, 14 November 2011

Should We Just Send A Copy? On Digitisation... and the Mona Lisa


In late 2008 I was asked to give a couple of plenaries/big guest lectures the next summer: one for the Digital Humanities Summer Institute, and one for the Art Libraries Society (ARLIS) 40th Anniversary Conference. I was getting a bit bored of standing in front of a powerpoint talking through bullet points of my research, and wanted to do something a bit more exploratory. But what to do? I had my antennae up for clues.

Around that time, Europeana - the online digital library - launched, and promptly fell over. Many people on the site were searching for "Mona Lisa" when it crashed.

Around that time, I was watching a lot of documentaries about art history on BBC4. I watched one by the critic Robert Hughes called "The Business of Art" (also called "The Mona Lisa Curse" in some listings), where he traces the obscene growth of the art world to the trip the Mona Lisa took to New York and Washington in 1963. 1,600,000 visitors – more than 30,000 viewers per day – filed past the painting. In particular, he remarked that Andy Warhol - then a struggling artist yet to have his major breakthrough - refused to join the hordes queuing up to see it, remarking
“Why don’t they just have someone copy it and send the copy? No-one would know the difference.” (Hughes 2006, p. 223)
Put two and two together and what do you get? An overview paper on some of the issues on digitisation, use, and usefulness. What are we doing when we create digital surrogates of cultural and heritage material? What are they for? Should we just send a copy?

You can see me give the plenary in 2009 at the DHSI summer school on youtube (no, I haven't watched it myself). And here is the resulting paper. It's a gallop round the houses, but I am very fond of it:
Terras, M (2010) Should we just send a copy? Digitisation, Use and Usefulness. Art Libraries Journal, 35 (1). PDF.
Reference: Hughes, R. (2006). Things I Didn't Know. Knopf, London. I've heard this quote repeated elsewhere, but this is the only source I can find. It may, indeed, be apocryphal.
The above image is Andy Warhol's "Thirty are better than one".

On missing out


When you choose to go on leave from a University for a year, you make the choice to miss certain things. Meetings about your research. Team meetings about your centre. Grant writing sessions for projects you were previously on. Guest lectures. Project Launches. Having access to physical resources such as libraries. Hanging out in the pub with colleagues and students. Coffee. Lunch. Bumping into people in the corridor, and just saying hi. I didn't mind missing any of these things, really, when I was on maternity leave: it was my choice to have children, and to take almost the maximum (very generous) maternity leave I was granted from my job. It was all good. Except one thing.

The one thing I missed when I was on leave was the PhD viva, and resulting celebrations, of my first PhD student to go through the system under my watchful eye: Ernesto Priego. I was Ernesto's secondary supervisor (Claire Warwick his first) on his PhD on webcomics, where he researched

the impact of analog and digital technologies on comics... debating the manner in which theories of materiality illuminate the media-specificity of comics, webcomics, mobile comics apps, and how comic book culture fits within current debates about the future of the book.

Ernesto passed back in the spring, and I've watched on the twitters as he does corrections, hands in, gets things bound up, and gets the final letter through. Well done, Doctor Ernesto! Sorry I wasn't there to help drink champagne. I know you know it was impossible for me to come into the city at that time. It really was the only thing I regret missing over the year that I was on leave.

Ernesto initiated and co-organises The Comics Grid, a web-based international collaboratory of comics scholars. You can also find him on the twitters, @ernestopriego. And tomorrow he is coming to see me, to catch up, and to start doing a little research assistant work for me on a Super Top Secret Project which we can't talk about until the Spring. I'm looking forward to it.

Which leads me to think... I have ten PhD students at the moment. A lot of my new research is focussing around what they are doing, and I'm having tremendous fun working with them, and colleagues across UCL who are also supervising. With their permission, I'm going to start blogging about the work that we are doing right now.


But Ernesto, as honorary PhD student (do we ever leave our PhD supervisors?), is the first to be mentioned on here.

The image above, btw, is taken from my autographed copy of PhD Comics. Ernesto took Jorge Cham on a tour of UCL and they swung by my office. It was like a medieval scholar meeting Chaucer, and the look on Ernesto's face, to be chatting with one of the people he had studied in depth, was one of my favourite memories of supervising Ernesto's PhD.


Monday, 7 November 2011

The Birth of TEI By Example

Sometimes academic projects come about due to a combination of necessity and wishful thinking. TEI by Example was a product of needing to fill a gap in funding, whilst wishing something into existence that I wanted someone else to do in the first place.

I've talked before about a pop-stack of ideas marked "needs further work!". As well as that, I have a pop-stack, or wishlist, of Things I Wish Existed That Would Make My Academic Life Easier. One of these was an online set of tutorials to teach TEI.

The Text Encoding Initiative guidelines, as we all know, are one of the bedrocks of Digital Humanities. If you want to be in DH, you need to know XML, and TEI, and markup. Which means it has to be taught. Even if, like me, you are not terribly interested in markup per se, never use it in your research, and are not part of the TEI community, you will still need to cover it at some point in a class in Digital Humanities.

Now, there are some excellent people within the TEI doing some excellent teaching, and there is some good teaching material available online. But what I needed, really, was some point-and-click tutorials that I could direct my masters students to after an introductory lecture on TEI. When learning code, it's the done thing to look at and play with examples of code. Where oh where oh where was TEI by example? Where were examples of marked-up texts people could see to learn from?
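
To make concrete the kind of example I was after, here is a minimal, hypothetical TEI fragment - illustrative only, not taken from the eventual tutorials - wrapped in a few lines of Python that check it is at least well-formed:

    # A minimal, hypothetical TEI fragment (not from the actual TEI by
    # Example tutorials), with a basic well-formedness check.
    import xml.etree.ElementTree as ET

    TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title>A toy teaching example</title></titleStmt>
          <publicationStmt><p>Unpublished sample.</p></publicationStmt>
          <sourceDesc><p>Born digital.</p></sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <p>Written at <placeName>Ghent</placeName> on
             <date when="2006-01-01">1 January 2006</date>.</p>
        </body>
      </text>
    </TEI>"""

    root = ET.fromstring(TEI_SAMPLE)  # raises ParseError if not well-formed
    print("Well-formed; root element:", root.tag)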

In late 2005 I was chatting to Edward Vanhoutte of the Centrum voor Teksteditie en Bronnenstudie, Ghent, about how things were going up in the Centre. Unlike me, Edward and Ron Van den Branden mark up texts all the time, and have considerable expertise in the markup of correspondence material. Edward was saying there was a gap in funds coming up for Ron, as they would be between projects. He needed to find a couple of months of work to pay him. Easy, I said. Apply to the Association for Literary and Linguistic Computing for a small grant, to build a set of tutorials for teaching TEI. Call it TEI by Example.
So we did. And we got the money, and we did.

"It will take two months!" we said. We started in 2006. Now, given I don't really do TEI, I couldn't really write the tutorials. Edward took that on, but he was in the phase of doing a lot of childcare with very young kids (and this was above and beyond our day jobs). Ron built the infrastructure, did a fantastic job sorting out the quizzes, and set to building the online validator. I chivvied and tested and harangued. I was asked to give a plenary about the project at TEI@20: 20 Years of Supporting the Digital Humanities Conference, University of Maryland, in November 2007. "It will be finished by then!" we thought. Of course it wasn't. I gave the plenary when I was 8 weeks pregnant with the bump that turned out to be The Boy. (I can't recommend flying across the pond to give a plenary when the only thing you can stomach is salt and vinegar crisps. I was terrified that customs would take away my stash - and what would I eat when I was there? I also can't recommend giving a plenary when you think you are going to barf over the front row's shoes. But I digress). We worked on the tutorials some more. 2008 came and went, with me mostly on maternity leave. Then Ron also joined the parenting club! Hurrah! TEI by Example was finally born in July 2010, a mere 4 years late. Some gestation that was. It's like the closing credits of Toy Story 3, when they list all the kids born during the production phase.

It's also worth saying that we encountered a fair bit of resistance to TEI by Example from the TEI community. Folks were not interested, in general, in giving us access to fragments of their code. Promises were made and not kept, emails went unanswered, snark was thrown on mailing lists. Who were we to build something beyond the TEI community! Who did we thinks we was!

Meanwhile, TEI by Example has been a quiet success. As Edward said in his plenary to this year's TEI meeting:
Fifteen months after the launch of the tutorials, the site has attracted close to 30,000 unique page views with 1,900 unique views for the modules on primary sources and critical editing together. The statistics and logs show that users are finding their way to the tutorials directly, via Digital Humanities courses or via the TEI website and we see that there is high activity from the US, Germany, the UK, France, and Canada: not surprisingly countries with a high digital humanities and digital editing profile. And we're particularly proud of our single visit from Vatican City. 18% of the visitors stay for more than 15 minutes on the site, which suggests that they really do some work. We also see a decent amount of returning visitors.
We hear it's being translated into French. We know people within the TEI use it in their teaching. We're getting positive comments about it, and relationships with certain folks have improved dramatically. And I can sit my students in front of it for an hour, after an introductory lecture about TEI. Well, I will do, when I return from my second maternity leave break to resume teaching next year.

As for the resulting paper? Here it is - my plenary which explains why we needed to go down this route.

Terras, M., Van den Branden, R. and Vanhoutte, E. (2009). "Teaching TEI: The Need for TEI by Example". Literary and Linguistic Computing, 24(3), 297-306. doi:10.1093/llc/fqp018. PDF

What happens when you tweet an Open Access Paper

So a few weeks ago, I tweeted and posted about this paper
Terras, M. (2009). "Digital Curiosities: Resource Creation Via Amateur Digitisation". Literary and Linguistic Computing, 25(4), 425-438. Available in PDF.
I thought it worth revisiting the results of this. Is it worth me digging out the full text, running the gauntlet of the UCL repository, and spending the time putting my previous research online? Is Open Access a gamble that pays - and if so, in what way?

Prior to me blogging and tweeting about the paper, it got downloaded twice (not by me). The day I tweeted and blogged it, it immediately got 140 downloads. This was on a Friday; on the Saturday and Sunday it got downloaded, but by fewer people. On Monday it was retweeted and it got a further 140 or so downloads. I have no idea what happened on the 24th October - someone must have linked to it? Posted it on a blog? Then there were a further 80 downloads. Then the traditional long tail, then it all goes quiet.

All in all, it's been downloaded 535 times since it went live, from all over the world: USA (163), UK (107), Germany (14), Australia (10), Canada (10), and the long tail of beyond: Belgium, France, Ireland, Netherlands, Japan, Spain, Greece, Italy, South Africa, Mexico, Switzerland, Finland, Denmark, Norway, Sweden, Portugal, Europe, UAE, "unknown".

Worth it, then? Well there are a few things to say about this.
  • I have no idea how many times it is read, accessed, or downloaded in the journal itself. So seeing this - 500 reads in a week! - makes me think, wow: people are reading something I have written!
  • It must all be relative, surely. Is 500 full downloads good? Who can tell? All I can say is that it puts it into the top 10 - maybe top 5 - papers downloaded from the UCL repository last month (I won't know until someone updates the webpage with last month's stats).
  • If I tell you that the most accessed item from our department ever in the UCL repository, which was put in there five years ago, has had 1000 full text downloads, then 500 downloads in a week ain't too shabby. They didn't blog or tweet it; it's just sitting there.
  • There is a close correlation between when I tweet the paper and when it gets downloaded.
  • There can be a compulsion to start to pay attention to stats. Man, it gets addictive. But is this where we want to be headed: academia as X-factor? Hmmm.

Ergo, if you want people to read your papers, make them open access, and let the community know (via blogs, twitter, etc.) where to get them. Not rocket science. But worth spending time doing. Just don't develop a stats habit.

I'll feature the next one from my back catalogue, shortly...

Update 08/11/11: As a result of posting this, and this post getting retweeted far and wide (thanks all!), the paper got downloaded a further 120 times. See? See?

Update 08/11/11: The UCL stats page for downloads last month has now been updated: this was the 5th most downloaded paper in the UCL repository in October 2011. Yeah, I'm up there with fat tax, seaworthiness, preventative nutrition, and the peri-urban(?) interface. I'm not sure how many papers in total there are in the repository - I can't find that stat - but a search for "the" or "a" both brings back 224,575 papers, if that is anything to go by.

Update 10/11/11: The Digital Curation Manager at UCL, Martin Moyle, has been in touch to confirm that 6486 of the 224,575 papers in the repository have downloadable full text attached. And he told me where I can generate this stat. Whoops! (Thanks Martin).

Update 10/11/11: After this post, there is the predictable long tail happening with the stats: another 60 downloads on the 8th, 10 on the 9th. It's all quite predictable - yet nice that the paper is wending its way to interested parties!

Update 25/11/11: This post was mentioned in the Times Higher last week, and the paper has now been downloaded 805 times in total.