Monday, 28 November 2011

Stats and the Digital Humanities

Update 01/12/2011: Do read the comments! Lots of good stuff below this post.

So yesterday, I was having a relatively laid-back day, rolling about the living room with the bairns, occasionally seeing what was happening on the twitters. A few people were retweeting How To Design Your Own Infographics and I thought "hey, I should make an infographic." I concede that the world does not need any more infographics, but in my line of work it pays to play and tinker with these kinds of things before we start talking about them in class. What should I make an infographic about? I thought. Digital Humanities, of course!

But then you run into the problem... where are the stats about Digital Humanities that could be used for such a thing? There is nothing about it on the (problematic) Wikipedia page, and facts and figures aren't terribly close to hand. Now, I may never get round to making that infographic, but it would pay, I think, to start to gather up information about our discipline. Here is a list of some things I have come up with, or at least could track down pretty quickly. I also asked the twitters, so have credited some people below who were quick to answer.

The growth in interest could be charted by the growth in subscriptions to the Humanist discussion list, 2002 onwards, which is available here, although Willard would have to be prodded for the more up-to-date figures, or they could be gleaned from the ADHO minutes from the past few years. (Willard has emailed me stats: current subscription is 1831 on the list!)

Update: Willard has provided me with the number of posts to Humanist over the years:
Year Messages
1993-4 646
1994-5 489
1995-6 775
1996-7 919
1997-8 727
1998-9 617
1999-2000 576
2000-1 841
2001-2 640
2002-3 668
2003-4 847
2004-5 776
2005-6 765
2006-7 610
2007-8 680

Willard has also supplied the number of members of the list:
Year Members
2003 1385
2004 1300
2005 1383
2006 1458
2007 1537
2008 1650
2009 1359
2010 1518
2011 1831

Lou Burnard has also pointed me to a report he made about early use of Humanist, between August 1987 and January 1988. There are some crucial stats there about both individuals and topics in that period. As Lou says... "I leave it to the reader to determine whether anything much has changed".

The number of submissions to the DH annual conference (formerly ALLC/ACH), compared with the acceptance rate, could be gathered from the ADHO minutes. (I'm digging on this. Paul Spence has told me there are just under 400 submissions for DH2012.)
John Unsworth gave me the keys to the kingdom to generate the stats from conftool myself:

Long papers (conference, submitted, accepted):
DH2007 90 68
DH2008 156 95
DH2009 210 114
DH2010 231 86
DH2011 122 53

DH2011 was the first conference to have a short paper format in addition to the long papers: 57 were submitted, and 21 of those were accepted, an acceptance rate of 36.8%. (Thanks to Matthew Jockers for the prod on that one.)
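To make the long-paper figures directly comparable with that 36.8% short-paper rate, here is a minimal Python sketch that turns each submitted/accepted pair from the table above into a percentage. The numbers are the ones tabulated in this post; the dictionary and function names are just illustrative. On those figures the long-paper rate falls from about 75.6% at DH2007 to about 37.2% at DH2010, before rising to about 43.4% at DH2011.

```python
# Long papers at the DH conference: (submitted, accepted), as tabulated in this post.
LONG_PAPERS = {
    "DH2007": (90, 68),
    "DH2008": (156, 95),
    "DH2009": (210, 114),
    "DH2010": (231, 86),
    "DH2011": (122, 53),
}

# DH2011 short papers: 57 submitted, 21 accepted.
SHORT_PAPERS_DH2011 = (57, 21)


def acceptance_rate(submitted: int, accepted: int) -> float:
    """Return accepted papers as a percentage of submitted papers."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    return 100.0 * accepted / submitted


if __name__ == "__main__":
    for conference, (submitted, accepted) in LONG_PAPERS.items():
        print(f"{conference} long papers: {acceptance_rate(submitted, accepted):.1f}%")
    print(f"DH2011 short papers: {acceptance_rate(*SHORT_PAPERS_DH2011):.1f}%")
```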

The number of people on twitter identified as Digital Humanities scholars, in Dan Cohen's comprehensive list. (It currently stands at 359 individual scholars doing Digital Humanities who are on twitter).

Rachel Murphy pointed out that 46 PhD students are currently enrolled in the Digital Arts and Humanities PhD Programme in Ireland.

The number of people subscribing to Literary and Linguistic Computing, which means they are part of at least one membership association tied to the Alliance of Digital Humanities Organisations, and the individual numbers for ALLC, ACH, and SDH/SEMI. Dave Beavan has responded: as of 2 Nov 2011, @LLCjournal subscribers are ACH 89, ALLC 78, SDH-SEMI 36, joint 172. Total is 375! (I will dig out historical figures to show the trajectory.)

Update: here is a table showing the growth in membership of ADHO through subscription to LLC, culled from Dave's and my membership reports:

Type      Nov-07  Nov-08  Nov-09  Nov-10  Nov-11
ACH       73      57      87      76      89
ALLC      84      72      82      84      78
SDH/SEMI  n/a     13      17      39      36
Joint     67      92      121     115     172
Total     224     234     307     314     375

Update: Edward Vanhoutte tells me that as of December 6th, there are now 378 subscribers to LLC. There are also 3,018 institutional subscriptions.

Edward has also provided some statistics regarding the home countries of authors who submit to the journal:
In 2011, the breakdown of the submitted papers per country shows that although most submissions come from Europe (81) and the US & Canada (27 & 14), Asia is following with 22 submissions. There is potential growth in South America (3), the Arab world (3), Africa (2) and Australia (1).
He also provided an overview of growth of total submissions to the journal:
2008 (65), 2009 (47), 2010 (41), 2011 (123).

The acceptance rate for papers submitted to LLC in 2010 was 54.84%, down from 63.16% in 2009 and 71.70% in 2008. The acceptance rate for 2011 is 55.10%.

The annual hits and downloads to LLC online are thusly:

Home Pages TOC pages Abstracts HTML Full-text PDF Full-text Total Full-text
32,413 13,120 93,619 6,109 17,404 23,513
24,793 13,511 92,685 8,226 21,775 30,001
28,644 15,341 101,649 7,476 20,770 28,246
2011 ytd 36,096 15,811 111,759 8,350 20,172 28,522
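The Total Full-text column looks like the sum of the HTML and PDF full-text columns. A quick Python check using only the numbers in the table bears that out for every row; the first three rows have no year label in the table, so this sketch simply labels them by position. It also works out what share of full-text downloads were PDFs.

```python
# (label, HTML full-text, PDF full-text, stated total full-text) from the LLC usage table.
# The first three rows have no year label in the source table, so they are labelled by position.
LLC_FULLTEXT = [
    ("row 1", 6_109, 17_404, 23_513),
    ("row 2", 8_226, 21_775, 30_001),
    ("row 3", 7_476, 20_770, 28_246),
    ("2011 ytd", 8_350, 20_172, 28_522),
]


def mismatched_totals(rows):
    """Return the labels of rows whose stated total is not HTML + PDF."""
    return [label for label, html, pdf, total in rows if html + pdf != total]


def pdf_share(rows):
    """Return each row's PDF downloads as a percentage of all full-text downloads."""
    return {label: 100.0 * pdf / (html + pdf) for label, html, pdf, _ in rows}


if __name__ == "__main__":
    print("Mismatched rows:", mismatched_totals(LLC_FULLTEXT) or "none")
    for label, share in pdf_share(LLC_FULLTEXT).items():
        print(f"{label}: {share:.0f}% PDF")
```

On these numbers, PDF accounts for roughly 71% to 74% of full-text downloads in each row.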

A list - and $ total - of all the grants awarded by the National Endowment for the Humanities' Office of Digital Humanities. This is via @brettbobley. I'm getting a total of 250 projects, with a total outright award of $15,268,130 (although I'm doing this on a tiny screen, so do correct me if I'm wrong: I need to see the spreadsheet on a much larger monitor to make sure that is correct!).

Wow. That's a lot.

The list - and $ total - of the joint NEH and JISC grants awarded for Digital Humanities projects (via @brettbobley and @alastairdunning). I'm getting 8 projects with an outright award of $966,691 (although again, teeny screen and large spreadsheet issues, so do check my working).
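For anyone wanting to check my working on totals like these, here is a minimal Python sketch that counts the rows with an award amount in a CSV export of a grants spreadsheet and totals that column. It assumes a CSV export with a header row; the column name `OutrightAward` and the sample rows are hypothetical stand-ins, so swap in whatever the real spreadsheet uses. `Decimal` is used to avoid floating-point rounding on currency.

```python
import csv
import io
from decimal import Decimal


def summarise_awards(csv_text: str, amount_column: str = "OutrightAward") -> tuple[int, Decimal]:
    """Count rows with an award amount and total that column.

    `amount_column` is a hypothetical header name: check the real export.
    Amounts such as "$15,000" are cleaned by stripping "$" and "," first.
    """
    count = 0
    total = Decimal(0)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(amount_column) or "").replace("$", "").replace(",", "").strip()
        if not raw:
            continue  # skip rows with no award recorded
        count += 1
        total += Decimal(raw)
    return count, total


# Invented sample rows, standing in for a real spreadsheet export.
SAMPLE_CSV = """Project,OutrightAward
Example project A,"$50,000"
Example project B,"$24,950.50"
Example project C,
"""

if __name__ == "__main__":
    projects, dollars = summarise_awards(SAMPLE_CSV)
    print(f"{projects} projects, ${dollars:,} outright")
```

Rows with a blank amount are skipped rather than counted, so if the real spreadsheet lists projects without an outright award, the project count and the row count will differ.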

There is a list here of 330 projects funded by the AHRC between 1998 and 2004 that had some form of digital output. They don't include the funding amounts in the spreadsheet, though (why? scared?). I contacted them, and Ian Broadbridge, Portfolio Manager for the AHRC, provided me with this information:
A list of AHRC-funded digital research resources is available on line at As you can see, it includes a wide range of browsing, sorting and filtering options, and connects to detailed information about each project, including content type and methods used. A report on this site gives some brief statistical information about the costs of the projects involved. All these DRR projects also represent a significant investment of public money, to a total cost of approximately £121.5m across all the years and schemes covered. Taking, once again, Standard Research Grants for the years 2000-2008 as the more reliable basis for illustration, we can see that the average cost of DRR projects is significantly higher than that of all projects in this group taken together: £309,110 as against £232,948. The average costs increased substantially from 2006 because of the move to full-cost funding: in 2008 the respective figures for Standard Research Grants were £413,838 and £324,703: the cash difference remains in line with the overall figures, though the proportional difference is reduced as a result of a larger overhead element.
The ICT Methods Network Award was £1,037,382.
The Award amount for the ICT Strategy Projects is £979,364.
The Award Amount for e-Science Workshops is £65,498.
The Award amount for Research Grant (e-Science) is £2,014,626.
The details of all of the individual value of the grants is obtainable from

There is also a review, by David Robey, of the AHRC ICT Programme here.

The Andrew W. Mellon Foundation funds a large number of projects in the Humanities which have a digital component. Their Scholarly Communication and Information Technology strand of funding paid $30,870,567 to projects in 2010 (gleaned from their grant report). I'm asking about previous years, as their reports work differently across other years.

Lisa Spiro identified 134 courses in Digital Humanities worldwide in her 2011 paper at DH. This can be compared to the 9 institutions offering courses in 1999, from McCarty and Kirschenbaum's overview of Humanities computing units and institutional resources in 1999, which I had to claw from the Wayback Machine. This needs a closer look, to see what comparisons can be made.

Mark Sample has been charting the growth of Digital Humanities sessions at MLA over the past few years. 2010: 27, 2011: 44, 2012: 58/753.

Alastair Dunning suggests we could look at for some more project information, although I think that takes some digging to get some stats.

There is the uber-cool Network Visualisation of DH2011, created by Elijah Meeks, which could be used to generate some useful stats, such as where everyone who came to DH2011 came from. I have used it above in this post.

Centernet - an international network of Digital Humanities centres - "has over 200 members from about 100 centers in 19 countries". I can't see any easy way to get an exact figure from the listings, so I emailed them: there are currently 167 digital humanities centres that are members of Centernet, and "we get another couple every few weeks or so". Neil Fraistat has added: "Member centers come from 26 countries and currently 247 folks are on its listserv". Karen Dalziel pointed me to the Google map which plots all these centres. It should be relatively easy to export the KML file and datamine it to get the co-ordinates if you want to plot them in any other software.
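Here is a minimal sketch of that datamining step in Python, using only the standard library. It assumes the map has been exported as an uncompressed KML 2.2 file whose centres are `Placemark` elements with `Point` coordinates (a KMZ export is a zip archive and would need unzipping first); the sample placemarks are invented for illustration. KML stores coordinates as longitude,latitude[,altitude], so the function swaps them into the more familiar latitude/longitude order.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace used by most KML exports.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_coordinates(kml_text: str) -> list[tuple[str, float, float]]:
    """Return (name, latitude, longitude) for every Placemark with a Point."""
    root = ET.fromstring(kml_text)
    results = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = placemark.findtext(".//kml:Point/kml:coordinates", default="", namespaces=KML_NS).strip()
        if not coords:
            continue  # skip placemarks without point coordinates
        # KML order is longitude,latitude[,altitude]
        lon, lat = (float(part) for part in coords.split(",")[:2])
        results.append((name, lat, lon))
    return results


# Invented two-placemark sample standing in for a real export.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example Centre A</name>
      <Point><coordinates>-0.1340,51.5246,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Example Centre B</name>
      <Point><coordinates>3.7250,51.0500,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

if __name__ == "__main__":
    for name, lat, lon in placemark_coordinates(SAMPLE_KML):
        print(f"{name}: {lat:.4f}, {lon:.4f}")
```

The resulting list can be written out with the `csv` module and loaded into most mapping or statistics packages.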

Kristel Pent has pointed out that there is another list of centres of Digital Humanities on the ALLC pages, and these should be cross-referenced with the list on Centernet.

Bethany Nowviskie has posted over at DH Answers that there were:

28,837 unique visitors to DH Answers in the first year
from 164 countries
969 registered DH Answers users
contributing 1387 posts on 223 topics

And here are the numbers of Humanities Computing / Digital Humanities-related sessions held at MLA -- by count of the ACH:

1996: 34
1997: 34
1998: 45
1999: 42
2000: 59
2001: 55
2002: 44
2003: 36
2004: 39
2005: 37
2006: 48
2007: 56
2008: 65
604 panels over 13 annual conventions
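Adding up the yearly counts as listed is a useful sanity check. This short Python tally, using the figures above, gives 594 sessions over the 13 conventions rather than 604, so either one of the yearly figures or the quoted total may contain a typo. It is worth checking against the ACH's original count.

```python
# Humanities Computing / DH-related MLA sessions by year, as counted by the ACH.
MLA_DH_SESSIONS = {
    1996: 34, 1997: 34, 1998: 45, 1999: 42, 2000: 59, 2001: 55, 2002: 44,
    2003: 36, 2004: 39, 2005: 37, 2006: 48, 2007: 56, 2008: 65,
}

total_sessions = sum(MLA_DH_SESSIONS.values())
conventions = len(MLA_DH_SESSIONS)
mean_per_convention = total_sessions / conventions

print(f"{total_sessions} sessions over {conventions} conventions "
      f"(about {mean_per_convention:.1f} per convention)")
```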

Fred Gibbs did a lovely post categorizing Definitions of Digital Humanities, which would make a nice pie chart.

Dave Beavan suggested looking at the numbers of people contributing to the Day of DH, but the server always seems to be down. I'm emailing real people instead.

Peter Organisciak has provided me with the stats re those who registered for the Day of DH over the past few years:
Day of DH 2009: 103 registered (83 participated)
Day of DH 2010: 154 registered
Day of DH 2011: 244 registered
There are some interesting visualisations of the Day of DH up there, too.

James Cummings tells me there are 700 subscribers to the Digital Medievalist Discussion List. Further details: DM-L started in 2003. On 15th Jan 2005 there were 306 members; on 28th April 2010, 537 members; on 27th August 2011, 672 members; and on 5th December 2011, 700 members.
There are also 584 followers of the twitter account.
James also gave me access to the user statistics of the Digital Medievalist website:
in 2011 there were 16,808 Visits, from 12,763 Unique Visitors, with 35,546 Pageviews. 25% were returning visitors.

James also gave me access to the stats for the Text Encoding Initiative's website:
in 2011 there were
176,469 Visits from 107,320 Unique Visitors, with 537,750 Pageviews. 40% were returning visitors.

Syd Bauman tells me there are currently 949 people subscribed to TEI-L.

Gabriel Bodard tells me there are currently 374 subscribers to the Digital Classicist email list.

Leif Isaksen has told me the subscriber counts to Antiquist discussion list:
Aug 2006: 3
Jan 2008: 180
Jan 2010: 264
Dec 2011: 330

@DHNow has 2676 followers on twitter. @Dancohen has told me that last month the DHNow website had 14.5K visits from 5K unique visitors, and 48K page views.

@DHquarterly has 688 followers on twitter. A quick look at Google Analytics tells me that in the past 6 months (since Google Analytics was switched on in DHQ) there have been 23,636 visits from 15,547 unique visitors who have looked at 52,370 pages in total (an average of just over 2 pages per visit). There have been visits from 137 different countries.

@LLCJournal has 513 followers on twitter.

Boone Gorges provided me with some stats about the code behind Anthologize (code lines, files, and commits): another great way to measure intellectual investment in DH! (It was Bethany Nowviskie who suggested this, actually.) There are 61 files, with 8693 lines of code and 1722 comments. That's some programming.

Ray Siemens has provided me stats about the Digital Humanities Summer Institute:
DHSI people stats: 2012 (300+ confirmed so far), 2011 (230), 2010 (180), 2009 (150), 2008 (125), 2007 (115), 2006 (95), 2005 (80), 2004 (75), pre-2004 (35-40 at each offering). 540 twitter followers @DHInstitute, 1429 members on the email announcement list.

Elisabeth Burr has told me that the European Summer School "Culture & Technology" at Leipzig had 85 students in 2009 and 2010, and 21 lecturers, from 21 different countries (Brazil, Burkina Faso, Germany, Finland, France, Great Britain, India, Ireland, Israel, Italy, Canada, Netherlands, Austria, Poland, Serbia, Spain, Turkey, Ukraine, Hungary, USA, Cyprus).

Julianne Nyhan has provided me with stats about the journal Computers and the Humanities (commonly known as CHum), which ran from 1966 to 2004. There were 1244 papers published in total: 811 single-authored and 433 jointly authored.

What other facts and figures exist about DH? I'm looking for other stats that "exist" - i.e., don't tell me "count all the projects that say they are DH!" - erm, yeah. Tell me the count, and how you worked it out. Otherwise it's a research project, rather than a "grab the existing stats" thing, which we should, as a community, be able to do, right? Right?

I'll ask the twitters, and DHanswers, but if you tweet me or message me or leave a comment here, I'll update the list, above. Maybe someone will make that infographic...

Update: Desmond Schmidt did an analysis of the jobs posted to Humanist:
"There have been a lot of advertisements for jobs lately on Humanist.
So I used the Humanist archive to do a survey of the last 10 years.
I counted jobs that had both a digital and a humanities component, were
full time, lasted at least 12 months and were at PostDoc level or higher".

2002: 11
2003: 6
2004: 15
2005: 15
2006: 18
2007: 24
2008: 27 (incomplete - 1/2 year)
2009: 36
2010: 58
2011: 65 so far

Breakdown by country:
US: 133
GB: 65
CA: 35
IE: 18
DE: 13
FR: 8
IL: 3
NO: 2
NL: 2
ES: 2
AT: 1
AU: 1
BE: 1

Normalised by population (jobs per million inhabitants):
IE: 4.0
GB: 1.051779935
CA: 1.038575668
US: 0.433224756
NO: 0.416666667
IL: 0.405405405
DE: 0.158924205
FR: 0.127795527
NL: 0.121212121
AT: 0.119047619
BE: 0.092592593
AU: 0.04587156
ES: 0.043572985
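The normalised figures appear to be jobs per million inhabitants: dividing each country's job count by its normalised value gives a population in millions (18 ÷ 4.0 = 4.5 for Ireland, 65 ÷ 1.0518 ≈ 61.8 for Great Britain). The Python sketch below reproduces the ranking from the counts. The populations in it are back-calculated from the post's own numbers in exactly this way, not taken from an independent census source, so treat them as an assumption.

```python
# DH job adverts per country (Desmond Schmidt's Humanist survey, 2002-2011).
JOBS = {
    "US": 133, "GB": 65, "CA": 35, "IE": 18, "DE": 13, "FR": 8, "IL": 3,
    "NO": 2, "NL": 2, "ES": 2, "AT": 1, "AU": 1, "BE": 1,
}

# Populations in millions, ASSUMED: back-calculated from the post
# (job count / normalised value), not taken from census data.
POPULATION_MILLIONS = {
    "US": 307.0, "GB": 61.8, "CA": 33.7, "IE": 4.5, "DE": 81.8, "FR": 62.6,
    "IL": 7.4, "NO": 4.8, "NL": 16.5, "ES": 45.9, "AT": 8.4, "AU": 21.8, "BE": 10.8,
}


def jobs_per_million(jobs, population_millions):
    """Divide each country's job count by its population in millions."""
    return {code: count / population_millions[code]
            for code, count in jobs.items() if code in population_millions}


rates = jobs_per_million(JOBS, POPULATION_MILLIONS)
ranking = sorted(rates.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for code, rate in ranking:
        print(f"{code}: {rate:.3f}")
```

Swapping in population figures from a census source of your choice would change the values slightly, but the approach stays the same.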

You can also see the Digital Humanities job trends from but the percentages are so small I'm not sure it's statistically worth including.

Friday, 25 November 2011

Computer Games and author lists

One of the more unusual titles on my list of publications, one that doesn't seem to fit in with my previous trajectory, is
Gooding, P and Terras, M (2008) ‘Grand Theft Archive’: a quantitative analysis of the current state of computer game preservation. The International Journal of Digital Curation, 3 (2). PDF.
It's an interesting one, this: trying to articulate the extent of the large-scale loss of the early years of gaming history (particularly in the UK), and highlighting games' vulnerability. We did a quantitative case study, trying to get hold of copies of known games through every channel possible, which highlighted the inadequacies of even the available metadata. There is clearly a PhD study in this, should anyone want to take it forward. I also think it's important work that needs doing sooner rather than later. (For those interested in this area, do also see the Preserving Virtual Worlds project, although they didn't cite our previous work. Grump.)

So that's the topic, but there's an even more interesting point to be made about this paper. I'm second author, because it was the work of one of my Master's students at the time, Paul Gooding. Paul was on the MA in Library and Information Studies at UCL. I usually supervise around 10 student dissertations a year, previously from Electronic Communication and Publishing, and this year I'll mostly be supervising the cohort from the MA/MSc in Digital Humanities, which is running for the first time this session. I really enjoy supervising our master's students: most are really very bright, driven, and dedicated. I don't have to supervise any librarians or archivists, but occasionally I choose to take on a few extra students from these programmes for their dissertations, both to help my colleagues in LIS and to work on interesting digital topics in those areas. Paul approached me with this topic, we worked up a methodology, and he did the leg work and the write-up. I then encouraged him to submit this to a journal, and I spent a few days turning his master's dissertation into the published paper you see here.

The question is, then: when is it OK to be named as author on work which emanates from a student dissertation? When should you leave it, and say, "this is the student's work"? This is a huge issue in graduate studies, and one I tread very carefully in. I have heard tell of, and have had, colleagues who insist on having their name as first author on anything their research groups publish, even when they haven't had anything to do with the research in question. (I called one of them out on it for being morally wrong: he didn't answer any email from me for a year, which was slightly problematic, given he was senior to me and had to sign off on various things.) What makes me want to put my name as co-author on this paper? Why haven't I published more with students? Why are the guidelines on this all so woolly? Why do some colleagues insist on having their names on papers when they haven't been involved in them? Why do some students feel so maligned by their supervisors when they ask to be included on an author list, even when the supervisor has done huge amounts of work on their project? It's such a touchy subject. And actually, this touchiness carries on throughout interdisciplinary projects: publications and named authors are often the sticking point. (I'd advise anyone to look up Ruecker and Radzikowska's work on project charters: they say, decide all this at the start of a project, not at the end.)

With this paper, I could say, hand on heart, that it would not exist without my continued work and input. The dissertation itself was based on a methodology I devised, and I worked very closely with Paul to undertake the study. The paper itself, whilst based on Paul's dissertation, required rewriting: it would not have got to this stage without my time and effort, and prior knowledge regarding what journals expect and want. I have no qualms, therefore, in having my name as second author on this piece: I did the work. As far as I know, Paul is delighted that this paper got published. But I've supervised a whole lot of stuff - some of which was of publishable quality, some not, some that made it to publication, some not - that I would never, ever, ask to be second author on. I have witnessed at first hand colleagues who do not have my scruples.

After a few years out in the real world, Paul is back with us! He is heading into his second year of PhD study, supervised by Claire Warwick (first supervisor) and myself (as secondary supervisor), looking into large scale digitisation initiatives, particularly doing some user studies on the British Library's digital collections. It's a great project, and I'm glad he's come back to do some further study with us.

I'll continue to tread carefully about author names and publication, though, particularly when graduate student work is involved.

Incidentally, the journal that this is published in, The International Journal of Digital Curation, is open access: all articles are available for free. It's a good read.

Monday, 14 November 2011

Should We Just Send A Copy? On Digitisation... and the Mona Lisa

In late 2008 I was asked to give a couple of plenaries/big guest lectures the next summer: one for the Digital Humanities Summer Institute, and one for the Art Libraries Society (ARLIS) 40th Anniversary Conference. I was getting a bit bored of standing in front of a powerpoint talking through bullet points of my research, and wanted to do something a bit more exploratory. But what to do? I had my antennae up for clues.

Around that time, Europeana - the online digital library - launched, and promptly fell over. Many people on the site were searching for "Mona Lisa" when it crashed.

Around that time, I was watching a lot of documentaries about art history on BBC4. I watched one by the critic Robert Hughes called "the Business of Art" (also called the Mona Lisa Curse in some listings), where he traces the obscene growth of the art world to the trip the Mona Lisa took to New York and Washington in 1963. 1,600,000 visitors – more than 30,000 viewers per day – filed past the painting. In particular, he noted that Andy Warhol - then a struggling artist yet to have his major breakthrough - refused to join the hordes queuing up to see it, remarking
“Why don’t they just have someone copy it and send the copy? No-one would know the difference.” (Hughes 2006, p. 223)
Put two and two together and what do you get? An overview paper on some of the issues on digitisation, use, and usefulness. What are we doing when we create digital surrogates of cultural and heritage material? What are they for? Should we just send a copy?

You can see me give the plenary in 2009 at the DHSI summer school on YouTube (no, I haven't watched it myself). And here is the resulting paper. It's a gallop round the houses, but I am very fond of it:
Terras, M (2010) Should we just send a copy? Digitisation, Use and Usefulness. Art Libraries Journal, 35 (1). PDF.
Reference: Hughes, R. (2006). Things I Didn't Know. New York: Knopf. I've heard this quote repeated elsewhere, but this is the only source I can find. It may, indeed, be apocryphal.
The above image is Andy Warhol's "Thirty are better than one".

On missing out

When you choose to go on leave from a University for a year, you make the choice to miss certain things. Meetings about your research. Team meetings about your centre. Grant writing sessions for projects you were previously on. Guest lectures. Project Launches. Having access to physical resources such as libraries. Hanging out in the pub with colleagues and students. Coffee. Lunch. Bumping into people in the corridor, and just saying hi. I didn't mind missing any of these things, really, when I was on maternity leave: it was my choice to have children, and to take almost the maximum (very generous) maternity leave I was granted from my job. It was all good. Except one thing.

The one thing I missed when I was on leave was the PhD Viva, and resulting celebrations, of my first PhD student to go through the system under my watchful eye: Ernesto Priego. I was Ernesto's secondary supervisor (Claire Warwick his first), on his PhD in web comics, where he researched

the impact of analog and digital technologies on comics... debating the manner in which theories of materiality illuminate the media-specificity of comics, webcomics, mobile comics apps, and how comic book culture fits within current debates about the future of the book.

Ernesto passed back in the spring, and I've watched on the twitters as he does corrections, hands in, gets things bound up, and gets the final letter through. Well done, Doctor Ernesto! Sorry I wasn't there to help drink champagne. I know you know it was impossible for me to come into the city at that time. It really was the only thing I regret missing over the year that I was on leave.

Ernesto initiated and co-organises The Comics Grid, a web-based international collaboratory of comics scholars. You can also find him on the twitters, @ernestopriego. And tomorrow he is coming to see me, to catch up, and to start doing a little research assistant work for me on a Super Top Secret Project which we can't talk about until the Spring. I'm looking forward to it.

Which leads me to think... I have ten PhD students at the moment. A lot of my new research is focussing around what they are doing, and I'm having tremendous fun working with them, and colleagues across UCL who are also supervising. With their permission, I'm going to start blogging about the work that we are doing right now.

But Ernesto, as honorary PhD student (do we ever leave our PhD supervisors?) is the first to be mentioned on here.

The image above, btw, is taken from my autographed copy of PhD Comics. Ernesto took Jorge Cham on a tour of UCL and they swung by my office. It was like a medieval scholar meeting Chaucer, and the look on Ernesto's face, to be chatting with one of the people he had studied in depth, was one of my favourite memories of supervising Ernesto's PhD.

Monday, 7 November 2011

The Birth of TEI By Example

Sometimes academic projects come about due to a combination of necessity and wishful thinking. TEI by Example was a product of needing to fill a gap in funding, whilst wishing something into existence that I wanted someone else to do in the first place.

I've talked before about my pop-stack of ideas that "need further work!". As well as that, I have a pop-stack, or wishlist, of Things I Wish Existed That Would Make My Academic Life Easier. One of these was an online set of tutorials to teach TEI.

The Text Encoding Initiative guidelines, as we all know, are one of the bedrocks of Digital Humanities. If you want to be in DH, you need to know XML, and TEI, and markup. Which means it has to be taught. Even if, like me, you are not terribly interested in markup per se, and you never use it in your research, and you are not part of the TEI community, you will still need to cover it at some point in a class in Digital Humanities.

Now, there are some excellent people within the TEI doing some excellent teaching. There is some good teaching stuff available online. But what I needed, really, was some point and click tutorials that I could direct my masters students to after an introductory lecture on TEI. When learning code, it's the done thing that you look at and play with examples of code. Where oh where oh where was TEI by example? Where were examples of marked up texts people could see to learn from?

In late 2005 I was chatting to Edward Vanhoutte about how things were going with his team at the Centrum voor Teksteditie en Bronnenstudie, Ghent. Unlike me, Edward and Ron Van den Branden mark up texts all the time, and have considerable expertise in the markup of correspondence material. Edward was saying there was a gap in funds coming up for Ron, as they would be between projects, and he needed to find a couple of months of work to pay him. Easy, I said. Apply to the Association for Literary and Linguistic Computing for a small grant, to build a set of tutorials for teaching TEI. Call it TEI by Example.
So we did. And we got the money, and we did.

It will take two months! we said. We started in 2006. Now, given I don't really do TEI, I couldn't really write the tutorials. Edward took that on, but he was in the phase of doing a lot of childcare with very young kids (and this was above and beyond our day jobs). Ron built the infrastructure, did a fantastic job sorting out the quizzes, and set to building the online validator. I chivvied and tested and harangued. I was asked to give a plenary about the project at TEI@20: 20 Years of Supporting the Digital Humanities Conference, University of Maryland, in November 2007. It will be finished by then! we thought. Of course it wasn't. I gave the plenary when I was 8 weeks pregnant with the bump that turned out to be The Boy. (I can't recommend flying across the pond to give a plenary when the only thing you can stomach is salt and vinegar crisps. I was terrified that customs would take away my stash, and what would I eat when I was there? I also can't recommend giving a plenary when you think you are going to barf over the front row's shoes. But I digress.) We worked on the tutorials some more. 2008 came and went, with me mostly on maternity leave. Then Ron also joined the parenting club! Hurrah! TEI by Example was finally born in July 2010, a mere 4 years late. Some gestation that was. It's like the closing credits of Toy Story 3, when they list all the kids born during the production phase.

It's also worth saying that we encountered a fair bit of resistance to TEI by Example from the TEI community. Folks were not interested, in general, in giving us access to fragments of their code. Promises were made and not kept, emails not replied to, snark was thrown on mailing lists. Who were we to build something beyond the TEI community! Who did we thinks we was!

Meanwhile, TEI by Example has been a quiet success. As Edward said in his plenary to this year's TEI meeting:
Fifteen months after the launch of the tutorials, the site has attracted close to 30,000 unique page views with 1,900 unique views for the modules on primary sources and critical editing together. The statistics and logs show that users are finding their way to the tutorials directly, via Digital Humanities courses or via the TEI website and we see that there is high activity from the US, Germany, the UK, France, and Canada: not surprisingly countries with a high digital humanities and digital editing profile. And we're particularly proud of our single visit from Vatican City. 18% of the visitors stay for more than 15 minutes on the site, which suggests that they really do some work. We also see a decent amount of returning visitors.
We hear it's being translated into French. We know people within the TEI use it in their teaching. We're getting positive comments about it, and relationships with certain folks have improved dramatically. And I can sit my students in front of it for an hour, after an introductory lecture about TEI. Well, I will do, when I return from my second maternity leave break to resume teaching next year.

As for the resulting paper? Here it is - my plenary which explains why we needed to go down this route.

Terras, M and Van den Branden, R and Vanhoutte, E (2009) Teaching TEI: The Need for TEI by Example. Literary and Linguistic Computing , 24 (3) 297 - 306. 10.1093/llc/fqp018. PDF

What happens when you tweet an Open Access Paper

So a few weeks ago, I tweeted and posted about this paper
Terras, M (2010) "Digital Curiosities: Resource Creation Via Amateur Digitisation". Literary and Linguistic Computing, 25 (4) 425 - 438. Available in PDF.
I thought it worth revisiting the results of this. Is it worth me digging out the full text, running the gamut with the UCL repository, and trying to spend the time putting my previous research online? Is Open Access a gamble that pays - and if so, in what way?

Prior to my blogging and tweeting about the paper, it got downloaded twice (not by me). The day I tweeted and blogged it, it immediately got 140 downloads. This was on a Friday: on the Saturday and Sunday it got downloaded, but by fewer people; on Monday it was retweeted and it got a further 140 or so downloads. I have no idea what happened on the 24th October (someone must have linked to it? Posted it on a blog?), but there were a further 80 downloads. Then the traditional long tail, then it all goes quiet.

All in all, it's been downloaded 535 times since it went live, from all over the world: USA (163), UK (107), Germany (14), Australia (10), Canada (10), and the long tail of beyond: Belgium, France, Ireland, Netherlands, Japan, Spain, Greece, Italy, South Africa, Mexico, Switzerland, Finland, Denmark, Norway, Sweden, Portugal, Europe, UAE, "unknown".

Worth it, then? Well there are a few things to say about this.
  • I have no idea how many times it is read, accessed or downloaded in the journal itself. So seeing this (500 reads in a week!) makes me think, wow: people are reading something I have written!
  • It must all be relative, surely. Are 500 full downloads good? Who can tell? All I can say is that it puts it into the top 10 - maybe top 5 - papers downloaded from the UCL repository last month (I won't know until someone updates the webpage with last month's stats).
  • If I tell you that the most accessed item from our department ever in the UCL repository, which was put in there five years ago, has had 1000 full text downloads, then 500 downloads in a week ain't too shabby. They didn't blog or tweet it, it's just sitting there.
  • There is a close correlation between when I tweet the paper and the downloads.
  • There can be a compulsion to start to pay attention to stats. Man, it gets addictive. But is this where we want to be headed: academia as X-factor? Hmmm.

Ergo, if you want people to read your papers, make them open access, and let the community know (via blogs, twitter, etc.) where to get them. Not rocket science. But worth spending time doing. Just don't develop a stats habit.

I'll feature the next one from my back catalogue, shortly...

Update 08/11/11: As a result of posting this, and this post getting retweeted far and wide (thanks all!) the paper got downloaded a further 120 times. See? See?

Update 08/11/11: The UCL stats page for downloads last month has now been updated: this was the 5th most downloaded paper in the UCL repository in October 2011. Yeah, I'm up there with fat tax, seaworthiness, preventative nutrition, and the peri-urban(?) interface. I'm not sure how many papers in total there are in the repository - I can't find that stat - but searches for "the" and "a" both bring back 224,575 papers, if that is anything to go by.

Update 10/11/11: The Digital Curation Manager at UCL, Martin Moyle, has been in touch to confirm that 6486 of the 224,575 papers in the repository have downloadable full text attached. And told me where I can generate this stat. Whoops! (Thanks Martin).

Update 10/11/11: After this post, there is the predictable long tail happening with stats. Another 60 downloads on the 8th, 10 on the 9th. It's all quite predictable - yet nice that the paper is wending its way to interested parties!

Update 25/11/11: This post was mentioned in the Times Higher last week, and the paper has now been downloaded 805 times in total.

Full Steam Ahead

The people at UCL Discovery are now talking to me, and things are moving forward. I have lots of things ready to go and in the pipeline - which gives me, I reckon, at least one thing a week to talk about on here for the next academic year. I'm also feeding back usability issues to them (like, NO, don't delete the conference paper just because I published a paper in a journal with the same title! YES, that's me in that record, even though my name isn't capitalised. NO, Melody Terras who works in Social Sciences at the University of the West of Scotland IS NOT ME).

You can tell I'm delighted by this whole state of affairs, huh?

Seriously, the whole process of sorting out my publication record on the institutional servers/system is turning out to be a massive timesink. Previously, I had just been keeping a note of what I had been up to on my old webpage. That has to go, and I have to generate everything from their database. Which is taking me hours to deal with. Progress!

But, as someone once passive-aggressively said to me, "let's move on". Let's talk about what I've been up to - and also, what I'm going to be up to in future. And let's talk about what happens when you tweet something...

Tuesday, 1 November 2011

On thumb twiddling

Well. I bet you are wondering what has happened to my experiment in Open Access Publishing, where I am putting papers up online in our institutional repository, and sharing the best tales behind the papers here.

If it has stalled, it's certainly not my fault. Finding drafts of things has been much easier, so far, than I had imagined - I'm the messiest person in real life, but it turns out I'm pretty organised, informationally. No. The slowness comes from - shall we call it a pipeline?

I'm currently waiting on over ten papers that I have uploaded to go "live" in our institutional repository. I've been waiting on them to go live for a month. I have no idea how the process works. I submit papers: I wait. I get no email to indicate progress. Sometimes the person (and it is a person, they make a note on the record) deletes the file, with no reason given. I upload it again. It gets deleted. I send emails. They are ignored. I send more emails. They get replies from an email address that doesn't give the person's name, just the "institutional repository". I reply to those emails. They are ignored.

And so it goes. Lessons in black-box service provision, if ever there was one. Absolutely infuriating. I can see how people give up on uploading things to institutional repositories. I simply don't have the time to hassle them into providing the service they are supposed to. I am not asking them to do anything difficult, after all: just to mount a file. Italics here means: grrrr.

But I shall plug on. I've started complaining further up the tree - hopefully it will trickle down and eventually I shall have something to show for all my hard work - which they say they want to show off. Hmmmm.

I didn't count on the institutional repository itself being a barrier to making my work available through open access. It means I have actually stopped submitting things. What's the point? I have 100+ more papers to put up there. Why should I waste time submitting things if they are ignored?

I also made the decision to blog once about each research project, and tweet the remainder of the papers that come out. The LAIRAH project, for example, which I blogged about below, also featured other papers that are freely available for download on the institutional repository. I'll list these at the end of this post: they made it through the barrier previously.

But for the more interesting stuff - and what tales I have to tell you! - you'll have to wait until someone (and it is a someone, not an anonymous pipeline, repository, or computer - how we hide behind these terms!) presses the button to make more stuff live. And stops deleting things willy-nilly. Sigh.

Warwick, C., Galina, I., Rimmer, J., Terras, M., Blandford, A., Gow, J., and Buchanan, G. (2009). "Documentation and the users of digital resources in the humanities". Journal of Documentation, 65 (1), 33-57.

Warwick, C., Terras, M., Galina, I., Huntington, P., and Pappa, N. (2008). "Library and Information Resources, and Users of Digital Resources in the Humanities". Program: Electronic Library and Information Systems, 42 (1), 5-27.

Warwick, C., Galina, I., Terras, M., Huntington, P., and Pappa, N. (2008). "The Master Builders: LAIRAH research on good practice in the construction of digital humanities projects". Literary and Linguistic Computing, 23 (3), 383-396.

Update: within 24 hours of throwing my rattles out the pram, things are moving, and we have action. Am pleased to say that finally we are making progress. I'll be able to start posting things again on here next week, as a result. Hurrah!