Tuesday 27 May 2014

Inaugural Lecture: A Decade in Digital Humanities

This is the crux of what I planned to say - or hoped to say! - at my professorial inaugural lecture at UCL on the 27th May 2014. I'm not one for reading off a script though, so I may have deviated, hesitated, or expanded on the night. A video of my talk on the night is now available. No, I haven't watched it myself!

I decided to call my inaugural lecture "A Decade in Digital Humanities" for three reasons.
1. The term Digital Humanities has been commonly used to describe the application of computational methods in the arts and humanities for 10 years, since the publication, in 2004, of the Companion to Digital Humanities. "Digital Humanities" was quickly picked up by the academic community as a catch-all, big tent name for a range of activities in computing, the arts, and culture.  A decade on from the publication of this text, I thought it would be useful to reflect on the growth, spread, and changes that had occurred in our discipline, and my place within them.

2. This year sees me in my 10th year of being in an academic post. I joined UCL in August 2003, my first academic post after obtaining my doctorate, and since then have worked my way up the ranks from probationary lecturer, to senior lecturer, to reader, and now full professor. The professorial lecture gives me a rare chance to pause and look behind me to see what the body of work built up over this time represents, and what it means to be undertaking research in this area.

3. You'll have to wait for later in the lecture to see the third reason...

Who here would be comfortable defining what is meant by the term Digital Humanities? In this, the week of UCL Festival of the Arts, celebrating all things to do with the Arts and Humanities, let's go back to first principles. In UCLDH and 4Humanities' award winning infographic "The Humanities Matter" we defined the humanities as "academic disciplines that seek to understand and interpret the human experience, from individuals to entire cultures, engaging in the discovery, preservation, and communication of the past and present record to enable a deeper understanding of contemporary society." It stands to reason, then, that the Digital Humanities are computational methods that are trying to understand what it means to be human, in both our past and present society. But it may be easier if I give some brief examples to demonstrate the kind of work we Digital Humanists get up to.

One of the easiest things we can do with computers is count things. For data to be computationally manipulated, it has to be in numeric form. If we can get text into a computational form, we can easily count and manipulate the language, showing trends across time. For example, if we take a million words of conference abstracts from my discipline from the ALLC/ACH conference across various years, we can easily see how mentions of one technology (XML) become more popular, while another (SGML) is in decline. Much of the work in DH is in manipulating and processing and analysing text - our iOS app Textal is just part of that trajectory. Much of my work, though, has been in digital images, starting with developing systems to try and read damaged documents from Hadrian's wall, and more recently working on multispectral and 3D manipulation of damaged texts. We've also worked with museums on large scale 3D capture of cultural and heritage objects. The important thing about all of this is that as well as implementation, we're also interested in use and usage of these technologies, and what impact they have on those working in culture and heritage, and the ability to study the past and present human record. We often innovate new systems, or adopt concepts and apply them to humanities projects, such as the crowdsourcing of Jeremy Bentham's handwriting by volunteers, or working with visitors to the Grant Museum of Zoology at UCL to encourage debate about zoological collections. We build, we test, we reflect back on what using these technologies means for the humanities, giving recommendations which can be useful across the sector. From these projects, it's difficult to pin down what Digital Humanities actually is, but that sums up the difficulty of our discipline's title: it encourages thinking about computational methods in the arts and humanities, and then into culture and heritage, in as broad a sense as possible.
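For readers who want to see what "counting things" looks like in practice, here is a minimal sketch of the term-counting idea described above. It is not the code behind the conference abstracts analysis: the function name `term_counts_by_year` and the toy abstracts are invented purely for illustration, and a real corpus would need proper tokenisation and normalisation for corpus size.

```python
import re
from collections import Counter


def term_counts_by_year(abstracts, terms):
    """Count how often each term of interest appears in each year's texts.

    abstracts: dict mapping a year to a list of abstract texts (hypothetical layout)
    terms: the words to look for, matched case-insensitively as whole words
    """
    wanted = {t.lower() for t in terms}
    result = {}
    for year, texts in sorted(abstracts.items()):
        counts = Counter()
        for text in texts:
            # Split the text into lowercase alphabetic tokens
            for token in re.findall(r"[a-z]+", text.lower()):
                if token in wanted:
                    counts[token] += 1
        result[year] = counts
    return result


# Invented toy data, standing in for a real corpus of abstracts
toy_abstracts = {
    1996: ["We encode the corpus in SGML.", "SGML and TEI markup for drama."],
    2004: [
        "An XML pipeline for letters.",
        "From SGML to XML: migrating an archive.",
        "XML, XSLT and the edition.",
    ],
}

counts = term_counts_by_year(toy_abstracts, ["xml", "sgml"])
for year, c in counts.items():
    print(year, "XML:", c["xml"], "SGML:", c["sgml"])
```

On the toy data this shows SGML mentions falling and XML mentions rising between the two years, which is the shape of the trend described above, just at a much smaller scale.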

What made Digital Humanities spring, fully formed like Athena from the Head of Zeus, as an academic field in 2004? Was it because that was the first time quantifiable methods had been used in the Arts and Humanities? (remember - all computational methods require quantification). Well, of course that is nonsense. When you look back across the history of Humanities scholarship, quantifiable methods have been used in the Arts and Humanities since the birth of Universities. If we think of the book as technology, from its inception scholars took it to pieces to see under the hood: concordances and indexes of works were manually created, such as this "Concordance or table made after the order of the alphabet" from 1579 which lists how many times concepts such as "abomination" appear in the New Testament. Or the work of Joseph Scaliger who in the early 1600s plotted the different periods in time in which different civilizations must have existed, through quantifiable methods. Or the work of August Schleicher in the 1850s who showed, by quantifiable methods, that the languages of Europe must have had a common historical root. All of these texts are available from UCL Library, and I don't have to leave my sofa to see any of them because YAY! Digitisation! Changing humanities scholarship! - but the point is that quantifiable methods are part of established methods in the humanities, and have been for as long as the Humanities have existed. So when I undertook my first project at UCL, looking at whether we could use the high performance computing facilities at UCL to analyse historical census data - this was part of a quantifiable humanities academic tradition which harks back 500 years, just at a grander scale.

So what made Digital Humanities spring, fully formed like Athena from the Head of Zeus, as an academic field in 2004? Perhaps in 2004, this was the first time people had used computational techniques in the arts and humanities? But of course, that is nonsense too. When you look back at the history of computing - and not even digital computing, but the very first computer - the very first computer programmer, Ada Lovelace, hints at the possibilities for art, music, and understanding human knowledge and culture in her earliest writings. She understood that there was something more to the mathematical calculations afforded by this machine than science, and they called her a madwoman for it. Well, this madwoman has a (yet unproven) theory that if you look at the history of the first 100 electronic programmable computers in the 1950s, 1960s and 1970s across the world, you will see humanists eyeing them up and asking "how can I use, or develop this tool for use, in my research?" It's certainly true of Father Busa, working with IBM in the 1950s on the concordance of the works of Thomas Aquinas (counting, indexing, and manipulating words, as part of the historical trajectory of humanities methods stretching back 500 years, just a change in scale...) but also of Roy Wisbey, in Cambridge, who set up the Literary and Linguistic Computing Centre there in the 1960s. When the first computers arrived at UCL, the artists from the Slade School of Fine Art were over there like a shot to establish the Experimental and Computing Department. We should also mention Susan Hockey, who led various initiatives in text encoding, text analysis, and digital libraries.
Susan, incidentally, gave me my first academic job here at UCL in 2003: UCL had included a Digital Resources in the Humanities module course as part of its MA offering for librarians and archivists in the School of Library, Archive and Information Studies (now the Department of Information Studies) from 2000, under Susan's auspices. But the point is, considering how best to use computing in the arts and humanities is not something which started in the 21st Century,  nor 2004, and Humanists have been looking at available tools, and how best to use them, since computation began. So when we undertook one of the latest projects at UCLDH, which came from looking at an iPhone, thinking "how can I use, or develop this tool for use, in my research in the Humanities" and developed an iOS app for text analysis, this is part of a longer trajectory of considering available computational tools, and how they may be appropriated, adopted, and adapted for our means in the humanities, just at a grander scale, as processing technologies increase in speed.

So why Digital Humanities, in 2004? Firstly, the coalescing of interested scholars into an identifiable field is an understandable academic response to societal changes. The speed of computing rises, the price of computing plummets, the information available on the internet (and the possibility to create new information) increases, and use and usage of internet technologies becomes commonplace. Remember, it's up to Humanities scholars to look at the past and present record to enable a deeper understanding of contemporary society: quite frankly, it would be more alarming if an academic movement hadn't emerged looking at what using computational methods could do for our understanding of human society, both past and present, and how best we can grab the technical opportunities which fly by and appropriate them for our means, to inform both ourselves and others about the prospects of using computing in this area. The discipline of Digital Humanities is inevitable, and would have appeared whatever the title it was given.

Secondly, Digital Humanities is a handy, all inclusive, modern title which rebrands all the various work which has gone before it, such as Humanities Computing, Computing and the Humanities, Cultural Heritage Informatics, Humanities Advanced Technology... DH has a ring to it, and boy, what a rebranding it was. We tend to call it "Big Tent Digital Humanities" meaning: roll up! roll up! everyone using any computational method in any aspects of the arts and humanities is welcome! but really, Big Wave Digital Humanities may be more appropriate, as we countenance the sudden swell, dissipation, and speed of the activities of the discipline. Taking a peek at the mention of Digital Humanities on Google Ngrams we can see its sudden growth, and the fact that it is now used as a proper noun, with Capital Letters (although remember that this, counting words, is part of a long tradition of humanities scholarship, Google simply have more books to include in their count). We can see how DH has trended over time, appearing in headlines in the media. Many, many textbooks in DH appear, some of which I am responsible for myself. Journals appear, such as Digital Humanities Quarterly (of which I'm one of the general editors), and the ALLC/ACH conference renames itself Digital Humanities (this year, for my sins, I'm the Program Chair for DH2014 which will be held in Lausanne, Switzerland. We have seen over 700 proposals from more than 2000 vying for a space to present). There are many more DH conference presentations and workshop slots, worldwide, year on year. In 2010, I gathered together all the available evidence I could on DH in an infographic called Quantifying Digital Humanities, showing that there were 114 DH Centres in 24 countries. Today, not even four full years later, there are 195 DH Centres in 27 countries.
Those knowing how long it takes to set up a research centre know that this is phenomenal growth in the university and GLAM sector, and that institutional support must be strong, behind each and every one of these.

UCL Centre for Digital Humanities is one of those recently founded centres. We officially launched four years ago to the week of this lecture, in the same lecture hall where this lecture is being presented. We don't talk about the launch much - it's not often I'm part of something at work which ends up featured in the political pages of the newspapers - but you'll have to google that to find out more (YAY! digital media! the internet never forgets!) but in those four years since launch we've undertaken a phenomenal number of projects, covering many aspects of Humanities and Arts research, and considered Digital Humanities in its broadest sense. This isn't all me - there is an amazing team who are part of the Centre, and we've won various awards for our academic projects and collaborations, published many books, papers, and book chapters, and been part of successful funding bids from research councils worth tens of millions of pounds. One wonders what makes a Digital Humanities Centre attractive to universities that don't have one. Nope, I can't see what makes that level of activity attractive, at all.

So what proportion of Humanities scholars are now digital humanists? Back in 2005, participants in the Summit on Digital Tools in the Humanities at the University of Virginia estimated that "only about six percent of humanist scholars go beyond general purpose information technology and use digital resources and more complex digital tools in their scholarship" (p.4 of this PDF). By 2012, N. Katherine Hayles, in her chapter "How we think: transforming power and digital technologies" in David M. Berry's edited text "Understanding Digital Humanities", estimates that 10 per cent of Humanists are now digital humanists (p.59). Now, in 2014, a forthcoming study from Ithaka S+R (with the working title of Sustaining the Digital Humanities: Institutional Strategies beyond the Start-up Phase) includes surveys of faculty at four American universities. In the departments surveyed at each institution, nearly 50% of faculty members indicated they have "created or managed" digital resources. Granted, the departments were chosen by campus staff (often at the library) who felt there was some significant activity taking place there. The percentage of these "creators" was consistent across all universities (Brown, Columbia, University of Wisconsin, Indiana University), and most of the creators also felt that their creation was intended for public use (not just their own research aims), and would require ongoing development in the future.

50% of humanists are involved in digital activity, are digital humanists. How can this possibly be? And how can we conceptualise what it means to be a digital humanist, amongst this spread of activity and range of available technology: is creating or managing digital resources the same as being a digital humanist? At a time where (nearly) every library catalogue is digitised and available online, and (nearly) every book manuscript written on a word processor, and many historical documents digitised and available for consulting from your own sofa, does that make everybody working in the humanities a digital humanist? How can I begin to conceptualise my contribution, and my place, and where my work sits within Big Wave Digital Humanities?

I find it useful here to turn to Rogers' Innovation Adoption Curve, a sociological model that looks at how technology spreads through society. This is a bell curve, and right at the start of adoption of technology are a few innovators, experimenting with (and developing) new technology. These innovators sometimes persuade a larger number of early adopters to take up the new technology on offer, and only once a sufficient mass of users is achieved does the technology "cross the chasm" and become used by the majority of individuals in a society (who are split into an early majority and a late majority). Finally, we have adoption by the "laggards", who are slow in taking up technologies, but do so once they have permeated throughout society. (Hard not to think, here, of my elderly grandmother who recently got her first mobile phone). Now, this model is useful as we can plot along it some of the technologies which are available to a humanist. Things like word processing, and searching for references online, and even looking up the digitised texts which I showed at the start of this lecture: even the technologically laggard humanists can do it now, and although these technologies are changing scholarship, it's a question of scale (better! faster! more!) rather than of approach or technique, in the main. Technically facilitated tasks like updating websites, using and updating wikis, using social media: even the late majority of humanists can do it now. Online tools are available, such as Voyant, which allow you to do text analysis, and manipulate texts to see the underlying patterns: so the early majority of humanists can use these tools should they want to.
But the most difficult, intellectual work of applying technology in the humanities still occurs before the chasm has been crossed, in the phase of innovation, and early adoption, where we are looking at the technologies that cross our path and saying "how can I use, or develop this tool for use, in my research?", much like those in the 1950s or 1960s who were coming across university mainframes and asking how best to apply them in the literary and linguistic arena. It's important to note, of course, that this wave of technology keeps on coming at us, and the place where a technology sits along the curve changes: 20 years ago, had you been making a website for your humanities project, you would have been an innovator, rather than a late majority, and the same holds for word processing 40 years ago. The technology keeps coming: we have to respond to this, innovate, adopt, and see what is useful or useable for, or used by, the majority of people in our discipline.

Now (and this is the most contentious thing I'm going to say in my whole lecture, for those attending who are dyed-in-the-wool Digital Humanists) one of the problems that we have as a movement is that we tend to get caught up and fixated upon a certain technological solution. For example, every DH program I've come across teaches XML - that technology which took over from SGML in the conference abstracts - as the best practice way to encode text. And there's no doubt that XML provides the framework with which we can explore theoretically what it means to describe texts computationally, in such a way that they retain the information in their printed or manuscript form, whilst also providing the means to build and test prototypes. But XML as a technological standard has been around for 16 years, and technology moves on, but DH doesn't seem to be doing so. In many ways, DH's relationship to XML is similar to the AI community's relationship with LISP: the means of computational expression in the language or format suit the questions which need to be asked by the field, so there is no need to use other technologies which come on stream, which may be more efficient from a computational point of view, as we explore what it means to work with our question in this computational way. And that's ok, but we shouldn't be blind to the fact that, hey! technology is advancing all the time and, also, XML is not a technology that crossed the chasm: it may be in use for technical systems, but it's not one that you see a lot of the general populace using. This, in turn, means that DH has permanently hitched its wagon to an aging technology, which is hard to explain to others, including other non-XML humanists, whilst other things are happening in the technological world around us. Just something we have to watch out for, when building teaching programs, or looking at the scope of outputs in our field. We don't want to be left behind as the digital in digital humanities rolls on without us.

I find it useful to plot my research on the Innovation Curve, to see where what I am doing sits. So, the work on counting terms across a corpus - very much sits in the early majority, given the availability of tools to do so. But the work on building an iPhone app to do so - very much innovation: it took a lot of pure programming in a relatively new space to achieve it. The work in image processing I do is either innovation (we are publishing here in pure computer/engineering science venues, as well as in humanities venues, which I'm very proud of), or we adopt technologies our academic colleagues in the engineering sciences have generated and roll them out to a humanities or heritage application. Our work on user studies is something completely different though: here we are generally looking at how the majority of people are using an extant text, or (in the case of something like Transcribe Bentham, or QRator) we are conducting reception studies, where we innovate and build a technology, launch it, and study its uptake across the whole cycle. We can see, then, a range of DH activity across the innovation cycle, but the majority of the work I do is certainly at the start of the innovation curve. Is this where DH sits? I like to think so, but more to the point, I'm confident it's where I sit best, when doing DH.

I need here to show you another curve, though. This time, the Gartner Hype Cycle, which looks at how technologies are launched, mature, and are applied (so people know when to invest). The premise of this is that when technologies are first triggered, everyone thinks they are going to be the Next Big Thing, and so they reach "the peak of inflated expectations", before crashing down into a "trough of disillusionment" when those adopting them realise they aren't that great at all. It's hard work to get technologies up the "slope of enlightenment" where useful, useable applications are found, and few technologies make it to the "plateau of productivity" where they become profitable. It's a useful curve - this year's predictions show Big Data right at the top of the peak, which chimes in with media coverage of how it will solve everything, for example. So where would I put DH as a movement, if I had to, on this curve?

I'd put it at the top. At the top of the Peak of Inflated Expectations. We've got a lot of pressure on us to prove our johnny-come-lately benefit to the world of academia, to demonstrate our worth, to show that the investment made in us over the past few years is worth it (whilst also bringing in further investments in research funding, to meet institutional expectations). After a peak, comes a crash, and we have to be prepared for the tide to turn and the backlash to begin, after the years of media hype and raised expectations. So how do we get to the plateau of productivity of Digital Humanities?

First, I would argue that we have to understand our lineage: that the current manifestation of DH is a logical progression of quantifiable methods used in the humanities for the past 500 years. That the current manifestation of DH is a logical progression of humans wondering what the potential is for applying computational methods to humanities problems, which has been going on in the digital space for the past 60 years. These combined trajectories aren't going away, and despite what funding cuts and media backlash may come at us, it is the role of the digital humanist to understand and investigate how computers can be used to question what it means to be human, and the human record, in both our past and present society. Secure in our mission, we can carry on whatever the storm throws at us.

Second, I would argue we have to ignore naysayers who are unsure about this new Digital Humanities lark (and believe me, there are plenty, even in my own department) and just do good work. The way to demonstrate our worth is through doing good work. We have to keep asking questions about computational methods, computational processes, and the potentials that they offer humanities scholars, as well as the pitfalls, to explore this changing information environment from the humanities viewpoint. It's not just about building websites, or putting information online, it's about innovating and adopting, and questioning, while we build, the ramifications of doing this, the impact on the humanities, the issues using technology raises, and the answers it provides that you couldn't otherwise generate, to do good work in Digital Humanities. I realise this is very Calvinist of me - you can take the lass out of Scotland - but I do see that we have to be engaging with theories and questions of what it means to be doing this work in this way, as well as updating a website or creating a digital file. A continuation of what it means to be a humanities scholar, in the digital space.

I'm not one for looking back, and despite the title, I deliberately didn't want this inaugural to be a survey of all the projects I have undertaken over the past ten years - then I did this, then I talked to that person, then I visited there - but when I look back over the variety and range of projects, publications, and outputs that I've worked on, either on my own, or as part of a team (there's a lot of teamwork that has gone on here) I'm firstly surprised at how much of it there is and the range of topics we've covered, and the opportunities we've pounced on. I see a body of work which explores various aspects of what it means to be applying digital technologies in the humanities space, and facilitates both those in engineering science and those in the humanities to explore issues which are important to them. I've learnt things along the way about the nature of interdisciplinary work, the nature of teams, the nature of the academic publishing and peer review process, the nature of the grant funding process, but I've written about that elsewhere. There are things, also, that I am proud of that are physical rather than purely digital: over the last few years I'm most proud of building the UCL Multi-Modal digitisation suite, which is a shared space between the UCL Library Services, UCL Faculty of Arts and Humanities, and UCL Faculty of Engineering Science, contributing to the infrastructure of UCL in a collaborative endeavour. But what I see here, as a common thread, is that the work I do tends to sit right at the beginning of the technology adoption cycle, aiding and abetting the application of technology within the arts, humanities, and heritage, and I'm comfortable with that. There's a strength in knowing your place, and your remit, and what you do best.

So the third reason for calling my talk "A Decade in Digital Humanities" is that I didn't say which decade we were talking about, and it is time also to look towards the future, and what the next ten years hold for both DH, as the field turns into a teenager, and for me, as I go into my next decade here at UCL. I'm not one for crystal balls, so I'll keep my scrying brief. I see an inevitable fragmentation of the DH community and DH focus - it was never conceived of as a homogeneous entity anyway, and it is the nature of waves and swells that they will dissipate. We'll see (we are already seeing) more focussed groups of scholarly work around, say, Geographical Information Systems and literature, as people specialise and work on specific technologies and specific methods. The technology will keep coming, and it's up to individual humanities scholars to respond to what is appropriate to their research question: the effects of DH scholarship will continue to ripple out across the humanities as technologies go along the adoption cycle, and certain aspects of digital research will just become normal for humanities scholars, as time goes on. But I do see that there will always be a place, right at the start of the technology innovation uptake curve, for specialists in Digital Humanities to sit, watching out for these changing and emerging technologies, setting up pilot projects to experiment with different aspects of these technologies, feeding back recommendations and the potential ramifications for other humanities and engineering scholars and those within the wider cultural and heritage sector, and exploring what it means to be doing humanities research in that area. I'm happy to remain there, and I see that this will remain my place working with other humanists, and engineers and computer scientists, over the next decade.
I'm delighted to be a co-investigator on the doctoral training centre for Science and Engineering in the Arts Heritage and Archaeology, which is the EPSRC's largest ever investment in Heritage Science, and for the next 8 years we'll be training up a range of doctoral students in this cross section of the arts, heritage, humanities, and engineering and conservation science. (Perhaps what I really do is Heritage Science, but that's another talk entirely, and DH has work to do with the Heritage Science community in future). That said, we do have work to do, in making sure people know about the successes, outputs, and impacts of DH work. Given the expectations foisted upon us, we have to learn to be more vocal about our objectives, our remit, and our results. It's our job to be thinking what it means to use digital technologies in humanities research, and just research, full stop. As a result, our insights can benefit a range of other fields, if we communicate them effectively.

Digital technologies are not going away any time soon: and although DH has had a rapid swell, it will remain essential that we investigate, use, and experiment with technologies over the coming decade. There is a new Companion to Digital Humanities coming out in late 2014, showing how the technologies used in humanities research have developed since the first edition (I'm delighted to have written a chapter on our public engagement work for it), and our field, as well as knowing where we have come from, has to understand that the technological wave on which we sail is continually on the move. I hope I've shown here that our uptake of technologies in the humanities is, and will continue to be, a moving target, and that as part of a longer trajectory of investigation into humanities methods, DH is a modern but necessary, and even inevitable, part of the Humanities, and even computational, landscape. I look forward to what adventures the next Decade in Digital Humanities holds. There is so much to do!

Now, that is where I'd normally pause and say thank you for your attention, but hey, it's my inaugural, so I'll cry if I want to. I have a few brief thanks to make - it's quite a lick to go from probationary lecturer to full prof in ten years, and so I have to thank those who have supported me. Thanks go to my family up in Scotland for all their support, and my family of my own: many of you know that in the past few years I've had three children, so biggest thanks of all go to my husband Os, aka Expert Sleepers, for his forbearance and baby juggling skillz. I've been blessed with an amazing support network of friends, who have supported me enormously over this period. My first academic supervisor was Professor Seamus Ross, who kick started my interest in this area, and his support and interest at the start of my career really set me up for the work I do today. Likewise, my PhD supervisor Professor Alan Bowman remains a fantastic mentor: thank you, Alan. My other PhD supervisor, Professor Sir Mike Brady, made me promise (when I got my doctorate in engineering) not to go near any nuclear power stations or bridges, a promise I have kept - thanks Mike. I've already mentioned that Professor Susan Hockey gave me my first academic job: but her work remains an inspiration on what is possible in computing in the arts and humanities. I work with an amazing team of people at UCLDH and I thank them for their input both for the centre and on our various projects. Special thanks go to Rudolf Ammann, our designer at large, who helped prepare the graphics for this lecture.

But in this week of UCL's Festival of the Arts and Humanities, it's good to pause and see how embedded Digital Humanities research is now throughout college, and how much we work, in the Humanities, with those around us. The projects I've shown, albeit briefly, today, are carried out in league with various other faculties (UCLDH reports to both the Arts and Humanities and Engineering Faculties here). Colleagues come from a range of different departments including not only those across the Arts Faculty, but the Bartlett Centre for Advanced Spatial Analysis (in the UCL Bartlett Faculty of the Built Environment), and across the UCL Faculty of Engineering (I have joint projects with Medical Physics, Computer Science, and Civil, Environmental, and Geomatic Engineering). We are dependent on input from both our colleagues in UCL Library Services, and UCL Museums and Collections, and work very closely with items in all the collections across college. The success of DH at UCL is then dependent on the institutional context we have here. Digital Humanities is now embedded into college life at UCL, and in this week of the Festival of the Arts, my final thanks go to UCL as a community for its institutional support in encouraging us to ride the DH wave: for without being at UCL, my decade in digital humanities would have been completely different.

Saturday 24 May 2014

Roy Wisbey, and Literary and Linguistic Computing, 1965 style

I recently got in touch with Professor Roy Wisbey, who set up the University of Cambridge's Linguistic Computing Centre in 1960, to invite him to my inaugural lecture. He is not able to attend (but passes on his regards to those who know him!) and he also briefly loaned me this newspaper article, from 24th September 1965, from the Cambridge News. A very early piece of Humanities Computing history! It's in very fragile condition - I've spliced it together here to give the whole piece in one image (and the blog stylesheet is not my friend here - will sort out later - but...) - enjoy!

The use of computers will save the scholar years of mindless drudgery! indeed!

Friday 16 May 2014

Siberian Digital Humanities Adventure

The Siberian Federal University
Greetings from Krasnoyarsk, Siberia, where for the past week I've been hanging out at the Siberian Federal University, the largest university in the Siberian region, which is in the top rankings in Russia. I've been giving some guest lectures on digital humanities, meeting various staff and students, and plotting with them on how to support their work and how to make connections to the wider digital humanities community.

How did I end up here? It's all down to the wonderful Inna Kizhner who approached me nearly two years ago, in my guise then as secretary of what is now the European Association for Digital Humanities. After helping source some teaching materials, in English and Russian, for their taught courses, Inna remarked to me "no-one ever comes to Siberia..." and I immediately said "ask me!". And finally, after much preparation, here I am.

Siberian Federal University are establishing a solid Digital Humanities presence. In the Institute of Humanities they currently offer digital humanities modules at both undergraduate and postgraduate level, and also an undergraduate module in the subject area of digital history (which next year will be taught by Inna). They have a digital lab (door sign, above!) and a digitisation lab. They have a range of projects they have been working on with both researchers and students, many of them led by Maxim Rumyantsev who is now the university's deputy head, so there is positive institutional support here. These projects are mostly in the area of multimedia and digitisation. For example, working with the Museum of Geology of Central Siberia to create the simply stunning companion to their minerals collection (it is no easy task to capture minerals in this detail, at this quality); capturing, virtually exploring, and explaining regional heritage architecture (which is fast disappearing under new developments in this region) from the nearby town of Yeniseisk; documenting regional art shows and youth art shows; capturing high resolution images of the art contained within the Surikov Museum (life size copies of which adorn the university's walls at every turn); working with Gigapan capture methods and the State Russian Museum to create zoomable images of large art works (can you spot Pushkin?); and creating an interactive model of the Siberian Federal University campus itself. They are keen, now, to be making connections with others across the world, and I'm delighted to be helping them, and introducing them to various figures, and associations, in Digital Humanities. There is much work to be done, we have plans set out, and they are keen to make new relationships and new collaborations.

It's not all been work! I've been welcomed into colleagues' homes for meals (often meeting their families), treated at friendly restaurants (the food is wonderful), and toured round museums and supermarkets (Inna patiently put up with me pointing and exclaiming at various products we don't have in the UK, such as dried fish, and tinned horse). Today we went to the Krasnoyarsk Dam, 30km upstream from the city, on a glorious spring day which showed off this remarkable feat of engineering (which is so exceptional it features on banknotes across Russia). There is a heavy security presence, and no photos allowed, but I did manage this sneaky selfie...

It's been a fantastic trip, and I've been made very welcome here. Thanks to Inna, Maxim and Marina for their hospitality, and I look forward to further opportunities, visits and introducing anyone who wants to be introduced (if I can be of help, drop me an email and I will forward it on). I have to admit I was nervous about my trip here - but instead of stress I've found friendly connections, and much opportunity to help further establish DH in this region, and throughout Russia. Now to pack, and begin the long trip home, where my three small boys are missing their mummy on the other side of the world (and I them). до свидания!

Thursday 15 May 2014

Digitisation's Most Wanted

What are the most commonly accessed digitised items from heritage organisations? Even asking the question leads to further understanding about the current digitisation landscape.

Have you seen this Dog? Last spotted on the Flickr account of the National Library of Wales. Dog with a Pipe in Its Mouth, Taken by P. B. Abery, 1940s.
Last month, at a meeting at the National Library of Scotland, an interesting fact flew by me. The NLS has hundreds of thousands of digitised items online, so what do you think is the most popular, and most regularly accessed and/or downloaded? (It is difficult to make the distinction regarding accessed or downloaded on most sites.) Is it the original Robert Burns material? The last letter of Mary Queen of Scots? Or any of the 86,000 maps held in this, one of the best map collections worldwide? No. It is "A grammar and dictionary of the Malay language : with a preliminary dissertation" by John Crawfurd, published in 1852. This is accessed by hundreds of people every month - mostly from Malaysia, partly because it is featured on many product pages providing definitions of Malaysian words - demonstrating the surprising reach and potential in digitising items and then making them freely available online, reaching out to a worldwide audience far beyond the geographical locale of the library itself. Wonderful.

This left me pondering... what are the other most downloaded items at major institutions in the UK? So I sent out some feelers, and here are the results, demonstrating both the hidden complexity of the question, and the relationship of digitised heritage content to the current online audience landscape.

At Cambridge University Library, the most accessed collection overall is the Newton Papers, which was the first major digitised collection launched by the Library in 2010, and promoted widely. Within that, there is one particular notebook (which Newton acquired while he was an undergraduate at Trinity College and used from about 1661 to 1665 for his lecture notes) which is the most popular, featuring heavily in the initial promotion of the collection, and also in an In Our Time special series hosted by Melvyn Bragg on Radio 4. But within that notebook there is one page that is accessed more than the others, with most of the traffic coming from Greece. Why? This page was picked up in the Greek press and pointed to on many websites, blogs, newspaper reports, and in social media as evidence that Newton knew Greek. The links that remain still direct thousands of users to view Newton's jottings from his Greek lessons at the front of the book, showing the fascinating relationship between publicity, social media, linkage, and an item which reflects national pride, to a worldwide audience.

The most downloaded items at Cambridge also reflect the rapidly changing mentions of items on social media: in April 2014, an item downloaded/accessed more than 6000 times was the Breviary of Marie de Saint Pol, which went live this month. Why the sudden notice? On the 3rd of April, one of the Cambridge colleges with thousands of followers posted a link to it on Facebook, followed by the Cambridge Digital Library Facebook and Twitter feeds on the 4th of April. Retweeted a few times, these few postings led to the thousands of views of the document, demonstrating the growing importance of using social media to tell people about newly mounted digitised content.

Over at Trinity College Library, the most accessed item from their digital collection in general is the Book of Kells, which again was their first major digitised item, heavily promoted in the press, and attracting a level of viewing that is unique due to general tourism and cultural heritage interest. The second most accessed digitised item is the surprise: a book of lute music by William Ballet, from the 17th Century. There is much discussion of this item, and links to it online, posted by online communities of lute players, and those who blog about lutes worldwide. Interest in, and demand for, an item can therefore be encouraged if interested online communities hear about it, and share with their membership.

A similar tale about the importance of publicity and social media emerges from the British Museum. There are popular items about the Viking exhibition which are linked from their home page at the moment given the current exhibition, but since the 1st January 2014 until now, the most popular item accessed in the digital collection (no, wait, go on, guess.... Rosetta stone? Vindolanda Tablets? ...) is the Landscape Alphabet by Joseph Hullmandel (no? me neither). These were discovered and shared on social media by type enthusiasts on Twitter in mid February, and promoted by the cool-hunter the Laughing Squid who has almost half a million followers on Twitter, which caused a sudden spike (I can't see the British Museum actually tweeting them out themselves on their timeline). However, the initial swell of tens of thousands of hits has since dwindled to nothing, showing the fickleness of attention that comes with the social media stream. In 2013, the single most viewed item at the British Museum was... (go on, guess!)... a lead sling bullet, viewed 42,156 times in total. Why? It was picked up on Reddit, due to the sarcastic inscription "some ancient sling bullets excavated from the city of Athens, Greece were inscribed with the word "ΔΕΞΑΙ" (dexai), which translates to "catch!"" which generated a lot of online LOLs ("Halt gentlemen. Do not yet partake of the feast before us, for I must capture the image of it with instagram whereupon I shalt bequeath it to my herald upon Facebook for all to see." here) and this encouraged - and still encourages - visitors to the British Museum website: some forms of posting on social media generate the long tail of usage more than others.

Things start to get more complicated when various digital asset management systems (DAMS) come into place - often institutions have more than one database of digitised content, from different suppliers, with different licensing restrictions and requirements, and so ascertaining the most viewed single item is not a simple question. Organisations also post and share content in various different places. The National Library of Wales are looking through their DAMS to see which items are the most accessed, but immediately know that the most popular item they hold that has been posted to Flickr (with no known copyright restrictions, contributed to Flickr Commons) is the photograph at the top of this post, Dog with a Pipe in its Mouth, from the P. B. Abery Collection. Again, this is an image which has been mentioned regularly on blogs, social media, and internet chats, as well as being a featured image on the 2013 anniversary of Flickr Commons: the fact that it has no copyright restrictions encourages its reuse - and therefore traffic towards its host institution's site, if those users point back to it - online.

The libraries at Oxford University, including the Bodleian, have been digitising items for over twenty years, and so it is difficult to say what the most accessed or popular items are, due to the way the systems have been designed, implemented and integrated over the past two decades. Their most downloaded or accessed digitised book, scanned in collaboration with Google, is probably the "History of the Scott Monument, to which is prefixed a biographical sketch of Sir Walter Scott" by James Colston (published 1881) - a freely downloadable version is available from its library record (ignore the resellers offering printed versions generated from this at considerable cost on Amazon and eBay!). As far as images are concerned, the most popular at Oxford are among those listed on Early Manuscripts at Oxford University, partly because many of them have been up continuously for twenty years (legacy data for the history of downloads of specific images are not available, indicating how difficult it is to access long term data about this. Server logs get very big very quickly and so are generally periodically discarded, and it is only recently that reporting facilities such as Google Analytics have allowed a quick and easy overview of the usage of websites). Currently popular digitisation projects at the University of Oxford Libraries are the Polonsky Foundation Digitization Project, and the recently launched digitized First Folio of Shakespeare's works, but there isn't sufficient data available from all the digital collections to be able to say one way or the other which is the one most popular project, never mind item. It was also pointed out, though, that you would probably struggle just as much (if not more so) to identify which has been the most requested book in the Bodleian's collections!

This trend of databases complicating the question continues at the British Library, where their digitisation outputs and projects are made available via multiple platforms and viewers, some managed by the British Library, and others by commercial partners, with some content available for free, other content via subscription, or paying a fee per image. These are only some of the most popular different sites: https://imagesonline.bl.uk, http://www.bl.uk/treasures/treasuresinfull.html, http://www.bl.uk/manuscripts/, www.sounds.bl.uk, https://www.flickr.com/photos/britishlibrary/, http://www.britishnewspaperarchive.co.uk/, http://find.galegroup.com/bncn/, http://gdc.gale.com/products/17th-and-18th-century-burney-collection-newspapers/ and the BL module on http://www.biblioboard.com/libraries.html. In addition, there are BL digitisation partnerships with other content providers, for example http://idp.bl.uk/ and http://eap.bl.uk/. Finding out the most accessed digitised item from within this is tricky (but not impossible - they tell me they are looking into it). The fact that they cannot say immediately demonstrates the complexity of running many large databases of digitised content.

These results, from very different institutions, invite discussions on shallow versus deep engagement with digital collections. Some examples of commonly accessed material are what we would think of as part of the Canon of Digitised Content: Shakespeare, Newton, Medieval Manuscripts. Some examples of commonly accessed material here can be taken as little more than clickbait - LOL! History! - or free reference material - it's a free Malaysian Dictionary! Bonus! - but is getting people through the virtual door to digitised collections in this way, and through these items, such a bad thing? Come for the Dog with the pipe in its mouth! Stay for the genealogy, then the discussions on palaeographic method! One can also argue that some of the discussion surrounding these objects is exactly what we are trying to encourage - many of the hundreds of comments posted on the Reddit item about the British Museum sling shot bullet, although hilarious, show consideration of what it would mean to be human in the time of Ancient Greece, and relate their societal response to ours. Isn't that the starting place (and in some cases, the ending place) of engagement with primary historical evidence?

Asking to see Digitisation's most wanted opens up wider questions of public engagement, the impact of social networks on internet traffic to digitised collections (from highlights posted by the institution, to those identified and shared by others outside it, often quite unexpectedly), and the role of making images of primary historical sources open for others to discover, use and share. We also become aware of the complex and intertwined database systems which are in place in many large organisations undertaking digitisation and delivering digitised items to users, and the difficulties in reporting on individual items (be they physical or digital!) as a result. Digitisation's most wanted is also a rapidly moving target, dependent on publicity, and changing interest and focus over time: social media can encourage large swings and changes in popular items very quickly. The act of posing this question has led to an interesting discussion on how we think about use of digitised content, and how we can build up evidence about usage. (I'd also like to thank the organisations listed above for responding to my query so promptly!)

Have you, or any organisation you work with, been affected by the discussion in this blog post? Do you have any evidence you can contribute to the investigation? Your help is needed to catch digitisation's most wanted. Please do post your comments about your experiences below (comments are moderated so may take a few hours to appear), or email m dot terras at ucl.ac.uk for them to be integrated here. The internet is a place of busy traffic. Someone must have seen them...

Update 15/05/14: The British Library's Endangered Archives' most popular item is the St Helena Banns of Marriage, an item commonly pointed to on genealogy websites such as this and this.

Update 16/05/14:
- The National Library of Australia have a discussion of their 25 most viewed digitised newspapers, and why, here.
- The International Dunhuang Project at the British Library tell me that a redevelopment of their database and website is underway to improve reporting for them, their partners and users.
- Glasgow University Library Special Collections tell me that their most popular item is the Curious Case of Mary Toft, from 1726, who supposedly gave birth to a litter of rabbits. This was featured as a book of the month in 2009, but picked up by the social media site Mental Floss in January 2014, with that page being shared on Facebook more than 4000 times, and garnering 30,000 hits in one day alone, and has since been posted on various other social media platforms, including Reddit. Glasgow also say that there is a difficulty in measuring access counts as the content is held on various different servers, and it can be difficult to interpret Google Analytics in this case. They also point out that, from their perspective, there is a lack of benchmarks to compare usage of their items to that of other special collections.
- The National Archives tell me they point to the popular items as part of their navigation and as a result, these "most popular items" remain the most popular, in a virtuous circle. A very popular item at the moment is The Security Service: Personal (PF Series) Files KV2 which hosts the records of spies such as Mata Hari. These were embargoed until Thursday 10 April 2014, then launched with an accompanying press release, which garnered significant press coverage worldwide, driving traffic to the site. The only frequently accessed item which is not in these lists is the muster roll of HMS Victory for the Battle of Trafalgar, which is commonly referred to in military and naval history websites (although interestingly few people link through directly to the page where it can be downloaded from, so those who read about it must come to TNA's website and search themselves).

Update 19/05/14
- The Estonian Folklore Archives at the Estonian Literary Museum tell me that their most popular item is a leaflet from 1937 on how to preserve sealskins, although I can see no other webpages pointing to this item (perhaps because my Estonian search skills are weak!).
- UCLA Digital Library tell me their most viewed item is a Lyrical Map of the Concept of Los Angeles, a 23-foot long hand-drawn and hand-lettered map of Los Angeles, using the words and images of dozens of L.A. authors, which was on display in a museum in 2011, and was featured widely on blogs both at the time of the exhibit and since, which points people to the digital version now the display is no longer live in the museum space. Another popular item is the complete set of the 1582 Corpus Juris Canonici, the "Body of Canon Law," particularly the table of contents, which is commonly linked to from those interested in Canon Law, such as this, thus driving subject specialists to the site.
- The History of Computing in Learning and Education Virtual Museum tells me the most viewed items are the writing competition and Historic Newsletters from the People's Computer Company.
- A Hack day carried out at the Zurich Hackathon 2014 looked at image analytics from the US National Archives and Records Administration's contributions to Flickr Commons, looking at 200 million hits in a 3 month period and identifying the most common images: a description of that hack is here, which also gives examples of the most commonly looked at images. "There is a spike on March 24. Further analysis shows that the biggest referral on that day is Dorothy Height. Turns out this lady was featured on a Google Doodle on that day." Popular subjects (and referrer pages, generally from Wikipedia) were John F. Kennedy, World War II, Japanese American Internment, Vietnam War. A full list is available on the project page. This shows the importance of institutions linking their content from Wikipedia, and what can happen if you are featured by Google.
- There is also a useful tool in BaGLAMA which shows view counts for pages using Commons images in GLAM-related category trees.

Update 20/05/14
- The Bodleian also make the very good point that "With most browsers now defaulting to 'do not track' combined with the EU cookies legislation it is difficult to find any sort of data that one can 'stand behind' these days."
- The Jüdisches Museum Berlin's most accessed items are the Sammeldatensatz: Orden, Ehrenzeichen und Embleme von Julius Fliess (1876-1955), but they say that most accesses come from searches for "jewish emblems", and so there is a need to add "emblem" as a synonym for "symbol" to the thesaurus, to help users find what they are looking for. In this way, looking at search terms can help develop user paths through the system so they can find what they actually want.
- The University of Iowa Digital Libraries say that based on Google Analytics for the last year, the most popular item is a Dada book, and the most popular collection is Iowa Maps, but the access numbers for different objects in the database themselves are hard to count, and they'll get back to me on that. Based on recent web searches reported from the web master, a surprisingly high number of people find them via searches for Peter Rabbit: the digital book of which is linked through to their site from the Wikipedia page and various other websites featuring Peter Rabbit.
- The National Library of Wales tell me the most popular article on http://welshnewspapers.llgc.org.uk is a 1916 Cambria Daily Leader advert for 'blouses' and 'hosiery'. To find out more about why may take some digging, though!
- Hamlet Depot and Museums tell me that their most popular items are genealogical records, including railroad employees lists, and seniority records, and also historic pictures.

Update 22/05/14
- The New Zealand Electronic Text Collection tell me that reference works are their most used, including A Grammar and Dictionary of the Samoan Language, with English and Samoan vocabulary (which is linked to from thousands of different sources about New Zealand culture, and discussions on translation), New Zealand in the First World War (which is linked to from various history and genealogy sites) and The Official History of New Zealand in the Second World War (which is also popularly linked to online, including in reminiscing personal postings from soldiers who served, talking about the war on social media).
- The University of Otago Library provided me with a very detailed overview of the issues they face (thanks!). They are in the process of developing a repository to manage all of their digital collections that they want to curate, and the pilot will be live by November, but for the moment, they have a variety of different sites on which you can see digitised material, showing again the complex relationship of databases and content which many institutions have. For example, they have OUR Heritage which is a window across some collections. Some records are pulled from OUR Heritage and displayed via Special Collections Online Exhibitions. There is also Hocken Collections, who had their reader access collection digitised and made available online. They track this via Google Analytics, and also by watching their own server stats: and these do not in any way match up. Google does not capture when someone goes directly to a file, so Analytics reports just a fraction of the over a million hits in the past year that they can track on their server. They digitise on request, and respond to community demand, and are trying to prioritise the digitisation process. From Google Analytics, the most heavily used collections are the History of the University and Botanical charts (which belong to the Department of Botany at Otago and some are still used in the Labs. They digitised these, provided a copy for their use and deposited the originals in Hocken Collections). The most popular items are “Key plan to Mr G.B. Shaw’s picture of Dunedin in 1851”, which is mentioned on various genealogical sites online; a painting, “Sangro, a rosary of olive trees, landscape of windswept manuka.”, which appears linked from some other major federated collections online; and a printed map of Rome, “Mappa della campagna Romana del 1547”, which is a commonly consulted map (there are various copies of it in libraries worldwide) so those searching online to see it must find the freely available copy here.