Thursday, 27 February 2014

Making it Free, Making it Open - Transcribe Bentham, publications, and unexpected benefits

A few years ago I made a commitment to Open Access - in an attempt to reach a wider audience for my academic work, and to tell people about research as it was happening (not three or four years later once it was locked behind a paywalled journal). I'm really pleased to have something new to talk about once again, and this time I can share it with you before it even comes out in print. Allied to this are a few spin-offs from the project in question - Transcribe Bentham, which aims to make the work of the philosopher and reformer Jeremy Bentham (1748 – 1832) available via a double award-winning collaborative transcription initiative, which is digitising and making available digital images of Bentham’s unpublished manuscripts through a platform known as the ‘Transcription Desk’. There, you can access the material and, just as importantly, transcribe the material, to help the work of UCL’s Bentham Project, and further improve access to, and searchability of, this enormously important collection of historical and philosophical material.
First, the article: a pre-publication version of a paper which will be published in April in a special issue of the International Journal of Humanities and Arts Computing, from Edinburgh University Press. In it, Tim Causer and I talk about crowdsourcing transcriptions of Bentham's writings, the impact of Transcribe Bentham on the work of the Bentham Project, and the use of volunteers to help us with tasks traditionally associated with lone academic researchers. We give particular examples of new Bentham material transcribed by volunteers, dealing with the subjects of political economy, animal welfare, and convict transportation and the history of early New South Wales, which has further clarified and widened our understanding of certain aspects of Bentham’s thought. You can go and get it here:
 Causer, T. and Terras, M. M. (2014) "Crowdsourcing Bentham: beyond the traditional boundaries of academic history". International Journal of Humanities and Arts Computing, 8 (1) (In press). Link to PDF version in UCL Repository.
I'm pleased it is up there quickly, and openly, and free for all to see. It's one of the aims of the Transcribe Bentham project, of which I am only a small cog, to make Bentham's writings better known, more accessible, and more searchable, over the long term. Allied to that is the ethos of involving a wider group of society in contributing to the project - this is about "co-creation" (as it gets called in Gallery, Library, Archive, and Museum (GLAM) circles) rather than academic broadcast. It would make no sense for us to take the product of something developed in online crowdsourcing, and lock it back in the academic ivory tower, given we asked for help to understand and find the material in the first place. We're finding our way with how to credit transcribers along the way (some of them are named in the article above, and we did ask their permission to do so) and to carry out crowdsourcing in as ethical a way as possible (something which is also of concern to others figuring out crowdsourcing in GLAM as we go). All in all, open access here is part of the Transcribe Bentham product: make it free, make it open.

And future doors line up ahead of us to walk through. This week we hit over 7000 manuscripts transcribed via the Transcription Desk, and a few months ago we passed the mark of 3 million words of transcribed material. So we now have a body of digital material with which to work, and make available, and to a certain extent play with. We're pursuing various research aims here - from a Digital Humanities side, a Bentham studies side, a Library side, and a Publishing side. We're working on making canonical versions of all images and transcribed texts available online. Students in UCL Centre for Publishing are (quite literally) cooking up plans from what has been found in the previously untranscribed Bentham material, unearthed via Transcribe Bentham. What else can we do with this material?

And other doors open. I've talked before about reuse of the code behind Transcribe Bentham - it is in use by the Public Record Office of Victoria, and parts of it (the Transcription Desk bar, since you ask) have since been used in the Letters of 1916 transcription project, too. We're also in talks with other collections who are thinking of doing crowdsourcing, and who may use the Transcription Desk: watch this space. Again, this is part of the same trajectory: make it free, make it available.

And other doors open. The development of systems to read handwritten material (more advanced than Optical Character Recognition, which to date really only has success on printed, clean material) depends on having datasets of images of handwritten texts, plus checked, validated transcripts of their content in a useful format, to train and test systems and algorithms. Transcribe Bentham is pleased to be part of the tranScriptorium project (as am I!), looking into Handwritten Text Recognition (HTR) technologies, and a set of 433 pages of Bentham's manuscripts plus the crowdsourced transcriptions are this year making up the "ICFHR 2014 Handwritten Text Recognition on the tranScriptorium Dataset" - to evaluate and test the current algorithms on Handwritten Text Recognition. How great is that? Did any of us sitting round the table first discussing crowdsourcing and Bentham back in 2009 ever expect we (and our transcribers) would be creating a benchmarked dataset with which to train handwriting recognition technologies? No. It is wonderful.

Create. Involve. Research. Make it available. Some of this by planning, some of this by happy accident. I now see the Open Access ethos underpinning all of this, and driving forward the direction of my research into the use of computing in culture, heritage, and the humanities. So, enjoy the article. We had access to, did, and found out some cool stuff, you know - and we made it freely available.


