
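A once-in-a-lifetime opportunity! The British Library and Chevening have jointly created a fellowship for research on Dunhuang manuscript recognition technology, and applications are now open!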

British Library 零壹Lab 2022-10-08

Chevening British Library Fellowship working with Chinese historical texts


Chevening is the UK government’s international awards programme aimed at developing global leaders. Since 2015, the Foreign and Commonwealth Office (FCO) has partnered with the British Library to offer professionals two new fellowships every year. These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with themes such as “Nationalism, Independence, and Partition in South Asia, 1900-1950” and “Big Data and Libraries”.


We are thrilled to announce that one of the two placements available for the 2020/2021 academic year will focus on automating the recognition of historical Chinese handwritten texts. This is a special opportunity to work in the Library’s Digital Scholarship Department, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this Fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.


Chinese Lotus Sutra scroll with Tibetan divination texts on the back (Shelfmark: Or.8210/S.155). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project.


The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to rich and diverse collections. However, it’s important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, a project called “Living with Machines,” is underway to apply Optical Character Recognition (OCR) to UK newspapers, design and implement new methods in data science and artificial intelligence, and analyse these materials at scale.


Taking a broader perspective on Library collections, we have started to explore opportunities with non-Latin collections too. Members of the Digital Scholarship team are engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert have teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. Another initiative which Tom is engaged with is exploring Transkribus for Bengali printed texts. He trained Transkribus’ HTR+ recognition engine, which ended up transcribing this material at 94% character accuracy! Tom and Adi’s recent blog post in EuropeanaTech Insight (issue on OCR) summarises these initiatives.
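
For readers unfamiliar with the term, character accuracy is commonly reported as one minus the character error rate (CER), where CER is the edit distance between the machine output and a human-checked reference, divided by the reference length. The Python sketch below follows that common definition; it is an assumption that the 94% figure was measured this way, and the function names are illustrative only, not part of Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, where CER = edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)
```

Because the distance is counted per character rather than per word, the same measure applies to scripts such as Chinese, where words are not separated by spaces.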


Regions and text lines demarcated as ground truth for RASM2019 ICDAR2019 Competition on Recognition of Historical Arabic Scientific Manuscripts (Shelfmark: Add MS 7474). Digitised and available on Qatar Digital Library.


The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.


This fellowship programme will start in September 2020 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.


For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” Fellow page: https://www.chevening.org/fellowship/british-library-chinese-handwritten-texts/.


Applications close at 12pm (GMT), 5 November 2019. Good luck!


Original Ad


Automating the recognition of historical Chinese handwritten texts


Hosted by the British Library



This fellowship sits within the British Library’s Digital Scholarship Department. It will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of historical Chinese handwritten texts. The fellowship will focus on material from Dunhuang (China), part of the Stein collection, which is being digitised through the Lotus Sutra Manuscripts Digitisation Project as part of the digitisation activities conducted by the British Library to make the collections under its custodianship accessible to all. The digitised content will be accessible through the International Dunhuang Project (IDP) platform.


The Stein Collection


The British Library’s Stein collection, gathered by Aurel Stein in the early 20th century, is one of the most outstanding collections of manuscripts and printed books from China and Central Asia. It is of immense historical and cultural significance, containing over 45,000 items written on paper, wood and other materials in many languages, such as Chinese, Tibetan, Sanskrit, Tangut, Khotanese, Kuchean, Sogdian, Uighur, Turkic and Mongolian. It notably holds some of the most important surviving Buddhist texts, such as the famous printed copy of the Diamond Sutra from the Dunhuang Library Cave dated to 868 AD.


The International Dunhuang Project


Established by the British Library in 1994, the International Dunhuang Project is an international collaborative programme including institutions from Europe, Asia and the US holding collections related to Dunhuang and other Silk Road sites. All partners aim to conserve, catalogue and digitise manuscripts, printed texts, paintings, textiles and artefacts under their custodianship and make them freely available online on a web platform. As part of this effort, and thanks to the generous support of a number of institutions and foundations, a large number of manuscripts from the Stein collection have been digitised and images have been made available on the IDP website (over 170,000 to date).


Project Scope and Objectives


Building upon this vast and well-curated digitised resource, the Library’s Digital Scholarship Department aims to promote the collection, enhance its searchability, and actively engage with innovative research using its data, through methods such as text mining and data visualisations. As part of this work, members of the Digital Scholarship team are engaging closely with the development of Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) systems for non-Western scripts.


The Chevening Fellow will contribute to these efforts. They will research the current landscape of Chinese handwritten text recognition – looking into methods, challenges, tools and software. They will test our material with existing tools and demonstrate digital research opportunities arising from the availability of texts in machine-readable format.


The Library’s ongoing Lotus Sutra Manuscripts Digitisation Project aims to conserve, catalogue and digitise nearly 800 Lotus Sutra manuscripts from Dunhuang in the Chinese language. This corpus of texts constitutes an ideal test case: not only because the Lotus Sutra is one of the main Buddhist scriptures and the canonical edition has already been transcribed, but also because the manuscripts present minor variations, such as variant characters, handwriting and scribal errors. The fellow could therefore use the project’s digitised content as a starting point to examine approaches, opportunities and possible solutions to automate the transcription of our Chinese historical collections.
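
As an illustration of why an already-transcribed canonical edition is useful, a manuscript transcription can be aligned against it so that only the differences (variant characters, scribal errors, recognition mistakes) need review. The Python sketch below uses the standard library’s `difflib`. The function name is hypothetical, and the “manuscript” string is invented for illustration rather than taken from any item in the collection.

```python
from difflib import SequenceMatcher

# Opening words of the canonical text, punctuation removed.
CANONICAL = "如是我聞一時佛住王舍城"
# Invented transcription with two deliberate differences (not a real manuscript).
SAMPLE_TRANSCRIPTION = "如是我闻一時佛在王舍城"


def find_variants(canonical: str, transcription: str) -> list[tuple[str, str, str]]:
    """List the spans where a transcription departs from the canonical text.

    Each entry is (operation, canonical_span, transcription_span), where the
    operation is "replace", "delete" or "insert".
    """
    # autojunk=False stops difflib from ignoring very frequent characters
    # when the texts are long.
    matcher = SequenceMatcher(None, canonical, transcription, autojunk=False)
    return [
        (tag, canonical[i1:i2], transcription[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `find_variants(CANONICAL, SAMPLE_TRANSCRIPTION)` returns only the two differing characters. In practice each difference would still need a specialist to decide whether it is a genuine variant or a recognition error.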


Key Responsibilities


  • To develop an in-depth understanding of the content digitised as part of the Lotus Sutra Manuscripts Digitisation Project

  • To research existing digitised materials available on the IDP website and identify different scripts and challenges for text recognition tools

  • To identify key stakeholders and research existing market solutions, tools and methods for Chinese OCR/HTR

  • To train text recognition systems with IDP materials, evaluate and compare results

  • In collaboration with the relevant British Library colleagues, to increase awareness of the Stein and other Central Asian collections at the British Library and other digitised content available on the IDP platform, and to promote their research potential when in machine-readable format, e.g. text mining and data visualisation

  • To develop the Library’s engagement in a global network working with Chinese OCR/HTR systems and foster relationships with Chinese Digital Humanities research communities, which could form the basis for future partnerships


Deliverables


  • Creating or joining a network of scholars and professionals exploring OCR/HTR solutions for historical Chinese documents

  • A recommended platform, software or tool for the Library to work with using digitised materials available on the IDP platform

  • A report on the types of texts, scripts and potential challenges that OCR/HTR tools may face with digitised collection items available on the IDP website, including an overview of tested systems and outcomes

  • A suggested operational workflow to produce, proofread, correct and feed transcriptions back into Library strategic systems

  • Promoting the project internally and externally, including posts on the British Library’s Digital Scholarship, Asian and African Collections and IDP blogs, using other British Library social media platforms, and giving a talk for Library staff members about the project, its aims and outcomes

  • Contributing to the Library’s 2021 workshop/conference concluding the Lotus Sutra Manuscripts Digitisation Project

  • Sharing experience and lessons learnt, and participating in other related activities of the Digital Scholarship Department


Candidate Requirements


  • Degree in a relevant subject e.g. digital humanities, computer science and/or cultural history

  • Knowledge of Chinese language, ideally with the ability to read/recognise several variants of historical Chinese scripts and calligraphic styles

  • Excellent written and spoken English

  • Familiarity with OCR/HTR systems

  • Demonstrable knowledge of tools and methods useful for digital humanities research e.g. text and data mining, named entity recognition, data modelling and linking, data visualisation

  • Interest in archival material, library collections and digitisation

  • Excellent writing skills and experience of networking and partnership building


Attention

Individuals must be resident in their home country at the time of making their application. Applicants from Mainland China will be eligible.



For project information and to register an application, click “Read the original”.



END

Editor-in-Chief / 徐力恒

Editor / 李瑞芳

Art Editor / 李瑞芳

