Working with Data using OpenRefine
Over the last couple of years, the British Library have been running a set of internal courses on digital skills for librarians. As part of this programme I’ve delivered a course called “Working with...
Using OpenRefine to manipulate HTML
Jon Udell wrote a post yesterday “Where’s the IFTTT for repetitive manual text transformation?”. In the post Jon describes how he wanted to update some links on one of his web pages and documents the...
Talking about Tools
This week I attended a THATCamp organised by the British Library Labs. THATCamp is a series of unconferences focussing on the Humanities and Technology. I’ve been thinking a lot about the tools...
Adoption and Adaptation: Making Technology work for us
The following is reasonably close to a transcription of my keynote at the JIBS meeting titled “Technology will not defeat us: offering a great service in difficult times” on 26th February 2015. The...
What it means to be Open
I originally started to write this post in reaction to a thread on the BIBFRAME email list in March 2015 entitled “Linked Data”. In reaction to this thread I wanted to write something on what I saw as...
Using Google Sheets with ESTC
I’ve done a lot of work with early English texts (and especially the related metadata) over the past few years as part of my work on the Jisc ‘Historical Texts’ platform which brings together texts from...
A worked example of fixing problem MARC data: Part 1 – The Problem
In what will eventually be a series of 5 posts (I think) I’m going to walk through a real life example of some problematic MARC records I’ve been working with using a combination of three tools (the...
A worked example of fixing problem MARC data: Part 2 – Text editor
This is the second post in a series of 5. As described in Part 1 – I had 50K MARC records in a file, but due to the use of incorrect ‘delimiter’ and ‘record terminator’ characters the file wasn’t...
A worked example of fixing problem MARC data: Part 3 – MarcEdit
This is the third post in a series of 5. In Part 2 I describe how I used a text editor to get a malformed file to the point where it could be read as a MARC file by MarcEdit. I knew that there would...
A worked example of fixing problem MARC data: Part 4 – OpenRefine
This is the fourth post in a series of 5. In Part 3 I described how I converted the MARC records into the ‘mnemonic’ format using MarcEdit, and also created a list of issues with the file using the...
A worked example of fixing problem MARC data: Part 5 – OpenRefine and...
This is the fifth and last post in a series of 5. In Part 4 I described how I used OpenRefine to fix issues with MARC records. In this fifth and final blog post in this series I’m going to cover...
Introduction to APIs using IIIF
This “Introduction to APIs” was developed by Owen Stephens (owen@ostephens.com) on behalf of the British Library. This work is licensed under a Creative Commons Attribution 4.0 International License...
Keeping up to date with ejournal collections using KB+ and IFTTT
KnowledgebasePlus (KB+) is a Jisc service (which I work on) which helps (UK HE) libraries manage their e-resources more efficiently by providing accurate publication, subscription, licence and...
Writing an extension to add new GREL functions to OpenRefine
I’ve been an enthusiastic user of OpenRefine for a long time and think it is a great tool. However, I sometimes come across things that it doesn’t do, or doesn’t do easily, and end up wishing someone...
Advent of Code with OpenRefine
In case you’ve not come across it… Advent of code is: … an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you...
Advent of Code with OpenRefine Day 2
The day 2 challenge led me to some manipulation of the cross function in GREL, and a horrible hack to add up all the numbers in a column (really not recommended in real life! If only I’d got around to...
Advent of Code with OpenRefine Day 3
The way I solved this puzzle has a nice use of value.split(//) to get an array of letters from a string, as well as: splitByLengths(), length(), filter(), unicode(), with(), and(). That’s quite a list of...
Advent of Code with OpenRefine Day 4
So this puzzle was definitely more straightforward from an OpenRefine perspective. Once I’d got the logic worked out, it all just needed to be applied to each row in turn, and using one of my...
Advent of Code with OpenRefine Day 5
This was a fun one to do and it took me a while to work out how to approach it, but once I’d realised I could generate an OpenRefine operation history in JSON from the provided instructions and apply...
Advent of Code with OpenRefine Day 6
I think this might have been the quickest so far, although I’m not sure it’s a particularly efficient solve and the GREL was a bit ugly, but it worked immediately.