New Explore Function!

Do you like colourful bubble charts? Then you’re in luck, because we are introducing a new feature to our site that involves exactly that! The feature will be rolled out in phases, with new aspects added over time, but you can already use it now.

Introducing: our keyword function!

Our keyword function displays in a bubble chart which words are key in one part of Hansard compared to another part of Hansard. It is modelled on the keyword function used in corpus linguistics, where keywords help answer questions like: “What is the body of text mainly about?”, “What are the major themes or concerns of the text?” and “What makes a text distinct from another text?”. Keywords computed with this function are those words which occur relatively more frequently in one corpus than in another. To find them, we compare the corpus whose keywords we are interested in (the ‘target corpus’) with a ‘comparison’ corpus. The algorithm behind the comparison then calculates which words are key by carrying out a statistical test called ‘log-likelihood’. The higher the log-likelihood score, the more confident we can be that the frequency difference between the two corpora is not just down to chance. The keyword function on our site only shows keywords for which we are 99.99% certain that this difference is not down to chance. The words you will find in the bubble chart are thus those whose difference in frequency between the target and comparison corpora is statistically significant.
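For the curious, here is a minimal sketch of how a log-likelihood keyword test can be computed. It uses the standard two-corpus log-likelihood formula from corpus linguistics and is not necessarily the exact code running behind our site; the function names and the example counts are made up for illustration. We assume the 99.99% level corresponds to the usual chi-square critical value of 15.13 (one degree of freedom, p < 0.0001).

```python
import math

# Critical value of the chi-square distribution (1 degree of freedom)
# for p < 0.0001, i.e. the 99.99% confidence level (assumed threshold).
CRITICAL_VALUE = 15.13


def log_likelihood(freq_target, size_target, freq_comparison, size_comparison):
    """Log-likelihood score for one word across two corpora.

    freq_*: how often the word occurs in each corpus
    size_*: total number of words in each corpus
    """
    total_freq = freq_target + freq_comparison
    total_size = size_target + size_comparison
    # Frequencies we would expect if the word were equally common in both corpora
    expected_target = size_target * total_freq / total_size
    expected_comparison = size_comparison * total_freq / total_size

    score = 0.0
    # A zero count contributes nothing (and log(0) is undefined), so skip it
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_comparison > 0:
        score += freq_comparison * math.log(freq_comparison / expected_comparison)
    return 2 * score


def is_keyword(freq_target, size_target, freq_comparison, size_comparison):
    """True if the word is significantly MORE frequent in the target corpus."""
    overused = freq_target / size_target > freq_comparison / size_comparison
    score = log_likelihood(freq_target, size_target, freq_comparison, size_comparison)
    return overused and score >= CRITICAL_VALUE
```

With these invented counts, a word occurring 100 times in a target corpus of one million words and 50 times in a comparison corpus of the same size would pass the threshold, so `is_keyword(100, 1_000_000, 50, 1_000_000)` returns `True`. Swapping the two frequencies gives the same score, but the word is then underused in the target corpus and is not reported as a keyword.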

You can either define the two sections of Hansard you want to compare yourself, or select pre-programmed time periods (such as decades and various wars). One of the ways in which we are planning to develop the keyword function is to add to the pre-programmed sections of Hansard.

From your bubble chart you can select up to four items and search for them within your chosen target corpus. The selected terms will be shown in concordance lines, just as everywhere else on the site.

If you have any questions about this feature, please get in touch! We would love to explain more!

 

An example of a linguistic study: Labour-relations in Hansard

The rich Hansard corpus of well over a billion words cries out for analysis. While it’s not exactly verbatim, the corpus satisfactorily represents how MPs talk about the topics of parliamentary debates, even with the repetitions and pauses edited out. A rich and accurate resource like this can provide answers to a myriad of questions. For our Hansard at Huddersfield website, we are exploring what kinds of questions can be answered using the simplified corpus linguistic methodology that it will incorporate. We thought that examples of linguistic research using this methodology might spark ideas about the questions you may be able to answer using our Hansard website.

One incentive driving linguistic researchers to use corpus linguistic methodology is that the software makes it easy to gauge how much interest MPs had in a particular concern over an extended period of time. The results of such a query can then be cross-referenced with historical events to understand why MPs were so keen (or not) to discuss that topic during that period.

Three linguistic researchers, Jane Demmen, Lesley Jeffries and Brian Walker, tried to answer this type of question in a recent study. They used corpus linguistic methodology to investigate ‘labour-relations’ in the House of Commons debates over the 19th and 20th centuries. ‘Labour-relations’ refers to “the notion that both employers and workers have rights and responsibilities which can be negotiated using the collective bargaining power of trade unions” (Demmen et al. 2018). They realised that searching only for terms they thought might be relevant to labour relations would capture fewer terms than using automated semantic searches to find words associated with labour relations. Figure 1 below shows the extent to which they found parliament was discussing labour relations during the 19th and 20th centuries:


Figure 1 – Frequency of results concerning labour relations in House of Commons debates 1803-2005 (per million words).
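A quick aside on the ‘per million words’ unit in the caption: because far more words were spoken in some decades than in others, raw counts cannot be compared fairly, so frequencies are scaled to a common base. The sketch below shows the arithmetic only; the function name and the numbers are invented for illustration and are not taken from the study.

```python
def per_million(count, corpus_size):
    """Scale a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


# Invented example: 50 hits in a 2-million-word decade
# is the same rate as 500 hits in a 20-million-word decade.
small_decade = per_million(50, 2_000_000)
large_decade = per_million(500, 20_000_000)
```

Both values come out at 25 hits per million words, even though the raw counts differ by a factor of ten.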

Let us take the first half of the 19th century as an example. As can be gauged from the graph, there was very little discussion of labour relations during that period, with the exception of a small spike in the 1830s. When looking at historical events at that time, the researchers found this spike reflected two things: first, labour relations started to be debated once setting up trade unions was no longer deemed illegal; second, there was a specific debate about whether to send a group of agricultural workers to Australia after they had formed a kind of trade union to object to decreasing wages in their sector. Putting their initial corpus findings in this historical context helped them understand the way in which labour relations were discussed in that period.

Another, related question the researchers wanted to answer had to do with the different aspects of labour relations in parliamentary discourse. In particular, they wanted to know what importance parliamentarians placed on each of these aspects. They found that, throughout all the years represented in the Hansard corpus, parliamentarians talked most about trade unions as organisations, secondly about potentially disruptive actions like strikes, and only thirdly about the people involved in trade unions and in strikes. While the researchers did not clearly state the consequences of these results, organisations such as trade unions, political parties, local authorities, campaigning and lobbying groups, think tanks and journalists might want to use information like this to support their work.

It is not difficult to see how corpus linguistic methodology might be useful for all kinds of interest groups. Our simplification of these methods will hopefully help readers understand that so much more can be done with databases of text than just searching for your own search terms. Future blogs will show more examples to help you start thinking about how you might use our Hansard tool.

Want to read more about the study Jane Demmen, Lesley Jeffries and Brian Walker undertook? This is its reference:

Demmen, J., Jeffries, L., & Walker, B. (2018). Charting the semantics of labour relations in House of Commons debates spanning two hundred years: A study of parliamentary language using corpus linguistic methods and automated semantic tagging. In M. Kranert, & G. Horan (Eds.), Doing Politics: Discursivity, Performativity and Mediation in Political Discourse (pp. 81-104). Amsterdam: John Benjamins Publishing.

[Figure 1 has been reprinted from Charting the semantics of labour relations in House of Commons debates spanning two hundred years: A study of parliamentary language using corpus linguistic methods and automated semantic tagging (p. 92). By Demmen, J., Jeffries, L., & Walker, B. In M. Kranert, & G. Horan (Eds.), Doing Politics: Discursivity, Performativity and Mediation in Political Discourse (pp. 81-104). Amsterdam: John Benjamins Publishing. Copyright (2018) by John Benjamins Publishing. Used with permission.]

 

How do we aim to address the needs of our end-users?: Incorporation of Corpus Linguistic methodology

All the debates in Hansard together can be called a ‘corpus’. A corpus (plural: corpora) is a ‘body’ – not a physical one, but a figurative body of data defined by features such as size, language, content and period of time. The Hansard corpus as it stands contains all English-language parliamentary debates between 1803 and 2005 (with 2006-current to be added soon). Translated into numbers, that is well over a billion words.

A corpus of well over a billion words should offer rich insights into parliamentary debates on our end-users’ concerns. For obvious reasons, however, manually searching a corpus of that size is impossible. Furthermore, a simple search for a particular word throughout the whole corpus might not help answer the complex questions you may want to put to the Hansard corpus. Yet linguistics researchers often manage to produce in-depth textual analyses of corpora like Hansard. For such large datasets, they employ so-called ‘corpus linguistic software’ to generalise from, manipulate and statistically probe the text at hand. The identification of relevant textual features using this methodology often leads to further in-depth manual analysis of text in context, but does not have to.

You do not need to know in detail how the software works to understand what it can show about the language of a corpus. Corpus linguistic software allows its users to identify the most characteristic words in the whole or parts of the corpus even when not prompted by a specific word search. These are called ‘keywords’. It also allows for studying trends in the use of words or phrases over a particular period of time, which can then be compared to other words and phrases. Another function of corpus software is to show how often two or more words appear together. Known as ‘collocation’, this kind of search can tell us how a word is used in context, giving us more information on how its meaning might be changing in particular sets of data. More complex features of corpus software can indicate the frequency of meaning (‘semantic’) patterns within the whole or a particular part of the corpus. This means that we can learn about recurring topics associated (or not) with the particular issue we are studying. For example, when studying Brexit debates, we might find a lot of talk about social welfare, but not about education.
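To give a flavour of what a collocation search does under the bonnet, here is a minimal sketch that counts which words appear within a few words of a chosen search word. It is a simplified illustration rather than the method used by any particular corpus tool (real software usually adds a statistical measure on top of the raw counts), and the function name, window size and example sentence are our own inventions.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count the words occurring within `window` words of each `node` hit."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Look at the words to the left and right of this occurrence
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Made-up example sentence, split into lower-case words
tokens = "the trade unions called a strike and the trade unions met".split()
neighbours = collocates(tokens, "trade", window=1)
```

Here `neighbours` records that ‘the’ and ‘unions’ each appear twice right next to ‘trade’, while ‘strike’ falls outside the one-word window and is not counted.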

Of course, most users of Hansard are not corpus linguists. Many will never have heard of corpus linguistics and its software, let alone of the benefits this kind of software can bring to researching Hansard. Furthermore, the software is very complex and requires knowledge of language and statistics, so it is not very usable by non-linguists or non-statisticians. To let those non-expert users enjoy the benefits of the software without having to struggle with its complexity, Hansard at Huddersfield aims to produce a simplified and visualised version of several corpus software features. Watch this blog space to find out more about this simplification and its incorporated visualisations in the next few blogs!

How do we aim to address the needs of our end-users?: MP, parliamentary and archival data and sharability

Last week’s blog detailed the ways in which we engage with our end-users. This week we would like to explain how we aim to address some of their suggestions.

We love receiving suggestions from as many different end-users as possible. We hope that many of these suggestions will influence the final version of our Hansard interface. However, our project is finite in time and money and so we will be attempting to incorporate many, though not all, of the suggestions by the time of our launch in March 2019.

Rather than dwelling on these limitations, we like to focus on the feasibility of many of the suggestions we have received. And each of these suggestions will, once incorporated, significantly improve the number and range of questions relating to Hansard that can be easily answered.

Up to now, users have been able to search Hansard to answer a specific research question related to a particular period, MP, political party or concern. But we are aware that some users may just want to explore Hansard more generally, perhaps in order to start creating or refining the questions they want to ask. They may want to delimit the specific selection of data they want to search using more generic identifiers than date and speaker. It is, for example, no use searching the whole of Hansard if you are specifically looking for contributions from the Labour Party on a particular topic. Nor can an individual realistically go looking for each party member’s contributions one by one.

One way in which we are trying to address this is by including archival information about MPs in the website, such as their gender, party affiliation and when they sat in parliament. This means users will be able to click on MPs’ names in Hansard to find this information, and to search for contributions from, for example, female Conservative MPs in 1973.

As well as enabling searches by party, we are looking to use archival data to establish constituency boundaries, so that users can search for the attitudes of MPs within particular constituencies. Paired with election results, this should make for an interesting exploration of what parties […]. All of this should help users better understand the debates they are looking through without having to find this MP information on other websites.

Incorporating MP data will also allow for more concrete visualisations of Hansard search results. It means that, for example, the number of contributions per constituency can be counted too. Or what about generating an outline of the clashes between political parties? These ideas are under examination at the moment, so no promises! But it will give you an idea of how we are thinking.

You might want to share the fascinating search results our interactive interface for Hansard will bring up in your own reports and publications. Rest assured, there will be an option to easily download the results in, hopefully, several different formats. Another feature should enable you to enrich your social media posts with our graphs and visualisations.

Perhaps you can think of other features we could incorporate. If you have ideas about how our interface might be of use to you, contact us at hansard@hud.ac.uk with your suggestions!

 

Feedback from our end-users

Of course, as linguists, we have our ideas about the value of Hansard. We regularly use academic research methods to study big textual datasets like Hansard. Simply expressed, we use software to look at complex patterning in the text, and obtain quantitative data about those patterns. Researchers then use relevant contextual information to interpret the findings and draw conclusions about the textual choices in the dataset. Underlying Hansard at Huddersfield’s goals is a conviction that such research methods could also benefit non-linguists. We have our own ideas about the benefits, but is this what actual non-linguist users might be looking for? That is why engagement with the potential end-users of our Hansard at Huddersfield interface is of utmost importance to us. We hope this blog will give you an insight into how we are trying to do that.

Who do we engage with? By publicising our project on social media, and by emailing organisations we think might benefit from using Hansard, we aim to converse with two kinds of end-users: those who already use Hansard, and those who are yet to discover its usefulness. For those already familiar with Hansard, we intend to increase the benefit of Hansard for their research. At the same time, we would love to present a tool that attracts those unaware of Hansard to make use of it. To accomplish either goal, we need to know what people would like to find out from Hansard.

Social media, face-to-face meetings, questionnaires, Skype and telephone conversations have so far been means of interaction with end-users. We engage with some of them collectively during our end-user meetings, and others we ask individually. The most productive means by which we have gathered suggestions so far was our round of end-user meetings in July 2018, attended by 20 individuals, where we presented some ideas about how an adapted version of our linguistic methods might be relevant to their aims. Our end-users discussed how they would like to use the Hansard dataset and how they felt our Hansard interface might be able to help them.

It emerged from those discussions that contextual information would be important for making the dataset as interpretable as possible. That information could then, for example, be used for comparing contributions by two particular MPs, or perhaps two parties.

As we anticipated, the raw quantitative data produced by linguistic software proved inaccessible to non-expert users, demanding too much knowledge of linguistics and statistics for easy interpretation. Our proposal to solve that problem by presenting suitable, interactive visualisations of the data was welcomed. Because the visualisations change according to changes in the variables studied, they provide a wealth of information in an accessible way. Interactive visualisations promise to be less overwhelming, easier to interpret, and of course more appealing.

While we are still looking for suggestions like the ones above, our next round of meetings and individual interactions with end-users will explore how well our end-users think our interface responds to what they want to do with Hansard, and whether they need help with understanding and interpreting the visualisations. We are happy for you to get in touch to help us out (hansard@hud.ac.uk)!

 

 

The Rationale for Hansard at Huddersfield

While some people have never heard of Hansard, we think it is one of the most important resources for the UK’s democracy to function well. Consider, for example, that most British citizens are unable to attend public debates on a regular basis. Yes, they can read newspaper reports, but these are often biased and might contradict each other. Were it not for Hansard, UK citizens would not be able to access a substantially verbatim report of parliamentary debate. Though edited for repetitions and obvious mistakes, it is an unbiased report, the provision of which affords public engagement with what was actually said in parliament.

Although Hansard’s primary purpose may be to make parliamentary debate accessible to the public, those actually using it regularly are mainly politicians and intermediaries, whose job it is to disseminate political knowledge among voters. Think journalists, academics, and those working for think tanks and pressure groups. Despite having higher levels of research skill than average, they have nevertheless found the current Hansard dataset difficult to research. Many who have attempted to use Hansard have been left frustrated by the difficulty of extracting just the data they needed from all the millions of Hansard’s words. It is obvious, however, that if Hansard is essential to a fully-functioning representative democracy, it should be accessible and easily searchable for all. Ideally, search options should cater for different kinds of users, and results should be clearly and attractively presented. This would help those already using it, but would also engage the wider public, leading to a higher level of public engagement with political debate.

Hansard at Huddersfield has recognised these problems with Hansard’s existing online interface. Although it has recently undergone a process of streamlining, bringing the different access points onto one website, there remain limitations on its searchability. Consequently, we have set out to develop a highly usable interface that provides for the research needs of its potential users. An easily searchable website that provides clearly visualised data of complex searches. A website that responds to what the public wants to know, and that enables professionals to search for the Hansard sections they need to engage with their political concerns. We are already collaborating with potential end-users of our interface and would love to hear from other organisations wanting to test it out as it develops. Your feedback is invaluable in the creation of a maximally effective tool for accessing Hansard, which we hope will ultimately affect the way that the voting public use this valuable resource. If so, we dare to hope that our new interface to this valuable data will, in some modest way, enhance the public’s engagement with democracy.

Blog series

This blog series will tell you why and how we are trying to adapt linguistic research methods to make it easier for non-linguists to use Hansard for research. It takes you through the process of creating our Hansard at Huddersfield interface, and along the way gives examples of how linguists answer their research questions regarding Hansard using the particular research methodology we aim to simplify. We hope to show that by using our Hansard at Huddersfield website you will not need to be a linguist to get the most out of Hansard.