Using corpora in the classroom – an #ELTchat summary (6/3/13)

Screenshot of the BNC home page


This is a summary of an #ELTchat discussion which took place on 6 March 2013.


The full topic title was ‘Using corpora in the classroom – teachers and learners, tips, ideas, best practices’, and it was proposed by @LizziePinard.  This summary takes quotes and opinions directly from the chat and generally avoids the temptation to digress into those found on the numerous links mentioned during the chat, unless explicitly stated.


The numerous links could be followed up for more detail as the chat itself, arguably, only skimmed the surface of the topic.  In trying to represent what was said, I hope that I haven’t inadvertently misrepresented anyone, which can happen given the nature of #ELTchat, the way tweets interlink and the requirement that contributors use the hashtag correctly in order for their tweets to appear in the transcript.  Nearly all the external references are linked to from within the summary, hence there is no exhaustive list at the end.

An initial request to clear up some terms for those unfamiliar with corpus (n sing) and corpora (n plural) fell on deaf ears, as it appeared that most who contributed to the chat already had some idea and, indeed, were very knowledgeable on the topic.  However, there may have been some #ELTchat ‘lurkers’ – a term which seems to have become synonymous with this format – who were less sure.  For the purposes of this brief summary, a quick definition is borrowed from @teflerinha’s post (26 Sep 2012), which was referred to:


“a corpus (plural corpora) is a collection of texts (for written corpora) or recordings of speech (for spoken corpora). A vast amount of language is gathered, and when sorted by a computer, this can provide a lot of data about how language is actually used, which words naturally collocate and so on.”


There is also this introduction, part of a series on the topic, by Jamie Keddie (@cheimi10), as mentioned in the chat by @Onestopenglish, although you need to be a member to access all six parts.  A request was subsequently made during the chat for some sources and links.


Examples of Corpora

These came thick and fast.

@bhrbahar and others mentioned the British National Corpus, as hosted on the Brigham Young University website. @Vickyloras, c/o @annapires, has used Compleat Lexical Tutor. @teacherphili mentioned Collins Wordbank. @theteacherjames recalled someone previously on #ELTchat recommending the “nicely designed” Scottish corpus website and, in particular, its collocate clouds.

@Marisa_C believed The Time American Corpus is “great … and has a great tutorial on how to use.”

@ManosSY provided a link to a collection of online concordancers (software that pulls lines of text out of a corpus, each showing a word or phrase together with its surrounding co-text) – see here.

Corpus researcher @lexicojules stated she used “BAWE (student academic writing) and BASE (ditto speaking) with EAP Ss [as] very relevant”, later adding, “You can use any online search tools as a basic corpus, [such as] online journals for EAP.”
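
For anyone unsure what a concordancer actually produces, the idea can be sketched in a few lines of Python. This is only an illustration of keyword-in-context (KWIC) lines, not how any of the tools mentioned in the chat work internally; the function name `concordance`, the `width` parameter and the sample sentences are all made up for the example.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    # Collapse line breaks and repeated spaces so the context reads as one line.
    flat = " ".join(text.split())
    # Whole-word, case-insensitive search for the keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, standing in for a real corpus.
sample = ("It is in the public interest to publish. "
          "The public interest was ignored. Public opinion shifted.")

for line in concordance(sample, "interest", width=20):
    print(line)
```

Because the keyword always lands in the same column, the words immediately to its left and right line up, which is what makes collocations easy to spot by eye.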

@muranava, who writes extensively about corpora on his blog, highlighted a recent post which included mentions of specific corpora links “for some of the doubters”.  It includes the Backbone pedagogic corpora for content and language integrated learning, the MICASE corpus of academic spoken English, COCA (Corpus of Contemporary American English), as well as the already mentioned BNC (British National Corpus).  It also highlights the Phrases in English tool, which uses the BNC, giving an example.

Screenshot taken from a Phrases In English concordance search.


It was queried whether the BNC was still free. @leoselivan stated that it is, “but that you have to sign in after 10 lookups.”  @lexicojules, who believed the BNC is now considered small and rather dated, mentioned that other fun analysis tools include Google NGram Viewer (which looks at how words or phrases in a corpus of books have changed in frequency over time) and Google Fight, for investigating relative frequencies.


@leoselivan, however, later questioned the trustworthiness of such results, as “Google yields different results for different people”.  So there is a question of bias. @How2TchEnglish also suggested using Google, if there is no access to a corpus, with quotation marks around the search term.

@muranava also offered demo activities from the FLAX learner corpus – here. The FLAX interface to the BAWE is great, apparently! – see here. Another linked post, here, covers building your own corpus using TextSTAT and AntConc and shows how to generate a frequency list.  A number of others referred to creating your own corpus or using alternative, personalised corpora.  Some of these possibilities are discussed by Jamie Keddie in the already mentioned source, while his article on using Google searches is freely available here.
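
TextSTAT and AntConc generate frequency lists through a graphical interface, but the underlying idea is simple enough to sketch in Python. The following is an assumption-laden illustration rather than what either tool does internally: the function name `frequency_list`, the very crude tokenisation rule and the two sample "texts" are all invented for the example.

```python
import re
from collections import Counter

def frequency_list(texts, top=10):
    """Count word forms across a list of texts and return the `top` most frequent."""
    counts = Counter()
    for text in texts:
        # Crude tokenisation: lower-case runs of letters, allowing one internal apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return counts.most_common(top)

# Two invented mini-texts standing in for your own corpus files.
my_corpus = ["The cat sat on the mat.", "The dog sat."]

for word, count in frequency_list(my_corpus, top=5):
    print(f"{count:>4}  {word}")
```

In practice each item in `my_corpus` would be the contents of a text file, and grammatical words such as ‘the’ will dominate the top of any such list, just as they do in the real tools.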

Uses of Corpora

@lexicojules:  “If you use live/unedited corpora, you have to be prepared to explain things that don’t follow the rules.”

There was a general sense from the transcript that corpora and concordance software could be a little bit intimidating unless you know what you are doing.  You have to take care when considering which one to use and how.  There are also, arguably, not a huge number of free corpora online. But the effort can be rewarding for those who try. Those who had experience of doing so shared some ideas, whilst the rest speculated.

Early on, @toulasklavou stated that “corpora can help Ts identify vocab used for specific genres to help Ss use relevant vocab”. Later, @toulasklavou also suggested that “Ss in groups find the 10 most frequent adjectives, verbs etc for a specific genre, with each group [taking] one part of speech”, while @How2TchEnglish thought it was “good for students to use in conjunction with a vocab notebook and [to] promote learner autonomy.”

@theteacherjames, who had dabbled with corpora to check how English is really used, stated that it could “be used as a reference for the Ss but also as a reference for the teacher to make sure they are teaching the ‘correct’ language.” Corpora can go some way to showing the most common way of saying something, if not the ‘correct’ way.  He mentioned the ‘availability bias/heuristic’ – “being affected by what we see/hear in how we perceive lang to be” – where teachers judge the probability of an event, or overestimate its frequency, by how easy it is to think of examples.  @harrisonmike offered some words of caution in reply:  “Gotta be careful which corpus you’re using then!!” before adding “It can be limiting/misleading since you have to choose whether Am or Br Eng, written or spoken. Can be time consuming.”  Indeed, many teachers are probably put off by how long it takes to figure out the software and use it effectively.  But is it still preferable to intuition?

@Marisa_C offered some practical ideas for using COCA, a ‘genre balanced’ English corpus, with this 2009 blog post.

@muranava offered a series of short examples for using the same corpus in a more recent set of posts entitled Quick Cup of COCA.  @muranava also offered his post on how a teacher could explain a word using COCA and its ‘Word and Phrase’ interface.


@Marisa_C also mentioned a CUP book, ‘From Corpus to Classroom’, which “has many useful ideas” and is “superb, very well written and very informative” according to @muranava.

A screenshot of COCA’s Word and Phrase.Info Interface – see here

@LizziePinard asked if [anyone] uses corpora when making materials.  @CotterHUE replied “occasionally when checking word frequency for an article … but rarely.”  @muranava stated that, in his opinion, teachers need to play around with corpora first before using them in class.  He went on to state that he uses them with multi-media students, to “focus reading texts, e.g. word like ‘features’, then collocates of that.”  @lexicojules added that you could check the usage of a word that students disagree about – either during the class if confident or at home, bringing in the results next class.

@LizziePinard asked if anyone had used concordances in class or a non-native speaker corpus, but there wasn’t much direct reply to this.  Similarly, @Marisa_C asked if anyone had projected a concordancer [in a lesson], to little response.  AntConc is a free concordancer, noted @LizziePinard, and there may be others, unless you want to buy WordSmith.  There are online tutorials for AntConc, such as the one here, by the creator, Laurence Anthony.

@lexicojules later stated that “looking up following verb patterns or dependent preps always comes up nicely in a concordance.”

@Marisa_C mentioned the late Graham Davies, who wrote a guide on using concordance software in MFL – see here.

@leoselivan mentioned @teflerinha’s great post on user-friendly concordance ideas late on.  She acknowledges at the beginning that it is rare for corpora, and especially concordances, to be used much in the classroom, as they are often “dense and unattractive”, before going on to give examples of how they might be used, such as raising awareness of collocations. See the post for full details.

@LizziePinard also asked if anyone had made a corpus out of students’ work and analysed it for errors.  @toulasklavou thought this sounded like “a great idea, an engaging activity”. Similarly, @theteacherjames thought this would be fascinating if there is software for it.

@LizziePinard suggested that it would be easy using word tools, before going on to suggest making a corpus out of your course book and comparing it to the BNC.  @lexicojules thought this was easier with students submitting work electronically, although systematically searching for errors is not simple.  At a basic level, Ss’ essays could be combined in a single Word doc and the ‘find’ facility used to see how they used particular words.
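
The idea of comparing a course book (or a set of student essays) with a reference corpus can be sketched roughly as follows. This is a toy illustration under stated assumptions, not an established method from the chat: the function names `relative_frequencies` and `overused`, the `min_ratio` threshold and the two tiny sample texts are all hypothetical, and a real comparison would need frequency data from a large reference corpus such as the BNC, which is not included here.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word in `text` to its frequency per 1,000 words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: 1000 * n / len(tokens) for word, n in counts.items()}

def overused(study_text, reference_text, min_ratio=2.0):
    """List words that are at least `min_ratio` times more frequent in the
    study text than in the reference text (words absent from the reference
    are skipped, since no ratio can be calculated for them)."""
    study = relative_frequencies(study_text)
    reference = relative_frequencies(reference_text)
    result = []
    for word, freq in study.items():
        ref_freq = reference.get(word)
        if ref_freq is not None and freq / ref_freq >= min_ratio:
            result.append((word, round(freq / ref_freq, 2)))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

# Invented examples: a 'course book' and a 'reference corpus' of four words each.
course_book = "very very very good"
reference = "very good good good"

print(overused(course_book, reference))
```

Normalising to a rate per 1,000 words is what allows a small home-made corpus to be compared with a much larger one; raw counts alone would be meaningless.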

@theteacherjames still wondered how you would analyse it before suggesting making a corpus based on #ELTchat conversations. @Marisa_C said the [CUP] book suggests WordFast, while @LizziePinard returned with the previously mentioned Compleat Lexical Tutor for analysing small amounts.

Late on, @theteacherjames retweeted a couple more links c/o @EBEFL – a piece here by Andrew Walkley and another by @EBEFL here – which explain why corpora can be useful and both discuss the ‘availability bias/error’.

@leoselivan joined in the chat towards the end, having been knowingly namechecked by @teacherphili and @theteacherjames, who had stated that “using corpora would be necessary if you are teaching in a lexical, chunk based way.” As a keen practitioner of the lexical approach, @leoselivan obviously agreed, but immediately mentioned a “recent criticism of an obsession with corpora” by another keen advocate of this approach, @hughdellar, and his intention to write a response.  He later stated that “corpus is important for both NS (native speaker) and NNS (non-native speaker) teachers [as] intuitions may not always be true.”  A teacher’s intuition seems to be a fundamental point about the usefulness of corpora in the classroom.  As Leo has since stated on his blog, “Corpora have shed light on many aspects of language which were previously described based on intuition.  Instead of groping in the dark and anecdotal evidence we now have access to authentic language data.”


Debate surrounding use of Corpora

@muranava twice highlighted a post, mentioned already, which was a response to @hughdellar’s earlier post, which had reiterated doubts from ten years ago about the use(fulness) of corpora.  It includes a subsequent discussion.  This post was followed, as mooted in the chat, by @leoselivan’s own response and subsequent discussion with Hugh on his blog.  Follow the links to read the discussion.  Leo’s own guide to essential corpus tools, which features Just-the-Word, Phrase Up, Netspeak, Concordancer and Fraze It, is here.



Although I took part in the chat, it wasn’t always easy to follow the thread.  Looking back at the transcript, 7 weeks later, it was no clearer.  It was apparent that a few of the participants knew what they were talking about and gave plenty of links on using corpora, while others showed an interest in the topic without ever having put any of the theory into practice.  But working out what each reply was responding to was not always easy.  This summary only represents the outline of the discussion; there was no mention of colligation, semantic prosody or lexical priming.  A lot of the ‘meat’ is in the links that were offered up, mostly to blog posts, where these and many other things are discussed.

There was, at one point, a request for a screencast to be made showing how to upload your own corpus to lextutor.  Screencasts, such as the AntConc one mentioned, which demonstrates analysing text uploaded from a textbook, as well as more straightforward step by step guides, might well be invaluable ways of demonstrating the usefulness of certain concordance tools to teachers.  I don’t think even Russell Stannard has ever done a screencast for one.  The potential for analysing ‘real’, ‘correct’ language and confirming, or otherwise, a teacher’s intuition is definitely there.  But I guess, unless you try it out, you’ll never know for sure.


Tutorial on how to use COCA

Scott Thornbury’s tutorial on COCA, which he posted on May 12, 2013 in response to this particular #ELTchat, can be viewed here.




About the summary writer:  Phil Longwell is currently out of work following an illness. Last year, he completed an MA in ELT at the University of Warwick where, amongst other things, he wrote an essay – here – on using corpora to investigate the phrase ‘public interest’.  The main finding was that the phrase has a tendency to appear at the beginning or end of sentences, especially the former, and that we might be primed into using it this way.


8 Responses

  1. Crystal says:

    Thanks for the summary! I saw a presentation by Dilin Liu of University of Alabama at the TESOL Convention in Dallas in which he talked about research he’s done around using corpora with higher level students as a tool to correct lexico-grammatical errors in their writing. Pretty interesting. His website has a good bibliography of his research into using corpora with language learners.

  2. Tilly Harrison says:

    Hi Phil,
    Thanks for the summary – I’m sorry I missed this conversation – loads of useful links! Not sure I agree that the CUP book ‘has useful ideas’ as it explicitly says in the introduction that it has no ‘off the shelf’ ideas or classroom materials. I thought it had rather a misleading title. It was about research supporting pedagogy in general. For all the interest in corpora, the ELT world is still waiting for a free, fool-proof user-friendly interface to a large, current corpus….

  3. Thanks for that additional resource, Crystal, and thanks for the comment, Tilly. So, the CUP book is not as useful as stated in the chat, apparently. You state that there is no single, free, user-friendly interface to a large, current corpus. Many of them are, indeed, complex and far from user-friendly. Many corpora are either limited in size, skewed towards certain types of texts, rather dated or have limited time spans. Many require a subscription or have strict conditions of use. Nonetheless, I am investigating some of the ones mentioned, as well as others mentioned to me since, as I am intending to create at least one screencast to be uploaded to Teacher Training at some time in the near future. I would be interested to know if anyone else has a good, single recommendation.

  4. […] recent ELT Chat on using corpora suggested that what might be helpful for teachers coming to online corpora (such as COCA- the […]

  5. Update 27/05 – Since I wrote this summary, Scott Thornbury has taken up the challenge which I laid down on his A-Z of ELT blog by creating a great tutorial on using COCA. Scott used, a tool that he had been playing around with and which adds ‘a human touch’ by including the presenter’s face. It is by no means the first tutorial on using COCA. There are a number of recently created tutorials on YouTube, as well as one by Ammar Elhassan Elmerhbi on Vimeo made in 2009. There are also some videos created by the corpus architect at BYU, Mark Davies, on using the Word and Phrase Info interface, as mentioned in the chat. In the first of numerous comments that followed Scott’s post, Mura Nava mentioned that I intended to create some screencasts on COCA. I have now completed the screencasts and these will be published on the Teacher Training Videos site soon. Alannah Fitzgerald’s lengthy comment to Scott’s post mentioned that she was [also] “putting some training videos [together for the site] on how to build your own corpora and using many of the open educational resources available … using the FLAX open source software.” I will post the links to both of these as a comment on this #ELTchat summary once they appear on the TTV site.

  6. Marisa says:

    Thanks for reminder Phil – will embed the tutorial

  7. Phil Longwell says:

    The screencasts for using COCA have finally gone onto the TTV site. Better late than never!
