"Search the BNC for concordances" provides a user-friendly yet powerful interface to query and return up to 1000 examples from the British National Corpus of your search terms highlighted in context (the sentence in which they occur flanked by the preceding and following sentences). An option to display results in traditional concordance lines is planned as well.
Simple Query supports different kinds of matches. The options to match "the phrase", "all the words" or "any of the words" are self-explanatory. "Boolean" adds support for logical AND, OR and NOT, and for grouping with ( ). When Boolean mode is selected, a list of the Boolean operators appears below the input box; the list can be hidden by unchecking the box above it.
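As a rough illustration of how the three simple match types differ, here is a minimal sketch. The function names, the mode labels "phrase", "all" and "any", and the whitespace tokenisation are assumptions for the example, not the behaviour of the actual search engine.

```python
def tokens(text):
    """Lowercase words with surrounding punctuation stripped (simplified)."""
    return [w.strip('.,;:!?"()').lower() for w in text.split()]


def matches(sentence, query, mode):
    """Check one sentence against a query under a hypothetical match mode."""
    sent = tokens(sentence)
    terms = tokens(query)
    if mode == "phrase":
        # The query words must occur adjacent to each other and in order.
        n = len(terms)
        return any(sent[i:i + n] == terms for i in range(len(sent) - n + 1))
    if mode == "all":
        # Every query word must occur somewhere, in any order.
        return all(t in sent for t in terms)
    if mode == "any":
        # At least one query word must occur.
        return any(t in sent for t in terms)
    raise ValueError(f"unknown mode: {mode}")


sentence = "The black cat sat on the mat."
print(matches(sentence, "black cat", "phrase"))
print(matches(sentence, "cat black", "phrase"))
print(matches(sentence, "cat black", "all"))
print(matches(sentence, "dog cat", "any"))
```

In this sketch "all" behaves like joining the words with AND and "any" like joining them with OR, which is one way to think about what Boolean mode lets you spell out explicitly.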
The wildcard operator * is supported in any position (initial, medial, final). For better performance, omit the * and choose between exact wordform matching and lemma matching. A lemma is the base form of a word. Enter any form of a lemma to match all of its other forms. For example, been matches itself as well as be, am, is, are, was, were and being. Currently there is no distinction by part of speech (PoS), so being matches both its verb and noun uses.
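The difference between wildcard and lemma matching can be sketched as follows. This is a toy model under stated assumptions: the one-entry `LEMMA_FORMS` table, the function names, case-insensitive comparison, and the choice to let * stand for zero or more characters are all made up for illustration and may not match the server's behaviour.

```python
import re

# Toy lemma table holding only the example from the text above.
LEMMA_FORMS = {
    "be": {"be", "am", "is", "are", "was", "were", "been", "being"},
}


def wildcard_match(pattern, word):
    """Match `word` against `pattern`, where * may appear in any position.

    Assumption: * stands for zero or more characters.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, word, flags=re.IGNORECASE) is not None


def lemma_match(query, word):
    """Match if `query` and `word` are forms of the same lemma.

    No part-of-speech check is made, mirroring the limitation noted above.
    """
    q, w = query.lower(), word.lower()
    for forms in LEMMA_FORMS.values():
        if q in forms:
            return w in forms
    return q == w  # unknown lemma: fall back to exact wordform matching


print(wildcard_match("*ing", "being"))   # final position fixed, initial wildcard
print(wildcard_match("b*g", "being"))    # medial wildcard
print(lemma_match("been", "were"))       # any form matches the other forms
```

A lemma lookup is a single set membership test, whereas a wildcard has to be tried against every wordform, which is one plausible reason lemma or exact matching performs better.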
Random sort returns a random sample of matches from the BNC in random order. Sorting by text id also yields a random sample, but presents it in text id order, which is more likely to show excerpts from similar texts near each other.
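The two sort options can be pictured like this. The sketch assumes that matches are available as (text id, sentence) pairs and that sampling happens before sorting; the function name, parameters and sample data are hypothetical.

```python
import random


def sample_matches(matches, limit=1000, sort_by_text_id=False, seed=None):
    """Draw a random sample of matches, optionally ordered by text id."""
    rng = random.Random(seed)
    # Sample without replacement; the result is already in random order.
    sample = rng.sample(matches, min(limit, len(matches)))
    if sort_by_text_id:
        # Same kind of random sample, but excerpts from one text end up adjacent.
        sample.sort(key=lambda match: match[0])
    return sample


# Made-up matches: (text id, sentence).
hits = [("K1A", "..."), ("A00", "..."), ("G3X", "..."), ("A00", "..."), ("B1B", "...")]
print(sample_matches(hits, limit=3))
print(sample_matches(hits, limit=3, sort_by_text_id=True))
```

Either way the sample itself is random; only the order in which it is presented changes.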
Advanced Query has a larger query entry box for more complex queries. It also adds two functions: filtering by text-type and extended mode matching.
There are two sets of text-type categories. The compilers of the BNC assigned each text to one of 17 "domains", detailed in the BNC User's Guide. David Lee later introduced a far more nuanced distinction among 71 "genres", some of which have a very small sample size. Lee motivates and discusses his finer classification in a detailed article. A response by Guy Aston raises a number of valid counterarguments and concerns. One argument is particularly compelling:
The BNC, which contains just over 4,000 texts, uses a framework which guarantees at least 100 texts in most principal categories. You may or may not like the categories chosen, but the corpus arguably allows you to generalize about these categories – about spoken and written texts, the nine different domains of written texts, the four different domains of "context-governed" spoken texts, and so forth – with reasonable certainty that findings will not be unduly biased by any particular text or any particular subcategory of texts.
Anyone who undertakes serious research on language in various text-types in the BNC should read both papers thoughtfully. In addition to domain and genre there is a further breakdown of written texts into six medium types. Eventually medium filters will be combined with filtering by domain or genre.
Four "Power Select" buttons assist in selecting related groups of domain and genre types:
Extended query matching supports more sophisticated queries than Boolean mode. In addition to the Boolean operators, extended mode supports:
To make your queries more productive, this page can return datasets with up to 1000 matches. Please be gentle with our server and request no more than you need. On the other hand, if you do require larger or complete datasets, feel free to request them from me.