British Association for Applied Linguistics
Corpus SIG Day 16th April 2004
University of Birmingham
Abstracts
Maggie Charles
Problems in the corpus investigation of stance
This paper explores some of the limitations of a corpus-based approach through an examination of problems encountered in carrying out research on stance. There are four aspects of this complex phenomenon that pose particular challenges for corpus-based methods. First, stance can be implicit, and thus difficult to identify and retrieve. Second, stance can be multi-layered, which makes it hard to know what to include in frequency counts. Third, stance can be indeterminate, which means that it may not be possible to assign items reliably to a single analytical category. Finally, stance is integrated into the text, which means that large amounts of context must be examined in order to give an account of its operation. Discussion of these problems leads to the conclusion that it is necessary both to be aware of the limitations of corpus-based methods and to combine them with detailed contextual analysis where appropriate.
Alice Deignan
The contribution of corpus linguistics to cognitive categories
The literature on metaphor and other kinds of figurative language has been dominated by theorists in the cognitive tradition for several decades. Useful and potentially powerful frameworks have been developed, but much of this research depends on theoretically and intuitively defined categories, which can be difficult to operationalize when dealing with language in use. One example of this problem is defining different kinds of figurative language, for which cognitive linguists generally depend on the notion of 'domains'. This notion is intuitively satisfying but it is very difficult to define and is thus problematic when working with non-prototypical examples. This paper shows how corpora can be used to develop alternative methods of categorizing figurative language. The results are consistent with cognitive theory, and the methodology can be implemented for borderline cases as well as the prototypical, thus contributing to a more robust model of figurative language.
Verena Jung
Using Webcorpora and other corpora to demonstrate the changes in collocation environment for English syntagmas when used in English and used as loan phrases in German
We know that words change their meaning when they are imported into another language, even when they seem to have been imported with the meaning they had in the original. Countless articles in German newspapers bemoan the fact that too many English structures are used in German and that this does not fit with the German language. The purpose of this paper is not to decide whether it is good to borrow English phrases into German, but to follow these phrases once they have entered the German language and demonstrate that the way these phrases can be used in German differs greatly from the way they can be, or tend to be, used in English. Apart from dictionaries and intuition, corpora are the only means of demonstrating the shift in contextualisation, or possible contextualisation, that takes place when the phrase is used in the other language. By using domain-restricted web searches, it can be demonstrated that contextualisations for phrases in German domains are radically different from those in English-speaking domains.
Hilary Nesi
Enumeration in British and American lectures
Enumeration has been identified as a common predictive signal in academic writing (Tadros 1985, 1994), but there has been little corresponding analysis of academic speech. Tadros identifies three types of enumeration. Types a) and b) are marked by the colon and their use is therefore confined to written text, but type c) is marked by the presence of a numeral, and therefore also occurs in spoken discourse. This paper compares the use of enumeration in lectures drawn from the British Academic Spoken English (BASE) corpus and the Michigan Corpus of Academic Spoken English (MICASE). The device is found to be common in both corpora, but with somewhat different patterns of use. MICASE lecturers are more likely to elicit enumerated sequences from their audiences, while BASE lecturers tend to use extended pre-planned enumeration sequences over longer sections of text. Differences in enumeration patterns reflect key differences in lecturing style in Britain and America.
Alison Sealey & Paul Thompson
"The bravest knight in the whole land": a corpus analysis of writing for children
This paper reports on findings from a project (named 'CLLIP') which investigates how British children of primary-school age respond to corpus-based approaches to learning about their first language. The focus here is on the characteristics of the corpus used in the teaching activities. This is a sub-corpus of 31 imaginative prose texts written for children, extracted from the BNC. This corpus was analysed with a view to identifying its distinctive lexical and syntactic features.
For comparison, two reference (sub-)corpora were created from the BNC: one of fiction written for adults (317 texts) and the other of newspaper writing (114 texts). Comparison with these corpora makes it possible to identify features of the CLLIP corpus that are distinctive of writing for children and of imaginative fiction.
Initial comparison of the raw word and POS frequencies revealed a remarkably similar profile for the two imaginative prose corpora, in contrast to that for the newspaper corpus. This has led us to investigate the uses of high-frequency lexical items more closely, in terms of collocational and colligational features, to test the initial indications of similarity.
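As an illustration of the kind of frequency comparison described above, the sketch below normalises word counts so that corpora of different sizes can be compared. It is a minimal sketch only: the function name `relative_frequencies`, the simple regular-expression tokeniser and the two toy samples are assumptions made for this example, not the project's own tools or data.

```python
from collections import Counter
import re


def relative_frequencies(text, per=1000):
    """Return each word's frequency normalised per `per` running tokens."""
    # Crude tokeniser (assumption): lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


# Hypothetical toy samples standing in for the real sub-corpora.
children = "the knight rode to the castle and the dragon slept"
news = "the minister said the budget would rise"

child_freq = relative_frequencies(children)
news_freq = relative_frequencies(news)

# Compare the normalised frequency of selected words across the two samples.
for word in ("the", "said"):
    print(word, round(child_freq.get(word, 0.0), 1), round(news_freq.get(word, 0.0), 1))
```

Normalising by corpus size matters here because the three sub-corpora contain very different numbers of texts (31, 317 and 114), so raw counts alone would not be comparable.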
Michael Stubbs
On very frequent phrases in English: distributions, functions and structure
One area of linguistics which has developed very rapidly in the last 25 years is phraseology. Corpus study has shown that routine phraseology is pervasive in language use, and also that recurrent word-combinations can be modelled in various ways. I will discuss the function, structure and lexis of some of the most frequent phrases in English.
In the first half I will introduce a major new interactive data-base which provides extensive quantitative information on recurrent phraseology in the BNC. This data-base has been developed by William Fletcher and is available at http://pie.usna.edu.
In the second half I will present some preliminary findings from this data-base, and discuss some of their implications for linguistic theory, including recent studies of grammar and of semantic change.
I will also point out, however, that the very large amount of data itself poses methodological and interpretative puzzles.
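To make the notion of recurrent word-combinations concrete, the sketch below counts contiguous n-word sequences (n-grams) in a token list. This is only one simple way of modelling recurrent phraseology and is not a description of how the database mentioned above is built; the function name `ngram_counts` and the toy sentence are assumptions for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical toy input; a real study would use a large corpus such as the BNC.
tokens = "at the end of the day at the end of the road".split()
counts = ngram_counts(tokens, 4)

# Print only the four-word sequences that recur.
for phrase, freq in counts.items():
    if freq > 1:
        print(" ".join(phrase), freq)
```

Even in this tiny sample, overlapping sequences such as "at the end of" and "the end of the" are both counted, which hints at the interpretative puzzles that arise when the same approach is applied to very large amounts of data.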
Minako Yamada
Interfacing Corpus Linguistics and Sociocultural Theory as a Research Methodology
It has become increasingly clear that the investigation of human communication is necessarily interdisciplinary in nature. This study incorporates some relevant concepts and methods from corpus linguistics and sociocultural theory in order to gain a finer-grained analysis of task proficiency. Data were produced by pairs of Japanese college students during a map-completion task in English. In particular, the study analyses lexical density, as obtained from the pattern of a concordance of L1 "private speech" (i.e. self-directed speech). Lexical density has been shown to indicate whether interlocutors co-constructed a socially-shared world. L1 private speech is a means of observing interlocutors' cognitive processes while they cooperatively achieve a task-goal. The results clearly indicate that the pairs who achieved the best results were, regardless of their English proficiency level, good at grasping the status of their task performance in both problematic and non-problematic situations. Implications for language teaching and testing will be discussed.
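For readers unfamiliar with the measure, lexical density is commonly defined as the proportion of content words among all running words. The sketch below assumes that common definition; the measure used in the study, which is derived from concordance patterns of L1 private speech, may well differ. The short English function-word list and the sample utterance are hypothetical and serve only to show the arithmetic.

```python
import re

# Illustrative, deliberately incomplete list of English function words (assumption).
FUNCTION_WORDS = {
    "the", "a", "an", "is", "are", "to", "of", "in", "on",
    "and", "it", "i", "you", "that", "this",
}


def lexical_density(utterance):
    """Return content words divided by all words; 0.0 for an empty utterance."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


# Hypothetical utterance of the kind a map-completion task might produce.
print(round(lexical_density("the church is on the left of the river"), 2))
```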