Introduction to Corpus Linguistics - Sandrine Zufferey
1.9.2. Answer key
1) First of all, let us recall that the rationalist methodology draws on the researcher's own knowledge by means of introspection and reasoning, whereas the empirical methodology looks for answers by observing or experimenting on data that are external to the researcher. Chemistry is typically an empirical science, which makes extensive use of experimentation and observation. Ethics is a philosophical discipline that involves reflection on moral questions. This reflection is, by nature, introspective and involves a rationalist methodology.
Law is a science that studies the rules and laws that govern social relationships. Many aspects of the law involve the interpretation of existing rules or the creation of rules based on reasoning and common sense. Thus, introspection plays a big role. That being said, in certain cases, law also deals with external data. For example, previous decisions (case law) can be searched in order to find a similar case that could apply to a given situation. The role of case law varies considerably across legal systems. In English-speaking countries, which apply the common law, previous cases play a fundamental role, because they become binding rules for deciding subsequent cases. We can therefore say that in these countries, empiricism also plays a very important part in applying the law.
Anthropology is a discipline that studies humanity in its various aspects (physiological, social and cultural). This discipline places great importance on the observation of data.
Although we can generally classify a discipline as being rather empirical or rather rationalist in nature, we should bear in mind that both methodologies are often present to varying degrees. For instance, we have already discussed the case of law, which involves not only an introspective element, but also the use of external data in the form of case law.
We can also imagine other situations of interaction between methodologies. For example, we have classified ethics as a rationalist discipline. Nevertheless, ethics was also built on the basis of empirical material. In the field of medicine in particular, medical ethics is based on the facts observed in practice.
2) Chomsky notably criticized corpus linguistics for offering only a partial vision of language, insofar as a corpus includes the productions of a limited number of speakers, in a given situation. This same observation also applies to the experimental methodology, which tests a small number of speakers on a limited number of linguistic stimuli. The main response to such criticism is that both approaches rely on quantitative methods (namely inferential statistics), which make it possible to draw conclusions from a sample and to extrapolate them to an entire population. The criticism that the subjects chosen could be problematic, for example aphasic speakers who do not represent normal language use, also applies to the experimental methodology: in theory, such subjects could also be recruited for an experiment by mistake. That being said, good practices in corpus linguistics and experimental linguistics require obtaining information about participants beforehand, which makes it possible to eliminate this type of bias. Typically, researchers verify that the people who contribute to a French corpus are native French speakers. Likewise, they test the language skills of speakers rather than considering them by default as French-speaking, bilingual, etc.
3) a) This type of research is corpus-driven, because the starting point is not a hypothesis to be verified in the corpus. The starting point is the corpus itself, from whose content usage rules are inferred.
b) This type of research, by contrast, is corpus-based, because it starts from a hypothesis (e.g. “passive sentences tend to be used more frequently with state verbs”) and seeks to verify it in the corpus, which thus serves only as an analysis tool.
4) These tools have been developed to simplify searches within a corpus, which would be very inconvenient with the standard tools found in a word processor, for example. In particular, concordancers make it very easy to extract all the occurrences of a word or an expression with its left and right context, as well as to determine its main collocations. These tools also help us create a list of all the words in the corpus, sorted by frequency. When a corpus is compared with a reference corpus, these tools also make it possible to extract a list of keywords that are specific to the corpus studied. In the field of multilingual corpora, aligners make it possible to align parallel corpora sentence by sentence, and then to extract a sentence and its translation by means of a bilingual concordancer.
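To make the two most basic functions described above more concrete, here is a minimal sketch in Python of a keyword-in-context concordance and a frequency-sorted word list. It is only an illustration, not the way any particular concordancer is implemented: the function names (`tokenize`, `concordance`, `frequency_list`), the naive tokenizer and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


def frequency_list(tokens):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()


# Invented sample text, used only for illustration.
sample = "The cat sat on the mat. The dog saw the cat and the cat ran."
tokens = tokenize(sample)
kwic = concordance(tokens, "cat", window=2)
freq = frequency_list(tokens)

for left, word, right in kwic:
    # Right-align the left context so that the keyword forms a column.
    print(f"{left:>10} [{word}] {right}")
print(freq[:2])
```

A real concordancer adds much more (proper tokenization, search by lemma or pattern, collocation statistics), but the principle of sliding a context window over the tokens is the same.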
5) A quantitative study on this question could create categories for classifying spelling mistakes (for example, agreement errors, doubled consonants, silent letters, etc.) and then count all the occurrences of errors belonging to each category. By applying a statistical test, this study would then make it possible to know whether students tend to make certain types of mistakes (e.g. grammatical errors) more often than others (e.g. lexical errors). A qualitative study on this same question would identify some examples of spelling mistakes and analyze in detail the contexts in which they occur, for example the grammatical category of the words concerned, whether they are rare or frequent words, whether they occur in a long or a short sentence, what types of phonemes are poorly transcribed, etc. This study would make it possible to identify linguistic contexts that tend to be conducive to spelling mistakes.
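The counting step and the statistical test of the quantitative study can be sketched as follows. This is a hedged illustration only: the error counts are invented, the category labels are hypothetical, and the choice of a chi-square goodness-of-fit test against the null hypothesis that all categories are equally frequent is an assumption, since the answer above does not name a specific test.

```python
def chi_square_gof(observed):
    """Chi-square statistic against equal expected counts in every category."""
    total = sum(observed.values())
    expected = total / len(observed)
    return sum((n - expected) ** 2 / expected for n in observed.values())


# Invented counts of spelling mistakes per category, for illustration only.
counts = {"agreement": 60, "doubled consonant": 25, "silent letter": 35}

stat = chi_square_gof(counts)

# Critical value of the chi-square distribution for 2 degrees of freedom
# (3 categories minus 1) at the 0.05 significance level.
CRITICAL_DF2_ALPHA_05 = 5.991
significant = stat > CRITICAL_DF2_ALPHA_05

print(f"chi-square = {stat:.2f}, significant at 0.05: {significant}")
```

With these invented counts, the statistic exceeds the critical value, so the hypothesis that the three types of mistakes are equally frequent would be rejected. In an actual study, the expected proportions and the test itself would have to be justified by the research design.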
6) The results of the quantitative corpus study summarized above, namely the quantification of the different types of spelling mistakes, could serve as the starting point for an experimental study. For example, the corpus study could help identify one common type of error and one rare type. An experiment could then help to determine whether being in a stressful situation or not has a different impact on the two types of error.
7) a) In order to study a phonological phenomenon like this, a spoken corpus is essential. This corpus should be specific to the population of French-speaking Switzerland. A large corpus comprising a large number of different speakers would be desirable. Finally, this corpus should contain synchronic data, corresponding to the current pronunciation, rather than to its diachronic evolution.
b) In order to study the evolution of a language, a diachronic corpus is essential. This constraint implies the use of written data, since oral data only go back to the middle of the 20th century. Finally, the chosen corpus should include productions made by adult native speakers.
c) In order to study translation, a parallel corpus is required. This corpus should contain original texts in French and their translations into English. It should be a synchronic corpus, corresponding to current uses of the language.