Papers by Thomas Gaillat, PhD
This paper critically assesses the algorithms implemented in the {arules} package (Hahsler et al.,
2005). We are interested in the heuristic potential of these rules by visualising several significance
indicators. We present a case study in which the production process of these 'probabilistic' rules is
analysed. The desiderata of these explorations are detailed, comparing association rules and some
other statistically-based methods for the exploration of linguistic properties.
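As an illustration of the kind of significance indicators usually attached to association rules (support, confidence and lift), here is a minimal Python sketch. It does not use {arules}, which is an R package, and it is not the procedure from the paper. The function name `rule_metrics` and the toy feature sets are hypothetical.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs.

    transactions: list of sets of items (here, hypothetical linguistic features).
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("rule sides must each occur in at least one transaction")
    support = n_both / n              # share of transactions containing lhs and rhs
    confidence = n_both / n_lhs       # estimate of P(rhs | lhs)
    lift = confidence / (n_rhs / n)   # confidence relative to the base rate of rhs
    return support, confidence, lift


# Hypothetical toy data: one set of features per occurrence of a demonstrative.
transactions = [
    {"this", "determiner", "singular_noun"},
    {"this", "pronoun"},
    {"that", "determiner", "singular_noun"},
    {"that", "pronoun"},
]

support, confidence, lift = rule_metrics(
    transactions, {"determiner"}, {"singular_noun"}
)
print(support, confidence, lift)
```

A lift above 1 suggests that the right-hand side occurs with the left-hand side more often than its base rate alone would predict.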
Classifying French learners of English with written-based lexical and complexity metrics.
This paper assesses spontaneous oral monologues in the ANGLISH corpus (Tortel 2009). Twenty
oral transcriptions of NS English are compared with forty French-L1 transcriptions of NNS English
of intermediate and advanced levels. Syntactic and complexity metrics (Lu 2014) and Vocabulary
Growth Curves (Evert & Baroni 2008, Baayen 2008) are used to classify speakers. We analyse how
significant these written-based metrics are in the classification of speakers for their oral production.
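A vocabulary growth curve records how many distinct word types have been seen after each stretch of running tokens. The sketch below computes one in plain Python under simplifying assumptions: whitespace tokenisation and lower-casing only. The function name `vocabulary_growth` and the sample sentence are hypothetical and unrelated to the ANGLISH data.

```python
def vocabulary_growth(tokens, step=1):
    """Return a list of (tokens_seen, types_seen) points, sampled every `step` tokens."""
    tokens = list(tokens)
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        # Record a point at every `step` tokens and always at the final token.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


# Hypothetical sample; a real study would use the transcribed speech of each speaker.
sample = "the cat sat on the mat and the dog sat too".split()
curve = vocabulary_growth(sample, step=4)
print(curve)
```

Comparing curves across speakers gives a rough picture of lexical diversity, since a curve that flattens early means few new types are being introduced.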
Recherche et pratiques pédagogiques en langues de spécialité - Cahiers de l'APLIUT, 2014
This article addresses the design of courseware for learning English as part of a guided self-study programme that combines distance and face-to-face work. The aim is to propose an innovative approach to designing a learning scenario, namely the need to combine a pedagogical sequence from which learners derive the meaning of their work with a linguistic sequence that ensures the assimilation of language skills. The approach argues that the process of building a learning path results from confronting an action-oriented approach with a reasoned practice of the language.
ASp, 2013
This article analyses the distributional characteristics of the two demonstratives "this" and "that" in order to identify uses specific to specialised domains of English. The data are collected from the ICE-GB corpus. The study samples the corpus into subcorpora according to the specialised domain and the written or spoken mode of the texts. Subcorpora of general English are distinguished from those of specialised domains (medicine, science and technology). For each subcorpus, the ICECUP tool is used to run queries and extract the number of occurrences of the demonstratives by grammatical category: determiner, pro-form, adverb, complementizer and relative pronoun. The distinction by category aims to obtain the exact count of a particular form in a specific subcorpus representing a domain. The overall statistical results show a limited correlation between the demonstratives and the specialised domains. However, some grammatical categories are closely linked to a particular domain. Finally, the study shows marked tendencies in the use of the demonstratives in their pro-form role.
Automatic tagging of a learner corpus of English with a modified version of the Penn Treebank tagset
This article covers the issue of automatic annotation of a learner corpus of English. The objective is to show that it is possible to PoS-tag the corpus with a tagger to prepare the ground for learner error analysis. However, in order to obtain a fine-grained analysis, some functional tags for the study of specific linguistic points are inserted into the tagger's tagset. The tagger is trained on a native-English corpus with the extended tagset, and the tagging is then done on the learner corpus. This experiment focuses on the incorrect use of this and that by learners. We show how the insertion of a functional layer by way of new tags for the forms allows us to discriminate varying uses among natives and non-natives. This opens the path to the identification of incorrect patterns of use.
Learner corpus research is now faced with a multiplicity of tagsets. It is therefore difficult to carry out cross-corpus analysis due to the variety of tags used for each part-of-speech (POS). In this paper, we address this issue through a specific linguistic point, the forms this and that. We propose a typology of uses in both native and non-native corpora. Various tagsets are analysed so as to measure the relevance of the linguistic information provided for this and that. Overall, a comparative analysis of this and that in tagsets is proposed and the benefits and flaws of manual fine-grained annotation versus automatic annotation are assessed. This study comes as a first step towards automated annotation of this and that in various corpora as this process would pave the way to corpus interoperability at POS level.
Recent Advances in Corpus Linguistics, 2014
This paper deals with the way learners make use of the demonstratives this and that. NLP tools are applied to classify occurrences of native and non-native uses of the two forms. The objective of the two experiments is to automatically identify expected and unexpected uses. The textual environment of all the occurrences is explored at text and PoS level to uncover features which play a role in the selection of a particular form. Results of the first experiment show that the PoS features predeterminer and determiner, which are found in the close context of occurrences, help identify unexpected learner uses among many occurrences also including native uses. The second experiment shows evidence that the PoS features plural noun and coordinating conjunction influence the unexpected uses of the demonstratives by learners. This study shows that NLP tools can be used to explore texts and uncover underlying grammatical categories that play a role in the selection of specific words.