The construction of a corpus of aviation scientific articles:
an interdisciplinary study
DOI:
https://doi.org/10.22480/revunifa.2024.37.617
Keywords:
Corpus, Corpus Linguistics, Systemic-Functional Linguistics
Abstract
This article presents the experience of building a corpus of scientific articles written in English in the field of aviation, and the linguistic-computational treatment given by Corpus Linguistics. Data collection was performed using computer programming techniques for data scraping, which allowed the collection of articles from two electronic journals: Air & Space Power Journal and Journal of Aviation/Aerospace Education and Research. The corpus is used for linguistic research based on Systemic-Functional Linguistics (Halliday, 1994; Halliday & Matthiessen, 2004, 2014), which sees language as a system of potential meanings in which the concept of choice is essential: it allows the study of lexical regularities and has implications for both language description and language teaching. With the computational tools of Corpus Linguistics (Berber-Sardinha, 2000, 2004), it is possible to work with a large number of texts and obtain quantitative data that support the qualitative analysis of these regularities. As a result, we have a study corpus that can be considered "[...] medium-large" (Berber-Sardinha, 2004), with more than three million words. It is expected that the construction of this corpus will encourage new linguistic and statistical research in aviation, especially involving cadets who participate in scientific initiation programs and who draft their course completion papers.
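The collection and counting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' script: the reference list cites BeautifulSoup and pdfminer.six, which would presumably handle HTML parsing and PDF text extraction, whereas this example uses only the Python standard library so that it stays self-contained. The sample HTML, the example.org URLs, the sample sentences, and all function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points to a PDF file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return absolute URLs of the PDF links found in an HTML page."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def word_frequencies(texts):
    """Count word tokens across an iterable of article texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


# Hypothetical journal issue page (assumption: articles are linked as PDFs).
sample_html = """
<ul>
  <li><a href="/articles/v1/pilot-training.pdf">Pilot training</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="https://example.org/articles/v1/airpower.PDF">Airpower</a></li>
</ul>
"""
links = find_pdf_links(sample_html, "https://example.org/journal/")

# Hypothetical article texts standing in for text extracted from the PDFs.
sample_texts = ["The aircraft is airborne.", "The pilot's aircraft landed."]
freqs = word_frequencies(sample_texts)
corpus_size = sum(freqs.values())  # total number of word tokens

print(links)
print(corpus_size, freqs.most_common(2))
```

In a real pipeline each link would be downloaded, its text extracted from the PDF, and the resulting plain-text files passed to `word_frequencies`; the total token count is what places a corpus in a size category such as the one cited from Berber-Sardinha (2004).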
References
BERBER SARDINHA, T. Computador, corpus e concordância no ensino de léxico-gramática de língua estrangeira. In: LEFFA, V. (Org.). As palavras e sua companhia: o léxico na aprendizagem. Pelotas: EDUCAT, UCP, p. 45-72, 2000.
BERBER SARDINHA, T. Linguística de Corpus. Barueri-SP: Manole, 2004.
BIBER, D. Representativeness in Corpus Design. Literary and Linguistic Computing, v. 8, p. 243-257, 1993.
BIRD, Steven; LOPER, Edward; KLEIN, Ewan. Natural Language Processing with Python. O’Reilly Media Inc., 2009. Available at: https://www.nltk.org/book/. Accessed: 24 Jul. 2023.
BISONG, E. Google Colaboratory. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Berkeley, CA: Apress, 2019. Chapter 7. Available at: https://doi.org/10.1007/978-1-4842-4470-8_7.
CRYSTAL, D. English as a Global Language. Cambridge: Cambridge University Press, 1997.
EGGINS, S. An introduction to Systemic Functional Linguistics. London: Pinter Publishers, 1994.
GOUVEIA, C. Texto e gramática: uma introdução a linguística sistêmico-funcional. Matraga. Rio de Janeiro, v. 16, n. 24, p. 13-47, 2009.
GROSS, A. The rhetoric of science. Cambridge, MA: Harvard University Press, 1996.
HALLIDAY, M. A. K. An introduction to Functional Grammar. London: Edward Arnold, 1994.
_______ & MATTHIESSEN, C. M. I. M. An introduction to Functional Grammar. 3rd ed. London: Edward Arnold, 2004.
_______ & MATTHIESSEN, C. M. I. M. Halliday's Introduction to Functional Grammar. 4th ed. London: Routledge, 2014.
HARRIS, Charles R. et al. Array programming with NumPy. Nature, v. 585, n. 7825, p. 357-362, Sep. 2020. DOI: 10.1038/s41586-020-2649-2. Available at: https://doi.org/10.1038/s41586-020-2649-2.
HUNTER, J. D. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, v. 9, n. 3, p. 90-95, 2007.
PDFMINER.SIX. pdfminer.six (Version 20221105). [PDF text extraction software]. 2023. Available at: https://pypi.org/project/pdfminer.six/. GitHub repository: https://github.com/pdfminer/pdfminer.six.
RICHARDSON, Leonard. BeautifulSoup (Version 4.11.2). [Python package for parsing HTML and XML documents]. Available at: https://pypi.org/project/beautifulsoup4/. GitHub repository: https://github.com/wention/BeautifulSoup4.
MARTIN, J. R. English Text: System and Structure. Amsterdam: Benjamins, 1992.
McENERY, T. & WILSON, A. Corpus Linguistics. Edinburgh: Edinburgh University Press.
MOITA LOPES, L. P. (Org.) Por uma Linguística Aplicada Indisciplinar. São Paulo: Parábola Editorial, 2006.
AUTOR1. Entre alhos e bugalhos – os usos do clítico SE na escrita acadêmica. Doctoral thesis. PUC-SP. 2013.
___________. Os dizentes nos artigos científicos de Linguística - um estudo baseado na Linguística Sistêmico-Funcional e com o auxílio da Linguística de Corpus. Letras & Letras, v. 30, p. 46-63, 2014.
___________. O uso do processo existencial ‘haver’ na escrita acadêmica: um estudo com base em um corpus de artigos científicos de diversas áreas do conhecimento. Revista (Con) Textos Linguísticos (UFES), v. 9, p. 142-160, 2015.
___________. O gênero resenha na sala de aula de Língua Portuguesa como L2. Anais do IV Encontro Mundial de Ensino de Língua Portuguesa. Washington: Georgetown University, 2016.
MOREIRA FILHO, J. L. Python para Linguística de Corpus: guia prático. 1. ed. São Paulo: Ed. do Autor, 2021.
SÁNCHEZ, A. Definición e historia de los corpus. In: SÁNCHEZ, A. et al. (Org.) CUMBRE – corpus lingüístico de español contemporáneo. Madrid: SGEL, 1995.
SCOTT, M. R. Wordsmith Tools v. 8. Software for text analysis. Oxford University Press, 2018.
THOMPSON, G. Introducing Functional Grammar. New York: Routledge, 1996.
TRASK, R. L. Dicionário de Linguagem e Linguística. São Paulo: Contexto, 2004.
VIRTANEN, Pauli et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, v. 17, p. 261-272, 2020. DOI: 10.1038/s41592-019-0686-2.
WIDDOWSON, H. ELF and the pragmatics of language variation. Journal of English as a Lingua Franca, v. 4, n. 2, p. 359-372, 2015.
License
Copyright (c) 2024 Fernanda Beatriz Caricari de Morais, João Paulo Martins dos Santos
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Revista da UNIFA allows the author(s) to retain their copyright without restrictions. Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Revista da UNIFA is governed by the CC BY-NC license.