ENGRI

(English words in Croatian)

Goal

The goal of this project is to investigate English loan words in Croatian: their frequency, how they are cognitively processed in an L1 environment, how differences in L2 proficiency affect that processing, and the associative relations between English and Croatian words.

Team Members

Associates

  • Marijan Palmović, Faculty of Education and Rehabilitation Sciences, University of Zagreb, Croatia

  • José Antonio Hinojosa Poveda, Pluridisciplinary Institute, Complutense University of Madrid, Spain

  • Marc Guasch, Department of Psychology, Rovira i Virgili University, Tarragona, Spain

  • Maria Pilar Ferre, Department of Psychology, Rovira i Virgili University, Tarragona, Spain

Publications

Borucinsky, M. & Bogunović, I. (2022). Crpljenje engleskih riječi iz korpusa hrvatskoga jezika [Extracting English words from a corpus of the Croatian language]. Fluminensia, 34(2), 435-461. (WOS) https://doi.org/10.31820/f.34.2.13

As the global language of the modern era, English has become the dominant donor language, and Croatian is now considered to borrow most heavily from English. The influence of English on Croatian is visible across different functional styles and at almost all linguistic levels, but it is most pronounced at the lexical level. Recently, especially in the media and on social networks, unadapted English words, i.e., words that have kept their original form and to which Croatian affixes are added as needed, have been appearing more and more often. So far, there are no concrete data on such words in Croatian. To find English words, researchers working on other languages have used various methods, from manual classification and the use of existing language resources to the development of new tools and/or resources. However, language technologies for Croatian are still insufficiently developed. The aim of this paper is therefore to examine the potential of some existing tools and resources for extracting English words and building a database of English words. To this end, the Croatian web corpus hrWaC was searched using the Sketch Engine platform. This method yielded a list of 1,217 English words. The results showed that the available tools and resources for Croatian can be used to compile a list of English words and their frequencies, but also that numerous problems prevent the results from being considered fully reliable. Likewise, the procedure still has to be combined with manual methods and classifications. We conclude that building a comprehensive database of English words in Croatian will require new tools and resources that enable the automatic extraction of English words from Croatian corpora.

Ćoso, B., Guasch, M., Bogunović, I., Ferre, P., & Hinojosa, J. A. (2022). CROWD‐5e: A Croatian psycholinguistic database of affective norms for five discrete emotions. Behavior Research Methods, 1-17. https://doi.org/10.3758/s13428-022-02003-2

The present study introduces affective norms for a set of 3022 Croatian words on five discrete emotions: happiness, anger, sadness, fear, and disgust. The words were rated by 1239 Croatian native speakers. Each participant rated 251 or 252 words, for one discrete emotion on a five-point Likert scale. The analyses revealed a significant relationship between discrete emotions, emotional dimensions (valence and arousal), and other psycholinguistic properties of words. In addition, small sex differences in discrete emotion ratings were found. Finally, the analysis of the distribution of words among discrete emotions allowed a distinction between “pure” words (i.e., those mostly related to a single emotion) and “mixed” words (i.e., those related to more than one emotion). The new database extends the existing Croatian affective norms collected from a dimensional conception of emotions, providing the necessary resource for future experimental investigation in Croatian within the theoretical framework of discrete emotions.

Kučić, M. (2021). Creating a Web Corpus using GO. Proceedings of the 44th International Convention MIPRO 2021, Croatian Society for Information, Communication and Electronic Technology (ISSN 1847‐3946), 1931‐1933. https://doi.org/10.23919/MIPRO52101.2021.9597093

The Web contains large amounts of textual data that could be used as a source for new corpora, yet there are few plug-and-play solutions for scraping specific parts of websites. This paper presents a new open-source solution for downloading and parsing HTML websites that can be configured from a single configuration file. As a demonstration of this method, a new ad hoc corpus was built. The corpus contains a total of 2,395,735 titles and articles collected from the 14 most popular Croatian websites.

Bogunović, I. & Ćoso, B. (2019). Lexical access in Croatian–English unbalanced bilinguals: A cross-linguistic study. Suvremena lingvistika, 87, 1-22. (WOS) https://doi.org/10.22210/suvlin.2019.087.08

In Croatia, early exposure to English is enabled through early language learning programs as well as through the media, which play an important role in incidental language learning. This, along with the fact that daily exposure to English is measured in hours, indicates that its status as a foreign language is changing, which offers a unique opportunity to investigate the relationship between language exposure, proficiency level, and lexical access. The main goal of this study is to explore lexical access in Croatian speakers of English with different levels of proficiency. The investigation consisted of a questionnaire on language use and exposure, a proficiency test, and an experiment combining cross-language priming with a lexical decision task. The experiment explored whether a priming effect would occur in two conditions (associative and semantic relatedness; translation equivalence) in both language directions. Semantically related words elicited shorter reaction times, suggesting that shared meaning speeds up the recognition of words from two languages. An even stronger effect was observed for translation equivalents. Surprisingly, proficiency level was not significant. The results are discussed in the light of the Revised Hierarchical Model and the Bilingual Interactive Activation Plus (BIA+) model.

Bogunović, I. & Jelčić Čolakovac, J. (2019). The Role of Informal Activities in Incidental Language Acquisition: The Relationship Between Language Use and Proficiency. Fluminensia, 30, 181-199. (WOS) https://doi.org/10.22210/suvlin.2019.087.08

English is studied as a foreign language in Croatia and, apart from being part of formal education, is also present in everyday life. Daily exposure to English is measured in hours, and research has shown that many informal activities allow for incidental language acquisition. This paper aims to identify the activities in which Croatian students spend most of their time using English and to investigate whether a connection can be established between exposure to English, its use, and prior knowledge of the language. Ninety-three participants took part in the study, all students of the Faculty of Humanities and Social Sciences at the University of Rijeka. Their level of English was determined with the Oxford Placement Test, and three groups were formed on the basis of the test results. Exposure to and use of English were assessed with a questionnaire in which participants estimated how much time they spent using English in each of the listed activities. The results showed that the participants spent most of their time using English online and least in spoken communication. The differences between the groups with the lowest and the highest levels of proficiency were significant across all activities except reading for leisure, written communication, and spoken communication. On the one hand, this study corroborates the connection between language use and level of language proficiency; on the other, it indicates that the status of English is slowly changing at both the global and the individual level.

Ćoso, B. & Bogunović, I. (2017). Person perception and language: a case of English words in Croatian. Language & Communication, 53, 25-34. (WOS) https://doi.org/10.1016/j.langcom.2016.11.001

Research on language attitudes has shown that speech style plays an important role in social evaluation. In Croatia, English words commonly occur in everyday communication, which could affect the way we perceive other people. This study aims to investigate the relation between the use of English words and person perception. Two hundred Croatian elementary school students, adolescents, and young adults were each given one of three versions of the same text, differing in the frequency of English words, together with a questionnaire for evaluating the personal characteristics of the text's author. The results showed that frequent use of English words was associated with higher ratings of social attractiveness, indicating that the use of English words has become an important cue in person perception.

Bogunović, I. & Ćoso, B. (2013). English in Croatian Scientific Medical Discourse: A Corpus-Based Study. Fluminensia, 2, 177-191. (SCOPUS) https://hrcak.srce.hr/114745


English as a lingua franca is a part of the more general phenomenon of “English as an international language”. Its influence on other languages, Croatian being one of them, is evident across different functional styles. This study presents findings from a corpus-based qualitative and quantitative analysis of anglicisms and English words on the lexical level, and the passive, light verbs and noun compounds on the syntactic level. The corpus, consisting of texts published in four journals, Acta Stomatologica Croatica, Gynaecologia et Perinatologia, Medicina Fluminensis and Paediatria Croatica, is based on the introductory parts of the papers. The results indicate that the influence of English is most evident on the lexical level – 1.55% of all words are anglicisms. On the syntactic level, the use of noun compounds is significant.

Brdar, I. (2010). English Words in the Language of Croatian Media. LAHOR: časopis za hrvatski kao materinski, drugi i strani jezik, 10, 217-232. https://hrcak.srce.hr/68617


English is the lingua franca of today's society, and the consequences of that fact are twofold. On the one hand, languages such as Croatian tend to restrict the huge number of anglicisms and plain English words taken into them, whilst on the other hand, the media promote English words daily. This paper analyses sixty Croatian texts taken from the internet to see whether different sources influence the choice of English words. One group of texts, which discusses the divorce between Paul McCartney and Heather Mills, is based on English sources and therefore often consists of direct translations. The other group of texts, originally written in Croatian, discusses the divorce between Josip Radeljak and Vlatka Pokos, a Croatian former couple who could be described as celebrities. The analysis shows that the first group of texts is influenced by its English sources in syntax (e.g., the use of the passive, personal pronouns, and gerunds) and semantics (e.g., literal translations of idioms, false friends), and contains somewhat more non-adapted English words than the second group. Non-adapted English words in the first group were mostly connected to British culture and music, while the second group used non-adapted English words either with a negative connotation or because they were trendy.

Conferences

Pavlinušić Vilus, E., Bogunović, I., & Ćoso, B. (2022). Lexical access to unadapted English loanwords in Croatian: evidence from translation priming. ExLing 2022 Paris: Proceedings of 13th International Conference of Experimental Linguistics, 17-19 October 2022, Paris, France (paper, online).

Bogunović, I., Pavlinušić Vilus, E. & Ćoso, B. (2022). English loan words in Croatian: The gap between the linguists’ expectations and the speakers’ needs. Sociolinguistics Symposium 24 - Inside and beyond binaries, 13th-16th July 2022, Ghent, Belgium (oral presentation, online).

Ćoso, B., Guasch, M., Bogunović, I., Pavlinušić Vilus, E., Ferré, P., & Hinojosa, J. A. (2022). Affective norms of valence and arousal for 400 most frequent English words in Croatian language. Sociolinguistics Symposium 24 - Inside and beyond binaries, 13th-16th July 2022, Ghent, Belgium (poster, online).

Bogunović, I. & Jelčić Čolakovac, J. (2022). Trebamo li download, downloadati ili preuzimati: nastajanje sveobuhvatne baze engleskih riječi u hrvatskom jeziku [Do we need download, downloadati or preuzimati: creating a database of unadapted English loanwords and their Croatian equivalents]. 36th International Conference of the Croatian Applied Linguistics Society, STANDARD AND NON-STANDARD IDIOMS, 9th-11th June 2022, Osijek, Croatia.

Pavlinušić Vilus, E. & Bogunović, I. (2022). Differences in Grammatical Features of the Words in Croatian and English: Evidence from a Translation Task. 36th International Conference of the Croatian Applied Linguistics Society, STANDARD AND NON-STANDARD IDIOMS, 9th-11th June 2022, Osijek, Croatia.

Kučić, M. (2021). Creating a Web Corpus Using GO. 44th International Convention on Information, Communication & Electronic Technology, 27th September-1st October 2021, Opatija, Croatia.

Borucinsky, M. & Bogunović, I. (2020). Finding English words in Croatian: an analysis of corpus linguistics tools. 34th International Conference of the Croatian Applied Linguistics Society, LINGUISTIC AND EXTRALINGUISTIC IN INTERACTION, 24th-26th September 2020, Split, Croatia (online).

Bogunović, I. & Ćoso, B. (2019). Cognitive processing of unadapted English words in Croatian: Evidence from Croatian speakers of English with different levels of L2 proficiency. 21st conference of the European Society for Cognitive Psychology, Tenerife, Spain, 25-28.09.2019.

Brdar, I. & Ćoso, B. (2012). English language in Croatian medical discourse: A corpus-based study. International Scientific Conference, 14th Days of Bioethics, University of Rijeka, Faculty of Medicine, Rijeka, Croatia, 10.-11.5.2012.

Brdar, I., Ćoso, B. & Hodak, J. (2010). Person perception and language. Summer school in Cognitive neuroscience, Leipzig, Germany, 19.-21.07.2010.

Other

Corpus available at the repository of the University of Rijeka

The corpus consists of texts collected in the period from 2014 to 2020 from the most popular websites in Croatia (according to the Reuters Institute Digital News Report for 2018, retrieved from http://www.digitalnewsreport.org in April 2019): Direktno, Dnevno, Net Hr, Hrt, Index_Hr, Jutarnji, Novilist, Rtl, SlobodnaDalmacija, Večernji, Tportal, and Dnevnik. Web browsing and web crawling were used to select and store the texts. The useful HTML information (such as the article's publication date, URL, and title), together with the article text and its tags and categories where available, was extracted and analysed with the Python package Beautiful Soup (beautifulsoup). The extracted data are stored in a MySQL relational database.
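The extraction and storage step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline. To keep it self-contained, it swaps in the standard-library `html.parser` for Beautiful Soup and an in-memory SQLite database for MySQL. The page layout it expects (a `<title>`, a `<meta name="date">` tag, and `<p>` paragraphs inside `<article>`), the `articles` table and its columns, and the sample page are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the title, publication date and article text from one page.

    Assumes a hypothetical layout: a <title> element, a
    <meta name="date" content="..."> tag and <p> paragraphs inside <article>.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.date = None
        self.paragraphs = []
        self._in_title = False
        self._in_article = False
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "date":
            self.date = attrs.get("content")
        elif tag == "article":
            self._in_article = True
        elif tag == "p" and self._in_article:
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "article":
            self._in_article = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Join text chunks (including text inside inline tags like <b>)
            # and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buffer.append(data)


def store_article(conn, url, html):
    """Parses one HTML page and inserts it as a row in the articles table."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    conn.execute(
        "INSERT INTO articles (url, title, published, body) VALUES (?, ?, ?, ?)",
        (url, parser.title.strip(), parser.date, "\n".join(parser.paragraphs)),
    )
    conn.commit()


# In-memory SQLite as a stand-in for the project's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (url TEXT PRIMARY KEY, title TEXT, published TEXT, body TEXT)"
)

# Hypothetical sample page, not taken from the corpus.
sample = """<html><head><title>Primjer</title>
<meta name="date" content="2019-05-01"></head>
<body><article><p>Prvi <b>odlomak</b>.</p><p>Drugi odlomak.</p></article></body></html>"""

store_article(conn, "https://example.com/clanak", sample)
print(conn.execute("SELECT title, published, body FROM articles").fetchone())
```

Keeping parsing separate from storage means the same parser could feed a different database driver. With a MySQL driver, the `?` placeholders would typically need to become `%s`.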

The corpus is available online

Database of English words in Croatian publicly available

A complete database of English words in Croatian collected from the ENGRI web corpus is now available at: https://figshare.com/articles/dataset/The_database_of_English_words_in_Croatian_xlsx/20014364


Database of English words and their Croatian equivalents publicly available

A complete database of English words in Croatian with their Croatian equivalents is now available at:

https://figshare.com/articles/dataset/The_database_of_English_words_and_their_Croatian_equivalents/20014712

Supplementary material

Questionnaires - English loan words (translation study)

The questionnaires with English loan words in Croatian used in the translation study are available here:

https://docs.google.com/spreadsheets/d/1DhGDDMq1_zLP6O0ZDBof8w1LBF4WnzfxengSLBFTBGc/edit?usp=sharing


CROWD-5e database

Discrete emotions database

https://figshare.com/articles/dataset/CROWD-5e_xlsx/19221678


News