ENGRI

(English words in Croatian)

Goal

The goal of this project is to investigate English loan words in Croatian, their frequency, cognitive processing in an L1 environment, the differences in L2 proficiency and their impact on cognitive processing, as well as associative relations between English words and Croatian words. 

Team Members

Associates

PUBLICATIONS


Bogunović, I. (2023). A corpus-based approach to English loanwords: Introducing the Database of English loanwords in Croatian. Fluminensia, 35(2), 437-460. (WOS) https://doi.org/10.31820/f.35.2.1

Unadapted English loanwords have become part of informal communication in many languages, including Croatian. Their use is often motivated by the lack of adequate native equivalents, exposure to English through the media, but also by the prestigious status of the English language. A vast body of research has been dedicated to lexical borrowing, especially from English. At the same time, corpus analyses have mostly been conducted on smaller, ad hoc corpora. Therefore, the goal of this paper is to present the database of English loanwords in Croatian. The database was developed by algorithmic and manual classification of words from the Corpus of Croatian news portals, ENGRI, and provides a list of 9,452 unadapted English loanwords together with the data on their absolute and relative frequencies. The analysis showed that most loanwords (75.85%) appear less than 50 times, while a total of 44.78% of words appear 10 times or less. The biggest drop in the number of loanwords is observed in the categories of occurrence above 500, while only 27 words appear 5,000 times or more. The most frequent English loanword in the corpus is ‘show’ with 80,805 occurrences, which is 0.0122% of all words in the corpus. The analysis of loanwords that occur more than 5,000 times showed that most of them have Croatian translation equivalents, which confirms the role of the media in the introduction of new words. In addition to providing an insight into the occurrence of English loanwords in Croatian, this database also represents a valuable contribution to Croatian computational linguistics resources and enables future experimental research by providing the data on word frequency. 
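The frequency figures in the abstract above combine absolute counts, relative frequencies and frequency bands. The following is a minimal sketch of that bookkeeping, not the authors' procedure. Only the count for 'show' and the band thresholds (10, 50, 500 and 5,000) come from the abstract; the other counts, the function names and the corpus size are hypothetical, with the corpus size chosen only so that the example lands near the reported 0.0122%.

```python
from collections import Counter

# Toy absolute frequencies. Only the count for "show" comes from the abstract;
# the other words and counts are invented for illustration.
counts = {"show": 80805, "comeback": 4200, "fancy": 480, "binge": 37, "cringe": 9}

# Hypothetical corpus size in tokens (an assumption, not a reported figure).
CORPUS_TOKENS = 660_000_000


def relative_frequency(count, total=CORPUS_TOKENS):
    """Return the share of all corpus tokens as a percentage."""
    return 100 * count / total


def band(count):
    """Assign an absolute frequency to one of the bands named in the abstract."""
    if count <= 10:
        return "1-10"
    if count < 50:
        return "11-49"
    if count <= 500:
        return "50-500"
    if count < 5000:
        return "501-4999"
    return "5000+"


# Number of loanwords per frequency band.
band_sizes = Counter(band(c) for c in counts.values())

for word, count in counts.items():
    print(f"{word}: {count} occurrences, {relative_frequency(count):.4f}% of tokens, band {band(count)}")
```

Dividing each band size by the total number of loanwords gives percentages of the kind reported in the abstract (for example, the share of words that appear 10 times or less).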


The English language is considered prestigious, and the prestigious status of a language is closely related to linguistic borrowing, language attitudes, and language exposure. English has become the dominant donor language, and its prestigious status reduces the probability that a borrowed word will adapt to the rules of the recipient language. As a result, many words borrowed from English are used in an unadapted form. In Croatian, the use of native equivalents is generally recommended, but it seems that they still do not fulfill the speakers’ communicative needs. Research has shown that Croatian speakers generally have positive attitudes towards such words in some domains, and that the use of English words is related to intergroup favouritism and social evaluation. These results can in part be attributed to the prestigious status of English, but also to language exposure. Croatian speakers of different ages are exposed to English through various informal activities, which facilitate incidental language acquisition. Language exposure has also been recognised as one of the most important factors in the research of bilingualism in a broad sense. This creates possible ground for research into the cognitive processing of English words within the framework of bilingual language processing. The aim of this paper is to present the emergence of English words in Croatian through different scientific approaches, to propose an interdisciplinary approach that would provide a more comprehensive insight into the issue and to consider research problems, possible solutions and guidelines for further research.


Jelčić Čolakovac, J., & Borucinsky, M. (2023). In the melting pot of web-crawled texts: The challenges of extracting English words and phrases from Croatian corpora. International Journal of Applied Linguistics. (WOS) [Accepted for publication].

This paper focuses on English words and phrases used in Croatian which, unlike loanwords, have not undergone major adaptations at the orthographic, phonetic, or other levels apart from being influenced by the inflectional system of the recipient language. A list of English words in Croatian corpora was compiled using automatic algorithm extraction, corpus query language (CQL) in Sketch Engine, and manual word list evaluation with the end goal of publishing the first comprehensive online database of English words in Croatian. The ENGRI corpus of Croatian was created by a web crawling procedure and used together with the existing Croatian hrWaC 2.2 RFTagger corpus to produce a list of English words and phrases. In this paper, word list compilation issues are discussed in relation to both general issues encountered in the study of interlingual lexical types (such as false cognates, antonomasia, and polysemy) and Croatian-specific language properties such as its inflectional system and diacritical marks. In conclusion, we propose that manual evaluation is an indispensable method and a necessary complement to computational linguistic tools in the creation of word lists and databases of foreign words in other languages.


Pavlinušić Vilus, E., Bogunović, I., & Ćoso, B. (2022). Students’ Strategies for Translating Most Frequent English Loanwords in Croatian. Rasprave: Časopis Instituta za hrvatski jezik i jezikoslovlje, 48(2), 547-570. (WOS) https://doi.org/10.31724/rihjj.48.2.7

English has become the dominant donor language for many languages, including Croatian. Its prestigious status reduces the likelihood that borrowed words will adapt to a recipient language. As a result, some English loanwords occur in an unadapted form. Recent computational linguistic resources have provided the necessary corpus-based data on the frequency and use of English loanwords in Croatian. This paper investigates the strategies employed by 116 students of the Faculty of Maritime Studies, University of Rijeka, when asked to translate the 392 most frequent, corpus-derived English loanwords into Croatian. The results were then compared with the available corpus-based data. The results show that single-word Croatian equivalents were preferred over adapted forms of English loanwords and multi-word expressions. When no such equivalent was available, unadapted English forms were used more frequently than adapted forms and multi-word expressions. The co-existence of loanwords and their native equivalents is reflected in responses to loanwords that have and those that do not have single-word equivalents. The results highlight the need for creating semantically precise single-word native equivalents, at the same time illustrating the resistance to accepting novel native words.

Borucinsky, M. & Bogunović, I. (2022). Crpljenje engleskih riječi iz korpusa hrvatskoga jezika [Extracting English words from a corpus of the Croatian language]. Fluminensia, 34(2), 435-461. (WOS) https://doi.org/10.31820/f.34.2.13

As the global language of the modern age, English has become the dominant donor language. Croatian is today considered to borrow most heavily from English. The influence of English on Croatian is visible in various functional styles and at almost all linguistic levels, but it is most pronounced at the lexical level. Recently, especially in the media and on social networks, unadapted English words have been appearing more and more often, i.e. words that have retained their original form and take Croatian affixes as needed. So far there are no concrete data on such words in Croatian. To find English words, research on other languages has used various methods, from manual classification and the use of existing language resources to the development of new tools and/or resources. However, language technologies for Croatian are still insufficiently developed. The aim of this paper is therefore to examine the potential of some existing tools and resources for extracting English words and creating a database of English words. To this end, the Croatian web corpus hrWaC was searched using the Sketch Engine platform. This method yielded a list of 1217 English words. The results showed that a list of English words and their frequencies can be produced with the tools and resources available for Croatian, but also that numerous problems prevent the results from being considered fully reliable. Moreover, the procedure itself still has to be combined with manual methods and classification. We conclude that building a complete database of English words in Croatian requires the development of new tools and resources that would enable automatic extraction of English words from corpora of Croatian.

Ćoso, B., Guasch, M., Bogunović, I., Ferré, P., & Hinojosa, J. A. (2022). CROWD‐5e: A Croatian psycholinguistic database of affective norms for five discrete emotions. Behavior Research Methods, 1-17. https://doi.org/10.3758/s13428-022-02003-2

The present study introduces affective norms for a set of 3022 Croatian words on five discrete emotions: happiness, anger, sadness, fear, and disgust. The words were rated by 1239 Croatian native speakers. Each participant rated 251 or 252 words, for one discrete emotion on a five-point Likert scale. The analyses revealed a significant relationship between discrete emotions, emotional dimensions (valence and arousal), and other psycholinguistic properties of words. In addition, small sex differences in discrete emotion ratings were found. Finally, the analysis of the distribution of words among discrete emotions allowed a distinction between “pure” words (i.e., those mostly related to a single emotion) and “mixed” words (i.e., those related to more than one emotion). The new database extends the existing Croatian affective norms collected from a dimensional conception of emotions, providing the necessary resource for future experimental investigation in Croatian within the theoretical framework of discrete emotions.

Kučić, M. (2021). Creating a Web Corpus using GO. Proceedings of the 44th International Convention MIPRO 2021, Croatian Society for Information, Communication and Electronic Technology (ISSN 1847‐3946), 1931‐1933. https://doi.org/10.23919/MIPRO52101.2021.9597093

The Web contains large amounts of textual data which could be used as a source to create new corpora, yet there are not many plug-and-play solutions for scraping specific parts of websites. This paper presents a new open-source solution for downloading and parsing HTML websites which can be configured from one configuration file. As a demonstration of this method, a new ad hoc corpus was built. The corpus contains a total of 2,395,735 titles and articles collected from the 14 most popular Croatian websites.

Bogunović, I. & Ćoso, B. (2019). Lexical access in Croatian–English unbalanced bilinguals: A cross–linguistic study. Suvremena lingvistika, 87, 1-22. (WOS) https://doi.org/10.22210/suvlin.2019.087.08

In Croatia, early exposure to English is enabled through early language learning programs as well as the media. The media plays an important role in incidental language learning. This, along with the fact that daily exposure to English is measured in hours, indicates that its status as a foreign language is changing, which offers a unique opportunity to investigate the relationship between language exposure, level of proficiency and lexical access. The main goal of this study is to explore lexical access in Croatian speakers of English with different levels of proficiency. The investigation consisted of a questionnaire on language use and exposure, a proficiency test and an experiment in which cross–language priming was combined with a lexical decision task. The experiment explored whether a priming effect would occur in two conditions, associative and semantic relatedness and translation equivalence, in both language directions. A semantic relationship between words elicits shorter reaction times, suggesting that sharing a similar meaning speeds up the recognition of words from two languages. An even stronger effect was observed in the case of translation equivalents. Surprisingly, proficiency level was not significant. The results are discussed in the light of the Revised Hierarchical Model and the Bilingual Interactive Activation Model +.

Bogunović, I. & Jelčić Čolakovac, J. (2019). The Role of Informal Activities in Incidental Language Acquisition: The Relationship Between Language Use and Proficiency. Fluminensia, 30, 181-199. (WOS) https://doi.org/10.22210/suvlin.2019.087.08 

The English language is studied as a foreign language in Croatia, and, apart from being included in formal education, it is also present in everyday life. Daily exposure to English is measured in hours and research has shown that many informal activities allow for incidental language acquisition. This paper is aimed at identifying the activities in which the Croatian student population spends most of their time using English as well as at investigating whether a connection between exposure to English, its use and prior knowledge of language can be established. Ninety-three participants were included in the study, all students of the Faculty of Humanities and Social Sciences at the University of Rijeka. The level of English knowledge was determined by administering the Oxford Placement Test. Three groups were formed based on the results obtained in the testing. Exposure to English and the students’ use of the language were tested by means of a questionnaire in which the participants were asked to approximate how much time they spent using English in the activities listed in the questionnaire. The results showed that the participants spent most of their time online and least in spoken communication. The differences between groups with the lowest and the highest levels of knowledge were found to be significant across all activities apart from reading for leisure, written, and spoken communication. On the one hand, this study has managed to corroborate the connection between language use and the level of language proficiency, and on the other, it has indicated that the status of English is slowly changing on both the global and individual level. 

Ćoso, B. & Bogunović, I. (2017). Person perception and language: a case of English words in Croatian. Language & communication, 53, 25-34.  (WOS) https://doi.org/10.1016/j.langcom.2016.11.001

Research on language attitudes has shown that speech style plays an important role in social evaluation. In Croatia, English words commonly occur in everyday communication, which could affect the way we perceive other people. This study aims to investigate the relation between English words and person perception. 200 Croatian elementary school students, adolescents and young adults were given one of the three versions of the same text, varying in the frequency of English words, and a questionnaire to evaluate personal characteristics of the author of the text. The results showed that frequent use of English words was related to higher estimations of social attractiveness, indicating that the use of English words has become an important cue in person perception. 

Bogunović, I. & Ćoso, B. (2013). English in Croatian Scientific Medical Discourse: A Corpus-Based Study. Fluminensia, 2, 177-191.  (SCOPUS) https://hrcak.srce.hr/114745


English as a lingua franca is a part of the more general phenomenon of “English as an international language”. Its influence on other languages, Croatian being one of them, is evident across different functional styles. This study presents findings from a corpus-based qualitative and quantitative analysis of anglicisms and English words on the lexical level, and the passive, light verbs and noun compounds on the syntactic level. The corpus, consisting of texts published in four journals, Acta Stomatologica Croatica, Gynaecologia et Perinatologia, Medicina Fluminensis and Paediatria Croatica, is based on the introductory parts of the papers. The results indicate that the influence of English is most evident on the lexical level – 1.55% of all words are anglicisms. On the syntactic level, the use of noun compounds is significant.  

Brdar, I. (2010). English Words in the Language of Croatian Media. LAHOR: časopis za hrvatski kao materinski, drugi i strani jezik, 10, 217-232. https://hrcak.srce.hr/68617


English is the lingua franca of today’s society. The consequences of that fact are twofold. On the one hand, languages such as Croatian tend to restrict the huge numbers of anglicisms and plain English words that are taken into Croatian, whilst on the other hand the media promotes English words daily. This paper analyses sixty Croatian texts taken from the internet to see if different sources influence the choice of English words. One group of texts, which discusses the divorce between Paul McCartney and Heather Mills, is based on English sources, and the texts are therefore often direct translations. The other group of texts, originally written in Croatian, discusses the divorce between Josip Radeljak and Vlatka Pokos, a Croatian ex-couple who could be described as celebrities. The analysis shows that the first group of texts is influenced by English sources in syntax (such as the use of the passive, personal pronouns, and gerunds) and semantics (such as literal translations of idioms and false friends). They have somewhat more non-adapted English words than the second group of texts. Non-adapted English words in the first group were mostly connected to British culture and music, while the second group of texts used non-adapted English words either with a negative connotation or due to trendiness.

Conferences

Jelčić Čolakovac, J. & Bogunović, I. (2023). Od novinskih mrežnih portala do baze neprilagođenih engleskih riječi u hrvatskom jeziku [From online news portals to a database of unadapted English words in Croatian]. CLARC 2023: Language and Language Data, 28th-30th September 2023, Rijeka, Croatia.

Pavlinušić Vilus, E., Ćoso, B., Bogunović, I., & Jelčić Čolakovac, J. (2023). Istraživanje obrade engleskih riječi u hrvatskome metodom mjerenja evociranih potencijala [Investigating the processing of English words in Croatian by measuring evoked potentials]. CLARC 2023: Language and Language Data, 28th-30th September 2023, Rijeka, Croatia.

Bogunović, I., Ćoso, B., Guasch, M., Pavlinušić Vilus, E., Hinojosa, J. A., & Ferré, P. (2023). ENGRI CROWD: Psiholingvistička baza afektivnih i leksičko-semantičkih normi za najčešće engleske riječi u hrvatskome [ENGRI CROWD: A psycholinguistic database of affective and lexico-semantic norms for the most frequent English words in Croatian]. CLARC 2023: Language and Language Data, 28th-30th September 2023, Rijeka, Croatia.

Pavlinušić Vilus, E., Bogunović, I., & Ćoso, B. (2023). Investigation into the processing of English loanwords in Croatian using cross-linguistic translation and semantic priming paradigms. 23rd Conference of the European Society for Cognitive Psychology, 6th-9th September 2023, Porto, Portugal.

Bogunović, I., Pavlinušić Vilus, E., & Ćoso, B. (2023). Project ENGRI: Rediscovering English loanwords through computational linguistics, psycholinguistic and neuroscientific approach. 23rd Conference of the European Society for Cognitive Psychology, 6th-9th September 2023, Porto, Portugal.

Bogunović, I.  (2023). Linguistic borrowing and bilingualism: An interdisciplinary approach to English words in Croatian. 37th International Conference of the Croatian Applied Linguistics Society: Language and migrations, 15-17th June 2023, Osijek, Croatia.

Pavlinušić Vilus, E., Ćoso, B., & Bogunović, I.  (2023). Investigation into lexical processing of unadapted English loanwords in Croatian using the cross-linguistic semantic priming paradigm. 37th International Conference of the Croatian Applied Linguistics Society: Language and migrations, 15-17th June 2023, Osijek, Croatia.

Ćoso, B., Bogunović, I., Guasch, M., Pavlinušić Vilus, E., Ferré, P., & Hinojosa, J. A. (2022). ENGRI CROWD: An investigation into the affective and lexico-semantic content of English loanwords and their Croatian equivalents. XVI International Symposium of Psycholinguistics, Book of Abstracts, 30th May-2nd June 2023, Vitoria-Gasteiz, Spain.

Pavlinušić Vilus, E., Bogunović, I., & Ćoso, B. (2022). Lexical access to unadapted English loanwords in Croatian: evidence from translation priming. ExLing 2022 Paris: Proceedings of 13th International Conference of Experimental Linguistics, 17-19 October 2022, Paris, France (paper, online).

Bogunović, I., Pavlinušić Vilus, E. & Ćoso, B. (2022). English loan words in Croatian: The gap between the linguists’ expectations and the speakers’ needs. Sociolinguistics Symposium 24 - Inside and beyond binaries, 13th-16th July 2022, Ghent, Belgium (oral presentation, online).

Ćoso, B., Guasch, M., Bogunović, I., Pavlinušić Vilus, E., Ferré, P., & Hinojosa, J. A. (2022). Affective norms of valence and arousal for 400 most frequent English words in Croatian language. Sociolinguistics Symposium 24 - Inside and beyond binaries, 13th-16th July 2022, Ghent, Belgium (poster, online).

Bogunović, I. & Jelčić Čolakovac, J. (2022). Trebamo li download, downloadati ili preuzimati: nastajanje sveobuhvatne baze engleskih riječi u hrvatskom jeziku [Do we need download, downloadati or preuzimati: creating a database of unadapted English loanwords and their Croatian equivalents]. 36th International Conference of the Croatian Applied Linguistics Society, STANDARD AND NON-STANDARD IDIOMS, 9th-11th June 2022, Osijek, Croatia.

Pavlinušić Vilus, E. & Bogunović, I. (2022). Differences in Grammatical Features of the Words in Croatian and English: Evidence from a Translation Task. 36th International Conference of the Croatian Applied Linguistics Society, STANDARD AND NON-STANDARD IDIOMS, 9th-11th June 2022, Osijek, Croatia.

Kučić, M. (2021). Creating a Web Corpus Using GO. 44th International Convention on Information, Communication & Electronic Technology, 27th September-1st October 2021, Opatija, Croatia.

Borucinsky, M. & Bogunović, I. (2020). Finding English words in Croatian: an analysis of corpus linguistics tools. 34th International Conference of the Croatian Applied Linguistics Society, LINGUISTIC AND EXTRALINGUISTIC IN INTERACTION, 24th-26th September 2020, Split, Croatia (online).

Bogunović, I. & Ćoso, B. (2019). Cognitive processing of unadapted English words in Croatian: Evidence from Croatian speakers of English with different levels of L2 proficiency. 21st conference of the European Society for Cognitive Psychology, Tenerife, Spain, 25-28.09.2019.

Brdar, I. & Ćoso, B. (2012). English language in Croatian medical discourse: A corpus-based study. International Scientific Conference, 14th Days of Bioethics, University of Rijeka, Faculty of Medicine, Rijeka, Croatia, 10.-11.5.2012. 

Brdar, I., Ćoso, B. & Hodak, J. (2010). Person perception and language. Summer school in Cognitive neuroscience, Leipzig, Germany, 19.-21.07.2010.

Supplementary material

Corpus available at the repository of the University of Rijeka

The corpus consists of texts collected from the most popular webpages in Croatia in the period from 2014 to 2020: Direktno, Dnevno, Net Hr, Hrt, Index_Hr, Jutarnji, Novilist, Rtl, SlobodnaDalmacija, Večernji, Tportal, Dnevnik (Reuters Institute Digital News Report for 2018, retrieved from http://www.digitalnewsreport.org in April 2019). Web browsing and web crawling were used to select and store the texts. The useful HTML information (such as the publication date of the article, its URL and its title) and the text of the article with corresponding tags and categories, if available, were extracted and analysed with the Python package "beautifulsoup". The extracted data is stored in a MySQL relational database. The corpus is available online.
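The extraction step described above can be sketched roughly as follows. This is an illustration only, not the project's code: the project used the beautifulsoup package, whereas this sketch swaps in Python's standard-library html.parser so that it runs without third-party dependencies. The sample page, the URL, the choice of tags (title, time, p) and the names ArticleParser and parse_article are all hypothetical.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the title, publication date and paragraph text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.published = None
        self.paragraphs = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = "title"
        elif tag == "p":
            self._current = "p"
            self.paragraphs.append("")
        elif tag == "time" and "datetime" in attrs:
            # Assumption: the page exposes its date in a <time datetime="..."> tag.
            self.published = attrs["datetime"]

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            # Text inside inline tags such as <b> still belongs to the paragraph.
            self.paragraphs[-1] += data


# Hypothetical sample page; real news portals each need their own rules.
SAMPLE_HTML = """
<html><head><title>Novi show na televiziji</title></head>
<body>
<time datetime="2019-04-12">12. 4. 2019.</time>
<p>Večeras počinje novi show.</p>
<p>Publika očekuje veliki <b>comeback</b>.</p>
</body></html>
"""


def parse_article(html, url):
    """Return one record with the fields that would be stored for an article."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "published": parser.published,
        "text": " ".join(p.strip() for p in parser.paragraphs),
    }


article = parse_article(SAMPLE_HTML, "https://example.org/clanak")
print(article)
```

Each resulting record corresponds to one row that could then be written to a relational database table; the table layout is not described on this page, so it is left out of the sketch.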

Database of English words in Croatian publicly available 

A complete database of English words in Croatian collected from the ENGRI web corpus is now available at:  https://figshare.com/articles/dataset/The_database_of_English_words_in_Croatian_xlsx/20014364


Database of English words and their Croatian equivalents publicly available 

A complete database of English words in Croatian with their Croatian equivalents is now available at:

https://figshare.com/articles/dataset/The_database_of_English_words_and_their_Croatian_equivalents/20014712

CROWD-5e database

Discrete emotions database: https://figshare.com/articles/dataset/CROWD-5e_xlsx/19221678


Questionnaires - English loanwords (translation study)

The questionnaires with English loan words in Croatian used in the translation study are available here:

https://docs.google.com/spreadsheets/d/1DhGDDMq1_zLP6O0ZDBof8w1LBF4WnzfxengSLBFTBGc/edit?usp=sharing


NEWS