The Bilingual Lexicon and Language Skills – A Detailed Look

Different models have tried to explain bilingual language organisation. Connectionist models, such as the Bilingual Interactive Activation models (BIA, BIA+), postulate an integrated network and non-selective language access to bilinguals' mental lexicon. Accordingly, a language conflict arises in bilinguals when they access words. This language conflict predicts slower reaction times for bilinguals on interlingual homographs in a lexical decision task. Here, German-English bilinguals, highly proficient users of English and poorly proficient users of English performed a general lexical decision task on interlingual homographs, non-words, and English and German control words. There was no significant difference between the groups' reaction times for interlingual homographs, and thus these results do not provide empirical evidence for the BIA or BIA+ models. Additionally, future research needs to pay more attention to participants' language skills.


Introduction
In everyday communication, we have to find the right word in real time from among 30,000 to 60,000 entries in the activated mental lexicon. The underlying processes are called lexical selection and word recognition (Zwitserlood and Bölte 2002). In bilinguals (or multilinguals), these processes are even more complex. For bilinguals (and, by extension, multilinguals), the first challenge is activating the right word in the right language. This may cause a language conflict, that is, interference between two or more languages. Bilinguals' word access has been widely investigated in psycholinguistic research, and different models have tried to explain this process. While connectionist models postulate an integrated organisation of the lexicon, other models suggest separate lexicons. Both kinds of model have contributed to an understanding of language organisation.
One connectionist model of bilingual word access that assumes an integrated network of the languages spoken is the Bilingual Interactive Activation model (BIA), together with its extension, the Bilingual Interactive Activation Plus model, or BIA+ (Dijkstra and van Heuven 2002). Here, access to the integrated network is non-selective. The BIA+ assumes two systems, with contextual influence on word identification: the word identification system and the task decision system. In the word identification system, the visual input first activates sub-lexical orthographic representations. Simultaneously, sub-lexical phonological representations are activated. Then, the orthographic and phonological word representations activate the semantics and the language nodes. The language nodes indicate which language a word belongs to. Afterwards, the task decision system uses the information from the word identification system to carry out the task. Accordingly, entries of both the first and the second language are activated. The resulting language conflict can be dealt with in two ways. Firstly, both languages are activated and the selection between L1 and L2 takes place later. Secondly, mechanisms right at the beginning of the word recognition process inhibit the non-target language. Similarly, the Inhibitory Control model (Green 1998) assumes that words are selected through inhibition and deselection of the non-target language: the non-target language, which is not required but interferes, is inhibited while attention focuses on the target language. In contrast to these models of the organisation of the bilingual lexicon, the modularity hypothesis favours separate lexicons, so that L1 and L2 operate in isolation from each other.
Research has found support for both views of the organisation of the bilingual lexicon. Several studies (van Heuven et al. 2008; van Heuven and Dijkstra 2010; Martin et al. 2012; Wu and Thierry 2012; Wu et al. 2013) support bottom-up, non-selective access to the bilingual's lexicon. For example, van Heuven et al.'s (2008) behavioural data showed slower responses to interlingual homographs (IH) than to control words: the participants' measured reaction times on interlingual homographs were longer than on control words that exist in only one language. Van Heuven and Dijkstra (2010) state that electrophysiological data support parallel access to words and the non-selective view.
Other findings point in the opposite direction. People with brain damage and aphasia have been seen to show selective recovery of only one language, which indicates the existence of language-selective areas in the human brain (Fabbro 1999; Grosjean 1982, 60, cited in Singleton 2007). Research by Poort and Rodd (2017) and Borodkin, Kenett, Faust and Mashal (2016) likewise does not support a common lexicon in bilinguals. In their experiments, Poort and Rodd (2017) investigated the effect of stimulus list composition on the cognate facilitation effect and found no strong evidence for the existence of a common lexicon. The faster processing of cognates by bilinguals is usually attributed to an assumed shared storage for the first and second language. However, Poort and Rodd (2017) argue that the cognate facilitation effect in a single-language lexical decision task without words in the non-target language may be a result of facilitation at the decision stage, because the task allows both readings of the cognate to be mapped onto the 'yes' response. Moreover, Borodkin et al. (2016) suggest that the lexical network of L2 shows greater local connectivity and a less modular community structure than that of L1. The authors conclude that the lexical network of L2 (even in highly proficient bilinguals) may not be as well organised as that of L1. Overall, due to these contrasting results, hierarchical models and connectionist models still co-exist.
The present study aims to contribute to answering the question of how bilinguals and highly proficient users of a second language differ from poorly proficient users of a second language in their reaction times. The research question focusses on whether the groups differ in their reaction times to interlingual homographs. This is investigated by using English/German interlingual homographs in a general lexical decision task. The results should indicate whether language access in the bilingual mind is selective or non-selective. Martin et al. (2009) describe the lexical decision task (LDT) as particularly suited to triggering lexical access. According to Moret-Tatay and Perea (2011), a lexical decision task is "the most commonly used laboratory visual word identification task and a myriad of experiments have shown that it provides relevant insight into the structure of the internal lexicon" (125). Accuracy in the lexical decision task is usually high. Here, groups with different levels of proficiency in English are investigated. The study aims to add to the discourse on the organisation of the bilingual lexicon.

Methodology
The participants for the study were carefully selected on the basis of their language experience in English and German. Their English language skills were confirmed with the Oxford English Quick Placement Test. The independent variables in the study were Word Type and Group; the dependent variables were Reaction Time (RT) and Accuracy (ACC). The language of the instructions, a possible confound, was controlled by counterbalancing it within the groups. Reaction time and accuracy were measured using E-Prime psychology software. Additionally, a language learner questionnaire was used to gain further information on the participants' background.

Participants
The participants were students of Psychology as well as students and teaching staff at the Department of English and American Studies at the University of Klagenfurt. The total number of participants was 56. Seven participants had to be excluded because their mother tongue was neither English nor German or because they gave more than 60 per cent wrong answers on the LDT. This left 49 participants (34 female, 15 male) with 45 participants having German as their mother tongue, two whose first language was English, and three who were classified as early bilinguals since they had learned both languages from early on in their lives. All participants spoke English and German. Moreover, among those participants, 24 spoke French, 13 Spanish, 10 Italian, 3 Dutch, 3 Portuguese, and 3 Russian. The mean age of the participants was 24 (SD = 6.7). Among the participants, 37 had completed A-levels, five a Bachelor's degree, five a Master's degree, and three their Diploma studies. The groups of the sample differ in size. The poorly proficient group consists of 24 participants, the highly proficient group of 17 and the bilingual group of 8 participants. Several reasons contributed to the dissimilarity in size of the groups. First, the psychology students, who made up the biggest part of the first group, were easier to recruit than students from other fields of study, or even participants from outside the University.
To compare the introspective rating of language skills with the score achieved in the language test (Oxford English Quick Placement Test, or QPT), both scores were transformed into z-values so that they could be compared statistically. The difference between the two z-values then made up a new score: the difference between the introspective and the "real" score. The higher this value, the bigger the difference.
Across the initial proficiency groups, 30 participants lay within one standard deviation of the mean (M ± SD); 19 participants were above or below this range. Afterwards, an ANOVA was conducted to clarify whether the groups differ in the relation between their internal (self-rated) and external (tested) language proficiency. The precondition of homogeneity of variances was met (p = 0.939). The initial proficiency groups do not differ significantly in the difference between their introspective skills and their results in the Quick Placement Test (F = 2.646, p = 0.082). The difference between the introspective judgement of language competence and the test result was therefore not related to the language proficiency group. When the mean z-values for the difference between introspective estimation and measured skills are compared across the groups, no significant difference can be found. The Psychology students' group has a mean of -0.26 (SD = 0.81), the English students' group M = 0.26 (SD = 0.76), and the bilingual group M = 0.23 (SD = 0.87).
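The z-transformation and difference score described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the analysis script used in the study: the function names and example values are hypothetical, and both the sign convention (self-rating minus test score) and the use of the sample standard deviation are assumptions that the text does not specify.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rating_test_difference(self_ratings, test_scores):
    """Per-participant difference between standardised self-rating and test score.

    Assumption: difference = z(self-rating) - z(test score), so a positive
    value means the self-rating was higher than the measured proficiency.
    """
    z_self = z_scores(self_ratings)
    z_test = z_scores(test_scores)
    return [a - b for a, b in zip(z_self, z_test)]


# Hypothetical data: mean self-rating (10-point scale) and QPT score per participant.
self_ratings = [6.0, 7.5, 9.0, 5.5, 8.0]
qpt_scores = [38, 45, 52, 41, 55]
differences = rating_test_difference(self_ratings, qpt_scores)
```

Because both measures are standardised first, the difference is expressed in standard deviation units and averages to zero over the whole sample, which is why group means such as those reported above can be negative or positive.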

Material
To investigate whether interlingual homographs cause a language conflict that results in slower reaction times in highly proficient users of English than in poorly proficient users of English, the participants had to be assigned to proficiency groups. To answer the research question, a language learner questionnaire and the Oxford English Quick Placement Test were used to gain information about participants' language skills and demographics, and the lexical decision task (LDT) was employed to measure accuracy (ACC) and reaction time on the words.
A language learner questionnaire was given to participants with questions on age, gender, profession, the years of learning the L2 and L3 and the context in which the languages were acquired. Additionally, participants completed a self-assessment of proficiency in their second language. The questionnaire asked the participants to rate each of their language skills separately (reading, listening, writing, and speaking) on a 10-point scale, ranging from almost not present to near-native language skills. An example of the questionnaire is displayed in Figure 3 below.
The Oxford English Quick Placement Test was used to measure participants' second language ability, as in Park, Badzakova-Trajkov, and Waldie (2012). Here, the paper and pencil form was used, available on the homepage of the Volkshochschule Aschaffenburg (n.d.). Geranpayeh (2003) describes the Oxford English Quick Placement Test as a multiple-choice test that aims at placing students according to their level of English. It covers morphological, syntactical, lexical, and pragmatic features of English. It is widely used to classify learners of English into proficiency groups.
The lexical decision task was designed and realised with E-Prime®. The interlingual homographs were taken from studies using interlingual homographs (Dijkstra, Grainger, and van Heuven 1999) and from websites on the topic of false friends. Control words in English and control words in German were matched with the interlingual homographs according to their frequency and word length. Furthermore, filler words and non-words in English and non-words in German were created with the word generation programme WordGen (Duyck et al. 2004). The instructions were presented on an 18-inch computer screen in Courier New, point size 18, and remained on screen until the participant pressed any key. The language of the instructions was counterbalanced; half of the participants were shown instructions in English, the other half in German. After the instructions, a fixation cross was presented at the centre of the screen in Courier New, size 26, for a duration of 18 ms. Afterwards, instructions for the buttons to press stayed on screen, centred, in Courier New, size 18, bold. Then, the stimulus appeared in the centre of the screen in Courier New, size 26, bold, for a maximum duration, and response window, of 2000 ms. Immediately afterwards, the next trial started with a fixation cross.
During the experiment, participants had to classify the letter strings into two categories: word and non-word. In this general lexical decision task (GLDT), each participant first completed a practice example without data logging. Afterwards, the LDT consisted of 240 stimuli: 43 interlingual homographs, 42 German and 42 English control words, and 113 non-words, that is, strings of letters that do not exist as words either in German or in English. The words were carefully selected and matched according to their frequency with WordGen software (Duyck et al. 2004).

Procedure
After participants arrived at the cognition laboratory, they were assigned to the experimental groups. The groups created were psychology students as poorly proficient, English students as highly proficient, and bilinguals. Within each group, participants were alternately assigned to the experiment version with German instructions or the version with English instructions. Then, they read the general information on duration and procedure. Afterwards, they were seated 30 centimetres away from an 18-inch computer screen and the lexical decision task started. All participants had normal or corrected-to-normal vision. After the LDT, which lasted approximately 25 to 30 minutes, participants answered a language learner questionnaire on paper. Then, the second part of the study, a short priming task on the computer, took place (given the scope of this paper, this second part will not be discussed here at length). Afterwards, participants filled in the paper and pencil form of the Oxford English Quick Placement Test and were then debriefed.
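The alternating assignment of instruction language within each group can be illustrated with a short sketch. The function name, the participant identifiers and the choice of German as the first language in the rotation are all hypothetical; the text only states that participants were assigned alternately within each group.

```python
from collections import defaultdict
from itertools import cycle


def assign_instruction_language(participants, languages=("German", "English")):
    """Alternate the instruction language within each group.

    `participants` is a list of (participant_id, group) tuples in order of
    arrival. Assumption: each group starts with the first entry of `languages`.
    """
    # One independent rotation per group, created on first use.
    rotations = defaultdict(lambda: cycle(languages))
    return {pid: next(rotations[group]) for pid, group in participants}


# Hypothetical arrival order.
arrivals = [
    ("p1", "psychology"),
    ("p2", "english"),
    ("p3", "psychology"),
    ("p4", "psychology"),
    ("p5", "english"),
]
assignment = assign_instruction_language(arrivals)
```

Keeping a separate rotation per group means that each group ends up with a near-equal split of German and English instructions, regardless of the order in which members of different groups arrive.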

Grouping
Initially, participants were grouped according to their field of study and their self-assessment about whether they belong to the bilingual group. However, a test to examine language proficiency in English was also given to participants. This language proficiency test gave insight into the measured proficiency of the participants. The test score achieved can be classified according to the levels of the Common European Framework. [...] from which two were actually categorised as B2 and three as B1. Four people rated their skills as satisfactory; of these, one was placed in B1, two in B2, and one in C1. The majority of participants claimed to have good language skills in the second language. Among those [...] classified as C1 and three in C2. Another big group, namely 10 participants, classified themselves as having very good skills in their second language (4 in C2, 2 in C1, 2 in B2 and 1 in B1). Participants who classified themselves as almost native (8) and native (4) achieved levels C1 (5) and C2 (7) according to the QPT.

Results
For the lexical decision task, participants' reaction times on the different word types were compared. For the analysis, only correct answers were included. Table 4 below depicts reaction times on the different word types. Words were recognised more quickly than non-words. Moreover, non-words had the highest maximum reaction time compared to interlingual homographs and their control words in English and German, as well as the smallest standard deviation of all types of stimuli. Additionally, participants recognised German control words more quickly than English control words, and with a smaller standard deviation. Interlingual homographs were recognised more quickly than their control words in English, but more slowly than their control words in German.
The reaction times, both across all participants and within each participant, do not follow a normal distribution. On that account, the data were analysed in more detail to identify outliers. Box plots show where the suspected outliers and extreme values of the participants' reaction times lay; scores more than 1.5 times the interquartile range (IQR) above the upper quartile or below the lower quartile were considered outliers, as described in the Tukey method for identifying outliers (Tukey 1977). For further analysis, the extreme values were recoded into missing values and excluded from the analyses so as not to distort the results. The missing value analysis showed that, after incorrect answers and outliers had been recoded into missing values, 23 participants produced under 10 % missing values, 21 participants produced between 10-15 % missing values, 2 between 15-20 % and 2 between 20-25 %. Concerning accuracy, no difference between the groups could be found.
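The Tukey criterion for outliers can be sketched as below. This is a hedged illustration rather than the procedure run in the study: the function names and reaction times are hypothetical, the quartile convention (here Python's "inclusive" method) is an assumption, since statistics packages compute quartiles slightly differently, and whether the fences were computed per participant or over the whole sample is not stated in the text.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def recode_outliers(reaction_times, k=1.5):
    """Replace reaction times outside the Tukey fences with None (missing)."""
    lower, upper = tukey_fences(reaction_times, k)
    return [rt if lower <= rt <= upper else None for rt in reaction_times]


# Hypothetical reaction times in milliseconds (correct answers only).
rts = [520, 540, 560, 580, 600, 620, 640, 1900]
cleaned = recode_outliers(rts)
```

With these example values the quartiles are 555 ms and 625 ms, so the fences lie at 450 ms and 730 ms and only the 1900 ms response is recoded as missing. Setting `k=3` instead would flag only the more extreme values.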
As there were no differences in the test scores of the bilingual participants and the highly proficient group participants on the Quick Placement Test, these two groups were taken together and compared with the poorly proficient group in the next part of the analysis. This left two groups, highly proficient including bilingual and poorly pro-

Discussion
The aim of this study was to test the reaction time difference between proficiency groups of a second language in order to gain further insight into the mental lexicon and the storage of words in people speaking more than one language. The participants were primarily students of English and American Studies and of Psychology. Their language skills were assessed with a language learner questionnaire and the Oxford Quick Placement Test. Then, on the basis of the scores achieved in the Quick Placement Test and the subsample sizes, participants were regrouped for further analysis.
No significant differences between the groups in reaction times on homographs in the LDT could be found. A possible explanation lies in the proficiency of the groups. The highly proficient group may not be proficient enough to show a difference in the LDT, and the individual differences in proficiency within the sample may be too large for participants to be subsumed into groups. In any case, the result does not support a non-selective theory of language access and an integrated lexicon. The results can be seen as similar to those of van Heuven and Dijkstra (2010). The authors state that the question of the overlap of the activated brain regions cannot be fully answered because the equipment's resolution is not satisfactory. Similarly, the present study has certain limitations. Unfortunately, it was not possible to recruit more bilinguals for this study. Another limitation may be that the subsample of the highly proficient group is not proficient enough in their second language. Even more proficient L2 users might produce a different outcome. Such a group could be formed by including only participants who have spent some time abroad or have achieved at least a Bachelor's degree in English. Additionally, a lexical decision task in only one language would simplify the procedure. Moreover, the study showed that the grouping of participants according to their language skills must be considered in more detail in future research. As described here, self-reported language skills do not always coincide with measured skills in the second language. Additionally, participants' fields of study are not always an indicator of language skills. Students of psychology may also achieve a high score in the placement test, and students of English may receive only a low score.
In summary, inferences on the bilingual lexicon have to be drawn carefully due to certain limitations. However, the results do not support the BIA and BIA+ models because here, the interlingual homographs did not cause a conflict at the response level, as described in the study conducted by van Heuven et al. (2008). This suggests that only one reading of the interlingual homograph is activated in the lexical decision task. The Revised Hierarchical model assumes a united semantic and conceptual level. However, the results of the present investigation can be described with the modularity hypothesis: second language learners have isolated operations in their L2 and their L1, and there is a formal differentiation. Nevertheless, the two lexicons may interact dynamically and have high interconnectivity, as described by Singleton (2007).
Additionally, there are factors that contribute to speakers' lexical organisation. Apart from the years of learning a second language, the age at which learning started, the context of language learning, the actual time spent speaking the second language on a daily basis, the time spent in a country where the second language is spoken, and the time spent reading in an L2 or even watching and listening to audiovisual material such as films, radio programmes and podcasts in the L2 all contribute to a learner's proficiency and add to the complexity of describing said proficiency. Furthermore, Ma et al. (2017) suggest that future research should emphasise ERP data and behavioural data together. Additionally, second language testing was not considered here but plays an important role in any attempt to explain contradictory results in research. While some studies have used language tests to confirm participants' language skills, others have not. In future, consistent and precise description as well as testing of bilinguals' or L2 learners' proficiency would improve the comparability of studies. Additionally, the definition of a bilingual is not consistent across the studies reviewed. Thus, comparing studies is getting more and more difficult. Clear definitions of what constitutes a poorly proficient user, a highly proficient user of L2, and a bilingual have to be developed in order to achieve better comparability in research.