
Journal Update | SSCI Journal Language Assessment Quarterly, 2023, Issues 1-5

2024-09-03

LANGUAGE ASSESSMENT QUARTERLY

Volume 20, Issues 1-5, 2023

Language Assessment Quarterly (SSCI Q1; 2022 Impact Factor: 2.9) published 31 papers across Issues 1-5 of 2023: 22 research articles, 7 book reviews, 3 commentaries, and 1 test review. The research articles cover language testing, language policy, dynamic assessment, second language learning, machine learning, and related topics. Please feel free to share this update! (Coverage of the 2023 volume is now complete.)

Previously featured:

Journal Update | SSCI Journal Language Assessment Quarterly, 2022, Issues 1-5

Contents


Issue 1

■Advancing L2 Dynamic Assessment: Innovations in Chinese Contexts, by Matthew E. Poehner & James P. Lantolf, Pages 1-19.

■Enhancing EFL Learners’ Reading Proficiency through Dynamic Assessment, by Yanfeng Yang & David D. Qian, Pages 20-43.

■An Interventionist Dynamic Assessment Approach to College English Writing in China, by Youjun Tang & Xiaomei Ma, Pages 44-65.

■Promoting Learning Potential among Students of L2 Chinese through Dynamic Assessment, by Lin Jia, Jianyong Cai & Jianqin Wang, Pages 66-87.

■Fostering Self-Regulated Young Writers: Dynamic Assessment of Metacognitive Competence in Secondary School EFL Class, by Yanhong Zhang & Jiao Xi, Pages 88-107.

■A Study on Peer Mediation in Dynamic Assessment of Translation Revision Competence, by Yaqing Liang, Yanzhi Li & Zhonggang Sang, Pages 108-126.

■Dynamic Assessment of the Learning Potential of Chinese as a Second Language, by Zhijun Sun, Peng Xu & Jianqin Wang, Pages 127-142.


Issue 2

■Aligning Language Frameworks: An Example with the CLB and CEFR, by Brian North & Enrica Piccardo, Pages 143-165.

■Investigating Second Language (L2) Reading Subskill Associations: A Cognitive Diagnosis Approach, by Huilin Chen, Yuyang Cai & Jimmy de la Torre, Pages 166-189.

■Gazing into Cognition: Eye Behavior in Online L2 Speaking Tests, by J. Dylan Burton, Pages 190-214.

■Assessment of English Learners and Their Peers in the Content Areas: Expanding What “Counts” as Evidence of Content Learning, by Scott E. Grapin, Pages 215-234.


Issue 3

■Using Causal Explanation Speaking Tasks to Assess Young EFL Learners’ Speaking Ability: The Effects of Age, Cognitive, and L2 Linguistic Development, by Wenjun Ding & Guoxing Yu, Pages 251-276.

■Comparability Between Graded Readers and the Common Test in Japan in Terms of Text Difficulty Perceived by Learners, by Yuya Arai, Pages 277-295.

■A Meta-Analysis of Accommodation Effects for English Learners: Considering Possible Moderators, by Nathalie L. Marinho, Sara E. Witmer, Nicole Jess & Sarina Roschmann, Pages 296-318.

■Examining the Scoring of Content Integration in a Listening-Speaking Test: A G-Theory Analysis, by Rongchan Lin, Pages 319-338.

■The Canadian English Language Proficiency Index Program (CELPIP) Test, by Melissa McLeod & Liying Cheng, Pages 339-352.


Issue 4-5

■Advancing Language Assessment with AI and ML–Leaning into AI is Inevitable, but Can Theory Keep Up? by Xiaoming Xi, Pages 357-376.

■Assessing Interactional Competence: ICE versus a Human Partner, by Gary J. Ockey, Evgeny Chukharev-Hudilainen & R. Roz Hirch, Pages 377-398.

■Validity Arguments for Automated Essay Scoring of Young Students’ Writing Traits, by L. Hannah, E. E. Jang, M. Shah & V. Gupta, Pages 399-420.

■Automatic Speaking Assessment of Spontaneous L2 Finnish and Swedish, by Ragheb Al-Ghezi, Katja Voskoboinik, Yaroslav Getman, Anna Von Zansen, Heini Kallio, Mikko Kurimo, Ari Huhta & Raili Hildén, Pages 421-444.

■Insights into Editing and Revising in Writing Process Using Keystroke Logs, by Mengxiao Zhu, Mo Zhang & Lin Gu, Pages 445-468.

■Remote Proctoring in Language Testing: Implications for Fairness and Justice, by Daniel R. Isbell, Benjamin Kremmel & Jieun Kim, Pages 469-487.

■Test-Taker Engagement in AI Technology-Mediated Language Assessment, by Yan Jin & Jason Fan, Pages 488-500.

■Reflections on the Application and Validation of Technology in Language Testing, by Barry O’Sullivan, Pages 501-511.

■The Use of Assistive Technologies Including Generative AI by Test Takers in Language Assessment: A Debate of Theory and Practice, by Erik Voss, Sara T. Cushing, Gary J. Ockey & Xun Yan, Pages 520-532.


Abstracts

Advancing L2 Dynamic Assessment: Innovations in Chinese Contexts

Matthew E. Poehner, The Pennsylvania State University, University Park, USA

James P. Lantolf, Beijing Language and Culture University and The Pennsylvania State University (Emeritus), Beijing, China

Abstract This introduction to the special issue, L2 Dynamic Assessment Research in China, examines the theoretical foundations of Dynamic Assessment (DA) in the writings of L. S. Vygotsky, with particular attention to the concepts of praxis, mediation, and zone of proximal development, while also recognizing contributions from notable DA researchers such as Israeli psychologist and educator Reuven Feuerstein. Trends in the general and L2 DA research literatures are considered, including: flexible, open-ended mediation (interactionist DA) and standardized, scripted mediation (interventionist DA); formats of embedding mediation in assessment procedures (so-called ‘sandwich’ and ‘cake’ formats of DA); DA administered in classroom and group settings; DA in formal testing contexts, including computerized DA procedures; mediated scores and learning potential scores; and uses of learner profiles resulting from DA to inform instructional enrichment programs. Each article in the special issue reports original research conducted by scholars in China. How the individual studies take up and extend trends in L2 DA research is explained. Given that learning activities and the thinking associated with them are deeply saturated in specific cultural practices and norms, it is argued that extension of DA principles to new cultural contexts, such as those reported in the special issue, is informative not only for assessment researchers in China but for the international community of assessment scholars, and it is essential for the continued development of DA frameworks.


Enhancing EFL Learners’ Reading Proficiency through Dynamic Assessment

Yanfeng Yang, School of Literature and Media, Dongguan University of Technology, Dongguan, Guangdong, China

David D. Qian, Research Centre for Professional Communication in English, Department of English and Communication, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

Abstract Built on Vygotsky’s Sociocultural Theory, Dynamic Assessment (DA) integrates teaching and assessment through mediator-learner interactions to promote learner development. This study employed interactionist DA to diagnose Chinese university EFL learners’ reading difficulties and promote their reading proficiency in a seven-week study. The design included a pre-test, a four-week Enrichment Program, a post-test, and a transfer test. Five learners completed each test in both a non-dynamic (NDA) and a DA form. The learners’ individual interactions with a mediator in DA were recorded, transcribed, and analyzed via NVivo. In addition, the learners’ independent performances (IPs) on the NDA and DA, difficulties encountered in the process, the mediator’s prompts provided for the learners, and the learners’ mediated performances (MPs) were all identified and analyzed. Comparisons of the learners’ IPs and MPs across the tests showed that DA contributed to the learners’ reading proficiency development, and this progress was evident in both their post-test IPs and MPs.


An Interventionist Dynamic Assessment Approach to College English Writing in China

Youjun Tang, Xi’an Jiaotong University, China; Qingdao Binhai University, Qingdao, China

Xiaomei Ma, Xi’an Jiaotong University, China; Qingdao Binhai University, Qingdao, China

Abstract This article explores the value of dynamic assessment (DA) for college English writing (CEW), a required course for millions of students in China that typically enrolls 50 students in each class. An interventionist approach to DA, in which mediation and administration are standardized, was selected and supplemented with a construct-descriptor-based rating checklist as a writing assessment before and after an eight-week instruction phase. The DA group received graduated mediation that focused on the constructs and descriptors from the scale, while a control group received holistic corrections. Data were processed through ANOVA and MANCOVA revealing variable development concerning specific constructs but overall significantly greater improvement by the DA group. The results are interpreted according to the degree of change as indicative of the zone of proximal development. The value of the construct-driven scale and associated descriptors through the mediational process are also discussed. It is argued that interventionist DA is equipped to identify the components and processes within a construct, and in so doing offers the possibility of fine-tuning teachers’ and learners’ understanding of problem areas for individuals.


Promoting Learning Potential among Students of L2 Chinese through Dynamic Assessment

Lin Jia, Beijing Chinese Language and Culture College, Beijing, China

Jianyong Cai, Beijing Language and Culture University, Beijing, China

Jianqin Wang, Beijing Language and Culture University, Beijing, China

Abstract In Dynamic Assessment (DA), the observation that individuals respond differently to support, or mediation, is important for diagnoses of development. The concept of learning potential refers to openness to mediation, i.e., the extent of change to performance when mediation is available, which may suggest learners will need less overall instruction to develop. The current study investigates this prediction and the premise that aligning DA mediation with learner needs promotes development. Mediation, as a systematic interactional process, is more likely to appropriately identify learner needs that are sensitive to future development. Thirty-four secondary school learners of L2 Chinese participating in a two-month study abroad in China completed narration tasks following an instructional intervention, one group receiving DA mediation and the other explicit feedback. The bǎ (把)-construction, recognized as particularly challenging for learners of Chinese, was the focal language feature. Analysis revealed significant differences, with the group receiving DA mediation showing greater improvement with mediation and greater accuracy during independent functioning.


Fostering Self-Regulated Young Writers: Dynamic Assessment of Metacognitive Competence in Secondary School EFL Class

Yanhong Zhang, Luoyang Normal University, Luoyang, P. R. China

Jiao Xi, Beijing Forestry University, Beijing, P. R. China

Abstract Research into metacognition has found it to facilitate self-regulation and correlate with learners’ L2 writing level. Following Lee & Mak’s (2018) framework of Metacognitive Instruction (MI) for L2 writing classrooms, this study applies Dynamic Assessment (DA) to writing MI (MI-DA) in a rural middle school EFL class in China. A one-semester comparative experimental study was conducted in two parallel Grade Seven classes (32 learners in each, taught by the same teacher) following a 3-step procedure: a nondynamic pretest and posttest for both the control class (CC) and the experimental class (EC), and an intervention phase in which CC received a score and teacher’s comments on written assignments while EC was provided with MI-DA intervention during pre-writing, writing, and revision. Ratings of student independent writing as well as interview data indicate that MI significantly improved students’ writing performance and metacognitive competence, influencing their attitude toward and confidence with writing. These goals, typically beyond the focus of most conventional assessments, are realized in DA through its commitment to taking account of the results of past development and those abilities that are ripening (i.e., future development).


A Study on Peer Mediation in Dynamic Assessment of Translation Revision Competence

Yaqing Liang, School of Foreign Studies, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Yanzhi Li, School of Foreign Studies, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Zhonggang Sang, School of Foreign Studies, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Abstract This study investigated how peer-mediated Dynamic Assessment (DA) unfolded in the translation revision competence (TRC) of students in Master of Translation and Interpreting (MTI) programs in China. Thirty subjects first completed three revision tasks and were then rated as high- or low-level performers according to their average scores across the first two tasks. Students were subsequently assigned to either the role of learner or peer mediator. Peer mediators received training in a graduated prompts approach to DA to learn how to provide their peers with mediation. Peer mediation sessions were conducted with the mediators and the learners paired at random and directed to jointly review their third revision. After that, all participants re-revised their last texts with justifications and were interviewed about their attitudes towards peer interaction and their progress in TRC. Diagnosis of TRC comprised scores of the first two revisions as well as the third revision following peer mediation, with this latter score indicating responsiveness to mediation and interpreted as the Zone of Proximal Development. The findings indicated that peer mediation may help improve both mediators’ and learners’ TRC, yet other potential factors at work should not be ignored. The peer engagement process allowed participants to improve their TRC in terms of justification and interpersonal skills. This research explored the application of DA in translation training and provided a process-oriented evaluation for translation studies.


Dynamic Assessment of the Learning Potential of Chinese as a Second Language

Zhijun Sun, Beijing Language and Culture University, Beijing, China; Shandong Normal University, Jinan, Shandong, China

Peng Xu, Shandong Normal University, Jinan, Shandong, China

Jianqin Wang, Beijing Language and Culture University, Beijing, China

Abstract The construct of learning potential has been proposed to capture differences between learner independent performance and performance during Dynamic Assessment (DA). This paper introduces a new learning potential score (LPS) formula implemented in a DA study involving Pakistani learners of L2 Chinese. Learners were randomly assigned to a control or experimental group and administered a pre-test, a post-test, and a more difficult transfer test, each focused on verb-resultative constructions. Use of the new LPS formula allowed for greater differentiation of learner trajectories.
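
The abstract does not reproduce the study's new formula. As background for readers unfamiliar with learning potential scores, an earlier and widely cited formulation (Kozulin & Garb, 2002) combines the pre- to post-mediation gain and the post-mediation score, each normalized by the maximum obtainable score:

    LPS = \frac{S_{\mathrm{post}} - S_{\mathrm{pre}}}{S_{\mathrm{max}}} + \frac{S_{\mathrm{post}}}{S_{\mathrm{max}}} = \frac{2S_{\mathrm{post}} - S_{\mathrm{pre}}}{S_{\mathrm{max}}}

The formula proposed in this article modifies this kind of baseline; see the full paper for its exact definition.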


Aligning Language Frameworks: An Example with the CLB and CEFR

Brian North, Formerly Eurocentres Foundation, Zurich, Switzerland

Enrica Piccardo, University of Toronto, Toronto, Canada

Abstract This paper presents a methodology for directly aligning ‘can do’ frameworks to each other. The methodology, inspired by the manual for relating examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR) (Council of Europe, 2009) and Kane’s (2004, 2013) interpretative argument, takes account of both the horizontal dimension (content analysis) and the vertical dimension (benchmarking with Multifaceted Rasch Modelling – MFRM). The paper exemplifies the application of the methodology by introducing the research conducted to align the Canadian Language Benchmarks (CLB)/ Niveaux de compétence linguistique canadiens (NCLC) to the CEFR, presenting the resulting alignment, and discussing the rationale for the choices made.


Investigating Second Language (L2) Reading Subskill Associations: A Cognitive Diagnosis Approach

Huilin Chen, School of Education, Shanghai International Studies University, Shanghai, China

Yuyang Cai, School of Languages, Shanghai University of International Business and Economics, Shanghai, China

Jimmy de la Torre, University of Hong Kong, Hong Kong, China

Abstract This study uses a cognitive diagnosis model (CDM) approach to investigate the associations among specific L2 reading subskills. Participants include 1,203 Year-4 English major college students randomly selected from the nationwide test takers of Band 8 of Test for English Majors (TEM8), a large-scale English proficiency test for senior English majors in China. Their English reading was measured using a reading comprehension subtest of the TEM8. Based on the CDM output on latent class size estimates, the chi-square test of independence was used to uncover the associations among reading subskills, and odds ratio estimation was used to determine the strengths of those associations. The CDM output on attribute mastery prevalence was used to establish the stochastic direction of the associations between reading subskills. The study has the following findings: a reading subskill network displaying significant subskill associations together with their strengths and directions can be established through a CDM approach, and the patterns of reading subskill associations based on cognitive levels and local/global comprehension resonate with major reading process models and reflect the hierarchical and compensatory characteristics of reading subskills.
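
To illustrate the two association statistics named in this abstract, the short Python sketch below runs a chi-square test of independence on a 2x2 table of subskill mastery classes and computes the corresponding odds ratio. The table values are invented for illustration and are not the study's data:

    # Hypothetical illustration of the chi-square / odds-ratio analysis described above.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: subskill A (non-mastery, mastery); columns: subskill B (non-mastery, mastery).
    table = np.array([[310, 95],
                      [120, 678]])

    chi2, p, dof, expected = chi2_contingency(table)  # test of independence
    odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

    print(f"chi2 = {chi2:.2f}, p = {p:.4g}, odds ratio = {odds_ratio:.2f}")

An odds ratio well above 1, as in this invented table, would indicate that mastery of one subskill makes mastery of the other much more likely.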


Gazing into Cognition: Eye Behavior in Online L2 Speaking Tests          

J. Dylan Burton, Michigan State University, East Lansing, USA

Abstract The effects of question or task complexity on second language speaking have traditionally been investigated using complexity, accuracy, and fluency measures. Response processes in speaking tests, however, may manifest in other ways, such as through nonverbal behavior. Eye behavior, in the form of averted gaze or blinking frequency, has been found to play an important role in regulating information in studies on human cognition, and it may therefore be an important subconscious signal of test question difficulty in language testing. In this study, 15 CEFR B2/C1-level English learners took a Zoom-based English test with ten questions spanning six CEFR complexity levels. The participants’ eye behaviors were recorded and analyzed between the moment the test question ended and the beginning of their response. The participants additionally provided self-report data on their perceptions of test-question difficulty. Results indicated that as test questions increased in difficulty, participants were more likely to avert their gaze from the interlocutor. They did not, however, blink more frequently as difficulty changed. These results have methodological implications for research on test validation and the study of nonverbal behavior in speaking tests.


Assessment of English Learners and Their Peers in the Content Areas: Expanding What “Counts” as Evidence of Content Learning

Scott E. Grapin, University of Miami, Coral Gables, FL, USA

Abstract In this article, I argue for expanding what “counts” as evidence of content learning in the assessment of English learners (ELs) and their peers in the content areas. ELs bring expansive meaning-making resources to content classrooms that are valuable assets for meeting the ambitious learning goals of the latest K-12 education reform. Traditionally, however, the assessment of ELs in the content areas (e.g., science, language arts) has been pursued in restrictive ways, with a narrow focus on demonstrating learning through the written language modality and independent performance. This disconnect between the expansive meaning-making resources of ELs and the restrictive nature of content assessments limits ELs’ opportunities to demonstrate what they know and can do and ultimately serves to perpetuate the deficit views of these students. I begin by providing contextual background on classroom assessment aligned to the latest standards in U.S. K-12 education. Then, I present two studies that illustrate two different expansive assessment approaches with ELs in elementary science: (a) multimodal assessment and (b) dynamic assessment. Finally, I highlight synergies of these studies with related research efforts across diverse contexts, toward the goal of developing a collective vision of expansive assessment that leverages ELs’ expansive ways of making meaning.


Using Causal Explanation Speaking Tasks to Assess Young EFL Learners’ Speaking Ability: The Effects of Age, Cognitive, and L2 Linguistic Development

Wenjun Ding, University of Bristol, Bristol, UK

Guoxing Yu, University of Bristol, Bristol, UK

Abstract This paper examined to what extent causal explanation speaking tasks (CESTs) are cognitively appropriate for assessing young language learners’ (YLLs) L2 speaking. Ninety-six YLLs in China (48 each from Grades 4 and 6) performed two CESTs in both L1 (Chinese) and L2 (English). They also completed receptive and productive L2 vocabulary size tests. We examined how their CEST performance scores, choice of causal antecedents, and speech utterances were related to the language modes of the tasks (L1 vs. L2), grade levels, and L2 vocabulary sizes. L2 CEST performance scores were found to have significant positive correlations with L2 productive vocabulary size. CESTs were found to be generally cognitively appropriate for YLLs because their high scores in L1 performance indicated that performing CESTs is within their L1 capacity. By examining the causal connectives used by YLLs, we found that learners from both age groups had sufficient cognitive ability to verbalise causality. Yet YLLs’ cognitive ability to interpret and verbalise mental states is still developing, and reasoning between causal antecedents that have competing causal relationships with the final state can be cognitively challenging. We discuss the findings with reference to the design of cognitively appropriate CESTs that can assess both language and thinking skills.


Comparability Between Graded Readers and the Common Test in Japan in Terms of Text Difficulty Perceived by Learners

Yuya Arai, Waseda University, Tokyo, Japan

Abstract Proponents of extensive reading argue that test-oriented language teaching and learning could discourage extensive reading practice. However, this has not been examined empirically. Given the possibility that the consistency of reading texts used in extensive reading and entrance examinations is one source of washback, the present study employed a two-facet Rasch measurement to examine the comparability between graded readers and the Common Test in Japan in terms of text difficulty perceived by Japanese high school students. It was found that perceived difficulty of the Common Test texts was not statistically different from that of some graded reader texts, providing positive evidence for the consistency between the Common Test and graded reader texts. Implications for future research on the relationship between entrance examinations and extensive reading are discussed in detail based on study findings and limitations.
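
For orientation, perceived-difficulty ratings of this kind are typically modeled with a rating-scale (facets) Rasch formulation along the following lines; this is a generic sketch, and the study's exact specification may differ:

    \ln\frac{P_{nik}}{P_{ni(k-1)}} = \theta_n - \delta_i - \tau_k

Here \theta_n is the measure for student n, \delta_i is the perceived difficulty of text i, and \tau_k is the threshold governing the step from rating category k-1 to k.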


A Meta-Analysis of Accommodation Effects for English Learners: Considering Possible Moderators

Nathalie L. Marinho, Michigan State University, East Lansing, USA

Sara E. Witmer, Michigan State University, East Lansing, USA

Nicole Jess, Michigan State University, East Lansing, USA

Sarina Roschmann, Michigan State University, East Lansing, USA

Abstract The use of accommodations is often recommended to remove barriers to academic testing among English Learners (ELs). However, it is unclear whether accommodations are particularly effective at improving ELs’ test scores. A growing foundation of empirical work has explored this topic. We conducted a meta-analysis that examined several possible moderators of accommodation effectiveness for improving EL test performance. Results showed substantial variability among estimates, an overall small but significant positive effect on test scores, and a statistically significant positive effect for indirect linguistic supports and combined accommodations compared to native language accommodations.
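
As an illustration of the pooling step in such a meta-analysis, the Python sketch below applies standard DerSimonian-Laird random-effects pooling to a handful of invented effect sizes; the numbers are hypothetical and do not come from this study:

    # Hypothetical illustration: DerSimonian-Laird random-effects pooling.
    import numpy as np

    g = np.array([0.10, 0.25, -0.05, 0.40, 0.15])  # per-study effect sizes (Hedges' g)
    v = np.array([0.02, 0.05, 0.03, 0.04, 0.02])   # per-study sampling variances

    w = 1 / v                                      # fixed-effect weights
    mean_fe = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - mean_fe) ** 2)             # Cochran's Q (heterogeneity)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)        # between-study variance estimate

    w_re = 1 / (v + tau2)                          # random-effects weights
    pooled = np.sum(w_re * g) / np.sum(w_re)
    se = np.sqrt(1 / np.sum(w_re))
    print(f"pooled g = {pooled:.3f}, 95% CI [{pooled - 1.96*se:.3f}, {pooled + 1.96*se:.3f}]")

Moderator analyses such as those reported in the article then ask whether study-level features (e.g., accommodation type) explain the between-study variance estimated above.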


Examining the Scoring of Content Integration in a Listening-Speaking Test: A G-Theory Analysis

Rongchan Lin, National Institute of Education, Nanyang Technological University, Singapore

Abstract Communication in the real world often entails the interpretation, evaluation, and integration of content from different sources. However, the ability to integrate content into discourse does not appear to have been explicitly scored in existing studies. This study operationalizes content integration in the analytic scoring of a listening-speaking test in Chinese. International students who were non-native speakers of Chinese took the test, which comprised two retell tasks and an oral presentation linked by an academic scenario. They were scored on content integration, organization, delivery, and language control for all three tasks. Multivariate generalizability theory (G-theory) was used to investigate the functioning of content integration in the analytic rubrics of the retell tasks and the oral presentation respectively. Overall, this study aimed to illuminate issues of dependability and construct validity for the two task types, focusing particularly on content integration. The findings suggested that content integration functioned somewhat differently from the other dimensions studied.
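
As background on the method: in the simpler univariate case of a persons x tasks x raters design, the G-theory dependability coefficient for absolute decisions combines estimated variance components as follows; the multivariate analysis in this study extends the same logic across rubric dimensions:

    \Phi = \frac{\sigma^2_p}{\sigma^2_p + \frac{\sigma^2_t}{n_t} + \frac{\sigma^2_r}{n_r} + \frac{\sigma^2_{pt}}{n_t} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{tr}}{n_t n_r} + \frac{\sigma^2_{ptr,e}}{n_t n_r}}

Here \sigma^2_p is the variance attributable to persons (the object of measurement), the remaining components belong to tasks (t), raters (r), their interactions, and residual error, and n_t and n_r are the numbers of tasks and raters in the intended measurement design.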


The Canadian English Language Proficiency Index Program (CELPIP) Test

Melissa McLeod, Queen’s University, Kingston, Ontario, Canada

Liying Cheng, Queen’s University, Kingston, Ontario, Canada

Abstract The Canadian English Language Proficiency Index Program (CELPIP) Test was designed for immigration and citizenship in Canada. CELPIP is a computer-based English-language proficiency test which covers all four skills. This test review provides a description of the test and its construct, tasks, and delivery. Then, it appraises CELPIP for reliability, fairness, and validity before identifying future directions for research.


Advancing Language Assessment with AI and ML–Leaning into AI is Inevitable, but Can Theory Keep Up?

Xiaoming Xi, Hong Kong Examinations and Assessment Authority, Hong Kong SAR, China

Abstract Following the burgeoning growth of artificial intelligence (AI) and machine learning (ML) applications in language assessment in recent years, the meteoric rise of ChatGPT and its sweeping applications in almost every sector have left us in awe, scrambling to catch up by developing theories and best practices. This special issue features studies of recent AI and ML advances and thought pieces and attempts to unify our field with a collection of work towards a common set of tools, frameworks, and practices. In this editorial, I briefly review the five studies and four commentaries and discuss the key validity issues around the AI applications covered. To unpack complex validity issues for lay users, I propose accessible questions to ask when evaluating these applications. I stress the importance of developing best practices guiding ethical and responsible use of AI and improving users’ AI literacy skills. In light of users’ increasing access to AI tools in real-world communication, I raise the need for redefining the constructs of language tests to be in sync with what is happening in the real world. These new conceptions of language ability are expected to result in significant changes in task design, scoring, and test interpretation and use.


Assessing Interactional Competence: ICE versus a Human Partner

Gary J. Ockey, Iowa State University, Iowa, United States

Evgeny Chukharev-Hudilainen, Iowa State University, Iowa, United States

R. Roz Hirch, Iowa State University, Iowa, United States

Abstract Most second language assessment researchers agree that interactional competence (IC) is an important part of the construct of oral communication. However, measurement of IC has proven challenging because at least one interlocutor is considered necessary to create an appropriate social context for test takers to demonstrate their IC. Including interlocutors in the assessment process can be impractical and may make judging test takers’ IC difficult because their performances may be impacted by the interlocutors. One potential approach to assessing oral communication that might diminish these challenges is to use a Spoken Dialogue System (SDS) as a test taker’s partner. To explore the potential of an SDS for assessing IC, the use of an SDS was compared with that of a human peer partner to determine which is more appropriate for eliciting discourse for this purpose. Forty test takers completed a video-taped paired discussion task with both a human partner and the Interactional Competence Elicitor (ICE), an SDS created by the researchers. Four trained raters evaluated the video-recorded performances, and results indicated that in the SDS condition raters believed more features of IC were ratable, assigned lower scores for most IC features, and had more positive perceptions of the rating process.


Validity Arguments for Automated Essay Scoring of Young Students’ Writing Traits

L. Hannah, University of Toronto, Toronto, Canada

E. E. Jang, University of Toronto, Toronto, Canada

M. Shah, University of Toronto, Toronto, Canada

V. Gupta, University of Toronto, Toronto, Canada

Abstract Machines have a long-demonstrated ability to find statistical relationships between qualities of texts and surface-level linguistic indicators of writing. More recently, unlocked by artificial intelligence, the potential of using machines to identify content-related writing trait criteria has been uncovered. This development is significant, especially in formative assessment contexts where feedback is key. Yet the extent to which writing traits can be validly scored by machines remains under-researched, especially in the K-12 context. The present study investigated the validity of machine learning (ML) models designed to score three writing traits for students in grades 3–6: task fulfillment, organization and coherence, and vocabulary and expression. The study utilized an argument-based approach, focusing on two primary inferences: evaluation and explanation. The evaluation inference investigated human-machine score alignment, the models’ ability to detect off-topic and gibberish responses, and the consistency of human-machine score alignment across grades and language backgrounds. The explanation inference investigated the relevance of the features used in the models. Results indicated that human-machine score alignment was sufficient for all writing traits; however, validity concerns were raised regarding the models’ performance in detecting off-topic and gibberish responses and the consistency across sub-groups. Implications for language assessment professionals and other educators are discussed.
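
Human-machine score alignment of the kind evaluated here is commonly quantified with quadratic weighted kappa (QWK); the abstract does not name the study's exact statistic, so the Python sketch below, using invented scores, is only illustrative of the general approach:

    # Hypothetical illustration: quadratic weighted kappa for human-machine alignment.
    from sklearn.metrics import cohen_kappa_score

    human   = [2, 3, 4, 3, 1, 4, 2, 3, 5, 4]   # human trait scores on a 1-5 scale
    machine = [2, 3, 3, 3, 2, 4, 2, 4, 5, 4]   # machine-predicted scores

    qwk = cohen_kappa_score(human, machine, weights="quadratic")
    print(f"QWK = {qwk:.3f}")

QWK penalizes large disagreements more heavily than adjacent ones, which suits ordinal writing-trait scales.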


Automatic Speaking Assessment of Spontaneous L2 Finnish and Swedish

Ragheb Al-Ghezi, Aalto University, Finland

Katja Voskoboinik, Aalto University, Finland

Yaroslav Getman, Aalto University, Finland

Anna Von Zansen, University of Helsinki, Helsinki, Finland

Heini Kallio, University of Jyväskylä, Jyväskylä, Finland

Mikko Kurimo, Aalto University, Finland

Ari Huhta, University of Jyväskylä, Jyväskylä, Finland

Raili Hildén, University of Helsinki, Helsinki, Finland

Abstract The development of automated systems for evaluating spontaneous speech is desirable for L2 learning, as it can be used as a facilitating tool for self-regulated learning, language proficiency assessment, and teacher training programs. However, languages with fewer learners face challenges due to the scarcity of training data. Recent advancements in machine learning have made it possible to develop systems with a limited amount of target domain data. To this end, we propose automatic speaking assessment systems for spontaneous L2 speech in Finnish and Finland Swedish, comprising six machine learning models each, and report their performance in terms of statistical evaluation criteria.


Insights into Editing and Revising in Writing Process Using Keystroke Logs

Mengxiao Zhu, University of Science and Technology of China, Hefei, China

Mo Zhang, Educational Testing Service, Princeton, USA

Lin Gu, Educational Testing Service, Princeton, USA

Abstract Recent technological advances have enabled the collection of keystroke logs during writing, a non-intrusive approach to collecting writing process data that can provide insights into writers’ editing and revising behaviors. Using keystroke logs from 761 middle school students in the US, this study investigated the association between writers’ editing and revising behaviors, especially in-text editing and jump editing, and their scores and gender. The results showed different writing behavior patterns for participants of different ability levels and genders. The findings on the writing process went beyond character- and word-level activities and shed light on writers’ revising behaviors and related writing evaluation skills, which could potentially be used for automated writing interventions.
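
To make the in-text versus jump editing distinction concrete, the Python sketch below classifies deletions in a toy keystroke log by whether they occur at the current leading edge of the text or after the cursor has jumped back. The log schema is an assumption for illustration, not the study's actual format:

    # Hypothetical illustration: classifying deletions in a keystroke log.
    # Schema assumption: each event records its type, the edit position, and
    # the text length after the event.
    events = [
        {"type": "insert", "pos": 0, "len_after": 1},
        {"type": "insert", "pos": 1, "len_after": 2},
        {"type": "delete", "pos": 1, "len_after": 1},  # deletes the final character
        {"type": "insert", "pos": 1, "len_after": 2},
        {"type": "delete", "pos": 0, "len_after": 1},  # cursor jumped back first
    ]

    leading_edge = in_text = 0
    for ev in events:
        if ev["type"] != "delete":
            continue
        len_before = ev["len_after"] + 1
        if ev["pos"] == len_before - 1:    # deletion at the end of the text
            leading_edge += 1
        else:                              # deletion inside existing text
            in_text += 1

    print(f"leading-edge deletions: {leading_edge}, in-text deletions: {in_text}")

Aggregated over a full log, such counts yield simple per-writer editing profiles that can then be related to scores and demographics, as the study does with richer measures.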


Remote Proctoring in Language Testing: Implications for Fairness and Justice

Daniel R. Isbell, University of Hawaiʻi at Mānoa, Honolulu, HI, USA

Benjamin Kremmel, Universität Innsbruck, Innsbruck, Tirol, Austria

Jieun Kim, University of Hawaiʻi at Mānoa, Honolulu, HI, USA

Abstract In the wake of the COVID-19 boom in remote administration of language tests, it appears likely that remote administration will be a permanent fixture in the language testing landscape. Accordingly, language test providers, stakeholders, and researchers must grapple with the implications of remote proctoring for valid, fair, and just uses of tests. Drawing on an argument-based approach to fairness and justice, which subsumes validity, we articulate key sub-claims, warrants, rebuttals, and relevant backing related to the use of remote proctoring in language tests. With respect to meaningfulness as a core element of fairness, we focus on how remote proctoring is both a bulwark against construct-irrelevant responses (cheating) and a potential source of construct-irrelevant variance due to inauthentic constraints on test-taking conditions. Other fairness concerns relate to technological biases across racial/ethnic groups and access to suitable technology and physical space for remote proctoring. For justice, we consider the consequences and social values of remote-proctored language tests (Coghlan et al., 2021). We propose that these articulations of remote proctoring issues within Kunnan’s fairness and justice framework can usefully motivate and guide research on, as well as critique of, testing procedures and test uses.


Test-Taker Engagement in AI Technology-Mediated Language Assessment

Yan Jin, Shanghai Jiao Tong University, Shanghai, China

Jason Fan, The University of Melbourne, Melbourne, Australia

Abstract In language assessment, AI technology has been incorporated in task design, assessment delivery, automated scoring of performance-based tasks, score reporting, and provision of feedback. AI technology is also used for collecting and analyzing performance data in language assessment validation. Research has been conducted to investigate the efficiency and functionality of assessment technologies, but empirical explorations on test-taker engagement in AI technology-mediated language assessment are remarkably scarce. In this commentary, we first examine the impact of AI technology on test takers, in terms of both the benefits and the challenges that it poses in the critical stages of language assessment development and validation. Next, we propose a conceptual model to facilitate the implementation and evaluation of test-taker engagement in AI technology-mediated language assessment. The model delineates two forms of test-taker engagement: test takers’ participation in technology-mediated assessment activities and their perceptions of technological innovations in language assessment. We then review the articles in this special issue based on this model and discuss directions for future research. We conclude by offering some guidance for maximizing test-taker engagement in technological innovations, thereby promoting learning-oriented and equity-minded language assessment.


Reflections on the Application and Validation of Technology in Language Testing

Barry O’Sullivan, British Council, Dublin, Ireland

Abstract This paper highlights as issues of concern the rapid changes in technology and the tendency to report partial validation efforts without identifying the work as part of a larger validation project. With close human supervision, emerging technologies can have a significant and positive impact on language testing. While technology seems to be constantly changing and improving, there is still much to learn from the approaches these papers take in identifying aspects of the constructs of interest amenable to operationalization and measurement through technology. In terms of validation, the paper suggests a number of important considerations. These include presenting assessment arguments in a form that meets the needs of the stakeholder groups most clearly affected by the claims being made; working towards an interactive communication-based argument approach, ideally involving validators who are independent of the developer; and ensuring that any “partial” validation is explicitly linked to a broader argument. The paper ends by predicting that technology will change the practice of language testing radically from where it is today, and that while fairness, justice, and validation will always be primary concerns, the challenges associated with establishing evidence of them will only increase.


The Use of Assistive Technologies Including Generative AI by Test Takers in Language Assessment: A Debate of Theory and Practice

Erik Voss, Teachers College, Columbia University, New York, NY, USA

Sara T. Cushing, Georgia State University, Atlanta, GA, USA

Gary J. Ockey, Iowa State University, Ames, IA, USA

Xun Yan, University of Illinois, Urbana, IL, USA; Beckman Institute for Advanced Science and Technology, University of Illinois, Urbana, USA

Abstract Assistive writing tools that provide suggestions for word choice, sentence structure, and grammar correction are allowed as accommodations for students with learning disabilities on a case-by-case basis. These assistive technologies, including generative artificial intelligence (AI) tools, are more accessible than ever and are being utilized for second language classroom instruction and learning. In light of this trend, at a meeting of the Automated Language Assessment (ALA) SIG at the Language Testing Research Colloquium (LTRC) in New York City, a debate took place on allowing access to assistive technologies, including generative AI, in language assessment. This commentary, building and expanding on the debate between two opposing teams who argued for or against allowing students access to these assistive technologies during language assessment, extends the exchange of ideas in written form. The debate raises issues related to construct definition, scoring and rubric design, validity, fairness, equity, bias, and copyright. It also speculates on the use of generative AI by test takers at different proficiency levels and in assessments of different stakes (high vs. low). It ends with thoughts on AI’s impact on language teaching and learning and on when access to such technologies might emerge in language assessment. The issues and questions raised in the debate forecast discussions regarding the feasibility of allowing test takers to use assistive technologies, including generative AI, during language assessment, and the extent to which humans will interact and collaborate with these new technologies.


About the Journal

Language Assessment Quarterly (LAQ) is dedicated to the advancement of theory, research, and practice in first, second, and foreign language assessment for school, college, and university students; for employment; and for immigration and citizenship. LAQ publishes original articles addressing theoretical issues, empirical research, and professional standards and ethics related to language assessment, as well as interdisciplinary articles on related topics, and reports of language test development and testing practice. All articles are peer-reviewed. Language Assessment Quarterly accepts the following types of article: Full-length articles, Commentary, Book Reviews, Test Reviews, Interviews, and Practical Advice. The journal is directed to an international audience. Examples of topic areas appropriate for LAQ include:




  • assessment from around the world at all instructional levels including specific purposes;

  • assessment for immigration and citizenship and other ‘gate-keeping’ contexts;

  • issues of validity, reliability, fairness, access, accommodations, administration, and legal remedies;

  • assessment in culturally and/or linguistically diverse populations;

  • professional standards and ethical practices for assessment professionals;

  • interdisciplinary interfaces between language assessment and learning;

  • issues related to technology and computer-based assessment;

  • innovative and practical methods and techniques in developing assessment instruments;

  • recent trends in analysis of performance; and

  • issues of social-political and socio-economic concern to assessment professionals.




Official website:

https://www.tandfonline.com/journals/hlaq20


Source: the official website of LANGUAGE ASSESSMENT QUARTERLY
