Journal Bulletin | SSCI Journal: Journal of Phonetics, Volumes 96-101 (2023)

Followed by 60,000 scholars → 语言学心得

2024-09-03

Journal of Phonetics

Volume 96-101, 2023

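Journal of Phonetics (SSCI Q2; 2022 IF: 1.9; rank: 61/194) published 40 articles in Volumes 96-101 in 2023. Feel free to share! (Coverage of 2023 is now complete.) In detail: Volume 96 contains 4 articles, on topics including tone languages, Dutch, and F0. Volume 97 contains 8 articles, on topics including tone perception and production, speaker-specificity, the flexibility and stability of speech sounds, and vowel perception. Volume 98 contains 10 articles, on topics including Australian English, Arabic, the Suzhou dialect, and prominence and intonation in Singapore English. Volume 99 contains 5 articles, on topics including second dialect acquisition, unstressed vowels, Mandarin-accented English, and the advancement of phonetics in the 21st century. Volume 100 contains 7 articles, on topics including Polish, vowel perceptual boundary shift, English infant-directed speech, and rhythmic synchronization with spoken English. Volume 101 contains 6 articles, on topics including phonological mediation, structural convergence in language contact, and musical abilities and speech prosody perception.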

Volume 96

■Articulatory and acoustic variation in Polish palatalised retroflexes compared with plain ones by Anita Lorenc, Marzena Żygis, Łukasz Mik, Daniel Pape, Márton Sóskuthy

■Schwa’s duration and acoustic position in American English by Uriel Cohen Priva, Emily Strand

■Phonological and phonetic contributions to perception of non-native lexical tones by tone language listeners: Effects of memory load and stimulus variability by Juqiang Chen, Mark Antoniou, Catherine T. Best

■Red and blue bananas: Time-series f0 analysis of contrastively focused noun phrases in Papuan Malay and Dutch by Constantijn Kaland, Marc Swerts, Nikolaus P. Himmelmann

Volume 97

■Speakers coarticulate less in response to both real and imagined communicative challenges: An acoustic analysis of the LUCID corpus by Zhe-chen Guo, Rajka Smiljanic

■Production and perception of prevelar merger: Two-dimensional comparisons using Pillai scores and confusion matrices by Valerie Freeman

■Same vowels but different contrasts: Mandarin listeners’ perception of English /ei/-/iː/ in unfamiliar phonotactic contexts by Yizhou Wang, Rikke L. Bundgaard-Nielsen, Brett J. Baker, Olga Maxwell

■Prosodic marking of information status in Italian by Simona Sbranna, Caterina Ventura, Aviad Albert, Martine Grice

■Flexibility and stability of speech sounds: The time course of lexically-driven recalibration by Yi Zheng, Arthur G. Samuel

■Speaker-specificity in speech production: The contribution of source and filter by Vincent Hughes, Amanda Cardoso, Paul Foulkes, Peter French, ... Philip Harrison

■Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech by Luis M.T. Jesus, Sara Castilho, Aníbal Ferreira, Maria Conceição Costa

■The final lengthening of pre-boundary syllables turns into final shortening as boundary strength levels increase by Gerrit Kentner, Isabelle Franz, Christine A. Knoop, Winfried Menninghaus

Volume 98

■Analysis and computational modelling of Emirati Arabic intonation – A preliminary study by Muhammad Swaileh A. Alzaidi, Yi Xu, Anqi Xu, Marta Szreder

■Gestural characterisation of vowel length contrasts in Australian English by Louise Ratko, Michael Proctor, Felicity Cox

■Sound change in Western Andalusian Spanish: Investigation into the actuation and propagation of post-aspiration by Nicholas Henriksen, Amber Galvano, Micha Fischer

■The change in breathy voice after tone split: A production study of Suzhou Wu Chinese by Chunyu Ge, Wenwei Xu, Wentao Gu, Peggy Pik Ki Mok

■Phonetic differences between nouns and verbs in their typical syntactic positions in a tonal language: Evidence from disyllabic noun–verb ambiguous words in Standard Mandarin Chinese by Qibin Ran, Kai Gao, Yuzhu Liang, Quansheng Xia, Søren Wichmann

■Prominence and intonation in Singapore English by Adam J. Chong, James S. German

■Revisiting the nasal continuum hypothesis: A study of French nasals in continuous speech by Gillian de Boer, Jahurul Islam, Charissa Purnomo, Linda Wu, Bryan Gick

■Advancements of phonetics in the 21st century: Theoretical issues in sociophonetics by Tyler Kendall, Nicolai Pharao, Jane Stuart-Smith, Charlotte Vaughn

■Advancements of phonetics in the 21st century: Theoretical and empirical issues in the phonetics of sound change by Patrice Speeter Beddor

Volume 99

■Second dialect acquisition and phonetic vowel reduction in the American Midwest by Cynthia G. Clopper, Rachel Steindel Burdin, Rory Turnbull

■The perceptual center in Mandarin Chinese syllables by Yu-Jung Lin, Kenneth de Jong

■Unstressed vowel reduction and contrast neutralisation in western and eastern Bulgarian: A current appraisal by Mitko Sabev

■Perception and production of Mandarin-Accented English: The effect of degree of Accentedness on the Interlanguage Speech Intelligibility Benefit for Listeners (ISIB-L) and Talkers (ISIB-T) by Sheyenne Fishero, Joan A. Sereno, Allard Jongman

■Advancements of phonetics in the 21st century: Exemplar models of speech production by Matthew Goldrick, Jennifer Cole

Volume 100

■Do children better understand adults or themselves? An acoustic and perceptual study of the complex sibilant system of Polish by Marzena Żygis, Daniel Pape, Marek Jaskuła, Laura L. Koenig

■Compensatory effects of foot structure in segmental durations of Soikkola Ingrian disyllables and trisyllables by Natalia Kuznetsova, Irina Brodskaya, Elena Markus

■An acoustic study of rhythmic synchronization with natural English speech by Tamara Rathcke, Chia-Yuan Lin

■Looking within events: Examining internal temporal structure with local relative rate by Sam Tilsen, Mark Tiede

■L1 vowel perceptual boundary shift as a result of L2 vowel learning by Chikako Takahashi

■Cognitive factors in nonnative phonetic learning: Impacts of inhibitory control and working memory on the benefits and costs of talker variability by Xiaojuan Zhang, Bing Cheng, Yu Zou, Xujia Li, Yang Zhang

■Phonetic variation in English infant-directed speech: A large-scale corpus analysis by Ekaterina A. Khlystova, Adam J. Chong, Megha Sundara

Volume 101

■Stop voicing perception in the societal and heritage language of Spanish-English bilingual preschoolers: The role of age, input quantity and input diversity by Simona Montanari, Jeremy Steffman, Robert Mayr

■Phonological mediation effects in imitation of the Mandarin flat-falling tonal continua by Wei Zhang, Meghan Clayards, Francisco Torreira

■Loss of unreleased final stops among Mandarin-Min bilinguals: Structural convergence of languages in contact by Wei-Cheng Weng, Sang-Im Lee-Kim

■An acoustic analysis of rhoticity in Lancashire, England by Danielle Turton, Robert Lennon

■The relation between musical abilities and speech prosody perception: A meta-analysis by Nelleke Jansen, Eleanor E. Harding, Hanneke Loerts, Deniz Başkent, Wander Lowie

■Advancements of phonetics in the 21st century: Theoretical and empirical issues of spoken word recognition in phonetic research by Natasha Warner

Abstracts

Articulatory and acoustic variation in Polish palatalised retroflexes compared with plain ones

Anita Lorenc a, Marzena Żygis b, Łukasz Mik c, Daniel Pape d, Márton Sóskuthy e

a Faculty of Polish Studies, University of Warsaw, ul. Krakowskie Przedmiescie 26/28, 00-927 Warsaw, Poland

b Leibniz-Centre for General Linguistics & Humboldt University, Schützenstr. 18, 10-117 Berlin, Germany

c University of Applied Sciences in Tarnow, Mickiewicza 8, 33-100 Tarnow, Poland

d Department of Linguistics and Languages, McMaster University, 1280 Main Street West, Hamilton, Canada

Abstract

The present paper investigates articulatory and acoustic variation in Polish palatalised retroflex sibilants compared with their plain counterparts. It tests the hypothesis advanced by Hamann (2003: 44) that palatalised retroflexes are non-existent and that retroflexes in Polish change to palato-alveolars [ʃ ʒ t͡ʃ d͡ʒ] when being palatalised. Based on articulatory data from 20 speakers we provide evidence that at least part of the data (53.5%) are palatalised retroflexes [ʂʲ ʐʲ ʈ͡ʂʲ ɖ͡ʐʲ]. The plain counterparts are shown to be retroflex, as proposed by Hamann (2003).

Our averaged results indicate that both palatalised and plain retroflexes show a convex tongue shape. However, individual data reveals a wide range of realisations, from a bunched dorsum to flat and even hollowed tongue shapes. Taking this variability into account, we propose a new tongue shape classification based on Heron’s Formula – i.e. concave, slightly concave, flat, convex and slightly convex. The different tongue shapes are also visualised in the form of videos created using GAMMs.

Regarding acoustic results, our analysis reveals that the strongest correlate of palatalised retroflex sibilants is longer duration of frication in palatalised sibilants followed by higher Centre of Gravity (COG) and m1 spectral slope.

Key words: Palatalised retroflexes; Sibilants; Polish; Articulation; Acoustics; Inter-speaker variation

Schwa’s duration and acoustic position in American English

Uriel Cohen Priva, Emily Strand

Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI 02912, United States

Abstract

Is American English schwa’s position determined solely by the context in which it appears? Do vowels neutralize to schwa when their duration is shorter? We address these two inter-related questions using the Buckeye corpus to study vowel behavior across multiple contexts of spontaneous speech. We find that all except tense high vowels shift to lower F1 values when their duration is relatively short, including lax high vowels and lexical schwas, rather than toward a mid-vowel position that schwa occupies when its duration is long. However, we also replicate the finding that schwa is more dependent on both context and duration than other vowels. The results are not consistent with the idea that schwa’s position is determined exclusively by the context in which it appears. However, schwa’s shift to higher F1 values when its duration is longer is not necessarily different from other vowels’ shift to higher F1 values when their duration is longer, making it unnecessary to argue that schwa’s mid-vowel properties are due to having a target in F1 terms.

Key words: Duration; Corpus study; Schwa; Neutralization; Assimilation; American English

Phonological and phonetic contributions to perception of non-native lexical tones by tone language listeners: Effects of memory load and stimulus variability

Juqiang Chen a, Mark Antoniou b, Catherine T. Best b c

a School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China

b Western Sydney University, The MARCS Institute for Brain Behaviour and Development, Penrith, NSW 2751, Australia

c Haskins Laboratories, New Haven, CT, USA

Abstract

The present study examined native language phonological and phonetic factors in non-native lexical tone perception by tone language listeners, manipulating memory load and stimulus variability to bias listeners towards a more phonological or more phonetic mode of perception. Mandarin and Vietnamese listeners categorised the five Thai lexical tones to their native tones, and discriminated five selected Thai tone contrasts that were predicted by the Perceptual Assimilation Model (PAM, Best, 1995) to be discriminated differently. Categorisation responses showed more phonologically-based patterns under high than low memory load but were unaffected by talker and vowel variability, whereas discrimination accuracy was reduced by talker and vowel variability but not by memory load. Phonological factors indicated by type of categorisation and category overlap generally predicted the discrimination of non-native tone contrasts in line with PAM principles. Phonetic factors reflected in category overlap scores and fit index difference scores predicted variations in discriminating contrasts of the same contrast categorisation type. These findings uphold the extension of PAM principles to non-native tone perception by native listeners of other tone languages. Native phonological and phonetic contributions to non-native speech perception differ between categorisation and discrimination tasks, as reflected in differential modulation by memory load and stimulus variability.

Key words: Non-native lexical tone perception; Perceptual assimilation; Talker variability; Vowel variability; Memory load

Red and blue bananas: Time-series f0 analysis of contrastively focused noun phrases in Papuan Malay and Dutch

Constantijn Kaland a, Marc Swerts b, Nikolaus P. Himmelmann

a Institute of Linguistics, University of Cologne, Germany

b Department of Communication and Cognition, Tilburg University, The Netherlands

Abstract

The prosody of Papuan Malay, spoken in the easternmost provinces of Indonesia, is not fully described and understood. The limited work available suggests that phrase prosody in this language is different from other well-studied (West-Germanic) languages. However, not much is known about possible correlates of focus marking, for which prosody is used extensively in languages like Dutch and English. To gain insight into universal and specific usages of prosody, this study reports two identical production experiments and acoustic analyses carried out for Papuan Malay and Dutch, to investigate the prosody of noun phrases in different contrastive focus conditions. Participants in the experiments described pictures with different shapes and colors using specific matrix phrases. The prosody of these descriptions was examined by time-series measures of f0 and statistically analysed using generalised additive mixed models (GAMMs). Results show that speakers of Papuan Malay do not use f0 to mark contrastively focused noun phrases, unlike Dutch speakers. The main function of f0 in Papuan Malay phrases appears to be boundary marking on the final syllable in the phrase, a function also observed in Dutch. In addition, the pre-final syllable in the Papuan Malay phrase was always marked with a rising f0, whereas in Dutch an interaction between the boundary and focus marking was found. The results are discussed in a typological perspective and provide new insights into the prosody of Papuan Malay.

Key words: Prosody; Contrastive focus; Papuan Malay; Dutch; Typology; f0

Speakers coarticulate less in response to both real and imagined communicative challenges: An acoustic analysis of the LUCID corpus

Zhe-chen Guo, Rajka Smiljanic

Department of Linguistics, University of Texas at Austin, Austin, USA

Abstract

Overlap of adjacent articulatory gestures leads to coarticulation. Understanding how hyperarticulated intelligibility-enhancing clear speech modifications affect coarticulation can inform theories of phonetic variation and speech intelligibility. However, prior research yielded mixed findings regarding the relationship between hyperarticulation and coarticulatory patterns. This study extends previous work by analyzing the degree of coarticulation across several different communicative conditions in the LUCID corpus (Baker & Hazan, 2010). Southern British English speakers completed an interactive spot-the-difference task with a partner with and without a communicative barrier (e.g., speech degraded by talker babble). They also read sentences without an interlocutor casually and clearly. Diphones in keywords produced in both tasks were analyzed using two whole-spectrum measures, with greater spectral distance and shorter coarticulatory overlap between the diphones indexing less coarticulation. Results revealed that speakers coarticulated less in response to both real (interactive task) and imaginary (sentence-reading) communicative challenges. Speakers furthermore varied the degree of coarticulatory resistance in different real communicative barriers. Diphones with greater consonant articulatory constraint were less sensitive to differences between the conditions, suggesting a limit to the hyperarticulation-induced phonetic variation. The findings agree with the models of targeted speaker adaptations assuming coarticulatory resistance in hyperarticulated clear speech (the H&H theory: Lindblom, 1990).

Key words: Coarticulation; Clear speech; Listener-directed speech; Hyperarticulation; LUCID corpus

Production and perception of prevelar merger: Two-dimensional comparisons using Pillai scores and confusion matrices

Valerie Freeman

Oklahoma State University, Stillwater, OK 74078, USA

Abstract

Vowel merger production is quantified with gradient acoustic measures, while phonemic perception methods are often coarser, complicating comparisons within mergers in progress. This study implements a perception experiment in two-dimensional formant space (F1 × F2), allowing unified plotting, quantification, and statistics with production data. Production and perception are compared within 20 speakers for a two-part prevelar merger in progress in Pacific Northwest English, where mid-front /ɛ, e/ approximate or merge before voiced velar /ɡ/ (LEG–VAGUE merger), and low-front prevelar /æɡ/ raises toward them (BAG-raising). Distributions are visualized with kernel density plots and overlap quantified with Pillai scores and confusion matrices from linear discriminant analysis models. Results suggest that LEG–VAGUE merger is perceived as more complete than it is produced (in both the sample and community), while BAG-raising is highly variable in production but rejected in perception. Relationships between production and perception varied by age, with raising and merger progressing across two generations in production but not perception, followed by younger adults perceiving LEG–VAGUE merger but not producing it and varying in (minimal) raising perception while varying in BAG-raising in production. Thus, prevelar raising/merger may be progressing among some social groups but reversing in others.

Key words: Prevelar merger; Pacific Northwest English; Merger perception; Sound change in progress

Same vowels but different contrasts: Mandarin listeners’ perception of English /ei/-/iː/ in unfamiliar phonotactic contexts

Yizhou Wang a, Rikke L. Bundgaard-Nielsen b, Brett J. Baker a, Olga Maxwell a

a School of Languages and Linguistics, The University of Melbourne, Australia

b MARCS Institute for Brain, Behaviour & Development, Western Sydney University, Australia

Abstract

The study presented here examines how adult L2 listeners’ L1 phonotactics interferes with L2 vowel perception in different consonantal contexts. We examined Mandarin listeners’ perception of the English /ei/-/iː/ vowel contrast in three onset consonantal contexts, /p f w/, which represent different phonotactic scenarios with respect to the permissibility of Mandarin phonology. L1 Mandarin listeners (N = 42) completed a series of three tasks: a categorisation task, a vowel identification task, and an AXB discrimination task. The results show that English /ei/-/iː/ are perceived as highly contrastive in the /p/ context because both /pei/ and /piː/ constitute a licit sequence in Mandarin phonology. However, participants experience substantial /ei/-/iː/ category confusion in the /f/ and /w/ contexts, where Mandarin listeners repair perceptually by modifying the vowel quality in illicit (unattested) consonant–vowel sequences, i.e., */fiː/ → /fei/ and */wiː/ → /wei/. Further exploratory analyses indicate that L2 listeners’ vowel perception in unfamiliar phonotactic contexts is associated with their target language experience, typically indicated by their L2 vocabulary size. The findings thus suggest that the acquisition of novel phonotactic regularities is tied to increased experience with the L2 lexicon.

Key words: Vowel perception; Phonotactics; Phonological repairs; Mandarin

Prosodic marking of information status in Italian

Simona Sbranna, Caterina Ventura, Aviad Albert, Martine Grice

IfL-Phonetik, University of Cologne, Germany

Abstract

Previous studies on the prosodic marking of information status argue that Italian tends to resist deaccentuation of given elements. In particular, Italian reportedly always accents post-focal given information within noun phrases (NPs), so that it is not possible to reliably reconstruct the information status of the items from the acoustic signal. However, descriptions have so far been concerned with categorical accent patterns, lacking crucial information about continuous phonetic parameters and their distribution in the utterance in ways that can contribute to prosodic marking. In this paper, we use a novel approach based on periodic-energy-related measures to explore how speakers of the Neapolitan variety of Italian modulate continuous prosodic parameters to differentiate information structure. We show that, contrary to previous findings, Italian speakers of the Neapolitan variety do mark information status prosodically within noun phrases. The discrepancy with previous work is explained by the fact that the prosodic marking of post-focal givenness is not achieved through the categorical presence or absence of a pitch accent on one specific syllable, but through the gradual modulation of phonetic parameters at various locations. Moreover, we find that these modulations occur early in the noun phrase. We also show that native speakers can make use of their knowledge of these modulations to reliably identify post-focal given elements in the absence of the pragmatic context, that is, directly from the acoustic signal.

Key words: Prosodic marking of information status; Prosodic prominence; Periodic energy; Continuous phonetic parameters; Italian L1

Flexibility and stability of speech sounds: The time course of lexically-driven recalibration

Yi Zheng a, Arthur G. Samuel a b c

a Department of Psychology, Stony Brook University, United States

b Basque Center on Cognition, Brain, and Language, Spain

c Ikerbasque, Basque Foundation for Science, Spain

Abstract

Perceptual stability is obviously advantageous, but being able to adjust to the prevailing environment is also adaptive. Previous research has identified ways in which the categorization of speech sounds shifts as a function of recently heard speech. Dozens of studies have examined “lexically driven recalibration”, an adjustment to categorization after listeners hear a number of words with a particular speech sound designed to be perceptually ambiguous. Despite the large number of these studies, little is known about how long the adjustment endures. Using two different stimulus sets, we assess the recovery time after lexically driven recalibration. In addition, we examine whether the size of the recalibration effect diminishes during the identification test used to measure it, and whether the recalibration effect is stronger for one side of a tested contrast or the other. The effect did in fact decline during its measurement, and one side of the contrast (/s/) produced stronger shifts than others (/ʃ/ or /θ/) under the conditions typically examined in recalibration studies. Recalibration was quite robust after 24 hours for both stimulus sets, and still measurable after one week for one of them. This time course is strikingly different than the recovery times reported in previous studies for two other adjustment processes – selective adaptation and audiovisually driven recalibration. The vastly different time courses pose a major challenge for models that ascribe these phenomena to the same adjustment function. Thus, such models will need to be substantially modified, or alternative models will need to be developed.

Key words: Speech perception; Recalibration; Recovery Time; Time course

Speaker-specificity in speech production: The contribution of source and filter

Vincent Hughes a, Amanda Cardoso b, Paul Foulkes a, Peter French a c, Amelia Gully a, Philip Harrison a

a Department of Language and Linguistic Science, University of York, UK

b Department of Linguistics, The University of British Columbia, Canada

c J P French Associates, York, UK

Abstract

This study examines the extent to which speaker-specific information is encoded in different features of vocal output and the relationships between those features. A range of acoustic features, grouped as source (laryngeal voice quality measures and fundamental frequency) and filter features (formants and Mel-frequency cepstral coefficients; MFCCs), were extracted from the vocalic portion of the hesitation marker um for 90 male speakers of Standard Southern British English. Little overall correlation between the sets of features was observed, suggesting no strong interdependence between source and filter in our data. Although filter features were consistently better at discriminating between same- and different-speaker pairs compared with source features, combining source and filter has the potential of producing the lowest error rates and the strongest speaker discrimination scores. Taken together, results show that source and filter provide complementary speaker-specific information. However, the extent of the improvements in speaker discrimination performance when combining source and filter varied across speakers. We explore potential explanations for this finding and discuss the implications for source-filter theory, and for applied fields such as speaker recognition and forensic speech science.

Key words: Speaker-specificity; Speaker recognition; Forensic speech science; Hesitation markers; Source-filter theory

Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech

Luis M.T. Jesus a, Sara Castilho b, Aníbal Ferreira c, Maria Conceição Costa d

a School of Health Sciences (ESSUA), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Aveiro, Portugal

b Hospital Arcebispo João Crisóstomo, Cantanhede, Portugal

c Department of Electrical and Computer Engineering, University of Porto, Portugal

d Department of Mathematics (DMat) and Centre of Research and Development in Mathematics and Applications (CIDMA), University of Aveiro, Portugal

Abstract

The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but what these attributes are and the underlying production mechanisms are not fully known. The purpose of this study was to define segmental cues to place and voicing of vowels and sibilant fricatives and to develop an articulatory interpretation of acoustic data.

Method

Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words, sentences and read a phonetically balanced text. All the tasks were repeated in voiced and whispered speech, and the sound source and filter analysed using the following parameters: Fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level and durations. Logistic linear mixed-effects models were developed to understand what acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ʃ/.

Results

Vowels were produced with significantly different spectral slope, sound pressure level, first and second formant frequencies in voiced and whispered speech. The low frequencies spectral slope of voiced sibilants was significantly different between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were estimated to be lower for whispered speech when compared to voiced speech. Fricatives’ broad peak frequency was statistically significant when discriminating between /s/ and /ʃ/.

Conclusions

First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The relative duration of same-place voiceless fricatives was higher than voiced fricatives both in voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals, and to inform rehabilitation strategies that can safely explore the production mechanisms of whispering.

Key words: Speech production; Acoustic phonetics; Whispered speech; Vowels; Fricatives

The final lengthening of pre-boundary syllables turns into final shortening as boundary strength levels increase

Gerrit Kentner a b, Isabelle Franz b c, Christine A. Knoop b, Winfried Menninghaus b

a Goethe University Frankfurt, Germany

b Max Planck Institute for Empirical Aesthetics, Germany

c Hochschule für Gesundheit Bochum, Germany

Abstract

Phrase-final syllable duration and pauses are generally considered to be positively correlated: The stronger the boundary, the longer the duration of phrase-final syllables, and the more likely or longer a pause. Exploring a large sample of complex literary prose texts read aloud, we examined pause likelihood and duration, pre-boundary syllable duration, and the pitch excursion at prosodic boundaries. Comparing these features across six predicted levels of boundary strength (level 0: no break; 1: simple phrase break; 2: short comma phrase break; 3: long comma phrase break; 4: sentence boundary; 5: direct speech boundary), we find that they are not correlated in a simple monotonic fashion. Whereas pause duration monotonically increases with boundary strength, both pre-boundary syllable duration and the pitch excursion on the pre-boundary syllable are largest for level-2 breaks and decrease significantly through levels 3 to 5. Our analysis suggests that pre-boundary syllable duration is partly contingent on the tonal realization, which is subject to f0 declination as the utterance progresses. We also surmise that pre-boundary syllable duration reflects differences in planning complexity for the different prosodic and syntactic boundaries. Overall, this study shows that a simple monotonic correlation between pause duration and pre-boundary syllable duration is not valid.

Key words: Pre-boundary lengthening; Prosodic phrasing; Pauses; Prosodic boundary; Declination

Analysis and computational modelling of Emirati Arabic intonation – A preliminary study

Muhammad Swaileh A. Alzaidi a, Yi Xu b, Anqi Xu c, Marta Szreder d

a Department of English Language, College of Language Sciences, King Saud University, P. O. Box 145111, Riyadh, Saudi Arabia

b Department of Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom

c Harbin Institute of Technology, Shenzhen, HIT Campus, University Town of Shenzhen, Xili, Nanshan District, Shenzhen 518055, China

d Speech Language Pathology Department, United Arab Emirates University, United Arab Emirates

Abstract

This study is a preliminary investigation of intonation in Emirati Arabic (EA) (an under-researched Arabic dialect), using systematic acoustic analysis and computational modelling. First, we investigated the prosodic realisation of information focus and contrastive focus at sentence-initial, -penultimate and -final positions. The analysis of 1980 EA utterances produced by eleven EA native speakers revealed that (1) in focused words, only contrastive focus is realised with expanded excursion size, longer duration, and stronger intensity relative to their neutral focus counterparts, (2) post-focus words have a lower f0 and weaker intensity in both contrastive focus and information focus, and (3) pre-focus words have compressed excursion size and relatively short duration. We then used computational modelling to test how much of the EA intonation could be captured by the PENTA model, with focus-defined functional categories and a number of other, putative categories. PENTAtrainer was trained on syllable-sized multi-functional targets from a subset of the production data. The model then generated f0 contours with the learned targets and imposed them on resynthesised speech for perceptual evaluation. A comparison of the model-generated f0 contours with the natural f0 contours showed that not only focus but also weight, stress, position of word-level stressed syllable and prosodic word are important factors determining the fine details of EA intonation. A perceptual test with native EA listeners showed that the synthetic EA f0 contours sounded nearly as natural as the original intonation, and could convey focus nearly as accurately as natural intonation.

Key words Focus; PENTAtrainer; PFC; Emirati Arabic; Predictive synthesis

Gestural characterisation of vowel length contrasts in Australian English

Louise Ratko, Michael Proctor, Felicity Cox

Department of Linguistics, Macquarie University, Balaclava Road, North Ryde, 3345-8749 New South Wales, Australia

Abstract

Many languages contrast long and short vowels, but the phonetic implementation of vowel length contrasts is not fully understood. We examine articulation of long and short vowels in Australian English to investigate whether duration contrasts involve intrinsic differences in the underlying gestures, or differences in their timing relationships with flanking consonants. We used electromagnetic articulography to track tongue dorsum and lip movement in two long-short vowel pairs /iː-ɪ/ (bead – bid) and /ɐː-ɐ/ (bard – bud) produced in /pVp/ syllables by nine speakers of Australian English. For short vowels, lingual movement towards the vowel target (formation interval) is shorter and smaller, but not stiffer, than that of long vowels. Syllables containing the short vowel /ɐ/ also exhibited more vowel-coda overlap than those containing /ɐː/. These data suggest that both vowel-intrinsic and syllable-level mechanisms are involved in the realisation of vowel length contrasts in Australian English.

Key words Vowel production; Vowel length; Articulatory phonetics; Articulatory phonology; Australian English

Sound change in Western Andalusian Spanish: Investigation into the actuation and propagation of post-aspiration

Nicholas Henriksen a, Amber Galvano b, Micha Fischer a

a University of Michigan, United States

b University of California, Berkeley, United States

Abstract

This study investigates the actuation and propagation of sound change in Western Andalusian Spanish (WAS) by examining the change from pre- to post-aspiration in intervocalic /s/ + voiceless stop sequences (i.e., /sp st sk/). We collected read-speech data from 30 WAS speakers and 30 comparison speakers of North-Central Peninsular Spanish (NCPS). The results show that the shift toward post-aspiration is most advanced in /st/-words, as compared to /sp/- and /sk/-words, which we take as evidence that actuation likely occurred in the coronal context. We additionally demonstrate how post-aspiration is integrating into the wider WAS sound system: (i) post-aspirated stops undergo closure voicing in a fashion akin to plain stops; and (ii) the post-aspirated pattern is now emerging in phonological environments that historically lacked coda-/s/, namely in the stop + /t/ context. An important contribution of this study concerns the likely role played by the coronal context (i.e., /st/-words) during both the actuation and propagation stages of the sound change. We situate the findings within frameworks suggesting that actuation and propagation are systematically connected phases of sound change rather than wholly independent processes.

Key words Sound change; Actuation; Propagation; Western Andalusian Spanish; Post-aspiration; Spanish

The change in breathy voice after tone split: A production study of Suzhou Wu Chinese

Chunyu Ge a, Wenwei Xu a, Wentao Gu b, Peggy Pik Ki Mok a

a The Chinese University of Hong Kong, Shatin, Hong Kong

b Nanjing Normal University, Nanjing, Jiangsu 210097, China

Abstract

In some languages, breathy voice plays a pivotal role in tone split. After tone split, breathy voice can undergo further changes. Suzhou Wu Chinese used to have a voicing contrast in initial obstruents, which has transphonologized to a tone contrast and resulted in a two-way tone split, with breathy voice in the low register tones. This study investigates the change in breathy voice after the tone split in Suzhou Wu with apparent-time data from speakers from three age groups. Simultaneous audio and electroglottographic recordings were collected. Principal component analysis and linear discriminant analysis conducted on the acoustic measurements indicate that breathy voice is used less by younger speakers. Generalized Additive Mixed Models were conducted to reveal the changes in breathy voice during the time course of the vowel with regard to different low register tones. It is also found that T2 and T8 are undergoing a decrease in breathy voice with tone changes, but breathy voice is decreasing without tone change in T6. Younger female speakers are ahead of younger male speakers in the decrease in breathy voice. This paper provides a valuable investigation of the change in breathy voice after tone split and contributes to our understanding of the development of phonation types.

Key words Breathy voice; Tone split; Wu Chinese

Phonetic differences between nouns and verbs in their typical syntactic positions in a tonal language: Evidence from disyllabic noun–verb ambiguous words in Standard Mandarin Chinese

Qibin Ran a c, Kai Gao b d, Yuzhu Liang a c, Quansheng Xia b c, Søren Wichmann e

a School of Liberal Arts, Nankai University, Tianjin, China

b College of Chinese Language and Culture, Nankai University, Tianjin, China

c Laboratory of Social Science of Tianjin, Tianjin, China

d Institute of Forensic Science, Ministry of Public Security, China

e University of Kiel, Germany

Abstract

This study investigates how word categories, namely noun and verb, influence acoustic realizations (duration, F0, intensity) in Standard Mandarin Chinese, a language having phonemically distinctive tones and a simple morphological system. Noun-verb ambiguous words were selected and presented in the final positions of typical syntactic contexts in order to avoid the interference of prosodic boundary, syntactic complexity, contextual predictability, tonal environment, F0 range and syllable properties (consonant, vowel, tone, syllable length). Linear mixed models were fitted to duration, and generalized additive mixed models were fitted to F0 and intensity. The results showed that phonetic differences between nouns and verbs were still evident in duration, F0 and intensity after lexical frequency, speech rate and some other related factors were taken into consideration in the models. The second syllables of nouns were longer than those of verbs, and both syllables of nouns were higher in F0 and greater in intensity than those of verbs. Since the prosodic boundary, frequency and other factors were controlled for, the phonetic differences between nouns and verbs might be attributed to their differences in information load and number of syllables. This study provided evidence that phonetic differences between nouns and verbs might be driven by the grammatical classes themselves and is not an epiphenomenon of other processes.

Key words Noun; Verb; Prosody; Typical syntactic position; Phonetic difference; Information load; Number of syllables

Prominence and intonation in Singapore English

Adam J. Chong a, James S. German b

a Queen Mary University of London, London, UK

b Aix-Marseille Université, CNRS, LPL, Aix-en-Provence, France

Abstract

Previous work on Singapore English prosody has focused largely on establishing the acoustic correlates of lexical stress and examining where the language falls within a rhythm-class typology. Little attention, however, has been paid to how lexical prominence, if present, interacts with phrasal prominence. In this study, we examine the extent to which f0 realizations vary across lexical items with differing stress patterns, while taking into account that prosodic phrasing requirements necessitate an f0 rise to the phrase-final syllable. We show that across target types of varying stress placement, syllable length, and constituency, f0 realizations are highly consistent, involving a rise from the start of the target word or phrase which culminates with a peak on the phrase-final syllable. The location of lexical prominence is the primary influence on the scaling of f0 across the entire target, with stress-initial targets having a higher mean f0. Exploratory analysis of duration and intensity measures further corroborates the prominence-lending nature of the phrase-final syllable, with some evidence for marking of prominence on non-final lexically stressed syllables. The findings support the primarily post-lexical role that f0 plays in marking phrase edges, instead of lexical heads, in Singapore English, in line with a previously proposed AM model of Singapore English intonation. The implications of these findings for the study of prosodic typology and sociolinguistic variation in Singapore English are also discussed.

Key words Singapore English; Intonation; Prosodic typology; Lexical stress; Prominence; Edge-marking; Autosegmental Metrical Theory;

Revisiting the nasal continuum hypothesis: A study of French nasals in continuous speech

Gillian de Boer a, Jahurul Islam b, Charissa Purnomo b, Linda Wu b, Bryan Gick b c

a University of Alberta, Edmonton, AB, Canada

b University of British Columbia, Vancouver, BC V6T 1Z4, Canada

c Haskins Laboratories Inc, New Haven, Connecticut, United States

Abstract

Speech sounds are generally classified as either nasal or oral, with the velopharyngeal opening (VPO) characterized as simply open or closed. This account contrasts with clinical perspectives, in which the degree of VPO is described as being more continuous. An examination of laboratory studies of French suggests a third possibility, in which the VPO may have multiple distinct degrees of opening. Based on this limited literature we predicted that the VPO of Québécois French would be largest for speech pauses, then in descending order, phonemically nasal vowels, nasal consonants, contextually nasal vowels (with carryover being larger than anticipatory), and finally oral sounds. We analyzed full sentences read by nine speakers of Québécois French from the Université Laval X-ray videofluorography database. The films were annotated, and degrees of VPO were measured from the sagittal projections of the vocal tract. We found evidence for most of the proposed distinctive VPO targets in Québécois French, with the exception that anticipatory nasalization led to greater VPO than carryover nasalization.

Key words Velum; Nasal; French; Velopharyngeal opening; Contextual nasalization;

Advancements of phonetics in the 21st century: Theoretical issues in sociophonetics

Tyler Kendall a, Nicolai Pharao b, Jane Stuart-Smith c, Charlotte Vaughn a d

a University of Oregon, United States

b University of Copenhagen, Denmark

c University of Glasgow, United Kingdom

d University of Maryland, United States

Abstract

Variation in speech has always been important to phonetic theory, but takes center stage in the growing area of sociophonetics, which places the role of the social at the heart of the theoretical and methodological enterprise. This paper provides a comprehensive survey of key advances and theoretical issues in sociophonetic research, in both production and perception. It reviews the foundations of sociophonetics in phonetics and sociolinguistics, and articulates several major theoretical questions that run through sociophonetic work, as well as the nature of evidence and methods in sociophonetics. It explores the many factors that underpin variation and change within individuals, such as speech accommodation and speech style, and major factors that organize group-level variation and change, including regional affiliation, social class, sex, gender, and sexuality, race and ethnicity, and age. By connecting sociophonetic research to a wide range of areas, from cognition to indexicality, the paper synthesizes cross-cutting themes from prior research, and highlights current and future directions for the field.

Key words Sociophonetics; Speech production; Speech perception; Social evaluation; Variation; Sound change; Acoustics;

Advancements of phonetics in the 21st century: Theoretical and empirical issues in the phonetics of sound change

Patrice Speeter Beddor

Department of Linguistics, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

It has long been understood that speakers produce and listeners perceive non-random, systematic phonetic variants that serve as the raw material for sound change. This understanding underlies much of the current research on the phonetic underpinnings of change, which includes study of (i) general phonetic principles underlying variation, (ii) specific phonetic ‘preconditions’ and biases arguably linked to specific patterns of phonological instability and change, and (iii) the production and perception of variation by speaker-listeners in situations of actual ongoing change and by interacting agents in computational simulations of change. This paper shows how findings from these three broad areas of study have led to 21st century theoretical and empirical advancements in our understanding of phonetic change. Big-picture questions about the nature of change are approached through consideration of a series of smaller, more tractable questions (e.g., about the nature of, and relation between, innovative speaking and innovative listening for both stable patterns of variation and ongoing change). The paper’s goals are to show, for these questions, their theoretical grounding, empirical challenges, preliminary answers and, in turn, the new theoretical directions emerging from those answers.

Key words Sound change; Phonetic variation; Gestural overlap and reduction; Perception-production relation; Individual differences; Origin and spread of change;

Second dialect acquisition and phonetic vowel reduction in the American Midwest

Cynthia G. Clopper a, Rachel Steindel Burdin b, Rory Turnbull c

a Ohio State University, USA

b University of New Hampshire, USA

c Newcastle University, UK

Abstract

Geographic mobility can lead to the acquisition of new regional dialect features. This second dialect acquisition is highly variable across individuals and is affected by a range of linguistic and social factors. The realization of dialect-specific features is also affected by linguistic variables related to phonetic reduction, but this interaction has been primarily examined with a mix of mobile and non-mobile participants. In the current study, second dialect acquisition by Midwestern American young adults and its interaction with phonetic reduction processes was examined. Relative to lifetime residents of the Northern and Midland regions of American English, some Northern transplants to the Midland region exhibited second dialect acquisition and others exhibited maintenance of Northern dialect features. All talkers showed phonetic reduction due to lexical frequency, phonological neighborhood density, discourse mention, semantic predictability, and speaking style. These phonetic reduction processes only weakly interacted with dialect variation, such that less phonetic reduction was observed overall when it was potentially in conflict with dialect-specific vowel features. Taken together, the results provide additional evidence for substantial individual variation in second dialect acquisition, but limited evidence of an effect of second dialect acquisition on the interaction between dialect variation and phonetic reduction processes.

Key words Second dialect acquisition; Phonetic reduction; Lexical frequency; Phonological neighborhood density; Discourse mention; Cloze predictability; Speaking style

The perceptual center in Mandarin Chinese syllables

Yu-Jung Lin a, Kenneth de Jong b

a Department of World Languages, Literatures, and Cultures, College of the Holy Cross, United States

b Department of Linguistics, Indiana University, Bloomington, United States

Abstract

This study explores the location of the p-center in Mandarin Chinese and factors that influence it. Previous research has suggested that p-center behavior in languages that lack obstruent clusters, such as Cantonese and Mandarin, will differ from that found in Indo-European languages and others that are typologically different from Chinese languages. The purposes of the current paper are to investigate (1) whether Chinese languages systematically have a different p-center location from that found in previous studies of Indo-European languages, (2) whether vowel onglides are included as part of the syllable rime as claimed in the assumed analysis of Mandarin, and (3) how the alignment of the p-center is influenced by different features of the initial consonant, different features of rime, as well as the speech rates. Six native Mandarin speakers from Taiwan participated in a syllable repetition task with two different speech rates: 60 bpm and 120 bpm. The results indicate that the p-center in Mandarin Chinese is roughly aligned with the acoustic vowel onset, when the syllable does not have an onglide, and the onglide onset, when the syllable has an onglide. The initial consonant manner did not significantly influence onglide or vowel onsets, but the initial consonant acoustic duration, rimes, and speech rates all significantly influenced the vowel onsets as previous p-center studies have found. This appears to differ markedly from the p-center found in a recent study of Cantonese. Various causes for this mismatch are discussed.

Key words Mandarin; Perceptual-center; Rhythm; Speech production; Speech timing; Synchronization

Unstressed vowel reduction and contrast neutralisation in western and eastern Bulgarian: A current appraisal

Mitko Sabev, Department of Language Science and Technology, Saarland University, C7.2, 66123 Saarbrücken, Germany

Abstract

Although Bulgarian frequently appears in discussions of vowel reduction, the vowel changes and contrast neutralisation that occur in Bulgarian unstressed syllables are often not well understood and misrepresented in the literature. I report the results of an acoustic study of stressed and unstressed vowels in two present-day varieties of Bulgarian, from the West and the East of Bulgaria. The dialects differ with respect to the magnitude of reduction (how changed unstressed vowels are), its generalisation (which vowels are affected), and the resultant neutralisation patterns; overall, reduction is stronger in the eastern variety. A number of long-standing claims about Bulgarian phonology are disproven, notably that there is less reduction in immediately pretonic than in other unstressed syllables, that high vowels are lowered in unstressed position, and that western Bulgarian reduction is necessarily gradient. I further demonstrate that, although implicationally related, reduction proper (i.e. systematic differences between stressed and unstressed vowels), its potential phonologisation, and contrast neutralisation are distinct aspects of the traditional notion of ‘vowel reduction’, each of which can be fruitfully examined in its own right.

Key words Vowel reduction; Vowel merger; Contrast neutralisation; Incomplete neutralisation; Categorical and gradient reduction; Undershoot; Bulgarian

Perception and production of Mandarin-Accented English: The effect of degree of Accentedness on the Interlanguage Speech Intelligibility Benefit for Listeners (ISIB-L) and Talkers (ISIB-T)

Sheyenne Fishero, Joan A. Sereno, Allard Jongman

Department of Linguistics, University of Kansas, 1541 Lilac Ln Room 427, Lawrence, KS 66045, USA

Abstract

Previous research on the Interlanguage Speech Intelligibility Benefit (ISIB) indicates nonnative listeners may have an advantage at understanding nonnative speech of talkers with the same first language (L1) due to shared interlanguage knowledge. The present study offers a comprehensive analysis of various factors that may modulate this advantage, including the proficiency of both the listeners and the talkers, the mapping of phonemes between the L1 and second language (L2), and the acoustic properties of the phones. Accuracy scores on a lexical decision task were used to investigate both native English listeners’ and native Mandarin learners’ of English perception of native English and Mandarin-accented English speech. Results show clear ISIB-L and ISIB-T effects and demonstrate the dynamic nature of ISIB effects, with both being modulated by speaker and listener proficiency. More striking ISIB effects typically occur at the most extreme ends of accentedness. Additionally, an advantage for common-phoneme over unique-phoneme words in nonnative speech was observed. While nonnative productions of common-phoneme words are more accurate than those of unique-phoneme words, for the most accented productions, nonnative listeners are faster to respond to these unique, often mispronounced, productions. The nonnative listener advantage at perceiving nonnative speech depends on various factors, including listener proficiency, speaker proficiency, phoneme characteristics, and the acoustics of specific speech tokens.

Key words Interlanguage; Intelligibility; Accentedness; L2 proficiency; L2 learners; Nonnative speakers; Nonnative listeners

Advancement of phonetics in the 21st century: Exemplar models of speech production

Matthew Goldrick, Jennifer Cole

Department of Linguistics, Northwestern University, Evanston 60208, IL, USA

Abstract

In the first decades of the 21st century, exemplar theory has fueled an explosion of theoretical and empirical work in speech production. We review the foundations for this framework in linguistics and cognitive science, and examine how recent empirical findings challenge core principles of exemplar theory. While theoretical advances in hybrid exemplar models address some of these issues, accounting for the emergence of structure, the incorporation of structure into exemplar updating, and the non-uniformity of phonetic variation and convergence (among other phenomena), remain major challenges for current models. We discuss future directions for developing exemplar theories as comprehensive accounts of speech production.

Key words Usage based models; Exemplar models; Connectionism; Hybrid models

Do children better understand adults or themselves? An acoustic and perceptual study of the complex sibilant system of Polish

Marzena Żygis a, Daniel Pape b, Marek Jaskuła c, Laura L. Koenig d

a Leibniz-Centre General Linguistics, Berlin, Germany; Humboldt University, Berlin, Germany

b McMaster University, Hamilton, Canada; Westpomeranian University of Technology, Szczecin, Poland

c Adelphi University Garden City, New York, USA

d Haskins Laboratories, New Haven, CT, USA; Oklahoma State University, Stillwater, OK 74078, USA

Abstract

This paper reports a developmental production-perception study of the three-way Polish sibilant contrast /s, ʂ, ɕ/ in typically developing children (N = 76). Children aged 2;11–7;11 produced words with sibilants in word-medial and initial position. They then identified the same words they produced, and the words as produced by an unknown adult female. Results show higher identification accuracy for adult productions across all ages. Production and perception data suggest that the alveolo-palatal /ɕ/ is acquired first, and that it is differentiated mainly by formant patterns. In the perceptual discrimination task, most errors were found for child-produced /ʂ/, and this persisted into the oldest ages. Early acquisition of /ɕ/ has been observed in other languages and may reflect motoric considerations as well as a focus on formant information in child speech perception. Cue weighting appears to change over age in sibilant-specific ways. While all children weight formants highest for /ɕ/, spectral cues appear to be more important for /s/ and /ʂ/, and reliance on formants may decrease with age. This work contributes to the study of cross-language differences in acquisition, provides an acoustic characterization of child-produced Polish sibilants, and elucidates the acoustic characteristics that children use in perceptual judgments of sibilants.

Key words Children's production; Perception; Polish sibilants; Acquisition

Compensatory effects of foot structure in segmental durations of Soikkola Ingrian disyllables and trisyllables

Natalia Kuznetsova a, Irina Brodskaya b, Elena Markus c

a Università Cattolica del Sacro Cuore

b Institute for Linguistic Studies, Russian Academy of Sciences

b Queen Margaret University

c University of Tartu

e Institute of Linguistics, Russian Academy of Sciences

Abstract

This acoustic study explores compensatory influences of foot structure on segmental duration and quantity in the foot nuclei of 22 trisyllabic and four disyllabic structures in vanishing Soikkola Ingrian (Finnic). A robust ternary quantity contrast of consonants is confirmed for both disyllables and trisyllables. While in the shortest disyllables the contrast is “pure” (i.e., not significantly reinforced by the durations of other segments), in all trisyllables it is enhanced through the durationally inverse (compensatory) effects in other segments. In this, the situation in trisyllables is closer to that attested in other languages with ternary consonantal quantity than the situation in disyllables. The phonological quantity contrast has been lost from the second syllable vowel of trisyllables, and its duration is now inversely related to the first syllable complexity. In the segments preceding this vowel, all compensatory effects are purely phonetic. Shorter segmental durations and stronger compensatory effects in trisyllables than in disyllables indicate tendencies for both polysegmental and polysyllabic shortening. We discuss a potential relation of observed compensatory effects of shortening and lengthening (a “half-long” vowel) to foot isochrony and metrical stress.

Key words Soikkola Ingrian (Finnic); Trisyllabic foot; Disyllabic foot; Ternary quantity of consonants; Temporal compensation; Vowel shortening; Isochrony

An acoustic study of rhythmic synchronization with natural English speech

Tamara Rathcke a, Chia-Yuan Lin b

a Department of Linguistics, Universität Konstanz, Konstanz, Germany

b Department of Psychology, University of Huddersfield, Huddersfield, United Kingdom

Abstract

Sensorimotor synchronization as a means of studying rhythmic perception-action coupling has been extensively researched across a large number of temporally regular structures including music while little is known about synchronization with speech. The present study fills this gap by applying a sensorimotor synchronization paradigm to natural speech and studying acoustic landmarks that may serve as perceptual anchors of rhythmic movement in spoken sentences. Five rhythmically relevant types of acoustic landmarks were identified in twenty sentences of English containing syllables with vocalic and non-vocalic nuclei. The landmarks were either manually defined or algorithm-generated and included nucleus onsets, peaks and onsets of inter-syllabic and inter-stress timescales, moments of the fastest energy change (approximating the P-center location), and timepoints of combined pitch and periodic power. Sensorimotor synchronization data from 32 native English participants were examined with regards to the location of an increased synchronization activity in the proximity of the predefined landmarks. The results demonstrated that participants synchronized with syllable-size units regardless of the type of syllable nucleus (vowel or consonant) and that their taps were consistently timed close to nucleus onsets. Hereby, the manually defined nucleus onsets predicted synchronization peaks as well as the algorithm-generated moments of the fastest energy change around nucleus onsets (i.e., a model of the P-center location) did. In contrast, other landmarks did not constitute a stable acoustic anchor of sensorimotor synchronization with English speech. The synchronization performance was not influenced by either acoustic F0-information or by phonological tune specifications. These findings provide new evidence for the proposals that rhythmic attention in natural speech may be locked on to fast spectral changes within a syllable as the smallest structuring unit of prosodic hierarchy.

Key words Speech rhythm; Prosody; Sensorimotor synchronization; P-center; Syllable nucleus; Pitch and Periodic Power; Empirical Mode Decomposition

Looking within events: Examining internal temporal structure with local relative rate

Sam Tilsen a, Mark Tiede b

a Cornell University, USA

b Haskins Laboratories, USA

Abstract

This paper describes a method for quantifying temporally local variation in the relative rates of speech signals, based on warping curves obtained from dynamic time warping. Although the use of dynamic time warping for signal alignment is well established in speech science, its use to estimate local rate variation is quite rare. Here we introduce an extension of the local relative rate method that supports the quantification of variability in local relative rate, both within and across a set of events. We show how measures of temporal variation derived from this analysis method can be used to characterize the internal temporal structure of events. In order to achieve this, we first provide an overview of the standard dynamic time warping algorithm. We then introduce the local relative rate measure and describe our extensions, applying them to an articulatory and acoustic dataset of consonant-vowel-consonant syllable productions.
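The idea summarised in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names `dtw_path` and `local_relative_rate`, the absolute-difference local cost, and the windowed-slope estimate are all assumptions made for illustration. The sketch runs textbook dynamic time warping on two one-dimensional signals and then reads a local rate off the slope of the resulting warping curve.

```python
def dtw_path(x, y):
    """Textbook DTW on two 1-D sequences; returns the optimal warping path
    as a list of (i, j) index pairs. Local cost is |x[i] - y[j]| (an assumption)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both signals to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def local_relative_rate(path, n, half_window=2):
    """Slope of the warping curve around each frame of x (needs n >= 2).
    A slope above 1 means y spends more frames than x on that stretch."""
    matched = [[] for _ in range(n)]
    for i, j in path:
        matched[i].append(j)
    # Warping curve: mean matched frame of y for every frame of x.
    curve = [sum(js) / len(js) for js in matched]
    rates = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n - 1, i + half_window)
        rates.append((curve[hi] - curve[lo]) / (hi - lo))
    return rates


x = [0, 1, 2, 3, 4]
y = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]  # x stretched to twice its length
rates = local_relative_rate(dtw_path(x, y), len(x))
```

In this toy case `y` is a uniformly stretched copy of `x`, so the estimated slope is the same at every frame; with real articulatory or acoustic trajectories the slope would vary over time, and that variation is what a local rate measure is meant to capture.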

Key words Speech rate; Dynamic time warping; Linear time warping; Time normalization; Temporal variability; Articulation

L1 vowel perceptual boundary shift as a result of L2 vowel learning

Chikako Takahashi, Columbia University, New York, NY 10027, USA

Abstract

The current study investigated second language (L2) vowel learning influence on first language (L1) vowel perception. We examined how late L2-English learners’ perception of L1-Japanese vowels is influenced by learning to perceive a new L2-English vowel. The study compared L1/L2 perception task results from 60 late L1-Japanese learners of L2-English with those of monolingual Japanese (N = 21) and English speakers (N = 16). To further test hypotheses put forward in the revised Speech Learning Model (SLM-r: Flege & Bohn, 2021), that L2 input distribution is associated with L1/L2 phonetic learning, information on L2-learner participants’ L2 dominance was gathered. The results showed clear L1 perceptual drift in a subgroup of L2-learner participants who were NOT nativelike in L2 English /i-ɪ/ categorization but were L2 dominant. The results support the claim that L2 input plays an important role in reorganizing the L1 phonetic system. However, they also highlight the importance of separating L2 dominance related factors (e.g., L2 input/use) and L2 perceptual ability in investigating L1-L2 phonetic interaction.

Key words Phonetic drift; L2 phonetics; L2 acquisition; Phonetic interaction; Speech perception

Cognitive factors in nonnative phonetic learning: Impacts of inhibitory control and working memory on the benefits and costs of talker variability

Xiaojuan Zhang a, Bing Cheng a, Yu Zou a, Xujia Li a, Yang Zhang b

a English Department & Language and Cognitive Neuroscience Lab, School of Foreign Studies, Xi’an Jiaotong University, 710049, China

b Department of Speech-Language-Hearing Sciences & Center for Neurobehavioral Development, University of Minnesota, Minneapolis, MN 55455, USA

Abstract

Talker variability has been reported to facilitate generalization and retention of speech learning, but is also shown to place demands on cognitive resources. Our recent study provided evidence that phonetically-irrelevant acoustic variability in single-talker (ST) speech is sufficient to induce equivalent amounts of learning to the use of multiple-talker (MT) training. This study is a follow-up contrasting MT versus ST training with varying degrees of temporal exaggeration to examine how cognitive measures of individual learners may influence the role of input variability in immediate learning and long-term retention. Native Chinese-speaking adults were trained on the English /i/-/ɪ/ contrast. We assessed the trainees’ working memory and inhibition control before training. The two trained groups showed comparable long-term retention of training effects in terms of word identification performance and more native-like cue weighting in both perception and production regardless of talker variability condition. The results demonstrate the role of phonetically-irrelevant variability in robust speech learning and modulatory functions of nonlinguistic domain-general inhibitory control and working memory, highlighting the necessity to consider the interaction between input characteristics, task difficulty, and individual differences in cognitive abilities in assessing learning outcomes.

Key words Non-native speech learning; Talker variability; Phonetically-irrelevant variability; Long-term retention; Cognitive abilities

Phonetic variation in English infant-directed speech: A large-scale corpus analysis

Ekaterina A. Khlystova a, Adam J. Chong b, Megha Sundara a

a University of California Los Angeles, Los Angeles, CA 90095, USA

b Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom

Abstract

Learning sound categories is central to language acquisition, but we know little about the extent of phonetic variability in the learner’s input. In this study, we phonetically annotated coronal segments (/t/, /d/, /s/, /z/, and /n/) in a corpus of naturalistic American English infant-directed speech (IDS). We did not find evidence that IDS is consistently more canonical than adult-directed speech (ADS), challenging the notion of IDS as a learning register. While IDS is not more canonical than ADS overall, the canonical form was nonetheless the most frequent form in IDS for all segments except /t/. We also considered how infants may move beyond the task of identifying the canonical form to how they may learn to cluster allophones; for this purpose, we quantified the dissimilarity in the phonological environments of the variants in question. Lastly, we investigated a case in which the overwhelming majority of instantiations were not canonical (word-final t and d) and demonstrated that morphologically-conditioned suffixes were more canonical than other word-final segments. This corpus is a vital step towards understanding how infants can learn to categorize sounds from their input and will be an invaluable tool for future sociolinguistic, computational and theoretical modeling of language learning.

Key words Phonetic categories; Allophones; Corpus; Morphology; Acquisition; Pronunciation variants

Stop voicing perception in the societal and heritage language of Spanish-English bilingual preschoolers: The role of age, input quantity and input diversity

Simona Montanari a, Jeremy Steffman b, Robert Mayr c

a Department of Child and Family Studies, California State University, Los Angeles, United States

b Linguistics and English Language, University of Edinburgh, United Kingdom

c Centre for Speech, Hearing and Communication Research, Cardiff Metropolitan University, United Kingdom

Abstract

This is the first study to examine stop voicing perception in the societal (English) and heritage language (Spanish) of bilingual preschoolers. The study a) compares bilinguals’ English perception patterns to those of monolinguals; b) it examines how child-internal (age) and external variables (input quantity and input diversity) predict English and Spanish perceptual performance; and c) it compares bilinguals’ perception patterns across languages. Perception was assessed through a forced-choice minimal-pair identification task in which children heard synthesized audio stimuli that varied systematically along a /p-b/ and /t-d/ Voice Onset Time (VOT) continuum and were asked to match them with one of two pictures for each contrast. The results of Bayesian mixed-effects logistic regression analyses indicate that the bilinguals’ category boundary for English stops was impacted by their experience with Spanish, with more short-lag VOT tokens being perceived as voiceless consistent with Spanish VOT. Age solely predicted English perceptual skills, whereas input quantity was the only moderator of Spanish perceptual performance. Finally, the bilingual children showed separate stop voicing contrasts in each language, although perceptual performance was already more mature in English by preschool age. Implications for theories of bilingual speech learning and the role of sociolinguistic variables are discussed.

Key words Speech perception; Voice Onset Time; Bilingualism; Heritage language; Preschoolers; Spanish; English

Phonological mediation effects in imitation of the Mandarin flat-falling tonal continua

Wei Zhang a, Meghan Clayards ab, Francisco Torreira a

a Department of Linguistics, McGill University, H3A 1A7 Montreal, Canada

b School of Communication Sciences and Disorders, McGill University, H3A 1G1 Montreal, Canada

Abstract

Phonetic imitation has been found to be mediated by phonological contrast. For features whose values vary around a phonological prototype, the imitation is distorted by the phonological category, i.e., the imitation is nonlinear. This phonological mediation effect was mostly found in segmental features such as VOT and formants. Supra-segmental features, on the contrary, are generally found to be easy to imitate, i.e., the imitation is linear. Nevertheless, whether the phonological effect exists in the imitation of supra-segmental features is not fully understood. This study, through an imitation experiment of Mandarin flat-falling tonal continua, examined whether a supra-segmental feature would be linearly imitated when it is the primary cue (F0 range) and the non-primary cue (duration) to the tonal contrast, respectively. Results showed that F0 range imitation was non-linear while duration imitation was linear. This reveals that the phonological effect is stronger in mediating imitation than would be predicted by the general hypothesis that supra-segmental features are easier to imitate.

Key words Phonetic imitation; Tonal contrast; Mandarin tones; Cue weighting

Loss of unreleased final stops among Mandarin-Min bilinguals: Structural convergence of languages in contact

Wei-Cheng Weng a, Sang-Im Lee-Kim b

a National Yang Ming Chiao Tung University, Hsinchu, Taiwan

b HIPCS Hanyang University, Seoul, Korea

Abstract

The two languages of a bilingual speaker are interconnected and mutually influence linguistic forms and structures. This study presents a case in which two languages in contact exhibit phonotactic asymmetries but converge on abstract phonological units by bilingual speakers. The specific case examined here concerns the change-in-progress of unreleased final stops among young Mandarin-Min bilingual speakers in Taiwan. Phonotactically, obstruent finals are illegal in Taiwan Mandarin, whereas Taiwanese Southern Min (TSM), a local substratum language, allows obligatorily unreleased final stops. In the discrimination of stimuli modeled after TSM, bilingual listeners were consistently outperformed by Korean listeners, a non-native reference group without restrictions against obstruent finals. A follow-up production study revealed that final stops produced by the bilingual speakers were prone to deletion accompanied by vowel lengthening, similar to a long vowel in an open syllable, as well as frequent substitution. Furthermore, strong correlations were found between bilingual speakers’ perception and production accuracy, indicating a bidirectional co-evolution between perception and production during language development. Taken together, the results suggest that a loss of unreleased final stops is underway in TSM through the structural convergence of two interacting phonological systems within bilingual individuals.

Key words Language contact; Bilingualism; Structural convergence; Sound change; Unreleased final stops; Stop place contrasts; Taiwanese Southern Min

An acoustic analysis of rhoticity in Lancashire, England

Danielle Turton, Robert Lennon

Lancaster University, United Kingdom

Abstract

This paper presents the first systematic acoustic analysis of a rhotic accent in present-day England. The dataset comprises spontaneous and elicited speech of 28 speakers from Blackburn in Lancashire, Northern England, where residual rhoticity remains, having never been lost in the earlier sound change which rendered most of England non-rhotic. Although sociolinguistic studies of rhoticity in England exist, we have almost no description of its phonetic properties. Moreover, most sociolinguistic studies focus on the South West of England and relatively little is known about rhoticity in the North. Our study is timely because Northern rhoticity is predicted to disappear in the next few generations, a process which is now complete in many areas of the South West. Our results demonstrate that rhoticity is still present in Blackburn, although non-prevocalic /r/ is weaker when compared to other rhotic varieties of English such as those in Scotland and North America. We find that non-prevocalic /r/ is phonetically weakening in apparent time, with the F3-F2 difference being larger for younger speakers as well as females. We present additional social and linguistic factors affecting its potential demise, and discuss how our results contribute to our understanding of historical /r/-loss in Anglo-English.

Key words /r/; Rhoticity; Varieties of English; Sociophonetics; Sound change

The relation between musical abilities and speech prosody perception: A meta-analysis

Nelleke Jansen abc, Eleanor E. Harding bcd, Hanneke Loerts ac, Deniz Başkent bc, Wander Lowie ac

a Department of Applied Linguistics, Center for Language and Cognition, Faculty of Arts, University of Groningen, Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands

b Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands

c Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Postbox 196, 9700 AD Groningen, The Netherlands

d Prince Claus Conservatory, Hanze University of Applied Sciences, Meeuwerderweg 1, 9724 EM Groningen, The Netherlands

Abstract

Previous research has suggested a relationship between musical abilities and the perception of speech prosody. However, effect sizes and significance differ across studies. In a meta-analysis, we assessed the overall size of this relation across 109 studies and investigated which factors moderated the effect. We found a significant, medium-sized positive correlation between musical abilities and speech prosody perception. This correlation was larger for studies on non-native compared to native prosody perception. We attribute this difference to ceiling performance in native perception, while non-native perception may be more difficult and can thus be facilitated by musical abilities. In addition, prosody perception was more strongly correlated with music perception than with music training, possibly because training metrics disregard untrained individuals with naturally strong musical abilities. Further analyses showed a stronger correlation for prosodic pitch compared to prosodic timing perception, and a stronger correlation for behavioural accuracy measures compared to reaction times. We did not find differences in effects between linguistic and emotional prosody, between L1 tone language users or non-tone language users, or between adults and children. This meta-analysis generally supports theories proposing a connection between music and speech prosody. Furthermore, this study highlights the potential importance of individuals’ musical abilities for the acquisition of second language prosody.

Key words Prosody; Perception; Musical abilities; Meta-analysis

Advancements of phonetics in the 21st century: Theoretical and empirical issues of spoken word recognition in phonetic research

Natasha Warner, Department of Linguistics, University of Arizona, Box 210025, Tucson, AZ 85721-0025, USA

Abstract

How do listeners understand what they are hearing? Humans hearing speech perform spoken word recognition, recognizing what words they are hearing in a speech stream in order to understand the meaning. Phonetics refers to the properties of the speech at a detailed level, particularly below the level of segmental phonemic distinctions. In order to recognize spoken words, listeners have to extract information from the detailed acoustic signal in some way, but theories differ about whether listeners extract phonemes, whole words, or other units, by what mechanism, and they differ on what kinds of information are stored in the lexicon. The process of spoken word recognition can be affected by any number of situations such as the speaker or listener being a non-native of the language or dialect, being a child, having a speech/hearing disability, hearing speech in noise, the speech itself containing variability, or many other situations. Any of these situations can shed light on theoretical questions by giving a fuller picture of how listeners recognize words. This chapter examines what we have learned in these first ~21 years of the 21st century about how phonetics interacts with spoken word recognition.

Key words Spoken word recognition; Exemplar; Abstractionist; Phonetics; Modeling

About the Journal

The Journal of Phonetics publishes papers of an experimental or theoretical nature that deal with phonetic aspects of language and linguistic communication processes. Papers dealing with technological and/or pathological topics, or papers of an interdisciplinary nature are also suitable, provided that linguistic-phonetic principles underlie the work reported. Regular articles, review articles, and letters to the editor are published. Themed issues are also published, devoted entirely to a specific subject of interest within the field of phonetics.

Official website: https://www.sciencedirect.com/journal/journal-of-phonetics/about/aims-and-scope

Source: Journal of Phonetics official website

