Julio de los Reyes Lozano
Universitat Jaume I
https://orcid.org/0000-0002-4539-0757
Laura Mejías-Climent
Universitat Jaume I
https://orcid.org/0000-0003-2933-7195
Abstract
In an increasingly globalized world, technological advancements and artificial intelligence (AI) have catalysed the proliferation of audiovisual productions and their translation, leading to new dynamics in the professional landscape. Tools and practices such as machine translation (MT) and post-editing (PE) are becoming widespread, augmenting and diversifying translators’ tasks and creating new professional profiles and educational needs. While some specialized fields with more restrictive languages and terminology have readily incorporated MT, recent developments in neural MT (NMT) engines have led to their use in more creative areas such as literary or audiovisual translation (AVT). However, this trend raises concerns about the quality of both the raw and the final output, user perceptions, ethics and working conditions. Academic studies which explore the convergence of MT and AVT therefore become crucial to understanding the implications of this technology for all stakeholders: professionals, students and instructors, researchers, audiences and society at large. This article delves into the impact of neural MT and other AI-based tools on AVT workflows, both professionally and academically, and stresses the need for further research in this area. It then proceeds to outline the main research directions in this field as evidenced by the contributions included in this special issue, particularly those regarding the incorporation of MT into different AVT modes.
Keywords: neural machine translation, post-editing, dubbing, audiovisual translation, subtitling
1. Introduction
Since the turn of the century, technological development has catalysed the spread of internet access and has proved pivotal in the rapid proliferation of audiovisual productions and their translations via subtitling and dubbing, among other audiovisual translation (AVT) modes, to reach wider and ever more global audiences. Ongoing technological advances have resulted in the increasing use of new tools and software, and the mechanization and streamlining of professional processes (see Bolaños-García-Escribano et al., 2021; Díaz-Cintas, 2013; do Carmo & Moorkens, 2022; Georgakopoulou, 2018, among others, for a comprehensive review of such technologies and their evolution). In addition, the development of artificial intelligence (AI) has boosted the use of machine translation (MT) in recent years, particularly neural machine translation (NMT), and post-editing (PE) (Leiva Rojo, 2018). The more frequent use of MT and PE, along with other AI-based tools such as computer vision, speech recognition or speech synthesis, is creating new dynamics and profiles in the industry. In the process, their use has introduced considerable changes in the working conditions of translation professionals (do Carmo, 2020; do Carmo & Moorkens, 2022). In addition to economic, ethical (Moniz & Parra Escartín, 2023) and reception issues that require urgent review by academics and professionals, the very term translator is being questioned (Asscher, 2023) in favour of new terms such as post-editor and text localizer (Pym, 2013). This is because the tasks being performed by translators are starting to diversify and require the use of specific technologies or software which sometimes might be imposed by companies seeking to optimize time and investment.
Until recently, it was fields in which specialized or restricted languages are used, such as legal or scientific-medical translation, that proved the most permeable to the incorporation of MT. It is the development of MT engines based on neural networks that has produced output of such naturalness and creativity that even areas such as literary translation and AVT have incorporated automatic engines into their professional workflows. This seems to be the trend for the future and, like any advancement in technology, the gradual expansion of NMT has attracted everyone’s attention, at the same time setting off alarm bells in professional and academic circles. As in Black Mirror (Brooker & Jones, 2011–present), a British television series presenting speculative-fiction scenarios in which a dystopian society must face the consequences of its own technological evolution, we are now prompted to reflect on the possible ramifications of NMT. We should also be reflecting on the impact it may have on AVT and other forms of translation and interpreting, and on society as a whole. These ramifications include cultural implications, social biases, the quality of automation, user perceptions, and environmental and ethical issues such as the widespread and inappropriate use of protected intellectual property, among many other considerations.
The term “black mirror” itself is a reference to the reflective screens of our electronic devices when they are turned off. More broadly, the “black mirror effect” can be used to describe the phenomenon of recognizing the possible outcomes of rapid technological advancement, but also to encourage critical thinking about how we integrate technology into our lives and interact with it.
In this context, conducting thorough academic analyses of the way these tools work and what their implications are is essential. Such analyses should gather data and shed light on the incorporation of technologies into AVT practice and the ways in which they can truly contribute to optimizing processes and maintaining or improving quality – which directly relates to audience perceptions. The following sections briefly describe the development of AI and its incorporation into AVT in recent years, including tools such as automatic speech recognition (ASR), speech-to-text (STT) and text-to-speech (TTS) software. A particular focus will be on NMT engines, presenting some of the most prominent research that has been carried out on the incorporation of MT into the different AVT modes. In this way, we outline the main avenues in which there is still much research to be done and to which this special issue intends to make its modest contribution.
2. Professional, academic, and educational approaches to MT, AI, and AVT in recent years
Throughout the history of this professional practice, the development of technological tools has always been closely associated with AVT (Díaz-Cintas, 2013). For example, the introduction of technology was a milestone during the 1990s (Georgakopoulou, 2018, pp. 518–519), when ASR software or STT tools and computer-assisted translation (CAT) tools, based on translation memories, began to be used. Master templates in subtitling were implemented in the subsequent years (Bolaños-García-Escribano et al., 2021) and, later, cloud-based programs started to enable simultaneous work on the same projects from different locations (Spiteri-Miggiani, 2023). More recently, speech synthesis or TTS and speech-to-speech (STS) programs, in addition to MT and PE, seem also to have been incorporated irrevocably into AVT (Mejías-Climent & de los Reyes Lozano, 2023).
The stable relationship between AVT and technology can also be viewed from a different perspective. On the one hand, technology materializes in the form of cutting-edge tools that are gradually incorporated into professional workflows, as described above. On the other hand, technology also fosters globalized consumption methods and access to audiovisual products. This, in turn, increases demand, creating a need for more language service providers. As mentioned, MT is the technological tool that has arguably brought about the most significant revolution in recent years, hence the focus on it here. Understood as the production of translated texts from one natural language to another with or without human intervention, MT represents the most profound change yet in the role of the translator (Cid-Leal et al., 2019). Since the very basic initial MT engines of the 1950s, development has been exponential, and three main types of MT engine can now be identified: rule-based, statistical and neural, the last of which is performing ever closer to human parity (Rico Pérez, 2023; Toral, 2020). This has led to the displacement of the other types of engine and also to the use of NMT in more creative contexts, such as AVT.
All of these technologies – ranging from NMT to other AI-based tools such as ASR, STT and TTS – are gradually being incorporated into professional practice due to the industry’s need to satisfy increasingly globalized and immediate consumption that aims to cover all possible local markets and respond to the demands of more and more consumers (Rico Pérez, 2023). In view of the many concerns that AI, and MT in particular, generate among AVT professionals and audiences, putting on a blindfold and acting as if technology did not exist might not seem to be the best approach to the issue (Chaume & Díaz Cintas, 2023). On the contrary, defending labour rights, improving the working conditions of practitioners, and respecting certain ethical values are essential, but these issues should always be broached from a sensible and informed position. This analytical perspective can be promoted by studies such as those presented in this issue, studies which target some research gaps and gather reliable data on current market trends, educational practices and research approaches.
In particular, resistance to the incorporation of MT engines into AVT seems to be based on the following reasons: when an audiovisual text is processed for translation, its semiotic nature, which involves not only linguistic content but also a complex visual and acoustic configuration, creates constraints (Chaume, 2012) that MT is not fully able to overcome. Moreover, the translation of audiovisual content requires a level of interpretation and creativity, or hermeneutics, which, for the time being, machines do not seem able to deliver (Rico Pérez, 2023; Romero-Fresco & Chaume, 2022). Furthermore, the wide variety of genres and the lack of restricted languages and specialized terminology in fictional audiovisual texts greatly hinder the work of MT engines: creativity generates immense variability in linguistic structures and styles, which makes it impossible to establish concrete patterns to which a machine can adhere.
Despite these limitations, the first projects seeking to explore the incorporation of MT into AVT began to emerge at the turn of the century. Specifically, the AVT form in which most projects have been developed is subtitling, perhaps due to its written mode and technological nature (dedicated software is always required). This was followed more recently by dubbing and voiceover and also modes related to media accessibility, such as intralingual subtitling for the deaf and the hard of hearing (SDH) and audio description for the blind and the visually impaired (AD).
2.1. Subtitling and media accessibility through MT and AI: in search of productivity
In the academic context, several authors have investigated the application of MT systems to interlingual subtitling with special emphasis on the quality of the raw output of MT (i.e., not involving PE), the role of PE, and improvements in the productivity of subtitlers (see Díaz Cintas & Massidda, 2019, p. 261). As for larger projects that aimed to combine MT and subtitling, MUSA (Multilingual Subtitling of Multimedia Content, 2002–2004[1]) and SUMAT (Subtitling by Machine Translation, 2011–2014[2]) are usually mentioned as the first. Although these projects still employed non-neural MT and the translation results were therefore not very accurate, they did demonstrate the potential offered by MT for subtitling when the text is pre-edited adequately and thorough PE processes are also involved (Sánchez-Gijón, 2016). More recently, in 2017, researchers from the Machine Learning and Language Processing (MLLP) group and the Information and Communication Systems Area (ASIC) at the Universitat Politècnica de València developed an internationally pioneering online service for the automatic multilingual transcription and subtitling of educational audiovisual content, called poliTrans.[3]
Using similar technology to poliTrans, the European project EMMA (European Multiple MOOC Aggregator project, 2019–202[4]) focuses on the transcription, MT and subtitling of educational videos. Similarly, the TraMOOC project (Translation for Massive Open Online Courses, 2015–2018[5]), funded by the European Commission, offered subtitling services in 11 language pairs for online educational videos, which entailed integrating NMT services into a collaborative platform (Díaz Cintas & Massidda, 2019, p. 262). ASR is now also being used to automate subtitling on platforms such as YouTube, and was central to the SAVAS project (Live Subtitling and Captioning Made Easy, 2012–2014[6]), which focused on developing independent ASR technology to generate multilingual subtitles for television broadcasts. Regarding television broadcasting, the Spanish television channel La 1 de TVE has, since 2022, included 17 territorial news programmes with automatic subtitles (de Higes Andino, 2023). YouTube also introduced accessibility options in 2008 and, since 2010, it has been offering automatic subtitling using Google’s voice recognition technology and the Google Translate MT engine.
Similarly, several projects combining MT and AD have been carried out at the Universitat Autònoma de Barcelona. One of these is ALST (Linguistic and Sensorial Accessibility: Technologies for Voiceover and Audio Description, 2013–2016[7]), which compared the process of developing AD manually and by means of MT and PE according to the effort and time spent (Fernández-Torné & Matamala, 2015). Another example is the MeMAD project (Methods for Managing Audiovisual Data, 2018–2021[8]) of the Aalto University School of Science. Its objective was to develop a method for the efficient reuse of audiovisual content, especially content from television and video-on-demand (VOD) platforms, so as to enhance as much material as possible with complementary subtitles and AD through “supervised automation”. Wang et al. (2021) took a (giant) step forward and built a system that analyses the audiovisual content of a video and then generates the AD: first it predicts the insertion time; then it artificially generates the AD; finally, Google Cloud Text-to-Speech is applied to convert the audio descriptions into speech almost instantly.
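By way of illustration, the following minimal sketch outlines how the three stages described by Wang et al. (2021) can be chained together. It is a hypothetical outline rather than their actual implementation: the function bodies are placeholders, and a real system would plug in trained models for gap detection and description generation as well as a TTS service such as Google Cloud Text-to-Speech.

```python
# Hypothetical sketch of a three-stage automatic audio description (AD) pipeline:
# (1) predict insertion times, (2) generate descriptions, (3) synthesize speech.
# All function bodies are placeholders standing in for trained models and a TTS API.
from dataclasses import dataclass


@dataclass
class DescriptionCue:
    start: float  # second at which the AD can be inserted (a gap in the dialogue)
    text: str     # generated description
    audio: bytes  # synthesized speech


def predict_insertion_times(video_path: str) -> list[float]:
    """Stage 1: locate gaps in the dialogue where a description fits."""
    return [12.5, 47.0, 103.2]  # placeholder timestamps


def generate_description(video_path: str, timestamp: float) -> str:
    """Stage 2: describe the on-screen action around the given time."""
    return f"Description of the scene around {timestamp:.1f}s."  # placeholder text


def synthesize_speech(text: str) -> bytes:
    """Stage 3: convert the description into speech (e.g., via a cloud TTS API)."""
    return text.encode("utf-8")  # placeholder audio bytes


def describe(video_path: str) -> list[DescriptionCue]:
    cues = []
    for t in predict_insertion_times(video_path):
        text = generate_description(video_path, t)
        cues.append(DescriptionCue(start=t, text=text, audio=synthesize_speech(text)))
    return cues


if __name__ == "__main__":
    for cue in describe("episode_01.mp4"):
        print(f"{cue.start:>7.1f}s  {cue.text}")
```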
2.2. Dubbing, MT, and AI: the final frontier
As stated previously, NMT and other AI-based tools are booming in the audiovisual industry, with dubbing being one of the slowest AVT modes to embrace these technologies. The use of such tools in professional dubbing appears to be limited to the non-fiction genre, which has a very restricted audiovisual configuration and specific terminology, and usually involves off-screen voices or only one character speaking in front of the camera. Educational or informative videos therefore seem to be the most appropriate content with which to explore the incorporation of automatic technology, as illustrated by some of the projects mentioned below. One example is EXPERT (Educational eXplanations and Practices in Emergency Remote Teaching), a transnational Erasmus+ project whose areas of work include the (semi-)automatic dubbing of video lectures using state-of-the-art speaker-adaptive technologies (Pérez González de Martos et al., 2022). In this context, AI-based dubbing has also attracted considerable corporate investment lately, as was the case with startups such as Dubdub, Dubverse and Deepdub, whose main scope is educational videos (Wyndham, 2022), and more recently Aloud[9] (Kottahachchi & Abeysinghe, 2022), Rask.ai,[10] Dubbah[11] or HeyGen,[12] tools capable of imitating (or even cloning) the original speakers’ voices (Pérez Colomé, 2023).
Regarding studies that analyse the use of MT in dubbing, some major projects can be cited. These cover all the production stages, from transcription to speech synthesis, especially for voiceover, and are restricted to non-fiction genres. First, Matousek and Vít (2012) exploited the results obtained from applying MT to subtitling to adapt a text for voiceover. In a more comprehensive way, Taylor et al. (2015) developed for Disney a method for automatic dubbing in which the target text is created using pronunciation dictionaries and language models to match the specific visemes of a video sequence. Under the Google umbrella, Yang et al. (2020) described a system for large-scale AVT and dubbing in which the original content was transcribed, translated and automatically synthesized into target-language speech in the original speaker’s voice, with the speaker’s lip movements also synthesized to match the translated audio. In this regard, Amazon has developed software to automate the entire dubbing process (Brannon et al., 2023; Federico et al., 2020a; Lakew et al., 2021): it focuses on speech synthesis, synchronizing the translated transcript with the original utterances (Federico et al., 2020b) and taking isochrony into consideration (Tam et al., 2022). Finally, the Fondazione Bruno Kessler in Trento has sought to implement dubbing automation strategies based on the differences between on- and off-screen shots (Karakanta et al., 2021).
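To make these production stages concrete, the sketch below chains transcription, translation and speech synthesis with a naive isochrony check, that is, a test of whether the synthesized line would fit the duration of the original utterance. It is a hypothetical outline under simplified assumptions (the stage functions are stubs and the duration estimate is a crude characters-per-second heuristic), not the code of any of the systems cited above.

```python
# Hypothetical automatic dubbing pipeline: ASR -> MT -> TTS, with a naive
# isochrony check. The stage functions are stubs standing in for real speech
# recognition, machine translation and speech synthesis components.
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float      # seconds
    end: float
    source: str       # transcribed source-language line
    target: str = ""  # translated line, filled in by the MT stage


def transcribe(audio_path: str) -> list[Utterance]:
    """ASR stage: segment the audio track into timed source-language utterances."""
    return [Utterance(0.0, 2.4, "Welcome to the course."),
            Utterance(2.6, 6.1, "Today we will look at cell division.")]


def translate(text: str) -> str:
    """MT stage: stub standing in for a neural MT engine."""
    return f"[ES] {text}"


def estimated_duration(text: str, chars_per_second: float = 14.0) -> float:
    """Crude estimate of how long the synthesized target speech would last."""
    return len(text) / chars_per_second


def dub(audio_path: str) -> list[Utterance]:
    utterances = transcribe(audio_path)
    for u in utterances:
        u.target = translate(u.source)
        slot = u.end - u.start
        # Isochrony: flag lines whose synthetic speech would overflow the slot,
        # so that verbosity control or post-editing can shorten them.
        if estimated_duration(u.target) > slot:
            print(f"Warning: '{u.target}' may not fit in {slot:.1f}s")
    return utterances


if __name__ == "__main__":
    for u in dub("lecture.wav"):
        print(f"{u.start:.1f}-{u.end:.1f}s  {u.target}")
```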
In addition to these big-budget projects, DubTA (2021–2022) can also be mentioned as a more limited project, one developed by the research group TRAMA (Universitat Jaume I). It aimed to assess the possibility of applying NMT to dubbing by correctly preparing or pre-editing a transcribed script on the basis of a series of errors that can be considered common in PE (Villanueva Jordán & Romero-Muñoz, 2023). In existing research, such as the projects mentioned above, special emphasis seems to be placed on improving ASR and TTS, and also the technological processes involved in generating dubbed audio. Nonetheless, the necessary labelling and preparation (or pre-editing) of the source text, considering the requirements of dubbing, seem to remain somewhat under-explored (de los Reyes Lozano & Mejías-Climent, 2023), indicating a research gap that this recent project aimed to start bridging.
2.3. MT and new professional profiles reach educational contexts
The need to provide future translation professionals with comprehensive knowledge of the trends in the professional market has been pointed out by authors such as Hurtado Albir (2019), who places special emphasis on the importance of bringing new technological tools into the classroom. In this regard, among many others, the DITAPE project (Universitat de València) has recently made an interesting contribution by identifying attitudes and trends in the educational landscape in Spain regarding the incorporation of MT and PE into the classroom. As presented in Cerezo Merchán and Artusi (2023), this project aimed to fill a gap between professional practices and education; and, in fact, it reflects the interviewed teachers’ awareness of the paradigm shift caused by the incorporation of technologies into professional workflows. In particular, MT and PE are considered not a threat that seeks to end human translation but rather a new tool that is becoming essential if future professionals are to meet the needs and demands of the market.
Other projects, such as those carried out by Wang (2023), Massidda and Sandrelli (2023), Nitzke et al. (2019), Guerberof Arenas and Moorkens (2019), Cid-Leal et al. (2019), Moorkens (2018), Pérez Macías (2017), and Rico Pérez (2017), to name but a few, have already explored the benefits of introducing innovative educational practices in the classroom using MT and PE. The aim of these experiences is to respond to current market trends, to prepare translators-to-be to be open-minded towards optimization processes, and to enrich the current and specialized training of Translation and Interpreting students.
Considering this shifting paradigm, and as indicated by the latest European Language Industry Survey (ELIS, 2022), the use of MT in the professional sector is already a reality. This survey gathered more than 1,300 responses from translation companies, freelance translation professionals, public and private translation departments and higher-education institutions offering translation studies across Europe, and it indicates that MT is being used by most agents in more than 20% of translation projects. In this sense, it is inevitable that the profile of professional translators will continue to diversify. Their tasks might now involve not only translation, but also the review and improvement of MT output (PE). In fact, the PE process might not be so removed from the translation process itself, given the complexity and cognitive processes involved, as pointed out by authors such as do Carmo and Moorkens (2020, 2022).
To regulate MT and PE practices, the ISO standard 18587:2017, entitled Translation services – Post-editing of machine translation output – Requirements, was published in April 2017 (ISO, 2017). This standard is aimed at translation service providers, clients and post-editors, and describes both the requirements for undertaking PE tasks and the competencies required of post-editors. Similarly, the UNE-ISO standard 18587 (AENOR, 2020) was published in Spain three years later and has been analysed by de los Reyes Lozano and Mejías-Climent (2023), who have adapted its requirements to the AVT context and to dubbing in particular. As stated in that publication, new professional profiles include skills such as working with raw MT output and post-editing it adequately, thereby adding a human touch to the machine’s limited creative and hermeneutic capacity to process a multimodal text and reproduce its intended message in a target text. Moreover, these new augmented professional profiles must include specialization in both MT and AVT processes (do Carmo & Moorkens, 2020), in addition, of course, to a strong translation competence encompassing linguistic, textual, cultural and technical skills, research and documentation competence, and information acquisition and processing.
Despite this first proposal for standardizing the competences required for PE, much research is still needed in this field to systematize the requirements involved in the correct processing of MT output and its adaptation to the target text. Such systematization is characterized, in the case of AVT, by a series of formal and multimodal guidelines. Considering these needs, further research is also required to identify the main elements to be pre-edited, labelled and, subsequently, post-edited and adapted to dubbing scripts, as well as to indicate the extent to which MT and PE would be feasible in professional workflows.
In short, the technological framework described above calls for a deeper exploration of the various forms of convergence between MT, AI and AVT, including:
· the conceptualization of the translation task (Asscher, 2023; do Carmo & Moorkens, 2020);
· the latest technological advances (Bolaños-García-Escribano et al., 2021; Díaz Cintas & Massidda, 2020);
· the different considerations arising from the use of NMT, its ethical and professional implications (Federici et al., 2023; Kenny et al., 2020);
· reviews of the current automation practices in AVT processes; and
· the prospects for the future in the academic, professional and educational fields (Moniz & Parra Escartín, 2023; Sánchez Ramos & Rico Pérez, in press).
In the contributions to this special issue that follow, some of the elements of the use of MT and AI in AVT processes are reviewed from professional, academic, and educational perspectives.
3. Emerging topics in AVT, MT, and AI
As pointed out, the incorporation of MT and other AI-based tools into the professional processes of AVT seems unstoppable and opens up new lines of research into the implications of this technology for the professional sphere, for the different AVT modes, and for the educational field and the new profiles for which updated training is required.
The contributions presented in this special issue begin to enrich this emerging research: on the one hand, the professional changes and attitudes of translation professionals towards this technology are explored; on the other, the opportunities offered by state-of-the-art automatic software for dubbing and subtitling are also aired. In addition, teaching experiences that innovate by incorporating MT into the classroom are also shared. It should be noted that the order in which the contributions are presented below is a choice among many that could have been made, since, as will be seen, the topics analysed converge, overlap at different levels and are closely related. In any event, from one perspective or another, they represent a valuable addition to research in MT and AVT.
3.1. Professional issues
The professional field of translation, as mentioned above, has undergone considerable changes with the advent of MT engines, especially due to the exponential development of this technology in combination with neural networks. To provide an overview of the evolution and incorporation of these technologies into the AVT sector, Granell and Chaume present the changes brought about by digitalization. AVT, broadly understood as a synonym for media content localization and not only as a particular practice of linguistic transfer, is undergoing a revolution that was unsuspected only a few years ago – even in those territories where viewers are less accustomed to localized content. Digitalization and technological changes, which have had such an impact on the way audiovisual texts – whether original, localized or adapted – are produced, distributed, edited, consumed, and shared, have also had a substantial impact on the AVT profession. In their contribution, Granell and Chaume explore the ways in which technology has been evolving as an aid to translators: from being a merely clerical aid for transcribing digital texts to automating tasks and integrating MT into human translation processes. It has done so by providing a range of tools to help translators with their work processes, progressively migrating both tools and processes to cloud-based environments. The focus is then set on AVT, and particularly on dubbing, where digitalization has shaped the consumer market and now poses several challenges for language technology developments and professional AVT practices. Academia has also paid attention to such developments and has increasingly been dealing with several issues affecting both practice and training to cater to the needs of current media markets. A final word is devoted to proposing a literacy-based framework for translators’ training that not only embraces technology to incorporate automation as an additional aid, but also redefines the audiovisual translator’s workstation.
Also from a professional perspective, the study by Koglin, Cândido Moura, Aparecida de Matos, and Pereira da Silveira shifts the focus to the translation product and investigates the perceived quality of machine-translated post-edited interlingual subtitles. Based on data collected from Brazilian professional translators, their study investigated whether the use of MT has an impact on the perceived quality and acceptability of interlingual subtitles. They also examined which technical parameters and linguistic aspects were considered troublesome and whether translators would report any evidence that the subtitles were machine-translated. Sixty-eight Brazilian translators volunteered to participate in this study, which involved their filling out a questionnaire, watching a movie trailer with post-edited subtitles, and then assessing their quality by rating them on a Likert-type scale. Finally, the participants answered a written verbal protocol, the data from which were triangulated with the quantitative results from the Likert-type scale, with IBM SPSS Statistics being used for the statistical analysis. Interestingly, their findings show that most translators assessed the post-edited subtitles as satisfactory or very satisfactory. The authors consider these results to be an indication of the acceptability of MTPE subtitles, as they find support in the written verbal protocol answers. Regarding the technical and linguistic parameters, although some of the participants reported issues such as a lack of accuracy, the use of literal translation, and unnatural subtitles, no one explicitly affirmed that the subtitles were machine translated and post-edited. There is no doubt that reception studies are essential to understanding the acceptability of AVT among audiences, whether MT is applied or not. In this sense, the authors of this article follow in the footsteps of Künzli (2022) or Menezes (2023) and ask professionals directly, opening an interesting line of research that should be prioritized in future studies and combined with audience studies.
3.2. New practices and up-to-date software
This changing professional landscape, as can be deduced, is a consequence of the adoption of new practices such as PE, among many others (Wang, 2023), and the incorporation of new software that seeks to optimize the work of translators. In this regard, in an attempt to cope with the higher demand for dubbing experienced over the past few years, media companies have invested in AI, with significant developments having ensued. As discussed above, organizations such as Google, Amazon and Disney have seen the potential in automatic dubbing. This is understood as the combination of ASR, MT and TTS technologies with a view to automatically replacing the audio track of an original audiovisual text with synthetic speech in a different language, taking into consideration the relevant synchronies. In the light of the latest developments in this field, the article by Baños sets out to provide an overview of automatic dubbing in the current mediascape and to identify some of the main challenges with its implementation, especially regarding the integration of MT and speech synthesis in the dubbing workflow. This is achieved through an analysis of the educational videos posted in the Spanish version of the YouTube Channel Amoeba Sisters; these were dubbed using Aloud, an automatic dubbing solution from Area 120, Google’s in-house incubator for experimental projects. Baños’s evaluation focuses on the issues highlighted by users in their comments on this same YouTube channel, which include naturalness and accuracy. Special attention is paid to the use of synthetic voices, which are heavily criticized by users – although the recent advent of tools that can also clone the voice of the original speakers, such as Rask.ai or HeyGen, may change this picture (Pérez Colomé, 2023).
However, in line with the Aloud developers’ original intention of increasing accessibility, users also highlight the usefulness of automatic dubbing for students interested in biology-related topics who are not fluent in English. In this sense, despite its many drawbacks, one positive aspect of the adoption of AI is that this technology allows many types of audiovisual content to be translated that would otherwise never be translated if the decision depended solely on market and industry expectations. This is the case with entertaining social media content, but also with audiovisual messages in crisis and emergency contexts (Federici et al., 2023, p. 148). It also applies to minority and/or minoritized languages, for which the costs of conventional dubbing or subtitling would often rule such services out, a constraint on both creators and audiences who want or need to consume such content in less widely spoken languages. As Pérez Colomé (2023) points out, AI could overcome these limitations, yet many lines of research on its ethical implications and quality remain to be explored.
Jin and Yuan also point out that the use of AI technology for dubbing has become increasingly popular due to its ability to improve content production and dissemination. In particular, MT has been applied to audiovisual contexts, resulting in more efficient and productive AI content generation in some cases. To assess the quality of MT-dubbed videos, their study proposes a new model called FAS (functional equivalence, acceptability and synchrony), which they have developed from the FAR model devised by Pedersen (2017), with the “R (Readability)” parameter replaced with an “S” to include synchrony. This amendment responds to research on dubbing translation quality which points to the need for a method of achieving better synchronization between audio and video. Using the 2020 live-action remake of Mulan as a case study, Jin and Yuan’s study evaluates the quality of the automatically dubbed videos generated by YSJ (Ren Ren Yi Shi Jie), an MT platform for audiovisual products in China. By analysing errors of functional equivalence, acceptability and synchrony, their study assesses whether China’s latest MT engine can meet the demand for quality dubbing and improve cross-cultural communication. The results show that, whereas YSJ is able to generate a moderately acceptable result, some semantic, idiomaticity, and synchrony errors can still be detected, which may lead to certain information being translated incorrectly and, consequently, misunderstood by viewers. Overall, this study sheds light on the current state of AI-dubbing technology in China – a country with a growing demand for dubbing and subtitling services – and highlights areas for improvement in the future.
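As a rough illustration of how such an error-based model can be operationalized, the sketch below computes per-area scores from annotated errors. The penalty weights follow the minor/standard/serious scale commonly associated with Pedersen’s FAR model (0.25, 0.5 and 1 point respectively); how exactly Jin and Yuan weight synchrony errors in FAS is an assumption here, so the figures are purely illustrative.

```python
# Illustrative scoring sketch for an error-based quality model such as FAS
# (functional equivalence, acceptability, synchrony). Penalty weights follow
# the minor/standard/serious scale used in FAR-style models; the treatment of
# synchrony errors is an assumption made for the sake of the example.
PENALTIES = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def area_score(errors: list[str], n_units: int) -> float:
    """Score one area (F, A or S) as 1 minus the average penalty per dubbed unit."""
    total_penalty = sum(PENALTIES[severity] for severity in errors)
    return max(0.0, 1.0 - total_penalty / n_units)


def fas_score(f_errors: list[str], a_errors: list[str],
              s_errors: list[str], n_units: int) -> dict[str, float]:
    return {
        "functional_equivalence": area_score(f_errors, n_units),
        "acceptability": area_score(a_errors, n_units),
        "synchrony": area_score(s_errors, n_units),
    }


# Example: 40 dubbed lines with a handful of annotated errors per area.
print(fas_score(["standard", "serious"], ["minor", "minor"], ["standard"], 40))
```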
Also focusing on MT as applied to Chinese–English subtitling, Tian’s study offers an exploratory analysis of the current literature on this practice; in addition, it draws our attention to the general constraints from three perspectives: technical, textual, and cultural. To overcome these constraints and achieve concision, comprehensibility, and coherence in subtitles, in her study Tian focuses on condensation, context, and coordination as the key strategies. However, these strategies pose considerable challenges to MT in Chinese–English subtitling, which are illustrated using examples of bilingual subtitles from American Factory, winner of the 2020 Oscar for Best Documentary Feature. The official subtitles for this documentary are compared to those generated by three commonly used MT tools in Chinese–English translation: DeepL Translator, Youdao Translation, and ChatGPT. The study investigates the quality and potential of MT in subtitling and proposes possible solutions and suggestions for integrating MT into subtitling. The author considers this to be a promising solution, despite the unsatisfactory output produced by the three engines, particularly regarding concision – this in addition to the frequent cultural and textual constraints imposed by the text and the accuracy and coherence problems caused by the segmented nature of subtitles. Of the three MT tools, ChatGPT is considered to have potential, since precise instructions can be fed to it in order to produce customized translations.
Another contribution that explores the use of new software in the AVT environment is that by Romero-Fresco and Fresno, who also focus on the key role that accuracy plays in rendering automated live captioning accessible: closed captions, they assert, play a vital role in making live broadcasts accessible to many viewers. Traditionally, stenographers and respeakers have been in charge of the production of captions, but this scenario is changing due to the steady improvements that ASR has undergone in recent years. This technology is now being used to create intralingual live captions without human assistance and broadcasters have begun to explore its use. Human and automatic captions now co-exist on television and, while some research has focused on the accuracy of human live captions, comprehensive assessments of the accuracy and quality of automatic captions are still needed, as this special issue shows. The authors tackle this topic by presenting the main findings of the largest study conducted to date that explores the accuracy of automatic live captions. Through four case studies, which include approximately 17,000 live captions in English broadcast in the United Kingdom, the United States and Canada between 2018 and 2022 and analysed using the NER model (Romero-Fresco & Martínez, 2015), this article tracks recent developments in unedited automatic captions, compares their accuracy to that achieved by human beings and concludes with a brief discussion of what the future of live captioning looks like for both human and automatic captions. Overall, the authors believe that human captioners are still essential, since only they can provide the levels of accuracy and readability required for high-quality live captions, and thus guarantee access to audiovisual live content.
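For reference, the NER model scores accuracy as (N − E − R) / N × 100, where N is the number of words in the captions and E and R are the penalty points accumulated for edition and recognition errors, each weighted by severity; an accuracy rate of 98% is usually taken as the threshold for acceptable quality. A minimal sketch of the calculation, with invented figures purely for illustration, follows.

```python
# Minimal sketch of the NER accuracy calculation used to assess live captions
# (Romero-Fresco & Martínez, 2015): N = number of caption words, E = edition
# error penalties, R = recognition error penalties (both weighted by severity).
def ner_accuracy(n_words: int, edition_penalty: float, recognition_penalty: float) -> float:
    """Return the NER accuracy rate as a percentage: (N - E - R) / N * 100."""
    return (n_words - edition_penalty - recognition_penalty) / n_words * 100


# Example with invented figures: 1,000 caption words, 6 points of edition errors
# and 9 points of recognition errors give 98.5%, above the usual 98% threshold.
print(round(ner_accuracy(1000, 6, 9), 1))
```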
3.3. Integrating MT in educational settings
In addition to the transformation that the professional translation landscape has undergone as a result of the exponential development of technologies, education has also found it essential to update its curricula and teaching methods. In line with this, teachers of Translation have had to consider incorporating new professional profiles and new technological skills into their curricula in order to stay abreast of new professional practices that range from NMT to subtitling automation, including the proliferation of cloud-based software. Indeed, the rise of post-editors in media localization, specifically in subtitling, has been a reality for some time now, which has triggered the need for up-to-date training methods and academic curricula. It is against this backdrop that the study by Bolaños-García-Escribano and Díaz Cintas examined subtitling students’ perceptions of PE. In their contribution, they describe a total of four teaching experiences, conceived as practical experiments in interlingual subtitle PE (English into Spanish) and involving postgraduate students in AVT from both Spain and the United Kingdom. Their sample comprised 36 master’s-level students enrolled in translator training programmes that focus on AVT. Having adopted a mixed-methods approach, they collated the feedback received after each experience by means of online questionnaires. This feedback proved paramount to understanding the participants’ opinions of PE in the subtitling classroom. Paradoxically, most respondents believed that training in subtitle PE should feature more prominently in Translation curricula even though they expressed reluctance to undertake PE work professionally.
An educational approach is also adopted by Cerezo Merchán, who presents the results of a teaching experience that aimed to investigate the field of MT and PE in the subtitling classroom. Her experiment involved students who were taking a course in specialized translation from English into Spanish/Catalan as part of the Degree in Translation and Interlinguistic Mediation at the Universitat de València. The students performed two translation tasks: one by means of MT and PE, using different MT engines, and the other without MT. The text chosen for translation was a documentary. The students measured the time spent on PE and human translation (HT) tasks, evaluated the errors generated by the different MT engines according to the course’s subtitling correction scale, and, in written reports, reflected on the advantages and risks of the use of MT in subtitling. The results of this case study showed an improvement in productivity (measured according to time taken) when PE was used compared to when the translation was done from scratch. It was found that the quality of the translated text was slightly higher with human translation. The students’ comments also pointed to an improvement in productivity when MT was used, although they highlighted the difficulty of applying MT in subtitling – an AVT mode that involves many technical conventions with which MT engines are unfamiliar. Nevertheless, the author emphasizes that the advantages and disadvantages of MT and of PE in the training of future subtitlers should be explored further.
Also highlighting the need to update translation training, Wang and Xu set out to assess the use of NMT in AVT teaching. Their study, based on readability theory, proposes the use of different evaluation indicators that can be computed by WordSmith 5.0 and Coh-Metrix 3.0, rendering the evaluation of NMT output more manageable. By building five comparable corpora consisting of direct and post-edited subtitle translations of corporate videos from Chinese into English, they investigated the performance of four online NMT systems (Google Translate, Baidu Translate, Bing Translator, and Youdao Translate) in AVT teaching. Their statistical analyses and case studies indicate that Google Translate outperforms the other three platforms on all readability tests; furthermore, it can enhance the readability of post-edited subtitle translation across five levels (word, syntax, textbase, situation model, and genre and rhetorical structure). The remaining three platforms performed differently in the tests. In particular, Bing Translator can improve the readability of students’ subtitle translations at the word level, Youdao Translate can improve readability at the textbase level, and Baidu Translate can improve readability at the syntax and textbase levels. Further analyses based on concrete examples substantiate their statistical analyses. This study, therefore, contributes to existing research both by examining the application and performance of NMT in AVT, confirming its contribution to enhancing the readability of students’ subtitles, and by offering potential directions for refining current NMT systems and incorporating them into the educational landscape.
4. Final remarks and future perspectives
With this special issue, we intended to shed light on the impact of MT and other AI-based tools on AVT workflows, and on the ways in which researchers are echoing this paradigm shift and also incorporating it into educational approaches. The advent of innovative MT technologies (particularly NMT), their improving results and the ease of access to them have accelerated the integration of these tools into all walks of life. If we think about automatic subtitling or dubbing, what was until recently considered science fiction has now become an astounding reality that is within everyone’s reach. The vast number of both free and paid applications available on the internet for translating audiovisual content to and from virtually any language – some of which are described and analysed in this issue – underscores the global demand for accessible and versatile AVT solutions in today’s interconnected world. Likewise, users of minority and/or minoritized languages and media accessibility services can also benefit from MT and gain greater independence by instantly accessing and understanding multiple forms of audiovisual content that they would not otherwise be able to access. However, MT may sometimes fail to capture the nuances, cultural context, specialized terminology, or technical specificities of an audiovisual text, which can lead to misunderstandings or miscommunication. In this regard, we believe that reception studies should be promoted since they offer a valuable lens through which the quality of MT can be assessed. By understanding how audiences and professionals perceive and interact with audiovisual content translated using MT, we can gauge its effectiveness and identify areas for improvement.
In addition, this special issue highlights the range of transformations that the AVT profession is having to undergo. The incorporation of MT into AVT workflows calls for further research on areas such as the new professional profiles that might be required – for instance, the translator-post-editor – whose characteristics are yet to be clearly defined. Current and future professionals seem to be facing the need to move their workstations towards an augmented translation environment or, in the case of AVT and dubbing, “augmented dialogue writing” (de los Reyes Lozano & Mejías-Climent, 2023), where the incorporation of new technologies expands and, above all, should facilitate the functions of translation professionals. Indeed, recent studies suggest that, when thoughtfully integrated into the AVT workflow, NMT can improve productivity and efficiency; yet it also poses several challenges: translators are now expected to take on board a more diverse set of responsibilities, including PE, quality control, and the training of MT engines. In this regard, PE plays a crucial role in ensuring that the final output meets the desired standards. But, as can be observed throughout the articles of this special issue, the translator’s expertise remains indispensable in refining MT-generated content to align it with cultural nuances, context-specific references, and stylistic preferences. However, doing so also presents translators with a moral dilemma: accepting that they are not the creators of their own translations. This situation involves a number of ethical issues, such as a client explicitly informing a translator that the text with which they will be working represents raw MT output, as the ISO 18587 standard requires (ISO, 2017), or the way confidentiality is handled when material is processed by freely available MT engines, to name but two.
What is clear from this special issue is that the AVT workflow will not continue to be managed as it has been until now, partially due to the rise of NMT. This shift is inevitably reflected in current research approaches and educational programmes. If we want to go beyond the black mirror effect and make correct and ethical use of MT in AVT, we must advocate responsible development, user education, and the synergistic integration of human expertise with MT capabilities. In other words, translators, scholars, and even students should shift their focus towards understanding and mastering processes such as pre-editing, PE, and quality control. Furthermore, the adequate and ethical supervision, design, and training of MT engines also remain essential, without leaving aside the enhancement of creativity. The evolution of this field also necessarily requires educational programmes in Translation to be re-evaluated so that they are better able to equip future AVT professionals with the skills needed to leverage MT in their work. As technology progresses, all stakeholders must remain alert to how the power of MT is harnessed in a dynamic AVT landscape.
References
AENOR (Agencia Española de Normalización) (2020). UNE-ISO 18587. Servicios de traducción. Posedición del resultado de una traducción automática. Requisitos. AENOR.
Asscher, O. (2023). The position of machine translation in translation studies. Translation Spaces, 12(1), 1–20. https://doi.org/10.1075/ts.22035.ass
Bolaños-García-Escribano, A., Díaz-Cintas, J., & Massidda, S. (2021). Latest advancements in audiovisual translation education. The Interpreter and Translator Trainer, 15(1), 1–12. https://doi.org/10.1080/1750399X.2021.1880308
Brannon, W., Virkar, Y., & Thompson, B. (2023). Dubbing in practice: A large scale study of human localization with insights for automatic dubbing. Transactions of the Association for Computational Linguistics, 11, 419–435. https://doi.org/10.1162/tacl_a_00551
Brooker, C., & Jones, A. (Executive Producers). (2011–present). Black Mirror [TV series]. Zeppotron; House of Tomorrow; Broke & Bones.
Cerezo Merchán, B., & Artusi, A. (2023). Percepciones y usos de la traducción automática por parte de los docentes de traducción audiovisual en España: Resultados del proyecto DITAPE. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 129–150). Comares.
Chaume, F. (2012). Audiovisual translation: Dubbing. St. Jerome.
Chaume, F., & Díaz Cintas, J. (2023). Tiempos modernos: Desafíos y oportunidades de la traducción automática en los medios audiovisuales. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. xi–xii). Comares.
Cid-Leal, P., Espín-García, M.-C., & Presas, M. (2019). Traducción automática y posedición: Perfiles y competencias en los programas de formación de traductores. MonTI. Monografías de Traducción e Interpretación, 11, 187–214. https://doi.org/10.6035/monti.2019.11.7
de Higes Andino, I. (2023). Estudio del subtitulado automático bilingüe en la Comunitat Valenciana: El caso de L’Informatiu – Comunitat Valenciana de RTVE. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 95–112). Comares.
de los Reyes Lozano, J., & Mejías-Climent, L. (2023). La norma UNE-18587 sobre posedición y traducción automática: Intersecciones con la industria del doblaje. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 81–93). Comares.
Deryagin, M., Pošta, M., Landes, D., Wells, V., Carrington, C., Carraud, E., & Renard, E. (2021). AVTE machine translation manifesto. https://avteurope.eu/wp-content/uploads/2022/10/Machine-Translation-Manifesto_ENG.pdf
Díaz-Cintas, J. (2013). The technology turn in subtitling. In M. Thelen & B. Lewandowska-Tomaszczyk (Eds.), Translation and meaning Part 9 (pp. 119–132). Maastricht School of Translation and Interpreting.
Díaz Cintas, J., & Massidda, S. (2019). Technological advances in audiovisual translation. In M. O’Hagan (Ed.), The Routledge handbook of translation and technology (pp. 255–270). Routledge. https://doi.org/10.4324/9781315311258
Di Gangi, M., Rossenbach, N., Pérez, A., Bahar, P., Beck, E., Wilken, P., & Matusov, E. (2022). Automatic video dubbing at AppTek. Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 351–352. https://aclanthology.org/2022.eamt-1.65.pdf
do Carmo, F. (2020). ‘Time is money’ and the value of translation. Translation Spaces, 9(1), 35–57. https://doi.org/10.1075/TS.00020.CAR
do Carmo, F., & Moorkens, J. (2020). Differentiating editing, post-editing and revision. In M. Koponen, B. Mossop, I. S. Robert, & G. Scocchera (Eds.), Translation revision and post-editing: Industry practices and cognitive processes (pp. 35–49). Routledge. https://doi.org/10.4324/9781003096962-4
do Carmo, F., & Moorkens, J. (2022). Translation’s new high-tech clothes. In G. Massey, E. Huertas-Barros, & D. Katan (Eds.), The human translator in the 2020s (pp. 11–26). Routledge. https://doi.org/10.4324/9781003223344-2
ELIS (2022). European Language Industry Survey 2022. Trends, expectations and concerns of the European language industry. Directorate-General for Translation. https://elis-survey.org/
Federici, F. M., Declercq, C., Díaz Cintas, J., & Baños Piñero, R. (2023). Ethics, automated processes, machine translation, and crises. In H. Moniz & C. Parra Escartín (Eds.), Towards responsible machine translation: Ethical and legal considerations in machine translation (pp. 135–156). Springer. https://doi.org/10.1007/978-3-031-14689-3_8
Federico, M., Enyedi, R., Barra-Chicote, R., Giri, R., Isik, U., Krishnaswamy, A., & Sawaf, H. (2020a). From speech-to-speech translation to automatic dubbing. arXiv. https://doi.org/10.48550/arXiv.2001.06785
Federico, M., Virkar, Y., Enyedi, R., & Barra-Chicote, R. (2020b). Evaluating and optimizing prosodic alignment for automatic dubbing. Interspeech 2020, 1481–1485. https://doi.org/10.21437/Interspeech.2020-2983
Fernández-Torné, A., & Matamala, A. (2015). Text-to-speech vs. human voiced audio descriptions: A reception study in films dubbed into Catalan. The Journal of Specialised Translation, 24, 61–88.
Georgakopoulou, P. (2018). Technologization of audiovisual translation. In L. Pérez-González (Ed.), The Routledge handbook of audiovisual translation (pp. 516–539). Routledge. https://doi.org/10.4324/9781315717166-32
Georgakopoulou, P. (2020). The faces of audiovisual translation. The many faces of translation: from video games to the Vatican. DG TRAD Conference 2019, 16–33. https://doi.org/10.2861/898773
Georgakopoulou, P., & Bywood, L. (2014). MT in subtitling and the rising profile of the post-editor. Multilingual, 25(1), 24–28.
Guerberof Arenas, A., & Moorkens, J. (2019). Machine translation and post-editing training as part of a master’s programme. The Journal of Specialised Translation, 31, 217–238.
Hurtado Albir, A. (2019). La investigación en didáctica de la traducción: Evolución, enfoques y perspectivas. MonTI. Monografías de Traducción e Interpretación, 11(11), 47–76. https://doi.org/10.6035/monti.2019.11.2
ISO (International Organization for Standardization) (2017). ISO 18587:2017, Translation services – Post-editing of machine translation output – Requirements. ISO.
Karakanta, A., Bhattacharya, S., Nayak, S., Baumann, T., Negri, M., & Turchi, M. (2021). The two shades of dubbing in neural machine translation. Proceedings of the 28th International Conference on Computational Linguistics, 4327–4333. https://doi.org/10.18653/v1/2020.coling-main.382
Kenny, D., Moorkens, J., & do Carmo, F. (2020). Towards ethical, sustainable machine translation. Translation Spaces, 9(1), 1–11. https://doi.org/10.1075/ts.00018.int
Kottahachchi, B., & Abeysinghe, S. (2022). Overcoming the language barrier in videos with Aloud. https://blog.google/technology/area-120/aloud/
Künzli, A. (2022). How subtitling professionals perceive changes in working conditions: An interview study in German-speaking countries. Translation and Interpreting Studies, 18, 91–112. https://doi.org/10.1075/tis.20107.kun
Lakew, S. M., Federico, M., Wang, Y., Hoang, C., Virkar, Y., Barra-Chicote, R., & Enyedi, R. (2021). Machine translation verbosity control for automatic dubbing. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 7538–7542. https://doi.org/10.48550/arxiv.2110.03847
Leiva Rojo, J. (2018). Aspects of human translation: The current situation and an emerging trend. Hermēneus: Revista de Traducción e Interpretación, 20, 257–294. https://doi.org/10.24197/HER.20.2018.257-294
Massidda, S., & Sandrelli, A. (2023). ¡Sub! localisation workflows (th)at work. Translation and Translanguaging in Multilingual Contexts, 9(3), 298–315. https://doi.org/10.1075/ttmc.00115.mas
Matousek, J., & Vít, J. (2012). Improving automatic dubbing with subtitle timing optimisation using video cut detection. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/ICASSP.2012.6288395
Menezes, R. (2023). Paradójico o complementario: La revisión de subtítulos en el flujo de trabajo de traducción automática y posedición. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 115–128). Comares.
Mejías-Climent, L., & de los Reyes Lozano, J. (Eds.). (2023). La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras. Comares.
Moniz, H., & Parra Escartín, C. (Eds.). (2023). Towards responsible machine translation: Ethical and legal considerations in machine translation. Springer. https://doi.org/10.1007/978-3-031-14689-3
Moorkens, J. (2018). What to expect from neural machine translation: A practical in-class translation evaluation exercise. The Interpreter and Translator Trainer, 12(4), 375–387. https://doi.org/10.1080/1750399X.2018.1501639
Nitzke, J., Tardel, A., & Hansen-Schirra, S. (2019). Training the modern translator: The acquisition of digital competencies through blended learning. The Interpreter and Translator Trainer, 13(3), 292–306. https://doi.org/10.1080/1750399X.2019.1656410
Pedersen, J. (2017). The FAR model: Assessing quality in interlingual subtitling. The Journal of Specialised Translation, 28, 210–229.
Pérez Colomé, J. (2023, September 15). Feijóo, Belén Esteban y El Fary hablan en perfecto inglés: El doblaje automático en vídeo ya arrasa y confunde. El País. https://elpais.com/tecnologia/2023-09-15/belen-esteban-y-el-fary-hablan-en-perfecto-ingles-el-doblaje-automatico-en-video-ya-arrasa.html
Pérez González de Martos, A. M., Giménez Pastor, A., Jorge Cano, J., Iranzo Sánchez, J., Silvestre Cerdà, J. A., Garcés Díaz-Munío, G. V., Baquero Arnal, P., Sanchis, A., Civera, J., Juan, A., & Turró, C. (2022). Doblaje automático de vídeo-charlas educativas en UPV[Media]. Proceedings of the In-Red 2022 – VIII Congreso Nacional de Innovación Educativa y Docencia en Red. https://doi.org/10.4995/INRED2022.2022.15844
Pérez Macías, L. (2017). Análisis de las percepciones en torno a la práctica de la posedición en el sector profesional de la traducción en España [Unpublished doctoral dissertation]. Universidad Pablo de Olavide.
Pym, A. (2013). Translation skill-sets in a machine-translation age. Meta: Journal des traducteurs, 58(3), 487–503. https://doi.org/10.7202/1025047ar
Rico Pérez, C. (2017). La formació de traductors en Traducció Automàtica. Tradumàtica: Traducció i Tecnologies de la Informació i la Comunicació, 15, 75–96. https://doi.org/10.5565/rev/tradumatica.200
Rico Pérez, C. (2023). Caminos convergentes en traducción audiovisual, traducción automática y posedición. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 1–14). Comares.
Romero-Fresco, P., & Chaume, F. (2022). Creativity in audiovisual translation and media accessibility. The Journal of Specialised Translation, 78, 75–101.
Romero-Fresco, P., & Martínez, J. (2015). Accuracy rate in live subtitling: The NER Model. In J. Díaz Cintas & R. Baños Piñero (Eds.), Audiovisual translation in a global context: Mapping an ever-changing landscape (pp. 28–50). Palgrave Macmillan. https://doi.org/10.1057/9781137552891_3
Sánchez-Gijón, P. (2016). La posedición: Hacia una definición competencial del perfil y una descripción multidimensional del fenómeno. Sendebar, 27, 151–162.
Sánchez Ramos, M. M., & Rico Pérez, C. (2020). Traducción automática: Conceptos clave, procesos de evaluación y técnicas de posedición. Comares.
Sánchez Ramos, M. M., & Rico Pérez, C. (Eds.). (in press). La traducción automática en contextos especializados. Peter Lang.
Spiteri-Miggiani, G. (2023). Cloud studios and scripts: Evolving workspaces and workflows in dubbing. In C. Pena-Díaz (Ed.), The making of accessible audiovisual translation (pp. 145–175). Peter Lang.
Tam, D., Lakew, S. M., Virkar, Y., Mathur, P., & Federico, M. (2022). Isochrony-aware neural machine translation for automatic dubbing. Interspeech 2022. https://doi.org/10.21437/Interspeech.2022-11136
Taylor, S., Theobald, B. -J., & Matthews, I. (2015). A mouth full of words: Visually consistent acoustic redubbing. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4904–4908. https://doi.org/10.1109/ICASSP.2015.7178903
Toral, A. (2020). Reassessing claims of human parity and super-human performance in machine translation at WMT 2019. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, EAMT 2020, 185–194. https://doi.org/10.48550/arXiv.2005.05738
Villanueva Jordán, I. A., & Romero-Muñoz, A. (2023). Nociones metodológicas para el análisis comparativo de traducciones automáticas para el doblaje. In L. Mejías-Climent & J. de los Reyes Lozano (Eds.), La traducción audiovisual a través de la traducción automática y la posedición: Prácticas actuales y futuras (pp. 17–36). Comares.
Wang, Y. (2023). Artificial intelligence technologies in college English translation teaching. Journal of Psycholinguistic Research, 52(5), 1525–1544. https://doi.org/10.1007/s10936-023-09960-5
Wang, Y., Liang, W., Huang, H., Zhang, Y., Li, D., & Yu, L.-F. (2021). Toward automatic audio description generation for accessible videos. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21), article 277, 1–12. https://doi.org/10.1145/3411764.3445347
Wyndham, A. (2022, December 14). The art of dubbing: As demand rises, creativity remains key. Slator. https://shorturl.at/abiyH
Yang, Y., Shillingford, B., Assael, Y., Wang, M., Liu, W., Chen, Y., Zhang, Y., Sezener, E., Cobo, L. C., Denil, M., Aytar, Y., & de Freitas, N. (2020). Large-scale multilingual audio visual dubbing. arXiv. https://doi.org/10.48550/arXiv.2011.03530