RU2585974C2

RU2585974C2 - Method of providing communication between people speaking in different languages

Info

Publication number: RU2585974C2
Application number: RU2014103284/08A
Authority: RU
Inventors: Владимир Александрович Елин; Александр Алексеевич Кибкало
Original assignee: Владимир Александрович Елин; Александр Алексеевич Кибкало
Priority date: 2014-01-31
Filing date: 2014-01-31
Publication date: 2016-06-10
Also published as: RU2014103284A

Abstract

FIELD: information technology.

SUBSTANCE: the invention relates to electronics, in particular to means of receiving and transmitting the speech of subscribers who speak different languages. The mobile terminal uses language models tuned to a given topic (e.g., tourism, business, everyday life). This makes it possible to achieve high quality on devices with low computational power (for example, smartphones). For communication between people speaking languages A and B, an information processing pipeline is set up, comprising the steps of: inputting speech in language A, converting the speech to text in language A, translating the text into language B, synthesizing speech in language B, and outputting speech in language B. The system also makes it possible to train acoustic and language models targeted at a specific user and/or conversation topic and to save them in a cloud service. Flexibility is provided by the ability to place different stages of the information processing pipeline on the terminals of the conversation participants, and by the ability to download the models used from the cloud service.

EFFECT: providing protection of transmitted and received speech from unauthorised access, high accuracy of transmitting speech, high reliability of receiving and transmitting speech.

12 cl, 12 dwg

Description

The invention relates to the field of electronics, in particular to technology for enabling communication between people who speak different languages, and in particular, but not exclusively, to systems that convert an input speech audio signal in an input language into an output speech audio signal or text in an output language.

Hereinafter, the input speech audio signal is said to be in input language A of conversation participant A, and the output speech audio signal in output language B of conversation participant B. The terminal devices used by the participants are the first (transmitting) terminal A and the second (receiving) terminal B, which use languages A and B, respectively.

A method is known for communication between interlocutors speaking different languages, which includes a translation center and a mobile communication terminal through which the interlocutors transmit information to each other via the translation center; the mobile radio terminal contains a loudspeaker element (RU No. 31288).

The disadvantage of this device is the complexity of the communication system, which requires going through an intermediate translation center; this limits its field of application and results in low reliability.

A method is known for communication between interlocutors speaking different languages and for simultaneous translation of speech from one language into another. In this method, an incoming analog audio signal in speech form in the input language is fed into a memory device connected to the analog-to-digital converter (ADC) of the 1st communication terminal of an electronic system (the terminal functioning here as the transmitting one). The ADC then performs analog-to-digital processing and converts the speech form of the analog audio signal into a coded text form in the 1st (input) language. As coded information signals (in text form) are exchanged between the 1st communication terminal and the second (receiving) terminal, the coded text form in the 1st (input) language of the transmitting 1st terminal is electronically translated into a coded text form in the 3rd (output) language of the 2nd (receiving) terminal. At the final stage of transmission and reception, the coded text in the 3rd (output) language of the 2nd terminal is converted by a speech synthesizer into an analog audio signal in speech form in that language, and the speech is output through the loudspeaker of the 2nd communication terminal, which functions here as the receiving one (RU No. 2419142).

The drawbacks of the prototype are the complexity of the hardware and of the audio signal conversion, in particular the need to translate the coded text form of the input audio signal into an intermediate language, and the duration of translation, which in practice does not allow true simultaneous translation.

A method is known for enabling communication between people who speak different languages, together with an electronic transceiver system that performs simultaneous translation of spoken language from one language into another. The system includes at least one transceiver communication terminal for an incoming analog audio signal in speech form, which contains: a non-volatile storage device, preferably flash memory; means for inputting the speech form of the analog audio signal in the input language into the communication terminal functioning as the transmitting one; means for analog-to-digital conversion and processing of the speech form of the analog audio signal into a coded text form in the input language; means for translating the coded text in the input language into coded text in the output language; means for digital-to-analog conversion and processing of the coded text in the output language into an analog audio signal in speech form in the same language; and means for outputting the speech form of the analog audio signal in the output language through the communication terminal functioning as the receiving one (RU No. 2070734, prototype).

The disadvantages of this prior art solution include relatively poor functional and technical characteristics, caused by significant errors in converting the speech form of the audio signal entered into the transmitting terminal into coded text form.

In general, speech recognition and machine translation systems of fairly high quality exist today. Such systems require significant computing resources, so they rely on distributed processing on Internet servers or in cloud infrastructure. This approach has advantages (the ability to deliver high quality) as well as disadvantages (it works only with Internet access and is fully controlled by the service provider). The systems of Google and Apple (Siri) provide high-quality speech recognition when using a large vocabulary and networked cloud resources. Local versions of these systems provide good quality only for relatively small tasks, such as voice entry of phone book records.

Local speech recognition and machine translation systems are limited by the computing resources of the terminals (computers or smartphones). The main reasons are the large size of the vocabularies (up to several hundred thousand words) and language models, and the complexity of such systems. However, when the topic of conversation is narrowed, the complexity of speech recognition and translation drops considerably. A vocabulary of 10-20 thousand words is enough for good coverage of text on a given topic (for example, business, travel, sports). Modern speech recognition and machine translation systems can solve such a task several times faster than real time (the time it takes to utter the phrase).
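The two quantities mentioned above, vocabulary coverage of topic text and speed relative to real time, can be made concrete with a minimal sketch. The function names and the toy corpus are illustrative assumptions and do not come from the patent.

```python
from collections import Counter


def top_k_vocabulary(tokens, k):
    """Pick the k most frequent words of a topic corpus as its vocabulary."""
    return {word for word, _ in Counter(tokens).most_common(k)}


def vocabulary_coverage(tokens, vocabulary):
    """Fraction of running words in `tokens` that are present in `vocabulary`."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by utterance duration.

    A value below 1.0 means the system runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Toy "travel" corpus (hypothetical data, for illustration only).
corpus = "book a room book a table".split()
vocab = top_k_vocabulary(corpus, 2)
coverage = vocabulary_coverage("book a flight".split(), vocab)
rtf = real_time_factor(processing_seconds=0.5, audio_seconds=2.0)
```

With a real topic corpus, `k` would be on the order of the 10-20 thousand words mentioned in the text, and coverage would be measured on held-out text for the same topic.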

The technical problem addressed by the utility model is to create an effective method of enabling communication between people who speak different languages and to expand the range of such methods.

The technical result of the invention is high translation accuracy and flexible use of mobile terminals without recourse to network resources, together with an expanded set of capabilities provided by the claimed invention, in particular: high-quality speech recognition and machine translation on the terminal without access to network resources, through the use of topic-specific language models; the ability to use a variety of terminals, including low-power ones, to organize communication between people who speak different languages, through flexible distribution of the stages of the information processing pipeline between terminals; and the ability to manage the set of acoustic and language models in use, including downloading models from a cloud service, adapting models to a participant's speech and to the topic of conversation, and saving models in the cloud service.

To achieve the technical result, the invention proposes using a set of relatively small topic-specific language models for the speech recognition and machine translation systems running on the terminals of the conversation participants. The average size of such a model is 5 to 20 megabytes, which allows a set of models to be kept in the terminal's persistent memory. Participants can choose which model to use depending on the topic of conversation.
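Keeping several small topic models on the terminal and choosing one per conversation could be organized roughly as follows. This is a sketch under stated assumptions: the `TopicModel` and `ModelStore` names, the language codes and the sizes are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicModel:
    language: str   # e.g. "ru" or "en" (illustrative codes)
    topic: str      # e.g. "travel", "business"
    size_mb: float  # on-device size of the model


class ModelStore:
    """Hypothetical registry of topic models kept in the terminal's storage."""

    def __init__(self):
        self._models = {}

    def install(self, model):
        # One model per (language, topic) pair; reinstalling replaces it.
        self._models[(model.language, model.topic)] = model

    def select(self, language, topic):
        try:
            return self._models[(language, topic)]
        except KeyError:
            raise LookupError(f"no model installed for {language}/{topic}")

    def total_size_mb(self):
        return sum(model.size_mb for model in self._models.values())


store = ModelStore()
store.install(TopicModel("ru", "travel", 12.0))
store.install(TopicModel("ru", "business", 18.0))
chosen = store.select("ru", "travel")
```

At 5 to 20 megabytes per model, a store like this stays small even with a dozen language and topic combinations installed.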

The invention also provides for the use of additional services, in particular a service for distributing and storing models in the cloud and a service for training acoustic and language models. Users of the proposed system can access the acoustic and language models stored in the cloud service and install them on their terminal. Models for several languages and/or topics can be installed on a terminal. The speech signal of a conversation participant can be used to train an acoustic model adapted to that participant's voice. The participant's speech, in text form, can be used to train topic-specific language models. Trained models are saved in the local storage device or in the cloud model storage service.

The essence of the invention is a method of enabling communication between people who speak different languages, in which an input analog speech audio signal in input language A is fed into the first terminal A; the speech signal is converted into digital form; the digital speech signal in language A is converted into text in language A by a speech recognition software tool that uses only the local resources of terminal A, including a topic-specific language model located in the storage device of terminal A; the text in language A is converted into text in output language B by a machine translation software tool that uses only the local resources of terminal A, including a topic-specific language model located in the storage device of terminal A; and the text in language B is transmitted over a communication channel to the second terminal B. The text of the reply message in language B is received by terminal A over the communication channel and passed to a speech synthesis software tool for language B, which produces a digital audio signal in language B; the speech output means converts this into an analog audio signal, which terminal A plays back. Terminal B operates in the same way, with languages A and B interchanged.
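The flow of data through one terminal can be sketched as follows. This is only an illustration of the stage order (recognition, then translation, then transmission of text; received text passed to synthesis). The `Terminal` class, the dictionary-based translator and the fake recognizer and synthesizer are hypothetical stand-ins, not the patent's software tools.

```python
from typing import Callable, Dict


class Terminal:
    """Runs the local pipeline stages; the channel itself is left out."""

    def __init__(self,
                 recognize: Callable[[bytes], str],
                 translate: Callable[[str], str],
                 synthesize: Callable[[str], bytes]):
        self.recognize = recognize
        self.translate = translate
        self.synthesize = synthesize

    def outgoing(self, audio: bytes) -> str:
        """Speech in the user's language -> text to send over the channel."""
        recognized_text = self.recognize(audio)
        return self.translate(recognized_text)

    def incoming(self, text: str) -> bytes:
        """Text received from the channel -> audio to play back."""
        return self.synthesize(text)


def dictionary_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Toy word-by-word translator; unknown words pass through unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def fake_recognize(audio: bytes) -> str:
    # Stand-in for speech recognition: pretend the audio bytes are the text.
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    # Stand-in for speech synthesis: pretend the text bytes are the audio.
    return text.encode("utf-8")


terminal_a = Terminal(fake_recognize,
                      dictionary_translator({"hello": "privet"}),
                      fake_synthesize)
sent = terminal_a.outgoing(b"hello friend")
played = terminal_a.incoming("spasibo")
```

Because each stage is an independent callable, a stage can be swapped or moved to the other terminal without changing the rest of the flow.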

In this case, the speech recognition means and/or the machine translation means and/or the speech synthesis means have a hardware or a combined hardware-software implementation.

In particular embodiments, the speech synthesis tool for language B runs on terminal A, the speech synthesis tool for language A runs on terminal B, and a speech signal or an encoded speech signal is transmitted over the communication channel. Terminal B operates in the same way, with languages A and B interchanged.

In this case, the speech recognition means and/or the machine translation means and/or the speech synthesis means have a hardware or a combined hardware-software implementation.

In addition, the machine translation tool from language B to language A and the machine translation tool from language A to language B both run on terminal A.

In this case, the speech recognition means and/or the machine translation means and/or the speech synthesis means have a hardware or a combined hardware-software implementation.

In particular embodiments, the speech synthesis tool for language B and the machine translation tool from language B to language A run on terminal A, and a device with low computing power (a landline or low-power telephone) is used as terminal B.

In this case, the speech recognition means and/or the machine translation means and/or the speech synthesis means have a hardware or a combined hardware-software implementation.

In particular embodiments, the communication terminals are mobile electronic devices connected via Bluetooth 2.0/4.0 that do not use cellular communication to set up the conversation.

In particular embodiments, the communication terminals are mobile electronic devices connected via the NFC protocol that do not use cellular communication to set up the conversation.

In particular embodiments, one and the same communication terminal functions as both the first and the second terminal.

Preferably, the method provides for training acoustic and language models targeted at a specific user and/or conversation topic, saving the trained models in a cloud service, and downloading them from the cloud service.

Preferably, the method provides for downloading from the cloud service acoustic and language models targeted at different languages and conversation topics.

Fig. 1 shows a general diagram of the terminal hardware; Fig. 2, a general diagram of the terminal software; Fig. 3, a flowchart of the communication process when participant A speaks and participant B listens; Fig. 4, a flowchart of the process in which translation into an intermediate, more widely used language V is employed; Figs. 5-6, block diagrams of terminals A and B with a symmetric distribution of the information processing pipeline stages and transmission of text between the terminals; Figs. 7-8, block diagrams of terminals A and B with a symmetric distribution of the pipeline stages and transmission of an encoded speech signal (for example, per the GSM standard) between the terminals; Figs. 9-10, block diagrams of terminals A and B where terminal A has more computing power than terminal B and text is transmitted between the terminals; Fig. 11, a block diagram of terminal A where terminal B is a landline telephone or a low-power cell phone and an encoded speech signal (for example, per the GSM standard) is transmitted between the terminals; Fig. 12, a block diagram of terminal A where terminal B is a pager or another device capable of text input/output.
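The configurations listed for the figures differ in which stages run on terminal A and in what travels over the channel (text or encoded speech). One way to picture that choice for the A-to-B direction is the sketch below. The capability flags, stage names and decision rule are illustrative assumptions, not the allocation logic of the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Payload(Enum):
    TEXT = "text"
    CODED_SPEECH = "coded speech"


@dataclass(frozen=True)
class PeerCapabilities:
    accepts_text: bool    # peer can receive text over the channel
    can_synthesize: bool  # peer can run its own speech synthesis


def plan_a_to_b(peer):
    """Return (stages on terminal A, payload on the channel, stages on terminal B)."""
    if peer.accepts_text:
        # Text goes over the channel; the peer either speaks it or displays it.
        remote = ["synthesize"] if peer.can_synthesize else ["display"]
        return ["recognize", "translate"], Payload.TEXT, remote
    # A plain telephone: terminal A does everything and sends encoded speech.
    return ["recognize", "translate", "synthesize"], Payload.CODED_SPEECH, ["play"]


smartphone = PeerCapabilities(accepts_text=True, can_synthesize=True)
pager = PeerCapabilities(accepts_text=True, can_synthesize=False)
landline = PeerCapabilities(accepts_text=False, can_synthesize=False)

plans = {name: plan_a_to_b(peer)
         for name, peer in [("smartphone", smartphone),
                            ("pager", pager),
                            ("landline", landline)]}
```

Under these assumptions, the smartphone case resembles the text-transmitting configurations, the landline case resembles Fig. 11, and the pager case resembles Fig. 12.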

Structurally, the terminal (a cellular mobile phone, smartphone or personal computer) comprises a housing 9 (shown schematically) containing a processor 1. Connected to the processor 1 are a storage device 2, a monitor 3, a microphone 4, a speaker 5, a keyboard 6, a power supply 7 and a transceiver 8 (a GSM/GPRS/Wi-Fi transceiver, network card, hybrid Bluetooth 2.0/4.0, NFC, etc.). A general diagram of the terminal hardware is shown in Fig. 1.

The terminal software includes the following components:

a control system 10, speech input means 11, speech output means 12, speech encoding means 13, speech recognition means 14, speech synthesis means 15, machine translation means 16, acoustic model training means 17, language model training means 18, and means 19 for interacting with a cloud service 20. The acoustic and language models used by the speech processing systems are stored in the storage device 2. The following are designated in Fig. 2:

21 - first (transmitting) terminal A;

22 - processor (on terminal A);

23 - storage device (on terminal A);

24 - language models stored in the storage device (on terminal A);

25 - microphone (on terminal A);

26 - speech input means (on terminal A);

27 - speech recognition means for input language A (on terminal A);

28 - means for translating text in language A into text in language B (on terminal A);

29 - speech synthesis means for language A (on terminal A);

30 - speech output means (on terminal A);

31 - speaker (on terminal A);

32 - information transmission channel between terminals A and B;

33 - second (receiving) terminal B;

34 - processor (on terminal B);

35 - storage device (on terminal B);

36 - language models stored in the storage device (on terminal B);

37 - microphone (on terminal B);

38 - speech input means (on terminal B);

39 - speech recognition means for language B (on terminal B);

40 - means for translating text in language B into text in language A (on terminal B);

41 - speech synthesis means for language B (on terminal B);

42 - speech output means (on terminal B);

43 - speaker (on terminal B);

44 - means for translating text in language B into text in language A (on terminal A);

45 - speech synthesis means for language A (on terminal A);

46 - speech synthesis means for language B (on terminal A);

47 - speech encoding means (on terminal A);

48 - speech encoding means (on terminal B);

49 - speech recognition means for language B (on terminal A).

In Figs. 5-12, some essential units and means of communication (such as the screen, keyboard, power supply, etc.) are not shown, since they are well known, their functional association with the terminals in question is indisputable, and they are not principal objects of the claimed inventions.

The control system 10 coordinates the operation of all software components of the terminal and controls the terminal's operating modes. The speech input means 11 is responsible for inputting the speech signal from the microphone. The speech output means 12 is responsible for outputting the speech signal to a speaker or other sound reproduction device. The speech encoding means 13 converts the speech signal into a set of code vectors for speech recognition and for transmission to another terminal. The speech recognition means 14 converts the acoustic speech signal into text. The speech synthesis means 15 converts text into an acoustic speech signal. The machine translation means 16 converts text in one language into text in another language. The acoustic model training means 17 accumulates information about the speech signal of the terminal user and adapts the acoustic models to the specific features of that user's speech. The language model training means 18 accumulates information about the phrases spoken by the terminal user and adapts the language models to the specific features of that user's speech on a given topic. The means 19 for interacting with the cloud service 20 is responsible for downloading acoustic and language models from the cloud and saving them to the cloud.

The process of communication between people speaking different languages naturally breaks down into several stages. Fig. 3 illustrates the process when participant A speaks and participant B listens:

a) input of speech in language A;

b) conversion of speech in language A into text in language A (speech recognition);

c) translation of text in language A into text in language B (machine translation);

d) conversion of text in language B into speech in language B (speech synthesis);

e) output of speech in language B.

Stages a-e form an information processing pipeline for communication between people speaking different languages. Stages a and e can, in principle, be performed on a terminal with no computing power, such as a landline telephone or a low-power cell phone. Stages b-d require substantial computing power; the terminal that performs them may be, for example, a smartphone or a laptop. Since stage a is performed on terminal A and stage e on terminal B, the information must be transmitted from terminal A to terminal B at some point between these stages.
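The five stages can be sketched as a chain of functions. The following is only an illustrative Python sketch: every function name is hypothetical, and each stage merely tags its input so that the data flow is visible, standing in for a real recognition, translation, or synthesis engine.

```python
from typing import Callable, List

# Hypothetical stand-ins for the real engines (not named in the patent).
# Each stage only wraps its input in a tag to make the data flow visible.
def input_speech(utterance: str) -> str:      # stage a
    return f"audio[A]({utterance})"

def recognize_speech(audio: str) -> str:      # stage b
    return f"text[A]({audio})"

def translate_text(text: str) -> str:         # stage c
    return f"text[B]({text})"

def synthesize_speech(text: str) -> str:      # stage d
    return f"audio[B]({text})"

def output_speech(audio: str) -> str:         # stage e
    return f"played({audio})"

PIPELINE: List[Callable[[str], str]] = [
    input_speech, recognize_speech, translate_text,
    synthesize_speech, output_speech,
]

def run(stages: List[Callable[[str], str]], data: str) -> str:
    """Pass the data through the given stages in order."""
    for stage in stages:
        data = stage(data)
    return data

# Whole pipeline on one machine, versus stages a-c on terminal A and
# stages d-e on terminal B: the result is the same.
whole = run(PIPELINE, "hello")
sent_over_channel = run(PIPELINE[:3], "hello")   # computed on terminal A
split = run(PIPELINE[3:], sent_over_channel)     # computed on terminal B
```

Because each stage consumes only the output of the previous one, the chain can be cut at any point between stage a and stage e, which is what makes the distribution between terminals possible.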

The present invention proposes to let the participants in the conversation distribute the pipeline stages between terminals A and B. For example, if terminals A and B are of the same type, stages a-c can be performed on terminal A and stages d-e on terminal B. If terminal B is a landline telephone, stages a-d are performed on terminal A and stage e on terminal B. The distribution of work between terminals A and B can be set automatically, semi-automatically, or manually, depending on the computing power of terminals A and B, the availability of language models on these terminals, and/or the choice of the participants in the conversation.

The type of information transmitted between terminals A and B is determined by the distribution of pipeline stages between the terminals. For example, if stages a-c are performed on terminal A, text in language B is transmitted. If stages a-d are performed on terminal A, a speech signal (an encoded speech signal) is transmitted. Transmitting text is preferable, though not mandatory, because the data volume is smaller and the transmitted data are easier to protect.
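The relation between the split point and the transmitted payload can be written down as a small lookup. This is a hedged Python sketch: the function and table names are hypothetical, and the entries for splits after stage a and after stage b are inferred from claims 3 and 4 rather than from the paragraph above.

```python
from typing import List, Tuple

STAGES = ["a", "b", "c", "d", "e"]

# What leaves the sending terminal after its last local stage.
# "c" and "d" follow the description; "a" and "b" are inferred from claims 3 and 4.
PAYLOAD_AFTER = {
    "a": "speech signal in language A",
    "b": "text in language A",
    "c": "text in language B",
    "d": "speech signal in language B",
}

def split_pipeline(last_stage_on_a: str) -> Tuple[List[str], List[str], str]:
    """Return (stages on terminal A, stages on terminal B, payload type)."""
    if last_stage_on_a not in PAYLOAD_AFTER:
        raise ValueError("the split must fall after one of stages a-d")
    cut = STAGES.index(last_stage_on_a) + 1
    return STAGES[:cut], STAGES[cut:], PAYLOAD_AFTER[last_stage_on_a]

same_type_terminals = split_pipeline("c")   # text in language B is sent
landline_receiver = split_pipeline("d")     # a speech signal is sent
```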

The information processing pipeline described above is not the only possible one. For example, if no means of translating from language A to language B is available, translation via an intermediate, more widely used language C can be used (see Fig. 4). The implementation of the method is illustrated by the following drawings:

Figs. 5-6 - block diagrams of terminals A and B with a symmetric distribution of the stages of the information processing pipeline and transmission of text information between the terminals.

Figs. 7-8 - block diagrams of terminals A and B with a symmetric distribution of the stages of the information processing pipeline and transmission of an encoded speech signal (for example, according to the GSM standard) between the terminals.

Figs. 9-10 - block diagrams of terminals A and B for the case where terminal A has greater computing power than terminal B, with transmission of text information between the terminals.

Fig. 11 - block diagram of terminal A for the case where terminal B is a landline telephone or a low-power cell phone, with transmission of an encoded speech signal (for example, according to the GSM standard) between the terminals.

Fig. 12 - block diagram of terminal A for the case where terminal B is a pager or another device capable of text input/output.

According to the claimed method, an input analog speech signal in language A is fed into terminal A 21 (Figs. 5-6) through the microphone 25. The analog speech signal is then converted into a digital speech signal in language A by an ADC that is part of the speech input means 26. The digital speech signal is fed to the input of the speech recognition means 27, which uses one of the language models 24 stored in the storage device 23. The speech recognition means 27 converts the speech signal in language A into text form (text in language A) and passes it to the means 28 for machine translation into language B. The machine translation means 28 uses one of the language models 24 stored in the storage device 23, converts the text in language A into text in language B, and transmits it over the communication channel 32 to terminal B 33. The response from terminal B 33 arrives over the data transmission channel 32 as text in language A and is fed to the input of the speech synthesis means 29 for language A. The speech synthesis means 29 converts the text in language A into a digital speech signal in language A and passes it to the speech output means 30. The speech output means 30 converts the digital speech signal into an analog speech signal by means of its built-in DAC and outputs it through the speaker 31. The speech recognition means 27, the machine translation means 28, and the speech synthesis means 29 for language A, which operate on terminal A 21, use the processor 22 and may have a software, hardware-software, or hardware implementation.

The scheme described above illustrates the embodiment of the invention shown in Figs. 5-6 and is neither exhaustive nor the only possible one.

The features of the claimed method are as follows.

When simultaneous interpretation of spoken language is organized, only the local computing resources of the terminals are used; network resources (on the Internet, in the cloud, etc.) are not used.

To convert the input speech audio signal into text in language A, a local speech recognition system is used with a language model oriented to a specific domain (for example, travel, business, everyday life, etc.), which improves recognition accuracy and reduces the requirements on the terminal's computing power.

To translate text in language A into text in language B or C, a local machine translation system is used with a language model oriented to a specific domain (for example, travel, business, everyday life, etc.), which improves recognition accuracy and reduces the requirements on the terminal's computing power.

When simultaneous interpretation of spoken language is organized, a pipeline is formed for transferring information from one participant in the conversation to the other.

The pipeline for transferring information from participant A to participant B includes the following stages: input of speech in language A, conversion of speech into text in language A, translation of the text from language A into language B, speech synthesis in language B, and output of speech in language B.

If no machine translation system from language A to language B is available, the pipeline for transferring information from participant A to participant B includes the following stages: input of speech in language A, conversion of speech into text in language A, translation of the text from language A into language C, translation of the text from language C into language B, speech synthesis in language B, and output of speech in language B.

The pipeline for transferring information from participant B to participant A is formed in the same way (language A is replaced by language B and vice versa).
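The choice between direct translation and translation through the intermediate language C can be sketched as follows. This is an assumption-laden Python illustration: the function name and the representation of installed translators as a set of (source, target) pairs are hypothetical and not taken from the patent.

```python
from typing import List, Optional, Set, Tuple

LanguagePair = Tuple[str, str]

def translation_route(
    src: str,
    dst: str,
    available: Set[LanguagePair],
    pivot: str,
) -> Optional[List[LanguagePair]]:
    """Return the translation steps from src to dst, or None if there is no route.

    A direct translator is preferred; otherwise the route goes through
    the intermediate (pivot) language, if both halves are available.
    """
    if (src, dst) in available:
        return [(src, dst)]
    if (src, pivot) in available and (pivot, dst) in available:
        return [(src, pivot), (pivot, dst)]
    return None

# Only A->C and C->B translators are installed, so the route goes via C.
installed = {("A", "C"), ("C", "B")}
forward = translation_route("A", "B", installed, pivot="C")
# The reverse direction needs its own translators (B->C and C->A here).
reverse = translation_route("B", "A", installed, pivot="C")
```

The reverse pipeline is formed by swapping the language roles, so in this sketch it only succeeds if the corresponding reverse translators are also present.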

Various ways of distributing the stages of the information processing pipeline between terminals A and B are permitted. After the last pipeline stage assigned to terminal A (B) has completed, information (an audio signal, an encoded audio signal, or text) is transmitted to terminal B (A), on which the remaining stages of the information processing pipeline are performed.

When a connection is established, terminals A and B distribute the stages of the information processing pipeline according to the computing capabilities of the terminals and the presets specified by participants A and B.
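One possible decision rule for this distribution is sketched below in Python. The patent does not specify the rule; the function name, the boolean capability flags, and the preference order are all assumptions made for illustration, chosen to match the examples given in the description (stages a-c on the sender for terminals of the same type, stages a-d when the receiver is a landline telephone).

```python
from typing import Optional

def choose_last_stage_on_sender(
    sender_can_compute: bool,
    receiver_can_compute: bool,
    preset: Optional[str] = None,
) -> str:
    """Pick the last pipeline stage (a-d) that runs on the sending terminal.

    "can compute" means the terminal has enough power and the models
    needed for stages b-d. A preset set by the participants wins.
    """
    if preset is not None:
        if preset not in ("a", "b", "c", "d"):
            raise ValueError("preset must be one of stages a-d")
        return preset
    if sender_can_compute and receiver_can_compute:
        return "c"   # symmetric case: text in language B is transmitted
    if sender_can_compute:
        return "d"   # receiver is e.g. a landline phone: speech is transmitted
    if receiver_can_compute:
        return "a"   # sender only captures speech; receiver does the rest
    raise ValueError("neither terminal can run stages b-d")

smartphone_to_smartphone = choose_last_stage_on_sender(True, True)
smartphone_to_landline = choose_last_stage_on_sender(True, False)
landline_to_smartphone = choose_last_stage_on_sender(False, True)
manual_choice = choose_last_stage_on_sender(True, True, preset="b")
```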

Devices connected via a mobile communication system (for example, of the GSM or CDMA standard) or the Internet can be used as terminals A and B. Devices connected via the Bluetooth or NFC protocols and not using cellular communication can also be used as terminals A and B (among other things, to ensure confidentiality).

The proposed system makes it possible to adapt the acoustic and language models in use to a specific user and/or conversation topic. The trained models can be saved in a cloud service and subsequently used on this or another terminal of the same user. In addition, the cloud service is used to distribute acoustic and language models for other languages or conversation topics.
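The handling of models described above can be sketched as a local cache in front of a cloud store. This Python sketch is purely illustrative: the class name, the key layout (language, topic, user), and the use of an in-memory dictionary in place of a real cloud service are assumptions, not details from the patent.

```python
from typing import Dict, Optional, Tuple

# (language, topic, user); user is None for a generic, non-adapted model.
ModelKey = Tuple[str, str, Optional[str]]

class ModelStore:
    """Local model cache of one terminal, backed by a shared cloud store."""

    def __init__(self, cloud: Dict[ModelKey, bytes]) -> None:
        self.cloud = cloud                      # stand-in for the cloud service
        self.local: Dict[ModelKey, bytes] = {}  # models held on the terminal

    def load(self, language: str, topic: str, user: Optional[str] = None) -> bytes:
        """Prefer a user-adapted model, then fall back to the generic one."""
        for key in ((language, topic, user), (language, topic, None)):
            if key in self.local:
                return self.local[key]
            if key in self.cloud:
                self.local[key] = self.cloud[key]   # download once, keep locally
                return self.local[key]
        raise KeyError(f"no model for {language}/{topic}")

    def save_adapted(self, language: str, topic: str, user: str, model: bytes) -> None:
        """Keep the adapted model locally and store a copy in the cloud."""
        key = (language, topic, user)
        self.local[key] = model
        self.cloud[key] = model

cloud = {("ru", "tourism", None): b"generic"}
first_terminal = ModelStore(cloud)
before_adaptation = first_terminal.load("ru", "tourism", "alice")
first_terminal.save_adapted("ru", "tourism", "alice", b"adapted")
# Another terminal of the same user picks up the adapted model from the cloud.
second_terminal = ModelStore(cloud)
after_adaptation = second_terminal.load("ru", "tourism", "alice")
```

Once a model is in the local cache, recognition and translation need no network access, which is consistent with the requirement that the pipeline itself uses only local resources.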

One particular application of the technical implementation of the claimed method is an electronic transceiver system with the function of simultaneous interpretation of spoken language from language A into language B between two terminals A and B, with a symmetric distribution of the stages of the information processing pipeline and transmission of text information between the terminals (Figs. 5-6).

Another particular application of the technical implementation of the claimed method is an electronic transceiver system with the function of simultaneous interpretation of spoken language from language A into language B between two terminals A and B, with a symmetric distribution of the stages of the information processing pipeline and transmission of an encoded speech signal (for example, according to the GSM standard) between the terminals (Figs. 7-8).

Another particular application of the technical implementation of the claimed method is an electronic transceiver system with the function of simultaneous interpretation of spoken language from language A into language B between two terminals A and B, for the case where terminal A has greater computing power than terminal B, with transmission of text information between the terminals (Figs. 9-10).

Another particular application of the technical implementation of the claimed method is an electronic transceiver system with the function of simultaneous interpretation of spoken language from language A into language B between two terminals A and B, for the case where terminal B is a landline telephone or a low-power cell phone (Fig. 11).

Another particular application of the technical implementation of the claimed method is an electronic transceiver system with the function of simultaneous interpretation of spoken language from language A into language B between two terminals A and B, for the case where terminal B is a pager or another device capable of text input/output (Fig. 12).

Implementation of the present invention provides:

- the possibility of high-quality speech recognition and machine translation on the terminal without access to network resources, through the use of thematic language models;

- the possibility of using a variety of terminals, including low-power ones, to organize communication between people speaking different languages, through flexible distribution of the stages of the information processing pipeline between the terminals;

- the possibility of managing the set of acoustic and language models in use, including downloading models from a cloud service, adapting models to the speech of a participant and to the topic of the conversation, and saving models in the cloud service.

Claims

1. A method for transmitting and receiving the speech of subscribers speaking different languages, according to which a communication channel is established between two communication terminals and, using the local computing resources of the terminals, in which acoustic and language thematic models are provided, an audio signal processing pipeline isolated from network resources is formed, in which one of the terminals receives an input analog speech audio signal in the input language; using the local computing resources of the terminals, the speech is recognized and converted into coded text in the input language; then, using the acoustic and language thematic models, this text is machine-translated into coded text in the output language; and the translated coded text is then converted to synthesize an analog speech audio signal in the output language, which is output by the other terminal.

2. The method according to claim 1, characterized in that one of the terminals receives the input analog speech audio signal in the input language, and the same terminal, using its local computing resources, in which acoustic and language thematic models are provided, performs speech recognition and conversion into coded text in the input language and machine translation of this text into coded text in the output language, followed by conversion of the translated text to synthesize an analog speech audio signal in the output language, which is transmitted over the communication channel to the other terminal for output.

3. The method according to claim 1, characterized in that one of the terminals receives the input analog speech audio signal in the input language, which is transmitted over the communication channel to the other terminal, which, using its local computing resources, in which acoustic and language thematic models are provided, performs speech recognition and conversion into coded text in the input language and machine translation of this text into coded text in the output language, followed by conversion of the translated text to synthesize an analog speech audio signal in the output language, which is output by the same terminal.

4. The method according to claim 1, characterized in that one of the terminals receives the input analog speech audio signal in the input language and, using its local computing resources, in which acoustic and language thematic models are provided, performs speech recognition and conversion into coded text in the input language, which is transmitted over the communication channel to the other terminal, which, using its local computing resources, in which acoustic and language thematic models are provided, performs machine translation of this text into coded text in the output language, followed by conversion of the translated text to synthesize an analog speech audio signal in the output language, which is output by that other terminal.

5. The method according to any one of claims 1-4, characterized in that the analog speech audio signal of a response message in the input or output language is received by the corresponding terminal, after which an information processing pipeline is formed to synthesize an analog speech audio signal in the output or input language, respectively, which is output by the other terminal.

6. The method according to claim 5, characterized in that the reverse input and output speech signals are transmitted using two terminals connected by a communication channel and configured to form an information processing pipeline for converting the input and output speech signals both in the direction from one terminal to the other and in the opposite direction.

7. The method according to either of claims 2 and 3, characterized in that a device from the following group is used as one of the terminals: a landline telephone, a mobile phone, using a wired or cellular connection to organize the conversation.

8. The method according to any one of claims 1-4, 6, characterized in that at least one of the speech recognition, machine translation, and speech synthesis means has a hardware or hardware-software implementation.

9. The method according to any one of claims 1-4, 6, characterized in that mobile electronic devices connected by a wireless communication channel using a protocol from the group Bluetooth, NFC are used as the communication terminals.

10. The method according to any one of claims 1-4, 6, characterized in that the communication channel for communication in different languages is established with the formation of two communication terminals by means of the same reversible terminal device.

11. The method according to any one of claims 1-4, 6, characterized in that thematic acoustic and language models are formed and stored in the permanent memory of at least one terminal or in a cloud service, with the possibility of downloading them to the communication terminals when necessary.

12. The method according to claim 11, characterized in that at least one terminal uses thematic acoustic and language models in various languages.