CN113035239A - Chinese-English bilingual cross-language emotion voice synthesis device - Google Patents

Info

Publication number
CN113035239A
CN113035239A (application CN201911253410.6A)
Authority
CN
China
Prior art keywords
device body
chinese
english
voice synthesis
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911253410.6A
Other languages
Chinese (zh)
Inventor
吴沛文
李曜
吴云清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aviation Electric Co Ltd
Original Assignee
Shanghai Aviation Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aviation Electric Co Ltd filed Critical Shanghai Aviation Electric Co Ltd
Priority to CN201911253410.6A
Publication of CN113035239A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027: Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese-English bilingual cross-language emotion voice synthesis device. It comprises a device body with an armband connected between its upper and lower surfaces; an LCD (liquid crystal display) on the upper front surface and a text input unit on the lower front surface; a switch on the right of the front surface between the LCD display and the text input unit; a loudspeaker on the right of the upper surface; and a wireless transceiver on the upper right side. A voice synthesis FPGA inside the device body is connected respectively to an SD (secure digital) card, the switch, the loudspeaker, a voice recognition chip, and a single chip microcomputer. The beneficial effect of the invention is that it can synthesize Chinese or English speech carrying the speaker's style and emotion on behalf of people who speak only one of the two languages, so that Chinese and English speakers of different languages can communicate naturally and smoothly.

Description

Chinese-English bilingual cross-language emotion voice synthesis device
Technical Field
The invention relates to the technical field of voice synthesis, in particular to a Chinese-English bilingual cross-language emotion voice synthesis device.
Background
Speech is one of the most natural modes of communication between people and also one of the most natural modes of human-machine interaction. With the rapid development of artificial intelligence, intelligent electronic products are increasingly integrated into daily life, and there is growing demand for speech synthesis technology, particularly emotion speech synthesis that can convey a speaker's style and emotion. We live in a multi-ethnic, multi-language world in which people who speak different languages cannot communicate conveniently, and a cross-language emotion voice synthesis device is urgently needed to bridge the gap between languages. In addition, people who are mute or whose speech is unclear urgently need such a device to speak on their behalf and so solve their communication problem.
Disclosure of Invention
The object of the invention is to provide a Chinese-English bilingual cross-language emotion voice synthesis device that helps people who speak only one of Chinese and English communicate naturally across the two languages, and that also helps mute people and others with speech disabilities communicate naturally with other people by means of the system.
To achieve this purpose, the technical scheme of the invention is as follows. A Chinese-English bilingual cross-language emotion voice synthesis device comprises a device body of cuboid shape. An armband is connected between the upper and lower surfaces of the device body so that it can be fixed on an arm. An LCD display is arranged on the upper part of the front surface of the device body, and a text input device on the lower part. A switch with three keys is arranged on the right side of the front surface between the LCD display and the text input device: the left key controls Chinese emotion voice synthesis, the middle key is the power switch key, and the right key controls English emotion voice synthesis. A loudspeaker is arranged on the right side of the upper surface. A wireless transceiver is arranged on the upper part of the right surface for wireless connection to devices such as mobile phones and vehicle-mounted computers, extending the device to further cross-language emotion voice synthesis applications. An earphone hole is formed in the right side of the device body, with a microphone below it; the microphone and the earphone hole form one integrated element. A power supply is arranged at the lower rear of the device body. A voice synthesis FPGA is arranged inside the device body and is connected respectively to an SD card, the switch, the loudspeaker, a voice recognition chip, and a single chip microcomputer; the single chip microcomputer is connected to the voice recognition chip, the wireless transceiver, and the LCD display; the microphone is connected to the voice recognition chip; and the power supply is connected to the switch.
Compared with the prior art, the invention has the following beneficial effects: it can synthesize Chinese or English speech carrying the speaker's style and emotion on behalf of people who speak only one of the two languages, so that Chinese and English speakers of different languages can communicate naturally and smoothly, solving the problem of being unable to communicate because of a language barrier. At the same time, the device can help people who are mute or whose speech is unclear communicate with others in natural Chinese and English, solving the problem of being unable to speak.
In addition to the technical problems, technical features, and advantageous effects described above, other technical problems solved by the invention, other technical features of its solutions, and the advantageous effects they bring are described in further detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of a device for synthesizing Chinese-English bilingual cross-language emotion speech according to the present invention.
FIG. 2 is a schematic diagram of an internal control system of the Chinese-English bilingual cross-language emotion speech synthesis apparatus according to the present invention.
FIG. 3 is a schematic flow chart showing how the FPGA in the Chinese-English bilingual cross-language emotion voice synthesis device implements the cross-language emotion voice synthesis.
In the figures: 1. wireless transceiver; 2. single chip microcomputer; 3. voice recognition chip; 4. microphone; 5. SD card; 6. FPGA; 7. switch; 8. power supply; 9. voice parameter generator; 10. loudspeaker; 11. text input device; 12. LCD display; 13. armband; 14. Chinese-English bilingual cross-language emotion voice synthesis device body; 15. earphone hole.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and drawings. Here, the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to FIG. 1, the Chinese-English bilingual cross-language emotion voice synthesis device of the invention includes a device body 14 of cuboid shape. An armband 13 is connected between its upper and lower surfaces so that the device can be fixed on an arm; being light and compact, it is easy to carry. On the upper front of the device body 14 is an LCD display 12 for displaying text and other information. On the lower front is a text input device 11 (a keyboard) for entering Chinese or English text. On the right side between the LCD display 12 and the text input device 11 is a switch 7 with three keys: the middle key is the power switch of the device, the left key triggers Chinese emotion voice synthesis, and the right key triggers English emotion voice synthesis. The loudspeaker 10 on the upper right of the device body 14 plays the synthesized Chinese and English speech. The wireless transceiver 1 on the upper right surface can connect wirelessly to devices such as mobile phones and vehicle-mounted computers, extending the device to further cross-language emotion voice synthesis applications. The earphone hole 15 below the wireless transceiver 1 accepts a wired or wireless earphone for the user's convenience. The microphone 4 below the earphone hole 15 is used for voice input; the microphone 4 and the earphone hole 15 form one integrated element.
The power supply 8 is arranged at the rear lower part of the device body 14, and the power supply 8 is used for supplying power to the whole Chinese-English bilingual cross-language emotion voice synthesis device.
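The internal wiring described above (and shown in FIG. 2) can be summarized as a small connection graph. The sketch below is illustrative only: the component names and the graph structure are our rendering of the connections listed in the description, not anything defined by the patent itself.

```python
# Hypothetical model of the device's internal wiring (after FIG. 2).
# Each key lists the components the description says it is connected to.
CONNECTIONS = {
    "speech_synthesis_fpga": ["sd_card", "switch", "speaker",
                              "speech_recognition_chip", "mcu"],
    "mcu": ["speech_recognition_chip", "wireless_transceiver", "lcd_display"],
    "microphone": ["speech_recognition_chip"],
    "power_supply": ["switch"],
}

def reachable_from(start, connections):
    """Return every component reachable from `start`, treating links as bidirectional."""
    # Build an undirected view of the wiring.
    graph = {}
    for src, dsts in connections.items():
        for dst in dsts:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen

# Per the description, every component should be reachable from the FPGA,
# i.e. the control system forms one connected unit.
components = reachable_from("speech_synthesis_fpga", CONNECTIONS)
```

Walking the graph from the FPGA reaches all ten components, which matches the description's claim that the FPGA and single chip microcomputer together tie the whole device into one control system.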
FIG. 2 is a schematic diagram of the control system inside the device. The Chinese-English bilingual cross-language emotion voice synthesis device is designed for two groups of users:
The first group: people who speak only one of Chinese and English. When two people who speak different languages communicate by means of the invention, the original voice signal of the other party's speech is transmitted through the microphone 4 and the voice recognition chip 3 to the FPGA 6 (the voice recognition chip can be purchased off the shelf, or a Chinese-English voice recognition chip can be custom-ordered from a chip manufacturer). The text of the recognized speech is passed through the single chip microcomputer 2 to the LCD display 12, where it is displayed; at the same time, the single chip microcomputer 2 connects through the wireless transceiver 1 to a wireless network (via a mobile phone, a vehicle-mounted computer, WiFi, and the like) to translate the text between Chinese and English, and the translated text is likewise shown on the LCD display 12. The left and right keys of the switch 7 select whether the text to be synthesized is Chinese or English; the keyboard of the text input device 11 can make the same selection, and the text input device 11 can also be set to translate the text to be synthesized automatically without manual control. The text to be synthesized is transmitted through the single chip microcomputer to the FPGA 6, in which a speech synthesis system is loaded (for example, HTS, the hidden-Markov-model-based synthesis system). The synthesis system calls the corpora stored on the SD card 5 (a neutral English corpus, a neutral Chinese corpus, and a Chinese emotion corpus covering 11 emotions) together with the original voice signal from the microphone 4, and obtains by adaptive training an acoustic model of target-language emotion speech similar to the speaker's style. Speech parameters are generated by the speech parameter generator 9 (for example, a STRAIGHT vocoder), the parameter signal is sent to the loudspeaker 10, and the synthesized target-language emotion speech, which the user can understand, is played. When the user replies, the voice signal is input through the microphone 4 and synthesized by the same steps into emotion speech the other party can understand.
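As a rough illustration of this conversation flow, the following Python sketch chains stand-in functions for the three stages (recognition, translation, emotion synthesis). All function names and the toy translation table are hypothetical; the real device uses a hardware recognition chip, a network translation service, and the HTS/STRAIGHT-based synthesis system described above.

```python
# Toy translation table standing in for the network translation service.
TRANSLATIONS = {("zh", "en", "你好"): "hello",
                ("en", "zh", "hello"): "你好"}

def recognize(audio, lang):
    """Stand-in for the voice recognition chip: 'audio' is already text here."""
    return audio

def translate(text, src, dst):
    """Stand-in for the translation step done over the wireless transceiver."""
    return TRANSLATIONS.get((src, dst, text), text)

def synthesize(text, lang, emotion):
    """Stand-in for the FPGA synthesis system: returns a description of the output."""
    return {"text": text, "lang": lang, "emotion": emotion}

def cross_language_reply(audio, src_lang, dst_lang, emotion="neutral"):
    """Microphone -> recognition -> translation -> emotion synthesis -> speaker."""
    recognized = recognize(audio, src_lang)                 # shown on the LCD
    translated = translate(recognized, src_lang, dst_lang)  # also shown on the LCD
    return synthesize(translated, dst_lang, emotion)        # played by the speaker

out = cross_language_reply("你好", "zh", "en", emotion="happy")
```

The point of the sketch is the ordering of the stages, which mirrors the data path microphone 4 / chip 3 / MCU 2 / FPGA 6 / loudspeaker 10 described above.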
The second group: people who are mute or whose speech is unclear (or who simply prefer typing). The user types text on the keyboard of the text input device 11. The text is displayed on the LCD display 12 via the single chip microcomputer 2; after a wireless network connection is made through the wireless transceiver, it is also translated into the other of the two languages and likewise displayed on the LCD display 12. The language to be synthesized is selected with the keys of the text input device 11 or the switch 7. The text signal of the selected language is then input through the single chip microcomputer into the FPGA 6; the FPGA-based Chinese-English bilingual cross-language emotion voice synthesis system (composed of the speech synthesis system loaded in the FPGA, the SD card 5, and the speech parameter generator 9) synthesizes the emotion speech parameter signal of the target language; and the signal is transmitted to the loudspeaker 10, which plays the synthesized target-language emotion speech.
FIG. 3 shows the flow by which the FPGA implements the Chinese-English bilingual cross-language emotion voice synthesis. First, according to the speaker's voice signal input through the microphone 4, the FPGA emotion voice synthesis system selects Chinese emotion speech material similar to the speaker's style from the 11-emotion Chinese emotion corpus on the SD card 5 and trains a Chinese speaker-dependent target-emotion average acoustic model. It then selects neutral speech material in the target language (Chinese or English) from the SD card according to the text to be synthesized input via the single chip microcomputer 2, and trains a speaker-dependent neutral average acoustic model of the target language for that text. Next, the target emotion of the Chinese speaker-dependent target-emotion average acoustic model is transplanted into the speaker-dependent neutral average acoustic model of the target language, yielding a speaker-dependent target-emotion average acoustic model of the target language for the text to be synthesized. Finally, a speech parameter signal is generated by the speech parameter generator 9 and transmitted to the loudspeaker 10, which plays the target-emotion speech in the target language.
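The emotion "transplantation" step can be illustrated numerically. In the toy sketch below, each "average acoustic model" is reduced to two scalar features, and the emotion is transplanted by carrying the offset between the Chinese emotional and neutral models over to the neutral model of the target language. This is only an assumption-laden caricature of the idea; real HMM-based adaptation operates on full distributions of spectral and prosodic parameters, not two scalar means.

```python
def transplant_emotion(zh_emotional, zh_neutral, target_neutral):
    """Apply the per-feature emotion offset (emotional - neutral) to the target model."""
    return {feat: target_neutral[feat] + (zh_emotional[feat] - zh_neutral[feat])
            for feat in target_neutral}

# Toy "average acoustic models": mean F0 (Hz) and mean speaking rate (syllables/s).
# The numbers are made up for illustration.
zh_happy   = {"f0": 260.0, "rate": 5.5}
zh_neutral = {"f0": 220.0, "rate": 4.5}
en_neutral = {"f0": 200.0, "rate": 4.0}

# "Happy" English model: English neutral base plus the Chinese happy-minus-neutral offset.
en_happy = transplant_emotion(zh_happy, zh_neutral, en_neutral)
```

The English model keeps its own baseline (the speaker-style component) while inheriting the emotional shift learned from the Chinese corpus, which is the essence of the cross-language transfer described above.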
The Chinese-English bilingual cross-language emotion voice synthesis device of the invention has the following features:
1. A 3.3 V/5 V DC battery inside the device supplies its power.
2. The armband 13 above and below the device body allows the device to be fixed on an arm; the device is light and easy to carry.
3. The wireless transceiver 1 can connect to mobile phones, vehicle-mounted computers, WiFi, and other wireless networks to perform Chinese-English text translation; once connected to the internet, the device can also be used to listen to music, broadcasts, and the like.
4. The device can translate between Chinese and English speech in real time, and the translated, synthesized speech carries the speaker's style and emotion; it therefore has high practical value and can greatly promote communication between Chinese and English speakers.
5. The device can help people who are mute or whose speech is unclear: its text-to-speech function lets them "speak", solving their speaking difficulties and their communication barriers with other people.
6. The device can be applied not only to communication between people but also to intelligent electronic products such as mobile phones, computers, and robots; connected by wire or wirelessly, or embedded, it can make such products more intelligent and human-friendly and thus promote the development of artificial intelligence.
The foregoing describes embodiments of the invention in some detail, but it should not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the invention. The protection scope of this patent is therefore defined by the appended claims.

Claims (1)

1. A Chinese-English bilingual cross-language emotion voice synthesis device, characterized by comprising: a device body of cuboid shape; an armband connected between the upper and lower surfaces of the device body to fix it on an arm; an LCD display arranged on the upper part of the front surface of the device body; a text input device arranged on the lower part of the front surface; a switch arranged on the right side of the front surface between the LCD display and the text input device and divided into a left key, a middle key, and a right key, the left key controlling Chinese emotion voice synthesis, the middle key being the power switch key, and the right key controlling English emotion voice synthesis; a loudspeaker arranged on the right side of the upper surface; a wireless transceiver arranged on the upper part of the right surface for wireless connection to devices such as mobile phones and vehicle-mounted computers to extend the device to further cross-language emotion voice synthesis applications; an earphone hole formed in the right side of the device body, with a microphone below it, the microphone and the earphone hole forming one integrated element; a power supply arranged at the lower rear of the device body; and a voice synthesis FPGA arranged inside the device body, the voice synthesis FPGA being connected respectively to an SD card, the switch, the loudspeaker, a voice recognition chip, and a single chip microcomputer, the single chip microcomputer being connected to the voice recognition chip, the wireless transceiver, and the LCD display, the microphone being connected to the voice recognition chip, and the power supply being connected to the switch.
Application CN201911253410.6A, filed 2019-12-09 (priority date 2019-12-09): Chinese-English bilingual cross-language emotion voice synthesis device. Published as CN113035239A. Status: Pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911253410.6A CN113035239A (en) 2019-12-09 2019-12-09 Chinese-English bilingual cross-language emotion voice synthesis device


Publications (1)

Publication Number Publication Date
CN113035239A (en) 2021-06-25

Family

ID=76451104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911253410.6A Pending CN113035239A (en) 2019-12-09 2019-12-09 Chinese-English bilingual cross-language emotion voice synthesis device

Country Status (1)

Country Link
CN (1) CN113035239A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09292971A (en) * 1996-04-26 1997-11-11 Sony Corp Translation device
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method
US20110165912A1 (en) * 2010-01-05 2011-07-07 Sony Ericsson Mobile Communications Ab Personalized text-to-speech synthesis and personalized speech feature extraction
CN106951417A (en) * 2017-05-05 2017-07-14 李宗展 Recognize the multi-lingual inter-translation method and portable machine of Chinese dialects
CN107103900A (en) * 2017-06-06 2017-08-29 西北师范大学 A kind of across language emotional speech synthesizing method and system
CN206649899U (en) * 2016-10-25 2017-11-17 北京分音塔科技有限公司 A kind of communicator for realizing real-time voice intertranslation
CN110149805A (en) * 2017-12-06 2019-08-20 创次源股份有限公司 Double-directional speech translation system, double-directional speech interpretation method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination