CN113053373A - Intelligent vehicle-mounted voice interaction system supporting voice cloning
- Publication number
- CN113053373A
- Authority
- CN
- China
- Prior art keywords
- voice
- module
- cloning
- user
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
Abstract
An embodiment of the invention provides an intelligent vehicle-mounted voice interaction system supporting voice cloning, which improves the quality and service experience of vehicle-mounted voice interaction. The system comprises a corpus collection module, a text feature extraction module, a voice feature extraction module, an instruction receiving module, an instruction analysis module, an instruction execution module, a text response module, a cloning synthesis module, a voice output module and a basic support module. Compared with the prior art, the system can not only carry out real-time voice interaction with a user, but can also customize a specific voice and manner of speaking for its voice responses according to the user's requirements. The system can therefore quickly and conveniently convert a user's voice instruction into an actual driving operation, ensuring the user's driving safety. More importantly, it can provide intelligent, personalized and humanized interaction in which every user gets a voice of their own ('a thousand voices for a thousand people'), giving the vehicle response system an emotional tone, greatly improving the user's driving experience and making the journey warmer and more comfortable.
Description
Technical Field
The invention relates to the technical field of voice interaction, in particular to an intelligent vehicle-mounted voice interaction system supporting voice cloning.
Background
In recent years, with the rapid growth of China's economy and the improvement of people's quality of life, automobiles have become an essential means of daily transportation and play a major role in scenarios such as commuting, holiday travel and cargo transportation. A vehicle-mounted intelligent interaction system can provide convenient and fast driving assistance services, greatly improving the experience of drivers and passengers and turning the automobile from a cold means of transport into a humanized companion. Voice interaction offers fast input, simple operation and good safety, which makes it a naturally suitable interaction mode for the vehicle environment. It can provide services such as application queries, intelligent navigation, music playback and execution of driving operations.
At present, the functions of vehicle-mounted voice interaction systems are very limited. Some vehicle models can perform a few simple operations through an attached voice control terminal, but such systems have poor voice recognition capability, simple functions, insufficient stability and a mechanical interaction process, and cannot meet the growing demand for intelligent, humanized and personalized interaction.
Voice cloning technology can extract the voice characteristics and logical characteristics of a specific speaker and imitate that speaker's unique voice and habitual phrasing. Applying voice cloning technology to the construction of a vehicle-mounted voice interaction system makes it possible to provide a customized 'a thousand voices for a thousand people' service according to user preferences, to interact emotionally with the user while intelligently interpreting and reliably executing user instructions, to improve the driving experience, and to strengthen the sense of bond between the user and the vehicle.
Disclosure of Invention
In order to solve the above problems, an embodiment of the present invention provides an intelligent vehicle-mounted voice interaction system supporting voice cloning, so as to improve the quality and service experience of vehicle-mounted voice interaction.
To achieve this purpose, the embodiment of the invention provides the following technical solution:
An intelligent vehicle-mounted voice interaction system supporting voice cloning comprises a corpus collection module, a text feature extraction module, a voice feature extraction module, an instruction receiving module, an instruction analysis module, an instruction execution module, a text response module, a cloning synthesis module, a voice output module and a basic support module.
Corpus collection module: collects, through an external voice receiver, the original target corpus that the user wants to clone; performs preprocessing such as noise reduction, filtering and volume equalization on the original target corpus; and inputs the preprocessed target corpus into the text feature extraction module and the voice feature extraction module.
Text feature extraction module: receives the target corpus input by the corpus collection module and performs voice recognition on it to obtain the text information of the target corpus. It converts the text information into text feature vectors, forms a text feature vector space, and stores that space.
Voice feature extraction module: receives the target corpus input by the corpus collection module; extracts its acoustic features (such as linear predictive coding features, Mel frequency cepstral coefficients and glottal waves), prosodic features (intonation, time domain distribution, accents and the like), energy features (short-time energy, short-time average amplitude and the like) and timbre features (pitch period, pitch frequency, formants and the like); forms a voice feature vector space; and stores that space.
Instruction receiving module: receives, through an external voice receiver, the original voice instruction issued by the user while driving; performs preprocessing such as user identity verification, user authority determination and environmental sound separation; and inputs the preprocessed voice instruction into the instruction analysis module.
Instruction analysis module: receives the voice instruction input by the instruction receiving module, intelligently analyzes the user's intention and obtains the corresponding instruction processing result. It then activates the instruction execution module and/or the text response module and inputs the instruction processing result into the activated module or modules.
Instruction execution module: is connected to the control ports of the automobile. After being activated by the instruction analysis module, it receives the instruction processing result input by the instruction analysis module and sends an execution command to the corresponding control port according to the content of that result.
Text response module: after being activated by the instruction analysis module, receives the instruction processing result input by the instruction analysis module, calls the text feature vector space stored by the text feature extraction module, intelligently generates a response text whose wording and sentence construction resemble those of the cloning target, and inputs the response text into the cloning synthesis module.
Cloning synthesis module: receives the response text input by the text response module, calls the voice feature vector space stored by the voice feature extraction module, trains a voice synthesis model (such as Merlin, WaveNet, Tacotron or ClariNet) according to the parameters of the voice feature vector space, generates a voice spectrogram that resembles the voice of the cloning target, and inputs the voice spectrogram into the voice output module.
Voice output module: receives the voice spectrogram input by the cloning synthesis module, decodes it with a vocoder (such as WaveRNN or a Griffin-Lim vocoder) to generate a voice signal, and gives a voice response through an external voice player, achieving intelligent voice interaction with the user.
Basic support module: supports the basic functions required by the intelligent vehicle-mounted voice interaction system supporting voice cloning provided by the invention, such as deletion, selection, memory clearing, version updating, self-checking and error reporting.
Compared with the prior art, the invention has the following technical effects and advantages: it provides an intelligent vehicle-mounted voice interaction system supporting voice cloning that can not only perform real-time voice interaction with a user, but can also customize a specific voice and manner of speaking for its voice responses according to the user's requirements. The voice interaction system in the embodiment of the invention can therefore quickly and conveniently convert a user's voice instruction into an actual driving operation, ensuring the user's driving safety. More importantly, it can provide intelligent, personalized and humanized 'a thousand voices for a thousand people' interaction, giving the vehicle response system an emotional tone, greatly improving the user's driving experience and making the journey warmer and more comfortable.
Drawings
Fig. 1 is a schematic flow chart of an intelligent vehicle-mounted voice interaction system supporting voice cloning in a specific application scenario according to an embodiment of the present invention.
Detailed Description
For the convenience of understanding and implementing the embodiment of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of some, and not necessarily all, embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
To build a vehicle-mounted voice interaction system that provides a customized 'a thousand voices for a thousand people' service according to user preferences, and that interacts emotionally with the user while intelligently interpreting and reliably executing user instructions so as to improve the driving experience, the invention provides embodiment 1 of an intelligent vehicle-mounted voice interaction system supporting voice cloning. FIG. 1 is a schematic flow chart of intelligent voice interaction in embodiment 1. As shown in FIG. 1, the following modules and steps may be included:
The system comprises a corpus collection module, a text feature extraction module, a voice feature extraction module, an instruction receiving module, an instruction analysis module, an instruction execution module, a text response module, a cloning synthesis module, a voice output module and a basic support module.
Corpus collection module: collects, in a vehicle or other environment, the original target corpus that the user wishes to clone through an external voice receiver (such as an on-board microphone array, which is outside the scope of the present invention). To ensure that the original target corpus is usable, it should be recorded in a relatively quiet environment, and about 10 to 50 different utterances of the cloning target should be recorded. After recording is finished, the corpus collection module automatically performs preprocessing such as noise reduction, filtering and volume equalization on the original target corpus and inputs the preprocessed target corpus into the text feature extraction module and the voice feature extraction module.
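The patent does not say which noise reduction, filtering or volume equalization algorithms are used. As a minimal sketch only, assuming the corpus is available as a list of float samples, a moving-average filter and peak normalization can stand in for the filtering and volume equalization steps. The function names and the default values for `window` and `target_peak` are illustrative assumptions, not part of the patent.

```python
from typing import List


def moving_average(samples: List[float], window: int = 3) -> List[float]:
    """Smooth the signal with a causal moving-average (low-pass) filter."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def equalize_volume(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the signal so that its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess_corpus(samples: List[float], window: int = 3,
                      target_peak: float = 0.9) -> List[float]:
    """Hypothetical preprocessing chain: filter first, then equalize volume."""
    return equalize_volume(moving_average(samples, window), target_peak)
```

A real system would more likely use spectral noise suppression and loudness normalization, but the order shown here (clean the signal, then bring recordings to a common level) matches the sequence the module description lists.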
Text feature extraction module: receives the target corpus input by the corpus collection module and performs voice recognition on it to obtain the text information of the target corpus. It converts the text information into text feature vectors, then forms and stores a text feature vector space.
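The patent does not specify how text is turned into feature vectors. Purely as an illustration, the sketch below uses a bag-of-words count vector, which is a simple substitute chosen here and not the patent's method. `build_vocabulary` and `text_feature_vector` are hypothetical names, and the input sentences stand in for the output of the voice recognition step.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(sentences: List[str]) -> Dict[str, int]:
    """Map every distinct lower-cased word to a fixed vector index."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    return {word: index for index, word in enumerate(words)}


def text_feature_vector(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for word, index in vocab.items():
        vector[index] = counts.get(word, 0)
    return vector


# The stored "text feature vector space" is then one vector per utterance.
transcripts = ["open the window", "close the window"]
vocab = build_vocabulary(transcripts)
vector_space = [text_feature_vector(t, vocab) for t in transcripts]
```

Count vectors capture which words the cloning target tends to use but not word order. A system that needs to imitate sentence construction would need a richer representation.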
Voice feature extraction module: receives the target corpus input by the corpus collection module; extracts its acoustic features (such as linear predictive coding features, Mel frequency cepstral coefficients and glottal waves), prosodic features (intonation, time domain distribution, accents and the like), energy features (short-time energy, short-time average amplitude and the like) and timbre features (pitch period, pitch frequency, formants and the like); and forms and stores a voice feature vector space.
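Of the features listed, the two energy features are the simplest to show concretely. The sketch below assumes a rectangular window and defines short-time energy as the sum of squared samples in a frame, and short-time average amplitude as the mean absolute sample value in a frame. The frame length and hop size are illustrative parameters that the patent does not specify.

```python
from typing import List


def split_frames(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into fixed-length frames, dropping any incomplete tail."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def short_time_energy(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples in each frame (rectangular window)."""
    return [sum(x * x for x in frame)
            for frame in split_frames(samples, frame_len, hop)]


def short_time_average_amplitude(samples: List[float], frame_len: int,
                                 hop: int) -> List[float]:
    """Mean absolute sample value in each frame (rectangular window)."""
    return [sum(abs(x) for x in frame) / frame_len
            for frame in split_frames(samples, frame_len, hop)]
```

Both functions return one value per frame, so their outputs can be stacked with other per-frame features when building the voice feature vector space.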
Instruction receiving module: receives, through an external voice receiver, the original voice instruction issued by the user while driving; performs preprocessing such as user identity verification, user authority determination and environmental sound separation; and inputs the preprocessed voice instruction into the instruction analysis module. For example, if an unauthorized user gives an instruction to open the car window, the instruction is ignored.
Instruction analysis module: receives the voice instruction input by the instruction receiving module, intelligently analyzes the user's intention and obtains the corresponding instruction processing result. It then activates the instruction execution module and/or the text response module and inputs the instruction processing result into the activated module or modules. For example, if an authorized user gives an instruction to open the car window, it activates the instruction execution module and sends it a window-opening instruction; at the same time it activates the text response module and inputs the processing result 'requests to open the window' into the text response module.
Instruction execution module: is connected to the other control ports of the automobile. After being activated by the instruction analysis module, it receives the instruction processing result input by the instruction analysis module and sends an execution command to the corresponding control port according to the content of that result. For example, if the processing result indicates that the window should be opened, it connects to the window control module and automatically lowers the window.
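The window example can be traced through the three modules with a small sketch. Everything here is a hypothetical stand-in: the `AUTHORIZED_USERS` set replaces real speaker verification, the keyword check replaces intent analysis, and `InstructionExecutor` merely records the command it would send to a control port.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AUTHORIZED_USERS = {"driver"}  # assumption: identity verification already done


@dataclass
class Instruction:
    user_id: str
    text: str


@dataclass
class InstructionExecutor:
    """Stand-in for the instruction execution module and its control ports."""
    sent_commands: List[str] = field(default_factory=list)

    def execute(self, result: str) -> None:
        self.sent_commands.append(result)


def receive(instruction: Instruction) -> Optional[Instruction]:
    """Instruction receiving: drop instructions from unauthorized users."""
    return instruction if instruction.user_id in AUTHORIZED_USERS else None


def analyze(instruction: Instruction) -> Optional[str]:
    """Instruction analysis: toy keyword matching instead of real intent parsing."""
    text = instruction.text.lower()
    if "open" in text and "window" in text:
        return "open_window"
    return None


def handle(instruction: Instruction, executor: InstructionExecutor) -> Optional[str]:
    """Run one instruction through receiving, analysis and execution."""
    accepted = receive(instruction)
    if accepted is None:
        return None  # unauthorized: instruction is ignored
    result = analyze(accepted)
    if result is None:
        return None  # intent not recognized
    executor.execute(result)
    return result  # the same result would also go to the text response module
```

Calling `handle(Instruction("driver", "Please open the window"), executor)` records `open_window` on the executor, while the same request from an unknown user is dropped before analysis.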
Text response module: after being activated by the instruction analysis module, receives the instruction processing result input by the instruction analysis module, calls the text feature vector space stored by the text feature extraction module, intelligently generates a response text whose wording and sentence construction resemble those of the cloning target, and inputs the response text into the cloning synthesis module. For example, for the processing result 'requests to open the window', after analyzing the user's intention it generates the response text 'The window has been opened for you. Is this height suitable?'
Cloning synthesis module: receives the response text input by the text response module, calls the voice feature vector space stored by the voice feature extraction module, trains a voice synthesis model (such as Merlin, WaveNet, Tacotron or ClariNet) according to the parameters of the voice feature vector space, generates a voice spectrogram that resembles the voice of the cloning target, and inputs the voice spectrogram into the voice output module.
Voice output module: receives the voice spectrogram input by the cloning synthesis module, decodes it with a vocoder (such as WaveRNN or a Griffin-Lim vocoder) to generate a voice signal, and gives a voice response through an external voice player, achieving intelligent voice interaction with the user. After the response 'The window has been opened for you. Is this height suitable?', if the user replies further, processing continues again from the instruction receiving module.
Basic support module: supports the basic functions required by the intelligent vehicle-mounted voice interaction system supporting voice cloning provided by the embodiment of the invention, such as deletion, selection, memory clearing, version updating, self-checking and error reporting.
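The module descriptions above imply a data-flow graph between the modules. The sketch below writes that graph down as a plain dictionary and uses breadth-first search to trace how data moves from one module to another. The identifier names are illustrative, and the basic support module is left out because the text does not describe it as part of the data flow.

```python
from collections import deque
from typing import Dict, List, Optional

# Edges follow the "inputs ... into" and "calls ... stored by" relations in the text.
DATA_FLOW: Dict[str, List[str]] = {
    "corpus_collection": ["text_feature_extraction", "voice_feature_extraction"],
    "text_feature_extraction": ["text_response"],
    "voice_feature_extraction": ["cloning_synthesis"],
    "instruction_receiving": ["instruction_analysis"],
    "instruction_analysis": ["instruction_execution", "text_response"],
    "text_response": ["cloning_synthesis"],
    "cloning_synthesis": ["voice_output"],
}


def find_path(start: str, goal: str) -> Optional[List[str]]:
    """Return a shortest module path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in DATA_FLOW.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```

For a spoken instruction, `find_path("instruction_receiving", "voice_output")` passes through instruction analysis, text response and cloning synthesis, which matches the order of the window example in the embodiment.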
The embodiments above illustrate only several implementations of the present application. They are described specifically and in detail, but they are not to be construed as limiting the scope of the present application. It should be noted that the various embodiments of the present invention can be combined freely, and any such combination should be regarded as part of the disclosure of the present invention as long as it does not depart from the idea of the present invention.
Claims (2)
1. An intelligent vehicle-mounted voice interaction system supporting voice cloning, which is used for improving the quality and service experience of vehicle-mounted voice interaction.
2. The intelligent vehicle-mounted voice interaction system supporting voice cloning as claimed in claim 1, comprising a corpus collection module, a text feature extraction module, a voice feature extraction module, an instruction receiving module, an instruction analysis module, an instruction execution module, a text response module, a cloning synthesis module, a voice output module, and a basic support module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110216036.3A CN113053373A (en) | 2021-02-26 | 2021-02-26 | Intelligent vehicle-mounted voice interaction system supporting voice cloning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110216036.3A CN113053373A (en) | 2021-02-26 | 2021-02-26 | Intelligent vehicle-mounted voice interaction system supporting voice cloning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113053373A (en) | 2021-06-29 |
Family
ID=76509182
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110216036.3A Pending CN113053373A (en) | 2021-02-26 | 2021-02-26 | Intelligent vehicle-mounted voice interaction system supporting voice cloning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113053373A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011186143A (en) * | 2010-03-08 | 2011-09-22 | Hitachi Ltd | Speech synthesizer, speech synthesis method for learning user's behavior, and program |
CN106790938A (en) * | 2016-11-16 | 2017-05-31 | 上海趣讯网络科技有限公司 | A kind of man-machine interaction onboard system based on artificial intelligence |
CN108711423A (en) * | 2018-03-30 | 2018-10-26 | 百度在线网络技术(北京)有限公司 | Intelligent sound interacts implementation method, device, computer equipment and storage medium |
CN108962217A (en) * | 2018-07-28 | 2018-12-07 | 华为技术有限公司 | Phoneme synthesizing method and relevant device |
KR20190107289A (en) * | 2019-08-30 | 2019-09-19 | 엘지전자 주식회사 | Artificial robot and method for speech recognitionthe same |
CN111399798A (en) * | 2020-03-10 | 2020-07-10 | 上海博泰悦臻电子设备制造有限公司 | Vehicle-mounted voice assistant personalized realization method, system, medium and vehicle-mounted equipment |
CN111429882A (en) * | 2019-01-09 | 2020-07-17 | 北京地平线机器人技术研发有限公司 | Method and device for playing voice and electronic equipment |
CN112233646A (en) * | 2020-10-20 | 2021-01-15 | 携程计算机技术(上海)有限公司 | Voice cloning method, system, device and storage medium based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210629 |