CN107093421A - Speech simulation method and apparatus - Google Patents
Speech simulation method and apparatus
- Publication number
- CN107093421A CN107093421A CN201710260306.4A CN201710260306A CN107093421A CN 107093421 A CN107093421 A CN 107093421A CN 201710260306 A CN201710260306 A CN 201710260306A CN 107093421 A CN107093421 A CN 107093421A
- Authority
- CN
- China
- Prior art keywords
- user
- voice data
- speech simulation
- module
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/61—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G10L2013/105—Duration
Abstract
The invention provides a speech simulation method and apparatus. The method comprises the following steps: obtaining audio data of a user; parsing the audio data, and extracting and saving characteristic information of the audio data; generating, from the saved characteristic information, simulated audio data corresponding to the audio data; and playing the simulated audio data. The invention parses the voice by algorithm to extract characteristic data, and then interacts with the user or reads aloud using the same phonemes and intonation as the user. The simulation effect is good, the similarity is high, and the speech tone is close to the user's, which improves the friendliness of human-computer interaction. This avoids the problem of existing speech simulation methods, which can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and which therefore cannot improve the adaptability and friendliness of human-computer interaction.
Description
Technical field
The present invention relates to the technical field of voice signals, and more particularly to a speech simulation method and apparatus.
Background technology
Voice is the material shell of language, the carrier of the linguistic symbol system. It is produced by the human vocal organs and carries a definite linguistic meaning; language realizes its social function through voice. Language is a symbol system combining sound and meaning, and the sound and the meaning of a language are closely connected. Therefore, although voice is a kind of sound, it differs in essence from ordinary sound: it is the sound with meaning-distinguishing function produced by the human vocal organs, and cannot be regarded as a purely natural substance. Voice is the most direct symbolic record of thinking activity, and the sound form of the communication tool that is language.
The physical basis of voice consists mainly of pitch, loudness, duration and timbre, which are also the four elements that constitute voice. Pitch refers to the frequency of the sound wave, i.e. the number of vibrations per second; loudness refers to the amplitude of the sound wave; duration refers to the length of time the sound wave vibrates; and timbre, also called "sound quality", refers to the characteristic nature of the sound.
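The four elements above can be made concrete with a small sketch: assuming the audio arrives as a list of normalized samples, pitch is crudely estimated from zero crossings, loudness from the RMS amplitude, and duration from the sample count (timbre requires spectral analysis and is omitted). All function and variable names here are illustrative, not taken from the patent.

```python
import math

def frame_features(samples, sample_rate):
    """Estimate three of the four elements of voice for one stretch of audio.

    `samples` is a list of floats in [-1, 1]. Names are illustrative only.
    """
    n = len(samples)
    # Duration: number of samples over the sampling rate, in seconds.
    duration_s = n / sample_rate
    # Loudness: root-mean-square amplitude of the waveform.
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Pitch: zero-crossing rate gives a crude frequency estimate
    # for a clean periodic signal (two crossings per cycle).
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    pitch_hz = crossings / (2 * duration_s)
    return pitch_hz, rms, duration_s

# A 440 Hz sine sampled at 16 kHz for 0.5 s.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 2)]
pitch, loudness, duration = frame_features(tone, rate)
```

On a pure tone the estimates land close to the true values; real speech would need an autocorrelation or spectral pitch tracker instead of zero crossings.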
The human vocal organs and their activity are the physiological basis of voice. The vocal organs fall into three parts:
(1) the respiratory apparatus, including the lungs, trachea and bronchi; the lungs are the center of the respiratory apparatus and the basis of the power of voice;
(2) the larynx and vocal cords, which are the vibrating bodies of pronunciation;
(3) the oral cavity, pharyngeal cavity and nasal cavity, which act as the resonators of pronunciation.
The connection between voice and meaning is a convention formed by people in long-term language practice, and this bond between sound and meaning reflects the important social property of voice.
Speech simulation improves the adaptability and friendliness of human-computer interaction to some extent. However, existing speech simulation methods are ordinary voice changers: they can only run a channel model after recognizing the voice to be simulated, or can only adjust speaking rate and intonation, and the resulting timbre bears no comparison with the voice of the person being simulated. In short, existing speech simulation methods can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and thus cannot improve the adaptability and friendliness of human-computer interaction.
The above is intended only to aid understanding of the technical solution of the present invention, and does not constitute an admission that the above is prior art.
Summary of the invention
The main object of the present invention is to provide a speech simulation method and apparatus, aiming to solve the problem that existing speech simulation methods can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and thus cannot improve the adaptability and friendliness of human-computer interaction.
To solve the above problem, the present invention provides a speech simulation method comprising the following steps:
obtaining audio data of a user;
parsing the audio data, and extracting and saving characteristic information of the audio data;
generating, from the saved characteristic information, simulated audio data corresponding to the audio data;
playing the simulated audio data.
Preferably, before obtaining the audio data of the user, the method further comprises:
obtaining speech simulation request information of the user;
setting, according to the speech simulation request information, a user identifier corresponding to the user, and a memory space, corresponding to the user identifier, for storing user audio data;
prompting the user to start collecting the audio data.
Preferably, parsing the audio data and extracting the characteristic information of the audio data comprises:
after the audio data is obtained, parsing each frame of the audio data;
extracting phoneme characteristic values corresponding to the audio data as the characteristic information.
Preferably, after obtaining the speech simulation request information of the user, the method further comprises:
judging whether the speech simulation request information of the user has a corresponding user identifier;
if so, recalling the simulated audio data corresponding to the user identifier and playing it;
if not, proceeding to the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the memory space, corresponding to the user identifier, for storing user audio data.
Preferably, generating the simulated audio data corresponding to the audio data from the saved characteristic information comprises:
retrieving preset audio data that the user requests to play;
converting, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
In addition, to solve the above problem, the present invention also provides a speech simulation apparatus comprising an acquisition module, an extraction module, a generation module and a playing module;
the acquisition module is configured to obtain audio data of a user;
the extraction module is configured to parse the audio data, and to extract and save characteristic information of the audio data;
the generation module is configured to generate, from the saved characteristic information, simulated audio data corresponding to the audio data;
the playing module is configured to play the simulated audio data.
Preferably, the apparatus further comprises a setting module and a reminding module;
the acquisition module is further configured to obtain speech simulation request information of the user;
the setting module is configured to set, according to the speech simulation request information, a user identifier corresponding to the user, and a memory space, corresponding to the user identifier, for storing user audio data;
the reminding module is configured to prompt the user to start collecting the audio data.
Preferably, the apparatus further comprises a parsing module;
the parsing module is configured to parse each frame of the audio data after the audio data is obtained;
the extraction module is further configured to extract phoneme characteristic values corresponding to the audio data as the characteristic information.
Preferably, the apparatus further comprises a judging module;
the judging module is configured to judge whether the speech simulation request information of the user has a corresponding user identifier;
the playing module is further configured to, if so, recall and play the simulated audio data corresponding to the user identifier;
the setting module is further configured to, if not, proceed to the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the memory space, corresponding to the user identifier, for storing user audio data.
Preferably, the apparatus further comprises a retrieving module and a conversion module;
the retrieving module is configured to retrieve preset audio data that the user requests to play;
the conversion module is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
The present invention provides a speech simulation method and apparatus. The method parses the acquired user audio data and extracts characteristic information, then generates from that characteristic information the simulated audio data corresponding to the audio data, and plays the simulated audio data. The invention parses the voice by algorithm to extract characteristic data, and then interacts with the user or reads aloud using the same phonemes and intonation as the user. The simulation effect is good, the similarity is high, and the speech tone is close to the user's, which improves the friendliness of human-computer interaction and avoids the problem of existing speech simulation methods, which can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and which therefore cannot improve the adaptability and friendliness of human-computer interaction.
Brief description of the drawings
Fig. 1 is a flow diagram of a first embodiment of the speech simulation method of the present invention;
Fig. 2 is a flow diagram of a second embodiment of the speech simulation method of the present invention;
Fig. 3 is a flow diagram of a third embodiment of the speech simulation method of the present invention;
Fig. 4 is a flow diagram of a fourth embodiment of the speech simulation method of the present invention;
Fig. 5 is a flow diagram of a fifth embodiment of the speech simulation method of the present invention;
Fig. 6 is a functional block diagram of an embodiment of the speech simulation apparatus of the present invention.
The realization, functional features and advantages of the object of the invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a speech simulation method.
Referring to Fig. 1, Fig. 1 is a flow diagram of a first embodiment of the speech simulation method of the present invention.
In the first embodiment, the speech simulation method comprises:
Step S10: obtaining the audio data of a user.
It is to be appreciated that, as set out in the Background section, voice is the material shell of language: it is produced by the human vocal organs, carries linguistic meaning, has pitch, loudness, duration and timbre as its four constituent elements, and rests on the vocal organs as its physiological basis and on the conventional bond between sound and meaning as its social property.
The user's voice may be obtained by recording through a microphone, or by connecting a mobile terminal to the terminal and obtaining the transmitted voice information.
Step S20: parsing the audio data, and extracting and saving the characteristic information of the audio data.
The saved characteristic information may be kept in the form of waveform data, or the audio may be saved frame by frame. The characteristic information is saved into a database, which may be a cloud database, while the device that acquires the user's audio data is a terminal. In use, the terminal obtains the user's audio information and sends it to the cloud; the cloud analyzes the obtained audio information and extracts its characteristic information, including the intonation, accent, speaking rate, frequency and other properties of the voice.
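A minimal sketch of the terminal-to-cloud flow just described, with a plain dict standing in for the cloud database and toy arithmetic standing in for the real parsing algorithm; every name and feature field here is an assumption for illustration only.

```python
# Stand-in "cloud database": user identifier -> extracted features.
cloud_db = {}

def analyze_audio(audio_frames):
    """Pretend cloud-side analysis: derive toy feature values from frames.

    The field names (speech_rate, frequency, intonation) mirror the kinds
    of properties the text mentions, not any defined format.
    """
    return {
        "speech_rate": len(audio_frames),                     # frames per utterance
        "frequency": sum(audio_frames) / len(audio_frames),   # mean value
        "intonation": max(audio_frames) - min(audio_frames),  # value range
    }

def upload_and_store(user_id, audio_frames):
    """Terminal side sends audio; cloud side extracts and saves features."""
    cloud_db[user_id] = analyze_audio(audio_frames)
    return cloud_db[user_id]

features = upload_and_store("user-001", [110.0, 220.0, 180.0])
```

In a real deployment the dict would be a networked store and the analysis a signal-processing pipeline; the split of responsibilities (terminal acquires, cloud analyzes and saves) is the point of the sketch.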
Step S30: generating, from the saved characteristic information, the simulated audio data corresponding to the audio data.
The cloud generates the corresponding simulated audio data according to the characteristic information. The simulated audio data may be produced by converting a preset existing audio file, yielding simulated audio data similar in tone and intonation to the user's voice; alternatively, a template of the speech tone may be generated, and feedback in that tone is produced as the user further interacts with the terminal device. For example, parents perform speech simulation on the terminal; the terminal sends the audio file to the cloud; after obtaining it, the cloud generates characteristic information corresponding to the parents' voices from the audio file, and then generates simulated audio data in the form of the speech tone from that characteristic information. Thereafter, when a child interacts with the terminal by voice, the terminal can interact in the parents' voices.
Step S40: playing the simulated audio data.
The present invention provides a speech simulation method that parses the acquired user audio data and extracts characteristic information, then generates from that characteristic information the simulated audio data corresponding to the audio data, and plays the simulated audio data. The invention parses the voice by algorithm to extract characteristic data, and then interacts with the user or reads aloud using the same phonemes and intonation as the user. The simulation effect is good, the similarity is high, and the speech tone is close to the user's, which improves the friendliness of human-computer interaction and avoids the problem of existing speech simulation methods, which can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and which therefore cannot improve the adaptability and friendliness of human-computer interaction.
The present invention can be applied to a variety of occasions such as prenatal education, early education and children's education. By having the terminal simulate the voice of someone the child knows, such as a parent, the child hears terminal-played audio (storytelling, learning content and the like) in the parent's voice, or interacts with the terminal in the voice of a familiar person, which improves the friendliness of the child's human-computer interaction.
Referring to Fig. 2, Fig. 2 is a flow diagram of a second embodiment of the speech simulation method of the present invention.
Based on the first embodiment, before step S10, the method further comprises:
Step S50: obtaining the speech simulation request information of the user.
The user transmits request information on the terminal to request speech simulation, for example by pressing a button to start the speech simulation flow, or by registering and logging in with a unique account and password and filing a request, so that the next operation can proceed.
Step S60: setting, according to the speech simulation request information, the user identifier corresponding to the user, and the memory space, corresponding to the user identifier, for storing user audio data.
After obtaining the user's request information, the terminal prepares for speech simulation. It first sets a user identifier for the user's speech simulation request information; the request information may be registration information, from which a unique user identifier corresponding to the user is generated. A memory space corresponding to the user identifier is then set up for storing the user's voice files, audio data and the like.
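Step S60 might be sketched as follows, assuming (hypothetically) that the user identifier is derived by hashing the registration info and that the memory space is a per-user bucket in a dict; neither choice is specified by the patent, and all names are illustrative.

```python
import hashlib

# Stand-in for the per-user memory space: identifier -> list of audio data.
storage = {}

def register_user(request_info: str) -> str:
    """Derive a stable user identifier from the request/registration info
    and reserve a memory space for that user's audio data."""
    user_mark = hashlib.sha1(request_info.encode("utf-8")).hexdigest()[:8]
    storage.setdefault(user_mark, [])  # empty space for this user's audio
    return user_mark

mark = register_user("account:parent01")
```

Because the identifier is a deterministic hash of the registration info, repeating the request yields the same identifier, which is what the lookup in the fourth embodiment (steps S80/S90) relies on.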
Step S70: prompting the user to start collecting the audio data.
The terminal prompts the user that collection of the audio data can begin. This step may prompt by voice or by vibration, or send a message notification through a mobile device.
Referring to Fig. 3, Fig. 3 is a flow diagram of a third embodiment of the speech simulation method of the present invention.
Based on the first embodiment, in the third embodiment, step S20 comprises:
Step S21: after the audio data is obtained, parsing each frame of the audio data.
Sound is in fact a waveform. The common MP3 format is compressed, so the file must be converted to an uncompressed form for processing, such as a Windows PCM file, i.e. the commonly known WAV file. After the user's audio data is stored in WAV format, the waveform of the WAV file is read. First the silent portions at both ends may be cut off, eliminating the blank bands; this is known as voice activity detection (VAD). Speech analysis is then performed: the sound is cut into short segments, each segment becoming one frame, realized with a moving window function. Frames may overlap. Specifically, each frame may be set to 25 milliseconds with a frame shift of 10 milliseconds, so that adjacent frames overlap by 25 - 10 = 15 milliseconds; this is called framing with a frame length of 25 ms and a frame shift of 10 ms. After framing, the speech has become a sequence of short segments.
Step S22: extracting the phoneme characteristic values corresponding to the audio data as the characteristic information.
Following the above, the phoneme characteristic value corresponding to each frame of audio data is extracted; the phoneme characteristic values may include waveform features, and serve as the characteristic information.
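The patent does not define the phoneme characteristic value concretely, so the sketch below simply pairs each frame with stand-in waveform features (energy and zero-crossing count); the field names are hypothetical placeholders for whatever a real front end would extract.

```python
def frame_characteristics(frames):
    """Attach toy per-frame waveform features as 'characteristic information'.

    energy: sum of squared samples in the frame.
    zero_crossings: number of sign changes, a crude voicing indicator.
    """
    info = []
    for frame in frames:
        energy = sum(s * s for s in frame)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        info.append({"energy": energy, "zero_crossings": crossings})
    return info

characteristics = frame_characteristics([[0.5, -0.5, 0.5], [0.0, 0.0, 0.0]])
```

Production systems would extract MFCCs or similar spectral features per frame; the point here is only the shape of the data: one feature record per frame.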
Referring to Fig. 4, Fig. 4 is a flow diagram of a fourth embodiment of the speech simulation method of the present invention.
Based on the second embodiment, in the fourth embodiment, after step S50, the method further comprises:
Step S80: judging whether the speech simulation request information of the user has a corresponding user identifier.
In this step, after the speech simulation request information is obtained, it is first analyzed to judge whether the user has already performed speech simulation through the terminal, that is, whether a user identifier corresponding to the request information has been saved. This judgment may be made on the terminal, or the terminal's request information may be sent to the cloud and matched against the speech simulation request information in the cloud database.
Step S90: if so, recalling the simulated audio data corresponding to the user identifier and playing it.
When the database contains the user identifier corresponding to the user's speech simulation request information, no further voice-data analysis is performed; the simulated audio data corresponding to the user identifier is invoked directly, and is played or used to interact with the user.
If not, the method proceeds to the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the memory space, corresponding to the user identifier, for storing user audio data.
If neither the cloud nor the terminal database contains a user identifier corresponding to the user's speech simulation request information, the user identifier must be newly established and memory space allocated, in preparation for saving the user's voice.
Referring to Fig. 5, Fig. 5 is a flow diagram of a fifth embodiment of the speech simulation method of the present invention.
Based on the first embodiment, step S30 comprises:
Step S31: retrieving the preset audio data that the user requests to play.
During voice interaction, the preset audio data that the user specifies for playback is called up. The preset audio data may be a preset audio file located in the cloud, such as a programmed audio story or learning content, or it may be content selected by an algorithm in response to an instruction the user issues by voice. This process may be on-demand playback, or playback of the corresponding preset audio data file according to the speech simulation data.
Step S32: converting, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
According to the characteristic information, the preset audio data requested by the user is converted into simulated audio data; alternatively, the preset audio data selected by algorithm in response to the user's voice instruction is fed back in the form of simulated audio data. For example, after the parents have performed speech simulation through the terminal and the simulated audio data has been generated, a child interacts with the terminal, and the terminal gives feedback in the simulated voices of the parents.
The present invention also provides a speech simulation apparatus.
Referring to Fig. 6, Fig. 6 is a functional block diagram of an embodiment of the speech simulation apparatus of the present invention.
In this embodiment, the speech simulation apparatus comprises:
an acquisition module 10, an extraction module 20, a generation module 30, a playing module 40, a setting module 50, a reminding module 60, a judging module 70, a parsing module 80, a retrieving module 90 and a conversion module 100;
the acquisition module 10 is configured to obtain audio data of a user;
the extraction module 20 is configured to parse the audio data, and to extract and save characteristic information of the audio data;
the generation module 30 is configured to generate, from the saved characteristic information, simulated audio data corresponding to the audio data;
the playing module 40 is configured to play the simulated audio data.
The acquisition module 10 is further configured to obtain speech simulation request information of the user.
The setting module 50 is configured to set, according to the speech simulation request information, a user identifier corresponding to the user, and a memory space, corresponding to the user identifier, for storing user audio data.
The reminding module 60 is configured to prompt the user to start collecting the audio data.
The parsing module 80 is configured to parse each frame of the audio data after the audio data is obtained.
The extraction module 20 is further configured to extract phoneme characteristic values corresponding to the audio data as the characteristic information.
The judging module 70 is configured to judge whether the speech simulation request information of the user has a corresponding user identifier.
The playing module 40 is further configured to, if so, recall and play the simulated audio data corresponding to the user identifier.
The setting module 50 is further configured to, if not, proceed to the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the memory space, corresponding to the user identifier, for storing user audio data.
The retrieving module 90 is configured to retrieve preset audio data that the user requests to play.
The conversion module 100 is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
The present invention provides a speech simulation apparatus. Through the cooperation of the acquisition module 10, extraction module 20, generation module 30, playing module 40, setting module 50, reminding module 60, judging module 70, parsing module 80, retrieving module 90 and conversion module 100, the acquired user audio data is parsed, characteristic information is extracted, and simulated audio data corresponding to the audio data is generated from the characteristic information and played. The invention parses the voice by algorithm to extract characteristic data, and then interacts with the user or reads aloud using the same phonemes and intonation as the user. The simulation effect is good, the similarity is high, and the speech tone is close to the user's, which improves the friendliness of human-computer interaction and avoids the problem of existing speech simulation methods, which can only perform an ordinary voice change, in which the timbre cannot change and the similarity is low, and which therefore cannot improve the adaptability and friendliness of human-computer interaction.
The present invention can be applied to a variety of occasions such as prenatal education, early education and children's education. By having the terminal simulate the voice of someone the child knows, such as a parent, the child hears terminal-played audio (storytelling, learning content and the like) in the parent's voice, or interacts with the terminal in the voice of a familiar person, which improves the friendliness of the child's human-computer interaction.
The above are only preferred embodiments of the present invention and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and accompanying drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. A speech simulation method, characterized by comprising the following steps:
obtaining audio data of a user;
parsing the audio data, and extracting and saving characteristic information of the audio data;
generating, from the saved characteristic information, simulated audio data corresponding to the audio data;
playing the simulated audio data.
2. The speech simulation method of claim 1, characterized in that, before obtaining the audio data of the user, the method further comprises:
obtaining speech simulation request information of the user;
setting, according to the speech simulation request information, a user identifier corresponding to the user, and a memory space, corresponding to the user identifier, for storing user audio data;
prompting the user to start collecting the audio data.
3. The speech simulation method according to claim 1 or 2, characterized in that the parsing of the audio data and the extracting of the characteristic information of the audio data comprise:
after the audio data is obtained, parsing each frame of the audio data;
extracting phoneme feature values corresponding to the audio data as the characteristic information.
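The frame-by-frame parsing of claim 3 might be sketched as below. Zero-crossing rate and frame energy are used here only as crude stand-ins for the phoneme feature values the claim refers to; a real system would extract richer features (for example MFCCs or pitch contours), and all names are illustrative:

```python
import numpy as np

def frame_features(audio, frame_len=200):
    """Parse each frame and return crude per-frame feature values.

    Energy and zero-crossing rate stand in for the claimed phoneme
    feature values; they are placeholders, not the patent's features.
    """
    n = len(audio) // frame_len
    frames = audio[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)                      # loudness proxy
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)  # pitch proxy
    return np.column_stack([energy, zcr])

# 0.5 s of a 100 Hz sine at an 8 kHz sampling rate.
t = np.linspace(0, 0.5, 4000, endpoint=False)
feats = frame_features(np.sin(2 * np.pi * 100 * t))
print(feats.shape)  # one (energy, zcr) row per 200-sample frame
```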
4. The speech simulation method according to claim 2, characterized in that, after the obtaining of the speech simulation request information of the user, the method further comprises:
judging whether the speech simulation request information of the user contains a user identifier corresponding to the user;
if so, retrieving the simulated audio data corresponding to the user identifier, and playing it;
if not, performing the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing user audio data.
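The branch in claim 4, together with the identifier and storage-space setup of claim 2, could be sketched as a simple lookup; the dictionaries and the `uid-` naming scheme are invented for illustration only:

```python
# Hypothetical request handling; names and structures are illustrative.
user_ids = {}    # user -> user identifier
storage = {}     # user identifier -> storage space for user audio data
sim_cache = {}   # user identifier -> previously generated simulated audio

def handle_request(user):
    """Branch on whether a user identifier already exists (claim 4)."""
    uid = user_ids.get(user)
    if uid is not None and uid in sim_cache:
        return ("play", sim_cache[uid])       # retrieve and play (claim 4)
    # Otherwise set an identifier and its storage space (claim 2).
    uid = f"uid-{len(user_ids)}"
    user_ids[user] = uid
    storage[uid] = []                         # storage space for audio
    return ("collect", uid)                   # prompt to collect audio

action, _ = handle_request("alice")
assert action == "collect"                    # first request: set up storage
sim_cache[user_ids["alice"]] = [0.1, 0.2]     # later: simulated audio saved
action, data = handle_request("alice")
assert action == "play" and data == [0.1, 0.2]
```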
5. The speech simulation method according to claim 1, characterized in that the generating, according to the saved characteristic information, of the simulated audio data corresponding to the audio data comprises:
retrieving preset audio data that the user requests to play;
converting, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
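Under the deliberately simplified assumption that the saved characteristic information is a single pitch ratio (user pitch divided by preset pitch), the conversion step of claim 5 could be sketched as naive resampling; real voice conversion systems use PSOLA or vocoder-based methods instead, and nothing here is taken from the patent itself:

```python
import numpy as np

def convert_preset(preset, pitch_ratio):
    """Convert preset audio toward the user's voice.

    Illustrative only: resampling by the saved pitch ratio raises the
    pitch but also shortens the clip, a side effect real converters avoid.
    """
    idx = np.arange(0, len(preset) - 1, pitch_ratio)
    lo = idx.astype(int)
    frac = idx - lo
    # Linear interpolation between neighbouring samples.
    return preset[lo] * (1 - frac) + preset[lo + 1] * frac

t = np.linspace(0, 1, 8000, endpoint=False)
preset = np.sin(2 * np.pi * 110 * t)     # preset audio the user asked for
converted = convert_preset(preset, 1.5)  # raise pitch by a factor of 1.5
```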
6. A speech simulation apparatus, characterized by comprising: an obtaining module, an extraction module, a generation module and a playing module;
the obtaining module is configured to obtain audio data of a user;
the extraction module is configured to parse the audio data, and to extract and save characteristic information of the audio data;
the generation module is configured to generate, according to the saved characteristic information, simulated audio data corresponding to the audio data;
the playing module is configured to play the simulated audio data.
7. The speech simulation apparatus according to claim 6, characterized by further comprising: a setting module and a prompting module;
the obtaining module is further configured to obtain speech simulation request information of the user;
the setting module is configured to set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing user audio data;
the prompting module is configured to prompt the user to start collecting the audio data.
8. The speech simulation apparatus according to claim 7, characterized by further comprising: a parsing module;
the parsing module is configured to parse each frame of the audio data after the audio data is obtained;
the extraction module is further configured to extract phoneme feature values corresponding to the audio data as the characteristic information.
9. The speech simulation apparatus according to claim 8, characterized by further comprising: a judging module;
the judging module is configured to judge whether the speech simulation request information of the user contains a user identifier corresponding to the user;
the playing module is further configured to, if so, retrieve the simulated audio data corresponding to the user identifier and play it;
the setting module is further configured to, if not, perform the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing user audio data.
10. The speech simulation apparatus according to claim 9, characterized by further comprising: a retrieving module and a conversion module;
the retrieving module is configured to retrieve preset audio data that the user requests to play;
the conversion module is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the audio data.
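The apparatus claims mirror the method claims as cooperating modules. One way to picture that decomposition, with entirely hypothetical class and parameter names, is a device that wires four pluggable modules together:

```python
# Illustrative module wiring for the apparatus claims; the class, its
# method names, and the toy modules are invented, not from the patent.
class SpeechSimulationDevice:
    def __init__(self, obtain, extract, generate, play):
        self.obtain, self.extract = obtain, extract
        self.generate, self.play = generate, play
        self.saved_features = None

    def run(self):
        audio = self.obtain()                           # obtaining module
        self.saved_features = self.extract(audio)       # extraction module
        simulated = self.generate(self.saved_features)  # generation module
        return self.play(simulated)                     # playing module

device = SpeechSimulationDevice(
    obtain=lambda: [0.0, 0.5, -0.5],
    extract=lambda a: {"peak": max(abs(x) for x in a)},
    generate=lambda f: [f["peak"]] * 3,
    play=lambda s: s,  # stand-in for actual audio output
)
assert device.run() == [0.5, 0.5, 0.5]
```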
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710260306.4A CN107093421A (en) | 2017-04-20 | 2017-04-20 | A kind of speech simulation method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107093421A true CN107093421A (en) | 2017-08-25 |
Family
ID=59638527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710260306.4A Pending CN107093421A (en) | 2017-04-20 | 2017-04-20 | A kind of speech simulation method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107093421A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6101470A (en) * | 1998-05-26 | 2000-08-08 | International Business Machines Corporation | Methods for generating pitch and duration contours in a text to speech system |
CN1731509A (en) * | 2005-09-02 | 2006-02-08 | 清华大学 | Mobile speech synthesis method |
CN104867489A (en) * | 2015-04-27 | 2015-08-26 | 苏州大学张家港工业技术研究院 | Method and system for simulating reading and pronunciation of real person |
CN105425953A (en) * | 2015-11-02 | 2016-03-23 | 小天才科技有限公司 | Man-machine interaction method and system |
CN106328139A (en) * | 2016-09-14 | 2017-01-11 | 努比亚技术有限公司 | Voice interaction method and voice interaction system |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107481735A (en) * | 2017-08-28 | 2017-12-15 | 中国移动通信集团公司 | A kind of method, server and the computer-readable recording medium of transducing audio sounding |
CN108364658A (en) * | 2018-03-21 | 2018-08-03 | 冯键能 | Cyberchat method and server-side |
CN108806699A (en) * | 2018-05-30 | 2018-11-13 | Oppo广东移动通信有限公司 | Voice feedback method, apparatus, storage medium and electronic equipment |
US20240029710A1 (en) * | 2018-06-19 | 2024-01-25 | Georgetown University | Method and System for a Parametric Speech Synthesis |
CN110415680A (en) * | 2018-09-05 | 2019-11-05 | 满金坝(深圳)科技有限公司 | A kind of simultaneous interpretation method, synchronous translation apparatus and a kind of electronic equipment |
CN110415680B (en) * | 2018-09-05 | 2022-10-04 | 梁志军 | Simultaneous interpretation method, simultaneous interpretation device and electronic equipment |
WO2020052665A1 (en) * | 2018-09-12 | 2020-03-19 | 咪咕音乐有限公司 | Live broadcast interaction method and apparatus, and storage medium |
CN109215629A (en) * | 2018-11-22 | 2019-01-15 | Oppo广东移动通信有限公司 | Method of speech processing, device and terminal |
CN109215629B (en) * | 2018-11-22 | 2021-01-01 | Oppo广东移动通信有限公司 | Voice processing method and device and terminal |
CN109697290A (en) * | 2018-12-29 | 2019-04-30 | 咪咕数字传媒有限公司 | A kind of information processing method, equipment and computer storage medium |
CN112786026A (en) * | 2019-12-31 | 2021-05-11 | 深圳市木愚科技有限公司 | Parent-child story personalized audio generation system and method based on voice migration learning |
CN113223493A (en) * | 2020-01-20 | 2021-08-06 | Tcl集团股份有限公司 | Voice nursing method, device, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107093421A (en) | A kind of speech simulation method and apparatus | |
JP6113302B2 (en) | Audio data transmission method and apparatus | |
Liss et al. | The effects of familiarization on intelligibility and lexical segmentation in hypokinetic and ataxic dysarthria | |
US9547642B2 (en) | Voice to text to voice processing | |
Jovičić et al. | Serbian emotional speech database: design, processing and evaluation | |
CN110136687B (en) | Voice training based cloned accent and rhyme method | |
CN108831436A (en) | A method of text speech synthesis after simulation speaker's mood optimization translation | |
US20180130462A1 (en) | Voice interaction method and voice interaction device | |
CN113010138B (en) | Article voice playing method, device and equipment and computer readable storage medium | |
WO2005093713A1 (en) | Speech synthesis device | |
CN110675886A (en) | Audio signal processing method, audio signal processing device, electronic equipment and storage medium | |
CN105448289A (en) | Speech synthesis method, speech synthesis device, speech deletion method, speech deletion device and speech deletion and synthesis method | |
CN110390928A (en) | It is a kind of to open up the speech synthesis model training method and system for increasing corpus automatically | |
CN111105776A (en) | Audio playing device and playing method thereof | |
CN108986785B (en) | Text recomposition method and device | |
JP6792091B1 (en) | Speech learning system and speech learning method | |
CN109065019A (en) | A kind of narration data processing method and system towards intelligent robot | |
CN106471569A (en) | Speech synthesis apparatus, phoneme synthesizing method and its program | |
CN105303909B (en) | A kind of methods, devices and systems based on vibration English learning | |
WO2023276539A1 (en) | Voice conversion device, voice conversion method, program, and recording medium | |
JP6291808B2 (en) | Speech synthesis apparatus and method | |
Bansal et al. | Emotional Hindi speech database | |
CN109036373A (en) | A kind of method of speech processing and electronic equipment | |
Westall et al. | Speech technology for telecommunications | |
US20200175988A1 (en) | Information providing method and information providing apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20170825 |