CN107093421A - Speech simulation method and apparatus - Google Patents

Speech simulation method and apparatus

Info

Publication number
CN107093421A
Authority
CN
China
Prior art keywords
user
voice data
speech simulation
module
data
Legal status
Pending
Application number
CN201710260306.4A
Other languages
Chinese (zh)
Inventor
王斌
Current Assignee
Shenzhen Yifang Digital Technology Co Ltd
Original Assignee
Shenzhen Yifang Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Yifang Digital Technology Co Ltd
Priority to CN201710260306.4A
Publication of CN107093421A
Current legal status: Pending

Classifications

    • G PHYSICS / G10 MUSICAL INSTRUMENTS; ACOUSTICS / G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G10L2013/105 Duration
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G10L25/90 Pitch determination of speech signals
    • H ELECTRICITY / H04 ELECTRIC COMMUNICATION TECHNIQUE / H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/61 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention provides a speech simulation method and apparatus. The method comprises the following steps: obtaining voice data of a user; parsing the voice data, and extracting and saving characteristic information of the voice data; generating, according to the saved characteristic information, simulated audio data corresponding to the voice data; and playing the simulated audio data. The invention parses speech algorithmically to extract feature data, and then uses the same phonemes and intonation as the user to interact with the user or to read aloud. The simulation is effective and highly similar, with a similar vocal tone, which makes human-computer interaction feel more familiar. This avoids the problem of existing speech simulation methods, which can only perform ordinary voice changing, cannot truly change the voice, achieve low similarity, and cannot improve adaptability and familiarity in human-computer interaction.

Description

Speech simulation method and apparatus
Technical field
The present invention relates to the field of speech signal technology, and in particular to a speech simulation method and apparatus.
Background art
Speech is the material shell of language and the carrier of the linguistic symbol system. It is produced by the human vocal organs and carries linguistic meaning. Language fulfils its social function through speech.
Language is a symbol system that combines sound and meaning. Because the sound of language is closely tied to its meaning, speech, although a kind of sound, differs in essence from ordinary sound. Speech is sound with a meaning-distinguishing function produced by the human articulatory organs, so it cannot be regarded as a purely natural substance; it is the most direct record of thinking activity and the acoustic form of language as a communication tool.
The physical basis of speech consists mainly of pitch, loudness, duration, and timbre, which are also the four elements that constitute speech. Pitch is the frequency of the sound wave, i.e., the number of vibrations per second; loudness is the amplitude of the sound wave; duration is the length of time the sound vibration lasts; and timbre is the characteristic quality of a sound, also called "tone quality".
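Three of these four physical attributes can be measured directly on a waveform. The Python sketch below is an illustration, not part of the patent: it generates a pure tone and then estimates its duration from the sample count, its loudness as the root-mean-square (RMS) amplitude, and its pitch by counting zero crossings. Zero-crossing counting only works on a clean periodic tone, not on real speech, and timbre is not reduced to a single number here. The function names and the 16 kHz sample rate are assumptions.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate for the illustration


def make_tone(freq_hz, amplitude, seconds, sample_rate=SAMPLE_RATE):
    """Generate a pure sine tone as a list of float samples."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def duration_s(samples, sample_rate=SAMPLE_RATE):
    """Duration: how long the vibration lasts, in seconds."""
    return len(samples) / sample_rate


def rms_loudness(samples):
    """Loudness proxy: root-mean-square amplitude of the waveform."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zero_crossing_pitch(samples, sample_rate=SAMPLE_RATE):
    """Pitch proxy for a clean periodic tone: one period contains two sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / 2 / duration_s(samples, sample_rate)


tone = make_tone(220.0, 0.5, 1.0)
print(duration_s(tone), rms_loudness(tone), zero_crossing_pitch(tone))
```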
The human vocal organs and their activity form the physiological basis of speech. The human vocal organs are divided into three parts:
(1) The respiratory apparatus, including the lungs, trachea, and bronchi. The lungs are the center of the respiratory apparatus and supply the power for producing speech.
(2) The larynx and vocal cords, which are the vibrating bodies of phonation.
(3) The oral cavity, pharyngeal cavity, and nasal cavity, which all act as resonators of phonation.
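These three parts correspond to the textbook source-filter view of speech production: the lungs supply energy, the vocal cords produce a periodic excitation, and the cavities act as resonators that shape its spectrum. The Python sketch below illustrates that general model and is not taken from the patent. It stands in for the vocal-cord source with an impulse train and for one cavity resonance with a two-pole digital resonator; the 120 Hz fundamental, 700 Hz resonance, and 100 Hz bandwidth are assumed example values.

```python
import math


def impulse_train(f0_hz, seconds, sample_rate=16000):
    """Vocal-cord source stand-in: one unit impulse per pitch period."""
    n = int(seconds * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator: y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius sets bandwidth
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle sets center frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


# Source (vocal cords) shaped by one resonance (a vocal-tract cavity).
voiced = resonator(impulse_train(120.0, 0.5), center_hz=700.0, bandwidth_hz=100.0)
```

A real vocal tract has several resonances (formants); cascading several `resonator` calls with different center frequencies is the usual extension of this sketch.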
The association between speech and meaning is established by convention through long-term language practice, and this combination of sound and meaning shows that speech has an important social property.
In human-computer interaction, speech simulation can increase familiarity and adaptability. However, existing voice simulation methods are ordinary voice changers: they can only apply a vocal-tract model after recognizing the speech to be simulated, or can only adjust speaking rate and intonation, and the resulting timbre cannot compare with the voice of the person being simulated. In short, existing speech simulation methods can only achieve ordinary voice changing; the voice itself cannot be truly changed, similarity is low, and adaptability and familiarity in human-computer interaction cannot be improved.
The above content is provided only to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The primary object of the present invention is to provide a speech simulation method and apparatus, aiming to solve the problem that existing speech simulation methods can only achieve ordinary voice changing, cannot truly change the voice, have low similarity, and cannot improve adaptability and familiarity in human-computer interaction.
To solve the above problem, the present invention provides a speech simulation method comprising the following steps:
obtaining voice data of a user;
parsing the voice data, and extracting and saving characteristic information of the voice data;
generating, according to the saved characteristic information, simulated audio data corresponding to the voice data;
playing the simulated audio data.
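The four steps form a simple linear pipeline. The Python sketch below shows one possible way to wire them together; the class name, field names, and the toy stand-ins in the usage example are hypothetical, since the patent does not specify an implementation. Each step is injected as a callable so that acquisition, feature extraction, generation, and playback can be replaced independently, for example by a terminal-side recorder and a cloud-side generator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Samples = List[float]
Features = Dict[str, float]


@dataclass
class SpeechSimulationPipeline:
    """Hypothetical wiring of the four claimed steps; each step is an injected callable."""
    acquire: Callable[[], Samples]           # obtain the voice data of the user
    extract: Callable[[Samples], Features]   # parse it and extract characteristic information
    generate: Callable[[Features], Samples]  # generate simulated audio from saved features
    play: Callable[[Samples], None]          # play the simulated audio
    saved_features: Features = field(default_factory=dict)

    def run(self) -> Samples:
        voice = self.acquire()
        self.saved_features = self.extract(voice)  # "extract ... and save"
        simulated = self.generate(self.saved_features)
        self.play(simulated)
        return simulated


# Usage with toy stand-ins for each step:
played: List[Samples] = []
pipeline = SpeechSimulationPipeline(
    acquire=lambda: [0.0, 0.5, -0.5, 0.25],
    extract=lambda s: {"peak": max(abs(x) for x in s)},
    generate=lambda f: [f["peak"]] * 3,
    play=played.append,
)
result = pipeline.run()
```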
Preferably, before obtaining the voice data of the user, the method further comprises:
obtaining a speech simulation request of the user;
setting, according to the speech simulation request, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's voice data;
prompting the user to start collecting the voice data.
Preferably, parsing the voice data and extracting the characteristic information of the voice data comprises:
parsing each frame of the voice data after the voice data is obtained;
extracting phoneme feature values corresponding to the voice data as the characteristic information.
Preferably, after obtaining the speech simulation request of the user, the method further comprises:
judging whether a user identifier corresponding to the user already exists for the user's speech simulation request;
if so, retrieving the simulated audio data corresponding to the user identifier and playing it;
if not, performing the step of setting, according to the speech simulation request, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's voice data.
Preferably, generating the simulated audio data corresponding to the voice data according to the saved characteristic information comprises:
retrieving preset audio data that the user requests to play;
converting, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
In addition, to solve the above problem, the present invention also provides a speech simulation apparatus, comprising an acquisition module, an extraction module, a generation module, and a playing module;
the acquisition module is configured to obtain voice data of a user;
the extraction module is configured to parse the voice data, and to extract and save characteristic information of the voice data;
the generation module is configured to generate, according to the saved characteristic information, simulated audio data corresponding to the voice data;
the playing module is configured to play the simulated audio data.
Preferably, the apparatus further comprises a setting module and a reminding module;
the acquisition module is further configured to obtain a speech simulation request of the user;
the setting module is configured to set, according to the speech simulation request, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's voice data;
the reminding module is configured to prompt the user to start collecting the voice data.
Preferably, the apparatus further comprises a parsing module;
the parsing module is configured to parse each frame of the voice data after the voice data is obtained;
the extraction module is further configured to extract phoneme feature values corresponding to the voice data as the characteristic information.
Preferably, the apparatus further comprises a judge module;
the judge module is configured to judge whether a user identifier corresponding to the user already exists for the user's speech simulation request;
the playing module is further configured, if so, to retrieve the simulated audio data corresponding to the user identifier and play it;
the setting module is further configured, if not, to perform the step of setting, according to the speech simulation request, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's voice data.
Preferably, the apparatus further comprises a retrieval module and a conversion module;
the retrieval module is configured to retrieve preset audio data that the user requests to play;
the conversion module is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
The present invention provides a speech simulation method and apparatus. In the method, the obtained user voice data is parsed and characteristic information is extracted; simulated audio data corresponding to the voice data is then generated from the characteristic information and played. The present invention parses speech algorithmically to extract feature data, and then uses the same phonemes and intonation as the user to interact with the user or to read aloud. The simulation is effective and highly similar, with a similar vocal tone, which makes human-computer interaction feel more familiar. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot truly change the voice, have low similarity, and cannot improve adaptability and familiarity in human-computer interaction.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the speech simulation method of the present invention;
Fig. 2 is a schematic flowchart of a second embodiment of the speech simulation method of the present invention;
Fig. 3 is a schematic flowchart of a third embodiment of the speech simulation method of the present invention;
Fig. 4 is a schematic flowchart of a fourth embodiment of the speech simulation method of the present invention;
Fig. 5 is a schematic flowchart of a fifth embodiment of the speech simulation method of the present invention;
Fig. 6 is a functional block diagram of an embodiment of the speech simulation apparatus of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a speech simulation method.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the first embodiment of the speech simulation method of the present invention.
In the first embodiment, the speech simulation method includes:
Step S10: obtain the voice data of the user;
The user's speech may be obtained by recording through a microphone, or by connecting a mobile terminal to the terminal and receiving the voice information it sends.
Step S20: parse the voice data, and extract and save the characteristic information of the voice data;
The characteristic information may be saved as waveform data, or the audio may be saved frame by frame at intervals. The characteristic information may be saved to a database, which may be a cloud database, with the device that obtains the user's voice data acting as a terminal. In use, the terminal obtains the user's audio information and sends it to the cloud; the cloud analyzes the received audio information and extracts its characteristic information, including the intonation, accent, speaking rate, and frequency of the speech.
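Saving characteristic information "frame by frame" amounts to storing one feature record per frame, keyed by user. The Python sketch below uses an in-memory SQLite database from the standard library as a stand-in for the cloud database; the table name, column layout, and JSON encoding of each frame's features are assumptions, not details from the patent.

```python
import json
import sqlite3


def open_feature_store(path=":memory:"):
    """Local SQLite stand-in for the (cloud) feature database; schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frame_features ("
        " user_id TEXT NOT NULL, frame_index INTEGER NOT NULL, features TEXT NOT NULL,"
        " PRIMARY KEY (user_id, frame_index))"
    )
    return conn


def save_frames(conn, user_id, frames):
    """Save one JSON-encoded feature dict per frame, in frame order."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO frame_features VALUES (?, ?, ?)",
            [(user_id, i, json.dumps(f)) for i, f in enumerate(frames)],
        )


def load_frames(conn, user_id):
    """Load a user's per-frame features back in frame order."""
    rows = conn.execute(
        "SELECT features FROM frame_features WHERE user_id = ? ORDER BY frame_index",
        (user_id,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

Keying rows by `(user_id, frame_index)` keeps each user's frames separate, which matches the per-user storage space described for step S60.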
Step S30: generate simulated audio data corresponding to the voice data according to the saved characteristic information;
The cloud generates the corresponding simulated audio data according to the characteristic information. The simulated audio data may be produced by converting a preset, existing audio file, thereby generating simulated audio data whose intonation resembles the user's speech; alternatively, it may take the form of a speech-intonation template, so that subsequent interaction between the user and the terminal device is answered in that intonation. For example, a parent performs speech simulation on the terminal, and the terminal sends the audio file to the cloud. From the parent's audio file, the cloud generates characteristic information corresponding to the parent's voice, and from that information generates simulated audio data in the form of a speech-intonation template. When a child later interacts with the terminal by voice, the terminal can respond in the parent's voice.
Step S40: play the simulated audio data.
The present invention thus provides a speech simulation method in which the obtained user voice data is parsed and characteristic information is extracted, simulated audio data corresponding to the voice data is generated from that information, and the simulated audio data is played. The present invention parses speech algorithmically to extract feature data, and then uses the same phonemes and intonation as the user to interact with the user or to read aloud. The simulation is effective and highly similar, with a similar vocal tone, which makes human-computer interaction feel more familiar. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot truly change the voice, have low similarity, and cannot improve adaptability and familiarity in human-computer interaction.
The present invention can be applied in various settings such as prenatal education, early education, and children's education. A terminal simulates the voice of someone the child knows, such as a parent, so that the child hears audio played by the terminal in that parent's voice, for example stories or study material, or interacts with the terminal in the voice of a familiar person. This makes human-computer interaction feel more familiar to children.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the second embodiment of the speech simulation method of the present invention.
Based on the first embodiment, before step S10 the method further includes:
Step S50: obtain the speech simulation request of the user;
The user sends request information on the terminal to request speech simulation, for example by pressing a button to start the speech simulation process, or by registering and logging in with a unique account and password to submit the request, before proceeding to the next operation.
Step S60: set, according to the speech simulation request, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's voice data;
After obtaining the user's request, the terminal prepares for speech simulation. First, a user identifier is set for the user's speech simulation request. The request information may be registration information, from which a unique user identifier corresponding to the user is generated. A storage space corresponding to the user identifier is then set up to store the user's voice files, audio data, and so on.
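Step S60 needs a stable, unique identifier derived from the registration information plus a storage location bound to it. The Python sketch below is one hypothetical scheme, not the patent's: it hashes the normalized account name with SHA-256 (the password is deliberately not used) and creates a per-user directory with assumed `voice` and `simulated` subfolders. Calling it again for the same account reuses the existing folder.

```python
import hashlib
import tempfile
from pathlib import Path


def make_user_id(account: str) -> str:
    """Derive a stable user identifier from the registered account name (hypothetical scheme)."""
    normalized = account.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def allocate_storage(base_dir: Path, user_id: str) -> Path:
    """Create, or reuse, the per-user storage space for voice files and audio data."""
    user_dir = base_dir / user_id
    for sub in ("voice", "simulated"):  # assumed layout: recordings vs. generated audio
        (user_dir / sub).mkdir(parents=True, exist_ok=True)
    return user_dir


# Usage in a throwaway directory:
base = Path(tempfile.mkdtemp())
uid = make_user_id("Parent01")
user_dir = allocate_storage(base, uid)
```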
Step S70: prompt the user to start collecting the voice data.
The terminal prompts the user that voice data collection can begin. The prompt may be given by voice or vibration, or as a message notification on a mobile device.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the third embodiment of the speech simulation method of the present invention.
Based on the first embodiment, in the third embodiment, step S20 includes:
Step S21: after the voice data is obtained, parse each frame of the voice data;
Sound is essentially a waveform. Common MP3 files use a compressed format and must be converted to an uncompressed format for processing, such as a Windows PCM file, commonly known as a WAV file. After the user's voice data is stored in WAV format, the waveform of the WAV file is read, and the silent portions at both ends are first trimmed off; this is also called VAD (voice activity detection). The sound is then analyzed by cutting it into short segments, each called a frame, using a moving window function. Adjacent frames may overlap: for example, with a 25 ms frame taken every 10 ms, each pair of adjacent frames overlaps by 25 - 10 = 15 ms. This is described as framing with a frame length of 25 ms and a frame shift of 10 ms. After framing, the speech becomes a sequence of short segments.
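The read, trim, and frame procedure of step S21 can be sketched with the Python standard library alone. In the sketch below, the VAD is a crude energy threshold on 10 ms blocks (the 0.01 RMS threshold is an assumption), and the "moving window function" is assumed to be a Hamming window, a common choice that the patent does not name. The 25 ms frame length and 10 ms frame shift follow the text; at 16 kHz they are 400 and 160 samples.

```python
import math
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file into floats in [-1, 1); returns (samples, rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    return [v / 32768.0 for v in struct.unpack("<%dh" % count, raw[:2 * count])], rate


def trim_silence(samples, sample_rate=16000, block_ms=10, threshold=0.01):
    """Crude energy VAD: drop leading/trailing blocks whose RMS is below threshold."""
    block = int(sample_rate * block_ms / 1000)
    blocks = [samples[i:i + block] for i in range(0, len(samples), block)]
    loud = [math.sqrt(sum(x * x for x in b) / len(b)) >= threshold for b in blocks]
    if not any(loud):
        return []
    first = loud.index(True)
    last = len(loud) - 1 - loud[::-1].index(True)
    return samples[first * block:(last + 1) * block]


def frame_signal(samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split into overlapping Hamming-windowed frames (25 ms long, 10 ms shift)."""
    length = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * shift_ms / 1000)     # 160 samples at 16 kHz
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    frames = []
    for start in range(0, len(samples) - length + 1, hop):
        frames.append([s * w for s, w in zip(samples[start:start + length], window)])
    return frames
```

Consecutive frames start 160 samples apart but span 400 samples, so each pair shares 240 samples, i.e. the 15 ms overlap described above.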
Step S22: extract the phoneme feature values corresponding to the voice data as the characteristic information.
As described above, the phoneme feature values corresponding to each frame of voice data are extracted; these values may include waveform features and serve as the characteristic information.
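The patent does not say which per-frame values are computed. As an assumption, the Python sketch below uses two simple features in place of phoneme-level features such as MFCCs: log energy and an autocorrelation pitch estimate searched between 70 and 400 Hz. A frame whose best normalized autocorrelation falls below an assumed voicing threshold of 0.3 is treated as unvoiced (pitch 0). The per-frame values are then summarized into a mean pitch and a voiced ratio, which is one hypothetical form the saved "characteristic information" could take.

```python
import math


def frame_energy(frame):
    """Log energy of one frame (a small floor avoids log(0) on silence)."""
    return math.log10(sum(x * x for x in frame) + 1e-12)


def frame_pitch(frame, sample_rate=16000, fmin=70.0, fmax=400.0, voicing=0.3):
    """Autocorrelation pitch estimate in Hz; returns 0.0 for frames judged unvoiced."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    best_lag, best_r = 0, 0.0
    for lag in range(int(sample_rate / fmax), int(sample_rate / fmin) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r / r0 < voicing:
        return 0.0
    return sample_rate / best_lag


def frame_features(frames, sample_rate=16000):
    """One feature dict per frame."""
    return [{"energy": frame_energy(f), "f0": frame_pitch(f, sample_rate)} for f in frames]


def summarize(features):
    """Collapse per-frame features into utterance-level statistics."""
    voiced = [f["f0"] for f in features if f["f0"] > 0]
    return {
        "mean_f0": sum(voiced) / len(voiced) if voiced else 0.0,
        "voiced_ratio": len(voiced) / len(features) if features else 0.0,
    }
```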
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the fourth embodiment of the speech simulation method of the present invention.
Based on the second embodiment, in the fourth embodiment, after step S50 the method further includes:
Step S80: judge whether a user identifier corresponding to the user already exists for the user's speech simulation request;
In this step, after the speech simulation request is obtained, the request is first analyzed to judge whether the user has previously performed speech simulation through the terminal, i.e., whether a user identifier corresponding to the request has already been saved. This judgment may be made on the terminal, or the terminal may send the request to the cloud, where it is matched against the cloud database.
Step S90: if so, retrieve the simulated audio data corresponding to the user identifier and play it;
When the database contains a user identifier corresponding to the user's speech simulation request, no further speech data analysis is performed; the simulated audio data corresponding to the user identifier is retrieved directly and used for playback or for interaction with the user.
If not, perform the step of setting, according to the speech simulation request, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's voice data.
If neither the cloud database nor the terminal holds a user identifier corresponding to the user's speech simulation request, a new user identifier must be created for the user and storage space allocated, in preparation for saving the user's speech.
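The branch in steps S80 and S90 is a cache lookup keyed by user identifier. The Python sketch below uses an in-memory dictionary as a hypothetical stand-in for the terminal or cloud database. One detail the text leaves open is a user who has been registered but has no simulated audio yet; this sketch treats that case like a new user and signals that recording should start.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimulationRegistry:
    """In-memory stand-in for the terminal/cloud lookup of steps S80-S90 (hypothetical)."""
    simulated_audio: Dict[str, List[float]] = field(default_factory=dict)

    def handle_request(self, user_id: str) -> Optional[List[float]]:
        """Return stored simulated audio if it exists (S90); otherwise register the user (S60)."""
        audio = self.simulated_audio.get(user_id)
        if audio:
            return audio  # play directly, no further speech analysis
        self.simulated_audio.setdefault(user_id, [])  # new identifier and empty storage slot
        return None  # caller should prompt the user to start recording (S70)
```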
Referring to Fig. 5, Fig. 5 is a schematic flowchart of the fifth embodiment of the speech simulation method of the present invention.
Based on the first embodiment, step S30 includes:
Step S31: retrieve the preset audio data that the user requests to play;
During voice interaction, the preset audio data that the user specifies for playback is retrieved. The preset audio data may be a default audio file stored in the cloud, such as a pre-recorded audio story or study content; it may also be the response to an instruction that the user issues by voice, as determined by an algorithm. This process may be on-demand playback, or playback of the corresponding preset audio data file according to the speech simulation data.
Step S32: convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
According to the characteristic information, the preset audio data requested by the user is converted into simulated audio data; alternatively, for a voice instruction issued by the user, the corresponding preset audio data retrieved by the algorithm is fed back in the form of simulated audio data. For example, after a parent has performed speech simulation through the terminal and simulated audio data has been generated, when a child interacts with the terminal, the terminal responds in the simulated voice of the parent.
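The patent does not describe how the conversion in step S32 is performed. As a deliberately naive illustration, the Python sketch below moves a preset recording's mean pitch toward the user's saved mean pitch by resampling with linear interpolation. This is a swapped-in technique, not the patent's method: resampling changes duration together with pitch and does not transfer timbre, so practical systems use pitch-synchronous overlap-add (PSOLA) or vocoder-based conversion instead. The function names and the mean-pitch inputs are assumptions.

```python
def resample_pitch_shift(samples, ratio):
    """Naive pitch shift: read the input `ratio` times faster using linear interpolation.

    ratio > 1 raises pitch (and shortens the clip); ratio < 1 lowers it (and lengthens it).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i < last else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out


def convert_preset(preset, preset_f0, user_f0):
    """Map a preset recording's mean pitch onto the user's saved mean pitch."""
    if preset_f0 <= 0 or user_f0 <= 0:
        return list(preset)  # no pitch information: leave the preset unchanged
    return resample_pitch_shift(preset, user_f0 / preset_f0)
```

For example, converting a preset narrated at a mean of 200 Hz for a user whose saved mean pitch is 300 Hz reads the preset 1.5 times faster, which raises the pitch but also shortens the clip to two thirds of its length.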
The present invention also provides a speech simulation apparatus.
Referring to Fig. 6, Fig. 6 is a module diagram of an embodiment of the speech simulation apparatus of the present invention.
In this embodiment, the speech simulation apparatus includes:
an acquisition module 10, an extraction module 20, a generation module 30, a playing module 40, a setting module 50, a reminding module 60, a judge module 70, a parsing module 80, a retrieval module 90, and a conversion module 100;
the acquisition module 10 is configured to obtain the voice data of the user;
the extraction module 20 is configured to parse the voice data, and to extract and save the characteristic information of the voice data;
the generation module 30 is configured to generate, according to the saved characteristic information, the simulated audio data corresponding to the voice data;
the playing module 40 is configured to play the simulated audio data;
the acquisition module 10 is further configured to obtain the speech simulation request of the user;
the setting module 50 is configured to set, according to the speech simulation request, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's voice data;
the reminding module 60 is configured to prompt the user to start collecting the voice data;
the parsing module 80 is configured to parse each frame of the voice data after the voice data is obtained;
the extraction module 20 is further configured to extract the phoneme feature values corresponding to the voice data as the characteristic information;
the judge module 70 is configured to judge whether a user identifier corresponding to the user already exists for the user's speech simulation request;
the playing module 40 is further configured, if so, to retrieve the simulated audio data corresponding to the user identifier and play it;
the setting module 50 is further configured, if not, to perform the step of setting, according to the speech simulation request, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's voice data;
the retrieval module 90 is configured to retrieve the preset audio data that the user requests to play;
the conversion module 100 is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
The present invention provides a speech simulation apparatus in which the acquisition module 10, extraction module 20, generation module 30, playing module 40, setting module 50, reminding module 60, judge module 70, parsing module 80, retrieval module 90, and conversion module 100 cooperate to parse the obtained user voice data and extract characteristic information, generate simulated audio data corresponding to the voice data from that information, and play the simulated audio data. The present invention parses speech algorithmically to extract feature data, and then uses the same phonemes and intonation as the user to interact with the user or to read aloud. The simulation is effective and highly similar, with a similar vocal tone, which makes human-computer interaction feel more familiar. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot truly change the voice, have low similarity, and cannot improve adaptability and familiarity in human-computer interaction.
Present invention can apply to a variety of occasions of prenatal culture, early education, children education, children education etc., for passing through terminal-pair children The simulation of the sound of known such as father and mother, is made children obtain the audio played with father and mother's sound of terminal plays, for example, says event Thing, study etc., or interacted with children by the sound of people known to children, improve the cordial feeling of children's man-machine interaction.
The above are only preferred embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, and any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (10)

1. A speech simulation method, characterized by comprising the following steps:
Obtaining voice data of a user;
Parsing the voice data, and extracting and saving characteristic information of the voice data;
Generating, according to the saved characteristic information, simulated audio data corresponding to the voice data;
Playing the simulated audio data.
2. The speech simulation method according to claim 1, characterized in that, before obtaining the voice data of the user, the method further comprises:
Obtaining speech simulation request information of the user;
Setting, according to the speech simulation request information, a user identifier corresponding to the user and a storage space that corresponds to the user identifier and is used to store the user's voice data;
Prompting the user to start recording the voice data.
3. The speech simulation method according to claim 1 or 2, characterized in that parsing the voice data and extracting the characteristic information of the voice data comprises:
After the voice data is obtained, parsing each frame of the voice data;
Extracting phoneme feature values corresponding to the voice data as the characteristic information.
4. The speech simulation method according to claim 2, characterized in that, after obtaining the speech simulation request information of the user, the method further comprises:
Determining whether the speech simulation request information of the user carries a user identifier corresponding to the user;
If so, retrieving the simulated audio data corresponding to the user identifier and playing it;
If not, performing the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space that corresponds to the user identifier and is used to store the user's voice data.
5. The speech simulation method according to claim 1, characterized in that generating, according to the saved characteristic information, the simulated audio data corresponding to the voice data comprises:
Retrieving preset audio data that the user requests to play;
Converting, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
6. A speech simulation device, characterized by comprising: an acquisition module, an extraction module, a generation module and a playing module;
The acquisition module is configured to obtain voice data of a user;
The extraction module is configured to parse the voice data, and to extract and save characteristic information of the voice data;
The generation module is configured to generate, according to the saved characteristic information, simulated audio data corresponding to the voice data;
The playing module is configured to play the simulated audio data.
7. The speech simulation device according to claim 6, characterized by further comprising: a setting module and a reminding module;
The acquisition module is further configured to obtain speech simulation request information of the user;
The setting module is configured to set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space that corresponds to the user identifier and is used to store the user's voice data;
The reminding module is configured to prompt the user to start recording the voice data.
8. The speech simulation device according to claim 7, characterized by further comprising: a parsing module;
The parsing module is configured to parse each frame of the voice data after the voice data is obtained;
The extraction module is further configured to extract phoneme feature values corresponding to the voice data as the characteristic information.
9. The speech simulation device according to claim 8, characterized by further comprising: a judging module;
The judging module is configured to determine whether the speech simulation request information of the user carries a user identifier corresponding to the user;
The playing module is further configured to, if so, retrieve the simulated audio data corresponding to the user identifier and play it;
The setting module is further configured to, if not, perform the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space that corresponds to the user identifier and is used to store the user's voice data.
10. The speech simulation device according to claim 9, characterized by comprising: a retrieval module and a conversion module;
The retrieval module is configured to retrieve preset audio data that the user requests to play;
The conversion module is configured to convert, according to the saved characteristic information, the preset audio data into the simulated audio data corresponding to the voice data.
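Claims 2, 4, 7 and 9 describe a branch on whether a request already carries a known user identifier. A minimal Python sketch of that control flow follows; the class and method names, the in-memory dictionary standing in for the per-user storage space, and the `simulate` callback standing in for the extraction and generation modules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class UserRecord:
    """Storage space bound to one user identifier (claims 2 and 7)."""
    voice_data: List[float] = field(default_factory=list)
    simulated_audio: Optional[List[float]] = None


class SpeechSimulationService:
    """Hypothetical request handler mirroring the branch in claims 4 and 9."""

    def __init__(self) -> None:
        self._store: Dict[str, UserRecord] = {}
        self.prompts: List[str] = []  # messages a reminding module would show

    def handle_request(self, user_id: str) -> Optional[List[float]]:
        # Judging step: does the request carry a known user identifier?
        if user_id in self._store:
            # Yes: recall the stored simulated audio so it can be played
            return self._store[user_id].simulated_audio
        # No: set the identifier and its storage space, then prompt recording
        self._store[user_id] = UserRecord()
        self.prompts.append(f"{user_id}: please start recording")
        return None

    def save_recording(self, user_id: str, samples: List[float],
                       simulate: Callable[[List[float]], List[float]]) -> None:
        # `simulate` is a placeholder for the extraction and generation modules
        record = self._store[user_id]
        record.voice_data = list(samples)
        record.simulated_audio = simulate(record.voice_data)


service = SpeechSimulationService()
first = service.handle_request("parent-01")   # new identifier: user is prompted
service.save_recording("parent-01", [0.25, 0.5], simulate=lambda s: [2 * x for x in s])
second = service.handle_request("parent-01")  # known identifier: audio recalled
```

Keeping the judging step as a simple membership test on the identifier means a returning user skips recording entirely, which matches the "if so, retrieve and play" branch; a real device would persist the store rather than keep it in memory.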

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710260306.4A CN107093421A (en) 2017-04-20 2017-04-20 A kind of speech simulation method and apparatus

Publications (1)

Publication Number Publication Date
CN107093421A true CN107093421A (en) 2017-08-25

Family

ID=59638527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710260306.4A Pending CN107093421A (en) 2017-04-20 2017-04-20 A kind of speech simulation method and apparatus

Country Status (1)

Country Link
CN (1) CN107093421A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6101470A (en) * 1998-05-26 2000-08-08 International Business Machines Corporation Methods for generating pitch and duration contours in a text to speech system
CN1731509A (en) * 2005-09-02 2006-02-08 清华大学 Mobile speech synthesis method
CN104867489A (en) * 2015-04-27 2015-08-26 苏州大学张家港工业技术研究院 Method and system for simulating reading and pronunciation of real person
CN105425953A (en) * 2015-11-02 2016-03-23 小天才科技有限公司 Man-machine interaction method and system
CN106328139A (en) * 2016-09-14 2017-01-11 努比亚技术有限公司 Voice interaction method and voice interaction system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481735A (en) * 2017-08-28 2017-12-15 中国移动通信集团公司 A kind of method, server and the computer-readable recording medium of transducing audio sounding
CN108364658A (en) * 2018-03-21 2018-08-03 冯键能 Cyberchat method and server-side
CN108806699A (en) * 2018-05-30 2018-11-13 Oppo广东移动通信有限公司 Voice feedback method, apparatus, storage medium and electronic equipment
US20240029710A1 (en) * 2018-06-19 2024-01-25 Georgetown University Method and System for a Parametric Speech Synthesis
CN110415680A (en) * 2018-09-05 2019-11-05 满金坝(深圳)科技有限公司 A kind of simultaneous interpretation method, synchronous translation apparatus and a kind of electronic equipment
CN110415680B (en) * 2018-09-05 2022-10-04 梁志军 Simultaneous interpretation method, simultaneous interpretation device and electronic equipment
WO2020052665A1 (en) * 2018-09-12 2020-03-19 咪咕音乐有限公司 Live broadcast interaction method and apparatus, and storage medium
CN109215629A (en) * 2018-11-22 2019-01-15 Oppo广东移动通信有限公司 Method of speech processing, device and terminal
CN109215629B (en) * 2018-11-22 2021-01-01 Oppo广东移动通信有限公司 Voice processing method and device and terminal
CN109697290A (en) * 2018-12-29 2019-04-30 咪咕数字传媒有限公司 A kind of information processing method, equipment and computer storage medium
CN112786026A (en) * 2019-12-31 2021-05-11 深圳市木愚科技有限公司 Parent-child story personalized audio generation system and method based on voice migration learning
CN113223493A (en) * 2020-01-20 2021-08-06 Tcl集团股份有限公司 Voice nursing method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170825