CN107093421A

CN107093421A - Speech simulation method and apparatus

Info

Publication number: CN107093421A
Application number: CN201710260306.4A
Authority: CN
Inventors: 王斌
Original assignee: Shenzhen Yifang Digital Technology Co Ltd
Current assignee: Shenzhen Yifang Digital Technology Co Ltd
Priority date: 2017-04-20
Filing date: 2017-04-20
Publication date: 2017-08-25

Abstract

The invention provides a speech simulation method and apparatus. The method comprises the following steps: obtaining audio data of a user; parsing the audio data, extracting characteristic information of the audio data, and saving it; generating simulated audio data corresponding to the audio data according to the saved characteristic information; and playing the simulated audio data. The invention parses speech by means of an algorithm and extracts its characteristic data, then uses the same phonemes and intonation as the user to interact with the user or read aloud. The simulation effect is good, the similarity is high, and the tone is similar, which improves the sense of familiarity in human-computer interaction. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

Description

Speech simulation method and apparatus

Technical field

The present invention relates to the technical field of speech signals, and more particularly to a speech simulation method and apparatus.

Background

Speech, the material shell of language, is the carrier of the linguistic symbol system. It is produced by the human vocal organs and carries a certain linguistic meaning. The physical basis of speech consists mainly of pitch, intensity, duration, and timbre, which are also the four elements that constitute speech.

Speech is the sound of language and the carrier of the linguistic symbol system. It is produced by the human vocal organs and carries a certain linguistic meaning. Language fulfils its social function through speech. Language is a system of symbols that combines sound and meaning, and the sound of language and the meaning of language are closely connected. Therefore, although speech is a kind of sound, it differs in essence from sound in general. Speech is sound produced by the human vocal organs that serves to distinguish meaning, so it cannot be regarded as a purely natural substance. Speech is the symbol system that most directly records thought, and it is the sound form of language as a tool of communication.

The physical basis of speech consists mainly of pitch, intensity, duration, and timbre, which are also the four elements that constitute speech. Pitch refers to the frequency of the sound wave, that is, the number of vibrations per second. Intensity refers to the amplitude of the sound wave. Duration refers to the length of time the sound wave vibrates, also called "length". Timbre refers to the characteristic quality of the sound, also called "tone quality".
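As a rough illustration of three of these four elements, the following Python sketch estimates pitch, intensity, and duration for a synthetic tone. The function name `describe_tone`, the zero-crossing pitch estimate, and the RMS measure of intensity are illustrative assumptions rather than part of the disclosure. Timbre requires spectral analysis and is left out.

```python
import math


def describe_tone(samples, sample_rate):
    """Estimate pitch, intensity and duration of a mono signal (illustrative only)."""
    duration_s = len(samples) / sample_rate
    # Intensity: root-mean-square amplitude of the waveform.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Pitch: count upward zero crossings, roughly one per vibration cycle.
    # This is only meaningful for clean, single-tone signals.
    cycles = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return {
        "pitch_hz": cycles / duration_s,
        "intensity_rms": rms,
        "duration_s": duration_s,
    }


# Assumed test signal: a 200 Hz sine wave, one second long, sampled at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
info = describe_tone(tone, sample_rate)
```

For this test tone the estimated pitch comes out close to 200 Hz and the RMS intensity close to 0.5 divided by the square root of 2. Real speech would need a more robust pitch estimator.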

The human vocal organs and their activity form the physiological basis of speech. The human vocal organs are divided into three parts:

(1) The respiratory organs, including the lungs, trachea, and bronchi. The lungs are the centre of the respiratory organs and supply the power for producing speech.

(2) The larynx and vocal cords, which are the vibrating bodies of phonation.

(3) The oral cavity, pharyngeal cavity, and nasal cavity, all of which act as resonators for phonation.

The link between sound and meaning is established by convention through long-term language practice. This combination of sound and meaning shows that speech has an important social property.

Speech simulation can improve the sense of familiarity and adaptability in human-computer interaction. However, existing human-voice simulation methods are ordinary voice changers: they can only simulate through a vocal-tract model after speech recognition, or can only adjust speaking rate and intonation, and the resulting timbre cannot be compared with the voice of the person being imitated. In short, existing speech simulation methods can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

The above content is only intended to aid understanding of the technical solution of the present invention, and does not constitute an admission that the above content is prior art.

Summary of the invention

The main object of the present invention is to provide a speech simulation method and apparatus, intended to solve the problem that existing speech simulation methods can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

To solve the above problem, the present invention provides a speech simulation method comprising the following steps:

obtaining audio data of a user;

parsing the audio data, extracting characteristic information of the audio data, and saving it;

generating simulated audio data corresponding to the audio data according to the saved characteristic information;

playing the simulated audio data.

Preferably, before obtaining the audio data of the user, the method further comprises:

obtaining speech simulation request information of the user;

setting, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

prompting the user to start collecting the audio data.

Preferably, parsing the audio data and extracting the characteristic information of the audio data comprises:

after the audio data is obtained, parsing each frame of the audio data;

extracting phoneme feature values corresponding to the audio data as the characteristic information.

Preferably, after obtaining the speech simulation request information of the user, the method further comprises:

judging whether the speech simulation request information of the user has a user identifier corresponding to the user;

if so, retrieving the simulated audio data corresponding to the user identifier and playing it;

if not, performing the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's audio data.

Preferably, generating the simulated audio data corresponding to the audio data according to the saved characteristic information comprises:

retrieving preset audio data that the user requests to be played;

converting the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.

In addition, to solve the above problem, the present invention also provides a speech simulation apparatus, comprising an acquisition module, an extraction module, a generation module, and a playing module;

the acquisition module is configured to obtain audio data of a user;

the extraction module is configured to parse the audio data, extract characteristic information of the audio data, and save it;

the generation module is configured to generate simulated audio data corresponding to the audio data according to the saved characteristic information;

the playing module is configured to play the simulated audio data.

Preferably, the apparatus further comprises a setting module and a reminding module;

the acquisition module is further configured to obtain speech simulation request information of the user;

the setting module is configured to set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

the reminding module is configured to prompt the user to start collecting the audio data.

Preferably, the apparatus further comprises a parsing module;

the parsing module is configured to parse each frame of the audio data after the audio data is obtained;

the extraction module is further configured to extract phoneme feature values corresponding to the audio data as the characteristic information.

Preferably, the apparatus further comprises a judging module;

the judging module is configured to judge whether the speech simulation request information of the user has a user identifier corresponding to the user;

the playing module is further configured to, if so, retrieve the simulated audio data corresponding to the user identifier and play it;

the setting module is further configured to, if not, perform the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's audio data.

Preferably, the apparatus comprises a retrieval module and a conversion module;

the retrieval module is configured to retrieve preset audio data that the user requests to be played;

the conversion module is configured to convert the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.

The present invention provides a speech simulation method and apparatus. The method parses the obtained user audio data and extracts characteristic information, generates simulated audio data corresponding to the audio data from the characteristic information, and then plays the simulated audio data. The invention parses speech by means of an algorithm and extracts its characteristic data, then uses the same phonemes and intonation as the user to interact with the user or read aloud. The simulation effect is good, the similarity is high, and the tone is similar, which improves the sense of familiarity in human-computer interaction. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a first embodiment of the speech simulation method of the present invention;

Fig. 2 is a schematic flowchart of a second embodiment of the speech simulation method of the present invention;

Fig. 3 is a schematic flowchart of a third embodiment of the speech simulation method of the present invention;

Fig. 4 is a schematic flowchart of a fourth embodiment of the speech simulation method of the present invention;

Fig. 5 is a schematic flowchart of a fifth embodiment of the speech simulation method of the present invention;

Fig. 6 is a functional block diagram of an embodiment of the speech simulation apparatus of the present invention.

The realisation of the object, the functional characteristics, and the advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.

Detailed description of the embodiments

It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.

The present invention provides a speech simulation method.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a first embodiment of the speech simulation method of the present invention.

In the first embodiment, the speech simulation method includes:

Step S10: obtain audio data of a user;

It should be understood that speech, the material shell of language, is the carrier of the linguistic symbol system. It is produced by the human vocal organs and carries a certain linguistic meaning. The physical basis of speech consists mainly of pitch, intensity, duration, and timbre, which are also the four elements that constitute speech.

The user's speech may be obtained by recording through a microphone, or by connecting a mobile terminal to the terminal and receiving the voice information sent from it.
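As a minimal sketch of this acquisition step, the following Python code assumes that the recording reaches the terminal as a 16-bit mono WAV file, which is the uncompressed format the description names for processing. The helper names `write_wav_mono16` and `read_wav_mono16` are hypothetical. The writer only stands in for a microphone so that the example is self-contained.

```python
import array
import os
import sys
import tempfile
import wave


def write_wav_mono16(path, samples, sample_rate):
    """Write 16-bit mono PCM samples to a WAV file (stands in for a recording)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())


def read_wav_mono16(path):
    """Return (samples, sample_rate) from a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    data = array.array("h")
    data.frombytes(raw)
    if sys.byteorder == "big":
        data.byteswap()
    return list(data), sample_rate


# Round trip through a temporary file in place of a real microphone recording.
with tempfile.TemporaryDirectory() as folder:
    wav_path = os.path.join(folder, "user_voice.wav")
    write_wav_mono16(wav_path, [0, 1000, -1000, 32767, -32768], 16000)
    user_samples, user_rate = read_wav_mono16(wav_path)
```

Only the standard library is used here. Capturing live audio from a microphone would need a platform-specific audio library, which this sketch deliberately avoids.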

Step S20: parse the audio data, extract characteristic information of the audio data, and save it;

The saved characteristic information may be stored as waveform data, or the audio may be stored at intervals in the form of frames. The characteristic information is saved to a database, which may be a cloud database, while the device used to obtain the user's audio data is a terminal. In use, the terminal obtains the user's audio information and sends it to the cloud. The cloud analyses the received audio information and extracts its characteristic information, including the intonation, accent, speaking rate, and frequency of the speech.
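One possible way to save and reload such characteristic information is sketched below in Python. The choice of SQLite, the table name `voice_features`, the JSON encoding, and the feature keys are all assumptions made for illustration. The description only requires that the features be saved to a database, which may be in the cloud.

```python
import json
import sqlite3


def open_feature_db(path=":memory:"):
    """Open the feature database (in memory by default) and create the table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_features ("
        "user_id TEXT PRIMARY KEY, features TEXT NOT NULL)"
    )
    return conn


def save_features(conn, user_id, features):
    """Store one user's characteristic information as a JSON document."""
    conn.execute(
        "INSERT OR REPLACE INTO voice_features (user_id, features) VALUES (?, ?)",
        (user_id, json.dumps(features)),
    )
    conn.commit()


def load_features(conn, user_id):
    """Return the saved features for user_id, or None if nothing is stored."""
    row = conn.execute(
        "SELECT features FROM voice_features WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_feature_db()
# Hypothetical feature values; the patent does not define a concrete schema.
save_features(conn, "user-001", {"speech_rate": 4.2, "mean_pitch_hz": 210.0})
restored = load_features(conn, "user-001")
```

In a terminal-plus-cloud deployment, the same two functions would sit behind whatever network interface the cloud exposes. That interface is outside the scope of this sketch.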

Step S30: generate simulated audio data corresponding to the audio data according to the saved characteristic information;

The cloud generates the corresponding simulated audio data according to the characteristic information. The simulated audio data may be obtained by converting a preset existing audio file, producing simulated audio data similar to the user's speech intonation. Alternatively, a speech-intonation form may be generated, and feedback is then given in that form according to the interaction between the user and the terminal device. For example, parents perform speech simulation on the terminal, and the terminal sends the audio file to the cloud. After receiving it, the cloud generates characteristic information corresponding to the parents' voice from the audio file, and then generates simulated audio data in speech-intonation form from the characteristic information. When a child then interacts with the terminal by voice, the terminal can respond in the parents' voice.

Step S40: play the simulated audio data.
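The control flow of steps S10 to S40 can be sketched as follows. This is a toy under stated assumptions: the only "characteristic information" kept is the peak amplitude, and "generation" merely rescales preset audio to that peak. Neither stand-in is the conversion algorithm of the invention, which the description does not specify, and all function names are hypothetical.

```python
def extract_features(samples):
    """S20 stand-in: keep only the peak amplitude as the characteristic information."""
    return {"peak": max(abs(s) for s in samples)}


def generate_simulated(preset, features):
    """S30 stand-in: rescale preset audio so that its peak matches the stored peak."""
    preset_peak = max(abs(s) for s in preset) or 1.0
    gain = features["peak"] / preset_peak
    return [s * gain for s in preset]


def run_pipeline(user_samples, preset, feature_store, play):
    """Run S20 to S40 on audio that has already been obtained in S10."""
    features = extract_features(user_samples)    # S20: parse and extract
    feature_store["features"] = features         # S20: save
    simulated = generate_simulated(preset, feature_store["features"])  # S30
    play(simulated)                              # S40: hand over to the player
    return simulated


played = []   # stands in for an audio output device
store = {}    # stands in for the feature database
result = run_pipeline([0.1, -0.4, 0.2], [1.0, -2.0, 0.5], store, played.append)
```

Passing `play` in as a callable keeps the sketch independent of any audio device. A real terminal would substitute its own playback routine.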

The present invention provides a speech simulation method that parses the obtained user audio data and extracts characteristic information, generates simulated audio data corresponding to the audio data from the characteristic information, and then plays the simulated audio data. The invention parses speech by means of an algorithm and extracts its characteristic data, then uses the same phonemes and intonation as the user to interact with the user or read aloud. The simulation effect is good, the similarity is high, and the tone is similar, which improves the sense of familiarity in human-computer interaction. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

The present invention can be applied in many settings, such as prenatal education, early education, preschool education, and children's education. The terminal simulates the voice of a person familiar to the child, such as a parent, so that the child hears audio played by the terminal in the parent's voice, for example storytelling or lessons, or interacts with the terminal through the voice of a familiar person. This improves the sense of familiarity in the child's human-computer interaction.

Referring to Fig. 2, Fig. 2 is a schematic flowchart of a second embodiment of the speech simulation method of the present invention.

Based on the first embodiment, before step S10, the method further includes:

Step S50: obtain speech simulation request information of the user;

The user sends request information on the terminal to request speech simulation, for example by pressing a button to start the speech simulation flow, or by registering or logging in with a unique account and password, thereby issuing the request so that the next step can proceed.

Step S60: set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

After obtaining the user's request information, the terminal begins preparing for speech simulation. It first sets a user identifier for the user's speech simulation request information. The request information may be registration information, from which a unique user identifier corresponding to the user is generated. It then sets up a storage space corresponding to the user identifier for storing the user's voice files, audio data, and so on.
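Step S60 could be sketched along the following lines. Deriving the identifier by hashing an account name and using a directory as the storage space are assumptions made for illustration, since the description does not say how the identifier is generated or how the storage is organised. The names `make_user_id` and `allocate_storage` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def make_user_id(account_name):
    """Derive a stable identifier from registration information (assumed: an account name)."""
    return hashlib.sha256(account_name.encode("utf-8")).hexdigest()[:16]


def allocate_storage(root, user_id):
    """Create (or reuse) the directory that will hold this user's audio data."""
    space = Path(root) / user_id
    space.mkdir(parents=True, exist_ok=True)
    return space


# A temporary directory stands in for the terminal or cloud storage root.
with tempfile.TemporaryDirectory() as root:
    user_id = make_user_id("parent@example.com")
    space = allocate_storage(root, user_id)
    space_exists = space.is_dir()
    space_name = space.name
```

Because the identifier is a deterministic function of the account name, the same user always maps to the same storage space, which is what later lookups rely on.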

Step S70: prompt the user to start collecting the audio data.

The terminal prompts the user that collection of audio data can begin. The prompt may be given by voice or vibration, or as a message through a mobile device.

Referring to Fig. 3, Fig. 3 is a schematic flowchart of a third embodiment of the speech simulation method of the present invention.

Based on the first embodiment, in the third embodiment, step S20 includes:

Step S21: after the audio data is obtained, parse each frame of the audio data;

Sound is in fact a waveform. The common MP3 format is compressed and must be converted to an uncompressed format for processing, such as a Windows PCM file, commonly known as a WAV file. After the user's audio data is stored in WAV format, the waveform of the WAV file is read. The silent parts at both ends can first be cut off to eliminate blank segments; this is known as VAD (voice activity detection). Speech analysis is then carried out: the sound is cut into small segments, each of which is called a frame, using a moving window function. Adjacent frames may overlap. Specifically, each frame may be 25 ms long with a 10 ms frame shift, so that every two adjacent frames overlap by 25 - 10 = 15 ms. This is referred to as framing with a frame length of 25 ms and a frame shift of 10 ms. After framing, the speech becomes a sequence of small segments.
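The trimming and framing described above can be sketched in Python as follows. The amplitude threshold used for trimming is a deliberately crude stand-in for voice activity detection, and the function names are hypothetical. The 25 ms frame length and 10 ms frame shift are the values given in the description.

```python
def trim_silence(samples, threshold):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def split_frames(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Cut the signal into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    last_start = len(samples) - frame_len
    # Only full frames are kept; a trailing partial frame is discarded.
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, shift)]


signal = list(range(1600))             # 100 ms of dummy samples at 16 kHz
frames = split_frames(signal, 16000)   # 400-sample frames, 160-sample shift
```

At 16 kHz a 25 ms frame holds 400 samples and a 10 ms shift is 160 samples, so consecutive frames share 240 samples, which is the 15 ms overlap stated in the text.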

Step S22: extract phoneme feature values corresponding to the audio data as the characteristic information.

In the above step, the phoneme feature value corresponding to each frame of audio data is extracted. The phoneme feature value may include waveform features and serves as the characteristic information.
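The description does not say which waveform features make up a phoneme feature value. As one hedged possibility, the sketch below applies a Hamming window to each frame and records its short-time energy and peak amplitude. The window choice, the two features, and the function names are assumptions made purely for illustration.

```python
import math


def hamming(n):
    """Return an n-point Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_features(frame):
    """Compute simple waveform features for one frame (assumed feature set)."""
    window = hamming(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    # Short-time energy of the windowed frame.
    energy = sum(x * x for x in windowed) / len(windowed)
    # Peak amplitude of the raw frame.
    peak = max(abs(s) for s in frame)
    return {"energy": energy, "peak": peak}


# Two tiny hand-made frames in place of real 25 ms frames.
features = [frame_features(f) for f in [[1.0, 1.0, 1.0, 1.0], [0.0, 0.5, -0.5, 0.0]]]
```

A practical system would more likely use spectral features per frame, but the per-frame structure of the computation would be the same.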

Referring to Fig. 4, Fig. 4 is a schematic flowchart of a fourth embodiment of the speech simulation method of the present invention.

Based on the second embodiment, in the fourth embodiment, after step S50, the method further includes:

Step S80: judge whether the speech simulation request information of the user has a user identifier corresponding to the user;

In the above step, after the speech simulation request information is obtained, it is first analysed to judge whether the user has previously performed speech simulation through the terminal, that is, whether a user identifier corresponding to the request information has been saved. This judgement may be made on the terminal, or the terminal's request information may be sent to the cloud and matched against the speech simulation request information in the cloud database.

Step S90: if so, retrieve the simulated audio data corresponding to the user identifier and play it;

When the database contains a user identifier corresponding to the user's speech simulation request information, no further speech data analysis is performed. The simulated audio data corresponding to the user identifier is retrieved directly and is played or used to interact with the user.

If neither the cloud database nor the terminal holds a user identifier corresponding to the user's speech simulation request information, a user identifier must be established for the user and storage space allocated, in preparation for saving the user's speech.
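The branch between steps S80, S90, and S60 can be sketched as a simple lookup. The dictionary `registry` stands in for the terminal or cloud database, and the record layout and function name are hypothetical.

```python
def handle_request(account, registry):
    """Return ('play', audio) for a known user, or ('collect', account) for a new one."""
    record = registry.get(account)               # S80: is there a user identifier?
    if record is not None:
        # S90: identifier found, reuse the stored simulated audio.
        return ("play", record["simulated_audio"])
    # S60: no identifier yet, so create one with empty storage.
    # The caller is then expected to prompt for a recording (S70).
    registry[account] = {"recordings": [], "simulated_audio": None}
    return ("collect", account)


registry = {"mum": {"recordings": ["mum.wav"], "simulated_audio": "mum_story.wav"}}
known = handle_request("mum", registry)
new = handle_request("dad", registry)
```

A fuller version would also check that simulated audio has actually been generated for a known identifier before trying to play it.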

Referring to Fig. 5, Fig. 5 is a schematic flowchart of a fifth embodiment of the speech simulation method of the present invention.

Based on the first embodiment, step S30 includes:

Step S31: retrieve preset audio data that the user requests to be played;

During voice interaction, the preset audio data that the user has specified for playback is retrieved. The preset audio data may be a preset audio file held in the cloud, such as a pre-recorded audio story or learning content. It may also be determined by an algorithm from an instruction that the user issues by voice. This process may be on-demand playback, or playback of the corresponding preset audio data file according to the speech simulation data.

Step S32: convert the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.

According to the characteristic information, the preset audio data requested by the user is converted into simulated audio data, or the preset audio data that the algorithm retrieves in response to the user's audio instruction is fed back in the form of simulated audio data. For example, after parents have performed speech simulation through the terminal and the simulated audio data has been generated, a child interacts with the terminal and the terminal responds by simulating the parents' voice.

The present invention also provides a speech simulation apparatus.

Referring to Fig. 6, Fig. 6 is a module diagram of an embodiment of the speech simulation apparatus of the present invention.

In this embodiment, the speech simulation apparatus includes:

an acquisition module 10, an extraction module 20, a generation module 30, a playing module 40, a setting module 50, a reminding module 60, a judging module 70, a parsing module 80, a retrieval module 90, and a conversion module 100;

the acquisition module 10 is configured to obtain audio data of a user;

the extraction module 20 is configured to parse the audio data, extract characteristic information of the audio data, and save it;

the generation module 30 is configured to generate simulated audio data corresponding to the audio data according to the saved characteristic information;

the playing module 40 is configured to play the simulated audio data.

The acquisition module 10 is further configured to obtain speech simulation request information of the user;

the setting module 50 is configured to set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

the reminding module 60 is configured to prompt the user to start collecting the audio data.

The parsing module 80 is configured to parse each frame of the audio data after the audio data is obtained;

the extraction module 20 is further configured to extract phoneme feature values corresponding to the audio data as the characteristic information.

The judging module 70 is configured to judge whether the speech simulation request information of the user has a user identifier corresponding to the user;

the playing module 40 is further configured to, if so, retrieve the simulated audio data corresponding to the user identifier and play it;

the setting module 50 is further configured to, if not, perform the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's audio data.

The retrieval module 90 is configured to retrieve preset audio data that the user requests to be played;

the conversion module 100 is configured to convert the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.

The present invention provides a speech simulation apparatus in which the acquisition module 10, extraction module 20, generation module 30, playing module 40, setting module 50, reminding module 60, judging module 70, parsing module 80, retrieval module 90, and conversion module 100 cooperate to parse the obtained user audio data and extract characteristic information, generate simulated audio data corresponding to the audio data from the characteristic information, and then play the simulated audio data. The invention parses speech by means of an algorithm and extracts its characteristic data, then uses the same phonemes and intonation as the user to interact with the user or read aloud. The simulation effect is good, the similarity is high, and the tone is similar, which improves the sense of familiarity in human-computer interaction. This avoids the problem of existing speech simulation methods, which can only achieve ordinary voice changing, cannot change the sound itself, yield low similarity, and cannot improve adaptability or the sense of familiarity in human-computer interaction.

The above are only preferred embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims

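1. A speech simulation method, characterized by comprising the following steps:

obtaining audio data of a user;

parsing the audio data, extracting characteristic information of the audio data, and saving it;

generating simulated audio data corresponding to the audio data according to the saved characteristic information;

playing the simulated audio data.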

2. The speech simulation method as claimed in claim 1, characterized in that, before obtaining the audio data of the user, the method further comprises:

obtaining speech simulation request information of the user;

setting, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

prompting the user to start collecting the audio data.

3. The speech simulation method as claimed in claim 1 or 2, characterized in that parsing the audio data and extracting the characteristic information of the audio data comprises:

after the audio data is obtained, parsing each frame of the audio data;

extracting phoneme feature values corresponding to the audio data as the characteristic information.

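4. The speech simulation method as claimed in claim 2, characterized in that, after obtaining the speech simulation request information of the user, the method further comprises:

judging whether the speech simulation request information of the user has a user identifier corresponding to the user;

if so, retrieving the simulated audio data corresponding to the user identifier and playing it;

if not, performing the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's audio data.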

5. The speech simulation method as claimed in claim 1, characterized in that generating the simulated audio data corresponding to the audio data according to the saved characteristic information comprises:

retrieving preset audio data that the user requests to be played;

converting the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.

6. A speech simulation apparatus, characterized by comprising an acquisition module, an extraction module, a generation module, and a playing module;

the acquisition module is configured to obtain audio data of a user;

the extraction module is configured to parse the audio data, extract characteristic information of the audio data, and save it;

the generation module is configured to generate simulated audio data corresponding to the audio data according to the saved characteristic information;

the playing module is configured to play the simulated audio data.

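7. The speech simulation apparatus as claimed in claim 6, characterized by further comprising a setting module and a reminding module;

the acquisition module is further configured to obtain speech simulation request information of the user;

the setting module is configured to set, according to the speech simulation request information, a user identifier corresponding to the user and a storage space, corresponding to the user identifier, for storing the user's audio data;

the reminding module is configured to prompt the user to start collecting the audio data.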

8. The speech simulation apparatus as claimed in claim 7, characterized by further comprising a parsing module;

the parsing module is configured to parse each frame of the audio data after the audio data is obtained;

the extraction module is further configured to extract phoneme feature values corresponding to the audio data as the characteristic information.

9. The speech simulation apparatus as claimed in claim 8, characterized by further comprising a judging module;

the judging module is configured to judge whether the speech simulation request information of the user has a user identifier corresponding to the user;

the playing module is further configured to, if so, retrieve the simulated audio data corresponding to the user identifier and play it;

the setting module is further configured to, if not, perform the step of setting, according to the speech simulation request information, the user identifier corresponding to the user and the storage space, corresponding to the user identifier, for storing the user's audio data.

10. The speech simulation apparatus as claimed in claim 9, characterized by comprising a retrieval module and a conversion module;

the retrieval module is configured to retrieve preset audio data that the user requests to be played;

the conversion module is configured to convert the preset audio data, according to the saved characteristic information, into the simulated audio data corresponding to the audio data.