CN106557298A - Background dubbing output method and device for intelligent robot - Google Patents
Background dubbing output method and device for intelligent robot
- Publication number
- CN106557298A (application CN201610982284.8A)
- Authority
- CN
- China
- Prior art keywords
- background
- voice
- text data
- intelligent robot
- dubbing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 35
- 230000001960 triggered effect Effects 0.000 claims description 6
- 230000035807 sensation Effects 0.000 abstract description 2
- 230000015572 biosynthetic process Effects 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 230000033764 rhythmic process Effects 0.000 description 6
- 238000003786 synthesis reaction Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 5
- 230000002452 interceptive effect Effects 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000002996 emotional effect Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000000630 rising effect Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Toys (AREA)
Abstract
The invention provides a background dubbing output method for an intelligent robot, comprising the following steps: judging the type of the voice content to be output; obtaining background dubbing audio data matching that type; and playing the background dubbing audio data while outputting the voice content. The background dubbing output method of the invention makes the user's experience of the machine converting text to voice more realistic; playing background dubbing gives listeners an immersive feeling and makes the expression more vivid.
Description
Technical field
The present invention relates to the field of intelligent robotics and, in particular, to a background dubbing output method and device for an intelligent robot.
Background technology
Current robot chat mainly works as follows: based on the result of the interaction, the computer uses TTS technology to convert the text to be output into voice and then plays it back. However, this chat interaction mode does not give the user a realistic experience. To give the user an immersive experience, a technical solution is needed that can continuously improve the interaction capability of the intelligent robot and thereby enhance the user experience.
The content of the invention
It is an object of the invention to provide a background dubbing output method for an intelligent robot that solves the above technical problem. The method of the invention comprises the following steps:
Judging the type of the voice content to be output;
Obtaining background dubbing audio data matching that type;
Playing the background dubbing audio data while outputting the voice.
According to the background dubbing output method for an intelligent robot of the invention, preferably, the background dubbing audio data is played while the voice is output and when a trigger condition is met, wherein the trigger condition includes the following cases:
When a specific statement input by the user is received, playback of the background dubbing is triggered;
Start and end times for automatically playing the background dubbing are set in the system;
The background dubbing is played at the moment the voice corresponding to the text data is played.
According to the background dubbing output method for an intelligent robot of the invention, preferably, in the step of judging the type of the voice content to be output, the type is judged according to the current application.
According to the background dubbing output method for an intelligent robot of the invention, preferably, the text data corresponding to the voice to be output is received through a dialog interface.
According to another aspect of the invention, a background dubbing output device for an intelligent robot is also provided. The device comprises the following units:
A text data receiving unit, which receives the text data corresponding to the voice to be output and analyzes the semantics of the text data;
A background dubbing search unit, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data;
An audio output unit, which plays the background dubbing audio data while outputting the voice corresponding to the text data and when a trigger condition is met.
According to the background dubbing output device for an intelligent robot of the invention, preferably, in the audio output unit that plays the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met, the trigger condition includes the following cases:
When a specific statement input by the user is received, playback of the background dubbing is triggered;
Start and end times for automatically playing the background dubbing are set in the system;
The background dubbing is played at the moment the voice corresponding to the text data is played.
According to the background dubbing output device for an intelligent robot of the invention, preferably, the background dubbing search unit, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data, further includes a judging unit for judging the voice type corresponding to the text data to be output, so as to determine the matching background music.
According to the background dubbing output device for an intelligent robot of the invention, preferably, the text data corresponding to the voice to be output is received through a dialog interface.
The invention is advantageous in that implementing the method of the invention can greatly improve the interaction capability between the intelligent robot and humans, thereby enhancing the user experience. Specifically, the background dubbing output method of the invention makes the user's experience of the machine converting text to voice more realistic, and playing background dubbing gives people an immersive feeling and makes the expression more vivid.
Other features and advantages of the invention will be illustrated in the following description and will in part become apparent from the description or be understood by implementing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Description of the drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of the specification. Together with the embodiments of the invention they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is an overall flowchart of a background dubbing output method for an intelligent robot according to an embodiment of the invention;
Fig. 2 is a detailed flowchart of a background dubbing output method for an intelligent robot according to an embodiment of the invention;
Fig. 3 is a flowchart of the trigger process of a background dubbing output method for an intelligent robot according to an embodiment of the invention; and
Fig. 4 is a structural block diagram of a background dubbing output device for an intelligent robot according to an embodiment of the invention.
Specific embodiment
To make the objects, technical solutions and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the overall flow of background dubbing according to the invention.
In this method, text input is performed first; for example, the user's input is obtained through the robot's text scanner or the like, or through touching the screen. After the robot obtains the text input, text analysis and speech synthesis are carried out. Finally, voice output is performed; the output contains the result of the TTS process together with the selected background dubbing. The details of these techniques are discussed in detail below.
As shown in Fig. 2 which show a kind of overview flow chart of background towards intelligent robot with sound outputting method.
Method starts from step S101.In this step, system judges the type of voice content to be exported.In intelligence machine person to person
When interacting, it will usually the interactive instruction of receive user first, or when some conditions meet, actively send chat language
Sound.Robot system of the invention internally receives the corresponding text data of voice to be exported, and analyzes the text
The semanteme of data.For example, the semantic content for representing for obtaining text data by analysis is recitation of poems, children's stories etc..System root
It is marked with label according to the different classifications of different voice contents.According to for voice content mark come judge be poem or
Person's children's stories.
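A minimal sketch of this label-based judgment follows; the category labels and cue words are invented assumptions, since the patent does not specify how labels are assigned:

```python
# Hypothetical label-based type judgment (step S101). Categories carry
# labels; cue words found in the text decide which label applies.

CATEGORY_CUES = {
    "poem_recitation": ["verse", "stanza", "recite"],
    "childrens_story": ["once upon a time", "fairy", "prince"],
}

def judge_voice_type(text: str, default: str = "chat") -> str:
    lowered = text.lower()
    for label, cues in CATEGORY_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return default

assert judge_voice_type("Once upon a time there was a fairy") == "childrens_story"
```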
Preferably, the analysis of the text data further comprises the following steps:
A text structure detection step, in which the structure of the input text is detected by punctuation handling, text normalization rules, word segmentation and part-of-speech tagging, pause processing and character-to-phoneme conversion;
A prosody generation step, in which parameters characterizing prosodic features are derived from the contextual information obtained by text analysis;
A unit selection step, in which, according to the phone string to be synthesized, its contextual information and the prosodic feature parameters, and following a specified criterion, an optimal group of speech units is selected from the corpus as synthesis units for waveform concatenation.
In one embodiment, system can receive the corresponding text data of voice to be exported by dialog interface.
In the TTS processing of the invention, the text must first be analyzed. To begin with, the system needs to recognize the words, segment them sensibly, and judge where pauses occur. Machine pronunciation also requires prosody generation. The parameters characterizing prosodic features include, for example, fundamental frequency, duration and energy; in the invention, the data used for prosody generation comes from the contextual information extracted by the text analysis part.
In the TTS process, unit selection is needed to pick the most suitable speech units for synthesis. Specifically, according to the pinyin string (phone string) to be synthesized, its contextual information and the prosodic information, and following a certain criterion, the system selects an optimal group of speech units from the corpus as synthesis units for waveform concatenation. The criterion is, in effect, to minimize the value of a cost function. The value of this cost function is affected by factors such as prosodic inconsistency, spectral differences, and mismatches with the context environment.
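As a hedged illustration of this criterion (the patent specifies neither features nor weights, so everything below is an assumption), the following sketch scores candidate units by a weighted sum of prosodic and context mismatch and picks the cheapest; production systems typically also include a join cost and search with dynamic programming rather than greedily:

```python
# Toy version of the unit-selection criterion: for a target phone, pick
# the corpus unit minimizing a weighted sum of prosodic mismatch and
# context mismatch. Features and weights are invented for illustration.

from dataclasses import dataclass

@dataclass
class Unit:
    phone: str
    f0: float        # fundamental frequency in Hz
    duration: float  # seconds
    context: str     # preceding phone in the corpus

def cost(u: Unit, tgt_f0: float, tgt_dur: float, tgt_ctx: str,
         w_prosody: float = 1.0, w_context: float = 0.5) -> float:
    prosody = abs(u.f0 - tgt_f0) / 100.0 + abs(u.duration - tgt_dur)
    context = 0.0 if u.context == tgt_ctx else 1.0
    return w_prosody * prosody + w_context * context

corpus = [Unit("a", 220.0, 0.12, "m"), Unit("a", 180.0, 0.20, "b")]
best = min((u for u in corpus if u.phone == "a"),
           key=lambda u: cost(u, 200.0, 0.15, "m"))
print(best)  # the first unit wins: closer prosody and matching context
```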
The last processing module of the TTS system is the waveform synthesis unit. When performing waveform synthesis, two strategies are generally adopted: splicing without prosody modification, and splicing with prosody modification.
The above outlines the text-to-speech processing of a TTS system. In the invention, however, the voice produced by TTS processing is not necessarily output directly; further processing follows.
As shown in Fig. 2 in step s 102, obtain and voice data is dubbed with the background of the type matching.When previous
The result obtained in step is that voice content is recitation of poems, then system can search the background matched with the poem in thesaurus
Music.For example, after intelligent robot is by further analyzing semanteme, after substantially having understood the style of poem, by setting
Labeling which is further marked.Then by search and the mark in mark word bank different in storage
Corresponding background is dubbed.For example the music of magnificence will be equipped with for the recitation of poems of bold and unconstrained group.For example, poem content is to eulogize ancestral
State, then by " love of the republic, I come up as snowflake day, red flag song, Long March symphony, Long March symphony, the army of volunteers are carried out
Song, the Five-Starred Red Flag (the national flag of the People's Republic of China), the Yellow River piano concerto, the sound in township, the sound in township, the sound in township, ten send Red Army to dub in background music, youth China dubs in background music, yellow
River work song, I and I motherland, Great Wall ballad, the Yellow River lead my hand, rivers and mountains is unlimited, climb snow mountain, same first song, the song in the Changjiang river "
Scan in the word bank constituted etc. class song.If poem content is singer's emotional affection township feelings, by " white hair real mother, big
Bie Shan, old father, the ballad of mother, mother, that be exactly me, Qianmen stall tea, dear Papa and Mama, sunset, in candle light
Mother, recall the south of the River, think of one's home, pray within thousand, the moon over a fountain " etc. class song constitute word bank in scan for.
The thesaurus storing background dubbing music can be built in many ways. For example, music word banks can be established according to the style of the melodies themselves: a violin theme word bank, a symphony word bank, a light music word bank, a classical Chinese music word bank, and so on, to suit voice content of many kinds.
After the matching background dubbing audio data has been found in the database according to the type of the semantic content represented by the text data, it is output.
In step S103, preferably, the system plays the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met.
The output content is thus matched with the background dubbing: the user hears both the machine-synthesized voice and pleasant, melodious background music, making the interactive experience much richer.
According to the background dubbing output method for an intelligent robot of the invention, preferably, as shown in Fig. 2, in the step of playing the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met, the trigger condition generally includes the following cases:
For example, when judging the background dubbing trigger condition, step S201 shown in Fig. 3, one case is that playback of the background dubbing is triggered only when the system receives a specific statement input by the user. In this case, the background dubbing is not output every time text-to-speech output occurs; a specific instruction from the user is needed to start it. When the user's specific statement is judged to be present, the background dubbing is triggered to play simultaneously with the speech text.
When the user's specific statement is judged to be absent, the system goes on to determine whether start and end times for automatically playing the background dubbing have been set, step S202. If so, the background dubbing is played synchronously with the speech text according to the preset start and end times.
When the system has not been set to play the background dubbing automatically, it goes on to determine whether a manual selection function is used to play the background dubbing, step S203. If so, the background dubbing is played synchronously with the speech text under manual selection.
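The following sketch condenses this cascade of checks (S201, then S202, then S203); the state fields and the example trigger phrase are assumptions, since the patent fixes only the order of the three checks:

```python
# Sketch of the Fig. 3 trigger cascade: S201 user statement ->
# S202 preset schedule -> S203 manual selection.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TriggerState:
    user_statement: Optional[str] = None            # last user utterance
    schedule: Optional[Tuple[float, float]] = None  # (start, end) in seconds
    manual_selected: bool = False                   # manual play function

TRIGGER_PHRASES = {"play background music"}  # hypothetical trigger phrase

def should_play_dub(state: TriggerState, now: float) -> bool:
    if state.user_statement in TRIGGER_PHRASES:     # S201
        return True
    if state.schedule is not None:                  # S202
        start, end = state.schedule
        return start <= now <= end
    return state.manual_selected                    # S203

print(should_play_dub(TriggerState(schedule=(0.0, 30.0)), now=12.0))  # True
```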
Further, since speech text can be generated by different applications (for example, a poem can be generated by an application named "poem recitation", and a children's story by an application named "children's story"), in practical applications the type of the voice content to be output can be judged by determining which application is currently running.
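In code, this amounts to a small lookup from the running application to a content type; a minimal sketch, assuming hypothetical application names:

```python
# Minimal sketch of judging content type from the current application
# (see claim 3). Application names are invented for illustration.

APP_TO_TYPE = {
    "poem recitation": "poem_recitation",
    "children's story": "childrens_story",
}

def type_from_app(current_app: str) -> str:
    return APP_TO_TYPE.get(current_app, "chat")

assert type_from_app("poem recitation") == "poem_recitation"
```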
As the method for the present invention describes what is realized in computer systems.The computer system can for example be arranged
In the control core processor of robot.For example, method described herein can be implemented as what is can performed with control logic
Software, which is performed by the CPU in robot control system.Function as herein described can be implemented as being stored in non-transitory to be had
Programmed instruction set in shape computer-readable medium.When implemented in this fashion, the computer program includes one group of instruction,
When the group instruction is run by computer, which promotes computer to perform the method that can implement above-mentioned functions.FPGA can be temporary
When or be permanently mounted in non-transitory tangible computer computer-readable recording medium, for example ROM chip, computer storage,
Disk or other storage mediums.In addition to realizing except with software, logic as herein described can utilize discrete parts, integrated electricity
What road and programmable logic device (such as, field programmable gate array (FPGA) or microprocessor) were used in combination programmable patrols
Volume, or any other equipment being combined including them is embodying.All such embodiments are intended to fall under the model of the present invention
Within enclosing.
Therefore, according to another aspect of the invention, a background dubbing output device 300 for an intelligent robot is also provided, as shown in Fig. 4. The device includes the following units:
A text data receiving unit 301, which receives the text data corresponding to the voice to be output and analyzes the semantics of the text data;
A background dubbing search unit 302, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data;
An audio output unit 303, which plays the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met.
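A hypothetical composition of these three units in Python follows; the unit bodies are placeholder stand-ins for the text analysis, thesaurus search and audio output described above, not the patent's implementation:

```python
# Hypothetical composition of device 300 from Fig. 4.

class TextDataReceivingUnit:                     # unit 301
    def receive(self, text: str) -> dict:
        # Toy semantic analysis: tag everything as a poem recitation.
        return {"text": text, "semantic_type": "poem_recitation"}

class BackgroundDubSearchUnit:                   # unit 302
    def search(self, semantic_type: str) -> str:
        return {"poem_recitation": "majestic_theme.wav"}.get(
            semantic_type, "default.wav")

class AudioOutputUnit:                           # unit 303
    def play(self, text: str, dub: str, trigger_met: bool) -> str:
        suffix = f" with {dub}" if trigger_met else ""
        return f"speak {text!r}{suffix}"

class BackgroundDubOutputDevice:                 # device 300
    def __init__(self) -> None:
        self.receiver = TextDataReceivingUnit()
        self.searcher = BackgroundDubSearchUnit()
        self.output = AudioOutputUnit()

    def run(self, text: str, trigger_met: bool = True) -> str:
        data = self.receiver.receive(text)
        dub = self.searcher.search(data["semantic_type"])
        return self.output.play(data["text"], dub, trigger_met)

print(BackgroundDubOutputDevice().run("Moonlight before my bed"))
```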
According to the background dubbing output device 300 for an intelligent robot of the invention, preferably, in the audio output unit that plays the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met, the trigger condition includes the following cases:
When a specific statement input by the user is received, playback of the background dubbing is triggered;
Start and end times for automatically playing the background dubbing are set in the system;
The background dubbing is played at the moment the voice corresponding to the text data is played.
According to the background dubbing output device 300 for an intelligent robot of the invention, preferably, the background dubbing search unit, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data, further includes a judging unit for judging the voice type corresponding to the text data to be output, so as to determine the matching background music.
According to the background dubbing output device 300 for an intelligent robot of the invention, preferably, the text data corresponding to the voice to be output is received through a dialog interface.
According to the background dubbing output device 300 for an intelligent robot of the invention, preferably, the text data receiving unit 301 that analyzes the text data further includes the following units:
A text structure detection unit, which detects the structure of the input text by punctuation handling, text normalization rules, word segmentation and part-of-speech tagging, pause processing and character-to-phoneme conversion;
A prosody generation unit, which derives the parameters characterizing prosodic features from the contextual information obtained by text analysis;
A unit selection unit, which, according to the phone string to be synthesized, its contextual information and the prosodic feature parameters, and following a specified criterion, selects an optimal group of speech units from the corpus as synthesis units for waveform concatenation.
Through the embodiments of the invention, a computer and a person can communicate by language just as people do with each other. When the TTS output is played, the background dubbing is played at the same time; the two combine to make the computer's language output more realistic and appealing. Using background dubbing combined with TTS optimizes the listening experience: the output information is first converted into voice, background dubbing matching the TTS is then selected, and the background dubbing and the TTS are played together and conveyed to the listener. For example, when a poem is played via TTS, music matching the mood of the poem is played at the same time; the two match and combine, giving the listener an immersive feeling.
It should be understood that the disclosed embodiments of the invention are not limited to the specific structures, process steps or materials disclosed herein, but extend to their equivalents as understood by those of ordinary skill in the relevant art. It should also be understood that the terms used herein serve only to describe specific embodiments and are not intended to be limiting.
" one embodiment " or " embodiment " mentioned in specification means special characteristic, the structure for describing in conjunction with the embodiments
Or characteristic is included at least one embodiment of the present invention.Therefore, the phrase " reality that specification various places throughout occurs
Apply example " or " embodiment " same embodiment might not be referred both to.
Although embodiments of the invention are disclosed above, the content described is only an embodiment adopted to facilitate understanding of the invention and does not limit the invention. Any person skilled in the technical field to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention must still be defined by the appended claims.
Claims (8)
1. A background dubbing output method for an intelligent robot, characterized in that the method comprises the following steps:
Judging the type of the voice content to be output;
Obtaining background dubbing audio data matching the type;
Playing the background dubbing audio data while outputting the voice content.
2. The background dubbing output method for an intelligent robot as claimed in claim 1, characterized in that the background dubbing audio data is played while the voice is output and when a trigger condition is met, wherein the trigger condition includes the following cases:
When a specific statement input by the user is received, playback of the background dubbing is triggered;
Start and end times for automatically playing the background dubbing are set in the system;
The background dubbing is played at the moment the voice corresponding to the text data is played.
3. The background dubbing output method for an intelligent robot as claimed in claim 1, characterized in that, in the step of judging the type of the voice content to be output, the type of the voice content to be output is judged according to the current application.
4. The background dubbing output method for an intelligent robot as claimed in claim 1, characterized in that the text data corresponding to the voice to be output is received through a dialog interface.
5. A background dubbing output device for an intelligent robot, characterized in that the device comprises the following units:
A text data receiving unit, which receives the text data corresponding to the voice to be output and analyzes the semantics of the text data;
A background dubbing search unit, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data;
An audio output unit, which plays the background dubbing audio data while outputting the voice corresponding to the text data and when a trigger condition is met.
6. The background dubbing output device for an intelligent robot as claimed in claim 5, characterized in that, in the audio output unit that plays the background dubbing audio data while outputting the voice corresponding to the text data and when the trigger condition is met, the trigger condition includes the following cases:
When a specific statement input by the user is received, playback of the background dubbing is triggered;
Start and end times for automatically playing the background dubbing are set in the system;
The background dubbing is played at the moment the voice corresponding to the text data is played.
7. The background dubbing output device for an intelligent robot as claimed in claim 6, characterized in that the background dubbing search unit, which searches the database for matching background dubbing audio data according to the type of the semantic content represented by the text data, further includes a judging unit for judging the voice type corresponding to the text data to be output, so as to determine the matching background music.
8. The background dubbing output device for an intelligent robot as claimed in claim 7, characterized in that the text data corresponding to the voice to be output is received through a dialog interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610982284.8A CN106557298A (en) | 2016-11-08 | 2016-11-08 | Background dubbing output method and device for intelligent robot |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610982284.8A CN106557298A (en) | 2016-11-08 | 2016-11-08 | Background dubbing output method and device for intelligent robot |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106557298A (en) | 2017-04-05 |
Family
ID=58444684
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610982284.8A Pending CN106557298A (en) | 2016-11-08 | 2016-11-08 | Background towards intelligent robot matches somebody with somebody sound outputting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106557298A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107437413A (en) * | 2017-07-05 | 2017-12-05 | 百度在线网络技术(北京)有限公司 | Voice broadcasting method and device |
CN107463626A (en) * | 2017-07-07 | 2017-12-12 | 深圳市科迈爱康科技有限公司 | Voice-controlled education method, mobile terminal, system and storage medium |
CN107731219A (en) * | 2017-09-06 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | Speech synthesis processing method, device and equipment |
CN108242238A (en) * | 2018-01-11 | 2018-07-03 | 广东小天才科技有限公司 | Audio file generation method and device and terminal equipment |
CN109065018A (en) * | 2018-08-22 | 2018-12-21 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109241331A (en) * | 2018-09-25 | 2019-01-18 | 北京光年无限科技有限公司 | Story data processing method for intelligent robot |
CN109460548A (en) * | 2018-09-30 | 2019-03-12 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109543021A (en) * | 2018-11-29 | 2019-03-29 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109542389A (en) * | 2018-11-19 | 2019-03-29 | 北京光年无限科技有限公司 | Sound effect control method and system for the output of multi-modal story content |
CN111104544A (en) * | 2018-10-29 | 2020-05-05 | 阿里巴巴集团控股有限公司 | Background music recommendation method and equipment, client device and electronic equipment |
CN113779204A (en) * | 2020-06-09 | 2021-12-10 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and computer storage medium |
CN109522427B (en) * | 2018-09-30 | 2021-12-10 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method and device |
CN114189587A (en) * | 2021-11-10 | 2022-03-15 | 阿里巴巴(中国)有限公司 | Call method, device, storage medium and computer program product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1737901A (en) * | 2004-08-16 | 2006-02-22 | 华为技术有限公司 | System and method for realizing a voice service incorporating background music |
CN104391980A (en) * | 2014-12-08 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Song generating method and device |
CN105709416A (en) * | 2016-03-14 | 2016-06-29 | 上海科睿展览展示工程科技有限公司 | Personalized dubbing method and system for multi-user operating game |
- 2016-11-08 CN CN201610982284.8A patent/CN106557298A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1737901A (en) * | 2004-08-16 | 2006-02-22 | 华为技术有限公司 | System and method for realizing a voice service incorporating background music |
CN104391980A (en) * | 2014-12-08 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Song generating method and device |
CN105709416A (en) * | 2016-03-14 | 2016-06-29 | 上海科睿展览展示工程科技有限公司 | Personalized dubbing method and system for multi-user operating game |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107437413A (en) * | 2017-07-05 | 2017-12-05 | 百度在线网络技术(北京)有限公司 | Voice broadcasting method and device |
CN107437413B (en) * | 2017-07-05 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Voice broadcasting method and device |
CN107463626A (en) * | 2017-07-07 | 2017-12-12 | 深圳市科迈爱康科技有限公司 | Voice-controlled education method, mobile terminal, system and storage medium |
CN107731219A (en) * | 2017-09-06 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | Speech synthesis processing method, device and equipment |
CN108242238B (en) * | 2018-01-11 | 2019-12-31 | 广东小天才科技有限公司 | Audio file generation method and device and terminal equipment |
CN108242238A (en) * | 2018-01-11 | 2018-07-03 | 广东小天才科技有限公司 | Audio file generation method and device and terminal equipment |
CN109065018A (en) * | 2018-08-22 | 2018-12-21 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109065018B (en) * | 2018-08-22 | 2021-09-10 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method and system |
CN109241331A (en) * | 2018-09-25 | 2019-01-18 | 北京光年无限科技有限公司 | Story data processing method for intelligent robot |
CN109241331B (en) * | 2018-09-25 | 2022-03-15 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method |
CN109460548A (en) * | 2018-09-30 | 2019-03-12 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109460548B (en) * | 2018-09-30 | 2022-03-15 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method and system |
CN109522427B (en) * | 2018-09-30 | 2021-12-10 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method and device |
CN111104544A (en) * | 2018-10-29 | 2020-05-05 | 阿里巴巴集团控股有限公司 | Background music recommendation method and equipment, client device and electronic equipment |
CN109542389A (en) * | 2018-11-19 | 2019-03-29 | 北京光年无限科技有限公司 | Sound effect control method and system for the output of multi-modal story content |
CN109543021A (en) * | 2018-11-29 | 2019-03-29 | 北京光年无限科技有限公司 | Story data processing method and system for intelligent robot |
CN109543021B (en) * | 2018-11-29 | 2022-03-18 | 北京光年无限科技有限公司 | Intelligent robot-oriented story data processing method and system |
CN113779204A (en) * | 2020-06-09 | 2021-12-10 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and computer storage medium |
CN113779204B (en) * | 2020-06-09 | 2024-06-11 | 浙江未来精灵人工智能科技有限公司 | Data processing method, device, electronic equipment and computer storage medium |
CN114189587A (en) * | 2021-11-10 | 2022-03-15 | 阿里巴巴(中国)有限公司 | Call method, device, storage medium and computer program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106557298A (en) | Background dubbing output method and device for intelligent robot | |
CN108492817B (en) | Song data processing method based on virtual idol and singing interaction system | |
CN110782900B (en) | Collaborative AI storytelling | |
CN101064103B (en) | Chinese speech synthesis method and system based on syllable prosody constraint relationships | |
CN108962217A (en) | Speech synthesis method and related device | |
US20210158795A1 (en) | Generating audio for a plain text document | |
US10229669B2 (en) | Apparatus, process, and program for combining speech and audio data | |
US8027837B2 (en) | Using non-speech sounds during text-to-speech synthesis | |
Eide et al. | A corpus-based approach to <ahem/> expressive speech synthesis | |
CN108288468A (en) | Audio recognition method and device | |
CN103632663B (en) | HMM-based Mongolian speech synthesis front-end processing method | |
CN110782875B (en) | Voice rhythm processing method and device based on artificial intelligence | |
CN104391980A (en) | Song generating method and device | |
CN108305611B (en) | Text-to-speech method, device, storage medium and computer equipment | |
CN110782880A (en) | Training method and device of rhythm generation model | |
CN109492126B (en) | Intelligent interaction method and device | |
Ogden et al. | ProSynth: an integrated prosodic approach to device-independent, natural-sounding speech synthesis | |
CN112669815A (en) | Song customization generation method and corresponding device, equipment and medium | |
CN116917984A (en) | Interactive content output | |
CN106297766A (en) | Speech synthesis method and system | |
CN106292424A (en) | Music data processing method and device for anthropomorphic robot | |
TWI605350B (en) | Text-to-speech method and multilingual speech synthesizer using the method | |
TWI574254B (en) | Speech synthesis method and apparatus for electronic system | |
CN102970618A (en) | Video on demand method based on syllable identification | |
CN1331113C (en) | Speech synthesizer, method and recording medium for speech synthesis program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170405 |