CN109065019A - Intelligent robot-oriented story data processing method and system - Google Patents

Intelligent robot-oriented story data processing method and system

Info

Publication number
CN109065019A
Authority
CN
China
Prior art keywords
data
story
dialogue
aside
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810981546.8A
Other languages
Chinese (zh)
Other versions
CN109065019B (en)
Inventor
贾志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd filed Critical Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201810981546.8A priority Critical patent/CN109065019B/en
Publication of CN109065019A publication Critical patent/CN109065019A/en
Application granted granted Critical
Publication of CN109065019B publication Critical patent/CN109065019B/en
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 - Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Toys (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a story data processing method and system oriented to an intelligent robot. The method includes: obtaining story text data; parsing the story text data to identify the dialogue and the aside (narration) in the story text; calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects; and generating and outputting multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects. Compared with the prior art, the method and system of the present invention can convert a story in text form into multi-modal data that can be presented multi-modally and optimize the presentation of the dialogue and aside in the story in a targeted way, thereby greatly improving the listener's experience when the story is told.

Description

Intelligent robot-oriented story data processing method and system
Technical field
The present invention relates to the field of computer technology, and in particular to a story data processing method and system oriented to an intelligent robot.
Background art
In traditional daily life, reading text is the main way in which people appreciate literary works. In some special scenarios, however, people also appreciate literary works through sound, for example by listening to storytelling or to reading aloud. Most commonly, children whose reading ability is not yet sufficient usually listen to literary works through another person's telling, that is, by listening to someone tell a story.
With the continuous development of multimedia technology, more and more multimedia devices are being used in people's daily lives. With the support of multimedia technology, the playback of literary works in audio form, and storytelling in particular, has gradually shifted to multimedia devices.
In general, when a multimedia device is used to tell a story, a person usually tells the story in advance and records it as an audio file, and the multimedia device merely plays back the recorded file. With the development of computer technology, and in order to obtain sound sources simply and conveniently, the prior art also converts text data directly into audio data. In this way there is no need for manual reading and recording: providing the story text is enough for a multimedia device to tell the story. However, direct text-to-speech conversion by computer can only guarantee a literal conversion of the text content; it cannot reproduce the expressiveness and emotion of a real person telling the story. As a result, stories told in the prior art on the basis of text conversion sound dry and monotonous, convey only the literal meaning of the text, and give a poor user experience.
Summary of the invention
The present invention provides a story data processing method oriented to an intelligent robot, the method comprising:
obtaining story text data;
parsing the story text data to identify the dialogue and aside in the story text;
calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
generating and outputting multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects.
In one embodiment, the multi-modal data further includes intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the dialogue and aside in the story text.
In one embodiment, the method further includes:
performing text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
converting the text in the story text data other than the dialogue and aside into first voice data;
fusing the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
In one embodiment, calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text comprises:
performing text recognition on the story text, decomposing the story into content elements based on the text recognition result, and extracting story elements;
determining sound-effect features matching the dialogue and aside according to the story elements corresponding to the dialogue and aside;
converting the dialogue and aside into dialogue and aside data with sound effects matching the sound-effect features.
In one embodiment, the story elements corresponding to the dialogue include the dialogue role, the dialogue content, the dialogue environment, and/or the dialogue context.
In one embodiment, the story elements corresponding to the aside include the aside content, the aside environment, and/or the aside context.
The present invention also provides a storage medium on which program code capable of implementing the method described herein is stored.
The present invention also provides a story data processing system oriented to an intelligent robot, the system comprising:
a text acquisition module configured to obtain story text data;
a text parsing module configured to parse the story text data and identify the dialogue and aside in the story text;
a story data processing model library configured to store story data processing models;
a sound-effect processing module configured to call the story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
a multi-modal story data generation module configured to generate and output multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects.
In one embodiment, the multi-modal story data generation module further includes:
a first voice conversion unit configured to perform text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
a second voice conversion unit configured to convert the text in the story text data other than the dialogue and aside into first voice data;
a speech synthesis unit configured to fuse the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
The present invention also provides an intelligent story machine, the story machine comprising:
an input acquisition module configured to acquire multi-modal user input and receive the user's story demand;
the story data processing system according to the present invention, configured to obtain the corresponding story text data according to the user's story demand and generate the multi-modal data;
an output module configured to output the multi-modal data to the user.
Compared with the prior art, the method and system of the present invention can convert a story in text form into multi-modal data that can be presented multi-modally and optimize the presentation of the dialogue and aside in the story in a targeted way, thereby greatly improving the listener's experience when the story is told.
Other features or advantages of the present invention will be set forth in the following description, and some of these features or advantages will become apparent from the description or be understood by implementing the present invention. The objects and some advantages of the present invention can be realized or obtained through the steps particularly pointed out in the specification, claims, and drawings.
Brief description of the drawings
The accompanying drawings are provided for further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the present invention they serve to explain the invention, and they do not limit the invention. In the drawings:
Fig. 1 and Fig. 2 are flowcharts of methods according to embodiments of the present invention;
Fig. 3 and Fig. 4 are structural diagrams of systems according to embodiments of the present invention;
Fig. 5 and Fig. 6 are structural diagrams of story machines according to embodiments of the present invention.
Detailed description of the embodiments
Hereinafter, the embodiments of the present invention are described in detail with reference to the drawings and examples, so that those implementing the present invention can fully understand how the invention applies technical means to solve technical problems and achieves its technical effects, and can implement the invention accordingly. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features in those embodiments may be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
In traditional daily life, reading text is the main way in which people appreciate literary works. In some special scenarios, however, people also appreciate literary works through sound, for example by listening to storytelling or to reading aloud. Most commonly, children whose reading ability is not yet sufficient usually listen to literary works through another person's telling, that is, by listening to someone tell a story.
With the continuous development of multimedia technology, more and more multimedia devices are being used in people's daily lives. With the support of multimedia technology, the playback of literary works in audio form, and storytelling in particular, has gradually shifted to multimedia devices.
In general, when a multimedia device is used to tell a story, a person usually tells the story in advance and records it as an audio file, and the multimedia device merely plays back the recorded file. With the development of computer technology, and in order to obtain sound sources simply and conveniently, the prior art also converts text data directly into audio data. In this way there is no need for manual reading and recording: providing the story text is enough for a multimedia device to tell the story. However, direct text-to-speech conversion by computer can only guarantee a literal conversion of the text content; it cannot reproduce the expressiveness and emotion of a real person telling the story. As a result, stories told in the prior art on the basis of text conversion sound dry and monotonous, convey only the literal meaning of the text, and give a poor user experience.
In view of the above problems, the present invention proposes a story data processing method oriented to an intelligent robot. In the method of the present invention, a story in text form is converted into multi-modal data that can be presented multi-modally, so as to improve the expressiveness of the story content.
Further, in practical application scenarios, when humans communicate by speech, the voices produced by different people differ and carry the vocal characteristics of the speaker. In general, a story text contains dialogue and aside, and both the dialogue and the aside can be regarded as being spoken by a certain role in the story. Therefore, in one embodiment, matched sound effects are added to the dialogue and aside in the story in a targeted way, so that the spoken presentation of the dialogue and aside is more realistic and vivid, which in turn improves the vividness of the storytelling and optimizes the user experience.
Compared with the prior art, the method and system of the present invention can convert a story in text form into multi-modal data that can be presented multi-modally and optimize the presentation of the dialogue and aside in the story in a targeted way, thereby greatly improving the listener's experience when the story is told.
Next, the detailed flow of the method according to the embodiments of the present invention is described with reference to the drawings. The steps shown in the flowcharts of the drawings may be executed in a computer system containing, for example, a set of computer-executable instructions. Although a logical order of the steps is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the order given here.
As shown in Fig. 1, in one embodiment, the method includes the following flow:
S110, obtaining story text data;
S120, parsing the story text data and identifying the dialogue and aside in the story text;
S131, calling a story data processing model;
S132, performing sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
S140, generating and outputting multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects generated in step S132.
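The flow of Fig. 1 can be illustrated with a minimal Python sketch. The quote-based segmentation rule, the `Segment` structure, the `model.predict` interface, and the output format below are illustrative assumptions, not the concrete implementation of the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    kind: str                 # "dialogue", "aside", or "other"
    text: str
    sound_effect: dict = field(default_factory=dict)

def parse_story_text(story_text: str) -> list:
    """S120: split the story text into dialogue and aside segments.
    Naive rule: a line opening with a quotation mark is treated as dialogue."""
    segments = []
    for line in filter(None, (l.strip() for l in story_text.splitlines())):
        kind = "dialogue" if line.startswith(('"', '\u201c')) else "aside"
        segments.append(Segment(kind=kind, text=line))
    return segments

def apply_sound_effects(segments, model):
    """S131/S132: the story data processing model attaches sound-effect
    parameters to every dialogue and aside segment."""
    for seg in segments:
        if seg.kind in ("dialogue", "aside"):
            seg.sound_effect = model.predict(seg)   # hypothetical model interface
    return segments

def build_multimodal_data(segments) -> dict:
    """S140: package the processed segments as multi-modal output data."""
    return {"story": [vars(seg) for seg in segments]}
```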
Further, in one embodiment, the dialogue and aside are output mainly as TTS speech, so the multi-modal data that is finally output contains the dialogue and aside data with sound effects after it has been converted into speech. Specifically, in one embodiment, text-to-speech conversion is performed on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects.
Further, in order to further improve the vividness of the story presentation, in one embodiment the story is not told only by speech; the dialogue and aside are also presented as text. Specifically, in one embodiment, the multi-modal data contains dialogue and aside text data with sound effects.
Further, in order to further improve the vividness of the story presentation, in one embodiment the story is not told only by speech and/or text. Specifically, in one embodiment, the multi-modal data generated in step S130 further includes intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the dialogue and aside in the story text.
In this way, when the intelligent robot tells the story, it can accompany the output of the dialogue and aside data with sound effects with corresponding actions, thereby greatly improving the vividness of the storytelling.
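As one possible illustration (not specified by the patent), the action data could be a simple mapping from segment type to a named robot action, building on the `Segment` structure from the earlier sketch; the action names are placeholders.

```python
def generate_action_data(segments):
    """Attach a robot action cue to each dialogue and aside segment.
    The action names used here are purely illustrative placeholders."""
    actions = []
    for seg in segments:
        if seg.kind == "dialogue":
            actions.append({"text": seg.text, "action": "gesture_while_talking"})
        elif seg.kind == "aside":
            actions.append({"text": seg.text, "action": "look_at_listener"})
    return actions
```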
Further, the story text may also contain content other than the dialogue and aside. In one embodiment, the text in the story text data other than the dialogue and aside is also converted into voice data and fused with the dialogue and aside data with sound effects. Specifically, the method further includes:
performing text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
converting the text in the story text data other than the dialogue and aside into first voice data;
fusing the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
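A minimal sketch of this conversion and fusion step is given below. It assumes a hypothetical `tts_engine.synthesize(text, **params)` interface that returns raw audio bytes; real TTS engines differ in API and audio format, and a proper fusion step would mix and time-align clips rather than merely concatenate them.

```python
def synthesize_story_voice(segments, tts_engine):
    """Convert every segment to audio and splice the clips in story order."""
    clips = []
    for seg in segments:
        if seg.kind in ("dialogue", "aside"):
            # dialogue and aside voice data with sound effects
            clips.append(tts_engine.synthesize(seg.text, **seg.sound_effect))
        else:
            # "first voice data": remaining narrative text, default voice
            clips.append(tts_engine.synthesize(seg.text))
    return b"".join(clips)   # naive fusion by concatenation in story order
```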
Further, in order to ensure that the sound effects attached to the dialogue and aside data indeed improve the vividness of the story presentation, rather than reducing the expressiveness of the story because the wrong sound effects are used, in one embodiment the story text data is parsed to determine the story content, and the sound effects corresponding to the dialogue and aside are determined according to the specific content of the story.
Specifically, in one embodiment, the story text data is parsed on the basis of text recognition technology. Specifically, in one embodiment, parsing the story text data includes: performing text recognition on the story text data and determining the story content.
Further, taking the characteristics of computer parsing into account, in one embodiment the story text data is parsed by means of element decomposition. Specifically, in one embodiment, the story is decomposed into content elements based on the text recognition result and story elements are extracted, the story elements including the style, the characters, and/or the dialogue of the story.
Specifically, in one embodiment, calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text includes:
performing text recognition on the story text, decomposing the story into content elements based on the text recognition result, and extracting story elements;
determining sound-effect features matching the dialogue and aside according to the story elements corresponding to the dialogue and aside;
converting the dialogue and aside into dialogue and aside data with sound effects matching the sound-effect features.
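The mapping from story elements to sound-effect features could, for example, be realized with simple lookup tables, as in the following sketch. The element names, voice parameters, and background tracks are assumptions made for illustration; the patent does not disclose the concrete story data processing model.

```python
# Illustrative lookup tables only; the actual story data processing model is not disclosed.
ROLE_VOICES = {"wolf": {"pitch": -4, "rate": 0.9}, "rabbit": {"pitch": 5, "rate": 1.1}}
ENV_EFFECTS = {"forest": "birdsong_bed", "night": "cricket_bed"}

def match_sound_effect(story_elements: dict) -> dict:
    """Map the story elements of one dialogue or aside sentence to sound-effect
    features: voice parameters plus an optional background-effect track."""
    features = dict(ROLE_VOICES.get(story_elements.get("role", ""), {}))
    background = ENV_EFFECTS.get(story_elements.get("environment", ""))
    if background:
        features["background"] = background
    if story_elements.get("context") == "angry":
        features["volume_gain"] = 3   # louder delivery for an angry context
    return features
```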
Specifically, in one embodiment, as shown in Fig. 2, the method includes the following flow:
S210, obtaining story text data;
S220, parsing the story text data;
S221, decomposing the story into content elements based on the text recognition result and extracting story elements;
S222, identifying the dialogue and aside in the story text;
S230, calling a story data processing model;
S231, determining sound-effect features matching the dialogue and aside according to the story elements corresponding to the dialogue and aside;
S232, converting the dialogue and aside into dialogue and aside data with sound effects matching the sound-effect features.
Specifically, in one embodiment, the parsing targets are divided into several specific classes (several story elements); keyword extraction is performed for each story element, and the extracted keywords are saved as the parsing result together with the story element tags.
Further, in the story text, as the story content progresses, the speaker, the described content, and/or the described background of the dialogue and aside may differ. Therefore, in one embodiment, the corresponding sound effect is determined separately for the dialogue and the aside according to the story elements corresponding to each. Specifically, in one embodiment, the sound effect is determined separately for each sentence of dialogue and each sentence of aside.
Specifically, in one embodiment, the story elements corresponding to the dialogue include the dialogue role, the dialogue content, the dialogue environment, and/or the dialogue context.
Specifically, in one embodiment, the story elements corresponding to the aside include the aside content, the aside environment, and/or the aside context.
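These per-sentence story elements can be pictured as simple records, as in the sketch below; the field names and types are one possible representation, not a structure prescribed by the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogueElements:
    role: str                          # who speaks the line (dialogue role)
    content: str                       # what is said (dialogue content)
    environment: Optional[str] = None  # scene in which the line is spoken
    context: Optional[str] = None      # surrounding plot or emotional context

@dataclass
class AsideElements:
    content: str                       # what the narrator says (aside content)
    environment: Optional[str] = None  # scene being narrated
    context: Optional[str] = None      # surrounding plot context
```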
Further, based on the method of the present invention, the present invention also provides a storage medium on which program code capable of implementing the method described herein is stored.
Further, based on the method of the present invention, the present invention also provides a story data processing system oriented to an intelligent robot.
Specifically, as shown in Fig. 3, in one embodiment, the system includes:
a text acquisition module 310 configured to obtain story text data;
a text parsing module 320 configured to parse the story text data and identify the dialogue and aside in the story text;
a story data processing model library 341 configured to store story data processing models;
a sound-effect processing module 340 configured to call a story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
a multi-modal story data generation module 330 configured to generate and output multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects.
Further, in one embodiment, the multi-modal story data generation module 330 is additionally configured to generate corresponding intelligent robot action data for the dialogue and aside in the story text.
Further, in one embodiment, as shown in Fig. 4, the multi-modal story data generation module 430 further includes:
a voice conversion unit 431 configured to convert the text in the story text data other than the dialogue and aside into first voice data;
a voice conversion unit 432 configured to perform text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
a speech synthesis unit 433 configured to fuse the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
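The division of responsibilities among these modules can be sketched as a single class that reuses the helper functions from the earlier sketches; the method names and constructor arguments are assumptions, and only the split of duties follows the description of Figs. 3 and 4.

```python
class StoryDataProcessingSystem:
    """Illustrative module skeleton for the system of Figs. 3 and 4."""

    def __init__(self, model_library, tts_engine):
        self.model_library = model_library   # story data processing model library (341)
        self.tts_engine = tts_engine

    def acquire_text(self, story_id) -> str:           # text acquisition module (310)
        raise NotImplementedError                       # e.g. fetch from a story source

    def parse_text(self, story_text):                   # text parsing module (320)
        return parse_story_text(story_text)

    def process_sound_effects(self, segments):          # sound-effect processing module (340)
        model = self.model_library.load("default")      # hypothetical library interface
        return apply_sound_effects(segments, model)

    def generate_multimodal(self, segments) -> dict:    # multi-modal generation module (330/430)
        voice = synthesize_story_voice(segments, self.tts_engine)   # units 431-433
        return {"story_voice": voice, "actions": generate_action_data(segments)}
```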
Further, based on the story data processing system proposed by the present invention, the present invention also provides an intelligent story machine. Specifically, as shown in Fig. 5, in one embodiment, the story machine includes:
an input acquisition module 510 configured to acquire multi-modal user input and receive the user's story demand;
a story data processing system 520 configured to obtain the corresponding story text data according to the user's story demand and generate multi-modal data;
an output module 530 configured to output the multi-modal data to the user.
Specifically, in one embodiment, the output module 530 includes a playback unit configured to play the dialogue and aside voice data with sound effects.
Specifically, as shown in Fig. 6, in one embodiment, the story machine includes a smart device 610 and a cloud server 620, wherein:
the cloud server 620 includes a story data processing system 630. The story data processing system 630 is configured to call the capability interfaces of the cloud server 620 to obtain and parse story text data and to generate and output multi-modal data containing the dialogue and aside data with sound effects. Specifically, during data parsing, the story data processing system 630 calls the logical processing corresponding to each capability interface.
Specifically, in one embodiment, the capability interfaces of the cloud server 620 include a text recognition interface 621, a text/speech conversion interface 622, and an audio synthesis interface 623.
The smart device 610 includes a human-computer interaction input/output module 611, a communication module 612, a playback module 613, and an action module 614.
The human-computer interaction input/output module 611 is configured to obtain the user's control instructions and determine the user's story listening demand.
The communication module 612 is configured to output the user's story listening demand obtained by the human-computer interaction input/output module 611 to the cloud server 620 and to receive the multi-modal data from the cloud server 620.
The playback module 613 is configured to play the dialogue and aside voice data with sound effects, or the story voice data, contained in the multi-modal data.
The action module 614 is configured to perform corresponding action behaviors according to the intelligent robot action data in the multi-modal data.
Specifically, in a concrete application scenario, the human-computer interaction input/output module 611 obtains the user's control instruction and determines the user's story listening demand.
The communication module 612 sends the user's story listening demand to the cloud server 620.
The cloud server 620 selects the corresponding story text data based on the user's story listening demand. The story data processing system in the cloud server 620 obtains and parses the story text data and generates and outputs multi-modal data. The multi-modal data contains intelligent robot action data and story voice data, and the story voice data contains the dialogue and aside voice data with sound effects.
The communication module 612 receives the multi-modal data sent by the cloud server 620.
The playback module 613 plays the story voice data in the multi-modal data received by the communication module 612.
The action module 614 performs the corresponding action behaviors according to the intelligent robot action data in the multi-modal data.
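This device-to-cloud round trip can be summarized in a short sketch; the placeholder objects for the smart device (610) and cloud server (620) and their method names are assumptions made for illustration, not interfaces defined by the patent.

```python
def handle_story_request(smart_device, cloud_server, control_instruction):
    """End-to-end flow of Fig. 6 with hypothetical device and server objects."""
    # 611: derive the story listening demand from the user's control instruction
    demand = smart_device.interpret(control_instruction)

    # 612 -> 620: send the demand to the cloud and receive the multi-modal data,
    # e.g. {"story_voice": <audio bytes>, "actions": [...]}
    multimodal = cloud_server.process(demand)

    # 613: play the story voice data (dialogue/aside with sound effects plus first voice data)
    smart_device.play(multimodal["story_voice"])

    # 614: perform the actions that accompany the dialogue and aside
    for action in multimodal["actions"]:
        smart_device.perform(action)
```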
It should be understood that the disclosed embodiments of the present invention are not limited to the specific structures, processing steps, or materials disclosed herein, but extend to equivalents of these features as would be understood by those of ordinary skill in the relevant art. It should also be understood that the terms used herein serve only to describe specific embodiments and are not intended to be limiting.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The phrase "an embodiment" appearing in various places throughout the specification therefore does not necessarily refer to the same embodiment.
Although the embodiments of the present invention are disclosed as above, the described content is merely an embodiment adopted to facilitate understanding of the present invention and is not intended to limit the invention. The method of the present invention may also have various other embodiments. Without departing from the essence of the present invention, those skilled in the art can make various corresponding changes or modifications in accordance with the present invention, and all such corresponding changes or modifications fall within the protection scope of the claims of the present invention.

Claims (10)

1. A story data processing method oriented to an intelligent robot, characterized in that the method comprises:
obtaining story text data;
parsing the story text data to identify the dialogue and aside in the story text;
calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
generating and outputting multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects.
2. The method according to claim 1, characterized in that the multi-modal data further includes intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the dialogue and aside in the story text.
3. The method according to claim 1, characterized in that the method further comprises:
performing text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
converting the text in the story text data other than the dialogue and aside into first voice data;
fusing the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
4. The method according to claim 3, characterized in that calling a story data processing model to perform sound-effect processing on the dialogue and aside in the story text comprises:
performing text recognition on the story text, decomposing the story into content elements based on the text recognition result, and extracting story elements;
determining sound-effect features matching the dialogue and aside according to the story elements corresponding to the dialogue and aside;
converting the dialogue and aside into dialogue and aside data with sound effects matching the sound-effect features.
5. The method according to claim 4, characterized in that the story elements corresponding to the dialogue include the dialogue role, the dialogue content, the dialogue environment, and/or the dialogue context.
6. The method according to claim 4, characterized in that the story elements corresponding to the aside include the aside content, the aside environment, and/or the aside context.
7. A storage medium, characterized in that program code capable of implementing the method according to any one of claims 1-6 is stored on the storage medium.
8. A story data processing system oriented to an intelligent robot, characterized in that the system comprises:
a text acquisition module configured to obtain story text data;
a text parsing module configured to parse the story text data and identify the dialogue and aside in the story text;
a story data processing model library configured to store story data processing models;
a sound-effect processing module configured to call the story data processing model to perform sound-effect processing on the dialogue and aside in the story text, generating dialogue and aside data with sound effects;
a multi-modal story data generation module configured to generate and output multi-modal data matching the story text, the multi-modal data including the dialogue and aside data with sound effects.
9. The system according to claim 8, characterized in that the multi-modal story data generation module further comprises:
a first voice conversion unit configured to perform text-to-speech conversion on the dialogue and aside in the story text in combination with the dialogue and aside data with sound effects, generating dialogue and aside voice data with sound effects;
a second voice conversion unit configured to convert the text in the story text data other than the dialogue and aside into first voice data;
a speech synthesis unit configured to fuse the dialogue and aside voice data with sound effects and the first voice data to generate story voice data.
10. An intelligent story machine, characterized in that the story machine comprises:
an input acquisition module configured to acquire multi-modal user input and receive the user's story demand;
a story data processing system according to claim 8 or 9 configured to obtain the corresponding story text data according to the user's story demand and generate the multi-modal data;
an output module configured to output the multi-modal data to the user.
CN201810981546.8A 2018-08-27 2018-08-27 Intelligent robot-oriented story data processing method and system Active CN109065019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810981546.8A CN109065019B (en) 2018-08-27 2018-08-27 Intelligent robot-oriented story data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810981546.8A CN109065019B (en) 2018-08-27 2018-08-27 Intelligent robot-oriented story data processing method and system

Publications (2)

Publication Number Publication Date
CN109065019A (en) 2018-12-21
CN109065019B CN109065019B (en) 2021-06-15

Family

ID=64757210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810981546.8A Active CN109065019B (en) 2018-08-27 2018-08-27 Intelligent robot-oriented story data processing method and system

Country Status (1)

Country Link
CN (1) CN109065019B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390927A (en) * 2019-06-28 2019-10-29 北京奇艺世纪科技有限公司 Audio-frequency processing method, device, electronic equipment and computer readable storage medium
CN111415650A (en) * 2020-03-25 2020-07-14 广州酷狗计算机科技有限公司 Text-to-speech method, device, equipment and storage medium
CN113658577A (en) * 2021-08-16 2021-11-16 腾讯音乐娱乐科技(深圳)有限公司 Speech synthesis model training method, audio generation method, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354840A (en) * 2008-09-08 2009-01-28 众智瑞德科技(北京)有限公司 Method and apparatus for performing voice reading control of electronic book
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN103020105A (en) * 2011-09-27 2013-04-03 株式会社东芝 Document reading-out support apparatus and method
US20140122082A1 (en) * 2012-10-29 2014-05-01 Vivotext Ltd. Apparatus and method for generation of prosody adjusted sound respective of a sensory signal and text-to-speech synthesis
CN105096932A (en) * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN105894873A (en) * 2016-06-01 2016-08-24 北京光年无限科技有限公司 Child teaching method and device orienting to intelligent robot
CN106985137A (en) * 2017-03-09 2017-07-28 北京光年无限科技有限公司 Multi-modal exchange method and system for intelligent robot

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354840A (en) * 2008-09-08 2009-01-28 众智瑞德科技(北京)有限公司 Method and apparatus for performing voice reading control of electronic book
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN103020105A (en) * 2011-09-27 2013-04-03 株式会社东芝 Document reading-out support apparatus and method
US20140122082A1 (en) * 2012-10-29 2014-05-01 Vivotext Ltd. Apparatus and method for generation of prosody adjusted sound respective of a sensory signal and text-to-speech synthesis
CN105096932A (en) * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN105894873A (en) * 2016-06-01 2016-08-24 北京光年无限科技有限公司 Child teaching method and device orienting to intelligent robot
CN106985137A (en) * 2017-03-09 2017-07-28 北京光年无限科技有限公司 Multi-modal exchange method and system for intelligent robot

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390927A (en) * 2019-06-28 2019-10-29 北京奇艺世纪科技有限公司 Audio-frequency processing method, device, electronic equipment and computer readable storage medium
CN110390927B (en) * 2019-06-28 2021-11-23 北京奇艺世纪科技有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN111415650A (en) * 2020-03-25 2020-07-14 广州酷狗计算机科技有限公司 Text-to-speech method, device, equipment and storage medium
CN113658577A (en) * 2021-08-16 2021-11-16 腾讯音乐娱乐科技(深圳)有限公司 Speech synthesis model training method, audio generation method, device and medium
CN113658577B (en) * 2021-08-16 2024-06-14 腾讯音乐娱乐科技(深圳)有限公司 Speech synthesis model training method, audio generation method, equipment and medium

Also Published As

Publication number Publication date
CN109065019B (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN110517689B (en) Voice data processing method, device and storage medium
CN106486121B (en) Voice optimization method and device applied to intelligent robot
KR101423258B1 (en) Method for supplying consulting communication and apparatus using the method
CN109543021A (en) A kind of narration data processing method and system towards intelligent robot
Bevacqua et al. Multimodal backchannels for embodied conversational agents
CN109065019A (en) A kind of narration data processing method and system towards intelligent robot
CN109754783A (en) Method and apparatus for determining the boundary of audio sentence
CN110880198A (en) Animation generation method and device
CN103167360A (en) Method for achieving multilingual subtitle translation
JP2011504624A (en) Automatic simultaneous interpretation system
CN116229977A (en) System for realizing intelligent real-time interactive question and answer based on virtual digital person and processing method thereof
WO2018053918A1 (en) Child virtual smart device and method for operating same
CN110767233A (en) Voice conversion system and method
CN109460548B (en) Intelligent robot-oriented story data processing method and system
CN116524791A (en) Lip language learning auxiliary training system based on meta universe and application thereof
CN114283820A (en) Multi-character voice interaction method, electronic equipment and storage medium
CN109376145B (en) Method and device for establishing movie and television dialogue database and storage medium
CN109065018B (en) Intelligent robot-oriented story data processing method and system
CN112037793A (en) Voice reply method and device
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN109241331B (en) Intelligent robot-oriented story data processing method
CN115442495A (en) AI studio system
CN114842826A (en) Training method of speech synthesis model, speech synthesis method and related equipment
US20080228497A1 (en) Method For Communication and Communication Device
CN114514576A (en) Data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant