CN109460548A - Intelligent robot-oriented story data processing method and system - Google Patents

Intelligent robot-oriented story data processing method and system

Info

Publication number
CN109460548A
Authority
CN
China
Prior art keywords
story
data
audio
text
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811154925.6A
Other languages
Chinese (zh)
Other versions
CN109460548B (en)
Inventor
贾志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd filed Critical Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201811154925.6A priority Critical patent/CN109460548B/en
Publication of CN109460548A publication Critical patent/CN109460548A/en
Application granted granted Critical
Publication of CN109460548B publication Critical patent/CN109460548B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/04Electrically-operated educational appliances with audible presentation of the material to be studied

Abstract

The invention discloses a story data processing method and system for intelligent robots. The method includes: obtaining story text data; parsing the story text data, wherein word segmentation and text recognition are performed on the story text data; invoking a story data processing model, performing sound-effect matching for each text segment based on the parsing result, and retrieving the corresponding sound-effect data when a matching sound effect exists; and generating, according to the story text data and the sound-effect data, multi-modal data for narrating the story content. Compared with the prior art, the method and system of the invention can convert a story in text form into multi-modal data that carries sound effects and can be presented through multiple modalities, thereby greatly improving the listener's experience while the story is narrated.

Description

Intelligent robot-oriented story data processing method and system
Technical field
The present invention relates to the field of computers, and in particular to a story data processing method and system for intelligent robots.
Background art
In traditional daily life, reading is the main way people appreciate literary works. In certain special scenarios, however, people also appreciate literary works through sound, for example by listening to storytelling or to recitation. Most commonly, children whose reading ability is still insufficient usually experience literary works through someone else's narration, that is, by listening to others tell a story.
With the continuous development of multimedia technology, more and more multimedia devices are used in human daily life. Supported by multimedia technology, the playback of literary works in audible form, storytelling in particular, has gradually shifted to multimedia devices.
In general, storytelling with a multimedia device relies on a person telling the story and recording an audio file in advance; the multimedia device merely plays back the recorded file. With the development of computer technology, and in order to obtain a sound source simply and conveniently, the prior art also converts text data directly into audio data. In this way no manual reading and recording is needed; providing the story text suffices for a multimedia device to tell the story. However, direct text-to-speech conversion by computer can only guarantee a literal rendering of the text content and cannot reproduce the vocal and emotional expressiveness of a human narrator. As a result, in the prior art, stories told through text conversion techniques sound dry and monotonous, convey only the plain meaning of the text, and give a very poor user experience.
Summary of the invention
In order to improve the user experience, the present invention provides a story data processing method for intelligent robots, the method comprising:
obtaining story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
invoking a story data processing model, performing sound-effect matching for each text segment based on the parsing result of the story text data, and retrieving the corresponding sound-effect data when a matching sound effect exists;
generating, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
In one embodiment, the multi-modal data includes intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the sound-effect data.
In one embodiment, generating the multi-modal data for narrating the story content according to the story text data and the sound-effect data comprises:
converting the story text data into story voice data;
merging the story voice data with the sound-effect data to generate story audio data.
In one embodiment, performing sound-effect matching for each text segment and retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
judging, according to the semantics of the text segment, whether a sound-effect description exists;
when a sound-effect description exists, retrieving the sound-effect data corresponding to the description.
In one embodiment:
parsing the story text data comprises:
decomposing the story into content elements and extracting story elements, the story elements including story characters and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound-effect matching for each text segment and retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
judging whether a sound effect corresponding to the story element of the text segment exists;
retrieving the corresponding sound-effect data when such a sound effect exists.
In one embodiment:
parsing the story text data comprises:
decomposing the story into content elements and extracting story elements, the story elements including the story type and/or story background;
and retrieving the corresponding sound-effect data comprises:
limiting the selection range of the sound-effect data based on the story elements.
In one embodiment, retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
determining the user identity and, when the user is a child, invoking the sound-effect library matched to child users;
retrieving the sound-effect data from the sound-effect library.
The invention also provides a storage medium, on which program code capable of implementing the method of any one of claims 1 to 7 is stored.
The invention also provides a story data processing system for intelligent robots, the system comprising:
a text acquisition module, configured to obtain story text data;
a text parsing module, configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound-effect processing module, configured to invoke a story data processing model, perform sound-effect matching for each text segment based on the parsing result of the story text data, and retrieve the corresponding sound-effect data when a matching sound effect exists;
a multi-modal story data generation module, configured to generate, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
The invention also provides an intelligent story machine, the story machine comprising:
an input acquisition module, configured to acquire multi-modal user input and confirm the user's story demand;
the story data processing system according to claim 9, configured to obtain the corresponding story text data according to the user's story demand and generate the multi-modal data;
an output module, configured to output the multi-modal data to the user, wherein dialogue and narration voice data carrying sound effects are played.
Compared with the prior art, the method and system of the invention can convert a story in text form into multi-modal data that carries sound effects and can be presented through multiple modalities, thereby greatly improving the listener's experience while the story is narrated.
Other features or advantages of the invention will be set forth in the following description. Some features or advantages of the invention will be apparent from the specification or may be understood by practicing the invention. The objects and some advantages of the invention may be realized or obtained through the steps particularly pointed out in the specification, the claims and the drawings.
Brief description of the drawings
The accompanying drawings are provided for further understanding of the invention and constitute a part of the specification. Together with the embodiments of the invention, they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 and Fig. 3 are flowcharts of methods according to embodiments of the invention;
Fig. 2 is a partial method flowchart according to an embodiment of the invention;
Fig. 4 is a structural diagram of a system according to an embodiment of the invention;
Fig. 5 and Fig. 6 are structural diagrams of story machines according to embodiments of the invention.
Specific embodiments
Hereinafter, embodiments of the invention are described in detail with reference to the drawings and examples, so that practitioners of the invention can fully understand how the invention applies technical means to solve technical problems, how its technical effects are achieved, and can implement the invention accordingly. It should be noted that, as long as no conflict arises, the embodiments of the invention and the individual features within them may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.
In order to improve the user experience, the invention proposes a story data processing method for intelligent robots. In the method of the invention, matching sound effects are added to the story text to improve the expressiveness of the story content. Specifically, it is recognized that only sound effects that match the story content improve its expressiveness; unmatched sound effects not only fail to do so but actually harm the user experience. Therefore, in the method of the invention, before any sound effect is added the story text is first parsed, and the parsing result determines what kind of sound effect matches the story text.
Specifically, in real application scenarios an event is usually accompanied by sound, whose sources may be the actions of the story characters, the ambient sound at the time, and so on. In an embodiment of the method of the invention, the main function of a sound effect is to simulate the sound produced when an event occurs, giving the user a feeling of being present at the scene and thereby improving the expressiveness of the story. Accordingly, in one embodiment sound-effect matching consists in determining what kind of sound should be produced at the current point of the story. To that end, the current story progress is determined first: the current character actions, the current environment, and so on. Therefore, in one embodiment text recognition is performed on the story text to determine the current state of the story progress, and the corresponding sound effect is determined from that state.
Furthermore, in real application scenarios the sounds accompanying different events are unlikely to stay unchanged. That is, no story can be matched by a single sound effect throughout. As the story unfolds, the corresponding sound effects should change with the story content. Therefore, in the method of the invention, word segmentation is performed on the story text to divide it into multiple text segments, and sound-effect matching is carried out for each text segment separately.
Furthermore, in real application scenarios, not every stage of the story produces a sound. Therefore, in one embodiment the sound-effect matching step also includes judging whether the current text segment has a corresponding sound effect at all.
The detailed procedures of methods according to embodiments of the invention are described next with reference to the drawings. The steps shown in the flowcharts may be executed in a computer system containing, for example, a set of computer-executable instructions. Although a logical order of the steps is shown in the flowcharts, in some cases the steps may be executed in an order different from the one shown or described here.
As shown in Fig. 1, in one embodiment the method of the invention includes the following steps:
S110, obtaining story text data;
S120, parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
S130, invoking a story data processing model, performing sound-effect matching for each text segment based on the parsing result of the story text data, and retrieving the corresponding sound-effect data when a matching sound effect exists;
S140, generating, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
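To make the S110 to S140 flow concrete, the following is a minimal sketch in Python of how such a pipeline could be organized. It is an illustration only: the helper names (segment_text, match_sound_effect, process_story), the keyword-based matching and the library contents are assumptions, not structures defined in the patent, which leaves the processing model unspecified.

```python
import re
from typing import Optional

# Hypothetical sound-effect library mapping a semantic cue to an audio file.
SOUND_EFFECT_LIBRARY = {
    "creak": "sfx/door_creak.wav",
    "thunder": "sfx/thunderstorm.wav",
}

def segment_text(story_text: str) -> list:
    """S120 (word segmentation): naive split into sentence-level segments."""
    return [s.strip() for s in re.split(r"[.!?]", story_text) if s.strip()]

def match_sound_effect(segment: str) -> Optional[str]:
    """S130: return sound-effect data for a segment, or None when no match exists."""
    for cue, audio_path in SOUND_EFFECT_LIBRARY.items():
        if cue in segment.lower():
            return audio_path
    return None

def process_story(story_text: str) -> list:
    """S110-S140: pair every text segment with its matched sound effect (or None)."""
    return [{"text": seg, "sound_effect": match_sound_effect(seg)}
            for seg in segment_text(story_text)]

print(process_story("The door opened with a creak. He walked in."))
# [{'text': 'The door opened with a creak', 'sound_effect': 'sfx/door_creak.wav'},
#  {'text': 'He walked in', 'sound_effect': None}]
```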
Furthermore, in one embodiment, in order to further improve the expressiveness of the narration and the user experience, the multi-modal data finally generated also includes intelligent robot action data. In this way, while the intelligent robot narrates the story it can not only play sound effects to improve the expressiveness of the story content, but also perform actions according to the action data, improving that expressiveness further still.
Specifically, in one embodiment the intelligent robot action data is matched to the sound-effect data; that is, during generation of the multi-modal data for narrating the story content, corresponding intelligent robot action data is generated for the sound-effect data. This prevents the robot's movements from appearing abrupt and keeps them from interrupting the narration, avoiding situations that would harm the user experience.
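One way to sketch this matching is to key robot actions off the chosen sound effects with a lookup table; the action names below are invented for illustration and are not taken from the patent.

```python
# Hypothetical mapping from matched sound effects to robot action data.
ACTION_FOR_EFFECT = {
    "sfx/door_creak.wav": "turn_head_toward_door",
    "sfx/thunderstorm.wav": "flinch_and_look_up",
}

def action_data_for(effect_path: str) -> str:
    """Generate action data for a sound effect; fall back to a neutral gesture."""
    return ACTION_FOR_EFFECT.get(effect_path, "idle_gesture")
```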
Furthermore, in one embodiment the story is narrated by voice. Specifically, as shown in Fig. 2, in one embodiment the process of generating the multi-modal data for narrating the story content from the story voice data and the sound-effect data comprises:
S210, converting the story text data into story voice data;
S220, merging the story voice data with the sound-effect data to generate story audio data.
The multi-modal data then effectively contains story audio data. The process by which the intelligent robot outputs the multi-modal data includes playing the story audio data, and playing the story audio data amounts to playing the corresponding sound effects while the story text is read aloud.
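A possible realization of S210/S220 is sketched below, assuming the pydub audio library and an external text-to-speech step that has already produced story_voice.wav; the patent itself does not prescribe any particular tool.

```python
from pydub import AudioSegment  # third-party library; an assumption, not named in the patent

def merge_story_audio(voice_wav: str, effect_wav: str, position_ms: int) -> AudioSegment:
    """S220: overlay a sound effect onto the narrated voice track."""
    voice = AudioSegment.from_wav(voice_wav)
    effect = AudioSegment.from_wav(effect_wav) - 6  # attenuate by 6 dB so speech stays clear
    return voice.overlay(effect, position=position_ms)

# story_audio = merge_story_audio("story_voice.wav", "sfx/door_creak.wav", position_ms=1500)
# story_audio.export("story_audio.wav", format="wav")
```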
Furthermore, in one embodiment the story is narrated by text. Specifically, the multi-modal data finally generated contains story text, and the process by which the intelligent robot outputs the multi-modal data includes displaying the text and playing the sound effects while the text is displayed.
Furthermore, in one embodiment the story is narrated by both text and voice. That is, the process by which the intelligent robot outputs the multi-modal data includes playing the story audio data while the text is displayed.
Furthermore, in one embodiment the story progress corresponding to the current story text is determined from the semantics of the story text, so as to judge whether it has a corresponding sound effect and, if so, which kind.
Specifically, in one embodiment, the process of performing sound-effect matching for each text segment comprises:
judging, according to the semantics of the text segment, whether a sound-effect description exists;
when a sound-effect description exists, retrieving the sound-effect data corresponding to the description.
For example, for the text segment "the door opened with a creak", it can be judged directly from the semantics that "a creak" is a sound-effect description, so the corresponding sound-effect data is retrieved directly.
Furthermore, in actual application scenarios the text segments of the story text data do not exist in isolation; they are interrelated with the overall progress of the story. That is to say, in certain application scenarios an individual text segment may contain no sound-effect description by itself even though the story progress it represents does have a corresponding sound effect. For example, an analysis of the direct semantics of the text segment "he lay on the bed thinking" finds no sound-effect description, so there would be no matching sound effect. But if the overall story progress corresponding to the segment is considered, for example that "he lay on the bed thinking" takes place on "a stormy night of thunder and lightning", then "he lay on the bed thinking" should be matched with the sound-effect description "rain with thunder and lightning".
Therefore, in one embodiment, when sound-effect matching is performed the story is analyzed as a whole to determine the story progress state corresponding to each text segment, and whether a text segment has a corresponding sound effect, and which kind, is judged from the perspective of the story as a whole.
Furthermore, considering the characteristics of computer analysis, in one embodiment the story text data is parsed by element decomposition in order to realize this whole-story analysis. Specifically, in one embodiment the story is decomposed into content elements and story elements are extracted, the story elements including story characters and/or story scenes. Text segments are associated with story elements, and whether a text segment has a corresponding sound effect, and which kind, is judged from the story elements associated with it.
For example, the story scene associated with the text segment "he lay on the bed thinking" is "a stormy night of thunder and lightning", so it can be judged that the sound effect matching the segment is a thunderstorm.
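The element-based matching described here can be sketched as a two-level lookup, first from segment to associated scene and then from scene to sound effect; the scene names and both mappings below are illustrative assumptions.

```python
from typing import Optional

# Scene-level effects, chosen once per extracted story scene.
SCENE_EFFECTS = {
    "stormy night": "sfx/thunderstorm.wav",
    "forest morning": "sfx/birdsong.wav",
}

def effect_for_segment(segment: str, scene_of: dict) -> Optional[str]:
    """Match via the associated story scene rather than the segment text alone."""
    scene = scene_of.get(segment)
    return SCENE_EFFECTS.get(scene) if scene else None

# The segment carries no sound cue itself, but its associated scene does:
scene_map = {"he lay on the bed thinking": "stormy night"}
print(effect_for_segment("he lay on the bed thinking", scene_map))  # sfx/thunderstorm.wav
```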
Specifically, in one embodiment, as shown in Fig. 3, the process of parsing the story text data comprises:
S321, decomposing the story into content elements and extracting story elements, the story elements including story characters and/or story scenes;
S322, determining the story element corresponding to each text segment;
and the process of performing sound-effect matching for each text segment comprises:
S330, judging whether a sound effect corresponding to the story element of the text segment exists;
S340, retrieving the corresponding sound-effect data when such a sound effect exists.
Furthermore, in real application scenarios the sounds a scene may produce are complex and changeable, while in the story text data a sound-effect description is usually only a general one, so a single description may have several matching sound effects. Therefore, in order to screen the sound effects and further improve how well they match the story itself, in one embodiment the sound effects are also filtered according to the overall style of the story. Specifically, in one embodiment the story style is determined first, and the selection range of the sound-effect data is then limited according to that style; during matching, sound effects can only be chosen from the limited range, which guarantees that the selected effects match the overall style of the story.
Specifically, in one embodiment:
parsing the story text data comprises:
decomposing the story into content elements and extracting story elements, the story elements including the story type and/or story background;
and retrieving the corresponding sound-effect data comprises:
limiting the selection range of the sound-effect data based on the story elements.
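One way to picture this limiting of the selection range: tag each candidate effect with the story types it suits and filter before matching. The catalogue and its tags below are assumptions for illustration.

```python
from typing import Optional

EFFECT_CATALOGUE = [
    {"path": "sfx/thunderstorm.wav", "describes": "thunder", "suits": {"adventure", "horror"}},
    {"path": "sfx/soft_rain.wav", "describes": "thunder", "suits": {"bedtime", "fairy tale"}},
]

def pick_effect(description: str, story_type: str) -> Optional[str]:
    """Consider only effects whose style tags include the extracted story type."""
    for effect in EFFECT_CATALOGUE:
        if story_type in effect["suits"] and effect["describes"] == description:
            return effect["path"]
    return None

print(pick_effect("thunder", "bedtime"))  # sfx/soft_rain.wav, the gentler variant
```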
Furthermore, in real application scenarios different users perceive sound effects differently. That is, the atmosphere the same sound effect creates differs from listener to listener. Therefore, in one embodiment, in order to improve the user experience as much as possible, different selection ranges of sound-effect data are defined for different story listeners and/or different sound-effect matching strategies are chosen for different story listeners. That is, the sound-effect data is chosen according to the identity of the story listener.
Specifically, in one embodiment, when the story data processing model is invoked the user identity is confirmed, and the story data processing model matching that identity is invoked.
Furthermore, in real application scenarios the differences in how users of different identities perceive sound effects lie mainly in age. Therefore, in one embodiment, when the story sound-effect model is invoked the user's age is confirmed, and the sound-effect selection range and/or sound-effect matching strategy matching the user's age group is invoked.
Furthermore, when sound-effect perception is distinguished by age group, the sound effects preferred by many age groups share a certain commonality. That is, between some age groups the perception of sound effects does not differ very obviously, and many sound effects are liked by users of several age groups alike. The perception characteristics of the child user group, however, stand out: the sound effects children can accept differ markedly from those of other age groups. Specifically, many sound effects that adults are familiar with and understand cannot be understood by children; some that adults are accustomed to are not even suitable for children to listen to or accept. For example, children have a very low acceptance of sound effects that render a horror atmosphere.
A wrongly chosen sound effect can therefore seriously degrade the experience of a child user. Hence, in one embodiment, a dedicated sound-effect selection range and/or sound-effect matching strategy is set for child users. Specifically, in one embodiment, when the user is a child a children's story data processing model is invoked.
Specifically, in one embodiment, retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
determining the user identity and, when the user is a child, invoking the sound-effect library matched to child users;
retrieving the sound-effect data from the sound-effect library matched to child users.
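A minimal sketch of this identity check, assuming age is the signal used to detect a child user (the patent says only that the user identity is determined; the threshold and library contents are invented):

```python
ADULT_LIBRARY = {"thunder": "sfx/thunderstorm.wav"}
CHILD_LIBRARY = {"thunder": "sfx/gentle_rumble.wav"}  # softened variant for children

def library_for_user(user_age: int) -> dict:
    """Pick the sound-effect library matched to the user's identity."""
    return CHILD_LIBRARY if user_age < 12 else ADULT_LIBRARY

print(library_for_user(user_age=6)["thunder"])  # sfx/gentle_rumble.wav
```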
Furthermore, based on the method of the invention, the invention also provides a storage medium storing program code capable of implementing the method described herein.
Furthermore, based on the method of the invention, the invention also provides a story data processing system for intelligent robots.
Specifically, as shown in Fig. 4, in one embodiment the system includes:
a text acquisition module 410, configured to obtain story text data;
a text parsing module 420, configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound-effect processing module 430, configured to invoke a story data processing model, perform sound-effect matching for each text segment based on the parsing result of the story text data, and retrieve the corresponding sound-effect data when a matching sound effect exists;
a multi-modal story data generation module 440, configured to generate, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
Furthermore, based on the story data processing system proposed by the invention, the invention also provides an intelligent story machine. Specifically, as shown in Fig. 5, in one embodiment the story machine includes:
an input acquisition module 510, configured to acquire multi-modal user input and confirm the user's story demand;
a story data processing system 520, configured to obtain the corresponding story text data according to the user's story demand and generate the multi-modal data;
an output module 530, configured to output the multi-modal data to the user, wherein dialogue and narration voice data carrying sound effects are played.
Specifically, as shown in Fig. 6, in one embodiment the story machine includes a smart device 610 and a cloud server 620, wherein:
the cloud server 620 includes a story data processing system 630 and a sound-effect database 640. The story data processing system 630 is configured to call the capability interfaces of the cloud server 620 to obtain and parse the story text data, select sound-effect data from the sound-effect database 640 of the cloud server 620, and generate and output the multi-modal data. Specifically, during the multi-modal data processing each capability interface invokes its own corresponding logic.
Specifically, in one embodiment the capability interfaces of the cloud server 620 include a text recognition interface 621, a text/speech conversion interface 622 and an audio synthesis interface 623.
The smart device 610 includes a human-computer interaction input/output module 611, a communication module 612 and a playing module 613.
It should be noted that the smart device may be a tablet computer, a children's watch, a robot, a mobile phone, a story machine or a picture-book reading robot.
The human-computer interaction input/output module 611 is configured to obtain the user's control instructions and determine the user's story listening demand.
The communication module 612 is configured to send the user's story listening demand obtained by the human-computer interaction input/output module 611 to the cloud server 620, and to receive the multi-modal data from the cloud server 620.
The playing module 613 is configured to play the audio data in the multi-modal data (the sound-effect data or the story audio data containing sound effects).
Specifically, in one concrete application scenario, the human-computer interaction input/output module 611 obtains the user's control instructions and determines the user's story listening demand.
The communication module 612 sends the user's story listening demand to the cloud server 620.
The cloud server 620 selects the corresponding story text data based on the user's story listening demand. The story data processing system in the cloud server 620 obtains and parses the story text data, selects background music data from the music database of the cloud server 620, and generates and outputs a story audio file.
The communication module 612 receives the story audio file sent by the cloud server 620.
The playing module 613 plays the story audio file received by the communication module 612.
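The round trip between modules 611 to 613 and the cloud server 620 can be sketched as below. Everything concrete here (the request fields, the stub catalogue, the byte-string standing in for audio) is an assumption; the patent defines module responsibilities, not a protocol.

```python
from dataclasses import dataclass

@dataclass
class StoryRequest:
    user_id: str
    story_title: str  # the user's story listening demand

def cloud_handle(request: StoryRequest) -> bytes:
    """Cloud server 620: select the story text, process it, return story audio."""
    story_text = {"The Thunderstorm Night": "He lay on the bed thinking."}.get(
        request.story_title, "")
    # Parsing, sound-effect matching and audio synthesis are stubbed as a tagged echo.
    return f"AUDIO<{story_text}>".encode()

def device_main() -> None:
    """Smart device 610: gather the demand (611), send it (612), play the reply (613)."""
    request = StoryRequest(user_id="u1", story_title="The Thunderstorm Night")
    audio_bytes = cloud_handle(request)  # stands in for the communication module's round trip
    print("playing:", audio_bytes)       # stands in for the playing module

device_main()
```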
It should be understood that the disclosed embodiments of the invention are not limited to the specific structures, processing steps or materials disclosed herein, but extend to their equivalents as understood by those of ordinary skill in the relevant art. It should also be understood that the terminology used herein serves only to describe specific embodiments and is not intended to be limiting.
An "embodiment" mentioned in the specification means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Therefore, the phrase "an embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment.
Although the embodiments are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. The method of the invention may also have various other embodiments. Without departing from the essence of the invention, those skilled in the art may make various corresponding changes or variations in accordance with the invention, and all such changes or variations shall fall within the protection scope of the claims of the invention.

Claims (10)

1. A story data processing method for intelligent robots, characterized in that the method comprises:
obtaining story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
invoking a story data processing model, performing sound-effect matching for each text segment based on the parsing result of the story text data, and retrieving the corresponding sound-effect data when a matching sound effect exists;
generating, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
2. The method according to claim 1, characterized in that the multi-modal data includes intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the sound-effect data.
3. The method according to claim 1 or 2, characterized in that generating the multi-modal data for narrating the story content according to the story text data and the sound-effect data comprises:
converting the story text data into story voice data;
merging the story voice data with the sound-effect data to generate story audio data.
4. The method according to any one of claims 1 to 3, characterized in that performing sound-effect matching for each text segment and retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
judging, according to the semantics of the text segment, whether a sound-effect description exists;
when a sound-effect description exists, retrieving the sound-effect data corresponding to the description.
5. The method according to any one of claims 1 to 4, characterized in that:
parsing the story text data comprises:
decomposing the story into content elements and extracting story elements, the story elements including story characters and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound-effect matching for each text segment and retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
judging whether a sound effect corresponding to the story element of the text segment exists;
retrieving the corresponding sound-effect data when such a sound effect exists.
6. The method according to any one of claims 1 to 5, characterized in that:
parsing the story text data comprises:
decomposing the story into content elements and extracting story elements, the story elements including the story type and/or story background;
and retrieving the corresponding sound-effect data comprises:
limiting the selection range of the sound-effect data based on the story elements.
7. The method according to any one of claims 1 to 6, characterized in that retrieving the corresponding sound-effect data when a matching sound effect exists comprises:
determining the user identity and, when the user is a child, invoking the sound-effect library matched to child users;
retrieving the sound-effect data from the sound-effect library.
8. A storage medium, characterized in that program code capable of implementing the method of any one of claims 1 to 7 is stored on the storage medium.
9. A story data processing system for intelligent robots, characterized in that the system comprises:
a text acquisition module, configured to obtain story text data;
a text parsing module, configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound-effect processing module, configured to invoke a story data processing model, perform sound-effect matching for each text segment based on the parsing result of the story text data, and retrieve the corresponding sound-effect data when a matching sound effect exists;
a multi-modal story data generation module, configured to generate, according to the story text data and the sound-effect data, multi-modal data for narrating the story content.
10. An intelligent story machine, characterized in that the story machine comprises:
an input acquisition module, configured to acquire multi-modal user input and confirm the user's story demand;
the story data processing system according to claim 9, configured to obtain the corresponding story text data according to the user's story demand and generate the multi-modal data;
an output module, configured to output the multi-modal data to the user, wherein dialogue and narration voice data carrying sound effects are played.
CN201811154925.6A 2018-09-30 2018-09-30 Intelligent robot-oriented story data processing method and system Active CN109460548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811154925.6A CN109460548B (en) 2018-09-30 2018-09-30 Intelligent robot-oriented story data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811154925.6A CN109460548B (en) 2018-09-30 2018-09-30 Intelligent robot-oriented story data processing method and system

Publications (2)

Publication Number Publication Date
CN109460548A (en) 2019-03-12
CN109460548B CN109460548B (en) 2022-03-15

Family

ID=65607262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811154925.6A Active CN109460548B (en) 2018-09-30 2018-09-30 Intelligent robot-oriented story data processing method and system

Country Status (1)

Country Link
CN (1) CN109460548B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101653660A * 2008-08-22 2010-02-24 鸿富锦精密工业(深圳)有限公司 Biomimetic device that automatically performs actions during storytelling and method thereof
CN105096932A * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN106557298A * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Intelligent robot-oriented background dubbing output method and device
CN107340865A * 2017-06-29 2017-11-10 北京光年无限科技有限公司 Multi-modal virtual robot interaction method and system
CN107807734A * 2017-09-27 2018-03-16 北京光年无限科技有限公司 Interaction output method and system for an intelligent robot
CN107784355A * 2017-10-26 2018-03-09 北京光年无限科技有限公司 Multi-modal interaction data processing method and system for a virtual human

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109994000A * 2019-03-28 2019-07-09 掌阅科技股份有限公司 Reading companion method, electronic device and computer storage medium
CN109994000B (en) * 2019-03-28 2021-10-19 掌阅科技股份有限公司 Reading accompanying method, electronic equipment and computer storage medium
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 Audio playing method and system
CN112270198A (en) * 2020-10-27 2021-01-26 北京百度网讯科技有限公司 Role determination method and device, electronic equipment and storage medium
CN112270198B (en) * 2020-10-27 2021-08-17 北京百度网讯科技有限公司 Role determination method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109460548B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN110517689B (en) Voice data processing method, device and storage medium
CN109036374B (en) Data processing method and device
CN109460548A Intelligent robot-oriented story data processing method and system
Pon-Barry et al. Recognizing uncertainty in speech
CN109543021B (en) Intelligent robot-oriented story data processing method and system
CN107437413A Voice broadcast method and device
CN109256133A Voice interaction method, device, equipment and storage medium
CN109754783A (en) Method and apparatus for determining the boundary of audio sentence
JP2019211754A (en) Method and apparatus to be used for human-machine interaction, electronic device, and computer-readable storage medium
JP2000207170A (en) Device and method for processing information
CN110767233A (en) Voice conversion system and method
CN109065019A Story data processing method and system for intelligent robots
JPH11109991A (en) Man machine interface system
CN114283820A (en) Multi-character voice interaction method, electronic equipment and storage medium
CN112634886B (en) Interaction method of intelligent equipment, server, computing equipment and storage medium
CN109065018A Story data processing method and system for intelligent robots
CN108986785A Text adaptation method and device
CN106471569A Speech synthesis apparatus, speech synthesis method and program therefor
CN112037793A (en) Voice reply method and device
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN110503991A (en) Voice broadcast method, device, electronic equipment and storage medium
CN110070869A (en) Voice interface generation method, device, equipment and medium
CN109241331B (en) Intelligent robot-oriented story data processing method
CN115442495A (en) AI studio system
CN201585019U (en) Mobile terminal with voice conversion function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant