CN109460548B - Intelligent robot-oriented story data processing method and system - Google Patents

Info

Publication number: CN109460548B (application CN201811154925.6A)
Authority: CN (China)
Prior art keywords: story, data, sound effect, text, calling
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201811154925.6A
Other languages: Chinese (zh)
Other versions: CN109460548A (en)
Inventor: 贾志强
Current Assignee: Beijing Guangnian Infinite Technology Co., Ltd.
Original Assignee: Beijing Guangnian Infinite Technology Co., Ltd.
Priority date: 2018-09-30 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2018-09-30
Application filed by Beijing Guangnian Infinite Technology Co., Ltd.
Priority to CN201811154925.6A
Publication of CN109460548A
Application granted
Publication of CN109460548B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G: Physics
    • G09: Education; Cryptography; Display; Advertising; Seals
    • G09B: Educational or demonstration appliances; appliances for teaching, or communicating with, the blind, deaf or mute; models; planetaria; globes; maps; diagrams
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/04: Electrically-operated educational appliances with audible presentation of the material to be studied

Abstract

The invention discloses a story data processing method and system oriented to an intelligent robot. The method comprises the following steps: acquiring story text data; parsing the story text data, wherein word segmentation and text recognition are performed on the story text data; calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists; and generating multi-modal data for telling the story content from the story text data and the sound effect data. Compared with the prior art, the method and the system can convert a story in text form into multi-modal data that carries sound effects and can be presented in multiple modalities, which greatly improves the experience of the listener when the story is told.

Description

Intelligent robot-oriented story data processing method and system
Technical Field
The invention relates to the field of computers, in particular to a story data processing method and system for an intelligent robot.
Background
In traditional daily life, reading text is the main way people enjoy literary works. In certain scenarios, however, people also enjoy literary works through sound, for example by listening to storytelling performances or recitations. Most typically, children whose reading ability is still limited learn stories by listening to others tell them.
With the continuous development of multimedia technology, more and more multimedia devices are used in daily life. With the support of multimedia technology, the task of presenting literary works in audible form, storytelling in particular, is gradually shifting to multimedia devices.
In general, storytelling with a multimedia device relies on a person telling the story in advance and recording an audio file, which the device then simply plays back. With the development of computer technology, and in order to obtain a sound source simply and conveniently, the prior art also converts text data directly into audio data. No manual recitation or recording is then needed; as long as the story text is provided, the multimedia device can tell the story. However, direct text-to-speech conversion only guarantees a literal rendering of the text and cannot reproduce the liveliness of a human storyteller. Storytelling based on such text conversion is therefore dry and uninteresting, conveys only the literal meaning of the words, and results in a very poor user experience.
Disclosure of Invention
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot, which comprises the following steps:
acquiring story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and generating multi-modal data for telling the story content from the story text data and the sound effect data.
In an embodiment, the multi-modal data comprises intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the sound effect data.
In one embodiment, generating multi-modal data for telling the story content from the story text data and the sound effect data comprises:
converting the story text data into story voice data;
and fusing the story voice data and the sound effect data to generate story audio data.
In one embodiment, performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
In one embodiment:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists includes:
judging whether a sound effect corresponding to the story element of the text segment exists;
and calling the sound effect data corresponding to that sound effect when it exists.
In one embodiment:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
In one embodiment, calling the corresponding sound effect data when a matched sound effect exists comprises:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library.
The invention also proposes a storage medium on which program code capable of implementing the method according to any one of claims 1 to 7 is stored.
The invention also provides a story data processing system oriented to an intelligent robot, which comprises:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists;
and a multi-modal story data generation module configured to generate multi-modal data for telling the story content from the story text data and the sound effect data.
The invention also provides an intelligent story machine, comprising:
an input acquisition module configured to collect multi-modal input from a user and confirm the user's story requirement;
the story data processing system of the above embodiment, configured to obtain the corresponding story text data according to the user's story requirement and generate the multi-modal data;
and an output module configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
Compared with the prior art, the method and the system can convert a story in text form into multi-modal data that carries sound effects and can be presented in multiple modalities, which greatly improves the experience of the listener when the story is told.
Additional features and advantages of the invention will be set forth in the description which follows. Also, some of the features and advantages of the invention will be apparent from the description, or may be learned by practice of the invention. The objectives and some of the advantages of the invention may be realized and attained by the process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIGS. 1 and 3 are flow diagrams of methods according to embodiments of the invention;
FIG. 2 is a partial flow diagram of a method according to an embodiment of the invention;
FIG. 4 is a system architecture diagram according to an embodiment of the invention;
FIGS. 5 and 6 are schematic diagrams of a story machine according to embodiments of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that practitioners of the invention can fully understand how to apply technical means to solve the technical problems, achieve the technical effects, and implement the invention accordingly. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of those embodiments may be combined with each other, and the resulting technical solutions all fall within the scope of the present invention.
In traditional daily life, reading text is the main way people enjoy literary works. In certain scenarios, however, people also enjoy literary works through sound, for example by listening to storytelling performances or recitations. Most typically, children whose reading ability is still limited learn stories by listening to others tell them.
With the continuous development of multimedia technology, more and more multimedia devices are used in daily life. With the support of multimedia technology, the task of presenting literary works in audible form, storytelling in particular, is gradually shifting to multimedia devices.
In general, storytelling with a multimedia device relies on a person telling the story in advance and recording an audio file, which the device then simply plays back. With the development of computer technology, and in order to obtain a sound source simply and conveniently, the prior art also converts text data directly into audio data. No manual recitation or recording is then needed; as long as the story text is provided, the multimedia device can tell the story. However, direct text-to-speech conversion only guarantees a literal rendering of the text and cannot reproduce the liveliness of a human storyteller. Storytelling based on such text conversion is therefore dry and uninteresting, conveys only the literal meaning of the words, and results in a very poor user experience.
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot. In this method, corresponding sound effects are added to the story text to improve the expressiveness of the story content. Note that expressiveness is improved only when the sound effect matches the story content; an unmatched sound effect does not enhance the story and instead harms the user experience. Therefore, in the method of the invention, before a sound effect is added, the story text is first parsed, and the parsing result determines which sound effect matches the story text.
Specifically, in a practical application scenario, an event is usually accompanied by sound when it occurs, and the source of that sound may be a character's action during the event, an environmental sound, and so on. In an embodiment of the method of the present invention, the role of the sound effect is mainly to simulate the sound that would occur when the event takes place, giving the user a feeling of being present at the scene and thereby improving the expressiveness of the story. Thus, in one embodiment, sound effect matching is used to determine what sound should be heard at the current point of the story. Specifically, the current story progress is determined first, including the current character actions and the current environment. Therefore, in one embodiment, text recognition is performed on the story text to determine the current progress of the story, and the corresponding sound effect is determined from that progress.
Further, in a practical application scenario, different events are not accompanied by one and the same sound. That is, a story cannot be matched by a single sound effect from beginning to end: as the story unfolds, the corresponding sound effects should change with the story content. Therefore, in the method, the story text is segmented into a number of text segments, and sound effect matching is performed for each text segment.
Further, in a practical application scenario, not every segment of the story is accompanied by a sound. Therefore, in an embodiment, the sound effect matching step also includes judging whether a corresponding sound effect exists for the current text segment.
The detailed flow of a method according to an embodiment of the invention is described below with reference to the accompanying drawings. The steps shown in the flowcharts can be executed in a computer system containing, for example, a set of computer-executable instructions. Although a logical order of steps is illustrated in the flowcharts, in some cases the steps shown or described may be performed in a different order than the one presented here.
As shown in FIG. 1, in one embodiment, the method of the present invention comprises the following steps (a minimal code sketch follows the step list):
S110, acquiring story text data;
S120, parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
S130, calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and S140, generating multi-modal data for telling the story content from the story text data and the sound effect data.
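Purely for illustration, the following Python sketch shows one way steps S110 to S140 could be strung together. The sound effect library, the keyword matching rule, and every identifier in it (SOUND_EFFECT_LIBRARY, segment_text, match_sound_effect, process_story) are assumptions made for the sketch, not the story data processing model of the invention.

```python
# Minimal, illustrative sketch of steps S110-S140 (all names are hypothetical).
import re

# Hypothetical sound effect library: keyword -> sound effect resource identifier.
SOUND_EFFECT_LIBRARY = {
    "thunder": "sfx/thunder.wav",
    "squeak": "sfx/door_squeak.wav",
    "bark": "sfx/dog_bark.wav",
}

def segment_text(story_text):
    """S120 (in part): crude segmentation - split the story text into sentences."""
    return [s.strip() for s in re.split(r"[。！？.!?]", story_text) if s.strip()]

def match_sound_effect(segment):
    """S130: sound effect matching for one text segment; None if no match exists."""
    for keyword, effect in SOUND_EFFECT_LIBRARY.items():
        if keyword in segment.lower():
            return effect
    return None

def process_story(story_text):
    """S110-S140: build multi-modal data as (text segment, optional sound effect) pairs."""
    multimodal_data = []
    for segment in segment_text(story_text):                              # S120
        effect = match_sound_effect(segment)                              # S130
        multimodal_data.append({"text": segment, "sound_effect": effect})  # S140
    return multimodal_data

print(process_story("The door opened with a squeak. He lay in bed thinking."))
```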
Further, in an embodiment, in order to further improve the expressiveness of the storytelling and the user experience, the finally generated multi-modal data also includes intelligent robot action data. In this way, while the intelligent robot tells a story, it can not only play sound effects to enhance the story content but also perform matching actions according to the intelligent robot action data, further improving the expressiveness of the story content.
Specifically, in an embodiment, the intelligent robot action data is matched to the sound effect data: in the process of generating the multi-modal data for telling the story content, corresponding intelligent robot action data is generated for the sound effect data. This prevents the robot's actions from appearing abrupt and avoids the situation in which an action interrupts the storytelling and harms the user experience.
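As a hedged illustration of this pairing, the lookup below maps a sound effect to matching robot action data; the table contents and names are invented for the example.

```python
# Hypothetical mapping from a sound effect to matching intelligent robot action data.
ACTION_FOR_EFFECT = {
    "sfx/thunder.wav": {"gesture": "cover_ears", "expression": "startled"},
    "sfx/door_squeak.wav": {"gesture": "turn_head", "expression": "curious"},
}

def action_data_for(sound_effect):
    """Generate robot action data for a sound effect; fall back to a neutral pose."""
    return ACTION_FOR_EFFECT.get(sound_effect, {"gesture": "idle", "expression": "neutral"})

print(action_data_for("sfx/thunder.wav"))
```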
Further, in one embodiment, the story is told by voice. Specifically, as shown in FIG. 2, in one embodiment, the process of generating the multi-modal data for telling the story content from the story text data and the sound effect data comprises:
S210, converting the story text data into story voice data;
and S220, fusing the story voice data and the sound effect data to generate story audio data.
The multi-modal data therefore effectively contains story audio data. The process of the intelligent robot outputting the multi-modal data includes playing the story audio data, which is equivalent to playing the corresponding sound effects while the story text is read aloud.
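One possible realisation of S210 and S220 is sketched below. The audio mixing uses the pydub library; the text-to-speech step is left as a stub because the patent does not name a specific TTS engine, and the file paths are assumptions.

```python
# Sketch of S210/S220: synthesize story speech, then overlay sound effects (assumed paths).
from pydub import AudioSegment

def text_to_speech(text):
    """S210 stub: a real system would call a TTS engine and return speech audio.

    Here we return silence proportional to the text length so the sketch runs.
    """
    return AudioSegment.silent(duration=100 * len(text))

def fuse(story_text, effects):
    """S220: overlay each sound effect onto the speech at its position (milliseconds)."""
    speech = text_to_speech(story_text)
    for position_ms, effect_path in effects:
        effect = AudioSegment.from_file(effect_path)
        speech = speech.overlay(effect, position=position_ms)
    return speech

# Example usage (paths are hypothetical):
# story_audio = fuse("The door opened with a squeak.", [(500, "sfx/door_squeak.wav")])
# story_audio.export("story.wav", format="wav")
```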
Further, in one embodiment, the story is told in text form. Specifically, the finally generated multi-modal data includes the story text. The process of the intelligent robot outputting the multi-modal data includes displaying the text and playing sound effects while the text is displayed.
Further, in one embodiment, the story is told using both text and voice. That is, the process of the intelligent robot outputting the multi-modal data includes presenting the text while the story audio data is played.
Further, in an embodiment, the story progress corresponding to the current story text is determined from the semantics of the story text, so as to judge whether the story text has a corresponding sound effect and, if so, which sound effect it corresponds to.
Specifically, in an embodiment, the process of sound effect matching for each text segment comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
For example, for the text segment "the door opened with a squeak", the phrase "with a squeak" can be judged directly from the semantics to be a sound effect description, so the corresponding sound effect data is called directly.
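In the simplest case this judgement can be approximated with a small lexicon of sound descriptions, as in the sketch below; a real system would rely on the story data processing model rather than this keyword rule, and every name here is illustrative.

```python
# Toy approximation of "does this segment contain a sound effect description?"
SOUND_DESCRIPTIONS = {
    "squeak": "sfx/door_squeak.wav",
    "bang": "sfx/bang.wav",
    "rumble of thunder": "sfx/thunder.wav",
}

def find_sound_description(segment):
    """Return (description, sound effect) if the segment describes a sound, else None."""
    lowered = segment.lower()
    for description, effect in SOUND_DESCRIPTIONS.items():
        if description in lowered:
            return description, effect
    return None

print(find_sound_description("The door opened with a squeak"))
```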
Further, in a practical application scenario, a text segment of the story text data does not exist in isolation; it is correlated with the overall progress of the story. In some application scenarios, although a single text segment contains no sound effect description, the story progress it represents does have a corresponding sound effect. For example, the text segment "he lay in bed thinking" contains no sound effect description when analysed on its own semantics, so there would be no matching sound effect. However, if the overall progress of the story corresponding to this text segment is considered, for example that "he lay in bed thinking" takes place on "a stormy night with thunder", then "he lay in bed thinking" matches a thunderstorm sound effect.
Therefore, in one embodiment, when sound effect matching is performed, the story is analysed as a whole, the story progress state corresponding to the text segment is determined, and whether the text segment has a corresponding sound effect, and which sound effect it corresponds to, is judged from the perspective of the whole story.
Further, considering the characteristics of computer analysis, in one embodiment the story text data is analysed by element decomposition so as to realise analysis of the whole story. Specifically, in one embodiment, the content elements of the story are decomposed and story elements are extracted, the story elements comprising story roles and/or story scenes. Each text segment is associated with story elements, and whether the text segment has a corresponding sound effect, and which sound effect it corresponds to, is judged from the story elements associated with it (a minimal sketch of this element-based matching follows the example below).
For example, the story scene associated with the text segment "he lay in bed thinking" is "a rainy night accompanied by thunder", so it can be judged that the sound effect matching this text segment is a thunderstorm sound effect.
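Purely as an illustration of element-based matching, the sketch below uses a tiny hand-written element extractor and an element-to-sound-effect table; both tables and all identifiers are assumptions, not the invention's trained model.

```python
# Illustrative element decomposition and element-driven sound effect matching.
KNOWN_SCENES = {"rainy night": "sfx/thunderstorm.wav", "forest": "sfx/birdsong.wav"}
KNOWN_ROLES = {"wolf": "sfx/wolf_howl.wav"}

def extract_elements(segment, story_scene):
    """Associate the segment with any story roles found in it and with the active scene."""
    roles = [role for role in KNOWN_ROLES if role in segment.lower()]
    return {"roles": roles, "scene": story_scene}

def effect_for_elements(elements):
    """Pick a sound effect for the segment's elements, if one exists."""
    for role in elements["roles"]:
        return KNOWN_ROLES[role]
    return KNOWN_SCENES.get(elements["scene"])

elements = extract_elements("He lay in bed thinking", story_scene="rainy night")
print(effect_for_elements(elements))   # -> sfx/thunderstorm.wav
```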
Specifically, in one embodiment, as shown in FIG. 3, the process of parsing the story text data comprises:
S321, decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
S322, determining the story element corresponding to each text segment;
and the process of sound effect matching for each text segment comprises:
S330, judging whether a sound effect corresponding to the story element of the text segment exists;
and S340, calling the sound effect data corresponding to that sound effect when it exists.
Further, in practical application scenarios, the sounds that may occur in a scene are highly varied. In the story text data, a sound effect description is usually only a rough description, and several sound effects may match the same description. Therefore, in order to filter the sound effects and further improve how well the sound effects match the story itself, in one embodiment the sound effects are additionally filtered according to the overall style of the story. Specifically, in an embodiment, the story style is determined first, and the selection range of the sound effect data is then limited by that style, so that during sound effect matching a sound effect can only be chosen from the limited range, ensuring that the chosen sound effect matches the overall style of the story.
Specifically, in one embodiment (a minimal filtering sketch follows the list below):
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
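A minimal sketch of narrowing the candidate sound effects by story type is given below; the style tags and the library contents are assumed for illustration.

```python
# Illustrative filtering of a sound effect library by story type / background tags.
SOUND_EFFECTS = [
    {"path": "sfx/thunder_scary.wav", "styles": {"horror"}},
    {"path": "sfx/thunder_soft.wav", "styles": {"fairy_tale", "bedtime"}},
    {"path": "sfx/door_squeak.wav", "styles": {"fairy_tale", "horror"}},
]

def limit_selection_range(story_type):
    """Keep only sound effects whose style tags match the overall story type."""
    return [e for e in SOUND_EFFECTS if story_type in e["styles"]]

print([e["path"] for e in limit_selection_range("fairy_tale")])
```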
Further, in a practical application scenario, different users perceive sound effects differently. That is, the same sound effect does not heighten the atmosphere equally for different listeners. Therefore, in an embodiment, in order to improve the user experience as much as possible, different selection ranges of sound effect data are defined for different story listeners and/or different sound effect matching strategies are chosen for different story listeners. In other words, the sound effect data is selected according to the identity of the listener.
Specifically, in one embodiment, when the story data processing model is called, the user identity is confirmed and the story data processing model matched to that user identity is called.
Further, in practical application scenarios, the differences in how users of different identities perceive sound effects mainly come down to age. Therefore, in one embodiment, when the story data processing model is called, the age of the user is confirmed, and the selection range of sound effect data and/or the sound effect matching strategy matched to the user's age group is called.
Furthermore, when sound effect perception is differentiated by age group, the sound effects preferred by many age groups show considerable commonality. That is, the difference in sound effect perception between some age groups is not very pronounced, and many sound effects are preferred by users of several age groups alike. However, the sound effect perception of the child user group is distinctly different: the sound effects children accept differ markedly from those accepted by other age groups. In particular, sound effects that many adults are familiar with may be incomprehensible to children, and some sound effects adults are accustomed to are not suitable for children to listen to at all. For example, children respond poorly to sound effects that create a frightening atmosphere.
A wrongly chosen sound effect therefore seriously degrades the experience of a child user. For this reason, in one embodiment, a dedicated selection range of sound effect data and/or a dedicated sound effect matching strategy is set for child users. Specifically, in one embodiment, when the user is a child user, a child story data processing model is called.
Specifically, in an embodiment, calling the corresponding sound effect data when a matched sound effect exists comprises (a minimal sketch follows the list below):
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that child-matched sound effect library.
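The user-identity branch could look roughly like the sketch below; the age threshold, the library names, and their contents are assumptions made for the example.

```python
# Illustrative selection of a sound effect library according to user identity (assumed data).
CHILD_SOUND_LIBRARY = {"thunder": "sfx/thunder_soft.wav"}
GENERAL_SOUND_LIBRARY = {"thunder": "sfx/thunder_full.wav"}

def select_library(user_age):
    """Call the child-matched library for child users, otherwise the general library."""
    if user_age is not None and user_age < 12:
        return CHILD_SOUND_LIBRARY
    return GENERAL_SOUND_LIBRARY

library = select_library(user_age=6)
print(library["thunder"])   # -> sfx/thunder_soft.wav
```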
Further, based on the method of the present invention, the invention also provides a storage medium on which program code capable of implementing the method of the invention is stored.
Furthermore, based on the method, the invention also provides a story data processing system oriented to an intelligent robot.
Specifically, as shown in FIG. 4, in an embodiment the system comprises the following modules (a structural sketch follows the module list):
a text acquisition module 410 configured to acquire story text data;
a text parsing module 420 configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module 430 configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists;
and a multi-modal story data generation module 440 configured to generate multi-modal data for telling the story content from the story text data and the sound effect data.
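The four modules could be organised roughly as in the following structural sketch; the class and method names are illustrative placeholders, and each module body is reduced to a stub.

```python
# Structural sketch of the system of FIG. 4 (class and method names are illustrative).
class TextAcquisitionModule:                      # 410
    def acquire(self, story_id):
        return "The door opened with a squeak."   # placeholder story text

class TextParsingModule:                          # 420
    def parse(self, text):
        # Word segmentation / text recognition stub: split into rough segments.
        return [s.strip() for s in text.split(".") if s.strip()]

class SoundEffectModule:                          # 430
    def match(self, segments):
        return [{"text": s,
                 "sound_effect": "sfx/door_squeak.wav" if "squeak" in s else None}
                for s in segments]

class MultimodalStoryDataModule:                  # 440
    def generate(self, matched_segments):
        return {"story": matched_segments}

segments = TextParsingModule().parse(TextAcquisitionModule().acquire("demo"))
print(MultimodalStoryDataModule().generate(SoundEffectModule().match(segments)))
```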
Furthermore, based on the story data processing system provided by the invention, the invention also provides an intelligent story machine. Specifically, as shown in FIG. 5, in an embodiment the story machine comprises:
an input acquisition module 510 configured to collect the user's multi-modal input and confirm the user's story requirement;
a story data processing system 520 configured to obtain the corresponding story text data according to the user's story requirement and generate multi-modal data;
and an output module 530 configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
Specifically, as shown in FIG. 6, in an embodiment the story machine comprises a smart device 610 and a cloud server 620, wherein:
the cloud server 620 includes a story data processing system 630 and a sound effect database 640. The story data processing system 630 is configured to call the capability interfaces of the cloud server 620 to obtain and parse the story text data, select sound effect data from the sound effect database 640 of the cloud server 620, and generate and output the multi-modal data. Specifically, each capability interface performs the corresponding logical processing during multi-modal data analysis.
Specifically, in one embodiment, the capability interfaces of the cloud server 620 include a text recognition interface 621, a text/speech conversion interface 622, and an audio synthesis interface 623.
The smart device 610 includes a human-computer interaction input/output module 611, a communication module 612, and a playing module 613.
It should be noted that the smart device may be a tablet computer, a children's smart watch, a robot, a mobile phone, a story machine, or a picture-book reading robot.
The human-computer interaction input/output module 611 is configured to obtain a control instruction of the user and determine a story listening requirement of the user.
The communication module 612 is configured to output the user story listening requirement acquired by the human-computer interaction input/output module 611 to the cloud server 620, and receive multi-modal data from the cloud server 620.
The playing module 613 is configured to play the audio data (sound effect data or story audio data containing sound effects) in the multi-modal data.
Specifically, in a typical application scenario (an end-to-end sketch in code follows this flow), the human-computer interaction input/output module 611 obtains the user's control instruction and determines the user's story listening requirement.
The communication module 612 sends the user's story listening requirement to the cloud server 620.
The cloud server 620 selects the corresponding story text data based on the user's story listening requirement. The story data processing system in the cloud server 620 obtains and parses the story text data, selects sound effect data from the sound effect database 640 of the cloud server 620, and generates and outputs a story audio file.
The communication module 612 receives the story audio file sent by the cloud server 620.
The playing module 613 plays the story audio file received by the communication module 612.
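The exchange between the smart device 610 and the cloud server 620 can be pictured as a simple request/response flow. The sketch below stands in for the communication module with plain function calls; it does not reflect any real network protocol or API of the patent, and all data in it is invented for illustration.

```python
# Toy end-to-end flow: device collects the request, "cloud" returns multi-modal story data.
def cloud_server_620(story_request):
    """Stand-in for cloud server 620: pick story text, match sound effects, return the result."""
    story_text = {"bedtime story": "The door opened with a squeak. He fell asleep."}.get(
        story_request, "Once upon a time.")
    return {"story_text": story_text,
            "sound_effects": ["sfx/door_squeak.wav"] if "squeak" in story_text else []}

def smart_device_610(user_command):
    """Stand-in for modules 611-613: forward the listening requirement, then 'play' the result."""
    multimodal = cloud_server_620(user_command)   # communication module 612
    print("playing:", multimodal)                  # playing module 613
    return multimodal

smart_device_610("bedtime story")
```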
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. There are various other embodiments of the method of the present invention. Various corresponding changes or modifications may be made by those skilled in the art without departing from the spirit of the invention, and these corresponding changes or modifications are intended to fall within the scope of the appended claims.

Claims (8)

1. A story data processing method oriented to an intelligent robot, characterized by comprising the following steps:
acquiring story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and generating multi-modal data for telling the story content from the story text data and the sound effect data, the multi-modal data comprising intelligent robot action data, wherein corresponding intelligent robot action data is generated for the sound effect data;
wherein the process of calling the corresponding sound effect data when a matched sound effect exists comprises the following steps:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library.
2. The method of claim 1, wherein generating multi-modal data for telling the story content from the story text data and the sound effect data comprises:
converting the story text data into story voice data;
and fusing the story voice data and the sound effect data to generate story audio data.
3. The method of claim 1, wherein performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
4. The method of claim 1, wherein:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists includes:
judging whether a sound effect corresponding to the story element of the text segment exists;
and calling the sound effect data corresponding to that sound effect when it exists.
5. The method according to any one of claims 1 to 4, wherein:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
6. A storage medium having stored thereon program code for implementing the method according to any one of claims 1-5.
7. An intelligent robot-oriented story data processing system, the system comprising:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists,
wherein, when a matched sound effect exists, the sound effect processing module is further configured to call the corresponding sound effect data through the following operations:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library;
and a multi-modal story data generation module configured to generate multi-modal data for telling the story content from the story text data and the sound effect data, the multi-modal data comprising intelligent robot action data,
wherein the multi-modal story data generation module is further configured to generate corresponding intelligent robot action data for the sound effect data.
8. An intelligent story machine, the story machine comprising:
an input acquisition module configured to collect the user's multi-modal input and confirm the user's story requirement;
the story data processing system of claim 7, configured to obtain the corresponding story text data according to the user's story requirement and generate the multi-modal data;
and an output module configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
CN201811154925.6A | Priority date: 2018-09-30 | Filing date: 2018-09-30 | Intelligent robot-oriented story data processing method and system | Active | CN109460548B (en)

Priority Applications (1)

CN201811154925.6A | CN109460548B (en) | Intelligent robot-oriented story data processing method and system

Applications Claiming Priority (1)

CN201811154925.6A | CN109460548B (en) | Intelligent robot-oriented story data processing method and system

Publications (2)

CN109460548A (en): published 2019-03-12
CN109460548B (en): published 2022-03-15

Family

ID: 65607262

Family Applications (1)

CN201811154925.6A | Active | Intelligent robot-oriented story data processing method and system

Country Status (1)

CN: CN109460548B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109994000B (en) * 2019-03-28 2021-10-19 掌阅科技股份有限公司 Reading accompanying method, electronic equipment and computer storage medium
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN112270198B (en) * 2020-10-27 2021-08-17 北京百度网讯科技有限公司 Role determination method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101653660A (en) * 2008-08-22 2010-02-24 鸿富锦精密工业(深圳)有限公司 Type biological device for automatically doing actions in storytelling and method thereof
CN105096932A (en) * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN106557298A (en) * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Background towards intelligent robot matches somebody with somebody sound outputting method and device
CN107340865A (en) * 2017-06-29 2017-11-10 北京光年无限科技有限公司 Multi-modal virtual robot exchange method and system
CN107784355A (en) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 The multi-modal interaction data processing method of visual human and system
CN107807734A (en) * 2017-09-27 2018-03-16 北京光年无限科技有限公司 A kind of interaction output intent and system for intelligent robot

Also Published As

CN109460548A (en): published 2019-03-12

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant