CN109460548B - Intelligent robot-oriented story data processing method and system - Google Patents

Info

Publication number: CN109460548B (application CN201811154925.6A)
Authority: CN (China)
Prior art keywords: story, data, sound effect, text, calling
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201811154925.6A
Other languages: Chinese (zh)
Other versions: CN109460548A (en)
Inventor: 贾志强
Current Assignee: Beijing Guangnian Infinite Technology Co., Ltd.
Original Assignee: Beijing Guangnian Infinite Technology Co., Ltd.
Priority date: 2018-09-30 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2018-09-30
Application filed by Beijing Guangnian Infinite Technology Co., Ltd.
Priority to CN201811154925.6A
Publication of CN109460548A
Application granted
Publication of CN109460548B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G: Physics
    • G09: Education; Cryptography; Display; Advertising; Seals
    • G09B: Educational or demonstration appliances; appliances for teaching, or communicating with, the blind, deaf or mute; models; planetaria; globes; maps; diagrams
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/04: Electrically-operated educational appliances with audible presentation of the material to be studied

Abstract

The invention discloses a story data processing method and system oriented to an intelligent robot. The method comprises the following steps: acquiring story text data; parsing the story text data, wherein word segmentation and text recognition are performed on the story text data; calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists; and generating multi-modal data for telling the story content from the story text data and the sound effect data. Compared with the prior art, the method and the system can convert a story in text form into multi-modal data that carries sound effects and can be presented in multiple modalities, which greatly improves the experience of the listener when the story is told.

Description

Intelligent robot-oriented story data processing method and system
Technical Field
The invention relates to the field of computers, in particular to a story data processing method and system for an intelligent robot.
Background
In traditional daily life, reading text is the main way people enjoy literary works. In certain scenarios, however, people also enjoy literary works through sound, for example by listening to storytelling performances or recitations. Most typically, children whose reading ability is still limited learn stories by listening to others tell them.
With the continuous development of multimedia technology, more and more multimedia devices are used in daily life. With the support of multimedia technology, the task of presenting literary works in audible form, storytelling in particular, is gradually shifting to multimedia devices.
In general, storytelling with a multimedia device relies on a person telling the story in advance and recording an audio file, which the device then simply plays back. With the development of computer technology, and in order to obtain a sound source simply and conveniently, the prior art also converts text data directly into audio data. No manual recitation or recording is then needed; as long as the story text is provided, the multimedia device can tell the story. However, direct text-to-speech conversion only guarantees a literal rendering of the text and cannot reproduce the liveliness of a human storyteller. Storytelling based on such text conversion is therefore dry and uninteresting, conveys only the literal meaning of the words, and results in a very poor user experience.
Disclosure of Invention
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot, which comprises the following steps:
acquiring story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and generating multi-modal data for telling the story content from the story text data and the sound effect data.
In an embodiment, the multi-modal data comprises intelligent robot action data, wherein:
corresponding intelligent robot action data is generated for the sound effect data.
In one embodiment, generating multi-modal data for telling the story content from the story text data and the sound effect data comprises:
converting the story text data into story voice data;
and fusing the story voice data and the sound effect data to generate story audio data.
In one embodiment, performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
In one embodiment:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists includes:
judging whether a sound effect corresponding to the story element of the text segment exists;
and calling the sound effect data corresponding to that sound effect when it exists.
In one embodiment:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
In one embodiment, calling the corresponding sound effect data when a matched sound effect exists comprises:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library.
The invention also proposes a storage medium on which program code capable of implementing the method according to any one of claims 1 to 7 is stored.
The invention also provides a story data processing system oriented to an intelligent robot, which comprises:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists;
and a multi-modal story data generation module configured to generate multi-modal data for telling the story content from the story text data and the sound effect data.
The invention also provides an intelligent story machine, comprising:
an input acquisition module configured to collect multi-modal input from a user and confirm the user's story requirement;
the story data processing system of the above embodiment, configured to obtain the corresponding story text data according to the user's story requirement and generate the multi-modal data;
and an output module configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
Compared with the prior art, the method and the system can convert a story in text form into multi-modal data that carries sound effects and can be presented in multiple modalities, which greatly improves the experience of the listener when the story is told.
Additional features and advantages of the invention will be set forth in the description which follows. Also, some of the features and advantages of the invention will be apparent from the description, or may be learned by practice of the invention. The objectives and some of the advantages of the invention may be realized and attained by the process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIGS. 1 and 3 are flow diagrams of methods according to embodiments of the invention;
FIG. 2 is a partial flow diagram of a method according to an embodiment of the invention;
FIG. 4 is a system architecture diagram according to an embodiment of the invention;
FIGS. 5 and 6 are schematic diagrams of a story machine according to embodiments of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that practitioners of the invention can fully understand how to apply technical means to solve the technical problems, achieve the technical effects, and implement the invention accordingly. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of those embodiments may be combined with each other, and the resulting technical solutions all fall within the scope of the present invention.
In traditional daily life, reading text is the main way people enjoy literary works. In certain scenarios, however, people also enjoy literary works through sound, for example by listening to storytelling performances or recitations. Most typically, children whose reading ability is still limited learn stories by listening to others tell them.
With the continuous development of multimedia technology, more and more multimedia devices are used in daily life. With the support of multimedia technology, the task of presenting literary works in audible form, storytelling in particular, is gradually shifting to multimedia devices.
In general, storytelling with a multimedia device relies on a person telling the story in advance and recording an audio file, which the device then simply plays back. With the development of computer technology, and in order to obtain a sound source simply and conveniently, the prior art also converts text data directly into audio data. No manual recitation or recording is then needed; as long as the story text is provided, the multimedia device can tell the story. However, direct text-to-speech conversion only guarantees a literal rendering of the text and cannot reproduce the liveliness of a human storyteller. Storytelling based on such text conversion is therefore dry and uninteresting, conveys only the literal meaning of the words, and results in a very poor user experience.
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot. In this method, corresponding sound effects are added to the story text to improve the expressiveness of the story content. Note that expressiveness is improved only when the sound effect matches the story content; an unmatched sound effect does not enhance the story and instead harms the user experience. Therefore, in the method of the invention, before a sound effect is added, the story text is first parsed, and the parsing result determines which sound effect matches the story text.
Specifically, in a practical application scenario, an event is usually accompanied by sound when it occurs, and the source of that sound may be a character's action during the event, an environmental sound, and so on. In an embodiment of the method of the present invention, the role of the sound effect is mainly to simulate the sound that would occur when the event takes place, giving the user a feeling of being present at the scene and thereby improving the expressiveness of the story. Thus, in one embodiment, sound effect matching is used to determine what sound should be heard at the current point of the story. Specifically, the current story progress is determined first, including the current character actions and the current environment. Therefore, in one embodiment, text recognition is performed on the story text to determine the current progress of the story, and the corresponding sound effect is determined from that progress.
Further, in a practical application scenario, different events are not accompanied by one and the same sound. That is, a story cannot be matched by a single sound effect from beginning to end: as the story unfolds, the corresponding sound effects should change with the story content. Therefore, in the method, the story text is segmented into a number of text segments, and sound effect matching is performed for each text segment.
Further, in a practical application scenario, not every segment of the story is accompanied by a sound. Therefore, in an embodiment, the sound effect matching step also includes judging whether a corresponding sound effect exists for the current text segment.
The detailed flow of a method according to an embodiment of the invention is described below with reference to the accompanying drawings. The steps shown in the flowcharts can be executed in a computer system containing, for example, a set of computer-executable instructions. Although a logical order of steps is illustrated in the flowcharts, in some cases the steps shown or described may be performed in a different order than the one presented here.
As shown in FIG. 1, in one embodiment, the method of the present invention comprises the following steps (a minimal code sketch follows the step list):
S110, acquiring story text data;
S120, parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
S130, calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and S140, generating multi-modal data for telling the story content from the story text data and the sound effect data.
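Purely for illustration, the following Python sketch shows one way steps S110 to S140 could be strung together. The sound effect library, the keyword matching rule, and every identifier in it (SOUND_EFFECT_LIBRARY, segment_text, match_sound_effect, process_story) are assumptions made for the sketch, not the story data processing model of the invention.

```python
# Minimal, illustrative sketch of steps S110-S140 (all names are hypothetical).
import re

# Hypothetical sound effect library: keyword -> sound effect resource identifier.
SOUND_EFFECT_LIBRARY = {
    "thunder": "sfx/thunder.wav",
    "squeak": "sfx/door_squeak.wav",
    "bark": "sfx/dog_bark.wav",
}

def segment_text(story_text):
    """S120 (in part): crude segmentation - split the story text into sentences."""
    return [s.strip() for s in re.split(r"[。！？.!?]", story_text) if s.strip()]

def match_sound_effect(segment):
    """S130: sound effect matching for one text segment; None if no match exists."""
    for keyword, effect in SOUND_EFFECT_LIBRARY.items():
        if keyword in segment.lower():
            return effect
    return None

def process_story(story_text):
    """S110-S140: build multi-modal data as (text segment, optional sound effect) pairs."""
    multimodal_data = []
    for segment in segment_text(story_text):                              # S120
        effect = match_sound_effect(segment)                              # S130
        multimodal_data.append({"text": segment, "sound_effect": effect})  # S140
    return multimodal_data

print(process_story("The door opened with a squeak. He lay in bed thinking."))
```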
Further, in an embodiment, in order to further improve the expressiveness of the storytelling and the user experience, the finally generated multi-modal data also includes intelligent robot action data. In this way, while the intelligent robot tells a story, it can not only play sound effects to enhance the story content but also perform matching actions according to the intelligent robot action data, further improving the expressiveness of the story content.
Specifically, in an embodiment, the intelligent robot action data is matched to the sound effect data: in the process of generating the multi-modal data for telling the story content, corresponding intelligent robot action data is generated for the sound effect data. This prevents the robot's actions from appearing abrupt and avoids the situation in which an action interrupts the storytelling and harms the user experience.
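As a hedged illustration of this pairing, the lookup below maps a sound effect to matching robot action data; the table contents and names are invented for the example.

```python
# Hypothetical mapping from a sound effect to matching intelligent robot action data.
ACTION_FOR_EFFECT = {
    "sfx/thunder.wav": {"gesture": "cover_ears", "expression": "startled"},
    "sfx/door_squeak.wav": {"gesture": "turn_head", "expression": "curious"},
}

def action_data_for(sound_effect):
    """Generate robot action data for a sound effect; fall back to a neutral pose."""
    return ACTION_FOR_EFFECT.get(sound_effect, {"gesture": "idle", "expression": "neutral"})

print(action_data_for("sfx/thunder.wav"))
```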
Further, in one embodiment, the story is told by voice. Specifically, as shown in FIG. 2, in one embodiment, the process of generating the multi-modal data for telling the story content from the story text data and the sound effect data comprises:
S210, converting the story text data into story voice data;
and S220, fusing the story voice data and the sound effect data to generate story audio data.
The multi-modal data therefore effectively contains story audio data. The process of the intelligent robot outputting the multi-modal data includes playing the story audio data, which is equivalent to playing the corresponding sound effects while the story text is read aloud.
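One possible realisation of S210 and S220 is sketched below. The audio mixing uses the pydub library; the text-to-speech step is left as a stub because the patent does not name a specific TTS engine, and the file paths are assumptions.

```python
# Sketch of S210/S220: synthesize story speech, then overlay sound effects (assumed paths).
from pydub import AudioSegment

def text_to_speech(text):
    """S210 stub: a real system would call a TTS engine and return speech audio.

    Here we return silence proportional to the text length so the sketch runs.
    """
    return AudioSegment.silent(duration=100 * len(text))

def fuse(story_text, effects):
    """S220: overlay each sound effect onto the speech at its position (milliseconds)."""
    speech = text_to_speech(story_text)
    for position_ms, effect_path in effects:
        effect = AudioSegment.from_file(effect_path)
        speech = speech.overlay(effect, position=position_ms)
    return speech

# Example usage (paths are hypothetical):
# story_audio = fuse("The door opened with a squeak.", [(500, "sfx/door_squeak.wav")])
# story_audio.export("story.wav", format="wav")
```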
Further, in one embodiment, the story is told in text form. Specifically, the finally generated multi-modal data includes the story text. The process of the intelligent robot outputting the multi-modal data includes displaying the text and playing sound effects while the text is displayed.
Further, in one embodiment, the story is told using both text and voice. That is, the process of the intelligent robot outputting the multi-modal data includes presenting the text while the story audio data is played.
Further, in an embodiment, the story progress corresponding to the current story text is determined from the semantics of the story text, so as to judge whether the story text has a corresponding sound effect and, if so, which sound effect it corresponds to.
Specifically, in an embodiment, the process of sound effect matching for each text segment comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
For example, for the text segment "the door opened with a squeak", the phrase "with a squeak" can be judged directly from the semantics to be a sound effect description, so the corresponding sound effect data is called directly.
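In the simplest case this judgement can be approximated with a small lexicon of sound descriptions, as in the sketch below; a real system would rely on the story data processing model rather than this keyword rule, and every name here is illustrative.

```python
# Toy approximation of "does this segment contain a sound effect description?"
SOUND_DESCRIPTIONS = {
    "squeak": "sfx/door_squeak.wav",
    "bang": "sfx/bang.wav",
    "rumble of thunder": "sfx/thunder.wav",
}

def find_sound_description(segment):
    """Return (description, sound effect) if the segment describes a sound, else None."""
    lowered = segment.lower()
    for description, effect in SOUND_DESCRIPTIONS.items():
        if description in lowered:
            return description, effect
    return None

print(find_sound_description("The door opened with a squeak"))
```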
Further, in a practical application scenario, a text segment of the story text data does not exist in isolation; it is correlated with the overall progress of the story. In some application scenarios, although a single text segment contains no sound effect description, the story progress it represents does have a corresponding sound effect. For example, the text segment "he lay in bed thinking" contains no sound effect description when analysed on its own semantics, so there would be no matching sound effect. However, if the overall progress of the story corresponding to this text segment is considered, for example that "he lay in bed thinking" takes place on "a stormy night with thunder", then "he lay in bed thinking" matches a thunderstorm sound effect.
Therefore, in one embodiment, when sound effect matching is performed, the story is analysed as a whole, the story progress state corresponding to the text segment is determined, and whether the text segment has a corresponding sound effect, and which sound effect it corresponds to, is judged from the perspective of the whole story.
Further, considering the characteristics of computer analysis, in one embodiment the story text data is analysed by element decomposition so as to realise analysis of the whole story. Specifically, in one embodiment, the content elements of the story are decomposed and story elements are extracted, the story elements comprising story roles and/or story scenes. Each text segment is associated with story elements, and whether the text segment has a corresponding sound effect, and which sound effect it corresponds to, is judged from the story elements associated with it (a minimal sketch of this element-based matching follows the example below).
For example, the story scene associated with the text segment "he lay in bed thinking" is "a rainy night accompanied by thunder", so it can be judged that the sound effect matching this text segment is a thunderstorm sound effect.
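Purely as an illustration of element-based matching, the sketch below uses a tiny hand-written element extractor and an element-to-sound-effect table; both tables and all identifiers are assumptions, not the invention's trained model.

```python
# Illustrative element decomposition and element-driven sound effect matching.
KNOWN_SCENES = {"rainy night": "sfx/thunderstorm.wav", "forest": "sfx/birdsong.wav"}
KNOWN_ROLES = {"wolf": "sfx/wolf_howl.wav"}

def extract_elements(segment, story_scene):
    """Associate the segment with any story roles found in it and with the active scene."""
    roles = [role for role in KNOWN_ROLES if role in segment.lower()]
    return {"roles": roles, "scene": story_scene}

def effect_for_elements(elements):
    """Pick a sound effect for the segment's elements, if one exists."""
    for role in elements["roles"]:
        return KNOWN_ROLES[role]
    return KNOWN_SCENES.get(elements["scene"])

elements = extract_elements("He lay in bed thinking", story_scene="rainy night")
print(effect_for_elements(elements))   # -> sfx/thunderstorm.wav
```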
Specifically, in one embodiment, as shown in FIG. 3, the process of parsing the story text data comprises:
S321, decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
S322, determining the story element corresponding to each text segment;
and the process of sound effect matching for each text segment comprises:
S330, judging whether a sound effect corresponding to the story element of the text segment exists;
and S340, calling the sound effect data corresponding to that sound effect when it exists.
Further, in practical application scenarios, the sounds that may occur in a scene are highly varied. In the story text data, a sound effect description is usually only a rough description, and several sound effects may match the same description. Therefore, in order to filter the sound effects and further improve how well the sound effects match the story itself, in one embodiment the sound effects are additionally filtered according to the overall style of the story. Specifically, in an embodiment, the story style is determined first, and the selection range of the sound effect data is then limited by that style, so that during sound effect matching a sound effect can only be chosen from the limited range, ensuring that the chosen sound effect matches the overall style of the story.
Specifically, in one embodiment (a minimal filtering sketch follows the list below):
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
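A minimal sketch of narrowing the candidate sound effects by story type is given below; the style tags and the library contents are assumed for illustration.

```python
# Illustrative filtering of a sound effect library by story type / background tags.
SOUND_EFFECTS = [
    {"path": "sfx/thunder_scary.wav", "styles": {"horror"}},
    {"path": "sfx/thunder_soft.wav", "styles": {"fairy_tale", "bedtime"}},
    {"path": "sfx/door_squeak.wav", "styles": {"fairy_tale", "horror"}},
]

def limit_selection_range(story_type):
    """Keep only sound effects whose style tags match the overall story type."""
    return [e for e in SOUND_EFFECTS if story_type in e["styles"]]

print([e["path"] for e in limit_selection_range("fairy_tale")])
```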
Further, in a practical application scenario, different users perceive sound effects differently. That is, the same sound effect does not heighten the atmosphere equally for different listeners. Therefore, in an embodiment, in order to improve the user experience as much as possible, different selection ranges of sound effect data are defined for different story listeners and/or different sound effect matching strategies are chosen for different story listeners. In other words, the sound effect data is selected according to the identity of the listener.
Specifically, in one embodiment, when the story data processing model is called, the user identity is confirmed and the story data processing model matched to that user identity is called.
Further, in practical application scenarios, the differences in how users of different identities perceive sound effects mainly come down to age. Therefore, in one embodiment, when the story data processing model is called, the age of the user is confirmed, and the selection range of sound effect data and/or the sound effect matching strategy matched to the user's age group is called.
Furthermore, when sound effect perception is differentiated by age group, the sound effects preferred by many age groups show considerable commonality. That is, the difference in sound effect perception between some age groups is not very pronounced, and many sound effects are preferred by users of several age groups alike. However, the sound effect perception of the child user group is distinctly different: the sound effects children accept differ markedly from those accepted by other age groups. In particular, sound effects that many adults are familiar with may be incomprehensible to children, and some sound effects adults are accustomed to are not suitable for children to listen to at all. For example, children respond poorly to sound effects that create a frightening atmosphere.
A wrongly chosen sound effect therefore seriously degrades the experience of a child user. For this reason, in one embodiment, a dedicated selection range of sound effect data and/or a dedicated sound effect matching strategy is set for child users. Specifically, in one embodiment, when the user is a child user, a child story data processing model is called.
Specifically, in an embodiment, calling the corresponding sound effect data when a matched sound effect exists comprises (a minimal sketch follows the list below):
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that child-matched sound effect library.
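The user-identity branch could look roughly like the sketch below; the age threshold, the library names, and their contents are assumptions made for the example.

```python
# Illustrative selection of a sound effect library according to user identity (assumed data).
CHILD_SOUND_LIBRARY = {"thunder": "sfx/thunder_soft.wav"}
GENERAL_SOUND_LIBRARY = {"thunder": "sfx/thunder_full.wav"}

def select_library(user_age):
    """Call the child-matched library for child users, otherwise the general library."""
    if user_age is not None and user_age < 12:
        return CHILD_SOUND_LIBRARY
    return GENERAL_SOUND_LIBRARY

library = select_library(user_age=6)
print(library["thunder"])   # -> sfx/thunder_soft.wav
```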
Further, based on the method of the present invention, the invention also provides a storage medium on which program code capable of implementing the method of the invention is stored.
Furthermore, based on the method, the invention also provides a story data processing system oriented to an intelligent robot.
Specifically, as shown in FIG. 4, in an embodiment the system comprises the following modules (a structural sketch follows the module list):
a text acquisition module 410 configured to acquire story text data;
a text parsing module 420 configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module 430 configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists;
and a multi-modal story data generation module 440 configured to generate multi-modal data for telling the story content from the story text data and the sound effect data.
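The four modules could be organised roughly as in the following structural sketch; the class and method names are illustrative placeholders, and each module body is reduced to a stub.

```python
# Structural sketch of the system of FIG. 4 (class and method names are illustrative).
class TextAcquisitionModule:                      # 410
    def acquire(self, story_id):
        return "The door opened with a squeak."   # placeholder story text

class TextParsingModule:                          # 420
    def parse(self, text):
        # Word segmentation / text recognition stub: split into rough segments.
        return [s.strip() for s in text.split(".") if s.strip()]

class SoundEffectModule:                          # 430
    def match(self, segments):
        return [{"text": s,
                 "sound_effect": "sfx/door_squeak.wav" if "squeak" in s else None}
                for s in segments]

class MultimodalStoryDataModule:                  # 440
    def generate(self, matched_segments):
        return {"story": matched_segments}

segments = TextParsingModule().parse(TextAcquisitionModule().acquire("demo"))
print(MultimodalStoryDataModule().generate(SoundEffectModule().match(segments)))
```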
Furthermore, based on the story data processing system provided by the invention, the invention also provides an intelligent story machine. Specifically, as shown in FIG. 5, in an embodiment the story machine comprises:
an input acquisition module 510 configured to collect the user's multi-modal input and confirm the user's story requirement;
a story data processing system 520 configured to obtain the corresponding story text data according to the user's story requirement and generate multi-modal data;
and an output module 530 configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
Specifically, as shown in FIG. 6, in an embodiment the story machine comprises a smart device 610 and a cloud server 620, wherein:
the cloud server 620 includes a story data processing system 630 and a sound effect database 640. The story data processing system 630 is configured to call the capability interfaces of the cloud server 620 to obtain and parse the story text data, select sound effect data from the sound effect database 640 of the cloud server 620, and generate and output the multi-modal data. Specifically, each capability interface performs the corresponding logical processing during multi-modal data analysis.
Specifically, in one embodiment, the capability interfaces of the cloud server 620 include a text recognition interface 621, a text/speech conversion interface 622, and an audio synthesis interface 623.
The smart device 610 includes a human-computer interaction input/output module 611, a communication module 612, and a playing module 613.
It should be noted that the smart device may be a tablet computer, a children's smart watch, a robot, a mobile phone, a story machine, or a picture-book reading robot.
The human-computer interaction input/output module 611 is configured to obtain a control instruction of the user and determine a story listening requirement of the user.
The communication module 612 is configured to output the user story listening requirement acquired by the human-computer interaction input/output module 611 to the cloud server 620, and receive multi-modal data from the cloud server 620.
The playing module 613 is configured to play the audio data (sound effect data or story audio data containing sound effects) in the multi-modal data.
Specifically, in a typical application scenario (an end-to-end sketch in code follows this flow), the human-computer interaction input/output module 611 obtains the user's control instruction and determines the user's story listening requirement.
The communication module 612 sends the user's story listening requirement to the cloud server 620.
The cloud server 620 selects the corresponding story text data based on the user's story listening requirement. The story data processing system in the cloud server 620 obtains and parses the story text data, selects sound effect data from the sound effect database 640 of the cloud server 620, and generates and outputs a story audio file.
The communication module 612 receives the story audio file sent by the cloud server 620.
The playing module 613 plays the story audio file received by the communication module 612.
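The exchange between the smart device 610 and the cloud server 620 can be pictured as a simple request/response flow. The sketch below stands in for the communication module with plain function calls; it does not reflect any real network protocol or API of the patent, and all data in it is invented for illustration.

```python
# Toy end-to-end flow: device collects the request, "cloud" returns multi-modal story data.
def cloud_server_620(story_request):
    """Stand-in for cloud server 620: pick story text, match sound effects, return the result."""
    story_text = {"bedtime story": "The door opened with a squeak. He fell asleep."}.get(
        story_request, "Once upon a time.")
    return {"story_text": story_text,
            "sound_effects": ["sfx/door_squeak.wav"] if "squeak" in story_text else []}

def smart_device_610(user_command):
    """Stand-in for modules 611-613: forward the listening requirement, then 'play' the result."""
    multimodal = cloud_server_620(user_command)   # communication module 612
    print("playing:", multimodal)                  # playing module 613
    return multimodal

smart_device_610("bedtime story")
```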
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. There are various other embodiments of the method of the present invention. Various corresponding changes or modifications may be made by those skilled in the art without departing from the spirit of the invention, and these corresponding changes or modifications are intended to fall within the scope of the appended claims.

Claims (8)

1. A story data processing method oriented to an intelligent robot, characterized by comprising the following steps:
acquiring story text data;
parsing the story text data, wherein word segmentation and text recognition are performed on the story text data;
calling a story data processing model, performing sound effect matching for each text segment based on the parsing result of the story text data, and calling the corresponding sound effect data when a matched sound effect exists;
and generating multi-modal data for telling the story content from the story text data and the sound effect data, the multi-modal data comprising intelligent robot action data, wherein corresponding intelligent robot action data is generated for the sound effect data;
wherein the process of calling the corresponding sound effect data when a matched sound effect exists comprises the following steps:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library.
2. The method of claim 1, wherein generating multi-modal data for telling the story content from the story text data and the sound effect data comprises:
converting the story text data into story voice data;
and fusing the story voice data and the sound effect data to generate story audio data.
3. The method of claim 1, wherein performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists comprises:
judging, from the semantics of the text segment, whether a sound effect description is present;
and when a sound effect description is present, calling the sound effect data corresponding to that description.
4. The method of claim 1, wherein:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story roles and/or story scenes;
determining the story element corresponding to each text segment;
and performing sound effect matching for each text segment and calling the corresponding sound effect data when a matched sound effect exists includes:
judging whether a sound effect corresponding to the story element of the text segment exists;
and calling the sound effect data corresponding to that sound effect when it exists.
5. The method according to any one of claims 1 to 4, wherein:
parsing the story text data includes:
decomposing the content elements of the story and extracting story elements, wherein the story elements comprise story types and/or story backgrounds;
and calling the corresponding sound effect data includes:
limiting the selection range of the sound effect data based on those story elements.
6. A storage medium having stored thereon program code for implementing the method according to any one of claims 1-5.
7. An intelligent robot-oriented story data processing system, the system comprising:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein word segmentation and text recognition are performed on the story text data;
a sound effect processing module configured to call a story data processing model, perform sound effect matching for each text segment based on the parsing result of the story text data, and call the corresponding sound effect data when a matched sound effect exists,
wherein, when a matched sound effect exists, the sound effect processing module is further configured to call the corresponding sound effect data through the following operations:
determining the user identity and, when the user is a child user, calling a sound effect library matched to child users;
and calling the sound effect data from that sound effect library;
and a multi-modal story data generation module configured to generate multi-modal data for telling the story content from the story text data and the sound effect data, the multi-modal data comprising intelligent robot action data,
wherein the multi-modal story data generation module is further configured to generate corresponding intelligent robot action data for the sound effect data.
8. An intelligent story machine, the story machine comprising:
an input acquisition module configured to collect the user's multi-modal input and confirm the user's story requirement;
the story data processing system of claim 7, configured to obtain the corresponding story text data according to the user's story requirement and generate the multi-modal data;
and an output module configured to output the multi-modal data to the user, wherein dialogue and voice-over data with sound effects are played.
CN201811154925.6A | Priority date: 2018-09-30 | Filing date: 2018-09-30 | Intelligent robot-oriented story data processing method and system | Active | CN109460548B (en)

Priority Applications (1)

CN201811154925.6A | CN109460548B (en) | Intelligent robot-oriented story data processing method and system

Applications Claiming Priority (1)

CN201811154925.6A | CN109460548B (en) | Intelligent robot-oriented story data processing method and system

Publications (2)

CN109460548A (en): published 2019-03-12
CN109460548B (en): published 2022-03-15

Family

ID: 65607262

Family Applications (1)

CN201811154925.6A | Active | Intelligent robot-oriented story data processing method and system

Country Status (1)

CN: CN109460548B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109994000B (en) * 2019-03-28 2021-10-19 掌阅科技股份有限公司 Reading accompanying method, electronic equipment and computer storage medium
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN112270198B (en) * 2020-10-27 2021-08-17 北京百度网讯科技有限公司 Role determination method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101653660A (en) * 2008-08-22 2010-02-24 鸿富锦精密工业(深圳)有限公司 Type biological device for automatically doing actions in storytelling and method thereof
CN105096932A (en) * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN106557298A (en) * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Background towards intelligent robot matches somebody with somebody sound outputting method and device
CN107340865A (en) * 2017-06-29 2017-11-10 北京光年无限科技有限公司 Multi-modal virtual robot exchange method and system
CN107784355A (en) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 The multi-modal interaction data processing method of visual human and system
CN107807734A (en) * 2017-09-27 2018-03-16 北京光年无限科技有限公司 A kind of interaction output intent and system for intelligent robot

Also Published As

CN109460548A (en): published 2019-03-12

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant