CN109241331B - Intelligent robot-oriented story data processing method - Google Patents

Intelligent robot-oriented story data processing method

Info

Publication number
CN109241331B
Authority
CN
China
Prior art keywords: story, sound effect, data, text, word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811114587.3A
Other languages
Chinese (zh)
Other versions
CN109241331A (en)
Inventor
贾志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Infinite Technology Co ltd
Original Assignee
Beijing Guangnian Infinite Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Infinite Technology Co ltd
Priority to CN201811114587.3A
Publication of CN109241331A
Application granted
Publication of CN109241331B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/30 Semantic analysis
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Toys (AREA)

Abstract

The invention discloses a story data processing method and system for an intelligent robot. The method comprises the following steps: acquiring story text data; parsing the story text data, wherein text recognition and word segmentation are performed on the story text data; matching sound effect data to the story text word segmentation results based on a story sound effect model, acquiring the sound effect data corresponding to the word segmentation results, and determining the correlation between the sound effect data and the corresponding word segmentation results; and, based on the correlation, fusing the sound effect data with the story text data to generate and output story content audio data. Compared with the prior art, the method and system can convert a story in text form into story content audio data enriched with sound effects, greatly improving the listener's experience when a story is told.

Description

Intelligent robot-oriented story data processing method
Technical Field
The invention relates to the field of computers, in particular to a story data processing method for an intelligent robot.
Background
In traditional daily life, reading text is the main way people appreciate literary works. In certain scenarios, however, people also appreciate literary works through sound, for example by listening to storytelling performances or recitations. Most commonly, children whose literacy is not yet adequate experience literature by listening to others tell stories.
With the continuous development of multimedia technology, more and more multimedia devices have entered daily life. Supported by multimedia technology, the carrier of literary works in acoustic form, storytelling in particular, is gradually shifting to multimedia devices.
In general, storytelling with a multimedia device has meant telling the story manually in advance and recording an audio file, which the device then simply plays back. With the development of computer technology, the prior art has also adopted text-to-audio conversion in order to obtain a sound source simply and conveniently: no manual recitation or recording is needed, and a multimedia device can tell a story given only the story text. However, direct text-to-speech conversion only guarantees a literal rendering of the text content and cannot reproduce the expressiveness of a human storyteller. Storytelling based on text conversion is therefore quite dry and uninteresting, conveys only the direct meaning of the words, and gives a very poor user experience.
Disclosure of Invention
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot, comprising the following steps:
acquiring story text data;
parsing the story text data, wherein text recognition and word segmentation are performed on the story text data;
matching sound effect data to the story text word segmentation results based on a story sound effect model, acquiring the sound effect data corresponding to the word segmentation results, and determining the correlation between the sound effect data and the corresponding word segmentation results;
and, based on the correlation, fusing the sound effect data with the story text data to generate and output story content audio data.
In one embodiment:
the correlation includes the story position corresponding to the sound effect data;
and the sound effect data and the story text data are fused to generate story content audio data, wherein the audio corresponding to the sound effect data is fused at the corresponding story position in the story content audio data.
In one embodiment, the sound effect data includes:
a sound effect tag, the sound effect tag comprising a sound effect type;
and/or,
sound effect control data, the sound effect control data including a sound effect duration.
In one embodiment:
parsing the story text data further includes acquiring the story type corresponding to the current story text data;
and in matching sound effect data to the story text word segmentation results based on a story sound effect model, the matched story sound effect model is invoked based on the story type.
In one embodiment, sound effect data matching is performed on the story text word segmentation results based on a story sound effect model, the sound effect data corresponding to the word segmentation results is acquired, and the matching details between the sound effect data and the corresponding word segmentation results are determined, wherein:
the corresponding sound effect data is selected from a sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and the matching details between the sound effect data and the corresponding word segmentation results are determined.
The invention also proposes a storage medium on which a program code implementing the method according to the invention is stored.
The invention also provides a story data processing system oriented to an intelligent robot, comprising:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein text recognition and word segmentation are performed on the story text data;
a sound effect processing module configured to match sound effect data to the story text word segmentation results based on a story sound effect model, acquire the sound effect data corresponding to the word segmentation results, and determine the correlation between the sound effect data and the corresponding word segmentation results;
and a multi-modal story data generation module configured to fuse the sound effect data with the story text data based on the correlation, and to generate and output story content audio data.
In one embodiment:
the text parsing module is further configured to acquire the story type corresponding to the current story text data;
the sound-effect processing module is further configured to invoke a matched story sound-effect model based on the story type.
In one embodiment:
the text parsing module is further configured to acquire the story elements corresponding to the word segmentation results and/or their semantics;
the sound effect processing module is further configured to select the corresponding sound effect data from a sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and to determine the matching details between the sound effect data and the corresponding word segmentation results.
The invention also provides an intelligent story machine, comprising:
an input acquisition module configured to collect the user's multi-modal input and confirm the user's story requirement;
the story data processing system, configured to acquire the corresponding story text data according to the user's story requirement and generate story content audio data;
an output module configured to output the story content audio data to a user.
Compared with the prior art, the method and system of the invention can convert a story in text form into story content audio data enriched with sound effects, greatly improving the listener's experience when a story is told.
Additional features and advantages of the invention will be set forth in the description which follows. Also, some of the features and advantages of the invention will be apparent from the description, or may be learned by practice of the invention. The objectives and some of the advantages of the invention may be realized and attained by the process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow diagram of a method according to an embodiment of the invention;
FIGS. 2 and 3 are partial flow diagrams of methods according to embodiments of the invention;
FIG. 4 is a system architecture diagram according to an embodiment of the invention;
FIGS. 5 and 6 are schematic diagrams of a story machine according to embodiments of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that practitioners can fully understand how the invention applies technical means to solve technical problems and achieve its technical effects, and can implement it accordingly. It should be noted that, as long as no conflict arises, the embodiments and the features of the embodiments may be combined with one another, and all resulting technical solutions fall within the scope of the present invention.
In order to improve user experience, the invention provides a story data processing method oriented to an intelligent robot. In this method, corresponding sound effects are matched to the story, and the sound effect data is fused with the story text data to generate story content audio data, thereby improving the expressiveness of the story content.
Furthermore, in practical application scenarios, sound effect playback must be matched to the right playing moment: a sound effect played at the wrong time not only fails to improve the expressiveness of the story content but actually degrades the listening experience. Therefore, in the method of the invention, the sound effect data corresponding to the story text word segmentation results is acquired and the correlation between the sound effect data and the corresponding word segmentation results is determined; the sound effect data and the story text data are then fused based on that correlation, so that the final sound effect playback achieves the intended expressive effect.
Compared with the prior art, the method and system of the invention can convert a story in text form into story content audio data enriched with sound effects, greatly improving the listener's experience when a story is told.
The detailed flow of a method according to an embodiment of the invention is described below with reference to the accompanying drawings. The steps shown in the flowcharts may be executed in a computer system containing, for example, a set of computer-executable instructions. Although a logical order of steps is illustrated in the flowcharts, in some cases the steps shown or described may be performed in a different order than presented here.
As shown in fig. 1, in one embodiment, the method includes the following steps:
s110, acquiring story text data;
s120, analyzing the story text data, wherein text recognition and word segmentation processing are carried out on the story text data;
s130, performing sound effect data matching on the story text word segmentation result based on the story sound effect model, acquiring sound effect data corresponding to the story text word segmentation result (S131) and determining the mutual relation between the sound effect data and the text word segmentation result corresponding to the sound effect data (S132);
and S140, fusing the sound effect data and the story text data based on the mutual relation between the sound effect data and the text word segmentation result corresponding to the sound effect data, and generating and outputting story content audio data.
Specifically, in one embodiment, the story text is converted into story speech by a text-to-speech (TTS) engine, and the sound effect audio from the sound effect data is fused into the story speech.
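As a minimal sketch of this conversion-and-fusion step, the following assumes the pydub audio library; synthesize stands in for whatever TTS engine is used, and the splicing scheme is an illustration, not the patented implementation:

```python
from pydub import AudioSegment  # audio splicing/overlay library

def synthesize(sentence: str) -> AudioSegment:
    """Placeholder TTS call: returns speech audio for one story sentence."""
    return AudioSegment.silent(duration=1000)  # a real TTS engine goes here

def build_story_audio(sentences: list, effects: dict) -> AudioSegment:
    """Concatenate per-sentence speech; splice in each matched effect audio
    at the story position recorded in its correlation data."""
    audio = AudioSegment.empty()
    for position, sentence in enumerate(sentences):
        audio += synthesize(sentence)
        if position in effects:  # an effect is anchored after this sentence
            audio += AudioSegment.from_file(effects[position])
    return audio
```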
Further, in one embodiment, when the sound effect data is fused with the story text data, the first consideration is at what position in the story text data the sound effect data should be fused. Specifically, in one embodiment, the correlation between the sound effect data and the corresponding text word segmentation result includes the story position corresponding to the sound effect data. When the sound effect data and the story text data are fused to generate story content audio data, the audio corresponding to the sound effect data is fused at the corresponding story position in the story content audio data.
Further, in order to fuse the audio corresponding to the sound effect data at the corresponding story position as seamlessly as possible, in one embodiment the playing duration of the audio must also be taken into account during fusion. Specifically, in one embodiment, the sound effect data includes the following (a data-layout sketch follows this list):
a sound effect tag, wherein the sound effect tag comprises a sound effect type;
and/or,
sound effect control data, the sound effect control data including a sound effect duration.
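One possible in-memory layout for such a sound effect record is sketched below; the field names are assumptions for illustration, not terms from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoundEffectData:
    """Sketch of one matched sound effect record."""
    effect_tag: str                          # sound effect tag: the effect type
    audio_path: str                          # where the effect audio lives in the library
    duration_ms: Optional[int] = None        # sound effect control data: effect duration
    story_position: Optional[int] = None     # correlation: where in the story it plays
    matching_details: Optional[dict] = None  # how it matched its word segmentation result
```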
Further, in one embodiment, fusing the sound effect data with the story text data also requires deciding in what manner the sound effect data is fused into the story text data. For example, the effect audio may be inserted directly into the story, may replace a part of the story text, or may be used as a background sound.
In one embodiment, the manner in which the sound effect data is fused into the story text data is determined by the correlation between the sound effect data and the corresponding text word segmentation result. Specifically, in one embodiment, the correlation includes the matching details between the sound effect data and the corresponding word segmentation result, and those matching details determine the manner in which the sound effect data is fused into the story text data.
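The three fusion manners just described can be distinguished as in the sketch below; the mode names and the millisecond arithmetic are assumptions, with pydub-style audio segments as in the earlier sketch:

```python
from enum import Enum

class FusionMode(Enum):
    INSERT = "insert"          # splice the effect audio into the speech
    REPLACE = "replace"        # effect audio stands in for part of the text
    BACKGROUND = "background"  # effect audio is overlaid under the speech

def fuse(speech, effect, mode: FusionMode, at_ms: int):
    """Apply one sound effect to the story speech per its matching details."""
    if mode is FusionMode.INSERT:
        return speech[:at_ms] + effect + speech[at_ms:]
    if mode is FusionMode.REPLACE:
        # drop the stretch of speech the effect replaces (len() is in ms)
        return speech[:at_ms] + effect + speech[at_ms + len(effect):]
    return speech.overlay(effect, position=at_ms)  # BACKGROUND
```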
Further, to ensure that sound effects enhance the vividness of the storytelling rather than degrading it through mismatched effects, in one embodiment the corresponding sound effect data is determined according to the semantics of the text word segmentation results.
Further, in one embodiment, the corresponding sound effects are determined based on story elements. Specifically, in one embodiment, text recognition is performed on the story text, the story content is decomposed into elements based on the recognition result, and the story elements are extracted; the sound effect data matching a text word segmentation result is then determined according to the story elements corresponding to that result.
Specifically, in one embodiment, the parsing target is divided into several specific categories (story elements), keyword extraction is performed for each story element, and the extracted keywords together with their story element tags are saved as the parsing result. In particular, in one embodiment, the story elements include the story background, story characters, event content, event occurrence context, and/or event progression stage. For example, in a specific application scenario, for the sentence "The class bell rang, and he hurried into the classroom", a school bell sound effect is mixed into the story just as the ringing of the bell is narrated, bringing the child into the scene.
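A minimal sketch of this element-tagging step, assuming the jieba Chinese word segmentation library and a hand-made keyword table (both the table and the element labels are illustrative assumptions):

```python
import jieba  # Chinese word segmentation library

# Illustrative keyword table: story element label -> trigger keywords
ELEMENT_KEYWORDS = {
    "event_occurrence_context": ["上课铃", "教室"],  # class bell, classroom
    "story_background": ["森林", "夜晚"],            # forest, night
}

def extract_story_elements(sentence: str) -> list:
    """Segment the sentence, then tag keywords with their story element labels."""
    words = set(jieba.lcut(sentence))
    return [(keyword, element)
            for element, keywords in ELEMENT_KEYWORDS.items()
            for keyword in keywords
            if keyword in words or keyword in sentence]  # tolerate split tokens

# extract_story_elements("上课铃响了，他急忙跑进教室") would tag both
# "上课铃" and "教室" as event_occurrence_context cues.
```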
Specifically, in one embodiment, sound effect data matching is performed on the story text word segmentation results based on the story sound effect model, and the sound effect data corresponding to the word segmentation results is acquired, wherein:
the corresponding sound effect data is selected from a sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and the matching details between the sound effect data and the corresponding word segmentation results are determined.
Specifically, in one embodiment, as shown in fig. 2, the method includes the following steps (a code sketch of this flow follows the list):
s210, text recognition and word segmentation processing are carried out on story text data;
s220, performing content element disassembly on the story based on the text recognition result, and extracting story elements;
s230, matching story elements for the story text word segmentation result;
s240, calling a story sound effect model;
s250, selecting corresponding sound effect data from a sound effect library according to story elements corresponding to the text word segmentation result and/or the semantics of the text word segmentation result;
s260, determining a story position corresponding to the sound effect data;
s270, determining the matching detailed description between the sound effect data and the corresponding text word segmentation result.
Furthermore, in practical application scenarios, different types of stories attract different audiences, whose listening preferences differ accordingly. To improve user experience as far as possible, the sound effect selection strategy should therefore follow the story type; for example, for a forest-adventure story, a strategy favouring animal-related sound effects in a tense style is preferred.
Specifically, in one embodiment, different styles of sound effects are employed depending on the type of story. Furthermore, in one embodiment, sound effects of a uniform style are used within one and the same story, which avoids the jarring feeling produced by switching between effects of different styles and thus improves user experience.
Further, in one embodiment, different story sound effect models are invoked for different types of stories, ensuring that the finally matched sound effect data suits the story type (a dispatch sketch follows). Specifically, in one embodiment:
the story text data is parsed, wherein the story type corresponding to the current story text data is acquired;
and sound effect data matching is performed on the story text word segmentation results based on the story sound effect model, wherein the matched story sound effect model is invoked based on the story type.
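One way to realize this per-type dispatch is sketched below; the genre names, styles, and tag sets are invented for illustration:

```python
class StorySoundEffectModel:
    """Hypothetical per-genre model: a preferred effect style plus allowed tags."""
    def __init__(self, style: str, allowed_tags: set):
        self.style = style
        self.allowed_tags = allowed_tags

STORY_EFFECT_MODELS = {
    "forest_adventure": StorySoundEffectModel("tense", {"animal", "wind", "footsteps"}),
    "bedtime": StorySoundEffectModel("soothing", {"music_box", "rain"}),
}

def get_story_effect_model(story_type: str) -> StorySoundEffectModel:
    """S350: invoke the story sound effect model matched to the story type."""
    return STORY_EFFECT_MODELS.get(
        story_type, StorySoundEffectModel("neutral", set()))
```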
Specifically, in an embodiment, as shown in fig. 3, the method includes the following steps:
s310, text recognition and word segmentation processing are carried out on story text data;
s320, performing content element disassembly on the story based on the text recognition result, and extracting story elements;
s330, determining the story type based on the text recognition result;
s340, matching story elements for the story text word segmentation result;
s350, calling a story sound effect model matched with the story type;
s360, selecting corresponding sound effect data from a sound effect library according to story elements corresponding to the text word segmentation result and/or the semantics of the text word segmentation result;
s370, determining a story position corresponding to the sound effect data;
and S380, determining the matching detailed description between the sound effect data and the corresponding text word segmentation result.
Further, based on the method of the invention, the invention also provides a storage medium on which program code implementing the method is stored.
Furthermore, based on the method, the invention also provides a story data processing system oriented to an intelligent robot.
Specifically, as shown in fig. 4, in an embodiment, the system includes:
a text acquisition module 410 configured to acquire story text data;
a text parsing module 420 configured to parse the story text data, wherein text recognition and word segmentation are performed on the story text data;
a sound effect processing module 430 configured to match sound effect data to the story text word segmentation results based on the story sound effect model, acquire the sound effect data corresponding to the word segmentation results, and determine the correlation between the sound effect data and the corresponding word segmentation results;
and a multi-modal story data generation module 440 configured to fuse the sound effect data with the story text data based on that correlation, and to generate and output story content audio data.
Specifically, in one embodiment, the sound effect processing module 430 is configured to invoke a story sound effect model from the story sound effect model library 431 and to select matching sound effect data from the sound effect library 432 using that model.
Further, in one embodiment:
the text parsing module 420 is further configured to obtain a story type corresponding to the current story text data;
the sound effects processing module 430 is further configured to invoke the matched story sound effects model based on the story type.
Further, in one embodiment:
the text parsing module 420 is further configured to acquire the story elements corresponding to the word segmentation results and/or their semantics;
the sound effect processing module 430 is further configured to select the corresponding sound effect data from the sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and to determine the matching details between the sound effect data and the corresponding word segmentation results (a wiring sketch of the whole system follows).
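Pulling the four modules of fig. 4 together, a minimal wiring sketch; the module interfaces and the stub bodies are assumptions for illustration, not the patented implementation:

```python
class StoryDataProcessingSystem:
    """Sketch of the module pipeline in fig. 4 (interfaces are assumed)."""

    def __init__(self, effect_model, effect_library: dict):
        self.effect_model = effect_model      # story sound effect model (431)
        self.effect_library = effect_library  # sound effect library (432)

    def acquire_text(self, story_request: str) -> str:
        # text acquisition module 410: look the story text up (stub)
        raise NotImplementedError

    def parse(self, story_text: str) -> list:
        # text parsing module 420: recognition + word segmentation (stub)
        return [s for s in story_text.split("。") if s]

    def generate_audio(self, sentences: list, effects: list) -> bytes:
        # multi-modal generation module 440: TTS + effect fusion (stub)
        raise NotImplementedError

    def process(self, story_request: str) -> bytes:
        sentences = self.parse(self.acquire_text(story_request))
        # sound effect processing module 430: match effects and correlations
        effects = self.effect_model.match(sentences, self.effect_library)
        return self.generate_audio(sentences, effects)
```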
Furthermore, based on the story data processing system provided by the invention, the invention also provides an intelligent story machine. Specifically, as shown in fig. 5, in one embodiment the story machine includes:
an input acquisition module 510 configured to collect the user's multi-modal input and confirm the user's story requirement;
a story data processing system 520 configured to obtain corresponding story text data according to a user story requirement, and generate story content audio data;
an output module 530 configured to output the story content audio data to a user.
Specifically, in one embodiment, the output module 530 includes a playing unit configured to play the story content audio data.
Specifically, as shown in fig. 6, in one embodiment the story machine includes a smart device 610 and a cloud server 620, wherein:
the cloud server 620 includes a story data processing system 630. The story data processing system 630 is configured to invoke the capability interfaces of the cloud server 620 to acquire and parse story text data, and to generate and output story content audio data containing the fused sound effects. Specifically, each capability interface of the story data processing system 630 calls the corresponding logic process during data parsing.
Specifically, in an embodiment, the capability interfaces of the cloud server 620 include a text segmentation interface 624, a text recognition interface 621, a text/speech conversion interface 622, and a sound effect synthesis interface 623.
The intelligent device 610 includes a human-computer interaction input and output module 611, a communication module 612 and a playing module 613.
The smart device 610 may be a tablet computer, a robot, a mobile phone, a story machine, or a picture book reading robot.
The human-computer interaction input/output module 611 is configured to obtain a control instruction of the user and determine a story listening requirement of the user.
The communication module 612 is configured to output the user story listening requirement acquired by the human-computer interaction input/output module 611 to the cloud server 620, and receive multi-modal data from the cloud server 620.
The playing module 613 is configured to play the story content audio data.
In a specific application scenario, the human-computer interaction input/output module 611 acquires the user's control instruction and determines the user's story listening requirement.
The communication module 612 sends the user's story listening requirement to the cloud server 620.
The cloud server 620 selects the corresponding story text data based on the user's story listening requirement. The story data processing system in the cloud server 620 acquires and parses the story text data, and generates and outputs the story content audio data.
The communication module 612 receives story content audio data sent by the cloud server 620.
The playing module 613 plays the story content audio data received by the communication module 612.
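The exchange between the smart device and the cloud server could look roughly like the sketch below; the endpoint URL, payload shape, and audio format are all assumptions, as the patent does not specify a wire protocol:

```python
import json
from urllib import request

CLOUD_URL = "https://cloud.example/story"  # hypothetical endpoint

def request_story_audio(listening_requirement: dict) -> bytes:
    """Communication module sketch (612): post the user's story listening
    requirement to the cloud server (620) and receive fused story audio."""
    body = json.dumps(listening_requirement).encode("utf-8")
    req = request.Request(CLOUD_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.read()  # audio bytes handed to the playing module (613)
```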
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although embodiments of the present invention have been described above, the description is provided only to aid understanding of the invention and is not intended to limit it. The method of the invention admits various other embodiments. Those skilled in the art may make corresponding changes or modifications without departing from the spirit of the invention, and all such changes or modifications fall within the scope of the appended claims.

Claims (8)

1. A story data processing method oriented to an intelligent robot, characterized by comprising the following steps:
acquiring story text data;
parsing the story text data, wherein text recognition and word segmentation are performed on the story text data;
matching sound effect data to the story text word segmentation results based on a story sound effect model, acquiring the sound effect data corresponding to the word segmentation results, and determining the correlation between the sound effect data and the corresponding word segmentation results;
based on the correlation, fusing the sound effect data with the story text data to generate and output story content audio data;
wherein, in matching sound effect data to the story text word segmentation results based on the story sound effect model, acquiring the corresponding sound effect data and determining the matching details between the sound effect data and the corresponding word segmentation results:
the story content is decomposed into elements based on the text recognition result, and the story elements are extracted; the sound effect data matching a word segmentation result is determined according to the story elements corresponding to that result;
the corresponding sound effect data is selected from a sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and the matching details between the sound effect data and the corresponding word segmentation results are determined;
specifically, the parsing target is divided into several specific story elements, keyword extraction is performed for each story element, and the extracted keywords and their story element tags are saved as the parsing result; the story elements include the story background, story characters, event content, event occurrence context, and/or event progression stage.
2. The method of claim 1, wherein:
the correlation includes the story position corresponding to the sound effect data;
and the sound effect data and the story text data are fused to generate story content audio data, wherein the audio corresponding to the sound effect data is fused at the corresponding story position in the story content audio data.
3. The method of claim 1, wherein the sound effect data comprises:
a sound effect tag, the sound effect tag comprising a sound effect type;
and/or,
sound effect control data, the sound effect control data including a sound effect duration.
4. The method of claim 1, wherein:
the story text data is parsed, wherein the story type corresponding to the current story text data is acquired;
and sound effect data matching is performed on the story text word segmentation results based on a story sound effect model, wherein the matched story sound effect model is invoked based on the story type.
5. A storage medium having stored thereon program code for implementing the method according to any one of claims 1-4.
6. An intelligent robot-oriented story data processing system, the system comprising:
a text acquisition module configured to acquire story text data;
a text parsing module configured to parse the story text data, wherein text recognition and word segmentation are performed on the story text data;
a sound effect processing module configured to match sound effect data to the story text word segmentation results based on a story sound effect model, acquire the sound effect data corresponding to the word segmentation results, and determine the correlation between the sound effect data and the corresponding word segmentation results;
a multi-modal story data generation module configured to fuse the sound effect data with the story text data based on the correlation, and to generate and output story content audio data;
the text parsing module being further configured to acquire the story elements corresponding to the word segmentation results and/or their semantics;
the sound effect processing module being further configured to select the corresponding sound effect data from a sound effect library according to the story elements corresponding to the word segmentation results and/or their semantics, and to determine the matching details between the sound effect data and the corresponding word segmentation results;
the sound effect processing module being specifically configured to decompose the story into content elements based on the text recognition result, extract the story elements, and determine the sound effect data matching a word segmentation result according to the story elements corresponding to that result;
wherein, specifically, the text parsing module divides the parsing target into several specific story elements, performs keyword extraction for each story element, and saves the extracted keywords and their story element tags as the parsing result; and the story elements include the story background, story characters, event content, event occurrence context, and/or event progression stage.
7. The system of claim 6, wherein:
the text parsing module is further configured to acquire the story type corresponding to the current story text data;
and the sound effect processing module is further configured to invoke the matched story sound effect model based on the story type.
8. An intelligent story machine, the story machine comprising:
an input acquisition module configured to collect the user's multi-modal input and confirm the user's story requirement;
the story data processing system according to claim 6 or 7, configured to acquire the corresponding story text data according to the user's story requirement and generate story content audio data;
an output module configured to output the story content audio data to a user.
CN201811114587.3A 2018-09-25 2018-09-25 Intelligent robot-oriented story data processing method Active CN109241331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811114587.3A CN109241331B (en) 2018-09-25 2018-09-25 Intelligent robot-oriented story data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811114587.3A CN109241331B (en) 2018-09-25 2018-09-25 Intelligent robot-oriented story data processing method

Publications (2)

Publication Number Publication Date
CN109241331A CN109241331A (en) 2019-01-18
CN109241331B (en) 2022-03-15

Family

ID=65056125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811114587.3A Active CN109241331B (en) 2018-09-25 2018-09-25 Intelligent robot-oriented story data processing method

Country Status (1)

Country Link
CN (1) CN109241331B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950266A (en) * 2019-04-30 2020-11-17 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111767740A (en) * 2020-06-23 2020-10-13 北京字节跳动网络技术有限公司 Sound effect adding method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093658A (en) * 2013-01-14 2013-05-08 中国科学院软件研究所 Child real object interaction story building method and system
CN105096932A (en) * 2015-07-14 2015-11-25 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus of talking book
CN106557298A * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Intelligent-robot-oriented background dubbing output method and device
CN107451265A (en) * 2017-07-31 2017-12-08 广州网嘉玩具科技开发有限公司 A kind of story platform based on Internet of Things and artificial intelligence technology
CN107729441A (en) * 2017-09-30 2018-02-23 北京酷我科技有限公司 The processing method and system of a kind of audio file

Also Published As

Publication number Publication date
CN109241331A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN110517689B (en) Voice data processing method, device and storage medium
CN109543021B (en) Intelligent robot-oriented story data processing method and system
Schröder The SEMAINE API: Towards a Standards‐Based Framework for Building Emotion‐Oriented Systems
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
CN114401438B (en) Video generation method and device for virtual digital person, storage medium and terminal
KR101628050B1 (en) Animation system for reproducing text base data by animation
WO2022170848A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
CN109065019B (en) Intelligent robot-oriented story data processing method and system
CN109782997B (en) Data processing method, device and storage medium
CN109460548B (en) Intelligent robot-oriented story data processing method and system
CN114401431A (en) Virtual human explanation video generation method and related device
CN109241331B (en) Intelligent robot-oriented story data processing method
CN115497448A (en) Method and device for synthesizing voice animation, electronic equipment and storage medium
CN109065018B (en) Intelligent robot-oriented story data processing method and system
WO2023142590A1 (en) Sign language video generation method and apparatus, computer device, and storage medium
CN111161710A (en) Simultaneous interpretation method and device, electronic equipment and storage medium
CN112242132A (en) Data labeling method, device and system in speech synthesis
WO2022262080A1 (en) Dialogue relationship processing method, computer and readable storage medium
KR102281298B1 (en) System and method for video synthesis based on artificial intelligence
CN114595314A (en) Emotion-fused conversation response method, emotion-fused conversation response device, terminal and storage device
CN114514576A (en) Data processing method, device and storage medium
US20220236945A1 (en) Information processing device, information processing method, and program
CN114333758A (en) Speech synthesis method, apparatus, computer device, storage medium and product
KR102376552B1 (en) Voice synthetic apparatus and voice synthetic method
CN110718119A (en) Educational ability support method and system based on wearable intelligent equipment special for children

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant