CN113761113A - User interaction method and device for telling stories through pictures

Publication number: CN113761113A
Authority: CN (China)
Prior art keywords: story, picture, user, voice request, target
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202110003767.XA
Other languages: Chinese (zh)
Inventor: 杨慕葵
Current Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110003767.XA
Publication of CN113761113A


Classifications

    • G06F 16/3343: Information retrieval of unstructured textual data; query execution using phonetics
    • G06F 16/316: Information retrieval of unstructured textual data; indexing structures
    • G06F 16/51: Information retrieval of still image data; indexing; data structures and storage structures
    • G06F 16/5866: Information retrieval of still image data; retrieval using manually generated metadata, e.g. tags, keywords, comments
    • G06F 40/205: Handling natural language data; natural language analysis; parsing

    (All classifications fall under G: Physics; G06: Computing; G06F: Electric digital data processing.)

Abstract

The invention discloses a user interaction method and device for telling stories through pictures, in the field of computer technology. One embodiment of the method comprises: receiving a user's voice request that triggers on-demand story playback; parsing story keywords from the voice request and querying a picture story set for the target picture story corresponding to the request; and playing the target picture story. This embodiment avoids dependence on terminal devices such as mobile phones and other companion hardware, expands content sources more flexibly, and better satisfies demands for edutainment.

Description

User interaction method and device for telling stories through pictures
Technical Field
The invention relates to the field of computer technology, and in particular to a user interaction method and device for telling stories through pictures.
Background
Telling stories from pictures helps children develop language skills. Existing picture-storytelling products often depend on terminal devices such as mobile phones, raising concerns about excessive screen entertainment; their content resources mostly rely on professionally produced audio and image assets, which are inconvenient to expand; and some reading products require companion books and reading pens, with steps such as inspiration and retrieval remaining partly manual.
Disclosure of Invention
In view of this, embodiments of the present invention provide a user interaction method and apparatus for telling stories through pictures, which can avoid dependence on terminal devices such as mobile phones and other companion hardware, expand content sources more flexibly, and better satisfy edutainment demands.
To achieve one or more of the above objects, according to an aspect of an embodiment of the present invention, there is provided a user interaction method for telling a story with pictures, including:
receiving a user's voice request that triggers on-demand story playback;
parsing story keywords from the voice request, and querying a picture story set for a target picture story corresponding to the voice request;
and playing the target picture story.
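As a non-authoritative illustration of these three steps, the sketch below reduces speech recognition and playback to stubs (the patent does not prescribe those components); the stopword list, sample story records, and function names are invented for the example:

```python
def parse_story_keywords(request_text):
    """Stub for semantic parsing: pull story keywords out of the
    recognized request text, e.g. 'tell a story about deer' -> ['deer'].
    The stopword list is invented for this example."""
    stopwords = {"tell", "me", "a", "the", "story", "about"}
    return [w for w in request_text.lower().split() if w not in stopwords]

def query_target_story(keywords, story_set):
    """Return the first story whose tags overlap the keywords, or None."""
    for story in story_set:
        if set(keywords) & set(story["tags"]):
            return story
    return None

def play_story(story):
    """Stub for playback: a real device would synthesize speech and show
    the picture and subtitles on screen."""
    return f"Playing: {story['description']}"

stories = [{"tags": ["deer", "grassland"],
            "description": "a deer runs across the grassland"}]
keywords = parse_story_keywords("tell a story about deer")
target = query_target_story(keywords, stories)
print(play_story(target))  # Playing: a deer runs across the grassland
```

A production system would replace the first stub with speech recognition plus semantic parsing and the last with speech synthesis; only the query step in the middle is pure retrieval logic.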
Optionally, before querying a target picture story corresponding to the voice request from the picture story set, the method further includes:
extracting keywords from each picture story in the picture story set to generate story tags for that picture story; and generating a tag index for the picture story set from the story tags of each picture story in the set.
Optionally, before generating the tag index of the picture story set, the method further includes: filtering out, from the story tags of each picture story, any story tags present in a preset tag library.
Optionally, the picture story set comprises a system picture story set and/or a user picture story set; before querying a target picture story corresponding to the voice request from the picture story set according to the story keywords, the method further comprises:
extracting text information from each preset picture material and generating a picture story corresponding to that material according to the text information, to obtain the system picture story set; and/or,
receiving personal materials uploaded by a user and generating user picture stories from them, to obtain the user picture story set.
Optionally, before playing the target picture story, the method further includes:
analyzing the voice style of the voice request, and adjusting the audio stream of the target picture story according to that voice style.
Optionally, querying a target picture story corresponding to the voice request from the picture story set according to the story keywords includes:
querying the picture story set for the picture stories corresponding to the story keywords, and selecting from them a picture story with a higher story score as the target picture story.
Optionally, before picture stories with higher story scores are selected from the picture stories, the method further includes: setting a story score for each picture story in the picture story set;
after a target picture story with a higher story score has been selected from the picture stories, the method further comprises: adjusting the story score of the target picture story according to the business scenario of the voice request and/or according to user feedback data collected during or after playback.
According to yet another aspect of an embodiment of the present invention, there is provided a user interaction device for telling a story with a picture, including:
the request receiving module is used for receiving a user's voice request that triggers on-demand story playback;
the story query module is used for parsing story keywords from the voice request and querying a picture story set for a target picture story corresponding to the voice request;
and the story playing module is used for playing the target picture story.
Optionally, the user interaction device of the embodiment of the present invention further includes a story creation module configured to, before a target picture story corresponding to the voice request is queried from the picture story set, extract keywords from each picture story in the picture story set to generate story tags for that picture story, and generate a tag index for the picture story set from the story tags of each picture story in the set.
Optionally, the story creation module is further configured to: and filtering story labels existing in a preset label library in all story labels of the picture story before generating a label index of the picture story set.
Optionally, the picture story set comprises a system picture story set and/or a user picture story set; the story creation module is further configured to, before a target picture story corresponding to the voice request is queried from the picture story set according to the story keywords:
extract text information from each preset picture material and generate a picture story corresponding to that material according to the text information, to obtain the system picture story set; and/or,
receive personal materials uploaded by a user and generate user picture stories from them, to obtain the user picture story set.
Optionally, the story playing module is further configured to: before the target picture story is played, analyze the voice style of the voice request and adjust the audio stream of the target picture story according to that voice style.
Optionally, querying a target picture story corresponding to the voice request from the picture story set according to the story keywords includes:
querying the picture story set for the picture stories corresponding to the story keywords, and selecting from them a picture story with a higher story score as the target picture story.
Optionally, the story query module is further configured to: set a story score for each picture story in the picture story set before picture stories with higher story scores are selected; and,
after a target picture story with a higher story score has been selected from the picture stories, adjust the story score of the target picture story according to the business scenario of the voice request and/or according to user feedback data collected during or after playback.
According to another aspect of an embodiment of the present invention, there is provided an electronic device for user interaction for telling stories through pictures, comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the user interaction method for telling stories through pictures provided by the present invention.
According to a further aspect of embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the user interaction method for telling stories through pictures provided by the present invention.
One embodiment of the above invention has the following advantages or benefits: querying and playing the corresponding picture story according to a voice request avoids dependence on terminal devices such as mobile phones and other companion hardware; generating the picture story set from preset picture materials or from personal materials uploaded by the user expands content sources more flexibly; and extracting keywords from each picture story to generate a tag index of the picture story set facilitates retrieval and discovery.
Further effects of the above non-conventional alternatives are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is an exemplary system architecture diagram to which the user interaction method or device for telling stories through pictures of embodiments of the present invention may be applied;
FIG. 2 is a schematic diagram of the main flow of a user interaction method for telling stories through pictures according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user interaction scenario on a screen-equipped smart speaker in an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of generating a picture story set from preset picture materials in an alternative embodiment of the present invention;
FIG. 5 is a schematic flow chart of a user requesting a story on demand in an alternative embodiment of the present invention;
FIG. 6 is a schematic diagram of a story screening strategy in an alternative embodiment of the present invention;
FIG. 7 is a schematic diagram of generating a picture story set from a user's personal materials in an alternative embodiment of the present invention;
FIG. 8 is a schematic diagram of the main modules of a user interaction device for telling stories through pictures according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture to which the user interaction method or device for telling stories through pictures of an embodiment of the present invention may be applied.
as shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to speaker-type devices with screens, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for shopping-like websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and/or otherwise process the received data, such as the on-demand request, and may feed back the processing result (e.g., a picture story-by way of example only) to the terminal devices 101, 102, and 103.
It should be noted that the user interaction method for telling stories through pictures provided by the embodiment of the present invention is generally performed by the server 105; accordingly, the user interaction device for telling stories through pictures is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a schematic diagram of the main flow of a user interaction method for telling stories through pictures according to an embodiment of the present invention. As shown in fig. 2, the method includes:
step S201, receiving a user's voice request that triggers on-demand story playback;
step S202, parsing story keywords from the voice request, and querying a picture story set for a target picture story corresponding to the voice request;
step S203, playing the target picture story.
Playing a picture story means playing its story description (i.e., the story text) aloud. In practical applications, the picture corresponding to the target picture story can also be displayed on a screen. Users can request stories on demand through smart voice devices such as a speaker or a screen-equipped speaker, or a projection device with voice recognition. Illustratively, after a user requests "a story about deer" through a voice request to a screen-equipped speaker, the corresponding picture story is queried according to the request, its description is played as speech, and the corresponding picture and subtitles are displayed on the screen, as shown in fig. 3.
According to the embodiment of the present invention, querying and playing the corresponding picture story according to a voice request avoids dependence on terminal devices such as mobile phones and other companion hardware. It should be emphasized that although the method of the embodiment can avoid depending on terminal devices such as mobile phones, it can also be implemented on such terminal devices.
Optionally, before querying a target picture story corresponding to the voice request from the picture story set, the method further includes: extracting keywords from each picture story in the picture story set to generate story tags for that picture story; and generating a tag index for the picture story set from the story tags of each picture story in the set. Extracting keywords from each picture story to generate a tag index establishes an inverted index in the tag dimension, which facilitates retrieval and discovery.
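A minimal sketch of the tag extraction and inverted tag index described above, assuming a toy frequency-based keyword extractor (the patent does not specify the extraction algorithm; the sample stories, stopword list, and tag budget are invented for illustration):

```python
from collections import defaultdict

# Hypothetical picture-story records; in the patent these would come
# from the picture story storage system.
STORIES = {
    1: "a deer runs across the grassland under a clear sky",
    2: "a little rabbit finds a carrot near the river",
    3: "the deer and the rabbit become friends on the grassland",
}

def extract_tags(text, max_tags=3):
    """Toy keyword extraction: the most frequent non-stopword tokens.
    A real system might use TF-IDF or TextRank instead."""
    stopwords = {"a", "the", "and", "on", "under", "near", "across"}
    counts = defaultdict(int)
    for token in text.lower().split():
        if token not in stopwords:
            counts[token] += 1
    return sorted(counts, key=counts.get, reverse=True)[:max_tags]

def build_tag_index(stories):
    """Inverted index in the tag dimension: tag -> set of story ids."""
    index = defaultdict(set)
    for story_id, text in stories.items():
        for tag in extract_tags(text):
            index[tag].add(story_id)
    return index

index = build_tag_index(STORIES)
print(sorted(index["deer"]))  # ids of stories tagged "deer"
```

Querying the index with a story keyword then returns candidate story ids directly, without scanning every story text.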
Illustratively, as shown in fig. 4, text on each picture in the picture file storage system is recognized using character recognition technology (e.g., "weather today" in fig. 4), and text describing a story is generated for the picture using image description generation technology (e.g., "deer…"). From these two pieces of text, the tags of the picture story (such as "deer" and "grassland" in fig. 4) are obtained through keyword extraction. OCR in fig. 4 is an abbreviation for Optical Character Recognition.
Optionally, before generating the tag index of the picture story set, the method further includes: filtering out, from the story tags of each picture story, any story tags present in a preset tag library. Filtering removes data of no interest to the application; for example, for children's picture stories, adult-only tags are filtered out.
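The filtering step might look like the following sketch; the contents of the preset tag library are purely illustrative:

```python
# Hypothetical preset tag library: tags to exclude for a children's
# picture-story application.
PRESET_TAG_LIBRARY = {"stock market", "mortgage", "office politics"}

def filter_story_tags(story_tags, blocked=PRESET_TAG_LIBRARY):
    """Drop any story tag that appears in the preset tag library,
    preserving the original tag order."""
    return [tag for tag in story_tags if tag not in blocked]

print(filter_story_tags(["deer", "grassland", "stock market"]))
# ['deer', 'grassland']
```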
In some optional embodiments, the picture story set comprises a system picture story set. Before querying a target picture story corresponding to the voice request from the picture story set according to the story keywords, the method further includes: extracting text information from each preset picture material and generating a picture story corresponding to that material according to the text information, to obtain the system picture story set. Referring to the embodiment shown in fig. 4, picture stories generated from pictures in the picture file storage system are stored in a picture story storage system. Generating a picture story from preset picture materials is essentially generating a story description, and its implementation can be chosen according to the actual situation, for example generating image text with a recurrent neural network. Generating the picture story set from preset picture materials expands content sources more flexibly.
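The patent leaves the story-description generator open (suggesting, for example, a recurrent neural network for image captioning). As a stand-in for such a model, the sketch below merely composes a description from the recognized on-picture text and a hypothetical caption; the template and inputs are assumptions, not the patent's method:

```python
def generate_story_description(ocr_text, caption):
    """Placeholder for the image-description model: a real system would
    obtain the caption from an image-captioning network. Here we simply
    merge the on-picture text with the caption into one description."""
    parts = [caption.strip().rstrip(".")]
    if ocr_text.strip():
        parts.append(f'the picture also says "{ocr_text.strip()}"')
    return ". ".join(parts).capitalize() + "."

story = generate_story_description("weather today",
                                   "a deer stands on the grassland")
print(story)
```

The resulting description is what gets stored in the picture story storage system alongside the picture and its tags.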
In other alternative embodiments, the picture story set includes a user picture story set. Before querying a target picture story corresponding to the voice request from the picture story set according to the story keywords, the method further includes: receiving personal materials uploaded by a user and generating user picture stories from them, to obtain the user picture story set. Users can provide personal materials such as family photo albums or store-bought picture books, thereby defining content sources themselves and removing the release-schedule constraints of producers. Generating a picture story from personal materials is essentially generating a story description, and its implementation can be chosen according to the actual situation, for example generating image text with a recurrent neural network.
As shown in fig. 7, photos taken by the user, or an album linked within the application, serve as personal material; generation of content (e.g., voice content uploaded by the user), on-picture text (e.g., text edited by the user), and tags is triggered automatically, and the corresponding picture story is then saved, keyed by user, to a user-level story storage system. When the user requests a story on demand, the corresponding picture story is retrieved from the user-level story storage system or from the stories generated by the content party and the system.
Generating the picture story set from personal materials uploaded by the user expands content sources more flexibly and meets users' personalized needs.
Optionally, before playing the target picture story, the method further includes: analyzing the voice style of the voice request and adjusting the audio stream of the target picture story according to that style. The implementation of voice style switching can be chosen according to the actual situation, for example prosody feature conversion or spectral feature conversion. Personalized speech synthesis can provide a familiar, parent-like voice and bring the listener closer.
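One possible (assumed) realization maps a detected voice-style label to prosody parameters that a TTS engine could consume; the style names and numeric values below are invented for illustration and would need tuning per engine:

```python
from dataclasses import dataclass

@dataclass
class ProsodyParams:
    rate: float   # speaking-rate multiplier
    pitch: float  # pitch shift in semitones

# Illustrative style table; real values depend on the TTS engine used.
STYLE_TABLE = {
    "child":  ProsodyParams(rate=0.9, pitch=3.0),
    "parent": ProsodyParams(rate=1.0, pitch=0.0),
    "lively": ProsodyParams(rate=1.15, pitch=1.5),
}

def adjust_audio_params(detected_style):
    """Return prosody parameters for the detected voice style,
    falling back to the neutral 'parent' style when unknown."""
    return STYLE_TABLE.get(detected_style, STYLE_TABLE["parent"])

params = adjust_audio_params("child")
print(params.rate, params.pitch)
```

The returned parameters would then drive the audio-stream adjustment step (e.g., prosody or spectral feature conversion) before playback.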
Optionally, querying a target picture story corresponding to the voice request from the picture story set according to the story keywords includes: querying the picture story set for the picture stories corresponding to the story keywords, and selecting from them a picture story with a higher story score as the target picture story. The story score of each picture story may be preset. Screening picture stories by story score before playback makes the selected stories better match user needs and improves the accuracy of the screening results.
Optionally, before picture stories with higher story scores are selected from the picture stories, the method further includes: setting a story score for each picture story in the picture story set. After a target picture story with a higher story score has been selected from the picture stories, the method further includes: adjusting the story score of the target picture story according to the business scenario of the voice request and/or according to user feedback data collected during or after playback.
Illustratively, during playback of the target picture story, if the user's feedback indicates dislike of the story, its story score is lowered; if the feedback indicates that the user likes it, its story score is raised. As an example of determining scores from the business scenario, in an animal-themed scenario, picture stories containing more animals are scored higher.
By adjusting the story score of a picture story according to user feedback data and/or the business scenario, the selected picture stories better match user needs, further improving the accuracy of the screening results.
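The score adjustment might be sketched as follows; the step size, bounds, and scene bonus are assumptions, not values from the patent:

```python
def adjust_story_score(score, liked=None, scene_bonus=0.0,
                       step=0.1, lo=0.0, hi=1.0):
    """Raise the score on positive feedback, lower it on negative
    feedback, optionally add a business-scenario bonus, and clamp
    the result to [lo, hi]. All numeric values are illustrative."""
    if liked is True:
        score += step
    elif liked is False:
        score -= step
    score += scene_bonus
    return max(lo, min(hi, score))

print(adjust_story_score(0.5, liked=True))   # positive feedback raises the score
print(adjust_story_score(0.5, liked=False))  # negative feedback lowers it
print(adjust_story_score(0.95, liked=True))  # clamped at the upper bound
```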
The user interaction method of the embodiment of the present invention is illustrated below with reference to figs. 5 and 6. As shown in fig. 5, an audio stream from the user is received; speech recognition identifies the user's voice request, "tell a story about deer"; semantic understanding and parsing yield the task (tell a story) and the keyword (deer). Candidate picture stories are then retrieved from the content-party story set and the self-generated story set, a candidate with a higher story score is selected as the target picture story through a story-skill PK strategy, and the target picture story is presented directly or after personalized speech synthesis. For the story-skill PK strategy, see fig. 6: for convenience of presentation, denote the candidate from the content-party story set as A and the candidate from the self-generated story set as B. If the score of story B is greater than the score of story A, take B as the target picture story; otherwise, if story A exists, take A as the target picture story; otherwise take B. In the cold-start stage, the PK strategy can simply fall back to content-party stories, with self-generated materials scored manually through the operations console for content selection; later, the scoring of self-generated materials can be adjusted according to user feedback data.
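The decision logic of fig. 6 (prefer the self-generated candidate B when it outscores the content-party candidate A; otherwise fall back to A if it exists) can be sketched as:

```python
def pk_select(story_a, story_b):
    """Story-skill PK strategy after fig. 6.
    story_a: (id, score) from the content-party story set, or None.
    story_b: (id, score) from the self-generated story set, or None.
    Returns the target picture story, or None if both are absent."""
    if story_b is not None and (story_a is None or story_b[1] > story_a[1]):
        return story_b
    if story_a is not None:
        return story_a
    return story_b  # both absent: returns None

print(pk_select(("A1", 0.7), ("B1", 0.9)))  # B wins on score
print(pk_select(("A1", 0.7), ("B1", 0.5)))  # A exists and outscores B
print(pk_select(None, ("B1", 0.5)))         # only B is available
```

On a tie, this sketch falls back to A, matching "take B only when its score is greater"; the tuple representation of a story is an assumption for the example.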
This embodiment provides an interactive picture-storytelling scheme for smart voice devices such as screen-equipped speakers, expands content sources more flexibly, and better satisfies edutainment demands.
According to still another aspect of an embodiment of the present invention, there is provided an apparatus for implementing the above method.
As shown in fig. 8, the user interaction device 800 for viewing a story telling includes:
a request receiving module 801 for receiving a user's voice request that triggers on-demand story playback;
a story query module 802 for parsing story keywords from the voice request and querying a picture story set for a target picture story corresponding to the voice request;
and a story playing module 803 for playing the target picture story.
Optionally, the user interaction device of the embodiment of the present invention further includes a story creation module configured to, before a target picture story corresponding to the voice request is queried from the picture story set, extract keywords from each picture story in the picture story set to generate story tags for that picture story, and generate a tag index for the picture story set from the story tags of each picture story in the set.
Optionally, the story creation module is further configured to: and filtering story labels existing in a preset label library in all story labels of the picture story before generating a label index of the picture story set.
Optionally, the picture story set comprises a system picture story set and/or a user picture story set; the story creation module is further configured to, before a target picture story corresponding to the voice request is queried from the picture story set according to the story keywords:
extract text information from each preset picture material and generate a picture story corresponding to that material according to the text information, to obtain the system picture story set; and/or,
receive personal materials uploaded by a user and generate user picture stories from them, to obtain the user picture story set.
Optionally, the story playing module is further configured to: before the target picture story is played, analyze the voice style of the voice request and adjust the audio stream of the target picture story according to that voice style.
Optionally, querying a target picture story corresponding to the voice request from the picture story set according to the story keywords includes:
querying the picture story set for the picture stories corresponding to the story keywords, and selecting from them a picture story with a higher story score as the target picture story.
Optionally, the story query module is further configured to: set a story score for each picture story in the picture story set before picture stories with higher story scores are selected; and,
after a target picture story with a higher story score has been selected from the picture stories, adjust the story score of the target picture story according to the business scenario of the voice request and/or according to user feedback data collected during or after playback.
Fig. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present invention, and as shown in fig. 9, the computer system 900 of the terminal device according to the embodiment of the present invention includes:
a Central Processing Unit (CPU) 901, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the system 900. The CPU 901, ROM 902, and RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a request receiving module, a story query module, and a story playing module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the story playing module may also be described as a "module that plays the target picture story".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: receiving a voice request of a user for triggering story on demand; analyzing story keywords in the voice request, and inquiring a target picture story corresponding to the voice request from a picture story set; and playing the target picture story.
According to the technical scheme of the embodiment of the invention, the corresponding picture story is queried and played according to the voice request, avoiding dependence on companion equipment such as a mobile phone; generating the picture story set from preset picture materials or from personal materials uploaded by the user allows content sources to be expanded more flexibly; and extracting keywords from each picture story in the picture story set to generate a label index for the set makes retrieval more efficient.
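The label index mentioned above (and elaborated in claims 2 and 3) amounts to an inverted index from story labels to stories, with labels from a preset label library filtered out. A minimal sketch, where `STOP_TAGS` stands in for that preset label library and the data layout is invented for the example:

```python
from collections import defaultdict

STOP_TAGS = {"story", "picture"}   # stand-in for the preset label library

def build_label_index(story_set):
    """Map each remaining story label to the titles of the stories carrying it."""
    index = defaultdict(list)
    for story in story_set:
        for tag in story["tags"] - STOP_TAGS:   # filter preset-library labels
            index[tag].append(story["title"])
    return dict(index)

index = build_label_index([
    {"title": "Fox A", "tags": {"fox", "story"}},
    {"title": "Fox B", "tags": {"fox", "balloon"}},
])
```

At query time, the story keywords parsed from the voice request would be looked up directly in this index instead of scanning every story.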
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A user interaction method for storytelling by picture, comprising:
receiving a voice request of a user for triggering story on demand;
analyzing story keywords in the voice request, and inquiring a target picture story corresponding to the voice request from a picture story set;
and playing the target picture story.
2. The user interaction method of claim 1, further comprising, before querying the target picture story corresponding to the voice request from the picture story set:
extracting keywords from each picture story of the picture story set to generate a story label of the picture story; and generating a label index of the picture story set according to the story label of each picture story in the picture story set.
3. The user interaction method of claim 2, prior to generating the label index for the picture story set, further comprising: filtering out, from all story labels of the picture stories, the story labels that exist in a preset label library.
4. The method of claim 1, wherein the picture story set includes a system picture story set and/or a user picture story set; before querying the target picture story corresponding to the voice request from the picture story set according to the story keyword, the method further comprises:
extracting text information from each preset picture material, and generating a picture story corresponding to the preset picture material according to the text information, to obtain the system picture story set; and/or,
receiving personal materials uploaded by a user, and generating a user picture story from the personal materials, to obtain the user picture story set.
5. The method of claim 1, wherein prior to playing the target picture story, further comprising:
and analyzing the voice style of the voice request, and adjusting the audio stream of the target picture story according to the voice style.
6. The method of claim 1, wherein querying the target picture story corresponding to the voice request from the picture story set according to the story keyword comprises:
querying each picture story corresponding to the story keyword from the picture story set, and screening, from the picture stories, the picture story with the higher story score as the target picture story.
7. The method of claim 6, wherein prior to filtering the picture stories having higher story scores from the respective picture stories, further comprising: setting a story score of each picture story in the picture story set;
after the target picture stories with higher story scores are screened from the various picture stories, the method further comprises the following steps: and adjusting the story score of the target picture story according to the service scene of the voice request and/or according to user feedback data in the playing process or after the playing is finished.
8. A user interaction device for storytelling by picture, comprising:
the request receiving module is used for receiving a voice request for triggering the story to be played by a user;
the story query module is used for analyzing story keywords in the voice request and querying a target picture story corresponding to the voice request from a picture story set;
and the story playing module plays the target picture story.
9. A user-interactive electronic device for storytelling by picture, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110003767.XA 2021-01-04 2021-01-04 User interaction method and device for telling stories through pictures Pending CN113761113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110003767.XA CN113761113A (en) 2021-01-04 2021-01-04 User interaction method and device for telling stories through pictures


Publications (1)

Publication Number Publication Date
CN113761113A true CN113761113A (en) 2021-12-07

Family

ID=78786315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110003767.XA Pending CN113761113A (en) 2021-01-04 2021-01-04 User interaction method and device for telling stories through pictures

Country Status (1)

Country Link
CN (1) CN113761113A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009748A (en) * 2023-03-28 2023-04-25 深圳市人马互动科技有限公司 Picture information interaction method and device in children interaction story

Similar Documents

Publication Publication Date Title
CN109344241B (en) Information recommendation method and device, terminal and storage medium
US9380410B2 (en) Audio commenting and publishing system
US11455465B2 (en) Book analysis and recommendation
US20140161356A1 (en) Multimedia message from text based images including emoticons and acronyms
US20240107127A1 (en) Video display method and apparatus, video processing method, apparatus, and system, device, and medium
US20140164371A1 (en) Extraction of media portions in association with correlated input
US20140163957A1 (en) Multimedia message having portions of media content based on interpretive meaning
WO2007062223A2 (en) Generation and playback of multimedia presentations
CN111263186A (en) Video generation, playing, searching and processing method, device and storage medium
CN109271557B (en) Method and apparatus for outputting information
US20050165613A1 (en) Methods for constructing multimedia database and providing mutimedia-search service and apparatus therefor
CN113722535B (en) Method for generating book recommendation video, electronic device and computer storage medium
US20140161423A1 (en) Message composition of media portions in association with image content
JP2020005309A (en) Moving image editing server and program
US20210173863A1 (en) Frameworks and methodologies configured to enable support and delivery of a multimedia messaging interface, including automated content generation and classification, content search and prioritisation, and data analytics
US8682938B2 (en) System and method for generating personalized songs
CN112287168A (en) Method and apparatus for generating video
US20140163956A1 (en) Message composition of media portions in association with correlated text
CN109889921B (en) Audio and video creating and playing method and device with interaction function
JP2020065307A (en) Server, program, and moving image distribution system
US20010056351A1 (en) Networked audio posting method and system
JP6603925B1 (en) Movie editing server and program
CN113761113A (en) User interaction method and device for telling stories through pictures
JP6730760B2 (en) Server and program, video distribution system
CN113282770A (en) Multimedia recommendation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination