CN116052492B - Multi-mode information processing method, device and medium based on interactive drawing scenario - Google Patents


Info

Publication number
CN116052492B
Authority
CN
China
Prior art keywords
interaction
interactive
user
image data
teaching
Prior art date
Legal status
Active
Application number
CN202310316712.3A
Other languages
Chinese (zh)
Other versions
CN116052492A (en)
Inventor
Wang Yi (王一)
Current Assignee
Shenzhen Renma Interactive Technology Co Ltd
Original Assignee
Shenzhen Renma Interactive Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Renma Interactive Technology Co Ltd
Priority to CN202310316712.3A
Publication of CN116052492A
Application granted
Publication of CN116052492B

Classifications

    • G09B 5/14: Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations, with provision for individual teacher-student communication
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04817: Interaction techniques based on graphical user interfaces [GUI] using icons
    • G06F 3/04842: Selection of displayed objects or displayed text elements
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present application disclose a multi-modal information processing method, device and medium based on an interactive drawing scenario. The method of the embodiments can present the interactive scenario to the user by voice, which helps to improve the user's immersion while drawing; when the user's drawing does not meet the interaction condition, the user is guided to explain his or her creative idea, which helps to improve the user's language expression ability and helps the server of the present application to better understand the user's creative idea or intention. The method can also offer the user personalized modification suggestions, through human-computer interaction, for content created under a specific theme, while respecting the user's creative ideas to the greatest extent; it guides the user to associate the created content with the interaction requirement while preserving the user's creative freedom, which helps to cultivate the user's creative imagination.

Description

Multi-mode information processing method, device and medium based on interactive drawing scenario
Technical Field
The present application relates to the field of internet technologies, and in particular to a multi-modal information processing method, device and medium based on an interactive drawing scenario.
Background
During their growth, children often choose to express their ideas and moods in drawings, and parents and teachers can come to understand a child's inner world through the child's drawings. Drawing can cultivate a child's intuition, innovative spirit and imagination, thereby improving thinking and hands-on ability.
At present, parents can enroll children in drawing tutoring classes to learn drawing knowledge systematically, but such classes generally follow a one-to-many teaching mode in which the teacher cannot fully interact with every child. The teacher therefore finds it difficult to notice and comprehensively grasp each child's drawing emotions, creative ideas and drawing needs, and thus cannot offer modification suggestions that match the child's own needs or situation, which does not help maintain the child's enthusiasm for creation.
Therefore, how to provide a personalized drawing teaching method that is more closely tied to the user's creative ideas is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a multi-modal information processing method, apparatus and medium based on an interactive drawing scenario, which can propose, according to the user's drawing content or creative ideas, personalized modification suggestions that better match the user's drawing intention, and guide the user to create works that meet the interaction requirement without dampening the user's enthusiasm and imagination for drawing.
In a first aspect, an embodiment of the present application provides a multi-modal information processing method based on an interactive drawing scenario, applied to a server of a multi-modal information processing system based on an interactive drawing scenario, where the system includes a user terminal and the server. The method may include the following steps:
receiving a first selection instruction sent by a user via the user terminal, where the first selection instruction may be used to select an interactive story;
determining an interactive scenario according to the first selection instruction, where the interactive scenario may include at least one scenario branch, and each scenario branch may include at least one teaching node;
when the interaction progress is located at a teaching node, sending a guiding interaction sentence to the user terminal, where the guiding interaction sentence may be used to express the interaction requirement corresponding to the current teaching node;
receiving first drawing image data sent by the user via the user terminal;
judging whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence;
if not, sending an inquiry interaction sentence to the user terminal, where the inquiry interaction sentence may be used to guide the user to describe the creative idea and the intended meaning of the first drawing image data;
receiving first voice information sent by the user via the user terminal, and determining at least one piece of target modification content according to the first voice information, where the first voice information may include the creative idea and the intended meaning of the first drawing image data;
generating at least one teaching interaction sentence according to the at least one piece of target modification content, and sending the at least one teaching interaction sentence to the user terminal, where the teaching interaction sentence may be used to guide the user to modify the first drawing image data so that it meets the interaction requirement corresponding to the guiding interaction sentence.
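To make the above flow concrete, the following is a minimal Python sketch of a single teaching-node round under the first aspect. The Terminal protocol and the meets/targets/teach callables are illustrative assumptions standing in for the image recognition, intention recognition and sentence generation described below; they are not the patent's actual modules.

```python
# A minimal sketch of one teaching-node round of the first-aspect method.
# All names here are illustrative assumptions, not the patent's modules.
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class TeachingNode:
    requirement: str          # e.g. "an athlete crossing the finish line"

@dataclass
class ScenarioBranch:
    teaching_nodes: List[TeachingNode]

class Terminal(Protocol):
    def send(self, sentence: str) -> None: ...
    def receive_drawing(self) -> bytes: ...
    def receive_voice(self) -> str: ...

def run_teaching_node(node: TeachingNode, terminal: Terminal,
                      meets, targets, teach) -> None:
    """One round: guide, judge, inquire, then teach modifications.
    `meets`, `targets` and `teach` are callables standing in for image
    recognition, intention recognition and sentence generation."""
    terminal.send(f"Guiding: please draw {node.requirement}.")
    drawing = terminal.receive_drawing()
    if meets(drawing, node.requirement):
        return  # requirement met: the plot continues
    terminal.send("Inquiry: please describe your creative idea.")
    voice = terminal.receive_voice()
    for target in targets(voice, drawing):   # target modification contents
        terminal.send(teach(target))         # teaching interaction sentence
```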
In one possible implementation, receiving the first voice information sent by the user via the user terminal and determining at least one piece of target modification content according to the first voice information may include the following steps:
determining at least one first keyword according to the first voice information, where the first keyword may be related to the interaction requirement corresponding to the inquiry interaction sentence;
determining at least one piece of alternative modification content in the first drawing image data according to the first keyword;
if a single first keyword corresponds to multiple pieces of alternative modification content, determining the alternative modification content with the highest first modification priority as the target modification content, where the first modification priority is related to the degree of association between the alternative modification content and the interaction requirement corresponding to the inquiry interaction sentence;
if a single first keyword corresponds to only a single piece of alternative modification content, determining that piece of alternative modification content as the target modification content.
In another possible implementation, generating a teaching interaction sentence according to the at least one piece of target modification content and sending the teaching interaction sentence to the user terminal may include the following steps:
determining a second modification priority of the at least one piece of target modification content according to its modification difficulty;
arranging the at least one piece of target modification content in descending order of the second modification priority to generate a modification list;
generating corresponding teaching interaction sentences according to the modification list, and sending the teaching interaction sentences to the user terminal.
In another possible implementation, generating the corresponding teaching interaction sentences according to the modification list and sending them to the user terminal may include the following steps:
sending a first teaching interaction sentence, corresponding to the highest-ranked target modification content in the modification list, to the user terminal;
receiving second drawing image data sent by the user via the user terminal;
judging, by comparing the first drawing image data with the second drawing image data, whether the modification corresponding to the first teaching interaction sentence has been completed in the second drawing image data;
if not, generating a second teaching interaction sentence according to the second drawing image data and the first teaching interaction sentence, and sending it to the user terminal;
if so, deleting the target modification content corresponding to the first teaching interaction sentence from the modification list to obtain an updated modification list;
sending a third teaching interaction sentence, corresponding to the highest-ranked target modification content in the updated modification list, to the user terminal;
when no target modification content remains in the updated modification list, generating a corresponding interaction sentence according to the interaction progress, where the interaction sentence may include a guiding interaction sentence or a plot interaction sentence, and the plot interaction sentence may be used to narrate the storyline of the interactive story.
In another possible embodiment, after judging whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence, the method may further include the following steps:
if so, determining at least one preset feature point according to the interaction requirement corresponding to the guiding interaction sentence;
extracting, from the first drawing image data, at least one alternative interaction feature point corresponding to the at least one preset feature point;
calculating, through a feature comparison model, the feature deviation value between each alternative interaction feature point and its corresponding preset feature point;
taking the alternative interaction feature points whose feature deviation value is greater than a preset value as target interaction feature points;
generating a comment interaction sentence according to the target interaction feature points, where the comment interaction sentence may be used to confirm with the user the creative idea and/or intended meaning of the target interaction feature points.
In another possible implementation, after determining the interactive scenario according to the first selection instruction, the method may further include the following step:
when the interaction progress is not located at a teaching node, sending a corresponding plot interaction sentence to the user terminal.
In another possible implementation, after determining the interactive scenario according to the first selection instruction, the method may further include the following steps:
receiving a second selection instruction sent by the user via the user terminal, where the second selection instruction may be used to select a scenario branch;
adjusting the interaction progress to the corresponding scenario node according to the second selection instruction.
In a second aspect, an embodiment of the present application provides a multi-modal information processing apparatus based on an interactive drawing scenario. The apparatus may include a communication module, a computing module and a judging module;
the communication module may be used to receive a first selection instruction sent by a user via the user terminal, where the first selection instruction may be used to select an interactive story;
the computing module may be used to determine an interactive scenario according to the first selection instruction, where the interactive scenario may include at least one scenario branch, and each scenario branch may include at least one teaching node;
the communication module is further used to send a guiding interaction sentence to the user terminal when the interaction progress is located at a teaching node, where the guiding interaction sentence may be used to express the interaction requirement corresponding to the current teaching node;
the communication module is further used to receive first drawing image data sent by the user via the user terminal;
the judging module may be used to judge whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence;
the communication module is further used to send an inquiry interaction sentence to the user terminal when the first drawing image data does not meet the interaction requirement corresponding to the guiding interaction sentence, where the inquiry interaction sentence may be used to guide the user to describe the creative idea and the intended meaning of the first drawing image data;
the communication module is further used to receive first voice information sent by the user via the user terminal, where the first voice information may include the creative idea and the intended meaning of the first drawing image data;
the computing module is further used to determine at least one piece of target modification content according to the first voice information;
the computing module is further used to generate at least one teaching interaction sentence according to the at least one piece of target modification content, where the teaching interaction sentence may be used to guide the user to modify the first drawing image data so that it meets the interaction requirement corresponding to the guiding interaction sentence;
the communication module may further be used to send the at least one teaching interaction sentence to the user terminal.
In a third aspect, an embodiment of the present application provides a multi-modal information processing apparatus based on an interactive drawing scenario. The apparatus may include a processor, a memory and a bus;
the processor and the memory are connected by the bus, where the memory is used to store a set of program code, and the processor is used to call the program code stored in the memory to perform the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium;
the computer-readable storage medium stores instructions which, when run on a computer, implement the method described in the first aspect.
The method and device of the embodiments of the present application can provide the user with various interactive stories and, after the user selects a corresponding interactive story, generate a personalized interactive scenario for the user, so that the interaction and teaching content better matches the user's creative habits and creative level and helps the user maintain an interest in drawing; they can also generate modification suggestions that match the user's creative intention or creative ideas according to the interaction requirement and the user's created content, which helps to guide the user toward correct creative knowledge while protecting the user's imagination.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. It is apparent to those of ordinary skill in the art that the drawings described below show only some embodiments of the present application, and that other drawings may be derived from them without inventive effort.
FIG. 1 is a schematic architecture diagram of a multi-modal information processing system based on an interactive drawing scenario according to an embodiment of the present application;
FIG. 2 is a flowchart of a multi-modal information processing method based on an interactive drawing scenario according to an embodiment of the present application;
FIG. 3 is a schematic view of a scene for selecting an interactive story according to an embodiment of the present application;
FIG. 4 is a schematic view of a scene for understanding a user's creative idea and the intended meaning of drawing image data according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-modal information processing apparatus based on an interactive drawing scenario according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a multi-modal information processing apparatus based on an interactive drawing scenario according to another embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the embodiments of the present application, a multi-modal information processing system based on an interactive drawing scenario that may be involved in the embodiments of the present application is first described. Referring to FIG. 1, the architecture of the multi-modal information processing system based on an interactive drawing scenario provided in an embodiment of the present application may include the following parts: a user terminal 10 and a server 20.
The user terminal 10 may also be referred to as user equipment (UE). It may be deployed on land, indoors or outdoors, and may be handheld, wearable or vehicle-mounted. It may also be referred to as a terminal device, access terminal device, vehicle terminal, UE unit, UE station, mobile station, remote terminal device, mobile device, UE terminal device, mobile terminal, wireless communication device, UE agent, UE apparatus, or the like. The terminal may be fixed or mobile. Its specific form may be a mobile phone, a tablet computer (Pad), a smart watch, a computer with wireless transceiving functions, a vehicle-mounted terminal device, a wireless terminal in a smart home, a wearable terminal device, and the like. The operating system of a PC-type terminal device, such as an all-in-one machine, may include but is not limited to Linux, Unix, the Windows series (such as Windows XP, Windows 7, etc.) and Mac OS X (the operating system of Apple computers). The operating system of a mobile terminal device, such as a smartphone, may include but is not limited to Android, iOS (the operating system of Apple phones), Windows and other operating systems. The user terminal 10 may be configured to respond to a user's wake-up, login or start instruction and present a human-computer interaction teaching interface on which the user selects an interactive story to learn; the user terminal 10 may further, after receiving a first selection instruction of the user (for selecting an interactive story), transmit the first selection instruction to the server 20; the user terminal 10 may also be configured to receive teaching information sent by the server (such as guiding interaction sentences, inquiry interaction sentences, teaching interaction sentences or plot interaction sentences) and present it to the user; the user terminal 10 may also receive first drawing image data created by the user according to the guiding interaction sentence and send the first drawing image data to the server 20; the user terminal 10 may further be configured to receive first voice information input by the user and send it to the server 20, where the first voice information may include the creative idea and the intended meaning of the first drawing image data; moreover, the user terminal may also receive a modification instruction from the user and transmit the modified drawing image data to the server 20; the user terminal 10 may also receive a selection instruction of the user for selecting a scenario branch and transmit the corresponding information to the server 20.
Further, the server 20 may be a single server, a server cluster composed of several servers, or a cloud computing service center. One server 20 may serve multiple user terminals 10 at the same time; alternatively, the multi-modal information processing system based on an interactive drawing scenario mentioned in the method of the embodiments of the present application may include multiple servers 20, each corresponding to one or more user terminals 10. Specifically, the server 20 may be configured to receive a first selection instruction sent by the user terminal 10 and determine an interactive scenario according to the first selection instruction, where the interactive scenario may include at least one scenario branch, and each scenario branch may include at least one teaching node. Optionally, the server may further generate the interactive scenario according to the first selection instruction together with user information, where the user information may include the user's age and/or preferred creative objects; the server 20 may be further configured to send a guiding interaction sentence to the user terminal when the interaction progress is located at a teaching node, where the guiding interaction sentence may be used to express the interaction requirement corresponding to the current teaching node; the server 20 may be further configured to receive the first drawing image data sent by the user terminal 10 and judge whether it meets the interaction requirement corresponding to the guiding interaction sentence; the server 20 may further send an inquiry interaction sentence to the user terminal when the first drawing image data does not meet the interaction requirement corresponding to the guiding interaction sentence, where the inquiry interaction sentence may be used to guide the user to describe the creative idea and the intended meaning of the first drawing image data; the server 20 may be further configured to determine at least one piece of target modification content according to the first voice information, generate at least one teaching interaction sentence according to the at least one piece of target modification content, and send it to the user terminal 10, where the teaching interaction sentence may be used to guide the user to modify the first drawing image data so that it meets the interaction requirement corresponding to the guiding interaction sentence.
It should be noted that the interaction sentences (such as plot interaction sentences, guiding interaction sentences, inquiry interaction sentences, comment interaction sentences, teaching interaction sentences, etc.) are presented on the user terminal in the form of voice playback. Moreover, the presentation form may be selected according to the type of user terminal: for example, when the user terminal is a mobile phone or a tablet, the interaction sentence may be presented as voice playback combined with the display of corresponding text; when the user terminal is a smart watch, whose screen is small, the interaction sentence may be presented as voice playback only.
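As an illustration, this terminal-dependent choice of presentation form could be reduced to a rule as simple as the following sketch (the type labels are assumptions, not values defined by the present application):

```python
# A minimal sketch of selecting the presentation form by terminal type, per the
# note above; the type labels are illustrative assumptions.
def presentation_form(terminal_type: str) -> set:
    if terminal_type in {"mobile phone", "tablet"}:
        return {"voice playback", "text display"}
    return {"voice playback"}  # e.g. a smart watch with a small screen

print(presentation_form("smart watch"))  # -> {'voice playback'}
```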
The following describes in detail a multi-modal information processing method based on an interactive drawing scenario according to an embodiment of the present application, with reference to FIG. 2.
Referring to FIG. 2, which is a flowchart of a multi-modal information processing method based on an interactive drawing scenario according to an embodiment of the present application, the method may include the following steps:
s201, receiving a first selection instruction sent by a user through the user terminal.
Wherein the first selection instruction may be used to select the interactive story.
For example, a user may wake up, initiate or log in an interactive teaching program on a user terminal, where the interactive teaching program may present (or provide) multiple types (or topics) of interactive stories to the user, and the user may directly click on a top page of the interactive teaching program to select a target interactive story; the user may also select the type of interactive session story first and then precisely select the interactive story of the cardiometer. Further, the user may also search for a target interactive story (i.e., an interactive story that the user wants to learn) within the interactive tutorial program.
Referring to FIG. 3, FIG. 3 is a schematic view of a scene for selecting an interactive story according to an embodiment of the present application.
As shown in FIG. 3, after receiving a user's instruction to wake up, start or log in to the interactive teaching program, the user terminal 10 may present a type selection interface 310, in which multiple types of interactive stories (such as history and humanities, popular science knowledge, and fairy tales in FIG. 3) may be presented. If the user wants to learn creative knowledge in a fairy tale, the user may click icon 311 in the type selection interface 310. After icon 311 is clicked, the user terminal 10 displays a story selection interface 320, in which several interactive stories related to fairy tales (such as "The Mermaid Princess", "The Little Sun" and "Little White the Puppy" in FIG. 3) may be presented; if the user wants to select "The Mermaid Princess" as the interactive story to learn, the user may select icon 312 in the story selection interface 320.
S202: Determine an interactive scenario according to the first selection instruction.
It should be noted that the interactive scenario may include at least one scenario branch, and each scenario branch may include at least one teaching node.
Specifically, the embodiment of the present application may generate multiple scenario branches according to the plot development and creation difficulty of the interactive story. For example, if the novel corresponding to interactive story 1 has 15 chapters, 15 scenario branches may be set for the interactive scenario 1 corresponding to interactive story 1, with the content of each scenario branch corresponding to a chapter of the original novel. Alternatively, suppose the creation difficulty of chapters 1-8 of interactive story 1 is level 2 (creation difficulty increases from level 1 to level 10, and difficulties may be added up), the creation difficulty of chapters 9-13 is level 4, and the creation difficulty of chapters 14-15 is level 8. If scenario branches are divided with a difficulty of 4 as the division standard, the following scenario branches may be obtained: scenario branch 1 (chapters 1 and 2), scenario branch 2 (chapters 3 and 4), scenario branch 3 (chapters 5 and 6), scenario branch 4 (chapters 7 and 8), scenario branch 5 (chapter 9), scenario branch 6 (chapter 10), scenario branch 7 (chapter 11), scenario branch 8 (chapter 12), scenario branch 9 (chapter 13), scenario branch 10 (first half of chapter 14), scenario branch 11 (second half of chapter 14), scenario branch 12 (first half of chapter 15) and scenario branch 13 (second half of chapter 15). The above examples of scenario branching are only intended to describe the embodiments of the present application in more detail and should not be construed as limiting; the specific rules for dividing scenario branches are set by technicians according to the actual situation, and the present application is not limited in this respect.
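For illustration only, the chapter-grouping rule in the example above can be sketched as follows, assuming a difficulty budget of 4 per scenario branch and splitting chapters that exceed the budget into halves; the function name and representation are assumptions:

```python
# A minimal sketch of the scenario-branch division described above, assuming a
# difficulty budget of 4 per branch. All names are illustrative, not from the patent.
from typing import List

def divide_branches(chapter_difficulties: List[int], budget: int = 4) -> List[List[str]]:
    """Group chapters into scenario branches whose summed difficulty stays
    within `budget`; chapters harder than the budget are split into halves."""
    branches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for idx, diff in enumerate(chapter_difficulties, start=1):
        if diff > budget:
            # Flush the branch in progress, then split the hard chapter.
            if current:
                branches.append(current)
                current, used = [], 0
            branches.append([f"chapter {idx} (first half)"])
            branches.append([f"chapter {idx} (second half)"])
        elif used + diff <= budget:
            current.append(f"chapter {idx}")
            used += diff
        else:
            branches.append(current)
            current, used = [f"chapter {idx}"], diff
    if current:
        branches.append(current)
    return branches

# Chapters 1-8 at difficulty 2, 9-13 at 4, 14-15 at 8 yield the 13 branches
# of the example: [ch1, ch2], [ch3, ch4], ..., [ch14 first half], ...
print(divide_branches([2] * 8 + [4] * 5 + [8] * 2))
```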
Further, the server may also generate the interactive scenario according to the first selection instruction together with user information, where the user information may include the user's age and/or preferred creative objects. The user's level in the interactive teaching program may also affect the number of scenario branches and teaching nodes. For example, when the user's age is below a preset age value, the teaching nodes may be set in story parts involving simple characters or scenery: if the user is 6 years old (with a preset age value of 10) and the interactive story selected by the user includes drawing elements such as trees, the sun, a plane, a ship and clouds, the teaching nodes may be set on the lower-difficulty drawing elements such as the trees, the sun and the clouds. Technicians may preset the creation difficulty of various drawing elements, which facilitates teaching node selection. Alternatively, if the user's preferred creative object is a ship, and the drawing elements included in the selected interactive story are a ship, a sailboat, a yacht, a tree, a dolphin and a gull, the teaching nodes are set on the storylines (or plots) corresponding to the ship, the sailboat and the yacht. Alternatively, the higher the user's level, the more complex (or difficult) the drawing elements on whose storylines (or plots) teaching nodes may be set: for example, when the user's level is 3, teaching nodes may be set on storylines corresponding to drawing elements with a creation difficulty of 3; when the user's level is 9, teaching nodes may be set on storylines corresponding to drawing elements with a creation difficulty of 9. It should be noted that the rules for setting scenario branches, teaching nodes, drawing element difficulty and the like are set by technicians according to the actual situation and are not limited here; the above examples are only intended to describe the method of the embodiments of the present application in more detail.
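As a sketch of the rule-based selection just described (the age threshold, difficulty table and object-kinship table below are illustrative assumptions):

```python
# A minimal sketch of rule-based teaching-node selection. The age threshold,
# difficulty table and kinship table are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List, Optional

PRESET_AGE = 10            # below this, only easy elements get teaching nodes
EASY_DIFFICULTY = 3
RELATED = {"ship": {"ship", "sailboat", "yacht"}}   # preferred-object kinship

@dataclass
class UserInfo:
    age: int
    level: int
    preferred_object: Optional[str] = None

def select_teaching_elements(elements: Dict[str, int], user: UserInfo) -> List[str]:
    """`elements` maps drawing element -> preset creation difficulty (1-10);
    returns the elements whose plots should carry teaching nodes."""
    if user.age < PRESET_AGE:
        return [e for e, d in elements.items() if d <= EASY_DIFFICULTY]
    if user.preferred_object:
        kin = RELATED.get(user.preferred_object, {user.preferred_object})
        picked = [e for e in elements if e in kin]
        if picked:
            return picked
    return [e for e, d in elements.items() if d == user.level]

story = {"tree": 2, "sun": 1, "cloud": 2, "ship": 5, "sailboat": 5, "yacht": 6}
print(select_teaching_elements(story, UserInfo(age=6, level=1)))
# -> ['tree', 'sun', 'cloud']
```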
In one possible implementation, after determining the interactive scenario according to the first selection instruction, the method may further include the following step:
when the interaction progress is not located at a teaching node, sending a corresponding plot interaction sentence to the user terminal.
By way of example, suppose the teaching nodes in scenario branch 14 are located at the 12% plot progress position (requiring the user to draw a climbing picture), the 20% plot progress position (requiring the user to draw a swimming picture) and the 63% plot progress position (requiring the user to draw a running picture); the plot in the 0%-12% progress region (excluding 12%) is a spring outing in which the school organizes students to climb a mountain, the plot in the 12%-20% progress region (excluding 12% and 20%) is summer vacation life, and the plot in the 20%-63% progress region (excluding 20% and 63%) is a school sports meeting. If the user's interaction progress in scenario branch 14 is at the 15% plot progress position, the current interaction progress is not located at a teaching node, and the server may send the user terminal a plot interaction sentence corresponding to the 15% plot progress position, i.e., one narrating the "summer vacation life" plot.
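The mapping from interaction progress to a teaching node or plot region in this example can be sketched as follows (the percentages repeat the example; everything else is an assumption):

```python
# A minimal sketch of mapping interaction progress to a teaching node or plot
# region, using the percentages from the example above (illustrative values).
from bisect import bisect_left
from typing import List, Tuple

TEACHING_NODES = {0.12: "draw a climbing picture",
                  0.20: "draw a swimming picture",
                  0.63: "draw a running picture"}
PLOT_REGIONS: List[Tuple[float, str]] = [  # (region end, plot label)
    (0.12, "spring outing: climbing a mountain"),
    (0.20, "summer vacation life"),
    (0.63, "school sports meeting"),
]

def next_sentence(progress: float) -> str:
    """Return a guiding interaction sentence at a teaching node, otherwise a
    plot interaction sentence for the region containing `progress`."""
    if progress in TEACHING_NODES:
        return f"Guiding: please {TEACHING_NODES[progress]}."
    ends = [end for end, _ in PLOT_REGIONS]
    idx = bisect_left(ends, progress)
    if idx < len(PLOT_REGIONS):
        return f"Plot: {PLOT_REGIONS[idx][1]}."
    return "Plot: continue the story."

print(next_sentence(0.15))  # -> Plot: summer vacation life.
print(next_sentence(0.20))  # -> Guiding: please draw a swimming picture.
```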
In another possible implementation, after determining the interactive scenario according to the first selection instruction, the method may further include the following steps:
receiving a second selection instruction sent by the user via the user terminal, where the second selection instruction may be used to select a scenario branch;
adjusting the interaction progress to the corresponding scenario node according to the second selection instruction.
Optionally, after determining the interactive scenario, the server may send the names of the scenario branches of the interactive scenario, the composition of each scenario branch and the positions of the teaching nodes to the user terminal for the user's reference. For example, suppose interactive scenario 1 (the interactive scenario corresponding to "Little White the Puppy") includes scenario branch 15 ("meeting the new neighbor Little Black"), scenario branch 16 and scenario branch 17 ("rescuing Little Black"). If, after viewing or considering the branch names, the user is most interested in the "rescuing Little Black" content, the user may click to select scenario branch 17, and the user terminal sends a second selection instruction (i.e., for selecting scenario branch 17) to the server. After receiving the second selection instruction, the server sends the interaction sentences corresponding to scenario branch 17 (such as plot interaction sentences, guiding interaction sentences, inquiry interaction sentences, comment interaction sentences, teaching interaction sentences, etc.) to the user terminal.
S203: When the interaction progress is located at a teaching node, send a guiding interaction sentence to the user terminal.
The guiding interaction sentence may be used to express the interaction requirement corresponding to the current teaching node.
For example, following the example of scenario branch 14 above, when the user's interaction progress in scenario branch 14 reaches the 63% plot progress position, the current interaction progress may be considered to be located at a teaching node, and the server may send the user terminal a guiding interaction sentence such as "please draw a picture of an athlete crossing the finish line".
S204: Receive first drawing image data sent by the user via the user terminal.
It should be noted that the user may draw the first drawing image data directly in the interactive teaching program interface of the user terminal; the user may also draw it in other software with a drawing function on the user terminal and then import it into the interactive teaching program; the user may also invoke the camera of the user terminal through the interactive teaching program to photograph a picture drawn by the user (picture content related to the interaction requirement), which is then sent to the server as first drawing image data; or the user may photograph the drawn picture directly with the camera of the user terminal, save it to the photo album as first drawing image data, import it into the interactive teaching program through the album-import option, and finally send it to the server. For example, after learning of the guiding interaction sentence "please draw a picture of an athlete crossing the finish line" sent by the server, the user may draw the corresponding first drawing image data in the interactive teaching program interface of the user terminal; the user may instead draw the corresponding picture in a mobile phone memo and import it into the interactive teaching program as drawing image data; or the user may draw the first drawing image data in drawing software on another terminal, send it to the user terminal that has a communication connection with the server (i.e., the terminal presenting the guiding interaction sentence "please draw a picture of an athlete crossing the finish line"), and have that terminal send the first drawing image data on to the server.
S205: Judge whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence.
For example, if the guiding interaction sentence is "please draw a picture of an athlete crossing the finish line", the keywords "athlete", "crossing" and "finish line" may be extracted, with corresponding image elements such as "person", "posture with both arms spread" and "ribbon". By performing image recognition on the first drawing image data, it can be determined whether the data contains elements such as "person", "posture with both arms spread" and "ribbon", and thus whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence.
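A minimal sketch of this requirement check follows; the keyword-to-element mapping is taken from the example, while the detector is a stand-in that a real system would replace with an image recognition model:

```python
# A minimal sketch of the requirement check in S205. The keyword-to-element
# mapping and the detector below are illustrative assumptions.
from typing import Set

KEYWORD_TO_ELEMENTS = {
    "athlete": {"person"},
    "crossing": {"arms-spread posture"},
    "finish line": {"ribbon"},
}

def detect_elements(image_bytes: bytes) -> Set[str]:
    # Stand-in for object detection; simulates a drawing in which only a
    # person was recognized.
    return {"person"}

def meets_requirement(image_bytes: bytes, keywords: Set[str]) -> bool:
    required: Set[str] = set()
    for keyword in keywords:
        required |= KEYWORD_TO_ELEMENTS.get(keyword, set())
    return required.issubset(detect_elements(image_bytes))

print(meets_requirement(b"...", {"athlete", "crossing", "finish line"}))
# -> False: no arms-spread posture or ribbon found, so an inquiry follows
```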
In one possible implementation, after judging whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence, the method may further include the following steps:
if so, determining at least one preset feature point according to the interaction requirement corresponding to the guiding interaction sentence;
extracting, from the first drawing image data, at least one alternative interaction feature point corresponding to the at least one preset feature point;
calculating, through a feature comparison model, the feature deviation value between each alternative interaction feature point and its corresponding preset feature point;
taking the alternative interaction feature points whose feature deviation value is greater than a preset value as target interaction feature points;
generating a comment interaction sentence according to the target interaction feature points, where the comment interaction sentence may be used to confirm with the user the creative idea and/or intended meaning of the target interaction feature points.
For example, suppose the guiding interaction sentence is "please draw a picture of your mother" and the preset feature points corresponding to it are "hairstyle" (with feature values: pigtail 0, long curly hair 8, short curly hair 10 and bald head 50), "hair color" (black 0, brown 5, yellow 13 and red 20), "clothing" (short sleeve 0, long sleeve 2, dress 2 and down jacket 24) and "accessory" (earring 2, necklace 2 and cap 10). If the first drawing image data drawn by the user shows a lady with long red curly hair wearing a black dress, it can be determined that the alternative interaction feature points in the first drawing image data include "black dress" (corresponding to "clothing"), "long curly hair" (corresponding to "hairstyle") and "red" (corresponding to "hair color"), with no alternative interaction feature point corresponding to "accessory". The feature deviation value between each alternative interaction feature point in the first drawing image data and its corresponding preset feature point can then be calculated; since the reference feature values of the relevant preset feature points in this example are all 0, the "hairstyle" deviation value is 8, the "hair color" deviation value is 20 and the "clothing" deviation value is 2. If the preset value is 10, then for the current first drawing image data the alternative interaction feature point "red", corresponding to the "hair color" preset feature point, is a target interaction feature point, so the server may generate a comment interaction sentence about the hair being red, such as "So mom's hair is red". It should be noted that the server may initiate multiple rounds of interaction on the same target interaction feature point, such as "Mom's hairstyle is quite special", "Has mom dyed her hair another color?" or "Red hair is very beautiful".
Moreover, the server may also perform intention recognition on the voice information with which the user answers the comment interaction sentence, so as to better understand the user's creative ideas and intentions. For example, if the comment interaction sentence is "So mom's hair is red" and the user's reply voice information is "Mom's hair is black, but I would like mom to dye her hair red", then through intention recognition the server can learn that the first drawing image data does not represent the current appearance of the user's mother but rather the appearance the user most hopes his or her mother to have, which gives the server a better grasp of the user's creative intention.
It should be noted that the above examples of preset feature points and the related feature values and/or preset values are only intended to describe the method of the embodiments of the present application in more detail and should not be construed as limiting; the specific preset feature points and related feature values and/or preset values are set by technicians according to the actual situation and are not limited here.
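For illustration, the feature-deviation computation in the steps above can be sketched as follows, reusing the example's feature values; reducing the feature comparison model to a table lookup is an assumption made for brevity:

```python
# A minimal sketch of computing feature deviation values and picking target
# interaction feature points. The tables repeat the example's illustrative
# numbers; the feature comparison model is reduced to a lookup.
from typing import Dict, List, Tuple

FEATURE_TABLE: Dict[str, Dict[str, int]] = {
    "hairstyle": {"pigtail": 0, "long curly": 8, "short curly": 10, "bald": 50},
    "hair color": {"black": 0, "brown": 5, "yellow": 13, "red": 20},
    "clothing": {"short sleeve": 0, "long sleeve": 2, "dress": 2, "down jacket": 24},
}
REFERENCE_VALUE = 0    # reference feature value of each preset feature point
PRESET_THRESHOLD = 10  # deviation above this marks a target feature point

def target_feature_points(candidates: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """`candidates` maps preset feature point -> recognized alternative value.
    Returns (feature point, value, deviation) for deviations above threshold."""
    targets = []
    for point, value in candidates.items():
        deviation = abs(FEATURE_TABLE[point][value] - REFERENCE_VALUE)
        if deviation > PRESET_THRESHOLD:
            targets.append((point, value, deviation))
    return targets

drawn = {"hairstyle": "long curly", "hair color": "red", "clothing": "dress"}
print(target_feature_points(drawn))
# -> [('hair color', 'red', 20)], prompting a comment like 'So mom's hair is red'
```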
S206: If the judgment is negative, send an inquiry interaction sentence to the user terminal.
The inquiry interaction sentence may be used to guide the user to describe the creative idea and the intended meaning of the first drawing image data. In this way, the embodiment of the present application can guide the user to explain the creative idea, which helps to improve the user's language expression ability and also helps the server to understand the user's creative idea or intention.
S207: Receive first voice information sent by the user via the user terminal, and determine at least one piece of target modification content according to the first voice information.
The first voice information may include the creative idea and the intended meaning of the first drawing image data.
Referring to FIG. 4, FIG. 4 is a schematic view of a scene for understanding a user's creative idea and the intended meaning of drawing image data according to an embodiment of the present application. It should be noted that FIG. 4 only shows the display screen of the user terminal 10, for convenience in explaining the method of the embodiment of the present application.
As shown in FIG. 4, at teaching node 410 the server 20 sends a guiding interaction sentence 411 "please draw a picture of your mother" to the user terminal; the user then draws first drawing image data 412 according to the guiding interaction sentence 411 (the first drawing image data 412 contains only a flower) and sends it to the server 20 via the user terminal 10. After performing image recognition on the first drawing image data 412 and finding that it contains only a flower and no "person" image element, the server 20 may generate and send an inquiry interaction sentence 413 to the user terminal 10 (its content may be "No figure is recognized in the drawing; please explain your creative idea"). After learning of the inquiry interaction sentence 413, the user may enter first voice information 414 into the user terminal 10 (its content may be "Mom is as beautiful as a flower, so I wanted to draw mom as a flower; the flower in the drawing is the mom I drew"). After receiving the first voice information 414 sent by the user terminal 10, the server 20 performs intention recognition on it, learning that "I wanted to draw mom as a flower" is the user's creative idea and "the flower in the drawing is the mom I drew" is the intended meaning of the user's drawing image data.
In one possible implementation, receiving the first voice information sent by the user via the user terminal and determining at least one piece of target modification content according to the first voice information may include the following steps:
determining at least one first keyword according to the first voice information, where the first keyword may be related to the interaction requirement corresponding to the inquiry interaction sentence;
determining at least one piece of alternative modification content in the first drawing image data according to the first keyword;
if a single first keyword corresponds to multiple pieces of alternative modification content, determining the alternative modification content with the highest first modification priority as the target modification content, where the first modification priority is related to the degree of association between the alternative modification content and the interaction requirement corresponding to the inquiry interaction sentence;
if a single first keyword corresponds to only a single piece of alternative modification content, determining that piece of alternative modification content as the target modification content.
For example, suppose the guiding interaction sentence is "please draw a picture of your mother", the first drawing image data drawn by the user includes a flower and a book, and, after the inquiry interaction sentence is sent to the user terminal, the user's first voice information is received as "I feel mom is as beautiful as a flower and knows more than a book, so I feel the flower is mom and the book is also mom". First, since the interaction requirement of the guiding interaction sentence is "a picture of mother", the server can determine the alternative modification contents "flower" and "book" according to the first keywords "the flower is mom" and "the book is mom". Since both "flower" and "book" correspond to "mother", the first modification priorities of "flower" and "book" need to be compared. A flower is generally considered easier to personify than a book, so the first modification priority of "flower" may be considered higher than that of "book", and "flower" may be taken as the target modification content.
As another example, suppose the guiding interaction sentence is "please imagine and draw a picture of yourself fishing at a pond", and the first drawing image data drawn by the user shows a cat sitting at the pond and grabbing fish with its paws. After the inquiry interaction sentence is sent to the user terminal, the user's first voice information is received as "I feel cats are really good at catching fish, so I drew myself as a cat. I have never seen fishing, so I drew catching fish with paws." Since the interaction requirements of the guiding interaction sentence are "yourself", "at a pond" and "fishing", the first keywords "cat" and "catching fish" can be determined from the first voice information, where "cat" corresponds to "yourself" and "catching fish" corresponds to "fishing"; therefore "cat" and "catching fish" can be determined as target modification contents.
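A minimal sketch of determining target modification contents from first keywords follows; the keyword-to-candidate mapping and the association-based first modification priorities are illustrative stand-ins for the recognition steps described above:

```python
# A minimal sketch of picking target modification contents from the first
# voice information. The lookups below are illustrative assumptions.
from typing import Dict, List

# keyword -> pieces of alternative modification content found in the drawing
ALTERNATIVES: Dict[str, List[str]] = {
    "mother": ["flower", "book"],
    "fishing": ["catching fish with paws"],
}
# association degree with the interaction requirement (higher = modify first)
FIRST_PRIORITY: Dict[str, int] = {"flower": 9, "book": 4,
                                  "catching fish with paws": 8}

def pick_targets(first_keywords: List[str]) -> List[str]:
    targets = []
    for keyword in first_keywords:
        candidates = ALTERNATIVES.get(keyword, [])
        if len(candidates) > 1:
            # several candidates: keep the one with the highest first priority
            targets.append(max(candidates, key=lambda c: FIRST_PRIORITY[c]))
        elif candidates:
            targets.append(candidates[0])
    return targets

print(pick_targets(["mother"]))   # -> ['flower']
print(pick_targets(["fishing"]))  # -> ['catching fish with paws']
```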
S208: Generate at least one teaching interaction sentence according to the at least one piece of target modification content, and send the at least one teaching interaction sentence to the user terminal.
The teaching interaction sentence may be used to guide the user to modify the first drawing image data so that it meets the interaction requirement corresponding to the guiding interaction sentence.
For example, for the embodiment in which the first drawing image data includes a flower and a book, the server may generate a teaching interaction sentence such as "You can draw the flower above the book, showing an image of the flower sprouting and growing out of the book, and you can also draw facial features and a hairstyle for the flower". For the embodiment in which the first drawing image data shows a cat sitting at the pond and grabbing fish with its paws, the server may generate teaching interaction sentences such as "You can draw the accessories you usually wear on the kitten" and "You can draw a fishing rod for the kitten".
In one possible implementation, generating teaching interaction sentences according to the at least one piece of target modification content and sending them to the user terminal may include the following steps:
determining a second modification priority of the at least one piece of target modification content according to its modification difficulty;
arranging the at least one piece of target modification content in descending order of the second modification priority to generate a modification list;
generating corresponding teaching interaction sentences according to the modification list, and sending the teaching interaction sentences to the user terminal.
Exemplarily, in the embodiment in which the first drawing image data shows a cat sitting at the pond and grabbing fish with its paws, drawing a fishing rod is more difficult than drawing an accessory, so the second modification priority of the target modification content "catching fish" is higher than that of the target modification content "cat". Arranging "catching fish" and "cat" in descending order of second modification priority then yields a modification list in which "catching fish" is positioned above "cat".
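A minimal sketch of building the modification list, assuming a hypothetical modification_difficulty lookup; following the example above, harder modifications are given the higher second modification priority and therefore come first:

from typing import List

def modification_difficulty(content: str) -> int:
    # Hypothetical difficulty scores; a higher value means harder to draw.
    return {"catching fish": 3, "cat": 1}.get(content, 2)

def build_modification_list(targets: List[str]) -> List[str]:
    # Second modification priority is derived from modification difficulty;
    # the list is arranged from highest to lowest priority.
    return sorted(targets, key=modification_difficulty, reverse=True)

print(build_modification_list(["cat", "catching fish"]))
# ['catching fish', 'cat']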
In another possible implementation manner, the generating the corresponding teaching interaction sentence according to the modification list and transmitting the teaching interaction sentence to the user terminal may include the following steps:
transmitting a first teaching interactive statement corresponding to the highest-ordered target modification content in the modification list to a user terminal;
receiving second drawing image data sent by a user through a user terminal;
judging, by comparing the first drawing image data with the second drawing image data, whether the second drawing image data has completed the modification corresponding to the teaching interactive sentence;
if not, generating a second teaching interactive sentence according to the second drawing image data and the first teaching interactive sentence, and sending the second teaching interactive sentence to the user terminal;
If yes, deleting the target modification content corresponding to the first teaching interaction statement from the modification list to obtain an updated modification list;
transmitting a third teaching interactive statement corresponding to the highest-ordered target modification content in the updated modification list to the user terminal;
when the updated modification list has no target modification content, generating corresponding interactive sentences according to the interactive progress, wherein the interactive sentences can comprise guiding interactive sentences or plot interactive sentences, and the plot interactive sentences can be used for describing the story line of the interactive story.
For example, if the modification list contains, from top to bottom by second modification priority, target modification content 1, target modification content 2, and target modification content 3, the server first generates the corresponding teaching interactive sentence 1 from target modification content 1, sends it to the user terminal, and then receives the second drawing image data sent by the user via the user terminal. If the second drawing image data does not complete the modification corresponding to teaching interactive sentence 1, the server generates teaching interactive sentence 2 from the second drawing image data and teaching interactive sentence 1 to help the user complete the modification task; if the second drawing image data completes the modification corresponding to teaching interactive sentence 1, the server deletes target modification content 1 from the modification list, whereupon target modification content 2 becomes the target modification content with the highest second modification priority, so the server generates teaching interactive sentence 3 corresponding to target modification content 2 and sends it to the user terminal. After the user completes the modification task for target modification content 3 (the modification judgments for target modification contents 2 and 3 follow the same process as for target modification content 1 and are not repeated here), the server continues to advance the interaction progress, generating and sending interactive sentences (such as scenario interactive sentences, guiding interactive sentences, inquiry interactive sentences, comment interactive sentences, teaching interactive sentences, etc.) according to the next scenario.
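A minimal sketch of this teaching loop, where make_sentence, make_help_sentence, send_to_user, receive_drawing, and modification_completed are hypothetical hooks standing in for the sentence generation, transport, and image comparison described above; drawings are treated as opaque bytes:

from collections import deque
from typing import Callable, Deque, List

def run_teaching_loop(
        modification_list: List[str],
        first_drawing: bytes,
        make_sentence: Callable[[str], str],
        make_help_sentence: Callable[[bytes, str], str],
        send_to_user: Callable[[str], None],
        receive_drawing: Callable[[], bytes],
        modification_completed: Callable[[bytes, bytes, str], bool]) -> None:
    queue: Deque[str] = deque(modification_list)
    prev_drawing = first_drawing
    while queue:
        target = queue[0]                 # highest-ranked target content
        baseline = prev_drawing           # drawing before this modification
        sentence = make_sentence(target)  # teaching interactive sentence
        send_to_user(sentence)
        while True:
            new_drawing = receive_drawing()
            if modification_completed(baseline, new_drawing, target):
                prev_drawing = new_drawing
                queue.popleft()           # modification done: next target
                break
            # not done: generate a follow-up sentence from the latest
            # drawing and the previous teaching sentence to help the user
            sentence = make_help_sentence(new_drawing, sentence)
            send_to_user(sentence)
    # list exhausted: the caller advances the interaction progress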
It can be seen that the embodiments of the present application can provide users with a variety of interactive drawing teaching stories, which helps to meet diverse learning needs and improves the user experience; they can guide users to learn and create through storytelling, making teaching more situational and interesting and increasing users' interest in learning; they can provide personalized modification suggestions based on the interaction requirement and the drawing image data drawn by the user, while respecting the user's creative intention and train of thought, thereby guiding the user to learn correct drawing knowledge without dampening creative enthusiasm; and they can guide the user to associate the created content with the interaction requirement while preserving the user's creative freedom, which helps to cultivate the user's creative imagination.
The following describes the apparatus according to the embodiments of the present application with reference to the drawings.
Referring to fig. 5, a schematic diagram of a device for processing multi-modal information based on interactive drawing scenario according to an embodiment of the present application is provided, where the device may include: a communication module 510, a calculation module 520, and a determination module 530;
The communication module 510 may be configured to receive a first selection instruction sent by a user via the user terminal, where the first selection instruction may be used to select an interactive story;
a computing module 520, configured to determine an interactive scenario according to the first selection instruction, where the interactive scenario may include at least one scenario branch, and each scenario branch may include at least one teaching node;
the communication module 510 may be further configured to send a guiding interaction sentence to the user terminal when the interaction progress is located at the teaching node, where the guiding interaction sentence may be used to express an interaction requirement corresponding to the current teaching node;
the communication module 510 may be further configured to receive first drawing image data sent by a user via a user terminal;
the judging module 530 may be configured to judge whether the first drawing image data meets an interaction requirement corresponding to the guiding interaction sentence;
the communication module 510 may be further configured to send an inquiry interaction sentence to the user terminal when the first drawing image data does not meet the interaction requirement corresponding to the guiding interaction sentence, where the inquiry interaction sentence may be used to guide the user to describe the creation idea of the first drawing image data and the meaning of the drawing image data;
the communication module 510 may be further configured to receive first voice information sent by the user via the user terminal, where the first voice information may include an authoring idea of the first drawing image data and a meaning of the drawing image data;
The calculation module 520 may be further configured to determine at least one target modification content according to the first voice information;
the computing module 520 may be further configured to generate at least one teaching interaction sentence according to the at least one target modification content, where the teaching interaction sentence may be used to guide the user to modify the first drawing image data, so that the first drawing image data meets an interaction requirement corresponding to the guiding interaction sentence;
the communication module 510 may be further configured to send at least one teaching interaction sentence to the user terminal.
In one possible embodiment, the apparatus may further include:
the computing module 520 may be further configured to determine at least one first keyword according to the first voice information, where the first keyword may be related to an interaction requirement corresponding to the query interaction statement;
the calculation module 520 may be further configured to determine, when the single first keyword corresponds to a plurality of candidate modification contents, the candidate modification content with the highest first modification priority as the target modification content, where the first modification priority is related to a degree of association between the candidate modification content and an interaction requirement corresponding to the query interaction sentence;
the calculation module 520 may be further configured to determine the target modification content from the single candidate modification content when the single first keyword corresponds to only the single candidate modification content.
In another possible embodiment, the apparatus may further include:
a calculation module 520 operable to determine a second modification priority of the at least one target modified content based on the modification difficulty of the at least one target modified content;
the calculation module 520 may be further configured to rank at least one target modification content in order of the second modification priority from high to low, and generate a modification list;
the calculation module 520 may be further configured to generate a corresponding teaching interaction sentence according to the modification list;
the communication module 510 may be further configured to send the teaching interaction sentence to the user terminal.
In another possible embodiment, the apparatus may further include:
the communication module 510 may be further configured to send a first teaching interaction sentence corresponding to the highest-ranked target modification content in the modification list to the user terminal;
the communication module 510 may be further configured to receive second drawing image data sent by the user via the user terminal;
the judging module 530 may be further configured to judge whether the modification corresponding to the teaching interactive sentence is completed by comparing the first drawing image data with the second drawing image data;
the calculation module 520 may be further configured to generate a second teaching interaction sentence according to the second drawing image data and the first teaching interaction sentence when the modification corresponding to the teaching interaction sentence is not completed by the second drawing image data;
The communication module 510 may be further configured to send the second teaching interaction sentence to the user terminal;
the calculation module 520 may be further configured to delete, when the modification corresponding to the teaching interactive sentence is completed by the second drawing image data, the target modification content corresponding to the first teaching interactive sentence from the modification list, so as to obtain an updated modification list;
the communication module 510 may be further configured to send a third teaching interaction sentence corresponding to the highest-ranked target modification content in the updated modification list to the user terminal;
the calculation module 520 may be further configured to generate, when there is no target modification content in the updated modification list, a corresponding interactive sentence according to the interaction progress, where the interactive sentence may include a guide interactive sentence or a scenario interactive sentence, and the scenario interactive sentence may be used to describe a story line of the interactive story.
In another possible embodiment, the apparatus may further include:
the calculation module 520 may be further configured to determine at least one preset feature point according to the interaction requirement corresponding to the guiding interaction sentence when the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence;
the calculation module 520 may be further configured to extract at least one alternative interaction feature point corresponding to at least one preset feature point in the first drawing image data;
The calculation module 520 may be further configured to calculate, through a feature comparison model, a feature deviation value of the candidate interaction feature point and a preset feature point corresponding to the candidate interaction feature point;
the calculation module 520 may be further configured to use the candidate interaction feature point with the feature deviation value greater than the preset value as a target interaction feature point;
the calculation module 520 may be further configured to generate a comment interaction sentence according to the target interaction feature point, where the comment interaction sentence may be used to confirm the authoring concept and/or the drawing image data meaning of the target interaction feature point with the user.
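A minimal sketch of the deviation screening performed by the calculation module, using plain Euclidean distance as a hypothetical stand-in for the feature comparison model (the point names, vectors, and preset value are illustrative assumptions):

import math
from typing import Iterable, List, Sequence, Tuple

def feature_deviation(candidate: Sequence[float],
                      preset: Sequence[float]) -> float:
    # Hypothetical feature comparison model: Euclidean distance between
    # a candidate feature vector and its preset counterpart.
    return math.dist(candidate, preset)

def select_target_feature_points(
        pairs: Iterable[Tuple[str, Sequence[float], Sequence[float]]],
        preset_value: float) -> List[str]:
    # Keep alternative interaction feature points whose deviation from the
    # corresponding preset feature point exceeds the preset value.
    return [name for name, cand, preset in pairs
            if feature_deviation(cand, preset) > preset_value]

points = [("sun", [0.9, 0.1], [0.8, 0.2]),
          ("house", [0.1, 0.9], [0.7, 0.3])]
print(select_target_feature_points(points, preset_value=0.5))  # ['house']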
In another possible embodiment, the apparatus may further include:
the communication module 510 may be further configured to send a corresponding scenario interaction sentence to the user terminal when the interaction progress is not located at the teaching node.
In another possible embodiment, the apparatus may further include:
the communication module 510 may be further configured to receive a second selection instruction sent by the user via the user terminal, where the second selection instruction may be used to select a scenario branch;
the calculation module 520 may be further configured to adjust the interaction progress to the corresponding scenario node according to the second selection instruction.
Referring to fig. 6, a schematic diagram of another apparatus for processing multi-modal information based on interactive drawing scenario according to an embodiment of the present application may include:
A processor 610, a memory 620, and an I/O interface 630. The processor 610, the memory 620 and the I/O interface 630 may be communicatively connected, the memory 620 being configured to store instructions, the processor 610 being configured to execute the instructions stored by the memory 620 to implement the method steps corresponding to fig. 2 as described above.
The processor 610 is configured to execute the instructions stored in the memory 620 to control the I/O interface 630 to receive and transmit signals, thereby completing the steps in the method described above. The memory 620 may be integrated into the processor 610 or may be provided separately from the processor 610.
Memory 620 may further include a storage system 621, a cache 622, and RAM 623. The cache 622 is a level-one memory located between RAM 623 and the CPU and composed of static memory chips (SRAM); its capacity is smaller, but it is much faster than main memory, approaching the speed of the CPU. RAM 623 is internal memory that exchanges data directly with the CPU; it can be read and written at any time (except while being refreshed), is fast, and typically serves as a temporary data storage medium for the operating system or other running programs. The three together implement the functions of memory 620.
As an implementation, the functions of the I/O interface 630 may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiving. Processor 610 may be considered to be implemented by a dedicated processing chip, a processing circuit, a processor, or a general-purpose chip.
As another implementation, the apparatus provided in the embodiments of the present application may be implemented using a general-purpose computer: program code implementing the functions of the processor 610 and the I/O interface 630 is stored in the memory 620, and a general-purpose processor implements the functions of the processor 610 and the I/O interface 630 by executing the code in the memory 620.
For the concepts, explanations, detailed descriptions, and other steps relating to the apparatus in the technical solutions provided in the embodiments of the present application, refer to the foregoing methods or to the descriptions of the method steps performed by the apparatus in other embodiments, which are not repeated here.
As another implementation of this embodiment, a computer-readable storage medium is provided, on which instructions are stored, which when executed perform the method in the method embodiment described above.
As another implementation of this embodiment, a computer program product is provided that contains instructions that, when executed, perform the method of the method embodiment described above.
Those skilled in the art will appreciate that, for ease of illustration, only one memory and one processor are shown in fig. 6. In an actual terminal or server, there may be multiple processors and memories. The memory may also be referred to as a storage medium or a storage device, etc., and the embodiments of the present application are not limited in this regard.
It should be appreciated that in the embodiments of the present application, the processor may be a central processing unit (Central Processing Unit, CPU for short), another general-purpose processor, a digital signal processor (Digital Signal Processing, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
It should also be understood that the memory referred to in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM for short), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
Note that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, the memory (storage module) is integrated into the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In addition to the data bus, the bus may include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are all labeled as the bus in the figures.
It should also be understood that "first", "second", "third", "fourth", and the various numerical numbers referred to herein are merely for descriptive convenience and are not intended to limit the scope of the present application.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
In various embodiments of the present application, the sequence number of each process does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks (illustrative logical block, abbreviated ILBs) and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid state disk), etc.
Embodiments of the present application also provide a computer storage medium storing a computer program that is executed by a processor to implement some or all of the steps of any of the methods for interactive drawing scenario-based multimodal information processing described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of a method of multimodal information processing based on an interactive drawing scenario as any one of the method embodiments described above.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for processing multi-modal information based on an interactive drawing scenario, which is applied to a server of a system for processing multi-modal information based on the interactive drawing scenario, wherein the system for processing multi-modal information based on the interactive drawing scenario comprises a user terminal and the server, and is characterized in that the method comprises the following steps:
Receiving a first selection instruction sent by a user through the user terminal, wherein the first selection instruction is used for selecting an interactive story;
determining an interactive scenario according to the first selection instruction, wherein the interactive scenario comprises at least one scenario branch, and each scenario branch comprises at least one teaching node;
when the interaction progress is positioned at the teaching node, sending a guiding interaction statement to the user terminal, wherein the guiding interaction statement is used for expressing the interaction requirement corresponding to the current teaching node;
receiving first drawing image data sent by the user through the user terminal;
judging whether the first drawing image data accords with the interaction requirement corresponding to the guiding interaction statement or not;
if not, sending an inquiry interaction statement to the user terminal, wherein the inquiry interaction statement is used for guiding the user to describe the creation thought of the first drawing image data and the meaning of the drawing image data;
receiving first voice information sent by the user through the user terminal, and determining at least one target modification content according to the first voice information, wherein the first voice information comprises an creation idea of the first drawing image data and a drawing image data meaning;
Generating at least one teaching interactive statement according to the at least one target modification content, and sending the at least one teaching interactive statement to the user terminal, wherein the teaching interactive statement is used for guiding the user to modify the first drawing image data so that the first drawing image data accords with the interaction requirement corresponding to the guiding interactive statement.
2. The method according to claim 1, wherein said receiving first voice information sent by said user via said user terminal and determining at least one target modification content based on said first voice information comprises the steps of:
determining at least one first keyword according to the first voice information, wherein the first keyword is related to an interaction requirement corresponding to the inquiry interaction statement;
determining at least one alternative modification content in the first drawing image data according to the first keyword;
if the single first keyword corresponds to a plurality of alternative modification contents, determining the alternative modification content with the highest first modification priority as the target modification content, wherein the first modification priority is related to the association degree of the alternative modification content and the interaction requirement corresponding to the inquiry interaction statement;
And if the single first keyword only corresponds to the single alternative modified content, determining the single alternative modified content as the target modified content.
3. The method according to claim 2, wherein the generating a tutorial interaction sentence according to the at least one target modification content and transmitting the tutorial interaction sentence to the user terminal comprises the steps of:
determining a second modification priority of the at least one target modification content according to the modification difficulty of the at least one target modification content;
arranging the at least one target modification content according to the order of the second modification priority from high to low, and generating a modification list;
and generating a corresponding teaching interaction sentence according to the modification list, and sending the teaching interaction sentence to the user terminal.
4. A method according to claim 3, wherein the generating a corresponding teaching interaction sentence according to the modification list and transmitting the teaching interaction sentence to the user terminal comprises the steps of:
sending a first teaching interaction statement corresponding to the highest-ordered target modification content in the modification list to the user terminal;
Receiving second drawing image data sent by the user through the user terminal;
judging whether the second drawing image data is modified corresponding to the teaching interactive statement or not by comparing the first drawing image data with the second drawing image data;
if not, generating a second teaching interaction sentence according to the second drawing image data and the first teaching interaction sentence, and sending the second teaching interaction sentence to the user terminal;
if yes, deleting the target modification content corresponding to the first teaching interaction statement from the modification list to obtain an updated modification list;
transmitting a third teaching interaction statement corresponding to the highest-ordered target modification content in the updated modification list to the user terminal;
and when the updated modification list does not have target modification content, generating corresponding interactive sentences according to the interactive progress, wherein the interactive sentences comprise guiding interactive sentences or plot interactive sentences, and the plot interactive sentences are used for describing the story line of the interactive story.
5. The method according to claim 1 or 4, further comprising, after said determining whether the first drawing image data meets the interaction requirement corresponding to the guiding interaction sentence, the steps of:
If yes, determining at least one preset feature point according to the interaction requirement corresponding to the guiding interaction statement;
extracting at least one alternative interaction characteristic point corresponding to the at least one preset characteristic point in the first drawing image data;
calculating the characteristic deviation value of the alternative interaction characteristic point and a preset characteristic point corresponding to the alternative interaction characteristic point through a characteristic comparison model;
taking the alternative interaction characteristic points with the characteristic deviation value larger than a preset value as target interaction characteristic points;
and generating comment interactive sentences according to the target interactive feature points, wherein the comment interactive sentences are used for confirming the creation ideas and/or drawing image data meanings of the target interactive feature points with the user.
6. The method of claim 5, further comprising, after said determining an interactive scenario according to said first selection instruction, the steps of:
and when the interaction progress is not positioned at the teaching node, sending a corresponding scenario interaction statement to the user terminal.
7. The method of claim 6, further comprising, after said determining an interactive scenario according to said first selection instruction, the steps of:
Receiving a second selection instruction sent by the user through the user terminal, wherein the second selection instruction is used for selecting the scenario branches;
and according to the second selection instruction, the interaction progress is adjusted to the corresponding scenario node.
8. An apparatus for multi-modal information processing based on interactive drawing scenarios, the apparatus comprising: the device comprises a communication module, a calculation module and a judgment module;
the communication module is used for receiving a first selection instruction sent by a user through the user terminal, wherein the first selection instruction is used for selecting an interactive story;
the computing module is used for determining an interactive scenario according to the first selection instruction, wherein the interactive scenario comprises at least one scenario branch, and each scenario branch comprises at least one teaching node;
the communication module is further used for sending a guiding interaction statement to the user terminal when the interaction progress is located at the teaching node, wherein the guiding interaction statement is used for expressing the interaction requirement corresponding to the current teaching node;
the communication module is further used for receiving first drawing image data sent by the user through the user terminal;
the judging module is used for judging whether the first drawing image data accords with the interaction requirement corresponding to the guiding interaction statement;
The communication module is further configured to send an inquiry interaction statement to the user terminal when the first drawing image data does not meet the interaction requirement corresponding to the guiding interaction statement, where the inquiry interaction statement is used to guide the user to describe an authoring idea of the first drawing image data and a meaning of the drawing image data;
the communication module is further configured to receive first voice information sent by the user through the user terminal, where the first voice information includes an creation idea of the first drawing image data and a meaning of the drawing image data;
the computing module is further used for determining at least one target modification content according to the first voice information;
the computing module is further configured to generate at least one teaching interaction sentence according to the at least one target modification content, where the teaching interaction sentence is used to guide the user to modify the first drawing image data, so that the first drawing image data meets an interaction requirement corresponding to the guiding interaction sentence;
the communication module is further configured to send the at least one teaching interaction statement to the user terminal.
9. An apparatus for multi-modal information processing based on interactive drawing scenario, comprising:
A processor, a memory and an I/O interface, the processor, the memory and the I/O interface being communicatively connected, wherein the memory is to store a set of program code, the processor to invoke the program code stored in the memory to perform the method of any of claims 1-7.
10. A computer-readable storage medium, comprising:
the computer readable storage medium having instructions stored therein which, when run on a computer, implement the method of any of claims 1-7.
CN202310316712.3A 2023-03-29 2023-03-29 Multi-mode information processing method, device and medium based on interactive drawing scenario Active CN116052492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310316712.3A CN116052492B (en) 2023-03-29 2023-03-29 Multi-mode information processing method, device and medium based on interactive drawing scenario

Publications (2)

Publication Number Publication Date
CN116052492A (en) 2023-05-02
CN116052492B (en) 2023-06-23

Family

ID=86125855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310316712.3A Active CN116052492B (en) 2023-03-29 2023-03-29 Multi-mode information processing method, device and medium based on interactive drawing scenario

Country Status (1)

Country Link
CN (1) CN116052492B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290534A (en) * 2023-09-28 2023-12-26 中移互联网有限公司 Method and device for generating story album and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003076261A (en) * 2001-09-05 2003-03-14 Vegetable House:Kk Drawing class utilizing the internet
CN106357774A (en) * 2016-09-22 2017-01-25 深圳市金立通信设备有限公司 Information pushing method and network device
CN106355973A (en) * 2016-10-28 2017-01-25 厦门优莱柏网络科技有限公司 Method and device for guiding drawing
CN107037946A (en) * 2015-12-09 2017-08-11 梦工厂动画公司 There is provided to draw and instruct to guide the user's digital interface of user
CN109725792A (en) * 2019-02-25 2019-05-07 广东智媒云图科技股份有限公司 A kind of drawing method based on question and answer interaction
CN111105800A (en) * 2019-12-26 2020-05-05 百度在线网络技术(北京)有限公司 Voice interaction processing method, device, equipment and medium
CN112270629A (en) * 2020-10-23 2021-01-26 新维畅想数字科技(北京)有限公司 Cultural relic digital teaching method and device
CN114783242A (en) * 2022-02-28 2022-07-22 杭州小伴熊科技有限公司 Drawing teaching method and device for online education
CN114822113A (en) * 2022-02-25 2022-07-29 杭州小伴熊科技有限公司 Drawing teaching system for online education
CN115330498A (en) * 2022-10-14 2022-11-11 深圳市人马互动科技有限公司 Data processing method and related device in man-machine interaction shopping service system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060084039A1 (en) * 2004-10-19 2006-04-20 Massachusetts Institute Of Technology Drawing tool for capturing and rendering colors, surface images and movement
US20210150925A1 (en) * 2019-11-20 2021-05-20 Bright Effect, LLC Methods and devices for effective learning and communication techniques

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a Personalized Painting Education Model Based on Artificial Intelligence; Ji Yi et al.; Packaging Engineering; 380-386 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant