CN112100391B - User intention recognition method, device, service end, client and terminal equipment - Google Patents


Info

Publication number
CN112100391B
CN112100391B (application number CN201910472461.1A)
Authority
CN
China
Prior art keywords
user
corpus
data
information
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910472461.1A
Other languages
Chinese (zh)
Other versions
CN112100391A (en)
Inventor
阙育飞
杜朋
胡晓祥
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910472461.1A
Publication of CN112100391A
Application granted
Publication of CN112100391B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

The embodiment of the application provides a user intention recognition method, a server, a client, a terminal device, an electronic device and a storage medium. The method comprises the following steps: the client sends the user corpus data and at least part of the target content data displayed on the screen to the server; the server determines user intention information according to the content data and the user corpus data; the server sends the user intention information to the client; and the client performs a business operation on the execution object information according to the user intention information. In this method, the server can use the target content data to determine the execution object information corresponding to the user corpus data and thereby correct deviations introduced when the user corpus data is interpreted during ASR, so that the finally determined user intention is associated with the target content data currently displayed on the screen and the accuracy of determining the user intention is improved.

Description

User intention recognition method, device, service end, client and terminal equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a user intention recognition method, a server, a client, a terminal device, an electronic device, and a storage medium.
Background
In the internet market, devices that implement man-machine interaction through voice are very popular, and some current mobile terminals can further provide voice interaction with the user on top of the content they display.
At present, a device that implements man-machine interaction through voice can generally receive audio information from the user and apply automatic speech recognition (ASR) and natural language understanding (NLU) technology to understand the user intention behind the corpus the user speaks, so that the device can execute the corresponding instruction according to that intention. Specifically, ASR is a technology that converts human speech into text; the goal of NLU, also known as semantic decoding, is to convert text into a semantic representation: the exact wording of the text matters less than the semantic information it conveys. The NLU system understands and acquires the user intention corresponding to the text information provided by the ASR system. Because of the diversity and complexity of speech signals, an ASR system may only achieve satisfactory performance under certain constraints or in certain specific situations. For example, "shoe" and "write" are homophones in the original Chinese, so when the user speaks one of them, the ASR system typically returns both candidate results, and the NLU system can only infer the intent from whichever results the ASR system provides.
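As a minimal, purely illustrative sketch (the utterance store and the function names are assumptions, not part of the patent), an ASR front end that returns an n-best list leaves a context-free NLU stage with no way to break the tie:

```python
# Hypothetical sketch: an ASR system may emit several text hypotheses
# for one utterance (the "shoe"/"write" homophones above), and an NLU
# stage with no extra context can only trust the list order.

def asr_hypotheses(audio_id):
    """Stand-in for an ASR system returning an n-best list."""
    fake_results = {
        "utt-001": ["i want shoes", "i want to write"],
    }
    return fake_results.get(audio_id, [])

def nlu_pick_first(hypotheses):
    """A context-free NLU simply keeps the top ASR hypothesis,
    which is where the recognition bias described above creeps in."""
    return hypotheses[0] if hypotheses else ""

print(nlu_pick_first(asr_hypotheses("utt-001")))  # i want shoes
```

If the ASR system happens to rank "i want to write" first, the context-free stage propagates the error; the rest of the document describes how the on-screen content data corrects this.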
However, because the NLU system determines the user intention only from the text information provided by the ASR system, deviations occur while understanding that text, and the probability of a deviation is amplified whenever ASR makes a recognition error, so the accuracy of the resulting user intention is not high. For example, if the user says "I want shoes" on the device's online shopping interface, the ASR and NLU systems may, with some probability, produce the erroneous intention "I want to write".
Disclosure of Invention
The embodiment of the application provides a user intention recognition method, so that a server can use content data to correct deviations introduced when the user corpus data is interpreted during ASR, associating the finally determined user intention with the content data currently displayed on the screen and thereby greatly improving the accuracy of determining the user intention.
Correspondingly, the embodiment of the application also provides a server, a client, a terminal device, an electronic device and a storage medium, which are used for guaranteeing the implementation and application of the method.
In order to solve the above problems, an embodiment of the present application discloses a user intention recognition method, which includes:
The client sends the corpus data of the user and at least part of content data displayed in the screen to the server;
the server side obtains the user corpus data and the content data sent by the client side;
the server determines user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data;
the server side sends the user intention information to the client side;
the client receives the user intention information;
and the client executes business operation aiming at the execution object information according to the user intention information.
The embodiment of the application discloses a user intention recognition method which is applied to a server, wherein the method comprises the following steps:
acquiring user corpus data and content data sent by the client, wherein the content data is at least part of data displayed in a screen of the client;
determining user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data;
And sending the user intention information to the client.
The embodiment of the application discloses a user intention recognition method which is applied to a client comprising a screen, wherein the method comprises the following steps:
transmitting the corpus data of the user and at least part of content data displayed in the screen to a server;
receiving user intention information sent by the server; the user intention information includes execution object information corresponding to the content data;
and executing business operation aiming at the execution object information according to the user intention information.
The embodiment of the application discloses a user intention recognition method which is applied to terminal equipment comprising a screen, wherein the method comprises the following steps:
acquiring user corpus data and at least part of content data displayed in the screen;
determining user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data;
and executing business operation aiming at the execution object information according to the user intention information.
The embodiment of the application also discloses a user intention recognition device, which comprises:
The first sending module is used for sending the corpus data of the user and at least part of content data displayed in the screen to the server through the client;
the first receiving module is used for acquiring the user corpus data and the content data sent by the client through the server;
the first determining module is used for determining user intention information according to the content data and the user corpus data through the server, wherein the user intention information comprises execution object information determined according to the content data;
the second sending module is used for sending the user intention information to the client through the server;
the second receiving module is used for receiving the user intention information through the client;
and the first execution module is used for executing the business operation aiming at the execution object information through the client according to the user intention information.
The embodiment of the application also discloses a server, which comprises:
the third receiving module is used for acquiring user corpus data and content data sent by the client, wherein the content data is at least part of data displayed in a screen of the client;
The second determining module is used for determining user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data;
and the third sending module is used for sending the user intention information to the client.
The embodiment of the application also discloses a client, which comprises:
a fourth sending module, configured to send the corpus data of the user and at least part of the content data displayed in the screen to a server;
the fourth receiving module is used for receiving the user intention information sent by the server; the user intention information includes execution object information corresponding to the content data;
and the second execution module is used for executing business operation aiming at the execution object information according to the user intention information.
The embodiment of the application also discloses a terminal device, which comprises:
a fifth receiving module, configured to obtain corpus data of a user and at least part of content data displayed in the screen;
a third determining module, configured to determine user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data;
And the third execution module is used for executing business operation aiming at the execution object information according to the user intention information.
The embodiment of the application also discloses electronic equipment, which comprises: a processor; and a memory having executable code stored thereon that, when executed, causes the processor to perform the user intent recognition method as described in one or more of the embodiments of the present application.
One or more machine-readable media having stored thereon executable code that, when executed, causes a processor to perform a user intent recognition method as described in one or more of the embodiments of the present application are also disclosed.
Compared with the prior art, the embodiment of the application has the following advantages:
in an embodiment of the present application, the method includes: the client sends the user corpus data and at least part of the content data displayed on the client screen to the server; the server determines the user intention information corresponding to the user corpus data according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; the server sends the user intention information to the client; and the client performs a business operation on the execution object information according to the user intention information. In this embodiment, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, deviations introduced when the user corpus text was produced during ASR are corrected, the finally determined user intention is related to the content data currently displayed on the screen, and the accuracy of determining the user intention is greatly improved.
Drawings
FIG. 1 is a schematic diagram of a user intent recognition system of the present application;
FIG. 2 is an interactive schematic diagram of a user intent recognition system of the present application;
FIG. 3 is a flowchart of the steps of a method for identifying user intention at the system side of the present application;
FIG. 4 is a flowchart illustrating steps of a method for identifying user intention at a server side according to the present application;
FIG. 5 is a flowchart of the steps of a method for identifying user intention at a client side of the present application;
FIG. 6 is a flowchart of steps of a method for recognizing user intention at a terminal device side of the present application;
FIG. 7 is a flowchart of the interactive steps of a user intent recognition method of the present application;
FIG. 8 is a block diagram of a user intent recognition system of the present application;
fig. 9 is a flowchart of specific steps of a method for identifying user intention at a terminal device side of the present application;
FIG. 10 is a block diagram of a user intent recognition device of the present application;
FIG. 11 is a block diagram of a server embodiment of the present application;
FIG. 12 is a block diagram of a client embodiment of the present application;
fig. 13 is a block diagram of an embodiment of a terminal device of the present application;
fig. 14 is a schematic structural view of the device provided herein.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
Referring to fig. 1, a schematic diagram of a user intent recognition system according to an embodiment of the present application is shown. The user intention recognition method provided by the embodiment of the application can be applied to the user intention recognition system.
In an embodiment of the present application, a user intention recognition system may include a server 10 and a client 20. The server 10 may be a server proper, or an Internet of Things (IoT) device with relatively high computing power, a router, a mobile phone, or the like. The client 20 may have a screen that displays a screen interface containing one or more interface elements, and may include a speech receiving device, such as a microphone, for receiving user voice information from the user 30; it may use ASR technology to convert the user voice information into user corpus data.
In addition, the content data displayed in the screen interface of the client 20 can be obtained in multiple ways; in one implementation, the content data in the screen interface can be obtained by calling the interface document of the screen interface and analysing that document.
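A minimal sketch of that analysis step, assuming (purely for illustration) that the interface document is a JSON-like view tree with `text` and `children` fields rather than the patent's actual interface format:

```python
# Hypothetical sketch of "analysing the interface document": walk a
# JSON-like view tree of the current screen and collect the visible
# text labels as the content data.

def extract_content_data(node):
    labels = []
    if node.get("text"):
        labels.append(node["text"])
    for child in node.get("children", []):
        labels.extend(extract_content_data(child))
    return labels

screen = {
    "type": "page",
    "children": [
        {"type": "label", "text": "shoes"},
        {"type": "label", "text": "caps"},
        {"type": "button", "text": "buy"},
    ],
}
print(extract_content_data(screen))  # ['shoes', 'caps', 'buy']
```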
The client 20 may add the user corpus data and the content data to the intent recognition request and send the intent recognition request to the server 10 for the server 10 to recognize the real intent of the user.
Further, a search engine can be set up in the server 10. According to the correspondence between the interface elements in the screen interface and the intent templates, an index directory containing that correspondence can be built in the search engine, so that the user corpus data in the intent recognition request can be matched against the intent templates and the relevant content of the user intention information determined from the matching result.
Specifically, the user intention information may reflect the real intention of the user and can be further converted by the client into a corresponding control instruction. At minimum, a control instruction requires two parameters: an execution action and execution object information. In this embodiment of the present application, the user intention information may include three parameters: an interface element, action information and execution object information. The execution action relates to the interface element and to the action information understood from the user corpus data; for example, when the user's speech contains "I want to buy" and the interface elements include a commodity purchase button, it can be determined from these two pieces of information that the action executed by the control instruction is a purchase operation in the online shopping application. The execution object, in turn, relates to the user corpus data and the content data. For example, suppose the user says "I want to listen to the third song" to a music interface. From the determination of the executed action, the interface element can be identified as a music element and the action information as listening to a song, but at this point the client does not yet know which song the user wants to hear, so the content data can be used to further error-correct the user corpus data and determine the final execution object information, i.e. the execution object.
In this embodiment of the present application, the server 10 can determine the corresponding interface element, action information and execution object information by matching the user corpus data against the intent template and error-correcting it against the content data, i.e. determine the action and object parameters required by a control instruction; after obtaining these two parameters, the client 20 can execute the corresponding service operation.
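The three-field intent and its reduction to a two-parameter control instruction can be sketched as follows (the field and function names are illustrative assumptions, not the patent's data model):

```python
# Sketch of the intent structure described above: interface element,
# action information and execution object, folded into the minimal
# control instruction (action + target) that the client executes.

from dataclasses import dataclass

@dataclass
class UserIntent:
    interface_element: str   # e.g. the music list shown on screen
    action: str              # e.g. "play", understood from the corpus
    execution_object: str    # e.g. the title of the third song

def to_control_instruction(intent):
    # The client only needs the action and its object to act.
    return {"action": intent.action, "target": intent.execution_object}

intent = UserIntent("music_list", "play", "third song title")
print(to_control_instruction(intent))
# {'action': 'play', 'target': 'third song title'}
```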
Specifically, in the embodiment of the present application, referring to fig. 2, an interaction schematic diagram of a user intention recognition system in the embodiment of the present application is shown. The server may execute step S1, where the server obtains the user corpus data and the content data sent by the client.
In this embodiment of the present application, the server can obtain the user corpus data sent by the client through either of two implementation processes:
Implementation 1: when the client 20 performs voice interaction with the user 30, it can obtain the user voice information uttered by the user 30; the client 20 can then use its integrated ASR module to convert the user voice information into one or more pieces of user corpus data and send the converted user corpus data to the server 10.
Implementation 2: when the client 20 performs voice interaction with the user 30, it can obtain the user voice information uttered by the user 30 and send that voice information to the server 10; the server 10 can then use its own integrated ASR module to convert the user voice information into one or more pieces of user corpus data.
In addition, the client 20 can also extract the content data currently displayed on the screen and send the user corpus data and the content data to the server 10.
The server may execute step S2: determining user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data.
The server may execute step S3: sending the user intention information to the client.
In this embodiment of the present application, when the semantic intent of the user corpus data is understood, deviations generally occur because of the limitations of ASR technology. The server can correct these deviations using the content data.
For example, a user says "I want shoes" on the client's online shopping interface. Suppose the content data of the current online shopping interface includes text labels such as "shoes" and "caps". After processing by the ASR system on the client side, two candidate pieces of user corpus data are generated, "I want shoes" and "I want to write", and both candidates are sent to the server together with the content data. The server finds that the content data contains "shoes", which appears in the first candidate, and therefore determines that the final user intention is "I want shoes" rather than "I want to write", the result produced by the ASR deviation.
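A minimal sketch of this correction step: among several ASR candidates, keep the one whose words also appear in the on-screen content data. The scoring rule here is an illustrative assumption, not the patent's actual algorithm:

```python
# Sketch of error correction against screen content: score each ASR
# hypothesis by how many on-screen text labels it contains and keep
# the best-scoring one.

def pick_by_screen_overlap(hypotheses, content_labels):
    def score(text):
        return sum(1 for label in content_labels if label in text)
    return max(hypotheses, key=score)

hyps = ["i want shoes", "i want to write"]
screen_labels = ["shoes", "caps"]
print(pick_by_screen_overlap(hyps, screen_labels))  # i want shoes
```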
It should be noted that, to further improve the accuracy of the user intention, error correction can be performed on the user corpus data through an intent template (referring to fig. 1). In the example above, the "shoes" obtained after correction against the content data is the execution object information, i.e. it represents the object acted upon. Further, an intent template corresponding to the online shopping interface element can be established in the server. The intent template comprises at least one piece of template corpus data, and the template corpus data comprises a fixed corpus reflecting the action information and a dynamic corpus reflecting the execution object information. In the scene of this example, assume the template corpus data is "I want to buy $product": the fixed corpus "I want to buy" can be matched with "I want" in the user corpus data, so "I want" is determined to be the action information, and combined with the online shopping interface element this yields the preliminary user intention "I want to buy $product". At this point it is known that the user wants to buy something, but not which product; therefore the entity "shoes", obtained after error correction against the content data, replaces the dynamic label "$product", and the accurate user intention "I want to buy shoes" is obtained.
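The fixed-corpus match plus dynamic-slot fill described above can be sketched roughly like this (the `$product` template string and the matching rules are simplified assumptions):

```python
# Sketch of intent-template error correction: the fixed corpus
# ("i want") anchors the action, and the dynamic slot ($product)
# is filled with whichever on-screen label occurs in the utterance.

TEMPLATE = "i want to buy $product"

def fill_template(user_text, screen_labels):
    if "i want" not in user_text:        # fixed-corpus match fails
        return None
    for label in screen_labels:          # error-correct the dynamic part
        if label in user_text:
            return TEMPLATE.replace("$product", label)
    return None                          # slot could not be grounded

print(fill_template("i want shoes", ["shoes", "caps"]))
# i want to buy shoes
```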
In addition, the server can calculate the similarity between "shoes" and each candidate brand; suppose it finds that the similarity between "shoes" and "adidas", a manufacturer of sports shoes, is the highest, it then determines that the final user intention is "I want adidas".
Specifically, assume that both "adidas" and "kodak" are commercial brands: "adidas" is a sports-shoe manufacturer brand and "kodak" a film manufacturer brand. Calculating the similarity between "shoes" and "adidas" or "kodak" may specifically mean calculating the probability that the commercial brands associated with the product "shoes" include the "adidas" brand or the "kodak" brand. The commercial brands associated with the product "shoes" may include sports-shoe manufacturer brands, leather-shoe manufacturer brands, women's-shoe manufacturer brands and the like, but will generally not include film manufacturer brands. The server can therefore search the internet or a database for the commercial brands associated with the product "shoes", using "shoes" as the keyword, and determine the similarity between "shoes" and the "adidas" and "kodak" brands according to how often those brands occur among the brands associated with "shoes".
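A toy sketch of that frequency-based similarity, with a hard-coded stand-in for the internet/database lookup (the list contents are invented purely for illustration):

```python
# Sketch of brand similarity as occurrence frequency: count how often
# a candidate brand appears among the brands looked up for "shoes".

# Assumed result of searching a database with the keyword "shoes".
BRANDS_FOR_SHOES = ["adidas", "nike", "adidas", "clarks", "adidas"]

def brand_similarity(candidate, looked_up_brands):
    return looked_up_brands.count(candidate) / len(looked_up_brands)

print(brand_similarity("adidas", BRANDS_FOR_SHOES))  # 0.6
print(brand_similarity("kodak", BRANDS_FOR_SHOES))   # 0.0
```

Because "kodak" never appears among the brands associated with "shoes", its similarity is zero and "adidas" wins, matching the reasoning above.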
The client may execute step S4, where the client performs a business operation for the execution object information according to the user intention information.
In this step, the client obtains the user intention information by parsing the response. The user intention information may include the two parameters execution action and execution object information; together these two parameters form a control instruction, and the client can execute that control instruction to realise man-machine voice interaction.
Accordingly, an embodiment of the present application includes: the client sends the user corpus data and at least part of the content data displayed on the client screen to the server; the server determines user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; the server sends the user intention information to the client; and the client performs a business operation on the execution object information according to the user intention information. In this embodiment, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, deviations introduced when the user corpus text was produced during ASR are corrected, the finally determined user intention is related to the content data currently displayed on the screen, and the accuracy of determining the user intention is greatly improved.
Based on the voice interaction apparatus described above, the user intention recognition device may perform the following user intention recognition steps.
referring to fig. 3, a flowchart of steps of an embodiment of a system-side user intention recognition method of the present application is shown.
And step 101, the client transmits the corpus data of the user and at least part of content data displayed in the screen to the server.
In this embodiment of the present application, when the client triggers voice interaction, the user voice information collected in each interaction is converted into user corpus data by an ASR system.
It should be noted that, because of the diversity and complexity of speech signals, an ASR system may only achieve satisfactory performance under certain constraints or in certain specific situations. For example, when a user says "I want shoes", the homophones "shoe" and "write" typically lead the ASR system to give two results, "I want shoes" and "I want to write". There may therefore be one or more pieces of user corpus data, depending on the circumstances.
Step 102, the server acquires the user corpus data and the content data sent by the client.
Step 103, the server determines user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data.
In this embodiment of the present application, when the semantic intent of the user corpus data is understood, deviations generally occur because of the limitations of ASR technology. The server can correct these deviations using the content data.
It should be noted that determining the user intention information corresponding to the user corpus data may specifically mean determining whether the content data contains information that matches the user corpus data either exactly (regular matching) or approximately (irregular matching); the matched information can then be used as the execution object information.
For example, under the regular matching rule, assume a user says "I want shoes" on the client's online shopping interface, and assume the content data of the current online shopping interface includes text labels such as "shoes" and "caps". After processing by the ASR system on the client side, two candidate pieces of user corpus data, "I want shoes" and "I want to write", are generated and sent to the server together with the content data; the server finds that the content data includes the text label "shoes", which exactly matches "shoes" in the user corpus data. Under the regular matching rule the accuracy of intention determination is higher, but the scope of application is smaller, because the user must say content that exactly matches the content data in the current online shopping interface.
Under the irregular matching rule, assume the user says "I want leather shoes" on the client's online shopping interface, and assume the content data of the current online shopping interface includes similar text labels such as "shoes" and "caps". After processing by the ASR system on the client side, the user corpus data "I want leather shoes" is generated and sent to the server together with the content data; the server finds that the content data includes the text label "shoes", which approximately matches "leather shoes" in the user corpus data. Under the irregular matching rule the scope of intention determination is larger: the user may say content that is merely similar to the content data in the current online shopping interface.
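The two matching modes can be sketched as an exact-substring check versus a fuzzy-coverage check (the coverage score and the 0.8 cutoff are illustrative assumptions, not the patent's rules):

```python
# Sketch of regular (exact) vs irregular (approximate) matching of
# screen labels against the user corpus text.

import difflib

def exact_match(user_text, labels):
    # "regular" rule: the label must appear verbatim in the utterance
    for label in labels:
        if label in user_text:
            return label
    return None

def fuzzy_match(user_text, labels, cutoff=0.8):
    # "irregular" rule: keep the label best covered by the utterance
    def coverage(label):
        matcher = difflib.SequenceMatcher(None, label, user_text)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return matched / len(label)
    best = max(labels, key=coverage)
    return best if coverage(best) >= cutoff else None

labels = ["shoes", "caps"]
print(exact_match("i want shoes", labels))          # shoes
print(fuzzy_match("i want leather shoes", labels))  # shoes
```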
Step 104, the server sends the user intention information to the client.
Step 105, the client receives the user intention information.
And step 106, the client performs business operation aiming at the execution object information according to the user intention information.
In this step, the client receives the user intention information, which may include the two parameters execution action and execution object; together these two parameters form a control instruction, and the client can execute that control instruction to realise man-machine voice interaction.
In summary, the user intention recognition method provided by the present application includes: the client sends the user corpus data and at least part of the content data displayed on the client screen to the server; the server determines user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; the server sends the user intention information to the client; and the client performs a business operation on the execution object information according to the user intention information. In this method, the server can use the content data to determine the execution object information corresponding to the user corpus data and thereby correct deviations introduced when the user corpus data is interpreted during ASR, so that the finally determined user intention is associated with the content data currently displayed on the screen and the accuracy of determining the user intention is improved.
Referring to fig. 4, a flowchart illustrating steps of an embodiment of a method for identifying user intention at a server side of the present application is shown.
Step 201, obtaining user corpus data and content data sent by the client, wherein the content data is at least part of data displayed in a screen of the client.
And step 202, determining user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data.
This step may refer to step 103, and will not be described herein.
And step 203, sending the user intention information to the client.
In summary, the server-side user intention recognition method provided by the present application includes: acquiring user corpus data and content data sent by the client; determining user intention information from the content data and the user corpus data, where the user intention information includes execution object information determined from the content data; and sending the user intention information to the client. In the embodiment of the application, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, deviations introduced when the ASR process interprets the user corpus text are corrected, the finally determined user intention is associated with the content data currently displayed on the screen, and the accuracy of intention determination is greatly improved.
Referring to fig. 5, a flowchart of steps of an embodiment of a client-side user intention recognition method of the present application is shown.
And step 301, transmitting the corpus data of the user and at least part of the content data displayed in the screen to a server.
This step may refer to step 101, and will not be described herein.
Step 302, receiving user intention information sent by the server; the user intention information includes execution object information corresponding to the content data.
And step 303, executing a business operation on the execution object information according to the user intention information.
This step may refer to step 106, and will not be described herein.
In summary, the client-side user intention recognition method provided by the present application includes: sending the user corpus data and at least part of the content data displayed on the screen to the server; receiving user intention information sent by the server, where the user intention information includes execution object information corresponding to the content data; and executing a business operation on the execution object information according to the user intention information. In the embodiment of the application, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, deviations introduced when the ASR process interprets the user corpus text are corrected, the finally determined user intention is associated with the content data currently displayed on the screen, and the accuracy of intention determination is greatly improved.
Referring to fig. 6, a step flow diagram of an embodiment of a method for identifying user intention at a terminal device side of the present application is shown.
And step 401, obtaining user corpus data and at least part of content data displayed in the screen.
In the embodiment of the application, the user intention recognition method can also be performed by a terminal device alone. The terminal device may be a client of the voice-interaction-device type. During voice interaction with the user, the terminal device can acquire the user voice information uttered by the user and convert it into one or more pieces of user corpus data using an ASR module integrated in the terminal device.
And step 402, determining user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data.
In the embodiment of the present application, when the semantic intent of the user corpus data is interpreted, deviations generally occur due to the limitations of ASR technology. The terminal device can correct these deviations using the content data displayed on its own screen.
It should be noted that, to determine the user intention information corresponding to the user corpus data, the terminal device judges whether the content data includes the user corpus data, i.e., whether the content data contains information that matches the user corpus data by regular expression or by fuzzy matching; the matched information can then serve as the execution object information.
And step 403, executing a business operation on the execution object information according to the user intention information.
In this step, the terminal device parses the determined user intention information into two parameters: the execution action and the execution object information. A control instruction can be formed from these two parameters, and the terminal device can execute the control instruction to realize human-machine voice interaction.
In summary, the terminal-device-side user intention recognition method provided by the present application includes: acquiring user corpus data and at least part of the content data displayed on the screen; determining user intention information from the content data and the user corpus data, where the user intention information includes execution object information determined from the content data; and executing a business operation on the execution object information according to the user intention information. In the embodiment of the application, the terminal device matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, deviations introduced when the ASR process interprets the user corpus text are corrected, the finally determined user intention is associated with the content data currently displayed on the terminal device's screen, and the accuracy of intention determination is greatly improved.
Referring to fig. 7, a flowchart of the interactive steps of a user intent recognition method of the present application is shown.
In step 501, in the case of receiving user voice information, the client converts the user voice information into at least one user corpus data.
Alternatively, in another implementation, the client may acquire the user voice information uttered by the user during voice interaction and send it to the server, and the server may then use its own integrated ASR module to convert the user voice information into one or more pieces of user corpus data.
Step 502, the client obtains at least part of content data in the screen interface.
In this step, the content data displayed in the client's screen interface may be obtained in multiple ways. In one implementation, the content data in the screen interface is obtained by calling the interface document of the screen interface and parsing that document.
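Parsing an interface document into content data might look like the following (a hedged sketch assuming a JSON-shaped interface document; the application does not specify the actual document format):

```python
import json

def extract_labels(interface_doc: str) -> list[str]:
    """Collect the visible text labels from a (hypothetical) JSON
    description of the screen interface, walking nested child nodes."""
    def walk(node):
        if "text" in node:
            yield node["text"]
        for child in node.get("children", []):
            yield from walk(child)
    return list(walk(json.loads(interface_doc)))

doc = '{"children": [{"text": "Song 1: XX"}, {"text": "play"}, {"text": "pause"}]}'
print(extract_labels(doc))  # ['Song 1: XX', 'play', 'pause']
```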
In step 503, the client sends the corpus data of the user and at least part of the content data displayed in the screen to the server.
Step 504, the server acquires the user corpus data and the content data sent by the client.
Step 505, the server invokes a preset intention template according to the content data and the user corpus data to determine the user intention information.
In the embodiment of the present application, a search engine may be set up on the server. According to the correspondence between interface elements in the screen interface and intent templates, an index directory containing this correspondence can be built in the search engine. The user corpus data in an intent recognition request can then be matched against the intent templates, and the relevant content of the user intention information is determined from the matching result.
Specifically, the user intention information reflects the actual intention of the user and can be further converted by the client into a corresponding control instruction. At minimum, a control instruction requires two parameters: an execution action and execution object information. In this embodiment of the present application, the user intention information may include three parameters: an interface element, action information, and execution object information, where the execution action is derived from the interface element together with the action information understood from the user corpus data.
For example, when the speech spoken by the user includes "I want to buy" and the interface elements include a commodity purchase button, it can be determined from these two pieces of information that the action performed by the control instruction is a purchase operation in the online shopping application. The execution object, in turn, is related to both the user corpus data and the content data. For example, suppose the user says "I want to listen to the third song" to a music interface. From the determination of the execution action, the interface element can be identified as a music element and the action information as listening to a song; at this point, however, the client does not yet know which song the user wants to hear. The user corpus data can therefore be further error-corrected with the content data to determine the final execution object information, i.e., the execution object.
Therefore, by matching the user corpus data against the intent templates and error-correcting with the content data, the server can determine the corresponding interface element, action information, and execution object information, i.e., the two essential parameters, action and object, required by a control instruction. After obtaining these two parameters, the client can execute the corresponding business operation.
Optionally, in a specific implementation manner of the embodiment of the present invention, step 505 may specifically include:
in sub-step 5051, the server determines, according to the content data, a target interface element in the current interface of the client.
In this step, the server may determine the corresponding target interface element by identifying and analyzing the content data in the intent recognition request. For example, suppose the content data includes "Song 1: XX", "Song 2: XX", "play", "pause", and so on; the target interface element can then be determined to be an audio element.
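Determining the target interface element from the content data can be sketched as follows (the keyword table is a hypothetical stand-in for the server's actual identification and analysis):

```python
# Hypothetical keyword table: labels typical of each interface element.
ELEMENT_KEYWORDS = {
    "audio":    {"play", "pause", "song"},
    "shopping": {"buy", "cart", "price"},
}

def classify_element(content_labels: list[str]) -> str:
    """Pick the interface element whose keyword set overlaps most with
    the labels collected from the screen (a minimal scoring sketch)."""
    def score(element: str) -> int:
        keys = ELEMENT_KEYWORDS[element]
        return sum(1 for label in content_labels
                   if any(k in label.lower() for k in keys))
    return max(ELEMENT_KEYWORDS, key=score)

labels = ["Song 1: XX", "Song 2: XX", "play", "pause"]
print(classify_element(labels))  # audio
```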
In sub-step 5052, the server matches the user corpus data with the intention template corresponding to the target interface element, and determines action information and execution object information corresponding to the user corpus data.
After the target interface element is determined, the intent template corresponding to it can be determined. Because the intent template comprises multiple pieces of template corpus data, matching the user corpus data against the intent template yields the template corpus data that matches the user corpus data as the matching result, and the action information and execution object information corresponding to the user corpus data can be determined by analyzing this matching result.
It should be noted that the matching result obtained at this point consists of a fixed corpus reflecting the action information and a dynamic corpus standing in for the execution object information, and therefore reflects only the user's preliminary intention. To obtain more precise intention information, the dynamic corpus in this preliminary result must be filled with substantive content from the content data.
For example, the user says "I want to listen to the third song". By matching against the corpus "I want to listen to $song" in the audio element template, it can be preliminarily determined that the user intends to listen to a song. To further determine which song the third song actually is, the content data of the client's music interface can be used.
Optionally, in a specific implementation manner of the embodiment of the present invention, the intent template includes at least one piece of template corpus data, where the template corpus data includes a correspondence between a fixed corpus and a dynamic corpus; the fixed corpus reflects action information and the dynamic corpus reflects execution object information. The substep 5052 may specifically include:
in sub-step 50521, the server matches the user corpus data against the correspondence between the fixed corpus and the dynamic corpus in the at least one piece of template corpus data, and determines the action information and execution object information corresponding to the user corpus data.
In the embodiment of the application, the screen of the client usually comprises multiple interface elements, and during voice interaction the user generally directs the user voice information at one of the interface elements in the interface.
For example, referring to FIG. 8, a block diagram of a user intent recognition system of the present application is shown. Suppose the screen of the client includes four interface elements: an online shopping element, an audio element, a video element, and a chat element, through which the client realizes four corresponding functions. The server can establish a corresponding intent template for each of the four elements (only the intent template of the online shopping element is drawn in FIG. 8). Corresponding template corpus data is added to each template according to the business logic of the respective element and the action and object parameters required to compose a control instruction. The template corpus data may include a correspondence between a fixed corpus and a dynamic corpus, where the fixed corpus reflects action information and the dynamic corpus reflects execution object information.
For example, for the online shopping element, the user-facing template corpus data "I want to buy $brand" and "I want to buy $product" can be established, where "I want to buy" serves as the fixed corpus and statically reflects the action parameter, while $brand and $product serve as dynamic corpora and dynamically reflect the object parameter. In addition, the merchant-facing "order member to $manufacturer" can be established, where "order member to" serves as the fixed corpus reflecting the action parameter and $manufacturer serves as the dynamic corpus dynamically reflecting the object parameter.
For another example, for the video element, corpora corresponding to the video service can be established, such as "I want to watch $movie" and "I want to watch $television show".
Specifically, in the above examples, the format $XX expresses a dynamic tag: any word belonging to the same category as XX matches $XX. For example, "Adidas", "Nike", and "Cola" may belong to the dynamic corpus $brand, while "sports shoes" and "umbrella" may belong to the dynamic corpus $product.
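The dynamic-tag semantics can be sketched as a simple category-membership check (the membership table below is purely illustrative):

```python
# The $-tag denotes a dynamic corpus: any word in the tag's category
# matches it. This membership table is a hypothetical example.
DYNAMIC_TAGS = {
    "$brand":   {"adidas", "nike", "cola"},
    "$product": {"sports shoes", "umbrella"},
}

def matches_tag(word: str, tag: str) -> bool:
    """Return True if the word belongs to the category named by the tag."""
    return word.lower() in DYNAMIC_TAGS.get(tag, set())

print(matches_tag("Nike", "$brand"))      # True
print(matches_tag("umbrella", "$brand"))  # False
```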
Optionally, in a specific implementation manner of the embodiment of the present invention, the substep 50521 may specifically include:
and a sub-step A1, wherein the server side calculates the similarity between the user corpus data and the template corpus data in the intention template, and determines at least one template corpus data with the similarity with the user corpus data being greater than or equal to a preset threshold value as target template corpus data.
Optionally, in a specific implementation manner of the embodiment of the present invention, the substep A1 may specifically include:
and a sub-step A11, wherein the server performs word segmentation on the user corpus data to obtain a word segmentation set.
And a sub-step A12, wherein the server determines the target interface element corresponding to the first word segment, the segment in the word segment set whose position in the word order corresponds to the fixed corpus.
And a sub-step A13, wherein the server determines a target intention template corresponding to the target interface element.
And a sub-step A14, wherein the server calculates the similarity between the second word segment, the segment in the word segment set whose position in the word order corresponds to the dynamic corpus, and the template corpus data in the target intent template, and determines at least one piece of template corpus data whose similarity with the second word segment is greater than or equal to the preset threshold as target template corpus data.
In the embodiment of the present application, the similarity calculation between the user corpus data and the intent template is described below by way of an example, in combination with the contents of the above sub-steps A11 to A14.
For example, suppose there are both a "blue moon" brand of hand sanitizer and a "blue moon" manufacturer, and the user speaks the user corpus "I want to buy blue moon". Segmenting this user corpus yields a word segment set comprising "I want to buy" and "blue moon".
The interface element "online shopping" and the online shopping intent template corresponding to it can be determined from the first word segment "I want to buy" (which corresponds to the fixed corpus in the corpus template). Similarity is then calculated between the second word segment "blue moon" and the corpora in the online shopping intent template, and a ranked list of corpora can be generated by similarity, for example, first: "I want to buy $brand" (similarity 90); second: "I want to buy $product" (similarity 85); third: "order member to $manufacturer" (similarity 30). Assuming the preset similarity threshold is set to 80, the template corpus data whose similarity with the second word segment is greater than or equal to 80 can be determined as the target template corpus data, i.e., the first entry "I want to buy $brand" (similarity 90) and the second entry "I want to buy $product" (similarity 85). In this process, the corpus "order member to $manufacturer", which differs greatly from what the user said, is screened out.
That is, through the above sub-steps A11 to A14, it is determined that the user's utterance "I want to buy blue moon" corresponds to the two preliminary intentions "the user wants to buy a brand" and "the user wants to buy a product", but it is not yet known which specific product or specific brand the user wants to buy.
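The similarity ranking and threshold filtering of sub-steps A11 to A14 can be sketched as follows (difflib's SequenceMatcher, scored 0 to 1, is an assumed stand-in for whatever similarity measure and 0 to 100 scale the server actually uses):

```python
from difflib import SequenceMatcher

def rank_templates(user_corpus: str, templates: list[str],
                   threshold: float = 0.8):
    """Score each template corpus against the user corpus, sort by
    similarity, and keep those at or above the threshold, mirroring
    the ranked-list example above."""
    scored = [(t, SequenceMatcher(None, user_corpus, t).ratio())
              for t in templates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(t, round(s, 2)) for t, s in scored if s >= threshold]

templates = ["I want to buy $brand", "I want to buy $product",
             "order member to $manufacturer"]
# "order member to $manufacturer" falls below the threshold and is
# screened out, as in the example above.
print(rank_templates("I want to buy", templates, threshold=0.6))
```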
Optionally, in a specific implementation manner of the embodiment of the present invention, the dynamic corpus includes a dynamic tag and the corpus category corresponding to the dynamic tag; the substep A14 may specifically include:
and a sub-step A141, wherein the server determines the dynamic corpus in the template corpus data according to the dynamic label.
In this step, continuing the example provided in sub-steps A11 to A14, consider the template corpus data "I want to buy $brand": "I want to buy" is the fixed corpus and "$brand" is the dynamic corpus, in which "$" is the dynamic tag and "brand" is the corpus category. The combination of "$" and "brand" represents a brand classification category, which may contain multiple commercial brands.
Therefore, when the server calculates the similarity between the second word segment (the segment corresponding to the dynamic corpus in the word order) and the template corpus data in the target intent template, it can first use the dynamic tag "$" to find all the dynamic corpora in the target intent template.
And sub-step A142, the server calculates the similarity between the second word segment and the corpus category corresponding to each dynamic tag, and determines the template corpus data to which at least one corpus category whose similarity with the second word segment is greater than or equal to the preset threshold belongs as target template corpus data.
In this step, continuing the example provided in sub-steps A11 to A14, after all the dynamic corpora in the target intent template have been determined, the similarity between the second word segment and the corpus category of each dynamic corpus can be calculated, and the template corpus data to which at least one corpus category whose similarity with the second word segment is greater than or equal to the preset threshold belongs is determined as target template corpus data.
Specifically, "brand", "product", and "manufacturer" may each be a corpus category. Suppose that in the server's prior definitions of the second word segment "blue moon", it was defined first as a brand, second as a product, and last as a manufacturer. The similarity between the second word segment and each corpus category can then be determined from the time order in which the second word segment was previously defined as belonging to that corpus category. According to the above order, the similarity between "blue moon" and the "brand" corpus category is the greatest, the similarity with the "product" corpus category is second, and the similarity with the "manufacturer" corpus category is the smallest.
Optionally, in another specific implementation manner of the embodiment of the present invention, the substep A1 may specifically include:
and a sub-step A15, wherein the server establishes an index catalog according to the corresponding relation between the interface element and the intention template.
In sub-step a16, the server stores the index directory in a memory of a search engine.
And a sub-step A17, wherein the server determines a target interface element currently displayed on a screen of the client through the content data.
And a sub-step A18, wherein the server side sends the target interface element to the search engine, and obtains a target intention template corresponding to the target interface element through the index catalog inquiry.
And a sub-step A19, wherein the server side calculates the similarity between the user corpus data and the template corpus data in the target intention template, and determines at least one template corpus data with the highest similarity with the user corpus data as target template corpus data.
Referring to fig. 8, the embodiment of the present application further describes the contents of the above sub-steps A15 to A19 by way of an example. Specifically, suppose the screen of the client includes four interface elements: an online shopping element, an audio element, a video element, and a chat element, through which the client realizes four corresponding functions. The server can establish a corresponding intent template for each of the four elements (only the intent template of the online shopping element is drawn in fig. 8) and build an index directory according to the correspondence between the intent templates and the interface elements. A query term can be entered into the index directory and matched against the interface elements in it, so that the corresponding intent template is obtained by query.
For example, when the user says "I want to buy Nike", the client collects the content data in the current online shopping interface and uploads it to the server. Because the content data includes the online shopping interface element, the server can directly query the index directory of the search engine to obtain the intent template corresponding to the online shopping element and match "I want to buy Nike" against that template directly. There is no need to match "I want to buy Nike" against the intent templates corresponding to the audio, video, and chat elements, which reduces processing time and improves processing efficiency.
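The index-directory query can be sketched as follows (a minimal in-memory stand-in for the search engine; the class and element names are illustrative):

```python
class TemplateIndex:
    """A minimal in-memory stand-in for the search-engine index
    directory mapping interface elements to intent templates."""
    def __init__(self, correspondences):
        # correspondences: iterable of (interface_element, intent_template)
        self._index = dict(correspondences)

    def query(self, target_element):
        # Return only the template for the on-screen element, so the
        # server skips matching against the other elements' templates.
        return self._index.get(target_element)

index = TemplateIndex([
    ("online_shopping", ["I want to buy $brand", "I want to buy $product"]),
    ("audio", ["I want to listen to $song"]),
])
print(index.query("online_shopping"))
```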
And a sub-step A2, wherein the server matches the user corpus data against the target template corpus data according to a preset matching rule, determines the fixed corpus in the matched target template corpus data as the action information, and adds the dynamic corpus in the matched target template corpus data to the execution object information.
Optionally, in a specific implementation manner of the embodiment of the present invention, the substep A2 may be implemented as follows: the server matches the user corpus data against the target template corpus data according to a regular-expression matching rule, determines the fixed corpus in the matched target template corpus data as the action information, and adds the dynamic corpus in the matched target template corpus data, together with the second word segment, to the execution object information.
In the embodiment of the application, following the example provided in sub-steps A11 to A14, the obtained corpus 1 "I want to buy $brand" and corpus 2 "I want to buy $product" are each regularly matched against the whole sentence spoken by the user, "I want to buy blue moon". In this matching process, the user's "I want to buy" is regularly matched with the fixed corpora in corpus 1 and corpus 2, and "blue moon" is regularly matched with the dynamic corpora $brand and $product. If the business logic defines "blue moon" only in the sense of a brand, then only corpus 1 "I want to buy $brand" is finally matched. The word segment "blue moon" is then added to corpus 1, and the final user intention is reflected as the online shopping intent and interface element (action) together with "blue moon" as the execution object information (object).
That is, through the above regular matching process, together with the business-logic definition of "blue moon" as a brand, it is determined that the user's utterance "I want to buy blue moon" corresponds to the further intention "the user wants to buy a brand" rather than "the user wants to buy a product".
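The regular matching plus the business-logic constraint can be sketched as follows (the category dictionary is a hypothetical stand-in for the business logic that defines "blue moon" only as a brand):

```python
import re

# Hypothetical business-logic dictionary of category members.
CATEGORY_MEMBERS = {
    "brand":   {"blue moon", "nike"},
    "product": {"sneakers", "umbrella"},
}

def resolve_intent(utterance: str, candidates):
    """Regex-match the utterance against each candidate template and
    keep only templates whose dynamic-corpus category actually contains
    the captured word (the 'blue moon is a brand' business rule)."""
    results = []
    for template, category in candidates:
        # Turn the $category dynamic corpus into a capture group.
        pattern = re.escape(template).replace(
            re.escape("$" + category), "(?P<slot>.+)")
        m = re.fullmatch(pattern, utterance)
        if m and m.group("slot").lower() in CATEGORY_MEMBERS[category]:
            results.append({"action": "I want to buy",
                            "object": m.group("slot"),
                            "category": category})
    return results

candidates = [("I want to buy $brand", "brand"),
              ("I want to buy $product", "product")]
# Only the $brand template survives the business-logic filter.
print(resolve_intent("I want to buy Blue Moon", candidates))
```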
Sub-step 5053, the server determines the user intention information according to the content data, the action information and the execution object information.
In this step, after the action information and the execution object information are determined, they can be further error-corrected using the content data, and the error-corrected result is determined as the user intention information.
Specifically, error-correcting the action information and execution object information with the content data can be done by judging whether the content data includes the execution object information, i.e., whether the content data contains information that matches the execution object information by regular expression or by fuzzy matching; the matched information can then be added to the user intention information.
Optionally, in a specific implementation of an embodiment of the present invention, the substep 5053 may specifically include:
sub-step 50531, if the content data includes the execution object information, the server adds the action information, the target interface element, and the execution object information to the user intention information;
in this step, when it is determined that the preliminary intention of the user is the action information and the execution object information corresponding to the user corpus data, and the dynamic corpus and the second word are added to the execution object information, in order to further obtain the more accurate user intention, it is necessary to determine whether the dynamic corpus with a larger meaning range can be replaced by the second word with a more accurate meaning.
Specifically, this judgment requires determining whether the content data includes both the dynamic corpus and the second word segment; if so, the dynamic corpus may be replaced by the second word segment.
Optionally, in a specific implementation manner of an embodiment of the present invention, the substep 50531 may be implemented as follows: if the content data includes both the dynamic corpus and the second word segment, the server adds the action information, the interface element, and the execution object information to the user intention information.
For example, continuing the example provided in substep A2, it has been determined that the user's utterance "I want to buy blue moon" corresponds to the further intention "the user wants to buy a brand", with the $brand dynamic tag and the second word segment "blue moon". Suppose the content data provided by the client shows an online shopping interface that includes blue moon brand products, obsolete brand products, vertical white brand products, and so on. From the $brand dynamic tag it can be determined that the content data includes "brand", and through the second word segment "blue moon" a regular match to the blue moon brand in the content data succeeds. It can then be determined that the final user intention information includes the action parameter "I want to buy" (interface element + action information) and the object parameter, the brand "blue moon" (execution object information).
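The replacement judgment can be sketched as follows (a hedged sketch in which substring checks stand in for the regular and fuzzy matching described above; the function name is illustrative):

```python
def fill_execution_object(category: str, second_word: str,
                          content_data: list[str]):
    """Replace the broad dynamic corpus with the concrete second word
    segment when the on-screen content confirms both the category and
    the word; otherwise signal that similarity-based correction is
    needed instead."""
    has_category = any(category in label.lower() for label in content_data)
    has_word = any(second_word.lower() in label.lower()
                   for label in content_data)
    if has_category and has_word:
        return second_word   # exact on-screen execution object confirmed
    return None              # fall back to similarity-based correction

screen = ["blue moon brand laundry detergent",
          "vertical white brand detergent"]
print(fill_execution_object("brand", "blue moon", screen))  # blue moon
```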
Sub-step 50532, if the content data does not include the execution object information, the server determines the target content data as the target execution object information, and adds the action information, the target interface element, and the target execution object information to the user intention information;
wherein the target content data is the data in the content data whose similarity to the execution object information satisfies a preset condition.
In this step, once the user's preliminary intention has been determined as the action information and execution object information corresponding to the user corpus data, with the dynamic corpus and the second word segment added to the execution object information, a more accurate user intention can be obtained by judging whether the dynamic corpus, which covers a broad range of meanings, can be replaced by the second word segment, which has a more precise meaning.
Specifically, in the above judgment, if the content data does not include both the dynamic corpus and the second word segment, the content data with the highest similarity to the execution object information may be determined as the target execution object information, replacing the dynamic corpus with its broad range of meanings.
Alternatively, in a specific implementation of the embodiment of the present invention, sub-step 50532 may be implemented as follows: if the content data does not include the dynamic corpus, or does not include the second word segment, the server determines the target content data as the target execution object information, and adds the action information, the interface element, and the target execution object information to the user intention information.
In this step, continuing the example provided in sub-step A2 above: suppose it has been determined that the further intention corresponding to the user utterance "I want to buy blue moon" is the $brand dynamic tag in "user wants to buy $brand", together with the second word segment "blue moon" (a daily chemical brand), and suppose the content data provided by the client shows an online shopping interface including products of the resistance brand (a sports equipment brand), products of the vertical white brand (a daily chemical brand), and the like. According to the $brand dynamic tag, it can be determined that the content data includes a "brand"; however, the second word segment "blue moon" cannot be regularly matched to any corresponding content in the content data. In this case, the similarity between the $brand dynamic tag together with the second word segment "blue moon" and the content data can be calculated. Since the vertical white brand in the content data and the second word segment "blue moon" both belong to the daily chemical category, their similarity is the highest, so it can be determined that the final user intention information includes the action parameters "I want to buy" (interface element + action information) and the object parameter: brand "vertical white" (execution object information).
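The fallback described in this example can be sketched as below. The category table, the function name, and the brand strings are assumptions made for this illustration; a real implementation would compute the similarity between the second word segment and the content data (e.g., over corpus categories) rather than consult a hard-coded table.

```python
# Example category lookup standing in for the corpus-category classification.
CATEGORY = {
    "blue moon": "daily chemical",
    "vertical white": "daily chemical",
    "resistance": "sports equipment",
}

def fallback_object(second_word, content_brands):
    """Pick the on-screen brand most similar to the second word segment.

    An exact match corresponds to the regular-matching case; otherwise a brand
    in the same corpus category is treated as the highest-similarity candidate.
    """
    target = CATEGORY.get(second_word)
    for brand in content_brands:
        if brand == second_word:           # exact (regex) match succeeded
            return brand
        if CATEGORY.get(brand) == target:  # same corpus category
            return brand
    return None

print(fallback_object("blue moon", ["resistance", "vertical white"]))  # -> "vertical white"
```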
Optionally, in an implementation of an embodiment of the present application, the substep 50532 may specifically include:
Sub-step B1: perform word segmentation on the user corpus data to obtain a first array comprising phrases and word frequencies.
Sub-step B2: perform word segmentation on the content data to obtain a second array comprising phrases and word frequencies.
Sub-step B3: add the content data corresponding to the second array with the highest cosine similarity to the first array into the user intention information.
In this embodiment of the present application, in sub-steps B1 to B3, the content data with the highest similarity to the second word segment is determined as the target execution object information. Specifically, word segmentation may be performed on the user corpus data to obtain a first array comprising phrases and word frequencies, and on the content data to obtain a second array comprising phrases and word frequencies. Building the first array and the second array performs dimension-reduction processing on the user corpus data and the content data, removing redundant data so that the subsequent similarity calculation is more accurate.
Finally, the cosine similarity between the first array and the second array is calculated; a larger value indicates that the two arrays are more similar. The content data corresponding to the second array with the highest cosine similarity to the first array can therefore be added into the user intention information, replacing the dynamic corpus, with its larger meaning range, in the execution object information.
Optionally, in an implementation manner of the embodiment of the present application, step B3 may specifically include:
and a sub-step B31 of counting the first word groups and the corresponding first word frequencies which are simultaneously present in the first array and the second array.
And a sub-step B32 of adding the corresponding first word frequency of each first phrase in the first array and the second array to obtain a first parameter.
And a sub-step B33 of counting the second word groups and the corresponding second word frequencies appearing in the first array.
And a substep B34, adding the square values and square roots of the second word frequency to obtain a second parameter.
And a sub-step B35 of counting third word groups and corresponding third word frequencies appearing in the second array.
Substep B36, adding the square values and square roots of the third word frequency to obtain a third parameter.
And a substep B37, dividing the first parameter by the product of the second parameter and the third parameter to obtain the cosine similarity between the first array and the second array.
And a sub-step B38 of adding the content data corresponding to the second array with the highest cosine similarity of the first array into the user intention information.
In this embodiment of the present application, in sub-steps B31 to B38, the cosine similarity between the two arrays is determined from the phrases and word frequencies in the two arrays. Specifically, the first phrases appearing in both the first array and the second array may be counted together with their corresponding first word frequencies, the second phrases appearing in the first array may be counted together with their corresponding second word frequencies, and the third phrases appearing in the second array may be counted together with their corresponding third word frequencies.
For each first phrase, its first word frequency in the first array is multiplied by its first word frequency in the second array, and the products are summed to obtain the first parameter. The squares of the second word frequencies are summed and the square root taken to obtain the second parameter. The squares of the third word frequencies are summed and the square root taken to obtain the third parameter.
Finally, the first parameter is divided by the product of the second parameter and the third parameter to obtain the cosine similarity between the first array and the second array.
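Sub-steps B31 to B37 amount to the standard cosine-similarity formula over phrase/word-frequency arrays. A minimal sketch, assuming the arrays are represented as `Counter` mappings from phrase to frequency (whitespace splitting stands in for a real word segmenter, which the patent leaves unspecified):

```python
from collections import Counter
import math

def cosine_similarity(first_array, second_array):
    """Cosine similarity between two phrase->frequency arrays (sub-steps B31-B37)."""
    # B31/B32: for each phrase present in both arrays, multiply its frequencies
    # and sum the products to obtain the first parameter (the dot product).
    shared = set(first_array) & set(second_array)
    first_param = sum(first_array[p] * second_array[p] for p in shared)
    # B33/B34: square root of the summed squared frequencies of the first array.
    second_param = math.sqrt(sum(f * f for f in first_array.values()))
    # B35/B36: the same for the second array.
    third_param = math.sqrt(sum(f * f for f in second_array.values()))
    if second_param == 0 or third_param == 0:
        return 0.0
    # B37: first parameter divided by the product of the other two.
    return first_param / (second_param * third_param)

# B1/B2: word segmentation producing the phrase/word-frequency arrays.
first = Counter("i want to buy blue moon".split())
second = Counter("blue moon laundry detergent".split())
print(round(cosine_similarity(first, second), 3))  # -> 0.408
```

A larger value means the two arrays are more similar, so the content data whose second array scores highest against the first array is taken as the target execution object information.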
Step 506, the server sends the user intention information to the client.
In step 507, the client receives the user intention information.
Step 508: the client performs a business operation for the execution object information according to the user intention information.
This step may be specifically referred to the description of step 203 above, and will not be described herein.
In summary, the user intention recognition method provided by the present application includes: the client sends the user corpus data and at least part of the content data displayed in the client screen to the server; the server determines user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; the server sends the user intention information to the client; and the client performs a business operation for the execution object information according to the user intention information. In the embodiment of the present application, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, the deviation introduced when the user corpus text information is understood during the ASR process is corrected, the finally determined user intention is related to the content data currently displayed on the screen, and the accuracy of determining the user intention is greatly improved.
In addition, intention templates corresponding to the interface elements included in the client screen can be established, so that preliminary error correction is performed on the user corpus data through the intention templates. This yields a preliminary user intention and further reduces the semantic understanding deviation introduced by the client during ASR recognition of the user voice information. Moreover, some schemes need to store all the display content data of a screen in advance in order to determine the user intention, and such schemes are costly to maintain when the data amount is large. The present application uses a dynamic tag to dynamically express all the data belonging to the tag's classification, so only the intention template comprising the template corpus with the dynamic tag needs to be stored to reduce the semantic understanding deviation during ASR recognition, which reduces the amount of stored data and the maintenance cost.
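The storage saving from dynamic tags can be illustrated with a small sketch. The $brand tag syntax, template names, and category set below are assumptions for the example: one template with a dynamic tag stands in for what would otherwise be one stored template per brand.

```python
# One template with a $brand dynamic tag replaces one template per brand value.
TEMPLATES = {
    "purchase": {"fixed": "i want to buy", "dynamic_tag": "$brand"},
}
BRAND_CATEGORY = {"blue moon", "vertical white"}  # values the tag can stand for

def match_template(utterance):
    """Split an utterance into fixed corpus (action) and dynamic slot (object)."""
    for name, tpl in TEMPLATES.items():
        if utterance.startswith(tpl["fixed"]):
            slot = utterance[len(tpl["fixed"]):].strip()
            if slot in BRAND_CATEGORY:
                return {"template": name, "action": tpl["fixed"], "object": slot}
    return None

print(match_template("i want to buy blue moon"))
# -> {'template': 'purchase', 'action': 'i want to buy', 'object': 'blue moon'}
```

Only `TEMPLATES` must be stored ahead of time; the values the tag can take are resolved against the tag's corpus category, so adding a brand does not add a template.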
Referring to fig. 9, a flowchart of the interactive steps of another user intent recognition method of the present application is shown. The user intention recognition method is applied to a terminal device including a screen.
Step 601: acquire the user corpus data and at least part of the content data displayed in the screen.
This step may refer to step 401, and will not be described herein.
Step 602, calling a preset intention template according to the content data and the user corpus data, and determining the user intention information.
This step may refer specifically to step 505 described above, and will not be described herein.
Optionally, step 602 may specifically include:
sub-step 6021, determining a target interface element in the current interface of the client according to the content data.
This step may be specifically referred to step 5051 described above, and will not be described herein.
Sub-step 6022, matching the user corpus data with the intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data.
This step may be specifically referred to step 5052, and will not be described herein.
Optionally, the intent template includes at least one template corpus data, where the template corpus data includes a correspondence between a fixed corpus and a dynamic corpus, the fixed corpus reflects action information, and the dynamic corpus reflects execution object information. Sub-step 6022 may specifically include:
Sub-step C1: match the user corpus data against the correspondence between the fixed corpus and the dynamic corpus in the at least one template corpus data, and determine the action information and execution object information corresponding to the user corpus data.
This step may be specifically referred to step 50321 described above, and will not be described herein.
Optionally, the substep C1 may specifically include:
Sub-step C11: perform similarity calculation between the user corpus data and the template corpus data in the intention template, and determine at least one template corpus data whose similarity with the user corpus data is greater than or equal to a preset threshold as target template corpus data.
This step may refer to the above step A1, and will not be described herein.
Sub-step C12: match the user corpus data against the target template corpus data according to a preset matching rule, determine the fixed corpus in the matched target template corpus data as the action information, and add the dynamic corpus in the matched target template corpus data into the execution object information.
This step may refer to the above step A2, and will not be described herein.
Sub-step 6023, determining the user intention information based on the content data, and the action information and the execution object information.
This step may be specifically referred to step 5053 described above, and will not be described herein.
Optionally, the substep 6023 may specifically include:
Sub-step D1: if the content data includes the execution object information, add the action information, the target interface element, and the execution object information into the user intention information.
This step may be specifically referred to step 50531, which is not described herein.
Sub-step D2: if the content data does not include the execution object information, determine target content data as the target execution object information, and add the action information, the target interface element, and the target execution object information into the user intention information.
Here, the target content data is the data in the content data whose similarity with the execution object information satisfies a preset condition.
This step may refer to step 50532, and will not be described herein.
Step 603: execute a business operation for the execution object information according to the user intention information.
This step may refer to step 403, and will not be described herein.
In summary, the user intention recognition method provided by the present application includes: acquiring user corpus data and at least part of the content data displayed in the screen; determining user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; and executing a business operation for the execution object information according to the user intention information. In the embodiment of the present application, the terminal device matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, the deviation introduced when the user corpus text information is understood during the ASR process is corrected, the finally determined user intention is associated with the content data currently displayed on the screen of the terminal device, and the accuracy of determining the user intention is greatly improved.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments and that the acts referred to are not necessarily required by the embodiments of the present application.
On the basis of the above embodiment, the present embodiment further provides a server and a client, which are applied to electronic devices such as a server (cluster) and a terminal device.
Referring to fig. 10, a block diagram illustrating a user intention recognition apparatus according to an embodiment of the present application may specifically include the following modules:
a first sending module 701, configured to send, through a client, user corpus data and at least part of content data displayed in a screen to a server;
a first receiving module 702, configured to obtain, by using the server, the user corpus data and the content data sent by the client;
a first determining module 703, configured to determine, by using the server, user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data;
optionally, the first determining module 703 includes:
the first calling module is used for calling a preset intention template according to the content data and the user corpus data by the server side to determine the user intention information.
Optionally, the first calling module includes:
the first determining submodule is used for determining a target interface element in the current interface of the client according to the content data by the server;
The second determining submodule is used for matching the user corpus data with the intention templates corresponding to the target interface elements by the server side and determining action information and execution object information corresponding to the user corpus data;
optionally, the intent template includes at least one template corpus data, the template corpus data including a correspondence between a fixed corpus reflecting action information and a dynamic corpus reflecting execution object information, and the second determining submodule includes:
the first matching unit is used for matching the corresponding relation between the user corpus data and the fixed corpus and the dynamic corpus in the at least one template corpus data by the server side, and determining action information and execution object information corresponding to the user corpus data.
Optionally, the first matching unit includes:
the first matching subunit is used for performing similarity calculation on the user corpus data and template corpus data in the intention template by the server, and determining at least one template corpus data with similarity greater than or equal to a preset threshold value as target template corpus data;
Optionally, the first matching subunit is further configured to: the server performs word segmentation on the user corpus data to obtain a word segmentation set; the server determines a target interface element corresponding to a first word segment according to the first word segment corresponding to the word sequence of the fixed corpus in the word segment set; the server determines a target intention template corresponding to the target interface element; the server calculates the similarity between the second word segment corresponding to the word sequence of the dynamic corpus in the word segment set and the template corpus data in the target intention template, and determines at least one template corpus data with the similarity between the second word segment and the template corpus data greater than or equal to the preset threshold value as target template corpus data.
Optionally, the dynamic corpus includes a dynamic label and a corpus category corresponding to the dynamic label; the first matching subunit is further configured to: according to the dynamic labels, the server determines dynamic corpus in the template corpus data; the server calculates the similarity between the second word segment corresponding to the word sequence of the dynamic corpus in the word segment set and the corpus category corresponding to the dynamic tag, and determines template corpus data to which at least one corpus category with the similarity between the second word segment and the corpus category is greater than or equal to the preset threshold value belongs as target template corpus data.
The second matching subunit is configured to match the user corpus data with the target template corpus data according to a preset matching rule, determine a fixed corpus in the matched target template corpus data as the action information, and add a dynamic corpus in the matched target template corpus data into the execution object information.
Optionally, the second matching subunit is further configured to match the user corpus data with the execution object information according to a regular matching rule, determine the fixed corpus in the matched target template corpus data as the action information, and add the dynamic corpus in the matched target template corpus data and the second word segment into the execution object information.
Optionally, the first matching unit further includes:
the first establishing unit is used for establishing an index catalog by the server according to the corresponding relation between the interface element and the intention template;
and the first storage unit is used for storing the index catalog in a memory of the search engine by the server.
Optionally, the first matching subunit is further configured to determine, by using the content data, a target interface element currently displayed on a screen of the client; the server side sends the target interface element to the search engine, and obtains a target intention template corresponding to the target interface element through the index directory query; and the server calculates the similarity between the user corpus data and the template corpus data in the target intention template, and determines at least one template corpus data with the highest similarity with the user corpus data as target template corpus data.
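The index directory built by the establishing and storage units above can be sketched as a simple in-memory mapping from interface elements to their intent templates, so that template lookup does not scan every stored template. The element and template names below are assumptions for the example; a production deployment would hold this index in the search engine's memory (e.g., an inverted index) rather than a Python dict.

```python
# Index directory: interface element -> intent templates for that element.
INDEX_DIRECTORY = {
    "buy_button": ["i want to buy $brand", "purchase $brand"],
    "search_box": ["search for $keyword"],
}

def query_templates(target_interface_element):
    """Return the target intent templates for an on-screen interface element."""
    return INDEX_DIRECTORY.get(target_interface_element, [])

print(query_templates("buy_button"))  # -> ['i want to buy $brand', 'purchase $brand']
```

The server first resolves the target interface element from the content data, then queries this directory, and only then runs similarity calculation against the small set of returned templates.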
And the third determining submodule is used for determining the user intention information according to the content data, the action information and the execution object information by the server.
Optionally, the third determining sub-module includes:
a first adding unit, configured to add, if the content data includes the execution object information, the action information, the target interface element, and the execution object information to the user intention information by the server;
optionally, the first adding unit includes:
a first adding subunit, configured to add, if the content data includes the dynamic corpus and the second word segment, the action information, the interface element, and the execution object information to the user intention information by the server;
A second adding unit, configured to determine, if the content data does not include the execution object information, the target content data as target execution object information, and add the action information, the target interface element, and the target execution object information to the user intention information;
the target content data is the data in the content data whose similarity with the execution object information satisfies a preset condition.
Optionally, the second adding unit includes:
and the second adding subunit is configured to determine, by the server, the target content data as target execution object information if the content data does not include the dynamic corpus or the second segmentation word, and add the action information, the interface element, and the target execution object information to the user intention information.
A second sending module 704, configured to send, through the server, the user intention information to the client;
a second receiving module 705, configured to receive, by the client, the user intention information;
and the first execution module 706 is configured to execute, by the client, a business operation for the execution object information according to the user intention information.
In summary, the user intention recognition device provided in the present application includes: the client sends the user corpus data and at least part of content data displayed in a client screen to the server; the server side determines user intention information according to the content data and the user corpus data, wherein the user intention information comprises execution object information determined according to the content data; the server side sends user intention information to the client side; and the client performs business operation aiming at the execution object information according to the user intention information. In the method, the server side can determine the execution object information corresponding to the user corpus data by utilizing the content data so as to correct deviation generated by understanding the user corpus data in the ASR process, so that finally determined user intention is associated with the content data currently displayed on the screen, and the accuracy of determining the user intention is improved.
Referring to fig. 11, a block diagram of a server embodiment of the present application is shown, which may specifically include the following modules:
a third receiving module 801, configured to obtain user corpus data and content data sent by the client, where the content data is at least part of data displayed in a screen of the client;
a second determining module 802, configured to determine user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data;
and a third sending module 803, configured to send the user intention information to the client.
In summary, the server provided in the present application: acquires the user corpus data and content data sent by the client; determines user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; and sends the user intention information to the client. In the embodiment of the present application, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, the deviation introduced when the user corpus text information is understood during the ASR process is corrected, the finally determined user intention is related to the content data currently displayed on the screen, and the accuracy of determining the user intention is greatly improved.
Referring to fig. 12, a block diagram of a client embodiment of the present application is shown, which may specifically include the following modules:
a fourth sending module 901, configured to send the corpus data of the user and at least part of the content data displayed in the screen to a server;
a fourth receiving module 902, configured to receive user intention information sent by the server; the user intention information includes execution object information corresponding to the content data;
and a second execution module 903, configured to execute a business operation for the execution object information according to the user intention information.
In summary, the client provided in the present application: sends the user corpus data and at least part of the content data displayed in the screen to the server; receives the user intention information sent by the server, where the user intention information includes execution object information corresponding to the content data; and executes a business operation for the execution object information according to the user intention information. In the embodiment of the present application, the server matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is related to the content data, the deviation introduced when the user corpus text information is understood during the ASR process is corrected, the finally determined user intention is related to the content data currently displayed on the screen, and the accuracy of determining the user intention is greatly improved.
Referring to fig. 13, a block diagram of an embodiment of a terminal device of the present application is shown, which may specifically include the following modules:
a fifth receiving module 1001, configured to obtain user corpus data and at least part of content data displayed in the screen;
a third determining module 1002, configured to determine user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data;
optionally, the third determining module 1002 includes:
and the second calling module is used for calling a preset intention template according to the content data and the user corpus data to determine the user intention information.
Optionally, the second calling module includes:
a fourth determining submodule, configured to determine a target interface element in a current interface of the client according to the content data;
a fifth determining sub-module, configured to match the user corpus data with an intent template corresponding to the target interface element, and determine action information and execution object information corresponding to the user corpus data;
optionally, the intent template includes at least one template corpus data, the template corpus data including a correspondence between a fixed corpus reflecting action information and a dynamic corpus reflecting execution object information, and the fifth determining submodule includes:
And the second matching unit is used for matching the corresponding relation between the user corpus data and the fixed corpus and the dynamic corpus in the at least one template corpus data, and determining action information and execution object information corresponding to the user corpus data.
Optionally, the second matching unit includes:
the third matching subunit is used for carrying out similarity calculation on the user corpus data and template corpus data in the intention template, and determining at least one template corpus data with similarity to the user corpus data being greater than or equal to a preset threshold value as target template corpus data;
optionally, the third matching subunit is further configured to: word segmentation is carried out on the user corpus data to obtain word segmentation sets; determining a target interface element corresponding to a first word segmentation according to the first word segmentation corresponding to the word order of the fixed corpus in the word segmentation set; determining a target intention template corresponding to the target interface element; and performing similarity calculation on second word segmentation corresponding to the word sequence of the dynamic corpus in the word segmentation set and template corpus data in the target intention template, and determining at least one template corpus data with similarity larger than or equal to the preset threshold value as target template corpus data.
Optionally, the dynamic corpus includes a dynamic label and a corpus category corresponding to the dynamic label; the third matching subunit is further configured to: determining a dynamic corpus in the template corpus data according to the dynamic tag; and performing similarity calculation on second segmentation words corresponding to word sequences of the dynamic linguistic data in the segmentation word set and the linguistic data categories corresponding to the dynamic labels, and determining template linguistic data to which at least one linguistic data category with the similarity to the second segmentation words being greater than or equal to the preset threshold value belongs as target template linguistic data.
And the fourth matching subunit is used for matching the user corpus data with the target template corpus data according to a preset matching rule, determining fixed corpus in the matched target template corpus data as the action information, and adding dynamic corpus in the matched target template corpus data into the execution object information.
Optionally, the fourth matching subunit is further configured to match the user corpus data with the execution object information according to a regular matching rule, determine the fixed corpus in the matched target template corpus data as the action information, and add the dynamic corpus in the matched target template corpus data and the second word segment into the execution object information.
Optionally, the second matching unit further includes:
the second establishing unit is used for establishing an index catalog according to the corresponding relation between the interface element and the intention template;
and the second storage unit is used for storing the index catalog in a memory of the search engine.
Optionally, the third matching subunit is further configured to determine, according to the content data, a target interface element currently displayed on the screen of the client; the target interface element is sent to the search engine, and a target intention template corresponding to the target interface element is obtained through the index catalog inquiry; and carrying out similarity calculation on the user corpus data and template corpus data in the target intention template, and determining at least one template corpus data with the highest similarity with the user corpus data as target template corpus data.
And a sixth determining submodule, configured to determine the user intention information according to the content data, the action information, and the execution object information.
Optionally, the sixth determining submodule includes:
a third adding unit, configured to add the action information, the target interface element, and the execution object information to the user intention information if the content data includes the execution object information;
Optionally, the third adding unit includes:
a third adding subunit, configured to add the action information, the interface element, and the execution object information to the user intention information if the content data includes the dynamic corpus and the second word segment.
A fourth adding unit, configured to: if the content data does not include the execution object information, determine target content data as the target execution object information, and add the action information, the target interface element, and the target execution object information to the user intention information;
the target content data is the data in the content data whose similarity to the execution object information meets a preset condition.
Optionally, the fourth adding unit includes:
and a fourth adding subunit, configured to: if the content data does not include the dynamic corpus or does not include the second word segment, determine the target content data as the target execution object information, and add the action information, the interface element, and the target execution object information to the user intention information.
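The fallback logic of the adding units — use the execution object directly if it appears in the on-screen content data, otherwise substitute the most similar content item as the target execution object — might be sketched as follows. The similarity measure (a character ratio from `difflib`) and the 0.6 threshold are placeholders for whatever preset condition the system actually uses:

```python
from difflib import SequenceMatcher

def build_user_intention(content_data, action, target_element,
                         execution_object, threshold=0.6):
    """Assemble user intention information. If the execution object is not
    present in the content data, fall back to the content item most similar
    to it (the 'target content data'), provided the similarity meets the
    preset condition."""
    if execution_object in content_data:
        return {"action": action, "element": target_element,
                "object": execution_object}
    # Fallback: pick the on-screen item closest to the recognized object.
    best = max(content_data, default=None,
               key=lambda c: SequenceMatcher(None, c, execution_object).ratio())
    if best and SequenceMatcher(None, best, execution_object).ratio() >= threshold:
        return {"action": action, "element": target_element, "object": best}
    return None  # nothing on screen meets the preset similarity condition
```

This is the step that corrects ASR drift: even if speech recognition garbles the object name slightly, the intention is anchored to an item that is actually displayed.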
And a third execution module 1003, configured to execute a business operation for the execution object information according to the user intention information.
In summary, the method performed by the terminal device provided in the present application includes: acquiring user corpus data and at least part of the content data displayed on the screen; determining user intention information according to the content data and the user corpus data, where the user intention information includes execution object information determined according to the content data; and executing a business operation for the execution object information according to the user intention information. In the embodiments of the present application, the terminal device matches the user corpus data against the content data to determine the execution object information corresponding to the user corpus data. Because the execution object information is tied to the content data, deviations introduced when the ASR process converts the user's speech into text are corrected, the finally determined user intention is associated with the content currently displayed on the screen of the terminal device, and the accuracy of determining the user intention is greatly improved.
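A compact end-to-end sketch of this summarized flow, under the assumption of a simple verb-plus-object utterance and case-insensitive correction against the on-screen content (both invented here for illustration):

```python
import re

def recognize_intention(user_corpus, content_data):
    """(1) Take the utterance and the on-screen content data,
    (2) derive action information and an execution object,
    (3) anchor the object to the displayed content so that ASR drift
    (e.g. casing or transcription noise) is corrected."""
    m = re.match(r"^(?P<action>play|open)\s+(?P<obj>.+)$", user_corpus.strip())
    if not m:
        return None
    action, obj = m.group("action"), m.group("obj")
    # Correct the recognized object against what is actually displayed.
    corrected = next((c for c in content_data if c.lower() == obj.lower()), obj)
    return {"action": action, "object": corrected}

print(recognize_intention("play hotel california",
                          ["Hotel California", "Yesterday Once More"]))
```

In the patent's client/server split, the regex-and-template matching runs at the server side and the final business operation runs at the client; the sketch collapses both into one function for the terminal-device embodiment.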
The embodiments of the present application also provide a non-volatile readable storage medium storing one or more modules (programs). When the one or more modules are applied to a device, the device may be caused to execute the instructions of each method step in the embodiments of the present application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an electronic device to perform a method as described in one or more of the above embodiments. In this embodiment of the present application, the electronic device includes a server (cluster), a mobile device, a terminal device, and so on.
Embodiments of the present application may be implemented, using any suitable hardware, firmware, software, or any combination thereof, as an electronic device in a desired configuration, which may include a server (cluster), a mobile device, a terminal device, and the like. Fig. 14 schematically illustrates an example apparatus 1100 that may be used to implement various embodiments described herein.
For one embodiment, fig. 14 illustrates an example apparatus 1100 having one or more processors 1102, a control module (chipset) 1104 coupled to at least one of the processor(s) 1102, a memory 1106 coupled to the control module 1104, non-volatile memory (NVM)/storage 1108 coupled to the control module 1104, one or more input/transmission devices 1111 coupled to the control module 1104, and a network interface 1112 coupled to the control module 1104.
The processor 1102 may include one or more single-core or multi-core processors, and the processor 1102 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1100 can be used as a server (cluster), a mobile device, a terminal device, or the like in the embodiments of the present application.
In some embodiments, apparatus 1100 can include one or more computer-readable media (e.g., memory 1106 or NVM/storage 1108) having instructions 1104 stored thereon, and one or more processors 1102 coupled with the one or more computer-readable media and configured to execute the instructions 1104 to implement modules that perform the actions described in this disclosure.
For one embodiment, the control module 1104 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 1102 and/or any suitable device or component in communication with the control module 1104.
The control module 1104 may include a memory controller module to provide an interface to the memory 1106. The memory controller modules may be hardware modules, software modules, and/or firmware modules.
Memory 1106 may be used to load and store data and/or instructions 1104 for device 1100, for example. For one embodiment, memory 1106 may comprise any suitable volatile memory, such as, for example, a suitable DRAM. In some embodiments, memory 1106 may comprise double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, the control module 1104 can include one or more input/transmission controllers to provide an interface to the NVM/storage 1108 and the input/transmission device(s) 1111.
For example, NVM/storage 1108 may be used to store data and/or instructions 1104. NVM/storage 1108 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1108 may include storage resources that are physically part of the device on which apparatus 1100 is installed, or may be accessible by the device without necessarily being part of the device. For example, NVM/storage 1108 may be accessed over a network via the input/transmission device(s) 1111.
Input/transmission device(s) 1111 may provide an interface for apparatus 1100 to communicate with any other suitable device; the input/transmission device(s) 1111 may include communication components, audio components, sensor components, and the like. The network interface 1112 may provide an interface for the device 1100 to communicate over one or more networks, and the device 1100 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1102 may be packaged together with logic of one or more controllers (e.g., memory controller modules) of the control module 1104. For one embodiment, at least one of the processor(s) 1102 may be packaged together with logic of one or more controllers of the control module 1104 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1102 may be integrated on the same mold as logic of one or more controllers of the control module 1104. For one embodiment, at least one of the processor(s) 1102 may be integrated on the same die as logic of one or more controllers of the control module 1104 to form a system on chip (SoC).
In various embodiments, apparatus 1100 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, device 1100 may have more or fewer components and/or different architectures. For example, in some embodiments, the apparatus 1100 includes one or more cameras, keyboards, liquid crystal display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, application-specific integrated circuits (ASICs), and speakers.
The embodiments of the present application provide a server, which comprises: one or more processors; and one or more machine-readable media having instructions stored thereon, which, when executed by the one or more processors, cause the server to perform the user intention recognition method as described in one or more of the embodiments of the present application.
The embodiments of the present application provide an electronic device, which comprises: one or more processors; and a memory having executable code stored thereon that, when executed, causes the one or more processors to perform a user intention recognition method.
One or more machine-readable media are also provided, having executable code stored thereon that, when executed, causes a processor to perform a user intention recognition method.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple; for relevant details, reference may be made to the description of the method embodiments.
In this specification, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, those skilled in the art may make additional variations and modifications to those embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device comprising that element.
The foregoing has described in detail the user intention recognition method and apparatus, electronic device, and storage medium provided by the present application. Specific examples are used herein to illustrate the principles and implementations of the present application; the above embodiments are described only to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (28)

1. A method for identifying user intention, comprising:
the client sends the corpus data of the user and at least part of content data displayed in the screen to the server;
the server side obtains the user corpus data and the content data sent by the client side;
the server determines a target interface element in the current interface of the client according to the content data; matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data; determining the user intention information according to the content data, the action information and the execution object information;
the server side sends the user intention information to the client side;
the client receives the user intention information;
and the client executes business operation aiming at the execution object information according to the user intention information.
2. The method according to claim 1, wherein the step of determining the user intention information by the server according to the content data, the action information and the execution object information, comprises:
If the content data comprises the execution object information, the server adds the action information, the target interface element and the execution object information into the user intention information;
if the content data does not include the execution object information, the server determines target content data as target execution object information, and adds the action information, the target interface element and the target execution object information into the user intention information;
the target content data is data, of which the similarity with the execution object information accords with a preset condition, in the content data.
3. The method of claim 2, wherein the intention template includes at least one template corpus data, the template corpus data including a correspondence between a fixed corpus reflecting action information and a dynamic corpus reflecting execution object information; and the step of the server matching the user corpus data with an intention template corresponding to the target interface element and determining action information and execution object information corresponding to the user corpus data comprises:
The server matches the corresponding relation between the user corpus data and the fixed corpus and the dynamic corpus in the at least one template corpus data, and determines action information and execution object information corresponding to the user corpus data.
4. The method of claim 3, wherein the step of the server matching the user corpus data with a correspondence between a fixed corpus and a dynamic corpus in the at least one template corpus data to determine action information and execution object information corresponding to the user corpus data includes:
the server side calculates the similarity between the user corpus data and the template corpus data in the intention template, and determines at least one template corpus data with the similarity between the user corpus data and the template corpus data being greater than or equal to a preset threshold value as target template corpus data;
according to a preset matching rule, the server matches the user corpus data with the target template corpus data, determines fixed corpus in the matched target template corpus data as the action information, and adds dynamic corpus in the matched target template corpus data into the execution object information.
5. The method according to claim 4, wherein the step of the server side performing similarity calculation on the user corpus data and the template corpus data in the intent template, and determining at least one template corpus data with similarity to the user corpus data being greater than or equal to a preset threshold as target template corpus data includes:
the server performs word segmentation on the user corpus data to obtain a word segmentation set;
the server determines a target interface element corresponding to a first word segment according to the first word segment corresponding to the word sequence of the fixed corpus in the word segment set;
the server determines a target intention template corresponding to the target interface element;
the server calculates the similarity between the second word segment corresponding to the word sequence of the dynamic corpus in the word segment set and the template corpus data in the target intention template, and determines at least one template corpus data with the similarity between the second word segment and the template corpus data greater than or equal to the preset threshold value as target template corpus data.
6. The method of claim 5, wherein the dynamic corpus comprises dynamic tags and corpus categories corresponding to the dynamic tags;
The step that the server calculates the similarity between the second word segment corresponding to the word sequence of the dynamic corpus in the word segment set and the template corpus data in the target intention template, and determines at least one template corpus data with the similarity between the second word segment and the template corpus data greater than or equal to the preset threshold value as target template corpus data comprises the following steps:
according to the dynamic labels, the server determines dynamic corpus in the template corpus data;
the server calculates the similarity between the second word segment corresponding to the word sequence of the dynamic corpus in the word segment set and the corpus category corresponding to the dynamic tag, and determines template corpus data to which at least one corpus category with the similarity between the second word segment and the corpus category is greater than or equal to the preset threshold value belongs as target template corpus data.
7. The method according to claim 5, wherein the step of the server matching the user corpus data with the target template corpus data according to a preset matching rule, determining a fixed corpus in the matched target template corpus data as the action information, and adding a dynamic corpus in the matched target template corpus data to the execution object information includes:
The server matches the user corpus data with the execution object information according to a regular matching rule, determines the fixed corpus in the matched target template corpus data as the action information, and adds the dynamic corpus in the matched target template corpus data and the second word segment to the execution object information.
8. The method according to claim 7, wherein the step of the server adding the action information, the target interface element, and the execution object information to the user intention information if the execution object information is included in the content data, comprises:
if the content data comprises the dynamic corpus and the second word segmentation, the server adds the action information, the interface element and the execution object information into the user intention information;
and if the content data does not include the execution object information, the server determines the target content data as target execution object information, and adds the action information, the target interface element and the target execution object information into the user intention information, including:
And if the dynamic corpus is not included in the content data or the second segmentation word is not included in the content data, the server determines the target content data as target execution object information, and adds the action information, the interface element and the target execution object information into the user intention information.
9. The method as recited in claim 4, further comprising:
the server establishes an index directory according to the correspondence between interface elements and intention templates;
the server stores the index directory in a memory of a search engine.
10. The method according to claim 9, wherein the step of the server side performing similarity calculation on the user corpus data and the template corpus data in the intent template, and determining at least one template corpus data with similarity to the user corpus data being greater than or equal to a preset threshold as target template corpus data includes:
the server determines a target interface element currently displayed on a screen of the client through the content data;
the server side sends the target interface element to the search engine, and obtains a target intention template corresponding to the target interface element through the index directory query;
And the server calculates the similarity between the user corpus data and the template corpus data in the target intention template, and determines at least one template corpus data with the highest similarity with the user corpus data as target template corpus data.
11. The user intention recognition method is applied to a server and is characterized by comprising the following steps:
acquiring user corpus data and content data sent by a client, wherein the content data is at least part of data displayed in a screen of the client;
determining a target interface element in the current interface of the client according to the content data; matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data; determining the user intention information according to the content data, the action information and the execution object information;
and sending the user intention information to the client.
12. A user intention recognition method applied to a client including a screen, comprising:
transmitting the corpus data of the user and at least part of content data displayed in the screen to a server;
Receiving user intention information sent by the server, wherein the user intention information is information determined by the server by: determining a target interface element in the current interface of the client according to the content data; matching the user corpus data with an intention template corresponding to the target interface element to determine action information and execution object information corresponding to the user corpus data; and finally determining the user intention information according to the content data, the action information, and the execution object information;
and executing business operation aiming at the execution object information according to the user intention information.
13. A user intention recognition method applied to a terminal device, the terminal device including a screen, comprising:
acquiring user corpus data and at least part of content data displayed in the screen;
determining a target interface element in the current interface of the screen according to the content data;
matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data;
determining the user intention information according to the content data, the action information and the execution object information;
And executing business operation aiming at the execution object information according to the user intention information.
14. The method of claim 13, wherein the step of determining the user intention information based on the content data, and the action information and the execution object information, comprises:
if the content data comprises the execution object information, adding the action information, the target interface element and the execution object information into the user intention information;
if the content data does not include the execution object information, determining target content data as target execution object information, and adding the action information, the target interface element and the target execution object information into the user intention information;
the target content data is data, of which the similarity with the execution object information accords with a preset condition, in the content data.
15. The method of claim 14, wherein the intention template includes at least one template corpus data, the template corpus data including a correspondence between a fixed corpus reflecting action information and a dynamic corpus reflecting execution object information;
The step of matching the user corpus data with the intention template corresponding to the target interface element to determine the action information and the execution object information corresponding to the user corpus data comprises the following steps:
and matching the corresponding relation between the user corpus data and the fixed corpus and the dynamic corpus in the at least one template corpus data, and determining action information and execution object information corresponding to the user corpus data.
16. The method according to claim 15, wherein the step of matching the user corpus data with a correspondence between a fixed corpus and a dynamic corpus in the at least one template corpus data, and determining action information and execution object information corresponding to the user corpus data includes:
performing similarity calculation on the user corpus data and template corpus data in the intention template, and determining at least one template corpus data with similarity greater than or equal to a preset threshold value with the user corpus data as target template corpus data;
according to a preset matching rule, matching the user corpus data with the target template corpus data, determining fixed corpus in the matched target template corpus data as the action information, and adding dynamic corpus in the matched target template corpus data into the execution object information.
17. A user intention recognition device, said device comprising:
the first sending module is used for sending the corpus data of the user and at least part of content data displayed in the screen to the server through the client;
the first receiving module is used for acquiring the user corpus data and the content data sent by the client through the server;
the first determining module is used for determining a target interface element in the current interface of the client according to the content data through the server; matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data; determining the user intention information according to the content data, the action information and the execution object information;
the second sending module is used for sending the user intention information to the client through the server;
the second receiving module is used for receiving the user intention information through the client;
and the first execution module is used for executing the business operation aiming at the execution object information through the client according to the user intention information.
18. A server, the server comprising:
the third receiving module is used for acquiring user corpus data and content data sent by a client, wherein the content data is at least part of data displayed in a screen of the client;
the second determining module is used for determining a target interface element in the current interface of the client according to the content data; matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data; determining the user intention information according to the content data, the action information and the execution object information;
and the third sending module is used for sending the user intention information to the client.
19. A client, said client comprising:
the fourth sending module is used for sending the corpus data of the user and at least part of content data displayed in the screen to the server;
the fourth receiving module is used for receiving the user intention information sent by the server, wherein the user intention information is information determined by the server by: determining a target interface element in the current interface of the client according to the content data; matching the user corpus data with an intention template corresponding to the target interface element to determine action information and execution object information corresponding to the user corpus data; and finally determining the user intention information according to the content data, the action information, and the execution object information;
And the second execution module is used for executing business operation aiming at the execution object information according to the user intention information.
20. A terminal device, characterized in that said terminal device comprises:
a fifth receiving module, configured to obtain corpus data of the user and at least part of content data displayed in the screen;
a third determining module, configured to determine, according to the content data, a target interface element in the current interface of the screen; matching the user corpus data with an intention template corresponding to the target interface element, and determining action information and execution object information corresponding to the user corpus data; determining the user intention information according to the content data, the action information and the execution object information;
and the third execution module is used for executing business operation aiming at the execution object information according to the user intention information.
21. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the user intent recognition method as recited in one or more of claims 1-10.
22. One or more machine readable media having executable code stored thereon that, when executed, causes a processor to perform the user intent recognition method as recited in one or more of claims 1-10.
23. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the user intent recognition method as recited in claim 11.
24. One or more machine readable media having executable code stored thereon that, when executed, causes a processor to perform the user intent recognition method of claim 11.
25. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the user intent recognition method as recited in claim 12.
26. One or more machine readable media having executable code stored thereon that, when executed, causes a processor to perform the user intent recognition method of claim 12.
27. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the user intent recognition method as recited in one or more of claims 13-16.
28. One or more machine readable media having executable code stored thereon that, when executed, causes a processor to perform the user intent recognition method as recited in one or more of claims 13-16.
CN201910472461.1A 2019-05-31 2019-05-31 User intention recognition method, device, service end, client and terminal equipment Active CN112100391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472461.1A CN112100391B (en) 2019-05-31 2019-05-31 User intention recognition method, device, service end, client and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910472461.1A CN112100391B (en) 2019-05-31 2019-05-31 User intention recognition method, device, service end, client and terminal equipment

Publications (2)

Publication Number Publication Date
CN112100391A CN112100391A (en) 2020-12-18
CN112100391B true CN112100391B (en) 2023-06-13

Family

ID=73749329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472461.1A Active CN112100391B (en) 2019-05-31 2019-05-31 User intention recognition method, device, service end, client and terminal equipment

Country Status (1)

Country Link
CN (1) CN112100391B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076397A (en) * 2021-03-29 2021-07-06 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Intention recognition method and device, electronic equipment and storage medium
CN114095360B (en) * 2021-11-12 2024-02-13 China United Network Communications Group Co., Ltd. Communication service opening method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256269A (en) * 2013-05-03 2017-10-17 Facebook, Inc. Method, computer-readable non-transitory storage medium and system
CN107403013A (en) * 2017-08-01 2017-11-28 Hangzhou Anheng Information Technology Co., Ltd. Web service behavior recognition method and device
WO2018099275A1 (en) * 2016-11-29 2018-06-07 Alibaba Group Holding Limited Method, apparatus, and system for generating business object attribute identifier
CN108536830A (en) * 2018-04-11 2018-09-14 Baidu Online Network Technology (Beijing) Co., Ltd. Dynamic picture search method, device, equipment, server and storage medium
CN108876527A (en) * 2018-06-06 2018-11-23 Beijing Jingdong Shangke Information Technology Co., Ltd. Service method and service device, application open platform and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140298200A1 (en) * 2013-03-29 2014-10-02 Google Inc. Providing user interface elements for interactive sessions
US9633317B2 (en) * 2013-06-20 2017-04-25 Viv Labs, Inc. Dynamically evolving cognitive architecture system based on a natural language intent interpreter
US11163811B2 (en) * 2017-10-30 2021-11-02 International Business Machines Corporation Ranking of documents based on their semantic richness
CN108012173B (en) * 2017-11-16 2021-01-22 Baidu Online Network Technology (Beijing) Co., Ltd. Content identification method, device, equipment and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on implicit interaction features of Post-WIMP interfaces; Tian Feng; Deng Changzhi; Zhou Mingjun; Xu Lishuang; Journal of Frontiers of Computer Science and Technology (02); full text *
Intent-aware malicious code detection and defense; Li Pengwei; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; full text *

Also Published As

Publication number Publication date
CN112100391A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
US10748531B2 (en) Management layer for multiple intelligent personal assistant services
US8862615B1 (en) Systems and methods for providing information discovery and retrieval
US11308942B2 (en) Method and apparatus for operating smart terminal
US10510342B2 (en) Voice recognition server and control method thereof
CN110097870B (en) Voice processing method, device, equipment and storage medium
US11741094B2 (en) Method and system for identifying core product terms
CN109036397B (en) Method and apparatus for presenting content
CN112100391B (en) User intention recognition method, device, service end, client and terminal equipment
KR20140112360A (en) Vocabulary integration system and method of vocabulary integration in speech recognition
CN112509562B (en) Method, apparatus, electronic device and medium for text post-processing
US20200409998A1 (en) Method and device for outputting information
WO2021109981A1 (en) Information display method and apparatus
CN111309857A (en) Processing method and processing device
US20150235643A1 (en) Interactive server and method for controlling the server
US11056103B2 (en) Real-time utterance verification system and method thereof
CN110852815B (en) Data processing method, apparatus and machine readable medium
CN109325180B (en) Article abstract pushing method and device, terminal equipment, server and storage medium
CN111354350A (en) Voice processing method and device, voice processing equipment and electronic equipment
CN112396444A (en) Intelligent robot response method and device
CN110765328A (en) Data processing method, device and storage medium
US11657807B2 (en) Multi-tier speech processing and content operations
CN111339770B (en) Method and device for outputting information
CN114861640A (en) Text abstract model training method and device
CN113190697A (en) Image information playing method and device
JP6944920B2 (en) Smart interactive processing methods, equipment, equipment and computer storage media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant