CN111611468A - Page interaction method and device and electronic equipment

Info

Publication number
CN111611468A
Authority
CN
China
Prior art keywords
page
interactive
interaction
target
voice information
Prior art date
Legal status
Granted
Application number
CN202010356398.8A
Other languages
Chinese (zh)
Other versions
CN111611468B (en)
Inventor
唐子杰
张海杰
麻雪云
程磊生
曹文强
曹彬
陈奇
吴开放
吕晟
邵领
张弛
王珍
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010356398.8A
Publication of CN111611468A
Application granted
Publication of CN111611468B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a page interaction method and relates to speech technology within the field of computer technology. The specific implementation scheme is as follows: obtain the executable interactive content of a page; acquire voice information; perform intention recognition on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and execute the target interaction on the page. In this way, the target interaction matching the intention of the user's voice input is executed on the page, so the user can interact with the web page by voice, and the efficiency of interaction between the user and the page is improved.

Description

Page interaction method and device and electronic equipment
Technical Field
The present application relates to speech technology within the field of computer technology, and in particular to a page interaction method, apparatus, and electronic device.
Background
The web page is the internet's largest information carrier. In the Personal Computer (PC) era of the internet, people interacted with web pages by mouse clicks, scrolling, and keyboard input; in the mobile internet era, people interact with web pages by finger taps and swipes.
However, whether by mouse clicks, scrolling, and keyboard input or by finger touch, input efficiency is low, so interaction between users and web pages suffers from the technical problem of low efficiency.
Disclosure of Invention
The application provides a page interaction method, a page interaction device, electronic equipment and a storage medium.
An embodiment of a first aspect of the present application provides a page interaction method, including:
acquiring executable interactive content of a page;
acquiring voice information;
performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
and executing the target interaction on the page.
An embodiment of a second aspect of the present application provides another page interaction method, including:
acquiring executable interactive content obtained by the page front end identifying the page;
acquiring voice information;
performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
sending an interaction instruction of the target interaction to the page front end, wherein the interaction instruction is used for executing the target interaction on the page.
An embodiment of a third aspect of the present application provides a page interaction apparatus, including:
the first acquisition module is used for acquiring executable interactive content of the page;
the second acquisition module is used for acquiring voice information;
the intention recognition module is used for carrying out intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interactive content;
and the execution module is used for executing the target interaction on the page.
An embodiment of a fourth aspect of the present application provides another page interaction apparatus, including:
the interaction acquisition module is used for acquiring executable interactive content obtained by the page front end identifying the page;
the voice acquisition module is used for acquiring voice information;
the recognition module is used for performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interactive content;
the sending module is used for sending the interaction instruction of the target interaction to the page front end, wherein the interaction instruction is used for executing the target interaction on the page.
An embodiment of a fifth aspect of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the page interaction method of the first aspect embodiment or the page interaction method of the second aspect embodiment.
A sixth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause the computer to execute the page interaction method of the first aspect, or the page interaction method of the second aspect.
One embodiment of the above application has the following advantages or benefits: the executable interactive content of a page is obtained; voice information is acquired; intention recognition is performed on the voice information to determine, from the executable interactive content, the target interaction matching the intention of the voice information; and the target interaction is executed on the page. In this way, the target interaction matching the intention of the user's voice input is executed on the page, so the user can interact with the web page by voice, and the efficiency of interaction between the user and the page is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a page interaction method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a page interaction method provided in the second embodiment of the present application;
fig. 3 is a schematic flowchart of a page interaction method provided in the third embodiment of the present application;
fig. 4 is a schematic flowchart of a page interaction method provided in the fourth embodiment of the present application;
fig. 5 is a schematic flowchart of a page interaction method provided in the fifth embodiment of the present application;
FIG. 6 is an exemplary diagram of a page interaction process provided in a sixth embodiment of the present application;
FIG. 7 is a schematic diagram illustrating user interaction with a page provided by an embodiment of the present application;
fig. 8 is a schematic structural diagram of a page interaction apparatus according to a seventh embodiment of the present application;
fig. 9 is a schematic structural diagram of a page interaction apparatus according to an eighth embodiment of the present application;
FIG. 10 is a block diagram of an electronic device for implementing a method of page interaction of an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
In the related art, users can interact with web pages only by mouse clicks, scrolling, keyboard input, and finger touch; they cannot interact with web pages by voice control.
Compared with voice interaction, interacting with a web page by mouse and finger touch has the following disadvantages. In terms of input efficiency, typing on a keyboard is far slower than voice input. In terms of learning cost, using a computer or mobile phone requires learning certain knowledge first, while speaking does not. In terms of operating distance, the user must be in close contact with the device, which is inconvenient in some scenarios.
To address these technical problems in existing interaction with web pages, the present application provides a page interaction method: obtain the executable interactive content of a page; acquire voice information; perform intention recognition on the voice information to determine, from the executable interactive content, the target interaction matching the intention of the voice information; and execute the target interaction on the page. Interaction with the web page by voice is thus achieved: the user needs no additional literacy or contact with the device and can interact with the web page simply by speaking, which improves interaction efficiency.
The page interaction method, apparatus, electronic device, and storage medium according to embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a page interaction method according to an embodiment of the present application.
In the embodiments of the present application, the page interaction method is described as configured in a page interaction apparatus, which may be applied to any electronic device so that the electronic device can perform the page interaction function.
The electronic device may be a PC, a cloud device, a mobile device, and the like; the mobile device may be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or a vehicle-mounted device running any of various operating systems.
As an example, the electronic device may include a voice module, a page front end, and a server module. The voice module captures the user's speech; the page front end identifies the page to obtain its executable interactive content; the server module performs intention recognition on the speech acquired from the voice module to determine, from the executable interactive content, the target interaction matching the intention of the voice information; the page front end then executes the target interaction on the page. The function of voice interaction between the user and the web page is thus realized.
The process of the electronic device executing the page interaction function is described in detail below with reference to fig. 1.
As shown in fig. 1, the page interaction method may include the following steps:
step 101, obtaining executable interactive content of a page.
The page may be a Hypertext Markup Language (HTML) page, such as a page presented by the Baidu browser, a page presented by the Google Chrome browser, and so on.
In this application, the HTML page can be parsed by an HTMLParser module at the page front end to obtain the executable interactive content of the page. HTMLParser is a module built into the Python programming language specifically for parsing HTML; it can be used to parse out the executable interactive content of an HTML page.
It should be explained that the executable interactive content of a page may be page-operation content, such as sliding, refreshing, going forward, and going back; page-click content, such as "query now" and tab switching; slot-filling content, such as departure place, arrival place, and departure date; or other types of interactive content, which are not detailed here.
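For illustration, below is a minimal sketch of how a page front end might collect interactive elements with Python's built-in HTMLParser. The tag set, the attribute names used for the text description, and the operation labels are assumptions made for this example, not details fixed by this application:

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch; a real front end would cover
# a broader set (forms, custom widgets, elements with event handlers, ...).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "details", "summary"}

class InteractiveContentParser(HTMLParser):
    """Collects candidate interactive elements while parsing an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            self.elements.append({
                "tag": tag,
                # "title"/"aria-label" stand in here for the text description
                # that states the element's interaction purpose.
                "purpose": attr_map.get("title") or attr_map.get("aria-label", ""),
                # Crude operation guess: inputs take text, everything else clicks.
                "operation": "input" if tag in ("input", "select") else "click",
            })

parser = InteractiveContentParser()
parser.feed('<button title="query now">query</button>'
            '<input title="departure city">')
print(parser.elements)
# [{'tag': 'button', 'purpose': 'query now', 'operation': 'click'},
#  {'tag': 'input', 'purpose': 'departure city', 'operation': 'input'}]
```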
Step 102, acquiring voice information.
Here, the voice information is text obtained from the user's spoken input.
In the embodiments of this application, when a user interacts with a page by voice, the user speaks, the voice module of the electronic device records what is said, and the text recognized from the recording is sent to the server module of the electronic device, so the server module obtains the voice information.
As an example, when a user wants to buy a train ticket, the user may say "train ticket from Beijing to Shanghai this Friday"; the voice module of the electronic device records this speech and sends the text recognized from the recording to the server module, so the server module obtains the voice information.
It should be explained that, in this embodiment, the execution sequence of the above step 101 and step 102 is not limited, and the process of acquiring the voice information in step 102 may be executed first, and then the process of acquiring the executable interactive content of the page in step 101 may be executed.
For example, when a user performs voice interaction with a page, the voice module may acquire voice information input by the user after the page front end of the electronic device acquires executable interactive content of the page; or after the voice module of the electronic device acquires the voice information input by the user, the front end of the page acquires the executable interactive content of the page, which is not limited in this embodiment.
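The voice-module flow just described (record, recognize, forward the text to the server module) can be sketched as follows; recognize_speech and send_to_server are hypothetical stand-ins for an ASR engine and the module-to-module transport, neither of which is named by this application:

```python
def recognize_speech(recording: bytes) -> str:
    """Stand-in for a real ASR engine that turns a recording into text."""
    raise NotImplementedError("plug in a speech recognizer here")

def send_to_server(payload: dict) -> None:
    """Stand-in for the transport from voice module to server module."""
    print("to server module:", payload)

def handle_user_speech(recording: bytes) -> None:
    text = recognize_speech(recording)            # speech -> text
    send_to_server({"voice_information": text})   # server gets plain text
```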
And 103, performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interactive content.
The target interaction refers to executable interaction content of a page matched with the intention of voice information input by a user.
In the application, after the server module of the electronic device acquires the voice information, intention recognition can be performed on the voice information, so that the intention of interaction between the user and the page can be determined from the voice information.
As one possible implementation, the text corresponding to the voice information may be input into a trained intention recognition model, and the intention of the voice information obtained from the model's output. Because the intention recognition model has learned the mapping between voice information and the corresponding intentions, it can accurately recognize the intention of the voice information.
As another possible implementation, intention recognition may be performed on the voice information with a rule-template classification method. For example, the text corresponding to the voice information is matched against each template in a template library, and the intention of the voice information is determined from the template that matches it.
It should be noted that the above-mentioned method of recognizing the intention of the voice information is merely an exemplary expression, and the remaining method of recognizing the intention of the voice information is also applicable to the present application.
In the embodiment of the application, after the intention of the user in the voice information interacting with the page is identified, the target interaction matched with the intention of the voice information can be determined from the executable interactive content.
As an example, assuming the acquired voice information is "train ticket from Beijing to Shanghai this Friday", intention recognition determines that the intention of the voice information is to query train tickets, and the interactive content matching this intention can be determined from the executable interactive content to be slot-filling content: the departure place is Beijing, the arrival place is Shanghai, and the departure date is this Friday.
Step 104, executing the target interaction on the page.
In the application, after the target interaction matched with the intention of the voice information input by the user is determined, the front end of the page can be controlled to execute the target interaction on the page.
As an example, the user speaks "diphtheria-pertussis-tetanus vaccine 201607050-2 Wuhan biology"; the voice module of the electronic device records the speech and passes the recognized voice information to the server module. The server module performs intention recognition on the received voice information, determines that the user intends to query vaccine details through the page, determines from the executable interactive content of the page that the target interaction matching this intention is "query now", and controls the page front end to execute the query operation on the page, so a query result is obtained.
With the page interaction method of this embodiment, the executable interactive content of the page is obtained; voice information is acquired; intention recognition is performed on the voice information so as to determine, from the executable interactive content, the target interaction matching the intention of the voice information; and the target interaction is executed on the page. In this way, the target interaction matching the intention of the voice information input by the user is executed on the page, the user can interact with the web page by voice, and the efficiency of interaction between the user and the page is improved.
On the basis of the above embodiments, a second embodiment of the present application provides another page interaction method, and fig. 2 is a schematic flow chart of the page interaction method provided in the second embodiment of the present application.
As shown in fig. 2, the page interaction method may include the following steps:
step 201, analyzing the page to obtain the response operation of each interactive element.
For example, the operation responded by the interactive element may be a click operation, a text input operation, or the like.
In this application, each interactive element in the page may include: content interaction elements, menu interaction elements, and state interaction elements.
Among the content interactive elements, <details> and <summary> are newly added elements, used mainly for interactive display of a document's titles, details, and content. The <details> element specifies additional details of the document and is often used in conjunction with the <summary> element.
The style and function of each interactive element in the page are defined in the element's attribute information; for example, an element's attribute information includes the operation the element responds to and a text description stating its interaction purpose. The page can therefore be parsed to obtain the operation each interactive element responds to.
Step 202, taking the operation each interactive element responds to as the interactive operation executable by that element.
In this application, after the operation each interactive element in the page responds to is obtained, that operation can be taken as the interactive operation executable by the element.
As an example, if an interactive element responds to the operation "click", it can be determined that the element can perform a click operation; that is, the "click" in the element's attribute information states that the interactive operation executable by the element is a click.
Step 203, generating a list of executable interactive contents according to the executable interactive operations of the interactive elements.
In the application, after determining the interactive operations executable by the interactive elements, the interactive operations executable by the interactive elements can be classified to obtain a list of executable interactive contents.
That is, the list of executable interactive contents of the page includes each interactive element and the executable interactive operation corresponding to each interactive element. Furthermore, by inquiring the list of the executable interactive content, the functions corresponding to the interactive elements can be determined, and the efficiency of interaction between the user and the page can be improved.
Specifically, after the page is analyzed and a plurality of interactive elements in the page are determined, the text description of each interactive element obtained by analyzing the page can be obtained. For example, the text description may be confirm, refresh, forward, backward, etc. Furthermore, the text description of each interactive element is used as the interactive purpose of each interactive element.
As an example, where the text of an interactive element describes "forward," it may be determined that the interactive purpose of the interactive element is forward.
In the embodiment of the application, after the interactive operation of each interactive element in the page and the interactive purpose of each interactive element are determined, each interactive element can be classified to generate a list of executable interactive contents of the page.
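Continuing the parser sketch above, the list of executable interactive content built in steps 201-203 could be organized as below; grouping by operation and keeping each element's interaction purpose are assumptions made for illustration:

```python
from collections import defaultdict

def build_interaction_list(elements: list) -> dict:
    """Group parsed interactive elements by the operation they can perform,
    keeping each element's interaction purpose for later intention matching."""
    catalog = defaultdict(list)
    for element in elements:
        catalog[element["operation"]].append(
            {"tag": element["tag"], "purpose": element["purpose"]}
        )
    return dict(catalog)

# With the elements parsed earlier this yields, e.g.:
# {'click': [{'tag': 'button', 'purpose': 'query now'}],
#  'input': [{'tag': 'input', 'purpose': 'departure city'}]}
```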
Step 204, acquiring voice information.
In the embodiment of the present application, the implementation process of step 204 may refer to the implementation process of step 102 in the foregoing embodiment, and is not described herein again.
It should be noted that the execution process of step 204 may also precede step 201, and this application is not limited herein.
Step 205, performing intention recognition on the voice information so as to determine, according to the executable interactive content, the interactive operations executable by the target element in the page.
In the embodiment of the application, after the intention of the voice information is identified and the intention of the voice information is determined, the interactive purpose of a plurality of interactive elements in the page can be determined from the executable interactive content of the page. Further, from among the plurality of interactive elements, an interactive element whose interaction purpose matches the intention is determined as a target element.
Furthermore, the executable interactive operation of the target element in the page can be determined according to the executable interactive content of the page.
As an example, the server module performs intention recognition on the voice information and determines that the intention of the voice information is "purchase train tickets"; according to this intention, the interactive operation executable by the target element in the page can be determined from the executable interactive content to be a query operation.
In step 206, a target operation matching the intention is determined from the interactive operations that can be performed on the target element.
In the embodiment of the application, after the intention of the voice information is identified and determined, the interactive operation executable by the target element in the page is determined according to the executable interactive content of the page, and then the target operation matched with the intention can be determined from the interactive operation executable by the target element.
For example, if the intention of the voice information is recognized as "purchase train tickets from Beijing to Shanghai this Friday", the target operation matching the intention can be determined, from the interactive operations executable by the target element, to be a query operation.
Step 207, generating an interactive instruction of the target interaction according to the target element and the target operation.
According to the method and the device, after the intention of the voice information is identified, the target element is determined from a plurality of interactive elements in the page according to the intention of the voice information, and after the target operation matched with the intention is determined from interactive operations executable by the target element, the interactive instruction of the target interaction can be generated according to the target element and the target operation.
Continuing the example in step 206: after it is determined that the user wants to purchase train tickets from Beijing to Shanghai this Friday, an interactive instruction of the target interaction, namely a query instruction, can be generated according to the element corresponding to the search box and the query operation to be executed on it. Once the interactive instruction of the target interaction has been generated from the intention of the voice information, the target interaction can be executed on the page, realizing voice interaction between the user and the page.
Step 208, executing the interactive instruction of the target interaction on the page.
In the embodiment of the application, after the server module generates the interactive instruction of the target interaction according to the target element and the target operation executed on the target element, the interactive instruction is sent to the page front end, so that the page front end executes the target interaction on the page according to the interactive instruction, and the purpose that the user interacts with the page in a voice mode is achieved.
For example, the interactive instruction for generating the target interaction is a query instruction, and the front end of the page may execute the target interaction for querying the train ticket on the page according to the interactive instruction.
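One possible shape for the interactive instruction assembled from the target element and target operation — a sketch assuming instructions are plain dictionaries handed to the page front end:

```python
from typing import Optional

def make_instruction(target_element: dict, target_operation: str,
                     slots: Optional[dict] = None) -> dict:
    """Build an interaction instruction for the page front end to execute."""
    return {
        "element": target_element["purpose"],   # e.g. "query now"
        "operation": target_operation,          # e.g. "click"
        "slots": slots or {},                   # filled slots, if any
    }

instruction = make_instruction(
    {"tag": "button", "purpose": "query now"}, "click",
    {"departure": "Beijing", "arrival": "Shanghai", "date": "this Friday"},
)
# The page front end would receive this dictionary and perform the click.
```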
With the page interaction method of this embodiment, the page is parsed to obtain the operation each interactive element responds to; that operation is taken as the interactive operation executable by the element, and a list of executable interactive content is generated from the interactive operations executable by the elements; voice information is acquired; intention recognition is performed on the voice information so as to determine, according to the executable interactive content, the interactive operations executable by the target element in the page; the target operation matching the intention is determined from those operations; and an interactive instruction of the target interaction is generated according to the target element and the target operation. By determining, from all the interactive elements in the page, the target element that fits the intention of the voice information and generating the interactive instruction of the target interaction from the target element and the operation to be executed on it, interaction between the user and the page by voice is realized and the efficiency of interaction between the user and the page is improved.
On the basis of the above embodiment, when performing intent recognition on voice information in steps 103 and 205, it is also possible to determine the manner of performing intent recognition on voice information by determining whether a target template matching the voice information exists in the template library. The above process is described in detail below with reference to fig. 3.
Fig. 3 is a schematic flowchart of a page interaction method provided in the third embodiment of the present application.
As shown in fig. 3, the page interaction method may further include the following steps:
step 301, a template library is obtained.
A large number of trained templates are stored in the template library.
In the embodiment of the application, when the voice information is acquired and recognized, a preset template library can be acquired so as to match the voice information with each template in the template library.
Step 302, matching each template in the template library with the voice information respectively.
Step 303, determining whether a target template whose sentence pattern matches the voice information exists in the template library.
The target template is the template in the template library that matches the voice information.
In the embodiments of this application, the voice information is matched against each template in the template library to judge whether the library contains a target template whose sentence pattern matches the voice information, and the corresponding intention recognition approach is then applied to the voice information.
Step 304, if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template.
In the embodiments of this application, if, after the voice information is matched against each template in the template library, a target template whose sentence pattern matches the voice information is found, intention recognition can be performed on the voice information according to that target template.
As a possible implementation manner, when performing intent recognition on voice information according to a target template matched with the voice information, the slot filling content of the slot may be extracted and obtained according to the slot position set in the target template and a text position corresponding to the slot position in the voice information. Further, the content of the slot filling of the slot position can be used as the intention of the voice information. Therefore, the intention of the voice information can be accurately identified through the slot filling content of the slot position, and the interaction accuracy between the user and the page is improved.
As an example, suppose the acquired voice information is "train ticket from Beijing to Shanghai this Friday", and the template library contains the template "train ticket from {city name} to {city name} on {time}", whose sentence pattern exactly matches the voice information. Then, according to the slots set in the template, the slot-filling content is extracted from the corresponding text positions in the voice information, so that the intention of the voice information can be recognized from the slot-filling content. For example, intention recognition on this voice information according to the template may yield:
Intention 1: input departure date; slot 1: this Friday;
Intention 2: input departure city; slot 2: Beijing;
Intention 3: input arrival city; slot 3: Shanghai.
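The template match and slot extraction above can be pictured as pattern matching over the sentence. A minimal regex-based sketch follows; the English rendering of the template and the slot names are illustrative only, and a production system would use a trained matcher over the whole template library:

```python
import re

# English rendering of the template "train ticket from {city} to {city} on {time}".
TEMPLATE = re.compile(
    r"train ticket from (?P<departure>\w+) to (?P<arrival>\w+) (?P<date>this \w+)"
)

def match_template(text: str):
    """Return per-slot intentions if the sentence pattern matches, else None."""
    m = TEMPLATE.search(text)
    if m is None:
        return None  # no target template: fall back to the semantic model
    return [
        {"intention": "input departure date", "slot": m["date"]},
        {"intention": "input departure city", "slot": m["departure"]},
        {"intention": "input arrival city", "slot": m["arrival"]},
    ]

print(match_template("train ticket from Beijing to Shanghai this Friday"))
```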
As another possible implementation manner, when performing intent recognition on voice information according to a target template matched with the voice information, an intent corresponding to the target template may be used as an intent of the voice information.
As an example, if the voice information is "click", the voice information is matched with each template in the template library, and it is determined that there is a target template matching the voice information as "X click", the intention of the target template may be regarded as the intention of the voice information.
Step 305, if no target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information using a semantic model.
In the embodiments of this application, if, after the voice information is matched against each template in the template library, no target template with a matching sentence pattern is found, the semantic model is used to perform intention recognition on the voice information.
As an example, assume the voice information is "go from Shanghai". After it is matched against each template in the template library, it is determined that no target template in the library has a matching sentence pattern; in this case the voice information cannot be recognized with a target template, so the semantic model is used instead. Intention recognition is thus achieved for different kinds of voice information, which improves its accuracy.
As one possible implementation, word segmentation is first performed on the voice information to obtain the semantic features and part-of-speech features of each word; the semantic and part-of-speech features of each word are then input into the semantic model, and the intention of the voice information is determined from the model's output.
It should be understood that natural language processing aims to let computers understand human language, that is, to read text and grasp the meaning behind it as a person would. Only by understanding the meaning of each word can the meaning of the whole sentence be grasped; therefore, for a computer to understand human text, it must accurately grasp the meaning of each word, and in natural language processing the voice information must be segmented into words before it is processed.
Optionally, the voice information may be segmented with a statistics-based method, the statistical samples coming from a standard corpus, or with a dictionary-based segmentation method, to obtain the semantic features and part-of-speech features of each word.
The semantic model in this application is trained in advance on a large number of training samples, so the trained model can accurately recognize the intention behind the word features.
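A sketch of the semantic-model path: segment the utterance, collect each word's features, and feed them to a trained classifier. jieba is one real third-party Chinese segmenter that returns word/part-of-speech pairs; the classifier's interface is an assumption for this example:

```python
import jieba.posseg as pseg  # third-party Chinese segmenter, one possible choice

def extract_features(text: str) -> list:
    """(word, part-of-speech) features for each segmented word."""
    return [(pair.word, pair.flag) for pair in pseg.lcut(text)]

def recognize_intent(text: str, semantic_model) -> str:
    """semantic_model is a pre-trained classifier; predict() is assumed."""
    features = extract_features(text)
    return semantic_model.predict(features)  # hypothetical model interface
```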
With the page interaction method of this embodiment, the template library is obtained and each template in it is matched against the voice information; if a target template whose sentence pattern matches the voice information exists in the library, intention recognition is performed on the voice information according to the target template, and if no such target template exists, intention recognition is performed using a semantic model. Applying different intention recognition approaches to different kinds of voice information improves the accuracy of intention recognition.
On the basis of the above embodiments, the fourth embodiment of the present application provides another web page interaction method.
Fig. 4 is a schematic flowchart of a web page interaction method provided in the fourth embodiment of the present application.
As shown in fig. 4, the web page interaction method, executed by a server module or a client, may include the following steps:
step 401, obtaining executable interactive content obtained by the page front end identification page.
In this application, the page front end identifies the page to obtain the executable interactive content of the page and then sends this content to the server module, so the server module obtains the executable interactive content produced by the page front end identifying the page.
As an example, an HTML page may be parsed by an HTML Parser module at the front of the page to obtain executable interactive content for the page.
It should be explained that the executable interactive content of a page may be page-operation content, such as sliding, refreshing, going forward, and going back; page-click content, such as "query now" and tab switching; slot-filling content, such as departure place, arrival place, and departure date; or other types of interactive content, which are not detailed here.
Step 402, acquiring voice information.
It should be explained that, in this embodiment, the execution sequence of the step 401 and the step 402 is not limited, and the process of acquiring the voice information in the step 402 may be executed first, and then the process of acquiring the executable interactive content obtained by the page front-end identification page in the step 401 is executed.
For example, when a user performs voice interaction with a page, the voice module may acquire voice information input by the user after the page front end of the electronic device acquires executable interactive content of the page; or after the voice module of the electronic device acquires the voice information input by the user, the front end of the page acquires the executable interactive content of the page, which is not limited in this embodiment.
In step 403, intention recognition is performed on the voice information to determine a target interaction matching the intention of the voice information from the executable interaction content.
In the embodiment of the present application, the implementation processes of step 402 and step 403 may refer to the implementation processes of step 102 and step 103 in the first embodiment, and are not described herein again.
Step 404, sending an interaction instruction of the target interaction to the page front end, wherein the interaction instruction is used for executing the target interaction on the page.
In the method, the server module performs intention recognition on the voice information, determines target interaction matched with the intention of the voice information from executable interactive contents, and then sends an interactive instruction corresponding to the target interaction to the page front end, so that the page front end executes the target interaction on the page according to the interactive instruction.
For example, the server module performs intention recognition on voice information, determines an interactive instruction corresponding to a target interaction matched with the intention of the voice information from the executable interactive content as a click instruction, and sends the instruction to the page front end, so that the page front end executes the click instruction on the page.
With the page interaction method of this embodiment, the executable interactive content produced by the page front end identifying the page is obtained; voice information is acquired; intention recognition is performed on the voice information so as to determine, from the executable interactive content, the target interaction matching the intention of the voice information; and the interaction instruction of the target interaction is sent to the page front end, where it is used to execute the target interaction on the page. Because the server module determines the instruction for interacting with the page and sends it to the page front end, the target interaction is executed on the page, interaction between the user and the page by voice is realized, and the efficiency of interaction between the user and the page is improved.
On the basis of the above embodiments, the present application provides a web page interaction method.
Fig. 5 is a schematic flowchart of a web page interaction method provided in the fifth embodiment of the present application.
As shown in fig. 5, the web page interaction method may include the following steps:
step 501, executable interactive content obtained by a page front-end identification page is obtained.
In the embodiment of the application, the front end of the page analyzes the page to obtain the operation responded by each interactive element, and then the operation responded by each interactive element is used as the interactive operation executable by each interactive element.
After the page is analyzed and a plurality of interactive elements in the page are determined, the text description of each interactive element obtained by analyzing the page can be obtained. Furthermore, the text description of each interactive element is used as the interactive purpose of each interactive element. After the interactive operation of each interactive element and the interactive purpose of each interactive element in the page are determined, each interactive element can be classified to generate a list of executable interactive contents of the page.
After the page front end parses the page to obtain the executable interactive content of the page, it sends this content to the server module, so the server module obtains the executable interactive content produced by the page front end identifying the page.
Step 502, acquiring voice information.
It should be explained that, in this embodiment, the execution sequence of the step 501 and the step 502 is not limited, and the process of acquiring the voice information in the step 502 may be executed first, and then the process of acquiring the executable interactive content obtained by the page front-end identification page in the step 501 may be executed.
For example, when a user performs voice interaction with a page, the voice module may acquire voice information input by the user after the page front end of the electronic device acquires executable interactive content of the page; or after the voice module of the electronic device acquires the voice information input by the user, the front end of the page acquires the executable interactive content of the page, which is not limited in this embodiment.
Step 503, performing intention recognition on the voice information so as to determine, according to the executable interactive content, the interactive operations executable by the target element in the page.
Step 504, determining the target operation matching the intention from the interactive operations executable by the target element.
Step 505, generating an interactive instruction of the target interaction according to the target element and the target operation.
In the embodiment of the present application, the implementation process of step 502 to step 505 may refer to the implementation process of step 204 to step 207, which is not described herein again.
Step 506, sending an interaction instruction of the target interaction to the page front end, wherein the interaction instruction is used for executing the target interaction on the page.
In the embodiment of the present application, the implementation process of step 506 may refer to the implementation process of step 404 in the foregoing embodiment, and details are not described here.
Therefore, the interaction instruction interacting with the page is determined by the server module and sent to the front end of the page, so that target interaction is executed on the page, interaction between the user and the page in a voice interaction mode is realized, and the efficiency of interaction between the user and the page is improved.
As an example, referring to fig. 6, fig. 6 is an exemplary diagram of a page interaction process provided in a sixth embodiment of the present application.
As shown in fig. 6, the page interaction method includes the following steps:
step 601, the front end of the page analyzes the page to obtain the executable interactive content of the page.
Step 602, the voice module records and identifies the voice input by the user, and sends the voice information to the server module after obtaining the voice information.
Step 603, the server module matches each template in the template library against the voice information.
Step 604, if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template.
Step 605, if no target template whose sentence pattern matches the voice information exists in the template library, performing feature extraction on the voice information.
Step 606, the extracted features are input into a semantic model to obtain the intent of the voice information.
In the application, after the server module determines the intention of the voice information, the target interaction matched with the intention of the voice information is determined from the executable interactive content.
In step 607, the front end of the page performs the target interaction on the page.
Therefore, voice information input by a user is acquired through the voice module of the electronic equipment, the intention recognition is carried out on the voice information through the server module, the target interaction matched with the intention of the voice information is determined from the executable interaction content of the page, the target interaction is carried out on the page at the front end of the page, the interaction between the user and the page in a voice mode is realized, and the interaction efficiency between the user and the page is improved.
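Putting the sketches above together, the flow of fig. 6 could be orchestrated roughly as follows; the selection of the target element is deliberately simplistic here, and every function name comes from the earlier sketches (assumptions for illustration, not the application's API):

```python
def handle_voice_turn(page_html: str, utterance: str, semantic_model):
    # Step 601: the page front end parses the page for interactive content.
    parser = InteractiveContentParser()
    parser.feed(page_html)
    catalog = build_interaction_list(parser.elements)

    # Steps 603-606: try the template library first, then the semantic model.
    intention = match_template(utterance)
    if intention is None:
        intention = recognize_intent(utterance, semantic_model)

    # Step 607: pick the target element whose purpose fits the intention
    # (a naive substring test stands in for real matching) and build the
    # instruction for the page front end to execute.
    for element in catalog.get("click", []):
        if element["purpose"] and element["purpose"] in str(intention):
            return make_instruction(element, "click")
    return None
```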
As an example, fig. 7 is a schematic diagram of interaction between a user and a page provided in the embodiment of the present application.
As can be seen from fig. 7, after the user inputs voice to the page of the electronic device, and the page acquires the voice information input by the user, the target interaction matching the intention of the voice information is determined from the executable interaction content by performing intention recognition on the voice information, that is, the target interaction can be executed on the page. Therefore, the purpose that the user interacts with the page in a voice mode is achieved.
In order to implement the above embodiments, the present application provides a page interaction apparatus.
Fig. 8 is a schematic structural diagram of a page interaction apparatus according to the seventh embodiment of the present application.
As shown in fig. 8, the page interaction apparatus 600 may include: a first acquisition module 610, a second acquisition module 620, an intent recognition module 630, and an execution module 640.
The first obtaining module 610 is configured to obtain executable interactive content of a page.
And a second obtaining module 620, configured to obtain the voice information.
And an intention recognition module 630, configured to perform intention recognition on the voice information to determine, from the executable interaction content, a target interaction matching an intention of the voice information.
And the execution module 640 is used for executing target interaction on the page.
As a possible scenario, the intention identifying module 630 further includes:
the first determining unit is used for determining executable interactive operation of the target element in the page according to the executable interactive content;
a second determination unit, configured to determine, from the interactive operations executable by the target element, a target operation matching the intention;
and the first generating unit is used for generating an interactive instruction of target interaction according to the target element and the target operation.
As another possible case, the intention identifying module 630 further includes:
and the third determining unit is used for determining the interaction purposes of the interactive elements in the page according to the executable interaction content.
And a fourth determining unit, configured to determine a target element from the multiple interactive elements, wherein the interaction purpose of the target element matches the intention.
As another possible scenario, the first obtaining module 610 includes:
and the analysis unit is used for analyzing the page to obtain the response operation of each interactive element.
And the fifth determining unit is used for taking the operation responded by each interactive element as the interactive operation executable by each interactive element.
And the second generating unit is used for generating a list of executable interactive contents according to the interactive operation which can be executed by each interactive element.
As another possible case, the second generating unit is further configured to:
acquiring text description of each interactive element obtained by analyzing the page;
using the text description of each interactive element as the interactive purpose of each interactive element;
and generating a list of executable interactive contents according to the interactive operation of each interactive element and the interactive purpose of each interactive element.
As another possible case, the intention identifying module 630 may further include:
and the second acquisition unit is used for acquiring the template library.
And the matching unit is used for matching each template in the template library with the voice information respectively.
The intention recognition unit is used for performing intention recognition on the voice information according to the target template if a target template whose sentence pattern matches the voice information exists in the template library; if no such target template exists in the template library, intention recognition is performed on the voice information using a semantic model.
As another possible case, the intention identifying unit is further configured to:
extracting the slot filling content of the slot position according to the slot position set in the target template and the text position corresponding to the slot position in the voice information; the content of the slot filling of the slot position is used as the intention of voice information;
or, the intention corresponding to the target template is used as the intention of the voice information.
As another possible case, the intention identifying unit is further configured to:
segmenting the voice information to obtain the characteristics of each segmented word; the features include semantic features and part-of-speech features;
and inputting the characteristics of each participle into a semantic model to obtain the intention of the voice information.
It should be noted that the explanation of the page interaction method in the first to third embodiments also applies to the page interaction apparatus of this embodiment, and is not repeated here.
The page interaction apparatus of this embodiment of the application obtains the executable interactive content of a page, obtains voice information, performs intention recognition on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information, and executes the target interaction on the page. The page thus carries out the target interaction matching the intention of the user's voice input, enabling the user to interact with the web page by voice and improving the efficiency of interaction between the user and the page.
In order to implement the above embodiments, the present application provides another page interaction apparatus.
Fig. 9 is a schematic structural diagram of a page interaction apparatus according to a seventh embodiment of the present application.
As shown in fig. 9, the page interaction apparatus 700 may include: an interaction obtaining module 710, a voice obtaining module 720, a recognition module 730, and a sending module 740.
The interaction obtaining module 710 is configured to obtain the executable interactive content that the front end of the page obtains by identifying the page;
the voice obtaining module 720 is configured to obtain voice information;
the recognition module 730 is configured to perform intention recognition on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information;
and the sending module 740 is configured to send an interaction instruction of the target interaction to the front end of the page, the interaction instruction being used to execute the target interaction on the page.
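To make the division of labor concrete, the following TypeScript sketch shows one assumed request/response exchange between the page front end and the server-side modules; the message shapes are invented for this example, and intention matching is reduced to substring overlap purely for illustration.

interface InteractionRequest {
  executableContent: { selector: string; operations: string[]; purpose: string }[];
  voiceText: string; // recognized text of the user's voice information
}

interface InteractionResponse {
  selector: string;  // the target element matched to the intention
  operation: string; // the target operation to execute on it
}

// Server side: recognize the intention, pick the matching target interaction,
// and build the instruction to send back to the page front end.
function handleInteraction(req: InteractionRequest): InteractionResponse | null {
  const target = req.executableContent.find(
    (e) => e.purpose.length > 0 && req.voiceText.includes(e.purpose),
  );
  return target ? { selector: target.selector, operation: target.operations[0] } : null;
}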
As a possible implementation, the recognition module 730 may further be configured to:
determine, according to the executable interactive content, the interactive operations executable by a target element in the page;
determine, from the interactive operations executable by the target element, a target operation matching the intention;
and generate an interaction instruction of the target interaction according to the target element and the target operation.
As another possible implementation, the recognition module 730 may further be configured to:
determine, according to the executable interactive content, the interaction purposes of a plurality of interactive elements in the page;
and determine the target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention.
As another possible implementation, the interaction obtaining module 710 may further be configured to:
analyze the page to obtain the operations that each interactive element responds to;
take the operations that each interactive element responds to as the interactive operations executable by that element;
and generate a list of the executable interactive content according to the interactive operations executable by each interactive element.
As another possible implementation, the interaction obtaining module 710 may further be configured to:
obtain the text description of each interactive element by analyzing the page;
take the text description of each interactive element as that element's interaction purpose;
and generate the list of the executable interactive content according to the interactive operations and the interaction purpose of each interactive element.
As another possible implementation, the recognition module 730 may further be configured to:
obtain a template library;
match each template in the template library against the voice information;
if a target template whose sentence pattern matches the voice information exists in the template library, perform intention recognition on the voice information according to the target template;
and if no target template whose sentence pattern matches the voice information exists in the template library, perform intention recognition on the voice information by using a semantic model.
The page interaction apparatus of this embodiment of the application obtains the executable interactive content that the front end of the page obtains by identifying the page, obtains voice information, performs intention recognition on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information, and sends an interaction instruction of the target interaction to the front end of the page, the interaction instruction being used to execute the target interaction on the page. The interaction instruction is thus determined on the server side and sent to the front end of the page, so that the target interaction is executed on the page, enabling the user to interact with the page by voice and improving the efficiency of interaction between the user and the page.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 10 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 10, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 10, one processor 801 is taken as an example.
The memory 802 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the page interaction method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the page interaction method provided herein.
The memory 802, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the page interaction method in the embodiments of the present application (e.g., the first obtaining module 610, the second obtaining module 620, the intention recognition module 630, and the execution module 640 shown in fig. 8, and the interaction obtaining module 710, the voice obtaining module 720, the recognition module 730, and the sending module 740 shown in fig. 9). The processor 801 executes the various functional applications and data processing of the server, i.e., implements the page interaction method in the above method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 802.
The memory 802 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device for page interaction, and the like. Further, the memory 802 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 802 may optionally include memories remotely located with respect to the processor 801, and these remote memories may be connected to the electronic device for page interaction via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the page interaction method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other manners; in fig. 10, connection by a bus is taken as an example.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for page interaction, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 804 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the application, the executable interactive content of a page is obtained; voice information is obtained; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. The page thus carries out the target interaction matching the intention of the user's voice input, enabling the user to interact with the web page by voice and improving the efficiency of interaction between the user and the page.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (25)

1. A page interaction method is characterized by comprising the following steps:
acquiring executable interactive content of a page;
acquiring voice information;
performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
and executing the target interaction on the page.
2. The page interaction method of claim 1, wherein the determining, from the executable interactive content, a target interaction matching the intention of the voice information comprises:
determining executable interactive operation of target elements in the page according to the executable interactive content;
determining a target operation matched with the intention from the interactive operations executable by the target element;
and generating an interactive instruction of the target interaction according to the target element and the target operation.
3. The page interaction method according to claim 2, wherein before determining the interactive operation that can be performed by the target element in the page, the method further comprises:
determining the interaction purpose of a plurality of interaction elements in the page according to the executable interaction content;
determining the target element from a plurality of the interactive elements, wherein the interaction purpose of the target element matches the intention.
4. The page interaction method according to claim 1, wherein the obtaining of the executable interaction content of the page comprises:
analyzing the page to obtain the response operation of each interactive element;
taking the operation responded by each interactive element as the interactive operation executable by each interactive element;
and generating the list of the executable interactive contents according to the executable interactive operation of each interactive element.
5. The page interaction method according to claim 4, wherein the generating the list of executable interactive contents according to the interactive operation executable by each interactive element comprises:
acquiring text description of each interactive element obtained by analyzing the page;
using the text description of each interactive element as the interactive purpose of each interactive element;
and generating the list of the executable interactive content according to the interactive operation of each interactive element and the interactive purpose of each interactive element.
6. The page interaction method according to any one of claims 1 to 5, wherein the performing intention recognition on the voice information comprises:
acquiring a template library;
matching each template in the template library with the voice information respectively;
if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template;
and if no target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information by using a semantic model.
7. The page interaction method according to claim 6, wherein the performing intention recognition on the voice information according to the target template comprises:
extracting the slot-filling content according to the slot set in the target template and the text position in the voice information corresponding to the slot, and taking the slot-filling content as the intention of the voice information;
or taking the intention corresponding to the target template as the intention of the voice information.
8. The page interaction method according to claim 6, wherein the performing intention recognition on the voice information by using a semantic model comprises:
segmenting the voice information to obtain the features of each segmented word, the features comprising semantic features and part-of-speech features;
and inputting the features of each segmented word into the semantic model to obtain the intention of the voice information.
9. A page interaction method is characterized by comprising the following steps:
acquiring the executable interactive content obtained by the front end of a page by identifying the page;
acquiring voice information;
performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
sending an interaction instruction of the target interaction to the front end of the page; and the interaction instruction is used for executing the target interaction on the page.
10. The page interaction method of claim 9, wherein the determining, from the executable interactive content, a target interaction matching the intention of the voice information comprises:
determining executable interactive operation of target elements in the page according to the executable interactive content;
determining a target operation matched with the intention from the interactive operations executable by the target element;
and generating an interactive instruction of the target interaction according to the target element and the target operation.
11. The page interaction method according to claim 10, wherein before determining the interactive operation that can be performed by the target element in the page, the method further comprises:
determining the interaction purpose of a plurality of interaction elements in the page according to the executable interaction content;
determining the target element from a plurality of the interactive elements, wherein the interaction purpose of the target element matches the intention.
12. The page interaction method according to claim 9, wherein the acquiring of the executable interactive content obtained by the front end of the page by identifying the page comprises:
analyzing the page to obtain the response operation of each interactive element;
taking the operation responded by each interactive element as the interactive operation executable by each interactive element;
and generating the list of the executable interactive contents according to the executable interactive operation of each interactive element.
13. The page interaction method according to claim 12, wherein the generating the list of executable interactive contents according to the interactive operation executable by each interactive element comprises:
acquiring text description of each interactive element obtained by analyzing the page;
using the text description of each interactive element as the interactive purpose of each interactive element;
and generating the list of the executable interactive content according to the interactive operation of each interactive element and the interactive purpose of each interactive element.
14. The page interaction method according to any one of claims 9 to 13, wherein the performing intention recognition on the voice information comprises:
acquiring a template library;
matching each template in the template library with the voice information respectively;
if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template;
and if no target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information by using a semantic model.
15. A page interaction apparatus, comprising:
the first acquisition module is used for acquiring executable interactive content of the page;
the second acquisition module is used for acquiring voice information;
the intention recognition module is used for carrying out intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interactive content;
and the execution module is used for executing the target interaction on the page.
16. The page interaction apparatus of claim 15, wherein the intention recognition module further comprises:
the first determining unit is used for determining executable interactive operation of the target element in the page according to the executable interactive content;
a second determining unit, configured to determine, from the interactive operations executable by the target element, a target operation matching the intention;
and the first generating unit is used for generating the interactive instruction of the target interaction according to the target element and the target operation.
17. The page interaction apparatus of claim 16, wherein the intention recognition module further comprises:
a third determining unit, configured to determine, according to the executable interactive content, interactive destinations of multiple interactive elements in the page;
a fourth determining unit, configured to determine the target element from the plurality of interactive elements, wherein an interaction purpose of the target element matches the intention.
18. The page interaction device according to claim 15, wherein the first obtaining module comprises:
the analysis unit is used for analyzing the page to obtain the response operation of each interactive element;
a fifth determining unit, configured to use the operation responded by each interactive element as an interactive operation executable by each interactive element;
and the second generating unit is used for generating the list of the executable interactive contents according to the interactive operation which can be executed by each interactive element.
19. The page interaction device according to claim 18, wherein the second generating unit is further configured to:
acquiring text description of each interactive element obtained by analyzing the page;
using the text description of each interactive element as the interactive purpose of each interactive element;
and generating the list of the executable interactive content according to the interactive operation of each interactive element and the interactive purpose of each interactive element.
20. The page interaction device of any one of claims 15 to 19, wherein the intention recognition module further comprises:
a first obtaining unit, configured to obtain a template library;
a matching unit, configured to match each template in the template library against the voice information;
and an intention recognition unit, configured to perform intention recognition on the voice information according to a target template if a target template whose sentence pattern matches the voice information exists in the template library, and to perform intention recognition on the voice information by using a semantic model if no such target template exists.
21. The page interaction device of claim 20, wherein the intention recognition unit is further configured to:
extract the slot-filling content according to the slot set in the target template and the text position in the voice information corresponding to the slot, and take the slot-filling content as the intention of the voice information;
or take the intention corresponding to the target template as the intention of the voice information.
22. The page interaction device of claim 20, wherein the intention recognition unit is further configured to:
segment the voice information to obtain the features of each segmented word, the features comprising semantic features and part-of-speech features;
and input the features of each segmented word into the semantic model to obtain the intention of the voice information.
23. A page interaction apparatus, comprising:
the interaction acquisition module is used for acquiring executable interaction content obtained by the front-end identification page of the page;
the voice acquisition module is used for acquiring voice information;
the recognition module is used for performing intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interactive content;
the sending module is used for sending the interactive instruction of the target interaction to the front end of the page; and the interaction instruction is used for executing the target interaction on the page.
24. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the page interaction method of any one of claims 1-8 or the page interaction method of any one of claims 9-14.
25. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the page interaction method of any one of claims 1 to 8 or the page interaction method of any one of claims 9 to 14.
CN202010356398.8A 2020-04-29 2020-04-29 Page interaction method and device and electronic equipment Active CN111611468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010356398.8A CN111611468B (en) 2020-04-29 2020-04-29 Page interaction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111611468A 2020-09-01
CN111611468B (en) 2023-08-25

Family

ID=72205489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010356398.8A Active CN111611468B (en) 2020-04-29 2020-04-29 Page interaction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111611468B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2816552A1 (en) * 2013-06-20 2014-12-24 2236008 Ontario Inc. Conditional multipass automatic speech recognition
US9940396B1 (en) * 2014-12-10 2018-04-10 Amazon Technologies, Inc. Mining potential user actions from a web page
WO2016135746A2 (en) * 2015-02-27 2016-09-01 Keypoint Technologies India Pvt. Ltd. Contextual discovery
CN105898609A (en) * 2015-12-14 2016-08-24 乐视网信息技术(北京)股份有限公司 Method and client realizing voice interaction in video live broadcast process
US20180157959A1 (en) * 2016-12-01 2018-06-07 Shanghai Xiaoi Robot Technology Co., Ltd. Intelligent interaction method and intelligent interaction system
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN108364645A (en) * 2018-02-08 2018-08-03 北京奇安信科技有限公司 A kind of method and device for realizing page interaction based on phonetic order
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN109325212A (en) * 2018-08-30 2019-02-12 北京车和家信息技术有限公司 Information interacting method, device, electronic equipment and browser

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Guoqiang Di, "Design and Implementation of Voice Web Pages for Online Shopping Based on .NET and Streaming Media", 2008 International Conference on Management of e-Commerce and e-Government *
Zhang Baoli et al., "Research and Design of a Voice Interaction Platform" (语音交互平台研究与设计), Cable Television Technology (有线电视技术) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966939A (en) * 2020-09-18 2020-11-20 北京百度网讯科技有限公司 Page skipping method and device
CN112114926A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Page operation method, device, equipment and medium based on voice recognition
CN112114926B (en) * 2020-09-25 2024-08-09 阿波罗智联(北京)科技有限公司 Page operation method, device, equipment and medium based on voice recognition
CN112689177A (en) * 2021-01-14 2021-04-20 海信电子科技(深圳)有限公司 Method for realizing rapid interaction and display equipment
CN112689177B (en) * 2021-01-14 2023-11-03 海信电子科技(深圳)有限公司 Method for realizing quick interaction and display equipment
CN113126765A (en) * 2021-04-22 2021-07-16 北京云迹科技有限公司 Multi-modal input interaction method and device, robot and storage medium
CN114489557A (en) * 2021-12-15 2022-05-13 青岛海尔科技有限公司 Voice interaction method, device, equipment and storage medium
CN114489557B (en) * 2021-12-15 2024-03-22 青岛海尔科技有限公司 Voice interaction method, device, equipment and storage medium
CN116467024A (en) * 2023-03-17 2023-07-21 湖北坤盈数字科技有限公司 Page interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111611468B (en) 2023-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant