CN114996622B

CN114996622B - Information acquisition method, value network model training method and electronic equipment

Info

Publication number: CN114996622B
Application number: CN202210920138.8A
Authority: CN
Inventors: 余浩; 王健
Original assignee: Beijing Hongji Information Technology Co ltd
Current assignee: Beijing Hongji Information Technology Co ltd
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2022-11-11
Anticipated expiration: 2042-08-02
Also published as: CN114996622A

Abstract

The application provides an information acquisition method, a value network model training method and electronic equipment. The method obtains global semantic information of a first webpage and control semantic information of each control object in the first webpage through a semantic extraction model; judges whether the first webpage contains target content to be acquired according to the global semantic information of the first webpage; if the first webpage does not contain the target content, predicts, through a value network model, a score for selecting each control object according to the global semantic information and the control semantic information of each control object; and executes an operation on the control object with the highest score in the first webpage, jumps to a second webpage, and continues to judge whether the second webpage contains the target content until the target content is acquired. The scheme simulates human understanding of webpages and human click operations to jump between webpages, so that the webpage where the target content is located is reached as quickly as possible and the convenience of information acquisition is improved.

Description

Information acquisition method, value network model training method and electronic equipment

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to an information acquisition method, a value network model training method, an electronic device, and a computer-readable storage medium.

Background

Human beings browse information and find desired content on the internet by understanding the content of web pages (images, text, GUI controls, etc.) and jumping directly from one web page to another.

Search engines can be used to quickly find web page content, but some content is access-controlled or dynamically generated and cannot be obtained by search engines.

We can develop an intelligent device that can understand the content and layout of a web page like a human being, know how to go to the most likely next page, and further determine whether there is information needed by the user, and so on until the target information is found.

Human-like intelligent web browsing and information acquisition requires a reliable artificial intelligence system. It requires not only natural language understanding but also image recognition to understand web page elements. It is a technology close to general intelligence and has wide application prospects. The following is an example application. A system integration vendor provides IT system operation and maintenance services for other enterprises. This work may involve different operating systems and various software and hardware (e.g., network devices). The software and hardware come in many versions and models, and there are many compatibility problems among them. Software and hardware vendors typically publish these compatibility issues on their own websites, each in its own format, and keep the compatibility information up to date. System integration vendors want to manage the compatibility information of device vendors in a uniform manner for use by technicians. This work is usually done manually, but because the equipment is varied and updates are frequent, it is very time-consuming and labor-intensive. An automatic compatibility information capturing robot would be very useful: given the official website addresses of all vendors, the robot automatically browses web pages until a target web page is found, and then captures and stores the compatibility information.

Generally, crawler technology and search engine technology are used to complete the above operations. With crawler technology, however, a person needs to visit the relevant web page in advance to configure the content to be crawled, and once the web page content is updated, the crawler fails. Search engine technology cannot process dynamic web page content and cannot access pages with access control. Therefore, for web pages that are access-controlled or change dynamically, neither crawler technology nor search engine technology is applicable.

Disclosure of Invention

The embodiment of the application provides an information acquisition method, which is used to quickly understand web page content, jump between web pages, and find the target information being sought.

The embodiment of the application provides an information acquisition method, which comprises the following steps:

loading a first webpage, and obtaining global semantic information of the first webpage and control semantic information of each control object in the first webpage through a semantic extraction model;

judging whether the first webpage contains target content to be acquired or not according to the global semantic information of the first webpage;

if the first webpage does not contain the target content to be acquired, predicting, through a value network model, a score for selecting each control object according to the global semantic information and the control semantic information of each control object;

and executing operation on the control object with the highest score in the first webpage, jumping to a second webpage, and continuously judging whether the second webpage contains target content to be acquired until the target content is acquired.
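The steps above can be sketched as a simple navigation loop. This is only a minimal illustration under stated assumptions: the toy website `TOY_SITE`, the fixed scores in `TOY_SCORES`, the function names, and the `max_steps` cap are all hypothetical stand-ins for the real browser, the semantic extraction model, and the value network model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy website: page -> {control label -> page reached by clicking it}.
TOY_SITE: Dict[str, Dict[str, str]] = {
    "home": {"news": "news_page", "about": "about_page"},
    "news_page": {"tech": "target_page"},
    "about_page": {},
    "target_page": {},
}

# Hypothetical fixed scores standing in for the value network model's output.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("home", "news"): 0.9,
    ("home", "about"): 0.1,
    ("news_page", "tech"): 0.8,
}


def acquire(
    start_page: str,
    contains_target: Callable[[str], bool],
    controls_of: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    click: Callable[[str, str], str],
    max_steps: int = 20,  # assumed safety cap, not part of the described method
) -> Optional[str]:
    """Follow the highest-scoring control until a page with the target is found."""
    page = start_page
    for _ in range(max_steps):
        if contains_target(page):
            return page
        controls = controls_of(page)
        if not controls:
            return None  # dead end: no control object to operate on
        best = max(controls, key=lambda control: score(page, control))
        page = click(page, best)
    return page if contains_target(page) else None


def toy_acquire(start_page: str) -> Optional[str]:
    """Run the loop on the toy website defined above."""
    return acquire(
        start_page,
        contains_target=lambda p: p == "target_page",
        controls_of=lambda p: list(TOY_SITE[p]),
        score=lambda p, c: TOY_SCORES[(p, c)],
        click=lambda p, c: TOY_SITE[p][c],
    )
```

In this sketch, `toy_acquire("home")` follows "news" and then "tech" to reach the target page, while starting from a page without control objects yields `None`.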

In one embodiment, the loading the first webpage includes:

loading a website home page corresponding to the website entry address according to the input website entry address;

obtaining global semantic information of the website home page and control semantic information of each control object in the website home page through a semantic extraction model;

predicting, through a value network model, a score for selecting each control object according to the global semantic information of the website home page and the control semantic information of each control object in the website home page;

and executing operation on the control object with the highest score in the website home page, and jumping to a first webpage from the website home page.

In an embodiment, the determining, according to the global semantic information of the first web page, whether the first web page includes target content to be acquired includes:

calculating a first similarity between the global semantic information of the first webpage and the target content;

and determining whether the first webpage contains the target content according to the size of the first similarity.

In an embodiment, before the obtaining, by the semantic extraction model, the global semantic information of the first web page and the control semantic information of each control object in the first web page, the method further includes:

acquiring sample content and a sample website;

and training an initial value network according to the sample content and the sample website to obtain the value network model from training the initial value network.

In an embodiment, the training an initial value network according to the sample content and the sample website to obtain a value network model obtained by the initial value network training includes:

and alternately training a pre-training model and an initial value network according to the sample content and the sample website to obtain a semantic extraction model obtained by training the pre-training model and a value network model obtained by training the initial value network.

loading a sample website home page corresponding to the sample website, and obtaining the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page through a pre-training model;

predicting, through the initial value network, a score for selecting each control object according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page;

selecting a control object with the highest score from the sample website home page according to a first preset probability, executing operation on the selected control object, and jumping to a new web page from the sample website home page;

and updating the initial value network according to the semantic similarity between the new web page and the sample content, and continuing to select control objects and jump between web pages starting from the new web page until a training target is reached, so as to obtain the value network model from training the initial value network.

In an embodiment, the updating the initial value network according to the semantic similarity between the new web page and the sample content includes:

extracting global semantic information of the new web page through the pre-training model, and calculating the semantic similarity between the global semantic information of the new web page and the sample content;

determining a reward value for executing the operation on the selected control object according to the semantic similarity between the global semantic information of the new web page and the sample content;

and updating a loss function and adjusting the parameters of the initial value network according to the reward value.

In an embodiment, the alternately training the pre-training model and the initial value network according to the sample content and the sample website includes:

loading a sample website home page corresponding to the sample website, and obtaining global semantic information of the sample website home page and control semantic information of each control object in the sample website home page through a pre-training model;

predicting the webpage content of the next state according to the sample website home page and the control object selected from the sample website home page;

updating the pre-training model according to the semantic similarity between the new webpage and the webpage content of the next state; and extracting semantic information from the sample website home page again based on the updated pre-training model, and training the initial value network.

In an embodiment, the extracting semantic information from the sample website home page again based on the updated pre-training model, and training the initial value network includes:

obtaining global semantic information of the sample website home page and control semantic information of each control object in the sample website home page through the updated pre-training model;

and updating the initial value network according to the semantic similarity between the new web page and the sample content, continuing to select a control object and skip web pages from the new web page, and updating the updated pre-training model again until a training target is reached.

The embodiment of the present application further provides a training method for a value network model, where the value network model is used to obtain target content from a website, and the method includes:

loading a sample website home page corresponding to a sample website, and obtaining global semantic information of the sample website home page and control semantic information of each control object in the sample website home page through a pre-training model;

selecting a control object with the highest score from the sample website home page according to a first preset probability, executing operation on the selected control object, and jumping to a new webpage from the sample website home page;

and updating the initial value network according to the semantic similarity between the new webpage and the sample content, and continuing to select control objects and jump between webpages starting from the new webpage until a training target is reached, so as to obtain the value network model from training the initial value network.

An embodiment of the present application further provides an information acquiring apparatus, including:

the semantic extraction module is used for loading a first webpage and obtaining the global semantic information of the first webpage and the control semantic information of each control object in the first webpage through a semantic extraction model;

the content evaluation module is used for judging whether the first webpage contains target content to be acquired or not according to the global semantic information of the first webpage;

the score prediction module is used for predicting, through a value network model, a score for selecting each control object according to the global semantic information and the control semantic information of each control object if the first webpage does not contain the target content to be acquired;

and the webpage jumping module is used for executing an operation on the control object with the highest score in the first webpage, jumping to a second webpage, and continuing to judge whether the second webpage contains the target content to be acquired until the target content is acquired.

The embodiment of the present application further provides a training apparatus for a value network model, where the value network model is used to obtain target content from a website, and the apparatus includes:

the webpage loading module is used for loading a sample website home page corresponding to a sample website and obtaining the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page through a pre-training model;

the score prediction module is used for predicting, through the initial value network, a score for selecting each control object according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page;

the webpage jumping module is used for selecting the control object with the highest score from the sample website home page according to a first preset probability, executing an operation on the selected control object, and jumping from the sample website home page to a new webpage;

and the network updating module is used for updating the initial value network according to the semantic similarity between the new web page and the sample content, and continuing to select control objects and jump between web pages starting from the new web page until a training target is reached, so as to obtain the value network model from training the initial value network.

An embodiment of the present application further provides an electronic device, where the electronic device includes:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the above-mentioned information acquisition method or training method of the value network model.

Embodiments of the present application further provide a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is executable by a processor to perform the above-mentioned information obtaining method or training method of a value network model.

According to the technical scheme provided by the embodiment of the application, the global semantic information of the first webpage and the control semantic information of each control object in the first webpage are obtained through the semantic extraction model; whether the first webpage contains target content to be acquired is judged according to the global semantic information of the first webpage; if the first webpage does not contain the target content, a score for selecting each control object is predicted through a value network model according to the global semantic information and the control semantic information of each control object; and an operation is executed on the control object with the highest score in the first webpage to jump to a second webpage, and whether the second webpage contains the target content continues to be judged until the target content is acquired. The scheme simulates human understanding of webpages and human click operations to jump between webpages, so that the webpage where the target content is located is reached as quickly as possible and the convenience of information acquisition is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.

FIG. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;

FIG. 2 is a web page topology diagram of a website provided by an embodiment of the present application;

FIG. 3 is a schematic flow chart of a training process provided by an embodiment of the present application;

FIG. 4 is a schematic illustration of a website home page;

FIG. 5 is a schematic diagram of the network module outputting scores corresponding to actions in a certain state;

FIG. 6 is a schematic diagram of the value of a network module output state and action combination;

FIG. 7 is a schematic diagram of a training process for a value network model;

FIG. 8 is a schematic flow chart of the alternative training of the semantic extraction model and the value network model;

FIG. 9 is a schematic diagram of a training principle of a semantic extraction model based on state transition;

FIG. 10 is a schematic flowchart of an information acquisition method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not construed as indicating or implying relative importance.

Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be configured to execute the information obtaining method provided in the embodiment of the present application. As shown in fig. 1, the electronic device 100 includes: one or more processors 102, and one or more memories 104 storing processor-executable instructions. Wherein the processor 102 is configured to execute the information obtaining method provided by the following embodiments of the present application.

The processor 102 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.

The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the information acquisition methods described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

In one embodiment, the electronic device 100 shown in FIG. 1 may further include an input device 106, an output device 108, and a data acquisition device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device 100 may have other components and structures as desired.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of an object and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.

In an embodiment, the components in the example electronic device 100 for implementing the information obtaining method of the embodiment of the present application may be integrally disposed, or may be disposed separately, such as the processor 102, the memory 104, the input device 106, and the output device 108 are integrally disposed in one body, and the data collecting device 110 is disposed separately.

In an embodiment, the example electronic device 100 for implementing the information acquisition method of the embodiment of the present application may be implemented as an intelligent terminal, such as a smart phone, a tablet computer, a desktop computer, a server, a vehicle-mounted device, and the like.

Fig. 2 is a web page topology diagram of a website according to an embodiment of the present application. A website is a web page topology graph (a directed cyclic graph) composed of a plurality of web pages (a home page and other web pages). Each web page can have one or more web page states, and the content of each web page state is different. A website can therefore be regarded as a web page state topology graph composed of several web page states, as shown in fig. 2. Performing an action A on a web page changes one web page state to the next, and any web page state can be rolled back to the previous web page state by the "back" action of the browser.
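As a minimal sketch of this state topology, a website can be modeled as a mapping from each web page state to the actions available in it, with a history stack providing the browser's "back" action. The class name `BrowserSession` and the dictionary layout are illustrative assumptions, not part of the described embodiment.

```python
from typing import Dict, List


class BrowserSession:
    """Walks a directed graph of web page states and supports a 'back' action."""

    def __init__(self, transitions: Dict[str, Dict[str, str]], start: str) -> None:
        # transitions[state][action] -> next state (cycles are allowed)
        self.transitions = transitions
        self.current = start
        self.history: List[str] = []

    def act(self, action: str) -> str:
        """Perform an action in the current state and move to the next state."""
        next_state = self.transitions[self.current][action]
        self.history.append(self.current)
        self.current = next_state
        return self.current

    def back(self) -> str:
        """Roll back to the previous state; stay in place if there is none."""
        if self.history:
            self.current = self.history.pop()
        return self.current
```

A graph such as `{"home": {"a": "p1"}, "p1": {"b": "home"}}` contains a cycle, and `back()` retraces the visited states in reverse order regardless of how the edges are directed.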

The following embodiments of the present application provide a solution that trains an agent (also called a robot) to find target content, starting from the entry address of a target website, by browsing and understanding web pages like a human being. Specifically, the agent is trained using reinforcement learning so that it can understand the content of different web pages and gradually learn which link or button to click in order to reach the web page with the required content most quickly. A certain web page of the target website contains the target content, which is the content that ultimately needs to be found (a whole web page or some content within a web page), and the home page of the target website can be obtained by loading the entry address. In one embodiment, the target website may be "Baidu", the entry address of the target website may be www.baidu.com, and the target content may be "the Baidu News web page".

Reinforcement learning is an iterative process. According to reinforcement learning techniques, each iteration solves two problems: evaluating the value function under a given policy, and updating the policy based on the value function. The policy is the behavior function of the agent, a mapping from states to actions that tells the agent how to pick the next action. The value function is used to calculate the reward generated by an action.

The training sample may include the portal sites and target content of a plurality of target web sites.

The training process and the application process are described in detail below.

Fig. 3 is a flowchart illustrating a training process according to an embodiment of the present application. As shown in fig. 3, the training process includes:

Step S310: loading a sample website home page corresponding to the sample website, and obtaining the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page through a pre-training model.

For differentiation, the entry website of the training phase may be referred to as a sample website, and the target content may be referred to as sample content. The sample website home page refers to the website home page of the sample website address.

The global semantic information refers to the main meaning expressed by the sample website home page as a whole, and can be represented by a vector. A control object is a web page object that controls web page jumping, which may be an in-page jump or page change, or an inter-page jump or rollback. FIG. 4 is a schematic illustration of a website home page. As shown in fig. 4, each portion marked by a black box is a control object, and control objects are divided by functional attribute into information presentation, information input, and control. Taking the search box shown in fig. 4 as an example, the search box object includes a text box object, an icon object, and a button object, where the text box object is an information input function object, the icon object is an intra-page jump control function object, and the button object is an inter-page jump control function object.

The control semantic information refers to the semantic information of a control object. For example, for a control object whose text is "Baidu", the semantic representation is the word vector representation of that text. If a control object contains a longer text, such as the headline "A is behind 5G: B was rejected", the text is taken as the object content and its semantics are represented by a sentence vector. If a control object is a picture, such as the camera icon in a search box, the picture content is first recognized, and the recognized words are then expressed as word vectors to obtain the semantic representation. In this way, control semantic information can be obtained for every control object, so that a control object can subsequently be selected according to its semantics and the jumping direction determined.

The global semantic information and the control semantic information can be obtained by extracting through a pre-training model. The pre-trained model may be considered as a model trained prior to step S310.

Step S320: predicting, through an initial value network, a score for selecting each control object according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page.

Specifically, the initial value network includes two parts, an actor (policy function) and a critic (value function). As shown in fig. 5, the global semantic information (state) and the control semantic information (action) of a given control object are used as input, and the actor evaluates the score for selecting that control object. The score represents the likelihood that selecting the control object leads to a target web page containing the target content. The scores of all control objects can be obtained through step S320. The loss function measures the difference between the reward value and the true value, and the gradient is the direction that narrows the difference between the predicted value and the true value. Back propagation is performed according to the loss function to pass the gradient and adjust the weight parameters of the actor. The calculation of the reward value is described below with reference to fig. 6.
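The scoring step can be illustrated with a deliberately small stand-in for the actor. The text does not specify the actor's architecture, so the single logistic layer over the concatenated state and action vectors used here, along with the names `actor_score` and `score_controls`, is purely an assumption for illustration.

```python
import math
from typing import Dict, Sequence


def actor_score(
    state: Sequence[float],
    action: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Score one control object from the page state and the control's semantics.

    Assumed form: sigmoid(w . [state; action] + b), giving a value in (0, 1).
    """
    features = list(state) + list(action)
    if len(weights) != len(features):
        raise ValueError("weights must match len(state) + len(action)")
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def score_controls(
    state: Sequence[float],
    controls: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Score every control object on the page under the same state."""
    return {name: actor_score(state, vec, weights) for name, vec in controls.items()}
```

For example, with state `[1.0, 0.0]`, controls `{"news": [1.0, 0.0], "about": [0.0, 1.0]}` and weights `[0.5, 0.0, 2.0, -2.0]`, the "news" control receives the higher score.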

Step S330: selecting the control object with the highest score from the sample website home page according to a first preset probability, executing an operation on the selected control object, and jumping from the sample website home page to a new web page.

For example, the first preset probability may be a high probability, such as 80%, 90%, or 95%. With this high probability, the control object with the highest score is selected and a simulated click operation is performed on it, so as to jump from the sample website home page to a new web page. With the remaining small probability, a control object is selected at random to jump to a new web page. The new web page is the web page obtained after jumping from the sample website home page.
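This selection rule resembles the epsilon-greedy strategy commonly used in reinforcement learning. A minimal sketch follows; the function name `select_control`, the parameter `greedy_prob` (standing for the first preset probability), and the optional `rng` argument are hypothetical names chosen for illustration.

```python
import random
from typing import Dict, Optional


def select_control(
    scores: Dict[str, float],
    greedy_prob: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the highest-scoring control with probability greedy_prob, else a random one."""
    if not scores:
        raise ValueError("no control objects to choose from")
    if rng is None:
        rng = random.Random()
    names = list(scores)
    if rng.random() < greedy_prob:
        # Exploit: take the control object with the highest predicted score.
        return max(names, key=lambda name: scores[name])
    # Explore: take any control object uniformly at random.
    return rng.choice(names)
```

The random branch lets the agent occasionally try control objects that the current value network underrates, which is what allows their scores to be corrected during training.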

Step S340: updating the initial value network according to the semantic similarity between the new web page and the sample content, and continuing to select control objects and jump between web pages starting from the new web page until a training target is reached, so as to obtain the value network model from training the initial value network.

In an embodiment, web page understanding can be performed on the new web page through the pre-training model to obtain the global semantic information of the new web page. The semantic similarity between the global semantic information of the new web page and the sample content is then calculated to determine the reward value for selecting the given control object on the sample website home page.

As shown in FIG. 6, given the web page state before the jump and the selected control object (i.e., the action), the critic outputs the reward value of the selected control object. The gradient is then calculated from the difference between the reward value and the true value and back-propagated to adjust the weight parameters of the actor. In one embodiment, the critic function may take the specific form of calculating the semantic similarity between the new web page and the sample content.

It should be noted that the global semantic information and the sample content may both be expressed as semantic vectors, so the semantic similarity may be expressed by a distance between vectors: the smaller the distance, the greater the semantic similarity. For example, the cosine measure may be used: $D(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}$, where $X$ and $Y$ are the semantic representations of the current web page and the sample content, $\lVert X \rVert$ and $\lVert Y \rVert$ are the Euclidean (L2) norms of $X$ and $Y$, and $X \cdot Y$ is the inner product of the vectors. As written, $D$ is the cosine similarity, which equals 1 for vectors pointing in the same direction; the corresponding cosine distance is $1 - D(X, Y)$. When the new web page contains the sample content, this distance is 0.
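A short sketch of the cosine computation may make the relation between similarity and distance concrete. The normalized inner product is the cosine similarity, and the cosine distance is taken here as one minus that value, so identical directions give distance 0. The function names and the convention of returning 0 similarity for a zero vector are assumptions of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of x and y divided by the product of their L2 norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_x * norm_y)


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine distance: 0 for the same direction, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(x, y)
```

Because the measure depends only on direction, a page vector and a scaled copy of the sample content vector are treated as a perfect match.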

In one embodiment, the reward value may be measured by the semantic similarity between the global semantic information and the sample content: the greater the semantic similarity, the greater the reward value. The goal of reinforcement learning is to maximize the reward of the final state. The disadvantage is that the reward signal is sparse, since only the final web page produces a signal and the signals of all intermediate web pages are empty. Thus, in another embodiment, the specific form of the critic function, i.e., the formula for the reward value, may be $R(S_i, A_i) = D(S_i, A_i, S_{i+1}) - D(S_{i-1}, A_{i-1}, S_i)$. Here $S_i$ is the web page state at step $i$, $S_{i+1}$ is the web page state at step $i+1$, and $A_i$ is the action at step $i$: executing action $A_i$ in state $S_i$ moves the state to $S_{i+1}$. $D(S_i, A_i, S_{i+1})$ represents the semantic similarity between web page state $S_{i+1}$ and the sample content, and $D(S_{i-1}, A_{i-1}, S_i)$ represents the semantic similarity between web page state $S_i$ and the sample content. $(S_{i-1}, A_{i-1}, S_i, A_i, S_{i+1})$ are two consecutive state transitions. Thus, the reward value for executing action $A_i$ in state $S_i$ is the semantic similarity between web page state $S_{i+1}$ and the sample content minus the semantic similarity between web page state $S_i$ and the sample content.

This process rewards or penalizes the selection of a control object. If the selection was good, the rewarded control object has a greater chance of being selected again when a similar web page state is encountered under the updated value network; if the selection is penalized, the chance of selecting that control object again is reduced.

The browsing process from a website home page to the sample content (i.e. the target content) is a reinforcement learning process, and the training data may be a collection of browsing data from a plurality of websites. The browsing data of a website may include the web page topology of the website and a set of entrance-exit pairs (website home page and target content).

As shown in fig. 7, the training proceeds as follows. First, a value network is initialized. Second, the home page of a website is loaded according to an entrance-exit pair (the website home page and the target content) of the website. Third, web page understanding is performed through the pre-training model to obtain the global semantic information and the control semantic information. Fourth, given the current global semantic information, the value network evaluates the value of selecting each control object, and the control object with the highest value is selected with a certain probability to perform the page jump. Fifth, web page understanding is performed on the new web page through the pre-training model to obtain the global semantic information and the control semantic information of the new web page. Sixth, a corresponding reward value is generated according to the semantic similarity between the global semantic information of the new web page and the target content. Seventh, the value network is updated according to the reward value.

After the initial value network has been updated once, control object selection and web page jumping continue from the new web page based on the updated value network: whether the page reached contains the sample content is evaluated, a reward value is generated, and the value network is updated again. This loop repeats until a web page containing the target content is reached or the search for the exit fails. Then a second round of training, i.e. on a second entrance-exit pair, is performed on the same website, and so on until the end condition of the website is reached. The end condition may include two cases: 1) failure: the number of operation steps exceeds a certain threshold; 2) success: after a minimum number of path-finding steps x has been found, no successful path of fewer than x steps is found in the following m rounds of path finding, and training on the website ends.

Eighth, the next website and its set of entrance-exit pairs are selected, and the second to seventh steps are repeated. In one embodiment, the training goal may be to reach the end condition of every website, which yields the value network model trained from the initial value network.
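The loop of fig. 7 can be sketched on a toy website as follows. This is only an illustration under several assumptions that are not in the application: the value network is replaced by a lookup table, the update rule is a standard Q-learning step, the page similarities are given constants rather than outputs of a semantic model, and all names (`SITE`, `SIMILARITY`, `train`, `greedy_prob`) are hypothetical.

```python
import random

# Hypothetical toy site: page -> {control label: destination page}.
SITE = {
    "home": {"news": "news", "about": "about"},
    "news": {"back": "home", "notice": "target"},
    "about": {"back": "home"},
    "target": {},
}
# Assumed similarity of each page to the sample content
# (in the described method this would come from the semantic model).
SIMILARITY = {"home": 0.1, "about": 0.0, "news": 0.5, "target": 1.0}


def train(site, similarity, entry, target, episodes=200, max_steps=10,
          greedy_prob=0.8, lr=0.5, gamma=0.9, seed=0):
    """Train a tabular stand-in for the value network on one entrance-exit pair."""
    rng = random.Random(seed)
    # First step: initialise the value of every (page, control) pair.
    q = {(p, c): 0.0 for p, controls in site.items() for c in controls}
    for _ in range(episodes):
        page = entry  # second step: load the home page
        for _ in range(max_steps):  # failure case: step threshold exceeded
            controls = list(site[page])
            if page == target or not controls:
                break
            # Fourth step: pick the highest-valued control with a set probability.
            if rng.random() < greedy_prob:
                control = max(controls, key=lambda c: q[(page, c)])
            else:
                control = rng.choice(controls)
            nxt = site[page][control]
            # Sixth step: reward is the gain in similarity to the target content.
            reward = similarity[nxt] - similarity[page]
            # Seventh step: update the value estimate (Q-learning, an assumption).
            future = max((q[(nxt, c)] for c in site[nxt]), default=0.0)
            q[(page, control)] += lr * (reward + gamma * future - q[(page, control)])
            page = nxt
    return q
```

After training on this toy site, the learned values favour the `news` control on the home page and the `notice` control on the news page, which is the shortest route to the target page.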

In the above embodiment, the pre-training model is given in advance: the sample content and the sample website are obtained, and the initial value network is trained directly on them to obtain the value network model.

In the following embodiment, the pre-training model and the initial value network are trained alternately according to the sample content and the sample website, yielding a semantic extraction model obtained by retraining the pre-training model and a value network model obtained by training the initial value network. In this way, the accuracy of the semantic representations of the web page and the control objects can be improved. As shown in fig. 8, the alternating training process may include the following steps S610 to S650.

Step S610: loading the sample website home page corresponding to the sample website, and obtaining the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page through a pre-training model.

Step S620: predicting the score for selecting each control object through an initial value network according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page.

Step S630: selecting the control object with the highest score from the sample website home page with a first preset probability, executing an operation on the selected control object, and jumping from the sample website home page to a new web page.

Step S640: predicting the web page content of the next state according to the sample website home page and the control object with the highest score in the sample website home page.

Steps S610 to S630 are the same as steps S310 to S330 in the above embodiment. Unlike the embodiment of fig. 3, the current embodiment represents the semantics of the web page state and the semantics of the control object (i.e. the action) more accurately by alternating self-supervised learning with reinforcement learning. The self-supervised learning uses sampled state transitions as the supervision signal for learning the representations of states and actions: from the sample website home page at the current moment (the state) and the control object with the highest score (the action), the pre-training model predicts the web page content of the next state. As shown in fig. 9, the set of web page states is treated as a set of classes, and the pre-training model outputs, for each web page state in the web page topology graph, the probability that it is the next state.

Step S650: updating the pre-training model according to the semantic similarity between the new webpage and the webpage content of the next state; and extracting semantic information from the sample website home page again based on the updated pre-training model, and training the initial value network.

As shown in fig. 9, f represents the module of the pre-training model that extracts the global semantic information of a web page, and g represents the module of the pre-training model that extracts the control semantic information of a control object. The pre-training model can predict the web page content of the next state based on the global semantic information of the current web page and the control semantic information of the selected control object. The loss function and its gradient are then computed from the new web page actually reached by the jump and the predicted web page content of the next state, and the gradient is propagated back to update the pre-training model, so that the probability that the predicted next state is the new web page becomes as large as possible. The global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page can then be obtained again through the updated pre-training model. Step S620 is repeated: the score for selecting each control object is predicted through the initial value network according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page. In step S630, an operation is performed on the control object with the highest score in the sample website home page, jumping from the sample website home page to a new web page.
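One common way to make the probability of the actually reached page as large as possible is a softmax cross-entropy loss over the candidate page states. The Python sketch below illustrates that choice; the application does not name the loss function, so cross-entropy is an assumption here, and the function names and the `logits` input (one raw score per page state in the topology graph) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability for each candidate page state."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_state_loss(logits, actual_index):
    """Cross-entropy: minus the log-probability of the page actually reached."""
    return -math.log(softmax(logits)[actual_index])


def loss_gradient(logits, actual_index):
    """Gradient of the loss with respect to the logits: softmax minus one-hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == actual_index else 0.0) for i, p in enumerate(probs)]
```

Lowering this loss raises the probability assigned to the page actually reached; in a real model the gradient would be propagated further back into the modules f and g.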

Then, the initial value network is updated according to the semantic similarity between the new web page and the sample content, control object selection and web page jumping continue from the new web page, and the updated pre-training model is updated again, until the training target is reached. The initial value network may be updated in the manner of the embodiment shown in fig. 3, thereby implementing the alternate training of the pre-training model and the initial value network.

Fig. 10 is a schematic flowchart of an information acquisition method according to an embodiment of the present application. As shown in fig. 10, the method comprises the steps of:

Step S1010: loading a first webpage, and obtaining the global semantic information of the first webpage and the control semantic information of each control object in the first webpage through a semantic extraction model.

The first web page may be the home page or any intermediate-state web page. In an embodiment, the first web page can be loaded directly according to a web page address input by the user. In other embodiments, the first web page may be reached by jumping from the home page of a website.

The semantic extraction model may be obtained by retraining a pre-training model with the training method provided above, or the pre-training model may be used directly. The global semantic information of the first web page and the control semantic information of the control objects in the first web page may be extracted as described in the embodiment shown in fig. 3.

Step S1020: judging whether the first webpage contains the target content to be acquired according to the global semantic information of the first webpage.

In an embodiment, a first similarity between global semantic information of the first webpage and the target content may be calculated; and determining whether the first webpage contains the target content according to the size of the first similarity.

Specifically, the first similarity refers to the semantic similarity between the global semantic information of the first web page and the target content. For example, if the first similarity is greater than a predetermined value, it may be determined that the first web page contains the target content.
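The threshold test can be sketched in Python as follows. The sketch assumes that both the page and the target content are represented as vectors and that cosine similarity is the similarity measure; the function name `contains_target` and the default threshold of 0.9 are hypothetical, since the application does not fix the predetermined value.

```python
import math


def contains_target(page_vec, target_vec, threshold=0.9):
    """Return True when the first similarity exceeds the predetermined value.

    page_vec is the global semantic vector of the page and target_vec the
    semantic vector of the target content (both assumed to be lists of floats).
    """
    dot = sum(a * b for a, b in zip(page_vec, target_vec))
    norm_page = math.sqrt(sum(a * a for a in page_vec))
    norm_target = math.sqrt(sum(b * b for b in target_vec))
    if norm_page == 0.0 or norm_target == 0.0:
        return False  # no meaningful similarity for a zero vector
    return dot / (norm_page * norm_target) > threshold
```

In practice the threshold would be tuned on pages known to contain or not contain the target content.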

Step S1030: if the first webpage does not contain the target content to be acquired, predicting the score for selecting each control object through a value network model according to the global semantic information and the control semantic information of each control object.

The value network model may be obtained by training an initial value network using the method described in the above embodiments. The global semantic information and the control semantic information of each control object are taken as the input of the value network model, and the score of each control object output by the value network model is obtained. The score may be used to characterize how effective selecting the control object is for jumping to a web page containing the target content.
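The input and output of this scoring step can be illustrated with a deliberately simple stand-in. The Python sketch below scores each control object with a single linear layer over the concatenated global and control vectors; the linear form, the function name `score_controls`, and the `weights` parameter are assumptions made for illustration and say nothing about the actual architecture of the value network model.

```python
def score_controls(global_vec, control_vecs, weights):
    """Score every control object from the page vector and its own vector.

    global_vec   : global semantic vector of the page (list of floats)
    control_vecs : {control name: control semantic vector}
    weights      : one weight per concatenated feature (hypothetical linear model)
    """
    scores = {}
    for name, vec in control_vecs.items():
        features = list(global_vec) + list(vec)
        if len(features) != len(weights):
            raise ValueError("weights must match the concatenated feature length")
        scores[name] = sum(w * f for w, f in zip(weights, features))
    return scores
```

The control with the largest score, for example `max(scores, key=scores.get)`, is the one that would be operated on.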

Step S1040: executing an operation on the control object with the highest score in the first webpage, jumping to a second webpage, and continuing to judge whether the second webpage contains the target content to be acquired, until the target content is acquired.

In the application stage, the control object with the highest score can be selected and operated on, jumping from the first web page to the second web page. The above steps are then repeated: whether the second web page contains the target content to be acquired is judged according to the global semantic information of the second web page; if not, the control object with the highest score in the second web page is selected and operated on, jumping to a third web page, and so on, until a target web page containing the target content is reached and the queried target content is obtained.
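The application-stage loop can be sketched as follows. The three callables stand in for page loading and control extraction, the target-content check, and the value network model; they, the function name `acquire`, and the `max_hops` safety limit are all hypothetical additions for illustration (the description itself simply continues until the target content is acquired).

```python
def acquire(start, controls_of, contains_target, score, max_hops=20):
    """Greedily follow the highest-scoring control until the target is found.

    controls_of(page)      -> {control name: destination page}
    contains_target(page)  -> bool
    score(page, control)   -> float (stand-in for the value network model)
    Returns (target page or None, list of visited pages).
    """
    page = start
    path = [page]
    for _ in range(max_hops):
        if contains_target(page):
            return page, path
        controls = controls_of(page)
        if not controls:
            break  # dead end: nothing left to operate on
        best = max(controls, key=lambda c: score(page, c))
        page = controls[best]
        path.append(page)
    return (page, path) if contains_target(page) else (None, path)
```

A call such as `acquire("home", site.get, lambda p: p == "target", lambda p, c: values[(p, c)])`, with a hypothetical `site` dictionary and `values` table, returns the target page together with the pages visited on the way.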

In an embodiment, loading the first webpage in step S1010 may include the following steps: loading the website home page corresponding to an input website entry address; obtaining the global semantic information of the website home page and the control semantic information of each control object in the website home page through the semantic extraction model; predicting the score for selecting each control object through the value network model according to the global semantic information of the website home page and the control semantic information of each control object in the website home page; and executing an operation on the control object with the highest score in the website home page, jumping from the website home page to the first webpage.

That is, the first webpage can be obtained by jumping from the website home page at the website entry address, which can be input by the user. Similarly, the score for selecting each control object is predicted through the value network model according to the global semantic information of the website home page and the control semantic information of each control object in the website home page, and the control object with the highest score in the website home page is then selected and operated on, jumping from the website home page to the first webpage.

The following is an embodiment of the apparatus of the present application, which may be used to execute the information acquisition method and the training method of the value network model described in the foregoing embodiment, and reference may be made to the foregoing method embodiment for a specific implementation process of the apparatus embodiment described below.

An embodiment of the present application further provides an information obtaining apparatus, including:

The embodiment of the present application further provides a training apparatus for a value network model, where the value network model is used to obtain target content from a website, the apparatus includes:

In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims

1. An information acquisition method, comprising:

2. The method of claim 1, wherein loading the first web page comprises:

3. The method according to claim 1, wherein the determining whether the first web page contains target content to be acquired according to the global semantic information of the first web page comprises:

4. The method of claim 1, wherein before the obtaining global semantic information of the first web page and control semantic information of each control object in the first web page through a semantic extraction model, the method further comprises:

acquiring sample content and a sample website;

5. The method of claim 4, wherein the training of the initial value network according to the sample content and the sample website to obtain the value network model obtained by the initial value network training comprises:

6. The method of claim 4, wherein the training of the initial value network according to the sample content and the sample website to obtain the value network model obtained by the initial value network training comprises:

according to the global semantic information of the sample website home page and the control semantic information of each control object in the sample website home page, predicting and selecting the score of each control object through an initial value network;

7. The method of claim 6, wherein said updating the initial value network based on semantic similarity between the new web page and the sample content comprises:

and updating a loss function and adjusting parameters of the initial value network according to the reward value.

8. The method of claim 5, wherein the alternately training the pre-trained model and the initial value network according to the sample content and the sample website comprises:

updating the pre-training model according to the semantic similarity between the new web page and the web page content of the next state; and extracting semantic information from the sample website home page again based on the updated pre-training model, and training the initial value network.

9. The method of claim 8, wherein the extracting semantic information from the sample website home page based on the updated pre-training model and training the initial value network comprises:

10. A method for training a value network model, wherein the value network model is used for acquiring target content from a website, the method comprises the following steps:

11. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the information acquisition method of any one of claims 1 to 9 or the training method of the value network model of claim 10.

12. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the information acquisition method of any one of claims 1 to 9 or the training method of the value network model of claim 10.