CN113515589A

CN113515589A - Data recommendation method, device, equipment and medium

Info

Publication number: CN113515589A
Application number: CN202110038819.7A
Authority: CN
Inventors: 欧子菁; 赵瑞辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-10-19

Abstract

The embodiment of the application provides a data recommendation method, a device, equipment and a medium, wherein the method comprises the following steps: acquiring a first initial text associated with the first query information, acquiring an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and constructing a reference relation graph containing the first initial text and the associated text; screening a target text corresponding to the first query information in the associated text and the first initial text according to the reference relation graph; and determining a shortest text reading path containing the target text in the reference relation graph, and generating first recommended content for responding to the first query information according to the shortest text reading path. By adopting the embodiment of the application, the recommendation accuracy of the text data can be improved.

Description

Data recommendation method, device, equipment and medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a data recommendation method, apparatus, device, and medium.

Background

With the development of data informatization, data volumes are growing rapidly, and big data is becoming increasingly diversified and decentralized. In large-scale data, most of the data is redundant to a given user, who may be interested only in certain information. For example, a user doing research needs to search through a large number of documents to find the ones worth reading.

In the prior art, a user can enter a keyword in a search engine; the search engine calculates the similarity between the keyword and each document topic, sorts the documents by similarity, and returns the documents whose titles contain the keyword in that order. However, documents recommended on the basis of keyword similarity often merely contain the user's keyword in their titles while their content is not what the user wants; conversely, the documents that best match the user's intention may not contain the keyword in their titles. As a result, the accuracy of the documents recommended to the user may be low.

Disclosure of Invention

The embodiment of the application provides a data recommendation method, device, equipment and medium, which can improve the recommendation accuracy of text data.

An embodiment of the present application provides a data recommendation method, including:

acquiring a first initial text associated with the first query information, acquiring an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and constructing a reference relation graph containing the first initial text and the associated text;

screening a target text corresponding to the first query information in the associated text and the first initial text according to the reference relation graph;

and determining a shortest text reading path containing the target text in the reference relation graph, and generating first recommended content for responding to the first query information according to the shortest text reading path.

In one aspect, an embodiment of the present application provides a data recommendation device, including:

the relation graph building module is used for obtaining a first initial text associated with the first query information, obtaining an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and building a reference relation graph containing the first initial text and the associated text;

the screening module is used for screening a target text corresponding to the first query information in the associated text and the first initial text according to the reference relation graph;

and the reading path determining module is used for determining the shortest text reading path containing the target text in the reference relation graph and generating first recommended content for responding to the first query information according to the shortest text reading path.

The relation graph building module comprises:

the associated text determining unit is used for acquiring, according to the text reference relation corresponding to the first initial text, the texts referenced by the first initial text and the texts that reference it, and determining these referencing and referenced texts as associated texts;

and the construction unit is used for determining the first initial text and the associated text as text nodes and constructing a reference relation graph containing the text nodes according to the text reference relation between the text nodes.
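As a minimal Python sketch of the relation graph building module, assume the text reference relations are available as a dict mapping each text id to the ids of the texts it cites (a hypothetical input format). The expansion adds both the texts a node cites and the texts that cite it. The `hops` parameter is also hypothetical and allows the second-level expansion described for the Fig. 2 scenario.

```python
from collections import defaultdict


def build_reference_graph(initial_texts, references, hops=1):
    """Expand initial texts along reference relations into a reference relation graph.

    references: dict mapping a text id to the ids of the texts it cites (assumed format).
    Returns (nodes, edges), where each edge (a, b) means "text a cites text b".
    """
    # Reverse index: text id -> ids of the texts that cite it.
    cited_by = defaultdict(set)
    for src, dsts in references.items():
        for dst in dsts:
            cited_by[dst].add(src)

    nodes = set(initial_texts)
    frontier = set(initial_texts)
    for _ in range(hops):
        next_frontier = set()
        for t in frontier:
            # Both the texts t cites and the texts citing t are associated texts.
            neighbours = set(references.get(t, ())) | cited_by.get(t, set())
            next_frontier |= neighbours - nodes
        nodes |= next_frontier
        frontier = next_frontier

    # Keep every reference relation whose two endpoints are both text nodes of the graph.
    edges = {(src, dst) for src in nodes for dst in references.get(src, ()) if dst in nodes}
    return nodes, edges
```

For example, with `references = {"A": ["B"], "C": ["A"], "B": ["D"]}` and initial text `"A"`, one hop yields the nodes A, B, C; a second hop also pulls in D through B.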

Wherein, the screening module includes:

the text quantity obtaining unit is used for determining the first initial text and the associated text as text nodes, and obtaining, for each text node, the number of texts in the reference relation graph that cite it;

and the target text determining unit is used for, if there is a text node cited by more texts than a number threshold, determining that text node as a target text corresponding to the first query information.
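The screening rule can be sketched in a few lines of Python, assuming the reference relation graph is given as (citing, cited) pairs. The patent does not say what happens when no node exceeds the threshold; this sketch simply returns an empty set in that case.

```python
from collections import Counter


def screen_target_texts(edges, threshold):
    """Return the text nodes cited more than `threshold` times within the graph.

    edges: iterable of (citing, cited) pairs from the reference relation graph.
    """
    # In-graph citation count of every text node that is cited at least once.
    citation_count = Counter(cited for _, cited in edges)
    return {node for node, count in citation_count.items() if count > threshold}
```

With the Fig. 2 relation that document 13 is cited by both document 9 and document 10, a threshold of 1 selects document 13.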

The reading path determining module comprises:

the weight obtaining unit is used for obtaining text recommendation weights and edge weights corresponding to the reference relation graph and constructing M initial text reading paths containing the target text in the reference relation graph; the edge weight is used for representing the relevance between two texts with text reference relations in the reference relation graph, and M is a positive integer;

the weight accumulated value determining unit is used for determining weight accumulated values corresponding to the M initial text reading paths according to the text recommendation weight and the edge weight respectively contained in the M initial text reading paths;

and the first shortest path determining unit is used for determining the initial text reading path corresponding to the minimum weight accumulated value as the shortest text reading path in the M initial text reading paths.
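A minimal Python sketch of this path-selection variant follows. It assumes the M initial text reading paths have already been constructed and are passed in as lists of node ids, that text recommendation weights are keyed by node, and that edge weights are keyed by the ordered node pair along the path. How the candidate paths are enumerated is not specified here.

```python
def accumulated_weight(path, node_weight, edge_weight):
    """Sum the text recommendation weights of the nodes and the edge weights along one path."""
    total = sum(node_weight[n] for n in path)
    # Consecutive node pairs along the path are its edges.
    total += sum(edge_weight[(a, b)] for a, b in zip(path, path[1:]))
    return total


def shortest_reading_path(candidate_paths, node_weight, edge_weight):
    """Among the M candidate paths, return the one with the minimum accumulated weight."""
    return min(candidate_paths, key=lambda p: accumulated_weight(p, node_weight, edge_weight))
```

Because the minimum accumulated value is selected, lower weights must mean "more worth reading" under this design.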

Alternatively, the reading path determining module comprises:

the weight obtaining unit is further used for obtaining a text recommendation weight and an edge weight corresponding to the reference relation graph, and constructing a first text subgraph containing a target text in the reference relation graph; the edge weight is used for representing the relevance between two texts with text reference relations in the reference relation graph;

the first spanning tree acquisition unit is used for acquiring a first minimum spanning tree in the first text subgraph according to the edge weights and the text recommendation weights contained in the first text subgraph; the first minimum spanning tree is the spanning tree with the minimum weight accumulated value in the first text subgraph, the first minimum spanning tree comprises the target texts, and the minimum weight accumulated value in the first text subgraph is the accumulated value of the text recommendation weights and edge weights in the first minimum spanning tree;

the second spanning tree construction unit is used for constructing a second text subgraph containing the target text in the reference relation graph, updating the first minimum spanning tree according to the text recommendation weight and the edge weight contained in the second text subgraph, and generating a second minimum spanning tree in the second text subgraph; the second minimum spanning tree is a spanning tree with a minimum weight accumulated value in the second text subgraph, the weight accumulated value corresponding to the second minimum spanning tree is smaller than the weight accumulated value corresponding to the first minimum spanning tree, the second minimum spanning tree comprises target texts, and the minimum weight accumulated value in the second text subgraph is an accumulated value of text recommendation weight and edge weight in the second minimum spanning tree;

and the second shortest path determining unit is used for determining the second minimum spanning tree as the shortest text reading path if the second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the reference relationship graph.
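The iterative refinement can be sketched in Python under stated assumptions: the candidate text subgraphs are supplied by the caller as node sets (the patent does not specify how they are generated), reference relations are treated as undirected edges for the spanning tree, and Kruskal's algorithm is used to build each subgraph's minimum spanning tree. A new tree replaces the current one only when its weight accumulated value is smaller.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm on the subgraph induced by `nodes`.

    edges: dict mapping (u, v) -> edge weight, treated as undirected.
    Returns the list of tree edges, or None if the subgraph is not connected.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (u, v), _w in sorted(edges.items(), key=lambda item: item[1]):
        if u in parent and v in parent:  # edge lies inside the subgraph
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v))
    return tree if len(tree) == len(nodes) - 1 else None


def tree_cost(nodes, tree, node_weight, edges):
    """Weight accumulated value: node (text recommendation) weights plus tree edge weights."""
    return sum(node_weight[n] for n in nodes) + sum(edges[e] for e in tree)


def refine_spanning_tree(subgraphs, targets, node_weight, edges):
    """Scan candidate subgraphs, keeping the cheapest spanning tree that covers all targets."""
    best = None  # (cost, node set, tree edges)
    for nodes in subgraphs:
        if not set(targets) <= set(nodes):
            continue  # a subgraph must contain every target text
        tree = minimum_spanning_tree(nodes, edges)
        if tree is None:
            continue  # subgraph is not connected
        cost = tree_cost(nodes, tree, node_weight, edges)
        if best is None or cost < best[0]:
            best = (cost, set(nodes), tree)  # update only when the accumulated value decreases
    return best
```

Since a spanning tree covers every node of its subgraph, the choice of subgraph (which non-target texts to include) is what drives the accumulated value down.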

The weight obtaining unit comprises:

the first text weight determining subunit is used for determining the first initial text and the associated text as text nodes in the reference relationship graph, acquiring text ranking values and rating features corresponding to the text nodes, and determining text recommendation weights corresponding to the text nodes according to the text ranking values and the rating features;

a text node acquiring subunit, configured to acquire a text node v_i and a text node v_j that have a text reference relationship in the reference relationship graph, where i and j are positive integers less than or equal to the number of text nodes;

an edge weight determining subunit, configured to acquire the reference frequency between the text node v_i and the text node v_j, and determine the edge weight between the text node v_j and the text node v_i according to the reference frequency; the reference frequency refers to the number of times the text node v_j is referenced in the text content of the text node v_i.
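The patent does not give formulas for these weights, so the Python sketch below uses hypothetical ones. It assumes the text ranking value and the rating are both normalized to [0, 1] with higher meaning better, and that citations in the citing text's content are marked with a string such as "[10]". Because the reading path minimizes accumulated weight, both mappings make important texts and strongly related pairs cheaper.

```python
def text_recommendation_weight(ranking_value, rating, alpha=0.5):
    """Hypothetical node weight: a blend of ranking value and rating, inverted so better = lower."""
    score = alpha * ranking_value + (1 - alpha) * rating
    return 1.0 - score


def reference_frequency(citing_content, cited_marker):
    """Hypothetical: count how often v_j's citation marker appears in v_i's text content."""
    return citing_content.count(cited_marker)


def edge_weight_from_frequency(frequency):
    """Hypothetical edge weight: the more often v_j is cited inside v_i, the smaller the weight."""
    return 1.0 / (1.0 + frequency)
```

The `alpha` blend parameter and the marker-counting approach are illustrative choices only.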

Alternatively, the weight obtaining unit comprises:

the second text weight determining subunit is used for determining the first initial text and the associated text as text nodes in the reference relationship graph, converting the text nodes into text characterization vectors, and determining text recommendation weights corresponding to the text nodes according to the text characterization vectors;

the initial node vector determining subunit is used for generating an initial node vector corresponding to the text node according to the text reference relationship of the text node in the reference relationship graph;

the encoding subunit is used for inputting the initial node vector into the graph convolution network, performing information encoding on the initial node vector according to the graph convolution network, and generating a node encoding vector corresponding to the initial node vector;

and the second edge weight determining subunit is used for determining the edge weight between any two text nodes with text reference relations in the reference relation graph according to the node coding vector.
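A dependency-free Python sketch of this variant is given below. It makes several assumptions that the patent leaves open. Reference relations are treated as a symmetric 0/1 adjacency matrix. Each node's initial vector is its row of A + I, a hypothetical encoding of its reference relationships. A single graph-convolution layer H = ReLU(D^-1/2 (A + I) D^-1/2 X W) is used. The edge weight is taken as the cosine distance between the two node encodings, another hypothetical choice. In practice the weight matrix W would be learned; here it is passed in.

```python
import math


def initial_node_vectors(adjacency):
    """Hypothetical initial node vectors: each node's row of A + I (its reference neighbourhood)."""
    n = len(adjacency)
    return [[1.0 if adjacency[i][j] or i == j else 0.0 for j in range(n)] for i in range(n)]


def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Self-loops keep each node's own information in the aggregation.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    d_in, d_out = len(features[0]), len(weight[0])
    # Aggregate neighbour features, then project with W.
    agg = [[sum(norm[i][m] * features[m][c] for m in range(n)) for c in range(d_in)] for i in range(n)]
    out = [[sum(agg[i][c] * weight[c][k] for c in range(d_in)) for k in range(d_out)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def edge_weights_from_encodings(adjacency, encodings):
    """Edge weight for every referenced pair (i < j): similar encodings give a cheaper edge."""
    n = len(adjacency)
    return {(i, j): cosine_distance(encodings[i], encodings[j])
            for i in range(n) for j in range(i + 1, n) if adjacency[i][j]}
```

A usage sketch: `H = gcn_layer(A, initial_node_vectors(A), W)` followed by `edge_weights_from_encodings(A, H)`.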

The relation graph building module further comprises:

a query information determination unit for acquiring one or more keywords input in a search engine and determining the one or more keywords as first query information;

the interface calling unit is used for calling an application program interface in a search engine and acquiring at least two texts to be recommended contained in the search engine;

the text similarity obtaining unit is used for obtaining text similarities between the first query information and the at least two texts to be recommended respectively, and determining a first initial text associated with the first query information in the at least two texts to be recommended according to the text similarities.
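The patent does not fix the similarity measure or the search engine's application program interface. The Python sketch below therefore stands in an in-memory dict of candidate texts (id to title, hypothetical) for the API call and a bag-of-words cosine similarity for the text similarity. It then keeps the top k candidates as the first initial texts; k = 5 is the value used in the Fig. 2 scenario.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def cosine_similarity(query, doc):
    """Bag-of-words cosine similarity (a stand-in for the unspecified text similarity)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def first_initial_texts(query, candidates, k=5):
    """Rank candidate texts by similarity to the query and keep the top k ids."""
    ranked = sorted(candidates, key=lambda doc_id: cosine_similarity(query, candidates[doc_id]),
                    reverse=True)
    return ranked[:k]
```

A production system would more likely use TF-IDF or learned embeddings, but the ranking-then-truncation step is the same.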

The reading path determining module further comprises:

the text reference relation acquisition unit is used for acquiring path text nodes and path text reference relations contained in the shortest text reading path; the path text reference relation is used for indicating the reading sequence of the path text nodes;

and the recommended content determining unit is used for determining the first recommended content corresponding to the first query information according to the path text node and the path text reference relation.

The path text node comprises at least two documents;

the recommended content determining unit is specifically configured to:

acquiring the document abstract information corresponding to each of the at least two documents, determining the at least two pieces of document abstract information and the path text reference relation as the first recommended content corresponding to the first query information, and displaying the first recommended content in a query page.
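Since the path text reference relation indicates the reading order, one natural reading is "read cited documents before the documents that cite them". This is an assumption, but it matches the Fig. 2 example where the foundational documents 13 and 15 are read first. The Python sketch below derives that order with Kahn's topological sort and bundles the abstracts with the relations. The `abstracts` dict is hypothetical, and the example edges are only loosely modelled on Fig. 2.

```python
from collections import defaultdict, deque


def reading_order(path_nodes, path_edges):
    """Order path documents so that every cited document precedes the documents citing it.

    path_edges: (citing, cited) pairs kept in the shortest text reading path (assumed acyclic).
    Ties are broken by sorted node id for a stable result.
    """
    readers = defaultdict(list)           # cited -> documents that cite it
    pending = {n: 0 for n in path_nodes}  # number of unread references per document
    for citing, cited in path_edges:
        readers[cited].append(citing)
        pending[citing] += 1
    queue = deque(sorted(n for n, c in pending.items() if c == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for citing in sorted(readers[node]):
            pending[citing] -= 1
            if pending[citing] == 0:  # all of its references have been read
                queue.append(citing)
    return order


def first_recommended_content(path_nodes, path_edges, abstracts):
    """Bundle document abstracts (in reading order) with the path text reference relations."""
    order = reading_order(path_nodes, path_edges)
    return {
        "documents": [(doc, abstracts.get(doc, "")) for doc in order],
        "relations": sorted(path_edges),
    }
```

With nodes 1, 2, 4, 10, 11, 13, 15 and assumed edges (1, 2), (2, 4), (2, 10), (10, 13), (4, 11), (11, 15), the order starts with documents 13 and 15.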

The device further comprises:

the initial text search module is used for acquiring second query information input in a search engine and acquiring a second initial text associated with the second query information in the search engine;

and the recommending module is used for determining the first recommended content as second recommended content corresponding to the second query information and displaying the second recommended content in the query page if the second initial text is the same as the first initial text.
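The reuse rule amounts to caching recommended content by its initial texts. The Python sketch below is hypothetical. It treats two sets of initial texts as "the same" when they contain the same ids in any order, which is an assumption. The `search` and `build_content` callables stand in for the search engine and the graph/path pipeline.

```python
class RecommendationCache:
    """Hypothetical helper: reuse recommended content when a query yields the same initial texts."""

    def __init__(self):
        self._by_initial_texts = {}

    @staticmethod
    def _key(initial_texts):
        # Order-insensitive key: the same set of initial texts gives the same key.
        return frozenset(initial_texts)

    def get(self, initial_texts):
        return self._by_initial_texts.get(self._key(initial_texts))

    def put(self, initial_texts, content):
        self._by_initial_texts[self._key(initial_texts)] = content


def recommend(query, search, build_content, cache):
    initial = search(query)
    cached = cache.get(initial)
    if cached is not None:
        return cached  # second initial texts equal the first ones: reuse the first recommended content
    content = build_content(initial)
    cache.put(initial, content)
    return content
```

This skips rebuilding the reference relation graph and rerunning the path search for queries that resolve to identical initial texts.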

In one aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory is connected to the processor, the memory is used for storing a computer program, and the processor is used for calling the computer program, so that the computer device executes the method provided in the above aspect in the embodiment of the present application.

An aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program is adapted to be loaded and executed by a processor, so as to enable a computer device with the processor to execute the method provided by the above aspect of the embodiments of the present application.

According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above-mentioned aspect.

According to the embodiments of the present application, a first initial text associated with first query information can be obtained; an associated text corresponding to the first initial text is obtained according to the text reference relation corresponding to the first initial text; a reference relation graph containing the first initial text and the associated text is constructed; and a target text corresponding to the first query information can then be screened from the associated text and the first initial text according to the reference relation graph. The shortest text reading path containing the target text can be determined in the reference relation graph, and first recommended content for responding to the first query information can be generated according to the shortest text reading path. Therefore, after the first initial text associated with the first query information is acquired, text reference relationships can be introduced to expand the first initial text into a reference relation graph. A shortest text reading path including the target text (the result of screening the first initial text and the associated text) can then be determined in the graph, and the texts and text reference relationships included in that path can be used as the first recommended content corresponding to the first query information. In other words, the reference relationships between texts are used to mine the relative importance of different texts and to determine the recommended content associated with the query information, so that the recommendation accuracy of text data can be improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a text data recommendation scenario provided by an embodiment of the present application;

Fig. 3 is a schematic flowchart of a data recommendation method provided in an embodiment of the present application;

Fig. 4 is a schematic diagram of screening target texts from a reference relationship graph according to an embodiment of the present application;

Fig. 5 is a diagram illustrating a shortest text reading path according to an embodiment of the present application;

Fig. 6 is a schematic flowchart of a data recommendation method provided in an embodiment of the present application;

Fig. 7 is a schematic view of a recommendation scenario of medical text data according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a data recommendation device according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Embodiments of the present application relate to Artificial Intelligence (AI) technology. Artificial intelligence is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

The data processing scheme provided by the present application belongs to Natural Language Processing (NLP), a branch of artificial intelligence.

Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; because research in this field involves natural language, that is, the language people use every day, it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like. In the present application, text content associated with the query information can be recommended to the user on the basis of text similarity and text reference relations.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include a server 10d and a user terminal cluster; the user terminal cluster may include one or more user terminals, and the number of user terminals is not limited. As shown in fig. 1, the user terminal cluster may specifically include a user terminal 10a, a user terminal 10b, a user terminal 10c, and the like. The server 10d may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. The user terminals 10a, 10b, 10c, and the like may each be a smart terminal with a video/image playing function, such as a smartphone, a tablet computer, a notebook computer, a palmtop computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch or a smart bracelet), or a smart television. As shown in fig. 1, the user terminals 10a, 10b, 10c, etc. may each be connected to the server 10d via a network, so that each user terminal can exchange data with the server 10d via the network.

Taking the user terminal 10a as an example, when a user needs to query text data, the user may input query information in a search input box of the user terminal 10a, and the user terminal 10a may obtain the query information and send it to the server 10d. After receiving the query information sent by the user terminal 10a, the server 10d may calculate the similarity between the query information and the topic of each text to be recommended, obtain a plurality of initial texts associated with the query information in order of similarity, expand the initial texts through their corresponding text reference relationships, and generate a reference relationship graph. The server 10d may then screen target texts from the reference relationship graph and determine the shortest text reading path containing the target texts. The server 10d may return the texts covered by the shortest text reading path, together with their text reference relationships, to the user terminal 10a as recommended content in response to the query information, and the user terminal 10a presents them to the user. In the embodiment of the present application, the reference relationships among different texts can be used to determine the recommended content associated with the query information, which can improve the recommendation accuracy of text data; moreover, because the text reading path preserves the reference relationships among the texts, it helps the user determine the order in which to read them.

Referring to fig. 2, fig. 2 is a schematic view of a text data recommendation scenario provided in an embodiment of the present application. The recommendation process of text data is described below by taking a document retrieval scenario as an example (documents may include papers, patents, monographs, etc.). As shown in fig. 2, a user may need to query some documents to assist a research process. The user may start a browser in the user terminal (e.g., the user terminal 10a in the embodiment corresponding to fig. 1) and open an XX Academic query page 20a in the browser, where the query page 20a may include a search input box 20b and a search function control 20f. The user can input the query information "image detection" in the search input box 20b and, after the input is completed, perform a triggering operation on the search function control 20f in the query page 20a. In response to the triggering operation on the search function control 20f, the user terminal acquires the query information "image detection" entered in the search input box 20b, calls an application program interface of XX Academic, and calculates the text similarity between each document in XX Academic and the query information "image detection". The documents collected in XX Academic can then be ranked by text similarity, and the top k ranked documents are used as the initial search results of the query information "image detection", where k is a positive integer. In the embodiment of the present application, the number k of initial search results may be determined according to actual requirements. As shown in fig. 2, k may be set to 5; when the first 5 documents sorted by text similarity are documents 1, 2, 3, 4, and 5, documents 1, 2, 3, 4, and 5 may all be regarded as initial document nodes, that is, the documents included in the document set 20c. It is understood that the documents in the document set 20c are only the initial search results for the query information "image detection", not the finally determined recommended content; therefore, they may not be displayed in the query page 20a, that is, documents 1, 2, 3, 4, and 5 are not displayed in the query page 20a for the time being.

After acquiring the 5 initial document nodes, the user terminal can acquire the citing documents and the cited documents corresponding to the 5 initial document nodes through their document citation relations, and determine these citing and cited documents as associated documents. In addition to the citing and cited documents of the 5 initial document nodes, the user terminal may further acquire the documents cited by those cited documents, which may also be referred to as associated texts corresponding to the initial document nodes. The user terminal may construct a citation relationship graph from the initial document nodes and the associated documents, as shown in area 20d; the citation relationship graph in area 20d may include the document citation relationships among the 5 initial document nodes and their associated documents.

As shown in fig. 2, the document citation relationships among the 5 initial document nodes are: documents 1 and 3 cite document 2, document 2 cites document 4, and document 4 cites document 5. The citation relationships acquired by the user terminal are as follows: the cited documents of document 1 are: document 6; the cited documents of document 3 are: document 7; the cited documents of document 2 are: document 8; the references of document 2 are: documents 9 and 10; and the references of document 4 are: document 11. Further, the user terminal may also obtain the cited document of document 9: document 13; the cited documents of document 10: document 13; and the cited documents of document 11: documents 15 and 12; in addition, document 12 cites document 15. In this case, documents 7 to 15 may be referred to as associated documents corresponding to the initial document nodes, and the initial document nodes and the associated documents may be referred to as document nodes in the citation relationship graph. An "arrow" in the citation relationship graph represents a document citation relationship and may also be referred to as an edge between two document nodes. The user terminal may obtain a node weight for each document node in the citation relationship graph and an edge weight between every two document nodes having a document citation relationship; for example, the node weight of document 2 is denoted node weight 2, and the edge weight between document 2 and document 10 is denoted edge weight 2-10. The node weight can be determined according to the ranking value and the rating of the document, and the edge weight can be determined according to the citation frequency between two documents having a document citation relationship; for example, edge weight 2-10 can be expressed as the number of times document 10 is cited in the document content of document 2.

Further, the user terminal may obtain, for each document node, the number of times it is cited in the citation relationship graph, and screen out the target documents from the graph according to these counts; this can also be understood as re-screening the initial document nodes to obtain the target documents. The more often a document is cited, the more likely it is to be a document commonly cited by the initial document nodes, and thus the more important it is. As shown in fig. 2, document 13 in the citation relationship graph is cited by both document 9 and document 10, which indicates that document 13 is an important document in the graph, so document 13 can be determined as a target document. Similarly, the user terminal may screen out the following target documents from the citation relationship graph: document 1, document 2, document 4, document 13, and document 15.

After obtaining the target documents and the citation relationship graph carrying the node weights and edge weights, the user terminal may generate the shortest document reading path 20e from the graph by using a shortest path algorithm. The shortest path algorithm may be the NEWST algorithm, which finds an optimal tree in the citation relationship graph that covers the target documents while minimizing the sum of the node weights and edge weights on the tree. In other words, the shortest document reading path 20e can be understood as a minimum spanning tree generated by the NEWST algorithm; this minimum spanning tree includes at least the screened target documents (documents 1, 2, 4, 13, and 15). As shown in fig. 2, the shortest document reading path 20e can include documents 1, 2, 4, 10, 11, 13, and 15 and the document citation relationships among them. The user terminal may take the documents and the document citation relationships included in the shortest document reading path as the recommended content in response to the query information "image detection", and present the shortest document reading path 20e in the query page 20a.
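The NEWST algorithm is named but not specified here, so the Python sketch below swaps in a different, well-known technique: the greedy shortest-path heuristic for Steiner trees. It grows a tree from one target and repeatedly attaches the nearest remaining target via multi-source Dijkstra. It is adapted so that entering a node not yet in the tree costs that node's weight on top of the edge weight. Citation direction is ignored (an assumption), and the result is an approximation rather than a guaranteed optimum.

```python
import heapq
from collections import defaultdict


def to_adjacency(edge_weight):
    """Undirected adjacency: node -> {neighbour: edge weight}."""
    adj = defaultdict(dict)
    for (u, v), w in edge_weight.items():
        adj[u][v] = w
        adj[v][u] = w
    return adj


def node_edge_weighted_steiner_tree(adjacency, node_weight, targets):
    """Greedy tree covering all targets; path cost = edge weights + weights of newly added nodes.

    Returns (tree_nodes, tree_edges), with each edge stored as a frozenset of its two endpoints.
    """
    targets = list(targets)
    tree_nodes = {targets[0]}
    tree_edges = set()
    remaining = set(targets[1:])
    while remaining:
        # Multi-source Dijkstra from every node already in the tree.
        dist = {n: 0.0 for n in tree_nodes}
        prev = {}
        heap = [(0.0, n) for n in tree_nodes]
        heapq.heapify(heap)
        reached = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            if u in remaining:
                reached = u  # nearest remaining target
                break
            for v, w in adjacency[u].items():
                nd = d + w + (0.0 if v in tree_nodes else node_weight[v])
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if reached is None:
            raise ValueError("targets are not connected in the citation relationship graph")
        # Graft the found path onto the tree by walking back to it.
        node = reached
        while node not in tree_nodes:
            tree_edges.add(frozenset((node, prev[node])))
            tree_nodes.add(node)
            node = prev[node]
        remaining -= tree_nodes  # targets passed on the way are covered too
    return tree_nodes, tree_edges
```

Node ids must be mutually comparable, because heap entries with equal distances are ordered by node id.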

Based on the shortest document reading path 20e shown in the query page 20a, the user can quickly determine the order in which to read the documents. For example, documents 13 and 15 may be basic theoretical documents related to "image detection", while the other documents extend documents 13 and 15, so the user can read documents 13 and 15 first to better understand the field of image detection. In the embodiment of the application, the shortest document reading path determined from the document reference relationships, node weights, and edge weights can improve the accuracy of document recommendation; and because the shortest document reading path preserves the citation relationships among the documents, it helps the user quickly determine a reading order.

Referring to fig. 3, fig. 3 is a schematic flowchart of a data recommendation method according to an embodiment of the present application. It is understood that the data recommendation method may be executed by a computer device, and the computer device may be a stand-alone server, or a user terminal, or a system of a server and a user terminal, or a server cluster composed of a plurality of servers, or a computer program application (including program code), and is not limited in this respect. As shown in fig. 3, the data recommendation method may include the steps of:

Step S101, obtaining a first initial text associated with the first query information, obtaining an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and constructing a reference relation graph containing the first initial text and the associated text.

Specifically, in a text retrieval scenario, a user may input first query information (such as the query information "image detection" in the embodiment corresponding to fig. 2) in a query page (such as the query page 20a), and trigger a search function in the query page (such as the search function control 20f). In response to the triggering operation, the computer device may acquire the first query information input in the query page, call the application program interface of a search engine to obtain the search results corresponding to the first query information, and use the search results as the first initial texts corresponding to the first query information (such as documents 1 and 2 in the embodiment corresponding to fig. 2). The search engine may include, but is not limited to, Baidu, Microsoft's search engine, Google, and AMiner (a platform for mining and serving scientific big data); the first initial texts may be the search results returned by such an existing search engine. Alternatively, the first initial texts may be the top k results selected from the search results. For example, if the search results form a text list of N texts, the computer device may select the first k texts in the list as the first initial texts associated with the first query information, where k is a positive integer less than or equal to N whose specific value can be set manually according to actual requirements, for example k = 30.

Further, the computer device may obtain a reference text and a referenced text corresponding to the first initial text according to the text reference relationship corresponding to the first initial text, and determine both the reference text and the referenced text as associated texts corresponding to the first initial text; the associated text and the first initial text may both be determined as text nodes, and a reference relationship graph (such as the reference relationship graph in the area 20d in the embodiment corresponding to fig. 2) containing all the text nodes is constructed according to the text reference relationship between the text nodes. The associated text may refer to a reference text of the first initial text, or a referenced text, or a text referenced by the reference text, or a referenced text corresponding to the referenced text, and the like. For example, the first initial text may include text 1 and text 2, where the reference text corresponding to text 1 is text 3, the referenced text corresponding to text 1 is text 4, the reference text corresponding to text 2 is text 5, the referenced text corresponding to text 2 is text 6, the reference text corresponding to text 3 is text 7, and the referenced text corresponding to text 4 is text 8, and then the computer device may determine that text 3, text 4, text 5, text 6, text 7, and text 8 are all associated texts corresponding to the first initial text. The nodes in the reference relationship graph can be represented as text nodes, and the edges in the reference relationship graph can be represented as two adjacent text nodes having a text reference relationship. In other words, the computer device may perform second-order expansion on the first initial text according to the text reference relationship corresponding to the first initial text, and construct a reference relationship diagram according to the first initial text and the expanded text. 
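The second-order expansion can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `build_reference_graph` and the `cites` dictionary (text id mapped to the ids of the texts it cites) are hypothetical, and each citation direction is followed for two hops, which matches the text 1 to text 8 example above.

```python
from collections import defaultdict


def build_reference_graph(initial, cites):
    """Expand the first initial texts by two citation hops in each direction.

    initial: ids of the first initial texts.
    cites:   hypothetical data source, {text id: ids of the texts it cites}.
    Returns (nodes, edges), where edges are directed (citing, cited) pairs.
    """
    # Reverse index: which texts cite each text.
    cited_by = defaultdict(set)
    for src, dsts in cites.items():
        for dst in dsts:
            cited_by[dst].add(src)

    def refs(t):
        return set(cites.get(t, ()))

    def citers(t):
        return cited_by.get(t, set())

    nodes = set(initial)
    for step in (refs, citers):
        hop1 = set().union(*(step(t) for t in initial))  # first-order expansion
        hop2 = set().union(*(step(t) for t in hop1))     # second-order expansion
        nodes |= hop1 | hop2

    # Keep only citation edges whose both ends are inside the graph.
    edges = {(s, d) for s in nodes for d in cites.get(s, ()) if d in nodes}
    return nodes, edges
```

On the example above (text 1 cites text 3, text 4 cites text 1, text 3 cites text 7, and so on), the call `build_reference_graph([1, 2], cites)` returns texts 1 to 8 as the nodes.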
It should be noted that, in the embodiment of the present application, the text nodes in the reference relationship graph may include text data from text retrieval scenarios such as papers, patents, books, news, medical and health articles, blogs, and forum posts.

Step S102, screening a target text corresponding to the first query information from the associated text and the first initial text according to the reference relation graph.

Specifically, although the topic of the first initial text is strongly correlated with the first query information, the first initial text may not contain the prerequisite knowledge associated with the first query information. To capture this prerequisite knowledge, target texts may be re-screened from the reference relationship graph: the computer device may obtain the importance of each text node in the graph according to the text reference relationships it contains, and screen the target texts corresponding to the first query information according to that importance. Concretely, the computer device obtains, for each text node, the number of texts in the reference relationship graph that reference it. A text node whose number of referencing texts is greater than a number threshold (which may be set manually according to actual requirements, for example 5) is regarded as an important text in the graph and may be determined as a target text corresponding to the first query information. The target texts screened in this way are texts referenced in common by many other texts, so they can help the user understand the field to which the first query information belongs and serve as prerequisite knowledge for becoming familiar with that field.

Referring to fig. 4 together, fig. 4 is a schematic diagram illustrating a target text screened from a reference relationship diagram according to an embodiment of the present application. As shown in fig. 4, the first initial texts acquired by the computer device are respectively: text 1, text 2, text 3, text 4, and text 5, and a reference relationship diagram as shown in the area 30a may be constructed according to the first initial text and the associated text corresponding to the initial text. The computer device may obtain, in the reference relationship diagram of the area 30a, the number of referred texts respectively corresponding to each text (which may also be referred to as a text node), where, for example, the number of referred texts corresponding to each of the text 1, the text 4, the text 14, and the text 17 is 2, the number of referred texts corresponding to the text 2 is 3, the number of referred texts corresponding to the text 3, the text 5, the text 11, the text 12, the text 13, the text 15, and the text 16 is 1, and the number of referred texts corresponding to the text 6, the text 7, the text 8, the text 9, and the text 10 is 0.

Assuming that the threshold number is 1, the computer device may determine all texts with the number of referred texts being greater than 1 as target texts, that is, text 1, text 2, text 4, text 14, and text 17 may all be determined as target texts, and mark the screened target texts in the reference relationship diagram of the area 30 a. It is understood that the number of the screened target texts may be the same as or different from the number of the first initial texts, and the application is not particularly limited.
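A minimal sketch of the threshold screening, assuming the graph is given as (citing, cited) pairs. The helper name `screen_targets` is hypothetical, and "number of referred texts" is read here as the number of distinct texts in the graph that cite a node (its in-degree).

```python
from collections import Counter


def screen_targets(edges, threshold):
    """Return the nodes cited by more than `threshold` distinct texts in the graph.

    edges: iterable of (citing, cited) pairs from the reference relationship graph.
    """
    # set() drops duplicate pairs so each citing text is counted once.
    in_degree = Counter(cited for _, cited in set(edges))
    return {node for node, n in in_degree.items() if n > threshold}
```

For instance, with the fig. 2 situation where documents 9 and 10 both cite document 13, `screen_targets([(9, 13), (10, 13), (11, 4)], 1)` keeps only document 13.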

Step S103, determining the shortest text reading path containing the target text in the reference relation graph, and generating first recommended content for responding to the first query information according to the shortest text reading path.

Specifically, after the target text is screened out, the computer device may find a shortest text reading path including the target text from the reference relationship graph, and use a text covered by the shortest text reading path and the text reference relationship as the first recommended content in response to the first query information.

It can be understood that, before finding the shortest text reading path in the reference relationship graph, the computer device needs to obtain a text recommendation weight for each text node and an edge weight between every two adjacent text nodes, where the edge weight represents the association between two texts that have a text reference relationship. Specifically, the computer device can acquire the text ranking value and the rating feature of each text node in the reference relationship graph, and determine the text recommendation weight of each text node from them. For any two text nodes v_i and v_j that have a text reference relationship in the graph, where i and j are positive integers less than or equal to the number of text nodes, the computer device can obtain the reference frequency between v_i and v_j and determine the edge weight between v_j and v_i from it; the reference frequency is the number of times text node v_j is referenced in the text content of text node v_i.

The text ranking value of a text node may be its PageRank score. PageRank is part of Google's ranking algorithm and is used to measure the rank or importance of web pages; in this embodiment, the PageRank score serves as an importance score for a text node, so it is non-negative, and the more other texts reference a text node, the more important the node is and the higher its PageRank score. The rating feature of a text node may be obtained from the grade of the publishing venue to which the text belongs, from which a ranking of each text node is calculated. For example, when the text node is a paper, its rating can be obtained from the China Computer Federation (CCF) and AMiner, and the ranking of the text node is then calculated from that rating, where the CCF is an organization that performs academic evaluation. The computer device can then calculate the text recommendation weight of each text node from its PageRank score and rating feature.
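The PageRank score mentioned above can be computed on the citation graph by power iteration. The sketch below is the standard algorithm, not code from the patent; the damping factor 0.85 and the uniform redistribution of score from dangling nodes (texts that cite nothing in the graph) are common defaults assumed here, and how the score is later combined with the venue rating is left out.

```python
def pagerank(nodes, edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a citation graph.

    edges: (citing, cited) pairs; each citation passes score from citing to cited.
    Dangling nodes (citing nothing) spread their score uniformly over all nodes.
    """
    nodes = list(nodes)
    n = len(nodes)
    out = {v: [] for v in nodes}
    for s, d in edges:
        out[s].append(d)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(score[v] for v in nodes if not out[v])
        # Teleport term plus the share of dangling mass every node receives.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for s in nodes:
            for d in out[s]:
                new[d] += damping * score[s] / len(out[s])
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score
```

Because the total score stays at 1, the values can be compared directly across text nodes of the same graph.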

Optionally, the computer device may also convert each text node in the reference relationship graph into a text representation vector by using methods such as TF-IDF (term frequency-inverse document frequency) features, GloVe (a word vector model), Word2vec (a word vector model), BERT (a pre-trained model), ELMo (a pre-trained model), a long short-term memory network (LSTM), a recurrent neural network (RNN), a gated recurrent unit (GRU), or a Transformer, and then determine the text recommendation weight of each text node from its text representation vector. Optionally, when determining the edge weights in the reference relationship graph, the computer device may use methods such as a graph convolutional network (GCN) or a graph attention network (GAT).

Further, after obtaining the text recommendation weights and edge weights in the reference relationship graph, the computer device may find the shortest text reading path containing the target texts. Specifically, the computer device may construct a first text subgraph containing the target texts in the reference relationship graph and, according to the edge weights and text recommendation weights in the first text subgraph, obtain a first minimum spanning tree of that subgraph. The first minimum spanning tree is the spanning tree with the minimum weight accumulated value in the first text subgraph, where the weight accumulated value is the sum of the text recommendation weights and edge weights in the tree, and it contains the target texts. The computer device may then construct a second text subgraph containing the target texts in the reference relationship graph and update the first minimum spanning tree according to the text recommendation weights and edge weights in the second text subgraph, generating a second minimum spanning tree. The second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the second text subgraph; its weight accumulated value is smaller than that of the first minimum spanning tree, and it also contains the target texts. If the second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the reference relationship graph, it can be determined as the shortest text reading path. Methods for finding the shortest text reading path may include, but are not limited to, Dijkstra's algorithm, the Bellman-Ford algorithm, the Floyd algorithm, the SPFA algorithm, and the NEWST algorithm.

Taking the NEWST algorithm as an example, it is used to generate the shortest text reading path containing the target texts from the reference relationship graph. Given the target texts, the NEWST algorithm finds the best tree in the reference relationship graph that covers the target texts and minimizes the sum of the weights of the text nodes and edges on the tree. Its input can be defined as G = (V, E, S, w, c), a connected undirected graph, where V is the set of all text nodes in the reference relationship graph; E is the set of all edges in the graph; w is the text recommendation weight of each text node (equivalently, a function mapping text nodes to positive weights); c is the edge weight of each edge (equivalently, a function mapping edges to positive weights); and S is the set of target texts, a subset of V, i.e. S ⊆ V.

The target texts in the embodiment of the present application may also be referred to as fixed nodes (terminals) in the NEWST algorithm.

The objective of the NEWST algorithm is to find an optimal spanning tree T that covers all target texts and minimizes the accumulated value of the text recommendation weights and edge weights in T. The spanning tree T can be represented by the following formula (1):

where V_T is the set of text nodes contained in the spanning tree T, and E_T is the set of edges contained in the spanning tree T.
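Read together with the definitions of V_T and E_T, the objective stated in words above can be written compactly as follows. This rendering assumes the node and edge terms are summed with equal weight, and is a restatement of the prose rather than a copy of the original formula:

```latex
T^{*} = \mathop{\arg\min}_{\substack{T = (V_T, E_T)\ \text{a tree in}\ G \\ S \subseteq V_T}}
\Big( \sum_{v \in V_T} w(v) \;+\; \sum_{(i,j) \in E_T} c(i,j) \Big)
```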

The process of finding the optimal spanning tree in the reference relationship graph can be converted into the problem of minimizing a loss function, as shown in the following equation (2):

wherein, the loss function of the edge can be defined as:

where i and j are the labels of two text nodes that have a text reference relationship, and α and β are positive constants; for example, α may be set to 3 and β to 2. con(i, j) evaluates the correlation between text node i and text node j and may be expressed as the number of times text node j is referenced in the text content of text node i. Similarly, the loss function w of a text node may be defined as:

where γ, a, and b are hyperparameters; in this embodiment γ may be set to 5, a to 0.7, and b to 0.3, although their values are not specifically limited. pgscore(i) denotes the PageRank score of text node i, and venue(i) denotes the ranking score of text node i.

Since the loss function in formula (2) has multiple unknowns, the computer device may instead solve for an approximate solution. The computer device may take the reference relationship graph G = (V, E, S, w, c), the fixed set of target nodes S ⊆ V, the loss function w of the text nodes (formula (4) above), and the loss function c of the edges (formula (3) above) as the input of the NEWST algorithm. The output of the NEWST algorithm is an optimal spanning tree T = (V_T, E_T) that contains all text nodes in S, that is, S ⊆ V_T.

The NEWST algorithm may be executed as follows. Construct from the reference relationship graph G a subgraph G₁ = (V₁, E₁, S, w, c) containing S (also called the first text subgraph), and find a minimum spanning tree T₁ (the first minimum spanning tree) in G₁; if G₁ has several minimum spanning trees, any one of them may be selected as T₁. The computer device then constructs a new subgraph G₂ (the second text subgraph) by replacing each edge of T₁ with a shorter path in G; if several shorter paths exist, any one of them may be selected. The computer device then finds a minimum spanning tree T₂ in G₂, and repeats these steps until it finds the minimum spanning tree T of the subgraph G_S; if G_S has several minimum spanning trees, any one of them may be selected as T. This minimum spanning tree T is the minimum spanning tree containing the target texts S in the reference relationship graph and can be determined as the shortest text reading path.
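The iterative procedure above closely resembles the classic Kou-Markowsky-Berman (KMB) approximation for Steiner trees: build a graph over the terminals weighted by shortest-path distance, take its minimum spanning tree, expand each tree edge back into a shortest path in G, take the minimum spanning tree of that expansion, and prune non-terminal leaves. The Python sketch below follows KMB rather than reproducing NEWST itself, and makes one loudly stated simplification: half of each node weight w is folded onto every incident edge so that a single edge-weighted graph can be searched. The names `steiner_reading_path`, `dijkstra`, and `mst` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations


def dijkstra(adj, src):
    """Single-source shortest paths on {u: {v: cost}}; returns (dist, prev)."""
    dist, prev = {src: 0.0}, {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in adj[u].items():
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def mst(adj):
    """Prim's algorithm on a connected undirected graph; returns frozenset edges."""
    start = next(iter(adj))
    seen, tree = {start}, set()
    heap = [(c, start, v) for v, c in adj[start].items()]
    heapq.heapify(heap)
    while heap and len(seen) < len(adj):
        c, u, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        tree.add(frozenset((u, v)))
        for x, cx in adj[v].items():
            if x not in seen:
                heapq.heappush(heap, (cx, v, x))
    return tree


def steiner_reading_path(edges, w, terminals):
    """edges: {(i, j): c}, undirected; w: {node: weight}; terminals: the target texts S."""
    orig, adj = {}, {v: {} for v in w}
    for (i, j), c in edges.items():
        key = frozenset((i, j))
        orig[key] = min(c, orig.get(key, float("inf")))
        # Simplification (assumption): fold half of each node weight onto the edge.
        adj[i][j] = adj[j][i] = orig[key] + (w[i] + w[j]) / 2
    terminals = list(terminals)
    sp = {s: dijkstra(adj, s) for s in terminals}
    # Complete graph on the terminals, weighted by shortest-path distance in G.
    closure = {s: {} for s in terminals}
    for s, t in combinations(terminals, 2):
        closure[s][t] = closure[t][s] = sp[s][0][t]  # KeyError if G is disconnected
    # Replace each edge of the closure MST by its shortest path in G.
    sub = {terminals[0]: {}}
    for e in mst(closure):
        s, t = tuple(e)
        prev, v = sp[s][1], t
        while prev[v] is not None:
            u = prev[v]
            sub.setdefault(u, {})[v] = adj[u][v]
            sub.setdefault(v, {})[u] = adj[u][v]
            v = u
    tree = mst(sub)
    # Prune non-terminal leaves so every remaining leaf is a target text.
    keep = set(terminals)
    while True:
        deg = Counter(v for e in tree for v in e)
        leaves = {v for v, d in deg.items() if d == 1 and v not in keep}
        if not leaves:
            break
        tree = {e for e in tree if not (e & leaves)}
    nodes = {v for e in tree for v in e} | keep
    cost = sum(w[v] for v in nodes) + sum(orig[e] for e in tree)  # sum(w) + sum(c)
    return nodes, tree, cost
```

The returned cost is the true objective (node weights plus edge weights) of the final tree. Because folding counts a node once per incident edge at half weight, the search slightly over-penalizes high-degree nodes, so the result should be read as an approximation in the same spirit as the text's "approximate solution".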

Further, the computer device may obtain the path text nodes and path text reference relationships contained in the shortest text reading path, where the path text reference relationships indicate the reading order of the path text nodes, and determine the first recommended content corresponding to the first query information from them. The computer device may display the first recommended content in the query page, and the user may determine its reading order from the path text reference relationships it contains. The first recommended content displayed in the query page may include information such as the title, author, and creation time of each text node; to read the full content of a text node, the user triggers an expansion operation on that node.
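Assuming that a cited text should be read before the texts that cite it (as in the fig. 2 example, where the basic documents 13 and 15 are read first), the path text reference relationships can be turned into a concrete reading order with a topological sort (Kahn's algorithm). The function name `reading_order` and the tie-breaking by node id are illustrative choices, not part of the described method.

```python
import heapq


def reading_order(nodes, citations):
    """Order texts so that every cited text precedes the texts that cite it.

    citations: (citing, cited) pairs taken from the shortest text reading path.
    Ties are broken by the smallest node id to keep the output deterministic.
    """
    unread = {v: 0 for v in nodes}   # number of unread prerequisites per text
    readers = {v: [] for v in nodes} # texts that become readable once v is read
    for citing, cited in citations:
        unread[citing] += 1
        readers[cited].append(citing)
    ready = [v for v, d in unread.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        v = heapq.heappop(ready)
        order.append(v)
        for r in readers[v]:
            unread[r] -= 1
            if unread[r] == 0:
                heapq.heappush(ready, r)
    if len(order) != len(unread):
        raise ValueError("citation cycle: no consistent reading order")
    return order
```

Since the shortest text reading path is a tree, its citation edges contain no cycle and the error branch is only a safeguard for other inputs.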

Optionally, when the path text nodes in the shortest text reading path include at least two documents, the computer device may obtain the summary information of those documents and determine the first recommended content corresponding to the first query information from the document summary information and the path text reference relationships. The first recommended content is displayed in the query page, and the user may determine its reading order from the path text reference relationships it contains. In this case, the first recommended content displayed in the query page may include the summary information of each text node, so the user can grasp the main idea of each text node directly in the query page and quickly understand the field to which the first query information belongs.

Optionally, to verify the effectiveness of the data recommendation method provided by the present application, the method (e.g., the NEWST algorithm) may be evaluated on a data set. In this embodiment of the application, the SurveyBank data set, which contains a number of survey articles, may be used: a phrase is extracted from the title of a survey article in SurveyBank and used as query information, and the articles cited in that survey are used as the expected recommended articles for the query.

In the embodiment of the present application, the computer device may compare the proposed data recommendation method with existing methods, including but not limited to Google Scholar, Microsoft Academic, AMiner, and PageRank, and judge the performance of each method with two metrics: the F1 score and precision. The F1 score is a statistical measure of the accuracy of binary classification models that takes both precision and recall into account; it is the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0. Precision here is the fraction of the papers output by each method that belong to the expected recommended papers. For both the F1 score and precision, a higher value indicates better performance. The experimental results of the above methods on the SurveyBank data set are shown in Table 1 below:

TABLE 1

Metric      Google Scholar   Microsoft Academic   AMiner   PageRank   NEWST
F1 score    0.2143           0.1156               0.1211   0.0242     0.2345
Precision   0.3630           0.2117               0.2340   0.0358     0.4740

As shown in Table 1, the NEWST algorithm proposed in the embodiment of the present application achieves a higher F1 score and precision than the existing methods, because it exploits the reference relationships between different papers, whereas the existing search engines rely only on keyword matching.
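For a single query, both metrics in Table 1 can be computed from the set of recommended papers and the set of expected papers (those cited by the survey). The sketch below shows the per-query, set-based computation; how scores are averaged across queries is not stated in the text and is not shown.

```python
def precision_recall_f1(recommended, expected):
    """Set-based precision, recall, and F1 (harmonic mean) for one query."""
    rec, exp = set(recommended), set(expected)
    hits = len(rec & exp)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(exp) if exp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For example, if 4 papers are recommended and 2 of them are among 8 expected papers, precision is 0.5, recall is 0.25, and F1 is 1/3.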

The shortest text reading path generated in the above experiment may be as shown in fig. 5, which is a schematic diagram of a shortest text reading path provided in an embodiment of the present application. The query information used in the experiment corresponds to the survey paper "Automatic Keyphrase Extraction: A Survey of the State of the Art". As shown in fig. 5, the shortest text reading path generated by the NEWST algorithm may include 17 papers; although a few of them differ from the expected recommended papers, all papers in the path are related to "automatic keyphrase extraction". For example, the topic of paper 8 shown in fig. 5 is only loosely related to "automatic keyphrase extraction", but its mathematical principles may help the user better understand the field to which "automatic keyphrase extraction" belongs.

In the embodiment of the application, after the first initial text associated with the first query information is acquired, text reference relationships can be introduced to expand the first initial text into a reference relationship graph. A shortest text reading path containing the target texts (obtained by screening the first initial text and the associated texts) can then be determined in the graph, and both the texts and the text reference relationships contained in that path can be used as the first recommended content for the first query information. In other words, the reference relationships among texts are used to mine their relative importance and determine the recommended content associated with the query information, which improves the recommendation accuracy of text data; and because the shortest text reading path preserves the reference relationships among the texts, it helps the user determine a reading order.

Referring to fig. 6, fig. 6 is a schematic flowchart of a data recommendation method according to an embodiment of the present application. It is understood that the data recommendation method may be executed by a computer device, and the computer device may be a stand-alone server, or a user terminal, or a system of a server and a user terminal, or a server cluster composed of a plurality of servers, or a computer program application (including program code), and is not limited in this respect. As shown in fig. 6, the data recommendation method may include the steps of:

Step S201, acquiring one or more keywords input in a search engine and determining them as the first query information; and calling an application program interface of the search engine to acquire at least two texts to be recommended contained in the search engine.

Specifically, in a text retrieval scenario, a user may input one or more keywords in a query page and trigger the search function in the query page, and the computer device may respond to the triggering operation by obtaining the keywords input in the query page. When the user inputs one keyword, the computer device can use it as the first query information; when the user inputs several keywords, the computer device can concatenate them and use the resulting keyword sequence as the first query information. Further, the computer device may call the application program interface of the search engine to obtain the at least two texts to be recommended collected in the search engine, that is, all texts to be recommended in the search engine.

Step S202, obtaining text similarity between the first query information and at least two texts to be recommended respectively, and determining a first initial text associated with the first query information in the at least two texts to be recommended according to the text similarity.

Specifically, the computer device may determine the text similarity between the first query information and each of the at least two texts to be recommended by a keyword matching method, sort the texts to be recommended by text similarity, and determine the first initial texts associated with the first query information from the sorted texts. For example, suppose the search engine contains N texts to be recommended: text to be recommended 1, text to be recommended 2, ..., text to be recommended N. The computer device determines the text similarity between the first query information and each of them by keyword matching, obtaining similarity 1 for text 1, similarity 2 for text 2, ..., and similarity N for text N. If the top k similarities after sorting in descending order (assuming k = 5) are similarity 2, similarity 10, similarity 15, similarity 20, and similarity 7, the computer device may use texts to be recommended 2, 10, 15, 20, and 7 as the first initial texts associated with the first query information.
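The ranking-and-truncation step can be sketched as follows. The text only says "keyword matching", so the similarity used here (the share of query keywords that appear in a text) and the simple lowercase alphanumeric tokenizer are assumptions; `top_k_initial_texts` is a hypothetical name.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_initial_texts(query, candidates, k=5):
    """Rank candidate texts by keyword overlap with the query and keep the top k.

    candidates: {text id: text}. Similarity = share of query keywords found in
    the text (hypothetical choice). Ties keep the candidates' original order.
    """
    q = tokenize(query)

    def similarity(text):
        return len(q & tokenize(text)) / len(q) if q else 0.0

    ranked = sorted(candidates, key=lambda t: similarity(candidates[t]), reverse=True)
    return ranked[:k]
```

Python's `sorted` is stable even with `reverse=True`, so equally similar texts stay in the search engine's original order.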

Step S203, acquiring the associated text corresponding to the first initial text according to the text reference relation corresponding to the first initial text, and constructing a reference relation graph containing the first initial text and the associated text.

Step S204, screening a target text corresponding to the first query information from the associated text and the first initial text according to the reference relation graph.

The specific implementation manner of steps S203 to S204 may refer to steps S101 to S102 in the embodiment corresponding to fig. 3, which is not described herein again.

Step S205, determining the first initial text and the associated text as text nodes in the reference relationship graph, converting the text nodes into text characterization vectors, and determining text recommendation weights corresponding to the text nodes according to the text characterization vectors.

Specifically, the computer device may determine both the first initial text and the associated text in the reference relationship graph as text nodes, and convert each text node into a text representation vector by using methods such as TF-IDF (term frequency-inverse document frequency) features, GloVe (a word vector model), Word2vec (a word vector model), BERT (a pre-trained model), ELMo (a pre-trained model), a long short-term memory network (LSTM), a recurrent neural network (RNN), a gated recurrent unit (GRU), or a Transformer. Each text representation vector can then be converted into a numerical value, which is referred to as the text recommendation weight of the corresponding text node.
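As one concrete instance of the TF-IDF option, the sketch below builds sparse TF-IDF vectors and reduces each one to a single number. The text does not say how a vector becomes a scalar weight, so using the cosine similarity to the query's TF-IDF vector is an assumption, as is computing document frequencies with the query included; the names `tfidf_vectors` and `recommendation_weights` are hypothetical. How such a score would map onto the node loss w is also not specified.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {node: token list} -> {node: {term: tf-idf}}.

    Uses tf = count / document length and idf = log(N / df), one common variant.
    """
    n = len(docs)
    df = Counter(term for toks in docs.values() for term in set(toks))
    return {
        node: {t: (c / len(toks)) * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for node, toks in docs.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommendation_weights(docs, query_tokens):
    """Hypothetical scalar weight: cosine similarity of each node to the query."""
    vecs = tfidf_vectors({**docs, "__query__": query_tokens})
    q = vecs.pop("__query__")
    return {node: cosine(vec, q) for node, vec in vecs.items()}
```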

Step S206, generating an initial node vector corresponding to the text node according to the text reference relation of the text node in the reference relation graph; and inputting the initial node vector into a graph convolution network, and performing information coding on the initial node vector according to the graph convolution network to generate a node coding vector corresponding to the initial node vector.

Specifically, the computer device may generate the initial node vector of each text node according to that node's text reference relationships in the reference relationship graph. For example, the computer device may use a graph representation learning method to obtain a vector representation of each text node in the reference relationship graph, and this vector representation serves as the initial node vector of the text node. The graph representation learning method may be one-hot encoding, the TransE algorithm, or the like; its purpose is to convert the texts into vectors. To determine the edge weights in the reference relationship graph, the computer device may use a Graph Convolutional Network (GCN), a Graph Attention Network (GAT), or a similar method.

Optionally, when the graph convolutional network is used to determine the edge weights in the reference relationship graph, the computer device may input the initial node vectors into the graph convolutional network and encode them through its multiple network layers to obtain the corresponding encoded vectors. To alleviate over-smoothing and error propagation in the graph convolutional network, a gating function may be introduced after each network layer to process the encoded vector output by that layer; the output of the last network layer, after it passes through the gating function, is taken as the node coding vector corresponding to the initial node vector. With the gating function in place, abnormal values within a network layer can be suppressed, and the outputs of adjacent network layers that contain abnormal values can also be suppressed.
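The encoding in step S206 can be sketched with NumPy as below. The application only says that "a gating function" follows each layer, so the highway-style gate used here (a sigmoid gate mixing the new layer output with the previous one) is an assumption, and the weights are random and untrained; the sketch illustrates the data flow rather than a trained model. It also assumes the directed reference graph is symmetrized before normalization.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_gcn(adj: np.ndarray, num_layers: int = 2, seed: int = 0) -> np.ndarray:
    """Encode one-hot initial node vectors with GCN layers, each followed by a gate."""
    n = adj.shape[0]
    rng = np.random.default_rng(seed)
    a_norm = normalized_adjacency(adj)
    h = np.eye(n)  # initial node vectors: one-hot encoding of each text node
    for _ in range(num_layers):
        w = rng.normal(scale=0.1, size=(n, n))       # untrained layer weights
        w_gate = rng.normal(scale=0.1, size=(n, n))  # untrained gate weights
        h_new = np.maximum(a_norm @ h @ w, 0.0)      # neighborhood propagation + ReLU
        gate = sigmoid(h_new @ w_gate)               # per-dimension gate in (0, 1)
        h = gate * h_new + (1.0 - gate) * h          # gated mix of new and previous output
    return h  # node coding vectors taken after the last gated layer
```

Keeping the hidden size equal to the number of nodes lets the gate mix layer outputs directly; a real model would use a smaller learned hidden size with an input projection.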

Step S207, determining the edge weight between any two text nodes having a text reference relation in the reference relation graph according to the node coding vectors.

Specifically, after the node coding vector of each text node is obtained, the edge weight between two text nodes that have a text reference relationship can be determined by calculating the similarity between their node coding vectors. For example, for text nodes v_i and v_j that have a text reference relationship, the computer device may use cosine similarity or another similarity measure to calculate the similarity between the node coding vector of v_i and that of v_j, and take this similarity as the edge weight between v_i and v_j. A larger edge weight indicates a stronger correlation between v_i and v_j. In this way, the computer device obtains the edge weight of every edge in the reference relationship graph.
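The cosine-similarity variant of step S207 can be sketched as follows. The input formats (a dict of node coding vectors keyed by text ID and a list of `(v_i, v_j)` reference edges) are assumptions for illustration.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def edge_weights(node_vectors: dict[str, list[float]],
                 edges: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Give each reference edge (v_i, v_j) the similarity of its node coding vectors."""
    return {(i, j): cosine_similarity(node_vectors[i], node_vectors[j]) for i, j in edges}
```

Only pairs that already have a text reference relationship receive a weight; unrelated pairs are never compared, which keeps the cost proportional to the number of edges.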

Step S208, determining the shortest text reading path containing the target text in the reference relation graph, and generating first recommended content for responding to the first query information according to the shortest text reading path.

Specifically, after obtaining the text recommendation weights and edge weights of the reference relationship graph, the computer device may search the graph for the shortest text reading path containing the target text. The computer device may traverse all text nodes in the reference relationship graph and construct M initial text reading paths containing the target text, where M is a positive integer. It may then calculate, for each of the M initial text reading paths, a weight accumulated value from the text recommendation weights and edge weights contained in that path, and determine the initial text reading path with the smallest weight accumulated value as the shortest text reading path.
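The enumeration-and-minimum procedure above can be sketched as follows. The application does not say where the M candidate paths start or how long they may be, so this sketch assumes they are simple paths that follow reference edges from a chosen start text to the target text, capped at `max_nodes` nodes; all function names are hypothetical.

```python
def enumerate_paths(graph: dict[str, list[str]], start: str, target: str,
                    max_nodes: int = 6) -> list[list[str]]:
    """Depth-first enumeration of simple paths from start to target (the M candidates)."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple: never revisit a text
                stack.append(path + [nxt])
    return paths


def accumulated_weight(path, node_weight, edge_weight) -> float:
    """Text recommendation weights of the path's nodes plus the weights of its edges."""
    nodes = sum(node_weight[n] for n in path)
    edges = sum(edge_weight[(a, b)] for a, b in zip(path, path[1:]))
    return nodes + edges


def shortest_reading_path(graph, node_weight, edge_weight, start, target):
    """Return the candidate path with the smallest accumulated weight, or None."""
    candidates = enumerate_paths(graph, start, target)
    if not candidates:
        return None
    return min(candidates, key=lambda p: accumulated_weight(p, node_weight, edge_weight))
```

Exhaustive enumeration grows exponentially with graph size; for large graphs, folding each node weight into its incoming edges and running Dijkstra's algorithm would find the same minimum far more cheaply.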

Further, the computer device may determine the text nodes and text reference relations contained in the shortest text reading path as the first recommended content corresponding to the first query information, and display the first recommended content in the query page, so that the user can quickly determine the order in which to read the texts in the first recommended content.

Optionally, the computer device may store information such as the first query information, the first initial text and the shortest text reading path for subsequent lookup. When the computer device obtains second query information input by the user in the search engine, it may likewise call the application program interface of the search engine to obtain a second initial text associated with the second query information. If the second initial text is the same as the first initial text corresponding to the first query information, the computer device may determine the first recommended content as the second recommended content corresponding to the second query information and display the second recommended content in the query page. In other words, the computer device can determine the second recommended content directly, without constructing a reference relation graph or searching for a shortest text reading path, which improves the recommendation efficiency of text data. If the second initial text differs from the first initial text, the second recommended content still needs to be determined through the procedure above. Note that the texts collected in the search engine are continuously updated; therefore, when determining whether the first initial text and the second initial text are the same, the computer device may also obtain the time at which each was acquired, and if the difference between the two times is too large (for example, more than one month), the subsequent operations still need to be performed on the second initial text to determine the second recommended content corresponding to the second query information.
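The reuse rule above can be sketched as a small cache. Two points are assumptions: "the same initial text" is interpreted as the same set of text IDs regardless of order, and "more than one month" is approximated as 30 days. The class and method names are hypothetical.

```python
from datetime import datetime, timedelta


class RecommendationCache:
    """Reuse recommended content when a new query yields the same initial texts."""

    def __init__(self, max_age: timedelta = timedelta(days=30)):
        self.max_age = max_age  # assumption: "more than one month" taken as 30 days
        self._entries = {}      # frozenset of initial text IDs -> (acquired_at, content)

    def store(self, initial_texts, acquired_at: datetime, content) -> None:
        """Remember the content produced for this set of initial texts."""
        self._entries[frozenset(initial_texts)] = (acquired_at, content)

    def lookup(self, initial_texts, acquired_at: datetime):
        """Return cached content, or None if the texts differ or the entry is too old."""
        entry = self._entries.get(frozenset(initial_texts))
        if entry is None:
            return None  # different initial texts: run the full procedure
        cached_at, content = entry
        if abs(acquired_at - cached_at) > self.max_age:
            return None  # texts may have been updated: rebuild graph and reading path
        return content
```

A `None` result signals that the caller should rebuild the reference relation graph and search for a new shortest text reading path.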

Optionally, the data recommendation method provided in the embodiments of the application may be applied to any scenario involving text retrieval, such as medical queries, article retrieval, news distribution, blog search and log search. Taking the medical query scenario as an example: after a user inputs medical query information in the query page, the computer device obtains that query information and retrieves the initial medical articles associated with it from the search engine. It then constructs a reference relationship graph from the text reference relationships among the initial medical articles (here the text reference relationships may include forwarding relationships, content-copying relationships, and the like) and obtains the text recommendation weights and edge weights of the graph. Next, a target medical article can be screened from the reference relationship graph; compared with the initial medical articles, the target medical article better helps the user become familiar with the introductory knowledge of the field to which the medical query information belongs. Finally, the computer device searches the reference relationship graph for the shortest text reading path containing the target medical article, and displays the medical articles and reference relationships contained in that path in the query page.

Referring also to fig. 7, fig. 7 is a schematic diagram of a recommendation scenario for medical text data according to an embodiment of the present application. As shown in fig. 7, a user may start an instant messaging application on the user terminal 40a and open a payment page in that application, which displays services provided by the instant messaging application, such as living payment, city services, medical health services and insurance services. When the user triggers the medical health service in the payment page, the computer device responds to the trigger operation by displaying a medical health query page on the user terminal, in which an input box 40c and a query function control 40d are displayed. The user may enter the medical query information "what should be noted about skin allergies" in the input box 40c. After obtaining this query information, the user terminal 40a may obtain the corresponding initial medical articles from the search engine, use the text reference relations of those articles to generate the shortest text reading path 40e for "what should be noted about skin allergies", and display the shortest text reading path 40e in the medical query page. The user can then follow the shortest text reading path 40e to decide the order in which to read the medical articles and quickly become familiar with the relevant medical knowledge. For the generation process of the shortest text reading path 40e, refer to steps S201 to S208 above or to the embodiment corresponding to fig. 3; details are not repeated here.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a data recommendation device according to an embodiment of the present application. As shown in fig. 8, the data recommendation apparatus 1 may include: the relation graph building module 11, the screening module 12 and the reading path determining module 13;

the relation graph building module 11 is configured to obtain a first initial text associated with the first query information, obtain an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and build a reference relation graph including the first initial text and the associated text;

the screening module 12 is configured to screen a target text corresponding to the first query information from the associated text and the first initial text according to the reference relationship graph;

and the reading path determining module 13 is configured to determine a shortest text reading path including the target text in the reference relationship graph, and generate first recommended content for responding to the first query information according to the shortest text reading path.

For the specific functional implementation of the relation graph building module 11, the screening module 12 and the reading path determining module 13, refer to steps S101 to S103 in the embodiment corresponding to fig. 3; details are not repeated here.

In some possible embodiments, the relation graph building module 11 may include: a query information determining unit 111, an interface calling unit 112, a text similarity obtaining unit 113, an associated text determining unit 114 and a constructing unit 115;

a query information determining unit 111 configured to acquire one or more keywords input in a search engine, and determine the one or more keywords as first query information;

the interface calling unit 112 is used for calling an application program interface in a search engine and acquiring at least two texts to be recommended contained in the search engine;

the text similarity obtaining unit 113 is configured to obtain text similarities between the first query information and the at least two texts to be recommended, and determine, according to the text similarities, a first initial text associated with the first query information in the at least two texts to be recommended.

An associated text determining unit 114, configured to obtain a reference text and a referred text corresponding to the first initial text according to a text reference relationship corresponding to the first initial text, and determine the reference text and the referred text as associated texts;

the constructing unit 115 is configured to determine both the first initial text and the associated text as text nodes, and construct a reference relationship graph including the text nodes according to the text reference relationship between the text nodes.

For specific functional implementation manners of the query information determining unit 111, the interface calling unit 112, the text similarity obtaining unit 113, the associated text determining unit 114, and the constructing unit 115, reference may be made to step S201 to step S203 in the embodiment corresponding to fig. 6, which is not described herein again.
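The work of the associated text determining unit 114 and the constructing unit 115 can be sketched as follows. The input format is an assumption: `references` maps each text ID to the IDs of the texts it cites, and an edge `(a, b)` means text `a` cites text `b`. Both directions are collected because the application treats both the citing text and the cited text as associated texts.

```python
def build_reference_graph(initial_texts, references):
    """Collect associated texts and build the reference relation graph.

    references: hypothetical dict mapping a text ID to the IDs it cites.
    Returns (nodes, edges), where an edge (a, b) means text a cites text b.
    """
    initial = set(initial_texts)
    associated = set()
    for text in initial:
        associated.update(references.get(text, []))  # texts cited by an initial text
    for citing, cited in references.items():
        if initial.intersection(cited):
            associated.add(citing)  # texts that cite an initial text
    nodes = initial | associated
    edges = {(a, b) for a, cited in references.items() for b in cited
             if a in nodes and b in nodes}
    return nodes, edges
```

This collects only one hop of associated texts around the first initial texts; expanding further hops would simply repeat the collection step on the new nodes.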

In some possible embodiments, the screening module 12 may include: a text quantity acquisition unit 121, a target text determination unit 122;

a text quantity obtaining unit 121, configured to determine both the first initial text and the associated text as text nodes, and obtain a quantity of referred texts corresponding to the text nodes in the reference relationship graph;

the target text determining unit 122 is configured to determine, if there are text nodes whose number of referred texts is greater than the number threshold, the text nodes whose number of referred texts is greater than the number threshold as the target text corresponding to the first query information.

The specific functional implementation manners of the text number obtaining unit 121 and the target text determining unit 122 may refer to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
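The screening rule of units 121 and 122 can be sketched as below, under the assumption that "the number of referred texts" of a node is the number of times it is cited by other nodes in the graph (its in-degree). The `(citing, cited)` edge format and the function name are hypothetical.

```python
from collections import Counter


def screen_target_texts(nodes, edges, threshold: int) -> list[str]:
    """Return text nodes cited more than `threshold` times within the graph."""
    cited_count = Counter(cited for _, cited in edges)  # in-degree of each node
    return sorted(node for node in nodes if cited_count[node] > threshold)
```

An empty result corresponds to the case where no text node exceeds the number threshold.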

In some possible embodiments, the reading path determining module 13 may include: a weight acquisition unit 131, a weight accumulated value determination unit 132, a first shortest path determination unit 133;

the weight obtaining unit 131 is configured to obtain text recommendation weights and edge weights corresponding to the reference relationship graph, and construct M initial text reading paths including the target text in the reference relationship graph; the edge weight is used for representing the relevance between two texts with text reference relations in the reference relation graph, and M is a positive integer;

the weight accumulated value determining unit 132 is configured to determine, according to the text recommendation weight and the edge weight included in each of the M initial text reading paths, a weight accumulated value corresponding to each of the M initial text reading paths;

the first shortest path determining unit 133 is configured to determine, as the shortest text reading path, the initial text reading path corresponding to the smallest weight accumulation value among the M initial text reading paths.

The specific functional implementation manners of the weight obtaining unit 131, the weight accumulated value determining unit 132, and the first shortest path determining unit 133 may refer to step S208 in the embodiment corresponding to fig. 6, which is not described herein again.

In some possible embodiments, the reading path determining module 13 may include: a weight obtaining unit 131, a first spanning tree obtaining unit 134, a second spanning tree constructing unit 135, a second shortest path determining unit 136;

the weight obtaining unit 131 is further configured to obtain a text recommendation weight and an edge weight corresponding to the reference relationship graph, and construct a first text sub-graph containing the target text in the reference relationship graph; the edge weight is used for representing the relevance between two texts with text reference relations in the reference relation graph;

a first spanning tree obtaining unit 134, configured to obtain a first minimum spanning tree in the first text subgraph according to the edge weights and text recommendation weights contained in the first text subgraph; the first minimum spanning tree is the spanning tree with the minimum weight accumulated value in the first text subgraph and contains the target text, and the minimum weight accumulated value in the first text subgraph is the accumulated value of the text recommendation weights and edge weights in the first minimum spanning tree;

the second spanning tree constructing unit 135 is configured to construct a second text subgraph containing the target text in the reference relationship graph, update the first minimum spanning tree according to the text recommendation weights and edge weights contained in the second text subgraph, and generate a second minimum spanning tree in the second text subgraph; the second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the second text subgraph, its weight accumulated value is smaller than that of the first minimum spanning tree, it contains the target text, and the minimum weight accumulated value in the second text subgraph is the accumulated value of the text recommendation weights and edge weights in the second minimum spanning tree;

the second shortest path determining unit 136 is configured to determine the second minimum spanning tree as the shortest text reading path if the second minimum spanning tree is the spanning tree with the smallest accumulated weight value in the reference relationship graph.

For the specific functional implementation of the weight obtaining unit 131, the first spanning tree obtaining unit 134, the second spanning tree constructing unit 135 and the second shortest path determining unit 136, refer to step S103 in the embodiment corresponding to fig. 3; details are not repeated here. The two groups of units do not operate at the same time: while the weight accumulated value determining unit 132 and the first shortest path determining unit 133 are performing their operations, the first spanning tree obtaining unit 134, the second spanning tree constructing unit 135 and the second shortest path determining unit 136 suspend their operations; while units 134, 135 and 136 are performing their operations, units 132 and 133 suspend theirs.
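The minimum spanning tree step for a single text subgraph can be sketched with Kruskal's algorithm, as below. The application does not name an MST algorithm or detail how the first tree is "updated" into the second, so the choice of Kruskal, the treatment of reference edges as undirected, and the function names are assumptions. Because a spanning tree covers every node of its subgraph, the node-weight part of the accumulated value is fixed per subgraph, and minimizing the tree only changes the edge-weight part; comparing the returned values across candidate subgraphs corresponds to choosing between the first and second minimum spanning trees.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm; edges is a dict {(u, v): weight} treated as undirected."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree = []
    for (u, v), _w in sorted(edges.items(), key=lambda item: item[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree  # a spanning forest if the subgraph is disconnected


def tree_accumulated_weight(nodes, tree, node_weight, edge_weight) -> float:
    """Text recommendation weights of the tree's nodes plus its edge weights."""
    return sum(node_weight[n] for n in nodes) + sum(edge_weight[e] for e in tree)
```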

In some possible embodiments, the weight obtaining unit 131 may include: a first text weight determining subunit 1311, a text node acquiring subunit 1312, an edge weight determining subunit 1313;

a first text weight determining subunit 1311, configured to determine both the first initial text and the associated text as text nodes in the reference relationship graph, obtain a text ranking value and a rating feature corresponding to the text node, and determine a text recommendation weight corresponding to the text node according to the text ranking value and the rating feature;

a text node obtaining subunit 1312, configured to obtain a text node v_i and a text node v_j that have a text reference relationship in the reference relationship graph, where i and j are positive integers less than or equal to the number of text nodes;

an edge weight determining subunit 1313, configured to obtain the reference frequency between text node v_i and text node v_j, and determine the edge weight between text node v_j and text node v_i according to the reference frequency; the reference frequency is the number of times text node v_j is referenced in the text content of text node v_i.

The specific functional implementation manners of the first text weight determining subunit 1311, the text node obtaining subunit 1312, and the edge weight determining subunit 1313 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
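Subunits 1311 and 1313 can be sketched as follows. The application does not define the "text ranking value" or how it is combined with the rating feature, so this sketch assumes the ranking value is PageRank over citation edges, that the combination is a linear blend controlled by a hypothetical `alpha`, and that the reference frequency is counted by matching a citation marker string (for example "[2]") in the citing text's content.

```python
def pagerank(nodes, edges, damping: float = 0.85, iterations: int = 50):
    """Power-iteration PageRank over citation edges (a, b) meaning a cites b."""
    nodes = sorted(nodes)
    n = len(nodes)
    out_links = {v: [b for a, b in edges if a == v] for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # a text citing nothing spreads rank evenly
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def text_recommendation_weight(rank_value: float, rating: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the ranking value and a rating feature."""
    return alpha * rank_value + (1.0 - alpha) * rating


def reference_frequency_weight(content_i: str, marker_j: str) -> int:
    """Times text v_j (identified by a citation marker) is referenced in v_i's content."""
    return content_i.count(marker_j)
```

Whether a higher reference frequency should produce a larger or smaller edge weight depends on how the path search treats edge weights; that mapping is left open here, as it is in the text.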

In some possible embodiments, the weight obtaining unit 131 may include: a second text weight determining subunit 1314, an initial node vector determining subunit 1315, an encoding subunit 1316, a second edge weight determining subunit 1317;

the second text weight determining subunit 1314 is configured to determine both the first initial text and the associated text as text nodes in the reference relationship graph, convert the text nodes into text characterization vectors, and determine text recommendation weights corresponding to the text nodes according to the text characterization vectors;

an initial node vector determining subunit 1315, configured to generate an initial node vector corresponding to the text node according to the text reference relationship of the text node in the reference relationship graph;

the encoding subunit 1316, configured to input the initial node vector to the graph convolution network, perform information encoding on the initial node vector according to the graph convolution network, and generate a node encoding vector corresponding to the initial node vector;

a second edge weight determining subunit 1317, configured to determine, according to the node encoding vector, an edge weight between any two text nodes having a text reference relationship in the reference relationship graph.

For the specific functional implementation of the second text weight determining subunit 1314, the initial node vector determining subunit 1315, the encoding subunit 1316 and the second edge weight determining subunit 1317, refer to steps S205 to S207 in the embodiment corresponding to fig. 6; details are not repeated here. While the first text weight determining subunit 1311, the text node obtaining subunit 1312 and the edge weight determining subunit 1313 are performing their operations, the second text weight determining subunit 1314, the initial node vector determining subunit 1315, the encoding subunit 1316 and the second edge weight determining subunit 1317 suspend their operations; while subunits 1314 to 1317 are performing their operations, subunits 1311, 1312 and 1313 suspend theirs.

In some possible embodiments, the reading path determining module 13 may include: a text reference relation acquisition unit 137, a recommended content determination unit 138;

a text reference relation obtaining unit 137, configured to obtain a path text node and a path text reference relation included in the shortest text reading path; the path text reference relation is used for indicating the reading sequence of the path text nodes;

and a recommended content determining unit 138, configured to determine, according to the path text node and the path text reference relationship, a first recommended content corresponding to the first query information.

The path text node comprises at least two documents;

the recommended content determining unit 138 is specifically configured to:

The specific functional implementation manners of the text reference relationship obtaining unit 137 and the recommended content determining unit 138 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.

In some possible embodiments, the data recommendation device 1 may further include: an initial text search module 14 and a recommendation module 15;

an initial text search module 14, configured to obtain second query information input in a search engine, and obtain, in the search engine, a second initial text associated with the second query information;

and the recommending module 15 is configured to determine the first recommended content as a second recommended content corresponding to the second query information if the second initial text is the same as the first initial text, and display the second recommended content in the query page.

The specific functional implementation manners of the initial text search module 14 and the recommendation module 15 may refer to step S208 in the embodiment corresponding to fig. 6, which is not described herein again.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device 1000 may include a processor 1001, a network interface 1004 and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. Optionally, the network interface 1004 may include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 9, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.

In the computer device 1000 shown in fig. 9, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:

It should be understood that the computer device 1000 described in this embodiment of the application may perform the data recommendation method described in the embodiment corresponding to either fig. 3 or fig. 6, and may also implement the data recommendation device 1 described in the embodiment corresponding to fig. 8; details are not repeated here. Likewise, the beneficial effects of the same method are not described again.

Further, it should be noted that an embodiment of the present application also provides a computer-readable storage medium storing the computer program executed by the aforementioned data recommendation apparatus 1. The computer program includes program instructions; when a processor executes the program instructions, it can perform the data recommendation method described in the embodiment corresponding to either fig. 3 or fig. 6, so details are not repeated here. Likewise, the beneficial effects of the same method are not described again. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed and executed on one computing device, on multiple computing devices at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network; such distributed, interconnected computing devices may form a blockchain system.

Further, it should be noted that embodiments of the present application also provide a computer program product or computer program, which may include computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the data recommendation method described in the embodiment corresponding to either fig. 3 or fig. 6; details are not repeated here. Likewise, the beneficial effects of the same method are not described again. For technical details not disclosed in the embodiments of the computer program product or computer program of the present application, refer to the description of the method embodiments of the present application.

It should be noted that, for simplicity of description, the method embodiments above are described as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by this application.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The modules in the device can be merged, divided and deleted according to actual needs.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims

1. A method for recommending data, comprising:

acquiring a first initial text associated with first query information, acquiring an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and constructing a reference relation graph containing the first initial text and the associated text;

2. The method according to claim 1, wherein the obtaining of the associated text corresponding to the first initial text according to the text reference relationship corresponding to the first initial text, and the building of the reference relationship graph including the first initial text and the associated text comprises:

acquiring a reference text and a referenced text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and determining the reference text and the referenced text as the associated text;

and determining the first initial text and the associated text as text nodes, and constructing a reference relation graph containing the text nodes according to the text reference relation between the text nodes.
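As a rough illustration of claim 2, the sketch below builds a reference relation graph by one-hop expansion. It assumes a hypothetical citation index that maps each text ID to the set of text IDs it references. The associated texts are taken to be the texts an initial text references plus the texts that reference it. All edges between the resulting nodes are then kept. The function and type names are illustrative only and do not come from the patent.

```python
from typing import Dict, Set, Tuple

# Hypothetical citation index: text ID -> IDs of the texts it references.
CitationIndex = Dict[str, Set[str]]


def build_reference_graph(
    initial: Set[str], index: CitationIndex
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (nodes, edges); an edge (a, b) means text a references text b."""
    associated: Set[str] = set()
    for text in initial:
        # Texts referenced by the initial text.
        associated |= index.get(text, set())
        # Texts that reference the initial text.
        associated |= {src for src, refs in index.items() if text in refs}
    nodes = initial | associated
    # Keep every known reference relation whose two ends are both text nodes.
    edges = {
        (src, dst)
        for src in nodes
        for dst in index.get(src, set())
        if dst in nodes
    }
    return nodes, edges
```

With an index such as {"A": {"B", "C"}, "D": {"A"}, "B": {"C"}}, starting from {"A"} yields the nodes A, B, C and D. The edge B -> C is kept because both of its ends are in the graph.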

3. The method according to claim 1, wherein the screening, according to the reference relation graph, of the target text corresponding to the first query information from the associated text and the first initial text comprises:

determining the first initial text and the associated text as text nodes, and acquiring the number of referred texts corresponding to the text nodes in the reference relation graph;

and if there is a text node whose number of referred texts is greater than a number threshold, determining each such text node as a target text corresponding to the first query information.
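The screening step of claim 3 can be sketched as a threshold on how often each node is referenced. This sketch makes one interpretive assumption: "number of referred texts" is read as the number of nodes in the graph that reference the node, that is, its in-degree over edges of the form (referencing text, referenced text).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def screen_target_texts(
    nodes: Set[str], edges: Iterable[Tuple[str, str]], threshold: int
) -> List[str]:
    """Return the text nodes referenced by more than `threshold` nodes."""
    # An edge (src, dst) means src references dst, so count each dst.
    referenced_count = Counter(dst for _, dst in edges)
    return sorted(n for n in nodes if referenced_count[n] > threshold)
```

A strict "greater than" comparison is used to match the wording of the claim. The result is sorted only so that the output is deterministic.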

4. The method according to claim 1, wherein the determining of the shortest text reading path containing the target text in the reference relation graph comprises:

acquiring text recommendation weights and edge weights corresponding to the reference relation graph, and constructing M initial text reading paths containing the target text in the reference relation graph; the edge weight represents the relevance between two texts that have a text reference relation in the reference relation graph, and M is a positive integer;

determining a weight accumulated value for each of the M initial text reading paths according to the text recommendation weights and edge weights contained in that path;

and determining, among the M initial text reading paths, the initial text reading path with the minimum weight accumulated value as the shortest text reading path.
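Claim 4 can be sketched as follows. The weight accumulated value of a path is the sum of the text recommendation weights of its nodes plus the edge weights between consecutive nodes, and the path with the smallest sum is selected. The sketch assumes that the M candidate paths are already given as ordered lists of text IDs. It also assumes that edge weights are keyed by (earlier text, later text) in reading order. How the candidate paths are generated is not modelled here.

```python
from typing import Dict, List, Sequence, Tuple


def path_cost(
    path: Sequence[str],
    node_w: Dict[str, float],
    edge_w: Dict[Tuple[str, str], float],
) -> float:
    """Weight accumulated value: node weights plus edge weights along the path."""
    node_total = sum(node_w[n] for n in path)
    edge_total = sum(edge_w[(a, b)] for a, b in zip(path, path[1:]))
    return node_total + edge_total


def shortest_reading_path(
    paths: List[List[str]],
    node_w: Dict[str, float],
    edge_w: Dict[Tuple[str, str], float],
) -> List[str]:
    """Among the M candidate paths, return the one with the smallest cost."""
    return min(paths, key=lambda p: path_cost(p, node_w, edge_w))
```

For example, with node weights A=1, B=2, C=1, D=3 and edge weights A->B=1, B->C=1, A->C=5, D->C=1, the path A->B->C costs 6, A->C costs 7 and D->C costs 5. The path D->C is therefore selected.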

5. The method according to claim 1, wherein the determining of the shortest text reading path containing the target text in the reference relation graph comprises:

acquiring a text recommendation weight and an edge weight corresponding to the reference relation graph, and constructing a first text subgraph containing the target text in the reference relation graph; the edge weight represents the relevance between two texts that have a text reference relation in the reference relation graph;

acquiring a first minimum spanning tree in the first text subgraph according to the edge weights and text recommendation weights contained in the first text subgraph; the first minimum spanning tree is the spanning tree with the minimum weight accumulated value in the first text subgraph and contains the target text, and the minimum weight accumulated value in the first text subgraph is the accumulated value of the text recommendation weights and edge weights in the first minimum spanning tree;

constructing a second text subgraph containing the target text in the reference relation graph, and updating the first minimum spanning tree according to the text recommendation weights and edge weights contained in the second text subgraph to generate a second minimum spanning tree in the second text subgraph; the second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the second text subgraph and contains the target text, its weight accumulated value is smaller than that of the first minimum spanning tree, and the minimum weight accumulated value in the second text subgraph is the accumulated value of the text recommendation weights and edge weights in the second minimum spanning tree;

and if the second minimum spanning tree is the spanning tree with the minimum weight accumulated value in the reference relation graph, determining the second minimum spanning tree as the shortest text reading path.
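The sketch below approximates the minimum-spanning-tree variant of claim 5 under several assumptions. Each candidate subgraph is treated as undirected. Its minimum spanning tree is computed with Prim's algorithm, which is a swapped-in choice because the claim names no algorithm. The weight accumulated value adds the recommendation weights of all tree nodes to the tree's edge weights. The claim describes incrementally updating the first tree into the second; here that update is simplified to recomputing the tree for each candidate subgraph that contains the target text and keeping the one with the smaller accumulated value. The function names are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Tree = Tuple[float, List[Tuple[str, str]]]


def mst_cost(
    nodes: Set[str],
    node_w: Dict[str, float],
    edge_w: Dict[FrozenSet[str], float],
) -> Optional[Tree]:
    """Prim's algorithm; cost = node weights + tree edge weights.

    Returns None if the subgraph is empty or not connected.
    """
    if not nodes:
        return None
    start = min(nodes)  # deterministic start node
    visited = {start}
    tree: List[Tuple[str, str]] = []
    total = float(node_w[start])
    heap: List[Tuple[float, str, str]] = []

    def push_edges(u: str) -> None:
        # Queue every edge from u to a node not yet in the tree.
        for v in nodes:
            key = frozenset((u, v))
            if v not in visited and key in edge_w:
                heapq.heappush(heap, (edge_w[key], u, v))

    push_edges(start)
    while heap and len(visited) < len(nodes):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry; v was reached by a cheaper edge
        visited.add(v)
        tree.append((u, v))
        total += w + node_w[v]
        push_edges(v)
    if len(visited) < len(nodes):
        return None
    return total, tree


def best_spanning_tree(
    subgraphs: Iterable[Set[str]],
    target: str,
    node_w: Dict[str, float],
    edge_w: Dict[FrozenSet[str], float],
) -> Optional[Tree]:
    """Keep the spanning tree with the smallest cost among subgraphs containing target."""
    best: Optional[Tree] = None
    for nodes in subgraphs:
        if target not in nodes:
            continue
        result = mst_cost(nodes, node_w, edge_w)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best
```

Because every spanning tree of a fixed subgraph covers the same nodes, the node weights only change the comparison between different subgraphs. Within one subgraph, the tree is chosen by edge weights alone.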

6. The method according to claim 4 or 5, wherein the acquiring of the text recommendation weight and the edge weight corresponding to the reference relation graph comprises:

determining the first initial text and the associated text as text nodes in the reference relation graph, acquiring text ranking values and rating features corresponding to the text nodes, and determining text recommendation weights corresponding to the text nodes according to the text ranking values and the rating features;

acquiring a text node v_i and a text node v_j that have a text reference relation in the reference relation graph, where i and j are positive integers less than or equal to the number of text nodes;

acquiring a reference frequency between the text node v_i and the text node v_j, and determining the edge weight between the text node v_j and the text node v_i according to the reference frequency; the reference frequency refers to the number of times the text node v_j is referenced in the text content of the text node v_i.
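The reference-frequency part of claim 6 can be sketched as counting how often a citation marker for v_j (for example "[3]") occurs in the content of v_i. The marker format is an assumption. The frequency is then mapped to an edge weight. The mapping 1 / (1 + frequency) is a hypothetical choice that makes frequently cited pairs cheaper to traverse when path or tree costs are minimized; the patent does not specify a formula.

```python
import re


def reference_frequency(content_i: str, marker_j: str) -> int:
    """Number of times v_j's citation marker appears in the content of v_i."""
    return len(re.findall(re.escape(marker_j), content_i))


def edge_weight(freq: int) -> float:
    """Hypothetical mapping: more frequent references give a smaller edge weight."""
    return 1.0 / (1 + freq)
```

re.escape is used so that markers containing regex metacharacters, such as square brackets, are matched literally.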

7. The method according to claim 4 or 5, wherein the acquiring of the text recommendation weight and the edge weight corresponding to the reference relation graph comprises:

determining the first initial text and the associated text as text nodes in the reference relation graph, converting the text nodes into text representation vectors, and determining text recommendation weights corresponding to the text nodes according to the text representation vectors;

generating an initial node vector corresponding to the text node according to the text reference relation of the text node in the reference relation graph;

inputting the initial node vector into a graph convolution network, and carrying out information coding on the initial node vector according to the graph convolution network to generate a node coding vector corresponding to the initial node vector;

and determining the edge weight between any two text nodes with text reference relations in the reference relation graph according to the node coding vector.
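A minimal, dependency-free sketch of the graph convolution step in claim 7 is shown below. It uses the common symmetric-normalized layer ReLU(D^-1/2 (A + I) D^-1/2 X W) on an undirected adjacency matrix. The following details are all assumptions made for illustration: one-hot rows as initial node vectors, an untrained weight matrix W, and the rule "edge weight = 1 - cosine similarity of the two node coding vectors". The patent does not fix the network architecture or the weight formula.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, x: Matrix, w: Matrix) -> Matrix:
    """One graph convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so that every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    h = matmul(matmul(norm, x), w)
    return [[max(0.0, v) for v in row] for row in h]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def edge_weights(adj: Matrix, h: Matrix) -> Dict[Tuple[int, int], float]:
    """Hypothetical rule: edge weight = 1 - cosine(h_i, h_j) for each reference edge."""
    n = len(adj)
    return {
        (i, j): 1.0 - cosine(h[i], h[j])
        for i in range(n)
        for j in range(n)
        if adj[i][j]
    }
```

In practice, W would be learned, for example with a graph learning library. The hand-written layer only shows how a node coding vector mixes information from a node's neighbours before the edge weights are derived.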

8. The method of claim 1, wherein obtaining the first initial text associated with the first query information comprises:

acquiring one or more keywords input in a search engine, and determining the one or more keywords as the first query information;

calling an application programming interface (API) of the search engine to acquire at least two texts to be recommended contained in the search engine;

acquiring text similarity between the first query information and the at least two texts to be recommended respectively, and determining a first initial text associated with the first query information in the at least two texts to be recommended according to the text similarity.
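The similarity step of claim 8 can be approximated with bag-of-words cosine similarity, a stand-in because the patent does not name a similarity measure. The search engine API call is not modelled. Instead, the candidate texts are passed in as a hypothetical dict from text ID to text content. Candidates at or above a similarity threshold, which is an assumed parameter, are returned as first initial texts, best match first.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> Counter:
    """Lower-cased word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_initial_texts(
    query: str, candidates: Dict[str, str], threshold: float
) -> List[str]:
    """Return IDs of candidates whose similarity to the query is >= threshold."""
    q = tokens(query)
    scores = {tid: cosine_similarity(q, tokens(body)) for tid, body in candidates.items()}
    selected = [tid for tid, s in scores.items() if s >= threshold]
    return sorted(selected, key=lambda tid: -scores[tid])
```

For the query "graph neural network", the text "Neural network pruning" scores 2/3, which is slightly higher than "A graph neural network for citation graphs" at about 0.65. This is because the plain tokenizer treats "graph" and "graphs" as different words. A production system would more likely use stemming or learned embeddings.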

9. The method of claim 1, wherein the generating of the first recommended content for responding to the first query information according to the shortest text reading path comprises:

acquiring a path text node and a path text reference relation contained in the shortest text reading path; the path text reference relation is used for indicating the reading sequence of the path text nodes;

and determining first recommended content corresponding to the first query information according to the path text node and the path text reference relation.

10. The method of claim 9, wherein the path text node comprises at least two documents;

the determining of the first recommended content corresponding to the first query information according to the path text node and the path text reference relation comprises:

acquiring document abstract information corresponding to each of the at least two documents, determining the document abstract information and the path text reference relation as the first recommended content corresponding to the first query information, and displaying the first recommended content in a query page.
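Claims 9 and 10 can be sketched as pairing each document on the shortest text reading path with its abstract, in reading order. The sketch assumes that the path has already been linearized into an ordered list of document IDs. That linearization is trivial for a path but is an extra assumption for a spanning tree. It also assumes that abstracts are available in a dict. The plain-text rendering stands in for the query page.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str, str]


def build_recommended_content(path: List[str], abstracts: Dict[str, str]) -> List[Entry]:
    """Return (reading order, document ID, abstract) entries following the path."""
    return [(step, doc, abstracts.get(doc, "")) for step, doc in enumerate(path, start=1)]


def render(content: List[Entry]) -> str:
    """Plain-text stand-in for displaying the content in a query page."""
    return "\n".join(f"{step}. {doc}: {abstract}" for step, doc, abstract in content)
```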

11. The method of claim 1, further comprising:

acquiring second query information input in a search engine, and acquiring a second initial text associated with the second query information in the search engine;

and if the second initial text is the same as the first initial text, determining the first recommended content as second recommended content corresponding to the second query information, and displaying the second recommended content in a query page.
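The reuse rule in claim 11 behaves like a cache keyed by the initial texts rather than by the query string. Two different queries that resolve to the same initial texts get the same recommended content without recomputation. In the minimal sketch below, the key is the frozen set of initial text IDs, which makes the comparison order-insensitive; this is an assumption. The class name and the compute callback are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


class RecommendationCache:
    """Reuse recommended content when a new query maps to the same initial texts."""

    def __init__(self, compute: Callable[[FrozenSet[str]], List[str]]) -> None:
        self._compute = compute
        self._cache: Dict[FrozenSet[str], List[str]] = {}

    def recommend(self, initial_texts: Iterable[str]) -> List[str]:
        key = frozenset(initial_texts)
        if key not in self._cache:
            # First time these initial texts are seen: run the full pipeline.
            self._cache[key] = self._compute(key)
        return self._cache[key]
```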

12. A data recommendation device, comprising:

a relation graph building module, configured to acquire a first initial text associated with first query information, acquire an associated text corresponding to the first initial text according to a text reference relation corresponding to the first initial text, and construct a reference relation graph containing the first initial text and the associated text;

13. A computer device comprising a memory and a processor;

the memory is connected to the processor; the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a computer program, the computer program being adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method according to any one of claims 1 to 11.