CN111984794A

CN111984794A - Entity description extraction method and device based on knowledge graph and computing equipment

Info

Publication number: CN111984794A
Application number: CN201910435222.9A
Authority: CN
Inventors: 朱坤鸿; 张晨; 周梁
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2019-05-23
Filing date: 2019-05-23
Publication date: 2020-11-24

Abstract

The invention discloses a knowledge-graph-based entity description extraction method, device, and computing device. The method comprises: step S1, extracting an entity description set of a given entity from a knowledge graph database; step S2, for each non-repeated entity description in the entity description set, calculating a confidence of that entity description according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set; and step S3, screening out at least one entity description from the entity description set, according to the confidence of each entity description, as a standby entity description of the entity. In this scheme, the quality of an entity description is quantified from both the frequency of occurrence of each entity description and the similarity between each entity description and that entity description, so the calculation result is more reliable.

Description

Entity description extraction method and device based on knowledge graph and computing equipment

Technical Field

The invention relates to the technical field of computers, in particular to an entity description extraction method, an entity description extraction device and computing equipment based on a knowledge graph.

Background

Modern search engines continually refine their product features as they develop, providing users with more comprehensive and convenient information services. The right-side recommendation system displays entities related to the user's query in the form of pictures, corresponding names, and category labels, so that the user can more conveniently learn other knowledge related to the query and obtain a good user experience. Fig. 1 shows a schematic diagram of the right-side recommendations, displayed without recommendation reasons, when searching for a Pomeranian (Bomei dog). As shown in Fig. 1, the user can obtain other knowledge related to the Pomeranian from the content recommended in four blocks: "guess you like", "related creatures", "guess you follow", and "other people also searched". However, most users do not know the working principle behind the recommendation system, nor do they know much about the content it recommends. In this case, displaying the recommendation reason to the user in an intuitive form can greatly improve the user's search experience and trust in the search system. One typical way is to present the user with an introduction to the recommended entity itself.

In the prior art, recommendation reasons are usually mined in a template-based manner, and the templates come mainly from two sources. The first is a bootstrap-style semi-automatic acquisition method based on high-quality knowledge triple seeds. This method assumes that a sentence in which two entity words appear together describes the relationship between that entity pair. It suits large-scale knowledge extraction scenarios, but it easily introduces noise, so system performance degrades rapidly as the number of iterations increases. The second is definition by human experts. Template extraction results from this method are generally of high quality, but because natural language expression is flexible and diverse, extracted content that is wrong or of low quality still inevitably occurs. Therefore, how to let a machine automatically compare and filter the extracted content, so as to reduce labor cost and improve the efficiency of knowledge acquisition, is a challenge in the current knowledge extraction field.

In terms of knowledge quality calculation, one main existing method is to calculate the support of a template and the confidence of the knowledge through frequency statistics, and thereby quantify the quality of the extraction result. However, frequency is only an external factor for measuring knowledge quality and cannot measure it comprehensively; for example, some knowledge may be of very high quality even though it appears infrequently. Moreover, this quantification method cannot further distinguish between pieces of knowledge with the same frequency statistics. Another main approach is to manually label a batch of high-quality data and then rank the sentences through learning to rank; this approach is time-consuming, labor-intensive, and costly, and does not scale well to the open domain.

Disclosure of Invention

In view of the above, the present invention has been made to provide a method, apparatus and computing device for extracting an entity description based on a knowledge-graph that overcomes or at least partially solves the above problems.

According to one aspect of the invention, the invention provides a knowledge-graph-based entity description extraction method, which comprises the following steps:

step S1, extracting an entity description set of a given entity from a knowledge graph database;

step S2, for each non-repeated entity description in the entity description set, calculating the confidence of that entity description according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set;

step S3, according to the confidence of each entity description, screening out at least one entity description from the entity description set as a standby entity description of the entity.

According to another aspect of the present invention, there is provided a method for pushing a search engine recommendation reason, including:

obtaining a recommendation result of a search engine;

taking the recommendation result as a given entity, and obtaining a standby entity description corresponding to the recommendation result by using the entity description extraction method based on the knowledge graph;

and selecting the entity description from the standby entity descriptions as a recommendation reason to be presented on a search result presentation page.

According to still another aspect of the present invention, there is provided a knowledge-graph-based entity description extracting apparatus, including:

the extraction module is suitable for extracting an entity description set of a given entity from a knowledge graph database;

a confidence calculation module, adapted to calculate, for each non-repeated entity description in the entity description set, a confidence of that entity description according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set;

and the screening module is suitable for screening at least one entity description from the entity description set according to the confidence degree of each entity description to serve as a standby entity description of the entity.

According to another aspect of the present invention, there is provided a device for pushing search engine recommendation reasons, including:

the acquisition module is suitable for acquiring a recommendation result of the search engine;

an entity description extraction module, adapted to use the recommendation result as a given entity, and obtain a standby entity description corresponding to the recommendation result by using the entity description extraction device based on the knowledge graph;

and the selection module is suitable for selecting the entity description from the standby entity descriptions as a recommendation reason to be presented on a search result display page.

According to an aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;

The memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the entity description extracting method based on the knowledge graph.

According to another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the pushing method of the search engine recommendation reason.

According to still another aspect of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the above-mentioned method for extracting an entity description based on a knowledge graph.

According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform an operation corresponding to the push method for the reason recommended by the search engine.

According to the knowledge-graph-based entity description extraction method, device, and computing equipment of the invention, an entity description set is extracted for a given entity; for each non-repeated entity description in the entity description set, the confidence of that entity description is calculated according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set; and a standby entity description is selected for the given entity based on the confidence calculation. The scheme therefore does not depend on large-scale labeled data and can be widely applied to large-scale, open-domain, unsupervised entity description extraction. In addition, compared with measuring the quality of an entity description by frequency alone, the scheme also considers the content information of the entity description, so the confidence calculation is more reliable and, correspondingly, a higher-quality standby entity description can be screened out for display to the user.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 shows a schematic diagram of the right-side recommendations, without recommendation reasons, when searching for a Pomeranian (Bomei dog);

FIG. 2 illustrates a flow diagram of a knowledge-graph based entity description extraction method according to one embodiment of the invention;

FIG. 3 illustrates a flow diagram of a knowledge-graph based entity description extraction method according to another embodiment of the invention;

FIG. 4 shows a flow diagram of a push method for search engine recommendation grounds according to one embodiment of the invention;

FIG. 5 shows a schematic diagram of the right-side recommendations, with recommendation reasons, when searching for a Pomeranian (Bomei dog);

FIG. 6 shows a functional block diagram of an apparatus for knowledge-graph based entity description extraction according to one embodiment of the present invention;

FIG. 7 shows a functional block diagram of a push device for search engine recommendation reasons according to another embodiment of the present invention;

FIG. 8 illustrates a schematic structural diagram of a computing device in accordance with an embodiment of the present invention;

FIG. 9 shows a schematic structural diagram of a computing device according to an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

FIG. 2 shows a flowchart of a knowledge-graph based entity description extraction method according to one embodiment of the invention. As shown in fig. 2, the method includes:

in step S201, for a given entity, an entity description set of the entity is extracted from a knowledge graph database.

The scheme of the invention is used for extracting a high-quality entity description introducing the entity for the given entity, namely the standby entity description.

Here, a given entity refers to any entity for which a recommendation reason may be presented in a recommendation page. Taking a search for a Pomeranian (Bomei dog) in a search engine as an example (see Fig. 1), the recommended entities shown include related creatures such as pandas, teacup dogs, and silver fox dogs, each of which can serve as a given entity of the present invention.

Specifically, an entity description set is extracted from a knowledge graph database, and the entity description set comprises a plurality of entity descriptions. In the invention, the source of the knowledge graph data is not limited, and optionally, knowledge graph underlying data of a specific search engine can be directly used as the knowledge graph data in the invention, or a plurality of knowledge graph data can be integrated to obtain the knowledge graph data in the invention. In the present invention, the specific manner of extracting the entity description set is not limited, and any manner of extracting knowledge is included in the scope of the present invention.

Step S202, for each non-repeated entity description in the entity description set, calculating a confidence of the entity description according to a similarity between each non-repeated entity description and the entity description and a frequency of occurrence of each non-repeated entity description in the entity description set.

In the present invention, each constituent element of the entity description set is called an entity description; accordingly, the entity description set consists of a plurality of entity descriptions. Repeated entity descriptions may exist among them, and the non-repeated entity descriptions are those that remain after the repeats are removed, for example, the entity descriptions remaining after the second and later occurrences of the same entity description are removed from the entity description set.

The frequency of occurrence of an entity description in the entity description set refers to the number of times it occurs in the entity description set.

In this step, a confidence is calculated for each non-repeated entity description to quantify the quality of that entity description. Big data analysis shows that the quality of an entity description is related not only to the frequency with which it appears in the entity description set but also to its own content information. In the invention, the similarity between each non-repeated entity description and the entity description is used to characterize the content information of the entity description, and the confidence is calculated comprehensively from the similarity data, which characterizes the content of the entity description, and the frequency data, which characterizes usage information external to the entity description. According to the result of the big data analysis, the higher the overall similarity between the current entity description and the other entity descriptions, the higher the quality of the current entity description; and the more frequently the current entity description occurs, the higher its quality. The confidence is calculated on the basis of this rule, but the specific calculation is not limited.

Step S203, according to the confidence of each entity description, at least one entity description is screened out from the entity description set to be used as a standby entity description of the entity.

Specifically, the confidence of an entity description quantifies its quality, and the standby entity descriptions of the entity are screened from the entity description set according to the confidence, so as to ensure the quality of the recommendation reason selected and presented in the recommendation page. The present invention does not limit the specific screening manner; optionally, a preset number of entity descriptions may be screened out in order from high confidence to low confidence, or the entity descriptions whose confidence is higher than a preset confidence threshold may be screened out.
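The two optional screening manners described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name `screen_descriptions`, its parameters, and the sample confidence values are all hypothetical.

```python
def screen_descriptions(confidences, top_k=None, threshold=None):
    """Pick standby descriptions from a {description: confidence} mapping.

    top_k and threshold are hypothetical knobs for the two optional
    screening manners: a preset number of descriptions, or a preset
    confidence threshold. Either or both may be given.
    """
    # Rank from high confidence to low confidence.
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        # Keep only descriptions strictly above the threshold.
        ranked = [(d, c) for d, c in ranked if c > threshold]
    if top_k is not None:
        # Keep only the first top_k descriptions.
        ranked = ranked[:top_k]
    return [d for d, _ in ranked]


# Made-up confidence values for four descriptions.
scores = {"a": 0.21, "b": 0.30, "c": 0.38, "d": 0.11}
print(screen_descriptions(scores, top_k=2))        # ['c', 'b']
print(screen_descriptions(scores, threshold=0.2))  # ['c', 'b', 'a']
```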

According to the knowledge-graph-based entity description extraction method provided by this embodiment, an entity description set is extracted for a given entity; for each non-repeated entity description in the entity description set, the confidence of that entity description is calculated according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set; and a standby entity description is selected for the given entity based on the confidence calculation. The scheme of this embodiment therefore does not depend on large-scale labeled data and can be widely applied to large-scale, open-domain, unsupervised entity description extraction. In addition, compared with measuring the quality of an entity description by frequency alone, the scheme also considers the content information of the entity description, so the confidence calculation is more reliable and, correspondingly, a higher-quality standby entity description can be screened out for display to the user.

FIG. 3 shows a flow diagram of a knowledge-graph based entity description extraction method according to another embodiment of the invention. As shown in fig. 3, the method includes:

step S301, for a given entity, an entity description set of the entity is extracted from a knowledge graph database.

Specifically, the knowledge graph data is used as raw data for extracting entity descriptions, where the knowledge graph data includes, but is not limited to, entity profiles, text descriptions, semantic tags, and/or existing descriptions; an existing description is an industry-recognized entity description of the given entity, generally derived from the entries of various encyclopedias. One or more extraction models are constructed according to the general linguistic features of entity descriptions, and the entity description set of the entity is extracted from the knowledge graph database using the one or more extraction models. Each extraction model is used to extract, from the entity profile, text description, and/or semantic tags, the entity descriptions that satisfy the model features of that extraction model.

For example, phrases such as "is a kind of", "is honored as", and "is regarded as" are general linguistic features of entity descriptions. A matching regular expression is written for each such feature as an extraction template, and the entity description set can then be extracted with these templates.
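A regular-expression extraction template of the kind described above might look like the following sketch. The patterns are English stand-ins built from the three cue phrases in the text; the `TEMPLATES` list, the `extract_descriptions` helper, and the sample sentences are hypothetical and not taken from the patent.

```python
import re

# Hypothetical extraction templates, one per cue phrase named in the text.
# Each captures the words after the cue up to the next comma, period, or semicolon.
TEMPLATES = [
    re.compile(r"\bis a kind of (?P<desc>[^,.;]+)"),
    re.compile(r"\bis honored as (?P<desc>[^,.;]+)"),
    re.compile(r"\bis regarded as (?P<desc>[^,.;]+)"),
]


def extract_descriptions(texts):
    """Apply every template to every text and collect all matches.

    Duplicates are kept on purpose, because the later confidence
    calculation relies on how often each description occurs.
    """
    found = []
    for text in texts:
        for template in TEMPLATES:
            for match in template.finditer(text):
                found.append(match.group("desc").strip())
    return found


# Made-up source sentences standing in for an entity profile and text description.
sources = [
    "The Pomeranian is a kind of small spitz dog, popular as a pet.",
    "The Pomeranian is regarded as a lively companion.",
    "It is a kind of small spitz dog.",
]
descriptions = extract_descriptions(sources)
print(descriptions)
# ['small spitz dog', 'a lively companion', 'small spitz dog']
```

Because the same description can be matched in more than one source text, the resulting collection may contain repeats.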

Further, when the entity description set is extracted with the extraction models, different templates may extract the same or different entity descriptions from the knowledge graph data of the entity. As an example of the former case, the same entity description may be extracted from the entity profile and from the text description by different extraction templates, so the entity descriptions contained in the entity description set may be repeated.

Step S302, filtering out the entity description which is extracted with errors from the entity description set.

After the entity description set is obtained through extraction, entity descriptions which obviously do not accord with the characteristics of the entity descriptions and are contained in the entity description set are filtered out, so that too much noise is prevented from being introduced in the subsequent confidence degree calculation process. In practice, due to the diversity of natural languages, extraction results in a large-scale scene are prone to extraction errors, and preliminary filtering needs to be performed on the extraction results.

Specifically, erroneously extracted entity descriptions can be filtered out of the entity description set according to the composition features of entity descriptions and/or the linguistic features of non-entity descriptions, where the composition features comprise length features and/or punctuation features.

In some optional embodiments, for each non-repeated entity description in the entity description set, it is determined whether the length of the entity description falls within a preset length interval, and if not, the entity description is filtered out of the entity description set. The preset length interval is derived from the typical length range of a sentence: for example, assuming that a typical sentence is between 2 and 20 characters long, the preset length interval is set to 1-19 characters (here the entity itself is assumed by default to occupy at least 1 character). In this way, entity descriptions that do not conform to the length feature can be filtered out.

And/or, in other optional embodiments, for each non-repeated entity description in the entity description set, it is determined whether the entity description contains a preset symbol, and if so, the entity description is filtered out of the entity description set. The preset symbols may be any punctuation marks, including the semicolon, comma, period, question mark, and so on; in practice, an entity description is a phrase or short sentence and usually contains no punctuation marks. In this way, entity descriptions that do not conform to the punctuation feature can be filtered out.

And/or, in some optional embodiments, a conflict model that conflicts with the extraction model may be constructed based on the linguistic features of non-entity descriptions, and the conflict model corresponding to one or more extraction models may be used to extract, from the knowledge graph database, conflict entity descriptions that satisfy the model features of the conflict model. For example, statements with sentence structures such as "…" and "because …" are not suitable as entity descriptions, and such unsuitable statements, that is, conflict entity descriptions, can be extracted with the conflict model. For each non-repeated entity description in the entity description set, it is determined whether the entity description is the same as a conflict entity description, and if so, the entity description is filtered out of the entity description set. In this way, statements that are not suitable as entity descriptions can be filtered out.

It should be noted that the present invention is not limited to the three filtering manners listed above; in a specific implementation, a person skilled in the art may also filter in other ways. Optionally, filtering is performed according to the character content of the entity description; for example, if the entity description itself contains the given entity, it is filtered out.

Further, for each non-repetitive entity description in the entity description set, filtering out the entity description from the entity description set means to filter out all entity descriptions in the entity description set that are the same as the text of the entity description, for example, if an entity description a appears 3 times in the entity description set, the entity description a corresponding to 3 element positions is filtered out.
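The filtering rules of step S302 can be combined roughly as in the sketch below. It assumes the 1-19 character interval mentioned above, a small hand-picked punctuation class, and a precomputed set of conflict descriptions; the names `filter_descriptions`, `PUNCTUATION`, and `conflicts` are hypothetical.

```python
import re

# Assumed set of preset symbols (ASCII and full-width punctuation).
PUNCTUATION = re.compile(r"[;,.?!:；，。？！：、]")


def filter_descriptions(descriptions, entity, min_len=1, max_len=19,
                        conflicts=frozenset()):
    """Drop descriptions that fail the length, punctuation, conflict,
    or contains-the-entity checks.

    The decision depends only on the text of a description, so every
    repeated copy of a rejected description is removed together.
    """
    def is_valid(desc):
        if not (min_len <= len(desc) <= max_len):
            return False  # outside the preset length interval
        if PUNCTUATION.search(desc):
            return False  # contains a preset symbol
        if desc in conflicts:
            return False  # identical to a conflict entity description
        if entity in desc:
            return False  # description contains the given entity itself
        return True

    return [d for d in descriptions if is_valid(d)]


raw = [
    "small spitz dog",
    "small spitz dog",
    "a dog, loyal",       # contains a comma
    "Pomeranian breed",   # contains the entity name
    "x" * 25,             # longer than 19 characters
]
print(filter_descriptions(raw, "Pomeranian"))
# ['small spitz dog', 'small spitz dog']
```

Surviving repeats are deliberately kept, since the frequency of each description is still needed for the confidence calculation.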

Step S303, for each non-repeated entity description in the entity description set, calculating a confidence of that entity description according to the similarity between each non-repeated entity description and that entity description, and the frequency with which each non-repeated entity description appears in the entity description set.

In this embodiment, the process of calculating the confidence level of the entity description includes the following steps:

Step one, constructing a probability transition matrix.

An N × N probability transition matrix is constructed according to the frequency with which each non-repeated entity description appears in the entity description set and the similarity between the non-repeated entity descriptions, where N is the number of non-repeated entity descriptions in the entity description set.

Specifically, for each non-repeated entity description, the frequency with which it appears in the entity description set is counted. The similarity between non-repeated entity descriptions can be obtained with an unsupervised sentence similarity method, such as Jaccard similarity, edit distance, a vector space model, the average or weighted sum of character vectors/word vectors, or a similarity calculation combined with an external language knowledge base (such as a knowledgenet).

For example, if the entity description set is {a, b, c, b, c, c, d, a, b, c}, the non-repeated entity descriptions are a, b, c, and d. Counting shows that these 4 entity descriptions appear in the entity description set with frequencies 2, 3, 4, and 1, respectively, and the following 10 similarity values are calculated: $S_{aa}$, $S_{ab}$, $S_{ac}$, $S_{ad}$, $S_{bb}$, $S_{bc}$, $S_{bd}$, $S_{cc}$, $S_{cd}$, $S_{dd}$. Accordingly, a 4 × 4 probability transition matrix can be constructed.
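The frequency counting and pairwise similarity enumeration in this example can be reproduced with a short sketch. Character-level Jaccard similarity is used only as one of the unsupervised options listed above; the helper name `jaccard` and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def jaccard(x, y):
    """Character-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(x), set(y)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


description_set = ["a", "b", "c", "b", "c", "c", "d", "a", "b", "c"]

# Frequency of each non-repeated description (first-seen order is preserved).
freq = Counter(description_set)
unique = list(freq)
print(dict(freq))  # {'a': 2, 'b': 3, 'c': 4, 'd': 1}

# The 10 unordered pairs (including self-pairs) that need a similarity value.
pairs = list(combinations_with_replacement(unique, 2))
print(len(pairs))  # 10

# Jaccard on two made-up phrases: 5 shared characters out of 10 distinct ones.
print(jaccard("small dog", "small cat"))  # 0.5
```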

Further, in constructing the N × N probability transition matrix, each row corresponds to one non-repeated entity description; for example, row 1 corresponds to non-repeated entity description a, row 2 to non-repeated entity description b, and so on. The probability values of the N elements in each row are related to the following three values: the similarity between each non-repeated entity description and the non-repeated entity description corresponding to the row, the frequency of the entity description corresponding to the row, and the maximum frequency with which any entity description appears in the entity description set. For example, the probability value of the element in row 1, column 2 is related to $S_{ab}$, the frequency 2 of entity description a, and the maximum frequency 4, which belongs to entity description c. With a probability transition matrix constructed in this way, the N probability values corresponding to each non-repeated entity description depend not only on frequency but also on the similarity between each non-repeated entity description and that entity description, so that both frequency and content information can be taken into account for each non-repeated entity description in the subsequent calculation.

In one embodiment of the present invention, the probability transition matrix is denoted M, and $M_{ij}$ is obtained as follows: the similarity between the i-th non-repeated entity description and the j-th non-repeated entity description is added to the frequency with which the i-th non-repeated entity description appears in the entity description set, giving a summation result; the ratio of the summation result to the highest frequency of any non-repeated entity description in the entity description set is the value of $M_{ij}$. Writing $f_i$ for the frequency of the i-th non-repeated entity description and $f_{\max}$ for the highest frequency, $M_{ij} = (S_{ij} + f_i)/f_{\max}$. Still taking the entity description set {a, b, c, b, c, c, d, a, b, c} as an example, with the non-repeated entity descriptions ordered a, b, c, d, a 4 × 4 probability transition matrix can be constructed, and $M_{24}$ is the ratio of the sum of the similarity $S_{bd}$ between entity description b and entity description d and the frequency 3 of entity description b, namely $(S_{bd} + 3)$, to the highest frequency 4, which corresponds to entity description c; that is, $M_{24} = (S_{bd} + 3)/4$.
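The construction rule of this embodiment, in which each element is the similarity plus the row description's frequency, divided by the highest frequency, can be sketched as follows. The function name `build_transition_matrix` and the toy similarity function are hypothetical; any of the similarity measures named earlier could be passed in instead.

```python
def build_transition_matrix(unique, freq, similarity):
    """Return M with M[i][j] = (S_ij + f_i) / f_max (unnormalized)."""
    f_max = max(freq[d] for d in unique)
    return [
        [(similarity(di, dj) + freq[di]) / f_max for dj in unique]
        for di in unique
    ]


# Frequencies from the running example {a, b, c, b, c, c, d, a, b, c}.
unique = ["a", "b", "c", "d"]
freq = {"a": 2, "b": 3, "c": 4, "d": 1}


def toy_similarity(x, y):
    # Placeholder similarity: 1.0 for identical descriptions, else 0.5.
    return 1.0 if x == y else 0.5


M = build_transition_matrix(unique, freq, toy_similarity)

# Row 2, column 4 in the text's 1-based numbering is M[1][3] here:
# (S_bd + 3) / 4 = (0.5 + 3) / 4
print(M[1][3])  # 0.875
```

Note that the matrix built this way is not symmetric, because the frequency term belongs to the row description only.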

In addition, in some optional embodiments, if the knowledge graph database includes an existing description of the given entity, the existing description may be added to the entity description set. The existing description is generally a commonly accepted or recognized description, and adding it to the entity description set gives the reliability of the extraction results a relatively adaptive comparison standard; for example, an entity description whose confidence is higher than that of the existing description is determined to be a standby entity description. Specifically, the data corresponding to the given entity is searched for a specific field, for example an entry field, and if the specific field is found, its field content is extracted as the existing description of the given entity. When frequencies are counted, the frequency of the existing description is determined from the average of the frequencies with which the non-repeated entity descriptions appear in the entity description set. The existing description may itself appear in the entity description set with a low frequency, and using that low frequency as its frequency would clearly be unreasonable. In these optional embodiments, the average of the frequencies of the non-repeated entity descriptions other than the existing description is therefore calculated, and the frequency of the existing description is determined from it, for example by using the average directly as the frequency of the existing description. In this way, the frequency determined for the existing description is more reasonable.
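One way to assign the existing description its frequency, using the plain average of the other descriptions' frequencies as in the example given in the text, is sketched below. The function name `frequency_with_existing` and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean


def frequency_with_existing(descriptions, existing):
    """Count frequencies, then give the existing description the mean
    frequency of all other non-repeated descriptions.

    Assumes at least one extracted description differs from `existing`;
    otherwise statistics.mean raises StatisticsError.
    """
    freq = Counter(d for d in descriptions if d != existing)
    result = dict(freq)
    result[existing] = mean(list(freq.values()))
    return result


# Running example plus a hypothetical existing description "e" that was
# added to the set once.
description_set = ["a", "b", "c", "b", "c", "c", "d", "a", "b", "c", "e"]
print(frequency_with_existing(description_set, "e"))
# {'a': 2, 'b': 3, 'c': 4, 'd': 1, 'e': 2.5}
```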

Step two, determining an initial state vector S₀.

In this embodiment, the confidence of each non-repeated entity description is calculated recursively, so the initial state vector S₀ of the recursion needs to be determined first.

Specifically, the initial parameter is the initial state vector S₀. The state vector S₀ is set to a column vector [1/N, 1/N, ..., 1/N] with N rows; that is, each non-repeated entity description is initially given the same confidence value 1/N.

After the initial state vector S₀ is determined, and before the recursion in step three below, the probability transition matrix M is normalized by row.
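The uniform initial vector and the row normalization can be sketched as follows. The function names are hypothetical, and the sketch assumes every row of M has a positive sum, which holds when every frequency is at least 1 and similarities are non-negative.

```python
def initial_state(n):
    """Return S0: n entries, each equal to 1/n."""
    return [1.0 / n] * n


def normalize_rows(matrix):
    """Scale each row so that its entries sum to 1.

    Assumes every row sum is positive.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


s0 = initial_state(4)
m_normalized = normalize_rows([[1.0, 3.0], [2.0, 2.0]])
```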

Step three, obtaining a state vector S_t by recursion from the transposed matrix of the probability transition matrix and the state vector S_{t-1}; and calculating the difference between each element of the state vector S_t and the corresponding element of the state vector S_{t-1} to obtain a plurality of difference results.

In the first recursion, the initial state vector S₀ is used as the state vector S_{t-1}; in each subsequent recursion, the state vector S_t obtained in the previous recursion is used as the state vector S_{t-1}.

Specifically, the state vector S_t is obtained from the state vector S_{t-1} and the transposed matrix M^T of the probability transition matrix M: a recursion expression is set, and S_{t-1} and M^T are substituted into it to obtain S_t. The recursion expression may be set as follows:

S_t = β·M^T·S_{t-1} + (1 - β)·S_{t-1}, where β is a fixed constant.

Further, the state vector S_t obtained from the recursion expression is a column vector with N rows, in which each element corresponds to the confidence of one entity description. After the state vector S_t is obtained, the elements in the same row of S_t and S_{t-1} are subtracted to obtain the difference for that element, which is used to judge whether the recursion end condition is met.

Step four, judging whether the sum of the absolute values of the plurality of difference results is smaller than a preset target value; if so, determining the confidence of each non-repeated entity description from the elements of the state vector S_t; if not, assigning the state vector S_t to the state vector S_{t-1} and repeating step three and step four.

Specifically, the absolute values of the calculated difference results are summed, and the sum is compared with the preset target value. If the sum of absolute values is smaller than the preset target value, the recursion end condition is met and the confidence of each non-repeated entity description can be determined: each element of the state vector S_t is taken as the confidence of the non-repeated entity description that corresponds to the same row in the order used when the probability transition matrix was constructed.

Continuing the example above, if the non-repeated entity descriptions are ordered a, b, c, d, so that rows 1, 2, 3 and 4 of the probability transition matrix hold the probabilities corresponding to a, b, c and d respectively, then the 4 elements of the state vector S_t are the confidences of a, b, c and d respectively.

In addition, if the sum of absolute values is greater than or equal to the preset target value, the state vector S_t is assigned to the state vector S_{t-1}, and step three and step four are repeated until the sum of absolute values is smaller than the preset target value, at which point the confidence of each non-repeated entity description is obtained.
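Steps three and four can be sketched together as a short loop. This is a sketch under stated assumptions rather than the patented implementation: the input matrix is taken to be already row-normalized, the values `beta=0.85` and `target=1e-6` are illustrative choices (the text only says that β is a fixed constant and that the target value is preset), and the `max_iter` guard is an added safety limit that the text does not mention.

```python
def iterate_confidence(matrix, beta=0.85, target=1e-6, max_iter=1000):
    """Iterate S_t = beta * M^T * S_{t-1} + (1 - beta) * S_{t-1}.

    `matrix` is the row-normalized transition matrix M as a list of rows.
    Stops when the sum of absolute element-wise differences is below `target`.
    """
    n = len(matrix)
    state = [1.0 / n] * n  # initial state vector S0
    for _ in range(max_iter):  # max_iter is a safety guard (assumption)
        # (M^T * S)[j] = sum over i of M[i][j] * S[i]
        walked = [
            sum(matrix[i][j] * state[i] for i in range(n)) for j in range(n)
        ]
        new_state = [
            beta * w + (1.0 - beta) * s for w, s in zip(walked, state)
        ]
        diff = sum(abs(new - old) for new, old in zip(new_state, state))
        state = new_state  # S_t becomes S_{t-1} for the next round
        if diff < target:
            break
    return state


confidences = iterate_confidence([[0.9, 0.1], [0.5, 0.5]])
```

Because each row of the input sums to 1, every update keeps the elements of the state vector summing to 1, so the resulting confidences can be compared directly with one another.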

It should be noted that calculating the confidence of the entity descriptions from frequency and similarity as in steps one to four above is only one feasible implementation, and the present invention is not limited to it. In other alternative embodiments of the present invention, the process of steps one to four may be implemented directly with a probability-biased random walk algorithm that fuses external features; that is, on the basis of an existing random walk algorithm, the probability transition matrix is constructed by also taking into account the frequency of occurrence of each non-repeated entity description. The input of the algorithm is the N non-repeated entity descriptions and their corresponding frequencies, and its output is the N entity descriptions and their corresponding confidences. Convergence of the algorithm, modeled as a Markov process, is also guaranteed in theory, because the probability transition matrix M satisfies the following three conditions of a Markov process: M is a stochastic matrix (all elements of M are greater than or equal to 0 and the elements of each column sum to 1), M is irreducible, and M is aperiodic. Calculating the confidence in this way provides reliability modeling for entity description extraction: the similarity between entity descriptions, the statistical frequency, and the semantic item description serving as a reference are combined with external information about the extraction result to measure the reliability of the extraction result adaptively. Alternatively, in some other embodiments of the present invention, weights may be set for the frequency and the similarity, and the confidence value may be obtained by weighted summation.
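The weighted-summation alternative is not specified in detail. One possible reading, given purely as an assumption, scores each description by a weighted sum of its relative frequency and its mean similarity to the other descriptions. The function name, the weights, and the use of the mean similarity are all hypothetical.

```python
def weighted_confidence(unique, freq, similarity, w_freq=0.5, w_sim=0.5):
    """Score each description as
    w_freq * (frequency / highest frequency) + w_sim * mean similarity to others.
    """
    max_freq = max(freq[d] for d in unique)
    scores = {}
    for d in unique:
        others = [o for o in unique if o != d]
        mean_sim = (
            sum(similarity(d, o) for o in others) / len(others)
            if others
            else 0.0
        )
        scores[d] = w_freq * freq[d] / max_freq + w_sim * mean_sim
    return scores


scores = weighted_confidence(
    ["a", "b"], {"a": 2, "b": 4}, lambda x, y: 0.5
)
```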

Step S304, according to the confidence of each entity description, at least one entity description is screened out from the entity description set to be used as a standby entity description of the entity.

Wherein the confidence of the entity description is a quantification of the quality of the entity description.

Specifically, the standby entity descriptions are screened out of the entity description set according to the confidence values to obtain standby recommendation reasons for the given entity, so that a high-quality recommendation reason shown on the recommendation page helps the user get to know the given entity.

In some alternative embodiments of the invention, the standby entity descriptions are screened out according to the average confidence value: the average of the confidences of the plurality of non-repeated entity descriptions in the entity description set is calculated, and at least one entity description whose confidence is higher than the average confidence value is screened out of the entity description set as a standby entity description of the entity. For example, if the confidences of entity descriptions a, b, c and d are 0.2, 0.1, 0.25 and 0.12 respectively, the average confidence value is (0.2+0.1+0.25+0.12)/4 = 0.1675, and a and c are selected as the standby entity descriptions.
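The average-based screening, with the numbers from the example, can be sketched as follows (the function name is hypothetical):

```python
def screen_above_average(confidences):
    """Keep the descriptions whose confidence exceeds the mean confidence."""
    mean = sum(confidences.values()) / len(confidences)
    return [desc for desc, conf in confidences.items() if conf > mean]


selected = screen_above_average({"a": 0.2, "b": 0.1, "c": 0.25, "d": 0.12})
```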

In other alternative embodiments of the present invention, if the entity description set includes an existing description of an entity, at least one entity description with a confidence higher than that of the existing description is selected from the entity description set as a standby entity description of the entity, or on the basis of this, the existing description may be added as a standby entity description. In the above example, if a is the existing description, c is screened out as the standby entity description, or a and c may be used as the standby entity description.
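Screening against the existing description can be sketched in the same way. The function name and the `include_existing` flag are hypothetical; the flag models the option of also keeping the existing description itself.

```python
def screen_above_existing(confidences, existing, include_existing=False):
    """Keep descriptions whose confidence exceeds that of the existing one."""
    baseline = confidences[existing]
    selected = [
        desc
        for desc, conf in confidences.items()
        if desc != existing and conf > baseline
    ]
    if include_existing:
        selected.insert(0, existing)
    return selected


scores = {"a": 0.2, "b": 0.1, "c": 0.25, "d": 0.12}
only_better = screen_above_existing(scores, "a")
with_existing = screen_above_existing(scores, "a", include_existing=True)
```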

According to the knowledge graph-based entity description extraction method of this embodiment, entity descriptions that were extracted in error are filtered out of the extracted entity description set, which prevents extraction errors from introducing excessive noise into the confidence calculation. A probability transition matrix is constructed from the frequency of each non-repeated entity description in the entity description set and the similarity between the non-repeated entity descriptions, and the confidence of each non-repeated entity description is calculated from that matrix, so that the confidence depends not only on frequency but also on the similarity between entity descriptions, which makes the result more reliable. Moreover, the existing description of the entity is added to the entity description set used for calculating the confidence, so that the reliability of the extraction result has a relatively adaptive reference. The scheme of this embodiment therefore does not depend on large-scale labeled data and is suitable for large-scale, open-domain, unsupervised entity description extraction. In addition, compared with measuring the quality of an entity description by frequency alone, the scheme also considers the content of the entity description, so that higher-quality standby entity descriptions can be screened out and displayed to the user.

Fig. 4 shows a flowchart of a push method for search engine recommendation reasons according to an embodiment of the present invention. As shown in fig. 4, the method includes:

step S401, obtaining the recommendation result of the search engine.

The recommendation result refers to the recommended content that the search engine matches to the search condition. Referring to fig. 1 in the background, when the user searches for the Pomeranian dog, the bear dog, the teacup dog and the like shown in the right-side recommendation column are recommendation results.

Step S402, taking the recommendation result as a given entity, and obtaining a standby entity description corresponding to the recommendation result by using the entity description extraction method based on the knowledge graph.

The recommendation result is taken as the given entity, that is, as the object that needs a recommendation reason, and the standby entity description corresponding to the given entity is extracted by using the entity description extraction method based on the knowledge graph in the above method embodiment.

Step S403, selecting an entity description from the standby entity descriptions as a recommendation reason to be presented on the search result presentation page.

Specifically, if there are multiple standby entity descriptions, a preset number of entity descriptions are selected from the standby entity descriptions to be presented as a recommendation reason. Optionally, one entity description is selected as the recommendation reason, for example, the entity description with the highest confidence coefficient is selected as the recommendation reason, or when the same entity is recommended for the same user multiple times, different entity descriptions are sequentially selected from the standby entity descriptions as the recommendation reasons, so that repetition of continuous multiple recommendation reasons is avoided.
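One way to realize this selection, offered only as a sketch, is to rank the standby descriptions by confidence and rotate through them according to how many times the entity has already been recommended to the user. The function name and the `times_shown` counter are hypothetical.

```python
def pick_reason(standby, times_shown=0):
    """Pick a recommendation reason from (description, confidence) pairs.

    The first time, the highest-confidence description is returned;
    later calls cycle through the remaining ones in descending confidence.
    """
    ranked = sorted(standby, key=lambda item: item[1], reverse=True)
    return ranked[times_shown % len(ranked)][0]


standby = [("a", 0.2), ("c", 0.25)]
first = pick_reason(standby)
second = pick_reason(standby, times_shown=1)
third = pick_reason(standby, times_shown=2)
```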

Fig. 5 shows a schematic diagram of the right-side recommendations, with recommendation reasons, when searching for a Pomeranian dog. As shown in fig. 5, a recommendation reason is displayed for each recommendation result in the right-side recommendations; for example, the recommendation reason for a panda is "funny and lovely naughty".

According to the push method for search engine recommendation reasons provided by this embodiment, the recommendation result of the search engine is taken as the given entity, the standby entity descriptions are extracted by the entity description extraction method based on the knowledge graph in the above embodiment, and the recommendation reason is selected from the standby entity descriptions; the recommendation reason is then shown to the user together with the recommendation result to help the user get to know the recommendation result. The scheme of the invention can therefore show high-quality recommendation reasons on the search result presentation page, thereby improving the user experience.

FIG. 6 shows a functional block diagram of an apparatus for knowledge-graph based entity description extraction, according to one embodiment of the present invention. As shown in fig. 6, the apparatus includes:

an extraction module 601, adapted to extract, for a given entity, an entity description set of the entity from a knowledge graph database;

a confidence calculation module 602, adapted to calculate, for each non-repeated entity description in the set of entity descriptions, a confidence of the entity description according to a similarity between the respective non-repeated entity description and the entity description, and a frequency of occurrence of the respective non-repeated entity description in the set of entity descriptions;

a screening module 603, adapted to screen out at least one entity description from the entity description set as a standby entity description of the entity according to the confidence of each entity description.

In an alternative embodiment, the extraction module is further adapted to: an entity description set of the entity is extracted from a knowledge graph database using one or more extraction models.

In an alternative embodiment, the apparatus further comprises: a filtering module, adapted to judge, for each non-repeated entity description in the entity description set, whether the length of the entity description is within a preset length interval, and if not, to filter the entity description out of the entity description set; and/or,

to judge whether the entity description contains a preset symbol, and if so, to filter the entity description out of the entity description set.
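The two filtering rules can be sketched as a single pass over the descriptions. The length bounds and the banned symbols below are illustrative assumptions; the text only states that a preset length interval and preset symbols are used.

```python
def filter_descriptions(descriptions, min_len=2, max_len=20, banned=("?", "!")):
    """Drop descriptions whose length is outside [min_len, max_len]
    or that contain any of the banned symbols."""
    kept = []
    for desc in descriptions:
        if not (min_len <= len(desc) <= max_len):
            continue  # length outside the preset interval
        if any(symbol in desc for symbol in banned):
            continue  # contains a preset symbol
        kept.append(desc)
    return kept


kept = filter_descriptions(["lively and cute", "x", "what?", "a" * 30])
```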

In an alternative embodiment, the apparatus further comprises: a filtering module adapted to extract a conflict entity description satisfying model characteristics of the conflict model from a knowledge graph database using the conflict model corresponding to the one or more extraction models;

and aiming at each non-repeated entity description in the entity description set, judging whether the entity description is the same as the conflict entity description, and if so, filtering the entity description from the entity description set.

In an alternative embodiment, the confidence calculation module is further adapted to:

constructing an N × N probability transition matrix according to the frequency with which each non-repeated entity description appears in the entity description set and the similarity between the non-repeated entity descriptions, where N is the number of non-repeated entity descriptions in the entity description set;

the probability transition matrix is denoted M, and its element M[i][j] is obtained as follows: the similarity between the i-th non-repeated entity description and the j-th non-repeated entity description is added to the frequency with which the i-th non-repeated entity description appears in the entity description set, giving a summation result; the ratio of the summation result to the highest frequency of any non-repeated entity description in the entity description set is taken as the value of M[i][j];

determining an initial state vector S₀;

obtaining a state vector S_t by recursion from the transposed matrix of the probability transition matrix and the state vector S_{t-1}, and calculating the difference between each element of S_t and the corresponding element of S_{t-1} to obtain a plurality of difference results;

judging whether the sum of the absolute values of the plurality of difference results is smaller than a preset target value; if so, determining the confidence of each non-repeated entity description from the elements of the state vector S_t; if not, assigning the state vector S_t to the state vector S_{t-1} and repeating step S33 and step S34.

In an alternative embodiment, the screening module is further adapted to:

calculating an average confidence value of confidence degrees of a plurality of non-repeated entity descriptions in an entity description set, and screening at least one entity description with a confidence degree higher than the average confidence value from the entity description set as a standby entity description of the entity.

In an alternative embodiment, the set of entity descriptions includes existing descriptions of the entity;

the screening module is further adapted to: and screening out at least one entity description with the confidence coefficient higher than that of the existing description from the entity description set as a standby entity description of the entity.

the device further comprises: and the frequency calculation module is suitable for calculating the average frequency value of the frequency of the plurality of non-repeated entity descriptions appearing in the entity description set, and determining the frequency of the existing description according to the average frequency value.

Fig. 7 shows a functional block diagram of a push apparatus for search engine recommendation reasons according to another embodiment of the present invention. As shown in fig. 7, the apparatus includes:

an obtaining module 701, adapted to obtain a recommendation result of a search engine;

an entity description extracting module 702, adapted to use the recommendation result as a given entity, and obtain a standby entity description corresponding to the recommendation result by using the entity description extracting apparatus based on a knowledge graph in the above apparatus embodiment;

a selecting module 703 adapted to select an entity description from the standby entity descriptions as a reason for recommendation to be presented on a search result presentation page.

The embodiment of the application provides a non-volatile computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the entity description extraction method based on the knowledge graph in any method embodiment.

The embodiment of the application provides a non-volatile computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the computer executable instruction can execute a pushing method of the search engine recommendation reason in any method embodiment.

Fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.

As shown in fig. 8, the computing device may include: a processor 802, a communication interface 804, a memory 806, and a communication bus 808.

Wherein:

the processor 802, communication interface 804, and memory 806 communicate with one another via a communication bus 808.

A communication interface 804 for communicating with network elements of other devices, such as clients or other servers.

The processor 802, configured to execute the program 810, may specifically perform relevant steps in the aforementioned method for extracting an entity description based on a knowledge-graph.

In particular, the program 810 may include program code comprising computer operating instructions.

The processor 802 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 806 stores a program 810. The memory 806 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 810 may be specifically configured to cause the processor 802 to perform the following operations:

In an alternative embodiment, the program 810 may be further specifically configured to cause the processor 802 to:

an entity description set of the entity is extracted from a knowledge graph database using one or more extraction models.

for each non-repeated entity description in the entity description set, judging whether the length of the entity description is within a preset length interval, and if not, filtering the entity description out of the entity description set; and/or,

extracting a conflict entity description satisfying the model features of the conflict model from a knowledge graph database by using the conflict model corresponding to the one or more extraction models;

step S31, constructing an N × N probability transition matrix according to the frequency with which each non-repeated entity description appears in the entity description set and the similarity between the non-repeated entity descriptions, where N is the number of non-repeated entity descriptions in the entity description set;

step S32, determining an initial state vector S₀;

step S33, obtaining a state vector S_t by recursion from the transposed matrix of the probability transition matrix and the state vector S_{t-1}, and calculating the difference between each element of S_t and the corresponding element of S_{t-1} to obtain a plurality of difference results;

step S34, judging whether the sum of the absolute values of the plurality of difference results is smaller than a preset target value; if so, determining the confidence of each non-repeated entity description from the elements of the state vector S_t; if not, assigning the state vector S_t to the state vector S_{t-1} and repeating step S33 and step S34.

the program 810 may be further specifically configured to cause the processor 802 to perform the following operations:

and screening out at least one entity description with the confidence coefficient higher than that of the existing description from the entity description set as a standby entity description of the entity.

and calculating the average frequency value of the frequencies of the plurality of non-repeated entity descriptions appearing in the entity description set, and determining the frequency of the existing description according to the average frequency value.

Fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.

As shown in fig. 9, the computing device may include: a processor 902, a communication interface 904, a memory 906, and a communication bus 908.

Wherein:

the processor 902, communication interface 904, and memory 906 communicate with one another via a communication bus 908.

A communication interface 904 for communicating with network elements of other devices, such as clients or other servers.

The processor 902 is configured to execute the program 910, and may specifically execute the relevant steps in the pushing method embodiment of the search engine recommendation reason.

In particular, the program 910 may include program code that includes computer operating instructions.

The processor 902 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 906 stores a program 910. The memory 906 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 910 may specifically be configured to cause the processor 902 to perform the following operations:

obtaining a recommendation result of a search engine;

taking the recommendation result as a given entity, and obtaining a standby entity description corresponding to the recommendation result by using the entity description extraction method based on the knowledge graph in the method embodiment;

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the knowledge-graph based entity description extraction means and the push means of the search engine recommendation rationale in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

The invention discloses: A1. a knowledge graph-based entity description extraction method comprises the following steps:

A2. The method according to A1, wherein the extracting the entity description set of the entity from the knowledge graph database further comprises:

A3. The method of A1 or A2, wherein, after said extracting the entity description set of the entity from the knowledge graph database, the method further comprises:

A4. The method according to A2, wherein after said extracting the entity description set of the entity from the knowledge graph database, the method further comprises:

A5. The method of any one of A1-A4, wherein the step S2 further includes:

the probability transition matrix is denoted M, and its element M[i][j] is obtained as follows: the similarity between the i-th non-repeated entity description and the j-th non-repeated entity description is added to the frequency with which the i-th non-repeated entity description appears in the entity description set, giving a summation result; the ratio of the summation result to the highest frequency of any non-repeated entity description in the entity description set is taken as the value of M[i][j];

step S32, determining an initial state vector S₀;

step S34, judging whether the sum of the absolute values of the plurality of difference results is smaller than a preset target value; if so, determining the confidence of each non-repeated entity description from the elements of the state vector S_t; if not, assigning the state vector S_t to the state vector S_{t-1} and repeating step S33 and step S34.

A6. The method according to A5, wherein the screening out at least one entity description from the entity description set as a standby entity description of the entity according to the confidence of each entity description further comprises:

A7. The method according to A5, wherein the entity description set contains existing descriptions of the entity;

the screening out at least one entity description from the entity description set as a standby entity description of the entity according to the confidence of each entity description further comprises:

A8. The method of any one of A1-A7, wherein the entity description set contains existing descriptions of the entity;

the method further comprises: calculating the average of the frequencies with which the plurality of non-repeated entity descriptions appear in the entity description set, and determining the frequency of the existing description according to that average.
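One possible reading of A8 is sketched below. The text only says that the frequency of the existing description is determined "according to" the average, so setting it equal to the average is an assumption, as are the function name, the sample strings, and the choice to average over the extracted descriptions only.

```python
from collections import Counter
from statistics import mean


def frequencies_with_existing(extracted, existing):
    # Frequency of each non-repeated extracted description.
    counts = Counter(extracted)
    # Average frequency over the non-repeated descriptions.
    average = mean(list(counts.values()))
    freqs = dict(counts)
    # Assumption: the existing description is assigned exactly the average.
    freqs[existing] = average
    return freqs


# Hypothetical data: "x" appears three times, "y" once, so the average is 2.
existing = "existing description"
freqs = frequencies_with_existing(["x", "x", "x", "y"], existing)
```

Giving the existing description an average frequency lets it take part in the confidence calculation without being favored or penalized by a count it never had.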

The invention also discloses: B9. A push method for search engine recommendation reasons, comprising the following steps:

obtaining a recommendation result of a search engine;

taking the recommendation result as a given entity, and obtaining a standby entity description corresponding to the recommendation result by using the method of any one of A1-A8;

The invention also discloses: C10. A knowledge graph-based entity description extraction apparatus, comprising:

C11. The apparatus of C10, wherein the extraction module is further adapted to: extract the entity description set of the entity from the knowledge graph database using one or more extraction models.

C12. The apparatus of C10 or C11, wherein the apparatus further comprises: a filtering module adapted to judge, for each non-repeated entity description in the entity description set, whether the length of the entity description is within a preset length interval, and if not, to filter the entity description out of the entity description set; and/or,
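The length check performed by the filtering module in C12 can be sketched as follows. The function name, the inclusive bounds, and the interval values are hypothetical; the text only requires "a preset length interval".

```python
def filter_by_length(descriptions, min_len, max_len):
    # Keep only descriptions whose length lies within [min_len, max_len].
    return [d for d in descriptions if min_len <= len(d) <= max_len]


# Hypothetical candidates: one too short, one acceptable, one too long.
candidates = ["dog", "a small spitz-type companion dog", "x" * 200]
kept = filter_by_length(candidates, min_len=5, max_len=80)
```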

C13. The apparatus of C11, wherein the apparatus further comprises: a filtering module adapted to use a conflict model corresponding to the one or more extraction models to extract, from the knowledge graph database, conflict entity descriptions that satisfy the model characteristics of the conflict model;

C14. The apparatus of any one of C10-C13, wherein the confidence computation module is further adapted to:

the probability transition matrix is set to M, wherein M[i][j] is computed as follows: calculating the sum of the similarity between the i-th non-repeated entity description and the j-th non-repeated entity description and the frequency with which the i-th non-repeated entity description appears in the entity description set, to obtain a summation result; and calculating the ratio of the summation result to the highest frequency of any non-repeated entity description in the entity description set, to obtain the value of M[i][j];

determining an initial state vector S_0;

judging whether the sum of the absolute values of the difference results is less than a preset target value; if so, determining each element of the state vector S_t as the confidence of the corresponding non-repeated entity description; if not, assigning the state vector S_t to the state vector S_{t-1} and repeating step S33 and step S34.

C15. The apparatus of C14, wherein the screening module is further adapted to:

C16. The apparatus according to C14, wherein the entity description set includes existing descriptions of the entity;

C17. The apparatus according to any one of C10-C16, wherein the set of entity descriptions contains existing descriptions of the entity;

The invention also discloses: D18. A push device for search engine recommendation reasons, comprising:

an entity description extraction module, adapted to use the recommendation result as a given entity, and obtain a standby entity description corresponding to the recommendation result by using the apparatus of any one of C10-C17;

The invention also discloses: E19. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations corresponding to the knowledge graph-based entity description extraction method according to any one of A1-A8.

The invention also discloses: F20. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations corresponding to the push method for search engine recommendation reasons according to B9.

The invention also discloses: G21. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the knowledge graph-based entity description extraction method according to any one of A1-A8.

The invention also discloses: H22. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the push method for search engine recommendation reasons according to B9.

Claims

1. A knowledge graph-based entity description extraction method, comprising the following steps:

2. The method of claim 1, wherein said extracting the entity description set of the entity from the knowledge-graph database further comprises:

3. The method of claim 1 or 2, wherein after said extracting the set of entity descriptions of the entity from the knowledge-graph database, the method further comprises:

4. A push method for search engine recommendation reasons, comprising the following steps:

obtaining a recommendation result of a search engine;

taking the recommendation result as a given entity, and obtaining a standby entity description corresponding to the recommendation result by using the method of any one of claims 1 to 3;

5. A knowledge-graph-based entity description extraction apparatus, comprising:

6. A push device for search engine recommendation reasons, comprising:

an entity description extraction module, adapted to use the recommendation result as a given entity, and obtain a standby entity description corresponding to the recommendation result by using the apparatus of claim 5;

7. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations corresponding to the knowledge graph-based entity description extraction method according to any one of claims 1-3.

8. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations corresponding to the push method for search engine recommendation reasons according to claim 4.

9. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the method of knowledge-graph based entity description extraction of any one of claims 1-3.

10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the push method for search engine recommendation reasons according to claim 4.