CN111125379B

CN111125379B - Knowledge base expansion method and device, electronic equipment and storage medium

Info

Publication number: CN111125379B
Application number: CN201911368840.2A
Authority: CN
Inventors: 夏有君; 李莉; 戴瑾
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2022-12-06
Anticipated expiration: 2039-12-26
Also published as: CN111125379A

Abstract

An embodiment of the invention provides a knowledge base expansion method, a knowledge base expansion device, electronic equipment and a storage medium. The method comprises: determining a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in the field corresponding to the knowledge base; and expanding the knowledge point based on the semantic information of the seed sentence pattern and of each accumulated sentence pattern, and/or the paths of the seed sentence pattern and of each accumulated sentence pattern in the service knowledge graph of the corresponding field. Because the knowledge base is expanded automatically from this semantic information and/or these knowledge graph paths, the method, device, electronic equipment and storage medium save labor and time, can decouple sentence patterns with different intents, avoid ambiguity between standard questions, and improve the quality and effect of the expansion.

Description

Knowledge base expansion method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of human-computer interaction, in particular to a knowledge base expansion method, a knowledge base expansion device, electronic equipment and a storage medium.

Background

With the rapid development of artificial intelligence technology and the wide application of human-computer interaction systems, intelligent customer service has emerged. Intelligent customer service can answer user questions based on a knowledge base and provide around-the-clock service for users.

Currently, the construction of the knowledge base is usually completed manually, and workers are required to sort and summarize knowledge points and corresponding standard questions in related fields, and expand each standard question on the basis of the sorting and summarization. The method consumes a great deal of manpower and time and is excessively dependent on the professional ability of workers, so that the quality of the obtained knowledge base is uneven.

Disclosure of Invention

The embodiment of the invention provides a knowledge base expansion method, a knowledge base expansion device, electronic equipment and a storage medium, which are used for solving the problem that a large amount of labor and time are consumed in the conventional knowledge base expansion method.

In a first aspect, an embodiment of the present invention provides a method for expanding a knowledge base, including:

determining a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in the corresponding field of the knowledge base;

and expanding any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

Preferably, the expanding any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field specifically includes:

selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the seed sentence patterns and the semantic information of each accumulated sentence pattern;

and expanding any knowledge point based on the paths of the seed sentence patterns and each candidate sentence pattern in the service knowledge graph of the corresponding field and the service class information of the seed sentence patterns and each candidate sentence pattern.

Preferably, the selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the semantic information of the seed sentence pattern and each accumulated sentence pattern specifically includes:

determining the semantic feature vector similarity of the seed sentence pattern and any accumulated sentence pattern based on the semantic feature vectors in the semantic information of the seed sentence pattern and any accumulated sentence pattern;

and/or determining the semantic key information similarity of the seed sentence pattern and any accumulated sentence pattern based on the semantic key information in the semantic information of the seed sentence pattern and the any accumulated sentence pattern;

and selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the semantic feature vector similarity and/or the semantic key information similarity of the seed sentence pattern and each accumulated sentence pattern.

Preferably, the determining semantic key information similarity between the seed sentence pattern and any one of the accumulated sentence patterns based on semantic key information in the semantic information of the seed sentence pattern and the any one of the accumulated sentence patterns specifically includes:

determining the similarity of the operation type information of the seed sentence pattern and any accumulated sentence pattern based on the operation type information in the semantic key information of the seed sentence pattern and the any accumulated sentence pattern;

determining the similarity of the business class information of the seed sentence pattern and any accumulated sentence pattern based on the business class information in the semantic key information of the seed sentence pattern and any accumulated sentence pattern;

and determining semantic key information similarity of the seed sentence pattern and any accumulated sentence pattern based on operation type information similarity and service type information similarity of the seed sentence pattern and any accumulated sentence pattern.

Preferably, the selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the semantic feature vector similarity and the semantic key information similarity of the seed sentence pattern and each accumulated sentence pattern specifically includes:

and if the similarity of the semantic feature vector of the seed sentence pattern and any accumulated sentence pattern is within a preset vector similarity interval and the similarity of the semantic key information of the seed sentence pattern and any accumulated sentence pattern is greater than or equal to a preset information similarity threshold, taking any accumulated sentence pattern as the candidate sentence pattern.

Preferably, the expanding any knowledge point based on the path of the seed sentence pattern and each candidate sentence pattern in the service knowledge graph of the corresponding field and the service class information of the seed sentence pattern and each candidate sentence pattern specifically includes:

determining the path similarity between the seed sentence pattern and the path of any candidate sentence pattern in the service knowledge graph of the corresponding field;

if the path similarity is greater than a preset path similarity threshold value and the operation type information in the seed sentence pattern is the same as that in any candidate sentence pattern, replacing the service type information in any candidate sentence pattern with the service type information in the seed sentence pattern;

and adding the replaced any candidate sentence pattern to the any knowledge point.

Preferably, the determining the path similarity between the seed sentence pattern and the path of any candidate sentence pattern in the service knowledge graph of the corresponding domain specifically includes:

determining a path of the seed sentence pattern in the business knowledge graph of the corresponding field based on the business knowledge graph of the corresponding field and the semantic key information of the seed sentence pattern;

determining a path of any candidate sentence pattern in the business knowledge graph of the corresponding field based on the business knowledge graph of the corresponding field and the semantic key information of any candidate sentence pattern;

and determining the path similarity based on the paths of the seed sentence patterns and any candidate sentence pattern in the service knowledge graph of the corresponding field.

In a second aspect, an embodiment of the present invention provides a knowledge base expansion apparatus, including:

a sentence pattern determining unit for determining seed sentence patterns corresponding to any knowledge point in the knowledge base and a plurality of accumulated sentence patterns in the corresponding field of the knowledge base;

and a sentence pattern expansion unit for expanding any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor, a communication interface, a memory, and a bus, where the processor and the communication interface complete mutual communication through the bus, and the processor may invoke a logic command in the memory to execute the steps of the method as provided in the first aspect.

In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.

The knowledge base expansion method, the knowledge base expansion device, the electronic equipment and the storage medium provided by the embodiment of the invention automatically expand the knowledge base based on the semantic information of the seed sentence pattern and of each accumulated sentence pattern and/or their paths in the business knowledge graph of the corresponding field, thereby saving labor and time, decoupling sentence patterns with different intents, avoiding ambiguity between standard questions, and improving the quality and effect of the expansion.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart illustrating a method for expanding a knowledge base according to an embodiment of the present invention;

FIG. 2 is a flow chart of a knowledge point expansion method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for determining candidate sentence patterns based on semantic information according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method for expanding knowledge points based on a service knowledge graph path according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a business knowledge graph path provided by an embodiment of the invention;

FIG. 6 is a flow chart illustrating a method for expanding a knowledge base according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a knowledge base expansion apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

With the development of natural language understanding and large-scale knowledge processing technology, intelligent customer service technology oriented to various industries is developed. The services corresponding to different manufacturers in different industries are different, so knowledge bases required by intelligent customer service applied to different manufacturers in different industries also need to be constructed respectively. And the construction and expansion of the knowledge base are very time-consuming and labor-consuming.

At present, a large amount of corpus resources has accumulated in each field of intelligent customer service, so a knowledge base can be expanded by reusing these resources. Conventional resource reuse is usually implemented either by a reuse method based on knowledge base templates or by a mapping reuse method based on knowledge points. The template-based method has a high template maintenance cost, and data with different intents is tightly entangled: templates such as 'introduce XXX' and 'how to handle XXX' match a very large number of sentence patterns, yet because the keyword XXX differs, ambiguity between standard questions easily arises in actual use. In the mapping-based method, most knowledge points differ because different manufacturers have different service scopes; when expansion relies only on the mapping relation between knowledge points, few knowledge points can be reused, and the expansion effect is unsatisfactory.

To this end, an embodiment of the present invention provides a method for expanding a knowledge base. FIG. 1 is a schematic flow chart of the method. As shown in FIG. 1, the method includes:

Step 110, determining a seed sentence pattern corresponding to any knowledge point in the knowledge base and a plurality of accumulated sentence patterns in the field corresponding to the knowledge base.

Here, the knowledge base is a knowledge base to be expanded, the knowledge base includes a plurality of knowledge points, and each knowledge point corresponds to one standard question and a plurality of expansion questions. When any knowledge point in the knowledge base is expanded, any sentence pattern in the standard question and the expanded question corresponding to the knowledge point can be used as a seed sentence pattern.
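
The structure described above can be pictured with a minimal sketch. The class names `KnowledgePoint` and `KnowledgeBase`, the `domain` field and the sample questions are illustrative assumptions, not names taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgePoint:
    """One knowledge point: a standard question plus its expansion questions."""
    standard_question: str
    expansion_questions: List[str] = field(default_factory=list)

    def seed_candidates(self) -> List[str]:
        # Any sentence pattern among the standard question and the
        # expansion questions may be used as a seed sentence pattern.
        return [self.standard_question, *self.expansion_questions]


@dataclass
class KnowledgeBase:
    """The knowledge base to be expanded (hypothetical layout)."""
    domain: str
    points: List[KnowledgePoint] = field(default_factory=list)


kb = KnowledgeBase(
    domain="banking",
    points=[
        KnowledgePoint(
            standard_question="How do I check my credit card bill?",
            expansion_questions=["Help me see my credit card bill"],
        )
    ],
)
seeds = kb.points[0].seed_candidates()
```

Here `seeds` holds every sentence pattern of the first knowledge point, any of which could serve as the seed in step 110.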

In the field corresponding to the knowledge base to be expanded, a large amount of corpus resources has been accumulated, and an accumulated sentence pattern is a question sentence pattern accumulated in advance in that field. Accumulated sentence patterns may be obtained from human-to-human or human-to-machine conversations in the corresponding field; they may be extracted from conversation text, or extracted after a voice conversation has been transcribed through speech recognition, which is not specifically limited in the embodiment of the present invention.

And step 120, expanding the knowledge points based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

Specifically, the semantic information of the seed sentence pattern or of any accumulated sentence pattern reflects the semantics of that sentence pattern. It may be represented as the word vector of each segmented word in the sentence pattern text, the context vector of each segmented word in the text, the key information entities in the text, the word vectors of the key information entities in the text, and the like.

Based on the semantic information of the seed sentence pattern and any accumulated sentence pattern, the semantic similarity between the seed sentence pattern and the accumulated sentence pattern can be measured, and the higher the semantic similarity is, the higher the probability that the accumulated sentence pattern and the seed sentence pattern correspond to the same knowledge point is, and the better the effect of using the accumulated sentence pattern for expanding the knowledge points corresponding to the seed sentence pattern is.

The business knowledge graph of the corresponding field is established in advance based on a large amount of corpus resources accumulated in the field corresponding to the knowledge base, and the business knowledge graph comprises key information entities related to the corresponding field and relations among the key information entities, and is embodied as nodes in the business knowledge graph and edges among the nodes. The path of the seed sentence pattern or any accumulated sentence pattern in the service knowledge graph of the corresponding field is used for reflecting the key information entity and the relation of the sentence pattern in the service knowledge graph.

Based on the path of the seed sentence pattern and any accumulated sentence pattern in the service knowledge graph of the corresponding field, the similarity of the seed sentence pattern and the path of the accumulated sentence pattern can be measured, the higher the similarity of the path is, the higher the similarity between the key information entities and the relationship of the seed sentence pattern and the accumulated sentence pattern is, the higher the probability that the accumulated sentence pattern and the seed sentence pattern correspond to the same knowledge point is, and the better the effect of the accumulated sentence pattern for expanding the knowledge points corresponding to the seed sentence pattern is.

By applying the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the business knowledge graph of the corresponding field, the accumulated sentence patterns that can be used to expand the corresponding knowledge point can be screened out from all the accumulated sentence patterns, and the knowledge point is then expanded. Performing this operation for each knowledge point to be expanded in the knowledge base reuses the accumulated sentence patterns in the field corresponding to the knowledge base and realizes automatic expansion of the knowledge base.

The method provided by the embodiment of the invention automatically expands the knowledge base based on the semantic information of the seed sentence pattern and of each accumulated sentence pattern and/or their paths in the service knowledge graph of the corresponding field, which saves labor and time, can decouple sentence patterns with different intents, avoids ambiguity between standard questions, and improves the quality and effect of the expansion.

Based on any of the above embodiments, fig. 2 is a schematic flow chart of the knowledge point expansion method provided by the embodiment of the present invention, and as shown in fig. 2, step 120 specifically includes:

Step 121, selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the semantic information of the seed sentence pattern and of each accumulated sentence pattern.

Specifically, for any accumulated sentence pattern, whether to use it as a candidate sentence pattern in the subsequent knowledge point expansion can be judged by comparing the semantic information of the seed sentence pattern with that of the accumulated sentence pattern. Here, the condition for taking the accumulated sentence pattern as a candidate sentence pattern may be that its semantic similarity to the seed sentence pattern is greater than a preset minimum semantic similarity, or that this similarity ranks among a preset number of the highest semantic similarities between all accumulated sentence patterns and the seed sentence pattern, and the like, which is not specifically limited in the embodiment of the present invention.
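
The two screening conditions mentioned here (a minimum similarity, or a fixed number of the most similar patterns) might be sketched as follows. The function names, the parameters `min_similarity` and `top_k`, and the word-overlap `toy_similarity` are assumptions made for illustration; the embodiment leaves the semantic similarity measure open at this point.

```python
from typing import Callable, List, Optional


def toy_similarity(a: str, b: str) -> float:
    # Stand-in similarity (word-set overlap), NOT the semantic model of the
    # embodiment; it only makes the example runnable.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_candidates(
    seed: str,
    accumulated: List[str],
    similarity: Callable[[str, str], float],
    min_similarity: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[str]:
    """Keep accumulated patterns that pass the threshold and/or rank in the top k."""
    scored = [(similarity(seed, s), s) for s in accumulated]
    if min_similarity is not None:
        # Condition 1: similarity greater than a preset minimum.
        scored = [(score, s) for score, s in scored if score > min_similarity]
    # Sort by similarity, highest first (stable for equal scores).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if top_k is not None:
        # Condition 2: keep only a preset number of the most similar patterns.
        scored = scored[:top_k]
    return [s for _, s in scored]


accumulated = [
    "check my debit card bill",
    "open a new account",
    "check my credit card limit",
]
by_threshold = select_candidates(
    "check my credit card bill", accumulated, toy_similarity, min_similarity=0.5
)
by_rank = select_candidates(
    "check my credit card bill", accumulated, toy_similarity, top_k=1
)
```

Either condition can be used on its own; passing both applies the threshold first and then truncates, which is one possible combination rather than something the text prescribes.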

And step 122, expanding the knowledge points based on the paths of the seed sentence patterns and each candidate sentence pattern in the service knowledge graph of the corresponding field, and the service class information of the seed sentence patterns and each candidate sentence pattern.

Specifically, after the accumulated sentence patterns have been screened through the semantic information to obtain a plurality of candidate sentence patterns, the knowledge point can be expanded based on the paths of the seed sentence pattern and of each candidate sentence pattern in the service knowledge graph of the corresponding field, and on the service class information of the seed sentence pattern and of each candidate sentence pattern. For example, if the similarity between the paths of the seed sentence pattern and any candidate sentence pattern in the service knowledge graph of the corresponding field is greater than a preset minimum path similarity, and the service class information of the seed sentence pattern and the candidate sentence pattern is the same, the candidate sentence pattern may be used directly as an expansion question of the knowledge point. If the path similarity is greater than the preset minimum path similarity but the service class information of the seed sentence pattern and the candidate sentence pattern differs, the service class information of the candidate sentence pattern may be replaced by the service class information of the seed sentence pattern, and the replaced candidate sentence pattern may be used as an expansion question of the knowledge point.

The method provided by the embodiment of the invention further expands the knowledge base through the paths in the service knowledge graph of the corresponding field and the service class information, on the basis of screening the accumulated sentence patterns by semantic information, thereby effectively improving the expansion quality of the knowledge base.
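
The example in this paragraph can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the path similarity is passed in as a precomputed number, each sentence pattern is assumed to carry a single service class entity as a plain string, and the name `expand_with_candidate` and the sample values are hypothetical.

```python
from typing import Optional


def expand_with_candidate(
    candidate: str,
    candidate_service: str,
    seed_service: str,
    path_similarity: float,
    min_path_similarity: float,
) -> Optional[str]:
    """Return an expansion question for the knowledge point, or None if rejected."""
    if path_similarity <= min_path_similarity:
        # Paths in the knowledge graph are not similar enough: do not reuse.
        return None
    if candidate_service == seed_service:
        # Same service class information: use the candidate directly.
        return candidate
    # Different service class information: substitute the seed's entity.
    return candidate.replace(candidate_service, seed_service)


expanded = expand_with_candidate(
    candidate="I want to apply for a debit card",
    candidate_service="debit card",
    seed_service="credit card",
    path_similarity=0.8,      # assumed precomputed value
    min_path_similarity=0.6,  # assumed preset threshold
)
```

With these inputs the candidate is kept and its service entity is swapped, so `expanded` reads "I want to apply for a credit card".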

Based on any of the foregoing embodiments, fig. 3 is a schematic flowchart of a method for determining candidate sentence patterns based on semantic information according to an embodiment of the present invention, as shown in fig. 3, step 121 specifically includes:

step 1211, determining a semantic feature vector similarity between the seed sentence pattern and any accumulated sentence pattern based on the semantic feature vectors in the semantic information of the seed sentence pattern and the accumulated sentence pattern.

And/or, in step 1212, determining semantic key information similarity between the seed sentence pattern and the accumulated sentence pattern based on the semantic key information in the semantic information of the seed sentence pattern and the accumulated sentence pattern.

Specifically, the semantic information comprises semantic feature vectors and/or semantic key information. The semantic feature vector may be the word vector of each segmented word in the sentence pattern text, or the context vector of each segmented word in the text, and may be obtained from a model capable of semantic representation, such as LSTM or BERT.

The semantic feature vector similarity of the seed sentence pattern and any accumulated sentence pattern is the similarity between their semantic feature vectors, and can be calculated by methods such as cosine similarity or Euclidean distance.

Furthermore, the semantic key information is used to represent key information entities contained in the sentence pattern text, where the semantic key information may be operation type information or service type information, and the semantic key information may include entities extracted from the sentence pattern text and information categories described by the entities. The semantic key information can be obtained based on a model with an entity extraction function, such as BERT-ATT, BERT-CRF and the like.

The semantic key information similarity of the seed sentence pattern and any accumulated sentence pattern is the similarity of the semantic key information of the seed sentence pattern and any accumulated sentence pattern, and the semantic key information similarity can be expressed as the proportion of the same key information entities in the semantic key information of the seed sentence pattern and the accumulated sentence pattern.

Step 1213, selecting several candidate sentence patterns from the several accumulated sentence patterns based on the similarity of semantic feature vectors and/or the similarity of semantic key information between the seed sentence pattern and each accumulated sentence pattern.

Specifically, when only step 1211 is executed and step 1212 is not executed, the candidate sentence patterns may be selected based on the semantic feature vector similarity of the seed sentence pattern and each accumulated sentence pattern; when only step 1212 is executed and step 1211 is not executed, the candidate sentence patterns may be selected based on the semantic key information similarity of the seed sentence pattern and each accumulated sentence pattern; when both steps 1211 and 1212 are executed, the candidate sentence patterns may be selected based on both the semantic feature vector similarity and the semantic key information similarity of the seed sentence pattern and each accumulated sentence pattern.

The method provided by the embodiment of the invention selects the candidate sentence patterns based on the similarity of the semantic feature vectors and/or the similarity of the semantic key information, and improves the multiplexing quality of the sentence patterns by semantic screening at different angles, thereby ensuring the expansion quality of a subsequent knowledge base.

Based on any of the above embodiments, in step 1211, the calculation of the semantic feature vector similarity may be implemented by:

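Suppose that the seed sentence pattern is $q_1$ and any accumulated sentence pattern is $q_2$, and that the semantic feature vectors of $q_1$ and $q_2$ are respectively

$$\vec{v}_1 = (v_{11}, v_{12}, \ldots, v_{1n}), \qquad \vec{v}_2 = (v_{21}, v_{22}, \ldots, v_{2n}),$$

where $v_{1i}$ and $v_{2i}$ represent the $i$-th dimension of the vectors $\vec{v}_1$ and $\vec{v}_2$, and $n$ is the dimension of the vectors.

The semantic feature vector similarity is set as the quotient of the dot product of $\vec{v}_1$ and $\vec{v}_2$ and the product of their moduli, as given by the following equation:

$$S(q_1, q_2) = \frac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert}$$

In the formula, $S(q_1, q_2)$ is the semantic feature vector similarity, $\vec{v}_1 \cdot \vec{v}_2$ is the dot product of $\vec{v}_1$ and $\vec{v}_2$, $\lVert \vec{v}_1 \rVert$ is the modulus of $\vec{v}_1$, and $\lVert \vec{v}_2 \rVert$ is the modulus of $\vec{v}_2$.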
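
The quotient of the dot product and the two vector moduli described above is the cosine similarity, which can be sketched in plain Python as follows. The function name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions of this sketch, not part of the embodiment.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Dot product of v1 and v2 divided by the product of their moduli."""
    if len(v1) != len(v2):
        raise ValueError("semantic feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: the similarity of a zero vector is treated as 0.
        return 0.0
    return dot / (norm1 * norm2)


similarity = cosine_similarity([3.0, 4.0], [4.0, 3.0])  # 24 / (5 * 5)
```

In practice the two input vectors would come from a sentence encoder such as the LSTM or BERT models mentioned earlier; the short lists here only stand in for them.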

Based on any of the above embodiments, step 1212 specifically includes: determining the similarity of the operation type information of the seed sentence pattern and the accumulated sentence pattern based on the operation type information in the semantic key information of the seed sentence pattern and any accumulated sentence pattern; determining the similarity of the business class information of the seed sentence pattern and the accumulated sentence pattern based on the business class information in the semantic key information of the seed sentence pattern and the accumulated sentence pattern; and determining semantic key information similarity of the seed sentence pattern and the accumulated sentence pattern based on the operation class information similarity and the service class information similarity of the seed sentence pattern and the accumulated sentence pattern.

Specifically, the semantic key information includes operation class information and service class information, where the operation class information is the entities of operation type contained in the sentence pattern text, and the service class information is the entities of service type contained in the sentence pattern text. For example, in the semantic key information corresponding to "help me see my credit card bill", the operation class information includes "query", and the service class information includes "credit card" and "bill".

For different types of entities in two sentence patterns, respective similarity needs to be calculated: calculating the similarity of the operation type information of the seed sentence pattern and the operation type information of the accumulated sentence pattern to obtain the similarity of the operation type information; and calculating the similarity of the business class information of the seed sentence pattern and the business class information of the accumulated sentence pattern to obtain the similarity of the business class information. And then, integrating the operation class information similarity and the service class information similarity to determine the semantic key information similarity. For example, the lower similarity between the operation information similarity and the service information similarity may be used as the semantic key information similarity, or the semantic key information similarity may be obtained by performing weighted summation on the operation information similarity and the service information similarity according to a preset weight.
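
The two ways of integrating the similarities named in this paragraph (taking the lower value, or a weighted sum) can be compared with a minimal sketch. The function name `combine_similarities`, the `mode` switch and the default weight are illustrative assumptions.

```python
def combine_similarities(
    operation_sim: float,
    service_sim: float,
    mode: str = "min",
    operation_weight: float = 0.5,
) -> float:
    """Integrate operation class and service class similarity into one score."""
    if mode == "min":
        # Option 1: the lower of the two similarities.
        return min(operation_sim, service_sim)
    if mode == "weighted":
        # Option 2: weighted sum with a preset weight.
        return operation_weight * operation_sim + (1.0 - operation_weight) * service_sim
    raise ValueError(f"unknown mode: {mode!r}")


strict = combine_similarities(1.0, 0.5, mode="min")
blended = combine_similarities(1.0, 0.5, mode="weighted", operation_weight=0.4)
```

Taking the minimum is the stricter choice, since a mismatch in either kind of entity pulls the score down; the weighted sum lets one kind of entity partly compensate for the other.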

The method provided by the embodiment of the invention obtains more accurate semantic key information similarity by respectively calculating the operation type information similarity and the service type information similarity so as to improve the accuracy of accumulated sentence pattern screening.

Based on any of the above embodiments, in step 1212, the calculation of the semantic key information similarity may be implemented by the following method:

suppose that the seed sentence pattern is q ₁ Any cumulative sentence pattern is q ₂ Seed sentence pattern q ₁ And cumulative sentence pattern q ₂ Respectively K of semantic key information ₁ ＝{K ₁₁ ,K ₁₂ And K ₂ ＝{K ₂₁ ,K ₂₂ In which K is ₁₁ And K ₁₂ Are respectively seed sentence pattern q ₁ Operation class information and service class information of, K ₂₁ And K ₂₂ Respectively, is an accumulated sentence pattern q ₂ Operation class information and service class information.

Based on the operation class information $K_{11}$ and $K_{21}$ of the seed sentence pattern $q_1$ and the accumulated sentence pattern $q_2$, the operation class information similarity $M_1$ is obtained as shown in the following formula:

$$M_1 = \frac{\mathrm{cover}(K_{11}, K_{21})}{\mathrm{count}(K_{11}, K_{21})}$$

where $\mathrm{cover}(K_{11}, K_{21})$ represents the number of entities that coincide in the operation class information of $q_1$ and $q_2$, and $\mathrm{count}(K_{11}, K_{21})$ represents the total number of entities in the operation class information of $q_1$ and $q_2$.

Based on the service class information $K_{12}$ and $K_{22}$ of the seed sentence pattern $q_1$ and the accumulated sentence pattern $q_2$, the service class information similarity $M_2$ is obtained as shown in the following formula:

$$M_2 = \frac{\mathrm{cover}(K_{12}, K_{22})}{\mathrm{count}(K_{12}, K_{22})}$$

where $\mathrm{cover}(K_{12}, K_{22})$ represents the number of entities that coincide in the service class information of $q_1$ and $q_2$, and $\mathrm{count}(K_{12}, K_{22})$ represents the total number of entities in the service class information of $q_1$ and $q_2$.

After the operation class information similarity $M_1$ and the service class information similarity $M_2$ are obtained, $M_1$ and $M_2$ can be weighted and summed to obtain the semantic key information similarity $M(q_1, q_2)$, as shown in the following formula:

$$M(q_1, q_2) = w \cdot M_1 + (1 - w) \cdot M_2$$

where $w$ is the weight corresponding to $M_1$ and $(1 - w)$ is the weight corresponding to $M_2$.
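The calculation above can be sketched in a few lines. This is a hedged illustration rather than the embodiment's implementation: the function names are hypothetical, entities are represented as plain string sets, and the "total number of entities" is assumed here to mean the number of distinct entities across both sentence patterns (the text does not say whether duplicates are counted once or twice).

```python
from typing import Set, Tuple

KeyInfo = Tuple[Set[str], Set[str]]  # (operation class entities, service class entities)


def overlap_ratio(a: Set[str], b: Set[str]) -> float:
    """Coinciding entities divided by total entities.

    Assumption: the total is taken as the number of distinct entities in a and b together.
    """
    total = len(a | b)
    if total == 0:
        return 0.0  # no entities on either side; treated as no similarity
    return len(a & b) / total


def key_info_similarity(k1: KeyInfo, k2: KeyInfo, w: float = 0.5) -> float:
    """M(q1, q2) = w * M1 + (1 - w) * M2, with a hypothetical default weight."""
    m1 = overlap_ratio(k1[0], k2[0])  # operation class information similarity
    m2 = overlap_ratio(k1[1], k2[1])  # service class information similarity
    return w * m1 + (1.0 - w) * m2


# Illustrative entity sets for a seed and an accumulated sentence pattern.
seed_info = ({"query"}, {"credit card", "bill"})
accumulated_info = ({"query"}, {"bank card", "bill"})
print(key_info_similarity(seed_info, accumulated_info, w=0.5))
```

Under these assumptions the operation class entities coincide fully while only one of three distinct service class entities coincides, so the weight `w` decides how strongly the differing service noun lowers the score.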

Based on any of the above embodiments, step 1213 specifically includes: if the semantic feature vector similarity of the seed sentence pattern and any accumulated sentence pattern is within the preset vector similarity interval and the semantic key information similarity of the seed sentence pattern and the accumulated sentence pattern is greater than or equal to the preset information similarity threshold, the accumulated sentence pattern is used as a candidate sentence pattern.

Specifically, the preset vector similarity interval is a threshold interval set for the semantic feature vector similarity. If the semantic feature vector similarity is greater than the upper limit of the interval, the accumulated sentence pattern is too similar to the seed sentence pattern, and using it to expand the knowledge point does little to improve the quality of the knowledge base. If the semantic feature vector similarity is less than the lower limit of the interval, the semantics of the seed sentence pattern and the accumulated sentence pattern are basically irrelevant, and the accumulated sentence pattern obviously does not fall within the service range of the knowledge point. Only when the semantic feature vector similarity is within the preset vector similarity interval, that is, less than or equal to the upper limit and greater than or equal to the lower limit of the interval, does the accumulated sentence pattern both fall within the service range of the knowledge point and bring a good improvement in the quality of the knowledge base.

The preset information similarity threshold is a threshold set for the semantic key information similarity. If the semantic key information similarity is greater than or equal to the preset information similarity threshold, the accumulated sentence pattern is semantically similar to the seed sentence pattern; otherwise, the accumulated sentence pattern is determined to be basically irrelevant to the semantics of the seed sentence pattern.

Combining the semantic feature vector similarity with the semantic key information similarity, the accumulated sentence pattern is determined to be a candidate sentence pattern when the semantic feature vector similarity is within the preset vector similarity interval and the semantic key information similarity is greater than or equal to the preset information similarity threshold.
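The combined screening rule can be illustrated as below. This is a sketch only: the `Scored` record, the function name `select_candidates` and every numeric value are assumptions made for illustration, and the two similarities are taken as already computed.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Scored(NamedTuple):
    text: str            # accumulated sentence pattern
    vector_sim: float    # semantic feature vector similarity to the seed
    key_info_sim: float  # semantic key information similarity to the seed


def select_candidates(scored: Sequence[Scored],
                      interval: Tuple[float, float],
                      info_threshold: float) -> List[str]:
    """Keep patterns whose vector similarity lies inside the closed interval
    and whose key information similarity reaches the threshold."""
    low, high = interval
    return [s.text for s in scored
            if low <= s.vector_sim <= high and s.key_info_sim >= info_threshold]


# Hypothetical scores: "a" is too similar, "c" lacks key information overlap,
# "d" is semantically unrelated; only "b" passes both conditions.
pool = [Scored("a", 0.99, 0.9), Scored("b", 0.70, 0.8),
        Scored("c", 0.70, 0.2), Scored("d", 0.10, 0.9)]
print(select_candidates(pool, interval=(0.5, 0.9), info_threshold=0.5))
```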

It should be noted that the preset vector similarity interval and the preset information similarity threshold in the embodiment of the present invention may be adjusted according to the number of candidate sentence patterns that need to be acquired. For example, if some knowledge points remain unexpanded after one round of knowledge base expansion, the preset vector similarity interval corresponding to those knowledge points may be widened and the preset information similarity threshold corresponding to them may be lowered, so as to increase the number of candidate sentence patterns corresponding to those knowledge points.

By screening the accumulated sentence patterns through the semantic feature vector similarity and the semantic key information similarity, the method provided by the embodiment of the invention effectively ensures both the relevance between the seed sentence pattern and the candidate sentence patterns subsequently used to expand the knowledge base and the diversity of expression of the candidate sentence patterns, thereby improving the expansion quality of the knowledge base.

Generally, sentence patterns are accumulated by field. A single field usually includes multiple vendors whose specific services differ, so some sentence patterns share the same essential intent but differ in their specific service nouns. If such sentence patterns are applied directly to the expansion of the knowledge base, they may become invalid sentence patterns in the knowledge base. Based on any of the above embodiments, fig. 4 is a schematic flow chart of a service knowledge graph path-based knowledge point expansion method provided by the embodiment of the present invention. As shown in fig. 4, step 122 specifically includes:

Step 1221, determining the path similarity between the paths of the seed sentence pattern and any candidate sentence pattern in the service knowledge graph of the corresponding field.

Specifically, the path similarity measures the similarity between the path of the seed sentence pattern in the service knowledge graph of the corresponding field and the path of any candidate sentence pattern in the same service knowledge graph. The higher the path similarity, the more similar the key information entities and relations of the seed sentence pattern and the candidate sentence pattern, and the higher the probability that the candidate sentence pattern and the seed sentence pattern correspond to the same knowledge point.

Step 1222, if the path similarity is greater than the preset path similarity threshold and the operation class information in the seed sentence pattern is the same as that in the candidate sentence pattern, replacing the service class information in the candidate sentence pattern with the service class information in the seed sentence pattern.

Here, the preset path similarity threshold is a threshold preset for the path similarity; if the path similarity is greater than the preset path similarity threshold, the two paths are determined to be highly similar. In that case, if the operation class information in the seed sentence pattern is the same as that in the candidate sentence pattern while their service class information differs, the service class information in the candidate sentence pattern can be directly replaced with the service class information in the seed sentence pattern. This avoids the invalid multiplexing that would result from directly adding to the knowledge base a candidate sentence pattern whose service class information does not belong to the service range corresponding to the current knowledge base.

Step 1223, adding the candidate sentence pattern after replacement to the knowledge point.

Specifically, after the service information is replaced, the candidate sentence pattern is added to the knowledge point corresponding to the seed sentence pattern, and the multiplexing of the candidate sentence pattern can be completed, so as to expand the knowledge base.

The method provided by the embodiment of the invention ensures that the service class information of the candidate sentence patterns added to the knowledge base belongs to the service range corresponding to the current knowledge base through the service class information replacement, thereby avoiding the problem of invalid multiplexing and increasing the richness of the knowledge base.
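Steps 1222 and 1223 can be sketched as follows. This is an illustrative simplification, not the embodiment's implementation: the function name `adapt_and_add`, the use of plain string replacement, and the threshold value are assumptions, and the path similarity is taken as already computed.

```python
from typing import List, Set


def adapt_and_add(knowledge_point: List[str],
                  candidate: str,
                  seed_ops: Set[str], cand_ops: Set[str],
                  seed_service: str, cand_service: str,
                  path_sim: float, path_threshold: float) -> bool:
    """Replace the candidate's service term with the seed's and add the result
    to the knowledge point, if the conditions of step 1222 hold.

    Returns True when a sentence pattern was added.
    """
    # Step 1222: the paths must be similar enough and the operation class
    # information must be identical.
    if path_sim <= path_threshold or seed_ops != cand_ops:
        return False
    replaced = candidate.replace(cand_service, seed_service)
    # Step 1223: add the replaced candidate to the knowledge point.
    knowledge_point.append(replaced)
    return True


# Hypothetical knowledge point holding extended questions for one standard question.
kp: List[str] = ["I want to look up my credit card's consumption record"]
adapt_and_add(kp, "help me see the bill of my bank card",
              {"query"}, {"query"}, "credit card", "bank card",
              path_sim=0.8, path_threshold=0.6)
print(kp[-1])
```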

Based on any of the above embodiments, step 1221 specifically includes: determining the path of the seed sentence pattern in the business knowledge graph of the corresponding field based on the business knowledge graph of the corresponding field and the semantic key information of the seed sentence pattern; determining the path of the candidate sentence pattern in the business knowledge graph of the corresponding field based on the business knowledge graph of the corresponding field and the semantic key information of any candidate sentence pattern; and determining the path similarity based on the seed sentence pattern and the path of the candidate sentence pattern in the service knowledge graph of the corresponding field.

Specifically, the semantic key information of the seed sentence pattern and the semantic key information of the candidate sentence pattern can be obtained by performing entity recognition on the sentence pattern text. On the basis of the known service knowledge graph of the corresponding field, each key information entity in the semantic key information can be correspondingly matched with each node in the service knowledge graph, and then the path of the sentence pattern in the service knowledge graph is obtained.

For example, the seed sentence pattern $q_1$ is "I want to look up my credit card's consumption record" and the candidate sentence pattern $q_2$ is "help me see the bill of my bank card". The semantic key information of $q_1$ is $K_1$ = {query, credit card, bill}, and the semantic key information of $q_2$ is $K_2$ = {query, bank card, bill}. By combining these with the service knowledge graph of the corresponding field, the paths of the seed sentence pattern $q_1$ and the candidate sentence pattern $q_2$ in the service knowledge graph can be obtained. FIG. 5 is a schematic diagram of service knowledge graph paths according to an embodiment of the present invention. In FIG. 5, one dotted line corresponds to the path of $q_1$ and the other dotted line corresponds to the path of $q_2$. As FIG. 5 shows, the paths of $q_1$ and $q_2$ differ only in the specific service ($q_1$ corresponds to a credit card and $q_2$ to a bank card), and the true intent of both $q_1$ and $q_2$ is essentially to query the bill.

When the path similarity of $q_1$ and $q_2$ is greater than the preset path similarity threshold and the operation class information of $q_1$ and $q_2$ is the same, "bank card" in the sentence pattern $q_2$ may be replaced with "credit card", and the replaced candidate sentence pattern $q_2$ is "help me see the bill of my credit card". In this way, the selected sentence patterns are output with the corresponding words already replaced, which greatly reduces inaccurate judgments caused by later manual confirmation. In the finally selected sentence patterns, keyword A is replaced with keyword B determined from the input; for example, "help me see the bill of my bank card" is replaced with "help me see the bill of my credit card". This avoids the invalid multiplexing that would result from directly adding to the knowledge base candidate sentence patterns whose service class information does not belong to the service range corresponding to the current knowledge base.
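The text does not specify how a path is extracted or how two paths are compared, so the following is only one possible sketch under loudly stated assumptions: the toy graph, the greedy walk from the operation entity, and the position-wise match ratio used as path similarity are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy service knowledge graph: node -> neighbouring nodes.
TOY_GRAPH: Dict[str, List[str]] = {
    "query": ["bill"],
    "bill": ["credit card", "bank card"],
}


def sentence_path(graph: Dict[str, List[str]], start: str,
                  key_entities: Set[str]) -> List[str]:
    """Walk from the start node, following edges to key entities of the sentence."""
    path, visited, current = [start], {start}, start
    while True:
        nxt = next((n for n in graph.get(current, [])
                    if n in key_entities and n not in visited), None)
        if nxt is None:
            return path
        path.append(nxt)
        visited.add(nxt)
        current = nxt


def path_similarity(p1: List[str], p2: List[str]) -> float:
    """Assumed measure: share of positions holding the same node."""
    longest = max(len(p1), len(p2))
    if longest == 0:
        return 0.0
    return sum(a == b for a, b in zip(p1, p2)) / longest


path_q1 = sentence_path(TOY_GRAPH, "query", {"query", "credit card", "bill"})
path_q2 = sentence_path(TOY_GRAPH, "query", {"query", "bank card", "bill"})
print(path_q1, path_q2, path_similarity(path_q1, path_q2))
```

With this toy graph the two paths share the "query" and "bill" nodes and differ only in the final service node, which mirrors the situation described for the credit card and bank card sentence patterns.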

Based on any of the above embodiments, fig. 6 is a schematic flowchart of a knowledge base expansion method according to another embodiment of the present invention, as shown in fig. 6, the method includes:

Step 610, determining a seed sentence pattern and accumulated sentence patterns:

For any knowledge base, the knowledge points that need to be expanded in the knowledge base are determined, any sentence pattern among the standard question and the extended questions of a knowledge point is taken as the seed sentence pattern, and the question sentence patterns accumulated in advance in the field corresponding to the knowledge base are taken as the accumulated sentence patterns.

Step 620, calculation and measurement of semantic feature vectors:

Based on semantic metric models such as LSTM and BERT, the semantic feature vectors of the seed sentence pattern and any accumulated sentence pattern are determined, and the similarity between the two semantic feature vectors is calculated as the semantic feature vector similarity.

If the semantic feature vector similarity of the seed sentence pattern and any accumulated sentence pattern is within the preset vector similarity interval, the accumulated sentence pattern is retained; otherwise, the accumulated sentence pattern is deleted.
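This passage does not name the vector similarity measure. As a hedged illustration, the sketch below assumes cosine similarity, a common choice for comparing sentence embeddings; the function name and the example vectors are hypothetical, and in practice the vectors would come from a model such as LSTM or BERT.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction; treated as dissimilar
    return dot / (norm_u * norm_v)


# Made-up two-dimensional vectors standing in for sentence embeddings.
seed_vec = [1.0, 1.0]
accumulated_vec = [1.0, 0.0]
sim = cosine_similarity(seed_vec, accumulated_vec)
low, high = 0.5, 0.9  # hypothetical preset vector similarity interval
print(sim, low <= sim <= high)
```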

Step 630, calculation and measurement of semantic key information:

Based on entity extraction models such as BERT-ATT and BERT-CRF, the semantic key information of the seed sentence pattern and any accumulated sentence pattern is determined, and the similarity between the semantic key information of the two sentence patterns is calculated as the semantic key information similarity.

If the semantic key information similarity between the seed sentence pattern and any accumulated sentence pattern is greater than or equal to the preset information similarity threshold, the accumulated sentence pattern is taken as a candidate sentence pattern; otherwise, the accumulated sentence pattern is deleted.

Through the two rounds of screening in steps 620 and 630, the semantic relevance and the diversity of expression of the candidate sentence patterns are effectively guaranteed.

Step 640, replacing the service class information based on the path:

Based on the service knowledge graph of the corresponding field and the semantic key information of the seed sentence pattern and any candidate sentence pattern, the paths of the seed sentence pattern and the candidate sentence pattern in the service knowledge graph of the corresponding field are established respectively.

The path similarity of the two paths is then calculated. If the path similarity is greater than the preset path similarity threshold and the operation class information in the seed sentence pattern is the same as that in the candidate sentence pattern, the service class information in the candidate sentence pattern is replaced with the service class information in the seed sentence pattern, thereby avoiding the invalid multiplexing that would result from directly adding to the knowledge base a candidate sentence pattern whose service class information does not belong to the service range corresponding to the current knowledge base.

Step 650, knowledge point expansion:

The candidate sentence pattern for which the service class information replacement has been completed is added to the knowledge point, thereby completing the multiplexing of the candidate sentence pattern and realizing the expansion of the knowledge base.
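Steps 610 to 650 can be tied together as in the sketch below. It is a hedged outline, not the embodiment's implementation: every similarity and extraction function is passed in as a callable because the models themselves are outside the scope of this sketch, all names are hypothetical, candidates failing the step 640 conditions are assumed to be skipped, and replacement is only attempted when exactly one service entity differs on each side.

```python
from typing import Callable, List, Sequence, Set, Tuple

KeyInfo = Tuple[Set[str], Set[str]]  # (operation class entities, service class entities)


def expand_knowledge_point(
    seed: str,
    accumulated: Sequence[str],
    vector_sim: Callable[[str, str], float],
    key_info: Callable[[str], KeyInfo],
    key_info_sim: Callable[[KeyInfo, KeyInfo], float],
    path_sim: Callable[[str, str], float],
    vec_interval: Tuple[float, float],
    info_threshold: float,
    path_threshold: float,
) -> List[str]:
    """Return the sentence patterns to add to the seed's knowledge point."""
    low, high = vec_interval
    seed_info = key_info(seed)
    seed_ops, seed_svc = seed_info
    additions: List[str] = []
    for sentence in accumulated:
        # Step 620: semantic feature vector screening.
        if not (low <= vector_sim(seed, sentence) <= high):
            continue
        # Step 630: semantic key information screening.
        info = key_info(sentence)
        if key_info_sim(seed_info, info) < info_threshold:
            continue
        # Step 640: path check and service class information replacement.
        ops, svc = info
        if path_sim(seed, sentence) <= path_threshold or ops != seed_ops:
            continue  # assumption: such candidates are not reused
        old_terms, new_terms = svc - seed_svc, seed_svc - svc
        if len(old_terms) == 1 and len(new_terms) == 1:
            sentence = sentence.replace(next(iter(old_terms)), next(iter(new_terms)))
        elif old_terms or new_terms:
            continue  # ambiguous mapping between service entities; skipped in this sketch
        # Step 650: knowledge point expansion.
        additions.append(sentence)
    return additions
```

A caller would supply real models for `vector_sim`, `key_info`, `key_info_sim` and `path_sim`; for a quick trial they can be replaced by stubs that return fixed values.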

Based on any of the above embodiments, fig. 7 is a schematic structural diagram of a knowledge base expansion apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes a sentence pattern determination unit 710 and a sentence pattern expansion unit 720;

the sentence pattern determining unit 710 is configured to determine a seed sentence pattern corresponding to any knowledge point in a knowledge base, and a plurality of accumulated sentence patterns in a field corresponding to the knowledge base;

the sentence pattern expansion unit 720 is configured to expand the any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern, and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

The device provided by the embodiment of the invention automatically expands the knowledge base based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or their paths in the service knowledge graph of the corresponding field. This effectively saves labor cost and time cost, makes it possible to unbind sentence patterns with different intentions, avoids ambiguity between standard questions, and improves the expansion quality and the expansion effect.

Based on any of the above embodiments, the sentence expansion unit 720 includes:

the semantic screening unit is used for selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the seed sentence patterns and the semantic information of each accumulated sentence pattern;

and the path expansion unit is used for expanding any knowledge point based on the paths of the seed sentence patterns and each candidate sentence pattern in the service knowledge graph of the corresponding field and the service class information of the seed sentence patterns and each candidate sentence pattern.

Based on any embodiment above, the semantic filtering unit includes:

the vector similarity determining subunit is used for determining the similarity of the semantic feature vectors of the seed sentence pattern and any accumulated sentence pattern based on the semantic feature vectors in the semantic information of the seed sentence pattern and the any accumulated sentence pattern;

and/or, an information similarity determining subunit, configured to determine, based on semantic key information in the semantic information of the seed sentence pattern and the any accumulated sentence pattern, a semantic key information similarity between the seed sentence pattern and the any accumulated sentence pattern;

and the screening subunit is used for selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the similarity of the semantic feature vectors and/or the similarity of the semantic key information of the seed sentence patterns and each accumulated sentence pattern.

Based on any of the above embodiments, the information similarity determining subunit includes:

the operation class similarity module is used for determining the operation class information similarity of the seed sentence pattern and any accumulated sentence pattern based on the operation class information in the semantic key information of the seed sentence pattern and the any accumulated sentence pattern;

the business class similarity module is used for determining the business class information similarity of the seed sentence pattern and any accumulated sentence pattern based on the business class information in the semantic key information of the seed sentence pattern and any accumulated sentence pattern;

and the semantic similarity module is used for determining the semantic key information similarity of the seed sentence pattern and any accumulated sentence pattern based on the operation class information similarity and the service class information similarity of the seed sentence pattern and any accumulated sentence pattern.

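Based on any of the above embodiments, the screening subunit is specifically configured to:

if the semantic feature vector similarity of the seed sentence pattern and any accumulated sentence pattern is within the preset vector similarity interval and the semantic key information similarity of the seed sentence pattern and the accumulated sentence pattern is greater than or equal to the preset information similarity threshold, take the accumulated sentence pattern as a candidate sentence pattern.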

Based on any one of the above embodiments, the path expansion unit includes:

a path similarity determining subunit, configured to determine a path similarity between the seed sentence pattern and a path of any candidate sentence pattern in the service knowledge graph in the corresponding field;

a replacing subunit, configured to replace, if the path similarity is greater than a preset path similarity threshold and the seed sentence pattern is the same as the operation type information in any candidate sentence pattern, the service type information in any candidate sentence pattern with the service type information in the seed sentence pattern;

and the expansion subunit is used for adding the any candidate sentence pattern after replacement to the any knowledge point.

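Based on any of the embodiments described above, the path similarity determining subunit is specifically configured to:

determining a path of the seed sentence pattern in the service knowledge graph of the corresponding field based on the service knowledge graph of the corresponding field and the semantic key information of the seed sentence pattern;

determining a path of any candidate sentence pattern in the service knowledge graph of the corresponding field based on the service knowledge graph of the corresponding field and the semantic key information of any candidate sentence pattern;

and determining the path similarity based on the paths of the seed sentence pattern and the candidate sentence pattern in the service knowledge graph of the corresponding field.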

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, where the processor 810, the communications interface 820 and the memory 830 communicate with one another via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the following method: determining a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in the corresponding field of the knowledge base; and expanding any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

In addition, when implemented in the form of software functional units and sold or used as independent products, the logic instructions in the memory 830 may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the method provided in the foregoing embodiments, the method including: determining a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in the corresponding field of the knowledge base; and expanding any knowledge point based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and/or the path of the seed sentence pattern and each accumulated sentence pattern in the service knowledge graph of the corresponding field.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of expanding a knowledge base, comprising:

determining a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in the corresponding field of the knowledge base;

and screening the accumulated sentence patterns for realizing the expansion of the corresponding knowledge points from all the accumulated sentence patterns based on the semantic information of the seed sentence pattern and each accumulated sentence pattern and the similarity between the seed sentence pattern and the path of each accumulated sentence pattern in the service knowledge graph of the corresponding field, and expanding any knowledge point, wherein the path reflects the key information entity and the relation of the sentence pattern in the service knowledge graph.

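2. The method for expanding a knowledge base according to claim 1, wherein the expanding any knowledge point based on the semantic information of the seed sentence pattern and each of the accumulated sentence patterns and the path of the seed sentence pattern and each of the accumulated sentence patterns in the service knowledge graph of the corresponding domain specifically comprises:

selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the seed sentence pattern and the semantic information of each accumulated sentence pattern;

and expanding any knowledge point based on the path of the seed sentence pattern and each candidate sentence pattern in the service knowledge graph of the corresponding domain and the service class information of the seed sentence pattern and each candidate sentence pattern.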

3. The method for expanding a knowledge base according to claim 2, wherein the selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the seed sentence pattern and the semantic information of each accumulated sentence pattern comprises:

determining the similarity of the semantic feature vectors of the seed sentence pattern and any accumulated sentence pattern based on the semantic feature vectors in the semantic information of the seed sentence pattern and the any accumulated sentence pattern;

and/or determining the similarity of the semantic key information of the seed sentence pattern and any accumulated sentence pattern based on the semantic key information in the semantic information of the seed sentence pattern and any accumulated sentence pattern, wherein the semantic key information is used for representing key information entities contained in the sentence pattern text;

4. The method for expanding a knowledge base according to claim 3, wherein the determining the semantic key information similarity between the seed sentence pattern and the any accumulated sentence pattern based on the semantic key information in the semantic information between the seed sentence pattern and the any accumulated sentence pattern specifically includes:

determining the similarity of the operation type information of the seed sentence pattern and any accumulated sentence pattern based on the operation type information in the semantic key information of the seed sentence pattern and the any accumulated sentence pattern, wherein the operation type information is an entity of an operation type contained in a sentence pattern text;

determining the similarity of the business class information of the seed sentence pattern and any accumulated sentence pattern based on the business class information in the semantic key information of the seed sentence pattern and the any accumulated sentence pattern;

and determining the semantic key information similarity of the seed sentence pattern and any accumulated sentence pattern based on the operation type information similarity and the service type information similarity of the seed sentence pattern and any accumulated sentence pattern.

5. The method for expanding a knowledge base according to claim 3, wherein the selecting a plurality of candidate sentence patterns from the plurality of accumulated sentence patterns based on the similarity of the semantic feature vector and the similarity of the semantic key information of the seed sentence pattern and each accumulated sentence pattern comprises:

6. The method of claim 2, wherein the expanding any knowledge point based on the path of the seed sentence pattern and each candidate sentence pattern in the service knowledge graph of the corresponding domain and the service class information of the seed sentence pattern and each candidate sentence pattern comprises:

determining the path similarity between the paths of the seed sentence pattern and any candidate sentence pattern in the service knowledge graph of the corresponding field;

if the path similarity is greater than a preset path similarity threshold value and the operation type information in the seed sentence pattern is the same as the operation type information in any candidate sentence pattern, replacing the service type information in any candidate sentence pattern with the service type information in the seed sentence pattern, wherein the operation type information is an entity of an operation type contained in a sentence pattern text;

and adding the any candidate sentence pattern after replacement to the any knowledge point.

7. The method for expanding a knowledge base according to claim 6, wherein the determining the path similarity between the seed sentence pattern and the path of any candidate sentence pattern in the business knowledge graph of the corresponding domain specifically comprises:

8. A knowledge base expansion apparatus, comprising:

a sentence pattern determining unit, configured to determine a seed sentence pattern corresponding to any knowledge point in a knowledge base and a plurality of accumulated sentence patterns in a field corresponding to the knowledge base;

and the sentence pattern expansion unit is used for screening the accumulated sentence patterns for realizing the expansion of the corresponding knowledge points from all the accumulated sentence patterns based on the semantic information of the seed sentence patterns and each accumulated sentence pattern and the similarity between the paths of the seed sentence patterns and each accumulated sentence pattern in the service knowledge graph of the corresponding field, and expanding any knowledge point, wherein the paths reflect the key information entities and the relations of the sentence patterns in the service knowledge graph.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the knowledge base expansion method according to any one of claims 1 to 7 are implemented by the processor when executing the program.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the knowledge base expansion method according to any one of claims 1 to 7.