Disclosure of Invention
The embodiments of the invention provide a semantic recognition method and apparatus combining knowledge-graph entity information, and related devices, aiming to solve the problem of low semantic recognition accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a semantic identification method in combination with knowledge-graph entity information, which includes:
acquiring a training corpus, constructing a first document matrix according to a preset matrix construction rule, identifying entities in the first document matrix by using an entity naming technology, and screening out training entities from the entities according to an entity screening rule;
calculating the weight between each training entity in the first document matrix, and constructing a graph neural network according to the weight;
calculating a graph hidden state vector of each training entity based on the graph neural network;
calculating an adjacent semantic vector of each word in the training corpus based on the first document matrix;
acquiring a word embedding vector and a position vector of each word in the training corpus, and combining the graph hidden state vector, the adjacent semantic vector, the word embedding vector and the position vector to obtain an input vector of the training corpus;
predicting the input vector through an encoder of a Transformer model to obtain a prediction probability of the semantics to which the input vector belongs;
calculating the probability loss between the prediction probability and the true semantics of the input vector according to a preset loss function, and optimizing the model parameters of the encoder according to the probability loss to obtain a semantic recognition model;
acquiring a target input vector corresponding to a target identification corpus, and performing semantic identification through the semantic identification model to obtain a target probability of the target identification corpus;
and determining a semantic recognition result of the target recognition corpus according to the target probability.
In a second aspect, an embodiment of the present invention provides a semantic recognition apparatus combining knowledge-graph entity information, which includes:
the screening module is used for acquiring the training corpus, constructing a first document matrix according to a preset matrix construction rule, identifying entities in the first document matrix by using an entity naming technology, and screening out training entities from the entities according to an entity screening rule;
the building module is used for calculating the weight between each training entity in the first document matrix and building a graph neural network according to the weight;
the graph hidden state vector calculation module is used for calculating a graph hidden state vector of each training entity based on the graph neural network;
an adjacent semantic vector calculation module, configured to calculate an adjacent semantic vector for each word in the training corpus based on the first document matrix;
an input vector construction module, configured to obtain a word embedding vector and a position vector of each word in the corpus, and combine the hidden state vector, the adjacent semantic vector, the word embedding vector, and the position vector to obtain an input vector of the corpus;
the prediction module is used for predicting the input vector through an encoder of a Transformer model to obtain the prediction probability of the semantics to which the input vector belongs;
the parameter optimization module is used for calculating the probability loss between the prediction probability and the true semantics of the input vector according to a preset loss function, and optimizing the model parameters of the encoder according to the probability loss to obtain a semantic identification model;
and the semantic recognition module is used for acquiring a target input vector corresponding to the target recognition corpus, and performing semantic recognition through the semantic recognition model to obtain a semantic recognition result.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the semantic recognition method according to the first aspect is implemented.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the semantic recognition method according to the first aspect.
The embodiments of the invention provide a semantic recognition method and apparatus combining knowledge-graph entity information, and related devices. The method comprises the steps of obtaining a training corpus, constructing a first document matrix according to a preset matrix construction rule, and identifying entities in the first document matrix by using an entity naming technique; screening out training entities from the entities according to an entity screening rule, calculating the weight between each pair of training entities in the first document matrix, and constructing a graph neural network according to the weights; calculating a graph hidden state vector of each training entity based on the graph neural network; calculating an adjacent semantic vector of each word in the training corpus based on the first document matrix; acquiring a word embedding vector and a position vector of each word in the training corpus, and combining the graph hidden state vector, the adjacent semantic vector, the word embedding vector, and the position vector to obtain an input vector of the training corpus; predicting the input vector through an encoder of a Transformer model to obtain the prediction probability of the semantics to which the input vector belongs; calculating the probability loss between the prediction probability and the true semantics of the input vector according to a preset loss function, and optimizing the model parameters of the encoder according to the probability loss to obtain a semantic recognition model; acquiring a target input vector corresponding to a target recognition corpus, and performing semantic recognition through the semantic recognition model to obtain the target probability of the target recognition corpus; and determining the semantic recognition result of the target recognition corpus according to the target probability.
The method considers the semantic relation of each entity in the document, and adds the adjacent semantic vector and the hidden state vector of the entity when the entity is used for training the semantic recognition model, so that the entity characteristics obtained by the semantic recognition model are more comprehensive, and the accuracy of the semantic recognition model for semantic recognition is improved.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to fig. 1, which is a flowchart illustrating a semantic recognition method combining knowledge-graph entity information according to an embodiment of the present invention, the method includes steps S110 to S180.
Step S110, acquiring a training corpus, constructing a first document matrix according to a preset matrix construction rule, identifying entities in the first document matrix by using an entity naming technology, and screening out training entities from the entities according to an entity screening rule;
In this embodiment, a training corpus is obtained, a first document matrix is constructed according to a preset matrix construction rule, entities in the first document matrix are identified by using an entity naming technique, and training entities are screened out from the entities according to an entity screening rule. The training corpus may be dialogue corpora between various customer service agents and clients. The entities are nouns in the training corpus. An entity may consist of one word or of a plurality of words, and the training entities are screened out from the entities according to the entity screening rule. An example of an entity composed of a plurality of words is Shanghai Pudong Airport, which consists of the three words Shanghai, Pudong, and airport.
For example, as shown in FIG. 6, a word segmentation tool is used to segment the i-th sentence Xi in the corpus document, and the words in the sentence are sequentially identified as Xi1, Xi2, Xi3, and so on. Entity recognition is then performed on the first document matrix using entity naming techniques, and each entity is given a number, e.g., E1, E2, E3. An entity may consist of at least 2 words, in which case those words are jointly marked as one entity; for example, Xi4 and Xi5 may together constitute a single entity and be marked as such.
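As an illustrative, non-limiting sketch of this marking step (the helper name, token list, and entity phrase list below are assumptions for illustration, not part of the claimed method), consecutive segmented words that form a known multi-word entity can be merged under one entity number:

```python
# Illustrative sketch: label segmented words with entity numbers, merging a
# known multi-word entity (e.g. "Shanghai Pudong airport") into one entity.
def mark_entities(tokens, entity_phrases):
    """Return (word, entity_id) pairs; consecutive words forming a known
    phrase share one entity id, mirroring how several words are marked as
    a single entity E1."""
    marks = [None] * len(tokens)
    next_id = 1
    i = 0
    while i < len(tokens):
        matched = False
        for phrase in entity_phrases:
            n = len(phrase)
            if tokens[i:i + n] == phrase:
                eid = f"E{next_id}"
                for k in range(i, i + n):
                    marks[k] = eid  # all words of the phrase share one entity id
                next_id += 1
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, marks))

pairs = mark_entities(
    ["I", "fly", "from", "Shanghai", "Pudong", "airport"],
    [["Shanghai", "Pudong", "airport"]],
)
```

Here the three words of the airport name receive the same entity number, while non-entity words are left unmarked.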
In one embodiment, as shown in fig. 2, step S110 includes:
s111, clustering all entities of the training corpus to obtain entity clusters of different categories;
step S112, constructing entity adjacency matrixes by all entities in each entity cluster, and determining the contribution value of each entity relative to each entity adjacency matrix according to the entity adjacency matrixes;
s113, performing descending order arrangement on all entities according to the contribution values, and dividing according to the number of preset entities to obtain a training entity queue;
and S114, randomly masking the entities in the training entity queue according to a preset masking rule, and taking the unmasked entities in the training entity queue as the training entities.
In this embodiment, in order to make the model learn the information contained in less common entities, the entity screening rule is specifically set as follows: all entities of the training corpus are clustered to obtain entity clusters of different categories; all the entities in each entity cluster form an entity adjacency matrix, and the contribution value of each entity relative to each entity adjacency matrix is determined according to the entity adjacency matrix; all the entities are arranged in descending order according to the contribution values and divided according to a preset number of entities to obtain training entity queues; and finally, the entities in the training entity queues are randomly masked according to a preset masking rule, and the unmasked entities in the training entity queues are taken as the training entities.
For example, let there be n entities in a certain entity cluster, and define the entity adjacency matrix composed of all the entities in the entity cluster as A, an n×n matrix: if entity Ej belongs to the entity semantic neighbors of entity Ei, then Aij = 1, otherwise Aij = 0. The contribution value of entity Ei to the entities in the entity adjacency matrix A is then defined as the sum of the elements in the i-th row of the matrix, i.e. ci = Σj Aij.
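The row-sum definition of the contribution value can be sketched as follows (an illustrative, non-limiting example; the matrix values are assumed):

```python
def contribution_values(A):
    # c_i = sum_j A[i][j]: the contribution value of entity Ei is the sum of
    # the elements in row i of the 0/1 entity adjacency matrix A.
    return [sum(row) for row in A]

# Three entities: E1 is a semantic neighbor of E2 and E3.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
c = contribution_values(A)  # [2, 1, 1]
```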
A number (self-specified) of equally sized intervals is divided according to the contribution values of all the entities to obtain training entity queues; the entities of the training entity queues are then randomly masked according to the masking rule, and the unmasked entities in the training entity queues are used for training. For example, given an entity adjacency matrix A of size 100×100, the entity contribution value queue is [c1, c2, ..., c100], which can be divided into 4 intervals [c1, ..., c25], [c26, ..., c50], [c51, ..., c75], and [c76, ..., c100]. The elements in each interval are arranged in descending order of their values to form queues Li (1 ≤ i ≤ 4). Each time, a portion of the entities is randomly selected from the queues Li for masking; within each queue, the smaller an entity's contribution value, the higher the proportion in which it is selected for masking, in order to force the model to learn the less common information that such entities contain.
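The queue division and masking described above can be sketched as follows (an illustrative, non-limiting example; the number of intervals, mask ratio, and random seed are assumptions, and a real implementation would apply a higher mask ratio to low-contribution queues):

```python
import random

def build_queues(contribs, n_intervals):
    """Sort entity indices by contribution value (descending) and split them
    into equally sized interval queues, as with the 4 intervals over c1..c100."""
    order = sorted(range(len(contribs)), key=lambda i: contribs[i], reverse=True)
    size = len(order) // n_intervals
    return [order[k * size:(k + 1) * size] for k in range(n_intervals)]

def mask_queue(queue, mask_ratio, rng):
    """Randomly mask a fraction of one queue; callers would pass a larger
    mask_ratio for low-contribution queues so rarer entities are masked more."""
    n_mask = int(len(queue) * mask_ratio)
    masked = set(rng.sample(queue, n_mask))
    kept = [e for e in queue if e not in masked]
    return kept, masked
```

For 100 entities with descending contributions, `build_queues` yields 4 queues of 25 entities each, and `mask_queue` splits a queue into unmasked training entities and masked entities.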
Step S120, calculating the weight between each training entity in the first document matrix, and constructing a graph neural network according to the weight;
In this embodiment, to construct the graph neural network, the weight between each pair of training entities in the first document matrix is calculated, and the graph neural network is then constructed according to the weights weightij between the training entities. The weight between each pair of training entities is calculated as follows: the spatial position distance between each pair of entities in the first document matrix is calculated according to a preset spatial position rule; the semantic relative position distance between each pair of entities in the first document matrix is calculated according to a preset semantic position rule; and the product of the spatial position distance and the semantic relative position distance is calculated to obtain the weight between the two entities.
In one embodiment, as shown in fig. 3, step S120 includes:
step S121, calculating a spatial position distance between each entity in the first document matrix according to a preset spatial position rule;
step S122, calculating semantic relative position distance between each entity in the first document matrix according to a preset semantic position rule;
and S123, calculating the product of the spatial position distance and the semantic relative position distance to obtain the weight between the two entities.
In this embodiment, the weight calculation between each training entity includes two parts, a spatial position distance and a semantic relative position distance.
The spatial position distance geo_disij is computed in two cases: (1) if the two entities are in the same paragraph, the spatial position distance between them is the number of words WordSpaceij separating them; (2) if the two entities are not in the same paragraph, the word number WordSpaceij separating the two entities is multiplied by a coefficient γ^m, where m is the paragraph spacing (for example, one entity in paragraph 1 and another in paragraph 2 gives a paragraph spacing m of 1) and the coefficient γ ∈ (0,1); i.e., geo_disij = WordSpaceij × γ^m.
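The two cases of the spatial position distance can be sketched as follows (illustrative only; the value of γ is an assumption):

```python
def geo_dis(word_space, para_i, para_j, gamma=0.5):
    """Spatial position distance between two entities.
    Same paragraph: the number of words separating them.
    Different paragraphs: the word count scaled by gamma**m,
    where m is the paragraph spacing and gamma is in (0, 1)."""
    m = abs(para_i - para_j)
    if m == 0:
        return word_space
    return word_space * (gamma ** m)
```

For instance, two entities 10 words apart in the same paragraph have distance 10, while the same word count across two intervening paragraphs is scaled by γ².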
Next, a reference knowledge graph, such as KnowledgeNet, is selected to obtain the categories and hierarchy levels to which the entities identified from the training corpus belong. The computation of the semantic relative position distance sem_disij includes three cases: (1) for entities belonging to the same category of the knowledge graph and at the same level, such as apples and pears, or Beijing and Shanghai, the semantic relative position distance is 1; (2) for entities belonging to the same category of the knowledge graph and having a top-down or bottom-up relationship, such as China, Guangdong Province, Shenzhen City, and Futian District, the semantic relative position distance from an entity to the entity n layers above or below it is λ^n; for example, with λ = 0.8, Futian District has a semantic relative position distance of 0.8 to Shenzhen City and of 0.64 to Guangdong Province; (3) for entities not belonging to the same category of the knowledge graph, such as apple and Beijing, let the farthest semantic relative position distance within any single category of the knowledge graph be λmax; the semantic relative position distance of entities in different categories can then be defined as a value smaller than λmax.
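The hierarchical case of the semantic relative position distance can be sketched as follows (illustrative only; λ = 0.8 follows the example above, and levels_apart = 0 covers same-category, same-level entities):

```python
def sem_dis(levels_apart, lam=0.8):
    """Semantic relative position distance for entities in the same category
    chain of the knowledge graph: lam**n for entities n layers apart
    (n = 0 gives 1, the same-level case)."""
    return lam ** levels_apart
```

With λ = 0.8, Futian District is one layer below Shenzhen City (distance 0.8) and two layers below Guangdong Province (distance 0.64), matching the example.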
Finally, the product of the spatial position distance and the semantic relative position distance is calculated to obtain the weight between the two entities, i.e., weightij = geo_disij × sem_disij.
Step S130, calculating a graph hidden state vector of each entity based on the graph neural network;
In this embodiment, a graph hidden state vector of each entity is calculated by the graph neural network. There are many types of graph neural networks, which are not limited herein; in this embodiment, a graph neural network constructed based on a graph convolutional network (GCN) is preferred.
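As one possible, non-limiting realization of such a network layer (the update rule H' = ReLU(Â·H·W) is the common GCN form and is an assumption here, since the embodiment only states that a GCN-based network is preferred):

```python
def gcn_layer(A_hat, H, W):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where A_hat is
    the (normalized) weighted adjacency matrix, H holds the entity hidden
    state vectors, and W is the learned weight matrix."""
    def matmul(X, Y):
        # plain-Python matrix multiply, row of X against column of Y
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
                for row in X]
    Z = matmul(matmul(A_hat, H), W)
    # ReLU non-linearity
    return [[max(0.0, v) for v in row] for row in Z]
```

Stacking such layers yields the graph hidden state vector of each training entity.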
Step S140, calculating an adjacent semantic vector of each word in the training corpus based on the first document matrix;
in this embodiment, in order to obtain the features of each word in the corpus to the maximum extent, based on the first document matrix, an adjacent semantic vector between each word and an adjacent word in the corpus is calculated.
In one embodiment, as shown in fig. 4, step S140 includes:
step S141, expanding the first document matrix to obtain a second document matrix;
s142, determining an adjacent word sequence of each word in the second document matrix according to a preset adjacent semantic rule; the adjacent semantic rule is an adjacent word rule;
and S143, based on the adjacent word sequence, calculating a forward adjacent semantic vector and a backward adjacent semantic vector of each word by adopting a preset LSTM model, and splicing the forward adjacent semantic vector and the backward adjacent semantic vector to obtain the adjacent semantic vectors, wherein each word has at least one adjacent semantic vector.
In this embodiment, based on the first document matrix, the number of words in the longest sentence in the document is set to Lmax, and the remaining sentences are padded: for a sentence whose number of words is less than Lmax, the missing length is filled with a special symbol (e.g., unk) to form a second document matrix. Considering that the document may contain meaningless short sentences, sentences whose number of words is less than a certain threshold Lt (e.g., Lt = 4) may be culled. Taking one word as a step length, the words in the adjacent directions are selected as the k-th adjacent word sequence of a word. As shown in FIG. 7, for example, for the word x44 at the center position of the second document matrix, the 8 words in the eight directions of left, upper left, above, upper right, right, lower right, below, and lower left are taken as the 1st adjacent word sequence; the 16 words one further word outward from the 1st adjacent word sequence are taken as the 2nd adjacent word sequence. By analogy, a word exists in one or more adjacent word sequences. It should be noted that if a position in a certain direction exceeds the edge of the second document matrix, word completion is performed first by filling in the special symbol (unk), so that the outermost adjacent word sequence of the word can still be constructed.
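The k-th adjacent word sequence, i.e. the ring of positions at distance k around a word, padded with unk beyond the matrix edge, can be sketched as follows (illustrative only):

```python
def neighbor_ring(matrix, r, c, k, pad="unk"):
    """Collect the k-th adjacent word sequence of matrix[r][c]: the cells at
    Chebyshev distance exactly k (8 cells for k=1, 16 for k=2), padding
    out-of-range positions with the special symbol."""
    rows, cols = len(matrix), len(matrix[0])
    ring = []
    for i in range(r - k, r + k + 1):
        for j in range(c - k, c + k + 1):
            if max(abs(i - r), abs(j - c)) != k:
                continue  # keep only the ring at distance k
            if 0 <= i < rows and 0 <= j < cols:
                ring.append(matrix[i][j])
            else:
                ring.append(pad)  # edge completion with unk
    return ring
```

For a word in the middle of the matrix, k = 1 yields its 8 surrounding words; near the edge, missing positions are filled with unk.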
Further, as shown in fig. 8, after the adjacent word sequences of each word are determined from the second document matrix, each adjacent word sequence is encoded by an LSTM (long short-term memory network) model to obtain a forward adjacent semantic vector and a backward adjacent semantic vector, and the two vectors are spliced to form the k-th adjacent semantic vector of the word. Each adjacent semantic vector is weighted, with the principle that the larger k is, the smaller its weight. The weights may be calculated using supervised machine learning methods or other methods.
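The weighting of the k-th adjacent semantic vectors can be sketched as follows (illustrative only; the geometric decay base**k is an assumed scheme, since the embodiment leaves the weight calculation open, requiring only that the weight shrinks as k grows):

```python
def combine_adjacent_vectors(vectors_by_k, base=0.5):
    """Weighted sum of a word's k-th adjacent semantic vectors.
    vectors_by_k[k-1] is the spliced forward/backward vector for ring k;
    the weight base**k decays as k increases, so nearer rings count more."""
    dim = len(vectors_by_k[0])
    out = [0.0] * dim
    for k, vec in enumerate(vectors_by_k, start=1):
        w = base ** k
        for d in range(dim):
            out[d] += w * vec[d]
    return out
```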
Step S150, obtaining a word embedding vector and a position vector of each word in the training corpus, and combining the graph hidden state vector, the adjacent semantic vector, the word embedding vector and the position vector to obtain an input vector of the training corpus;
In this embodiment, the word embedding vector and the position vector of each word in the training corpus are obtained. The word embedding vector can be obtained through word2vec, GloVe, or other models; the position vector is calculated using the sine-cosine scheme of a preset Transformer model. The graph hidden state vector, the adjacent semantic vector, the word embedding vector, and the position vector of each word in the training corpus are then combined to obtain the input vector of the training corpus. It should be noted that only the input vector of a word belonging to an entity contains the graph hidden state vector, the adjacent semantic vector, the word embedding vector, and the position vector; the input vector of a word not belonging to an entity contains only the adjacent semantic vector, the word embedding vector, and the position vector.
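The combination of the component vectors into an input vector can be sketched as follows (illustrative only; element-wise summation is an assumed combination rule, and as noted above, non-entity words lack the graph hidden state vector):

```python
def input_vector(word_emb, pos_vec, adj_vec, graph_vec=None):
    """Combine a word's component vectors into one input vector by
    element-wise summation. graph_vec is None for words that do not
    belong to any entity, per the embodiment."""
    parts = [word_emb, pos_vec, adj_vec]
    if graph_vec is not None:
        parts.append(graph_vec)
    return [sum(vals) for vals in zip(*parts)]
```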
Step S160, predicting the input vector through an encoder of a Transformer model to obtain the prediction probability of the semantics to which the input vector belongs;
step S170, calculating the probability loss between the prediction probability and the true semantic meaning of the input vector according to a preset loss function, and optimizing the model parameters of the encoder according to the probability loss to obtain a semantic recognition model;
In this embodiment, the input vector of each word of the training corpus is input into the encoder of the Transformer model, and the prediction probability of the semantics to which each word belongs is output. The probability loss between the prediction probability and the true semantics of the input vector is then calculated according to a preset loss function, and the model parameters of the encoder are optimized according to the probability loss to obtain the semantic recognition model.
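As one possible, non-limiting choice of the preset loss function, the probability loss can be computed as a cross-entropy between the predicted semantic distribution and the true semantic label:

```python
import math

def cross_entropy(pred_probs, true_index):
    """Probability loss between the predicted semantic distribution and the
    true semantics: negative log-probability assigned to the true label.
    (Cross-entropy is an assumed instance of the 'preset loss function'.)"""
    return -math.log(pred_probs[true_index])
```

Minimizing this loss over the training corpus drives the encoder parameters toward assigning high probability to the true semantics.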
And S180, acquiring a target input vector corresponding to the target recognition corpus, and performing semantic recognition through the semantic recognition model to obtain a semantic recognition result.
In this embodiment, after determining the semantic recognition model based on the above steps, the target input vector of the target recognition corpus is input into the semantic recognition model for semantic recognition to obtain a corresponding prediction probability, and a final semantic recognition result is determined according to the prediction probability.
The method considers the semantic relation of each entity in the document, and adds the adjacent semantic vector and the hidden state vector of the entity when the entity is used for training the semantic recognition model, so that the entity characteristics obtained by the semantic recognition model are more comprehensive, and the accuracy of the semantic recognition model for semantic recognition is improved.
The embodiment of the invention also provides a semantic recognition device combined with knowledge graph entity information, and the semantic recognition device is used for executing any embodiment of the semantic recognition method. Specifically, referring to fig. 5, fig. 5 is a schematic block diagram of a semantic recognition apparatus incorporating knowledge-graph entity information according to an embodiment of the present invention. The semantic recognition apparatus 100 may be configured in a server.
As shown in fig. 5, the semantic recognition apparatus 100 combining knowledge-graph entity information includes a filtering module 110, a constructing module 120, a hidden state vector calculating module 130, an adjacent semantic vector calculating module 140, an input vector constructing module 150, a predicting module 160, a parameter optimizing module 170, and a semantic recognition module 180.
The screening module 110 is configured to obtain a training corpus, construct a first document matrix according to a preset matrix construction rule, identify entities in the first document matrix by using an entity naming technology, and screen out training entities from the entities according to an entity screening rule;
a building module 120, configured to calculate weights between each training entity in the first document matrix, and build a graph neural network according to the weights;
a graph hidden state vector calculation module 130, configured to calculate a graph hidden state vector for each entity based on the graph neural network;
an adjacent semantic vector calculation module 140, configured to calculate an adjacent semantic vector for each word in the corpus based on the first document matrix;
an input vector construction module 150, configured to obtain a word embedding vector and a position vector of each word in the corpus, and combine the hidden state vector, the adjacent semantic vector, the word embedding vector, and the position vector to obtain an input vector of the corpus;
the prediction module 160 is configured to predict the input vector through an encoder of a Transformer model, so as to obtain a prediction probability of the semantics to which the input vector belongs;
the parameter optimization module 170 is configured to calculate a probability loss between the prediction probability and a true semantic meaning to which the input vector belongs according to a preset loss function, and optimize a model parameter of the encoder according to the probability loss to obtain a semantic recognition model;
and the semantic recognition module 180 is used for acquiring a target input vector corresponding to the target recognition corpus, and performing semantic recognition through the semantic recognition model to obtain a semantic recognition result.
In one embodiment, the screening module 110 includes:
the clustering unit is used for clustering all entities of the training corpus to obtain entity clusters of different categories;
the contribution value calculation unit is used for forming entity adjacent matrixes by all the entities in each entity cluster and determining the contribution value of each entity relative to each entity adjacent matrix according to the entity adjacent matrixes;
the dividing unit is used for performing descending arrangement on all the entities according to the contribution values and dividing the entities according to the number of preset entities to obtain a training entity queue;
and the masking unit is used for randomly masking the entities in the training entity queue according to a preset masking rule and taking the unmasked entities in the training entity queue as the training entities.
In one embodiment, the building module 120 includes:
the spatial position distance calculating unit is used for calculating the spatial position distance between two entities in the first document matrix according to a preset spatial position rule;
the semantic relative position distance calculating unit is used for calculating the semantic relative position distance between two entities in the first document matrix according to a preset semantic position rule;
and the weight calculation unit is used for calculating the product of the spatial position distance and the semantic relative position distance to obtain the weight between the two entities.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the semantic identification method in combination with the knowledge-graph entity information as described above when executing the computer program.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the semantic identification method in conjunction with knowledge-graph entity information as described above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.