CN111611796A - Hypernym determination method and device for hyponym, electronic device and storage medium

Info

Publication number
CN111611796A
Authority
CN
China
Prior art keywords
hypernym
sample
hyponym
training
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010430946.7A
Other languages
Chinese (zh)
Inventor
李晨曦
荆宁
张红林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Wuhan Co Ltd
Original Assignee
Tencent Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Wuhan Co Ltd filed Critical Tencent Technology Wuhan Co Ltd
Priority to CN202010430946.7A priority Critical patent/CN111611796A/en
Publication of CN111611796A publication Critical patent/CN111611796A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning


Abstract

The application provides a method and an apparatus for determining the hypernym of a hyponym, an electronic device, and a storage medium. The method includes: obtaining a description text corresponding to a hyponym to be processed; determining, based on the description text, the probability that the hyponym to be processed corresponds to each candidate hypernym in a hypernym set; and determining a target hypernym for the hyponym to be processed from the candidate hypernyms based on the probability corresponding to each candidate hypernym. The scheme of the application imposes no requirement on the description text; compared with rule-based methods it is more universal, can be applied simply and effectively to scenes with various kinds of data, and yields a higher recall rate for the determined hypernyms.

Description

Hypernym determination method and device for hyponym, electronic device and storage medium
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a method and an apparatus for determining the hypernym of a hyponym, an electronic device, and a storage medium.
Background
In the prior art, determining the hypernym of a hyponym generally requires extracting it from text related to the hyponym based on a preset rule; that is, the hypernym can be determined only if the related text contains information that satisfies the preset rule. If the related text of the hyponym contains no such information, the hypernym may not be determined accurately. Therefore, prior-art schemes for determining the hypernym of a hyponym place high demands on the data and require rules to be set in advance, so they lack universality.
Disclosure of Invention
The embodiment of the present application mainly aims to provide a method and an apparatus for determining the hypernym of a hyponym, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a method for determining the hypernym of a hyponym, where the method includes:
obtaining a description text corresponding to a hyponym to be processed;
determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the hyponym to be processed;
and determining a target hypernym corresponding to the hyponym to be processed from the candidate hypernyms based on the corresponding probability of each candidate hypernym.
In a second aspect, the present application provides a hypernym determination apparatus for a hyponym, the apparatus comprising:
the information processing device comprises a to-be-processed information acquisition module, a processing module and a processing module, wherein the to-be-processed information acquisition module is used for acquiring a description text corresponding to a to-be-processed hyponym;
the information processing module is used for determining the probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the to-be-processed hyponym;
and the target hypernym determining module is used for determining the target hypernym corresponding to the hyponym to be processed from the candidate hypernyms based on the corresponding probability of each candidate hypernym.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory; the memory has stored therein readable instructions which, when loaded and executed by the processor, implement the method as shown in the first aspect or any one of the alternative embodiments of the first aspect described above.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is loaded by a processor and executed, the method as shown in the first aspect or any optional embodiment of the first aspect is implemented.
The beneficial effect that technical scheme that this application provided brought is:
when the hypernym of a hyponym to be processed needs to be determined, the probability that the hyponym corresponds to each candidate hypernym in the hypernym set can be determined based on the description text corresponding to the hyponym; the description text does not need to satisfy any set rule, and the hypernym of the hyponym to be processed is then determined from the candidate hypernyms based on the probability corresponding to each candidate hypernym.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a hypernym determination method for a hyponym according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process of a hypernym determination model according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a hypernym determination apparatus for a hyponym according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to yet another embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and adversarial learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
For better understanding and description of the embodiments of the present application, some technical terms used in the embodiments of the present application will be briefly described below.
Neural Networks (NN): a computational model that simulates the behavioral characteristics of biological neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
In the prior art, constructing a knowledge graph usually requires mining the hypernyms of hyponyms, and the current method for mining hypernyms is usually rule-based: based on a preset rule, hypernyms that satisfy the rule are extracted from the description text of the hyponym. For example, the description text of the hyponym "apple" may be "an apple is a fruit"; the hypernym "fruit" is contained in this description text, so the hypernym of "apple" can be obtained from it. Thus, in the prior art, the hypernym of a hyponym can be determined only if the description text contains information that conforms to the set rule.
Determining the hypernym of the hyponym by the scheme in the prior art has the following problems:
1. if the description text of the hyponym does not contain information meeting the set rule, the hypernym of the hyponym may not be determined accurately.
2. The scheme needs to preset rules and has high requirements on data, so that the scheme has no universality and the recall rate is low.
In view of the above technical problems, the present application provides a method and an apparatus for determining the hypernym of a hyponym, an electronic device, and a storage medium. Based on the solution of the present application, when the hypernym of a hyponym to be processed needs to be determined, the probability that the hyponym corresponds to each candidate hypernym in the hypernym set can be determined based on the hyponym to be processed and its corresponding description text; the description text does not need to conform to any set rule, and the hypernym of the hyponym is determined from the candidate hypernyms based on the probability corresponding to each candidate hypernym. The method imposes no requirement on the description text, is more universal than rule-based methods, can be applied simply and effectively to scenes with various kinds of data, and yields a higher recall rate for the determined hypernyms.
In addition, based on the hyponym to be processed and its corresponding description text, the probability that the hyponym corresponds to each candidate hypernym in the hypernym set can be determined by a hypernym determination model obtained through multi-task learning. That is, in the process of training the hypernym determination model, two models can be trained simultaneously on the training samples (sample hyponyms and the description texts corresponding to them): a hypernym prediction model, which predicts the hypernym prediction result of a sample hyponym based on the training samples, and a semantic matching result prediction model, which predicts the semantic similarity between the sample hyponym and the sample hypernyms based on the training samples. Because the hypernym determination model is obtained through multi-task learning, its data requirements are lower than those of rule-based schemes, which improves the generalization ability of the model and the effect of hypernym mining.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow diagram of a method for determining the hypernym of a hyponym provided by the present application. The execution subject of the method may be a server, and the solution is described here with the server as the execution subject. As shown in Fig. 1, the method may include steps S110 to S130, where:
step S110, obtaining a description text corresponding to the hyponym to be processed.
The hyponym to be processed is a hyponym whose hypernym needs to be determined, and the description text may be any text that describes the hyponym to be processed, such as an entry for the word. In an alternative of the present application, if the description information obtained for the hyponym to be processed is not text, for example speech information, the speech information may be converted into text information, and the text information is used as the description text corresponding to the hyponym to be processed.
Step S120, based on the description text corresponding to the hyponym to be processed, determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set.
Step S130, based on the corresponding probability of each candidate hypernym, determining a target hypernym corresponding to the hyponym to be processed from each candidate hypernym.
The hypernym set is a pre-created set that contains hypernyms corresponding to different hyponyms, and one hyponym can correspond to at least one hypernym; for example, for "apple" (a hyponym), the corresponding hypernyms can be a fruit, an electronics brand, a movie name, and the like. The hypernyms in the hypernym set can serve as candidate hypernyms of the hyponym to be processed. When determining the target hypernym of the hyponym to be processed, the determination can be based on the probability that the hyponym corresponds to each candidate hypernym in the set; this probability reflects how likely each candidate hypernym is to be the target hypernym of the hyponym to be processed, a larger probability value indicating a greater likelihood. The target hypernym, of which there may be at least one, can then be determined based on the probability corresponding to each candidate hypernym.
In an alternative, a candidate hypernym whose probability is greater than a set value may be used as the target hypernym, where the set value can be configured based on actual demand.
It is understood that the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set may be a positive number not greater than 1, or may be 0 or 1.
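As a purely illustrative sketch (the function name, variable names and threshold value are assumptions and not part of the disclosed scheme), the selection of target hypernyms described in steps S120 to S130 can be expressed in Python as follows:

def select_target_hypernyms(candidate_probs, alpha=0.5):
    # candidate_probs: mapping from each candidate hypernym in the hypernym set
    # to the probability that the hyponym to be processed corresponds to it.
    # Keep every candidate whose probability exceeds the set value alpha.
    return [h for h, p in candidate_probs.items() if p > alpha]

# Example: probabilities produced for the hyponym "apple"
probs = {"fruit": 0.83, "vegetable": 0.04, "electronics brand": 0.61}
print(select_target_hypernyms(probs))  # ['fruit', 'electronics brand']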
According to the scheme of the embodiment of the application, when the hypernym of the hyponym to be processed needs to be determined, the probability that the hyponym corresponds to each candidate hypernym in the hypernym set can be determined based on the hyponym to be processed and its corresponding description text; the description text does not need to satisfy any set rule, and the hypernym of the hyponym to be processed is determined from the candidate hypernyms based on the probability corresponding to each candidate hypernym.
In an alternative aspect of the present application, the method further comprises:
acquiring a hyponym to be processed;
determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the hyponym to be processed may include:
and determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the hyponym to be processed and the description text corresponding to the hyponym to be processed.
When determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set, the determination can be based on both the hyponym to be processed and the description text corresponding to it. On this basis, combining the hyponym itself allows it to be emphasized when the probability is determined, so that the determined probability is more accurate.
The description text may contain the hyponym to be processed, in which case the hyponym to be processed can be obtained from the corresponding description text.
In an alternative of the present application, determining, based on the description text corresponding to the hyponym to be processed, the probability that the hyponym corresponds to each candidate hypernym in the hypernym set includes:
performing word segmentation on the description text to obtain the word segments of the description text;
extracting features of at least two scales based on the word vectors of the word segments;
and splicing the features of all scales to obtain the text features corresponding to the description text.
When the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set is determined, the features of at least two scales can be extracted based on the description text, and the features of different scales comprise different information, so that the text features determined based on the features of different scales can reflect the characteristics of the text more accurately.
In an alternative scheme of the application, word segmentation can be performed on the description text with a word segmentation tool to obtain the word segments of the description text.
In an alternative of the present application, extracting features of at least two scales based on the word vectors of the word segments may include:
extracting features of at least two scales through convolution processing modules of at least two scales based on the word vectors of the word segments.
The convolution processing module comprises convolution layers, so that the features of at least two scales can be extracted through the convolution layers.
The convolution processing module can also comprise a pooling layer, and the features extracted by the convolution layer can be converted into the features with the same dimensionality through the pooling layer, so that the subsequent processing is facilitated.
As an example, the description text includes the hyponym to be processed, and extracting text features based on the hyponym to be processed and the description text may specifically include: performing word segmentation on the description text to obtain the word segments, and determining the word sequence matrix corresponding to the word segments, which can be expressed as W = {w_1, w_2, ..., w_n}, where n is the number of word segments, w_i (1 ≤ i ≤ n) denotes the word sequence corresponding to the i-th word segment, and W denotes the matrix formed by the word sequences.
Based on the word sequences of the word segments, the word vector matrix is obtained, which can be expressed as X_{1:n} = [x_1; x_2; ...; x_n], where x_i (1 ≤ i ≤ n) denotes the word vector corresponding to the i-th word segment, c denotes the dimension of each word vector, and X is the matrix formed by the word vectors, each row being the word vector of one word segment and the number of columns being c.
The word vectors of the word segments are convolved by the convolution layers; the convolution result of the i-th convolution kernel at the j-th position is denoted c_{ij}.
In this example, three convolution kernels of different scales are selected to convolve the word vectors of the word segments. The scale of a convolution kernel includes its width and its length; in this example the widths of the three kernels are the same, so different scales mean different kernel lengths, and the kernel lengths are 2, 3 and 4 respectively.
Maximum pooling is performed on the convolution result of each scale, and the pooling result is denoted y_i, i = 2, 3, 4, where i indexes the convolution kernels of different lengths: y_2 denotes the pooling result of the length-2 convolution kernel, y_3 that of the length-3 kernel, and y_4 that of the length-4 kernel.
The pooling results are then spliced to form the text feature, which can be expressed as f = [y_2; y_3; y_4], where f denotes the text feature obtained by splicing y_2, y_3 and y_4.
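As an illustrative sketch of the multi-scale feature extraction described above (the vocabulary size, embedding dimension, number of filters and the use of PyTorch are assumptions for illustration only), the word segments are embedded, convolved with kernels of lengths 2, 3 and 4, max-pooled, and spliced into the text feature f:

import torch
import torch.nn as nn

class TextFeatureExtractor(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, num_filters=64,
                 kernel_lengths=(2, 3, 4)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One convolution module per kernel length (the three scales).
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in kernel_lengths])

    def forward(self, token_ids):            # token_ids: (batch, n) word sequences
        x = self.embedding(token_ids)        # (batch, n, c) word vector matrix
        x = x.transpose(1, 2)                # (batch, c, n) layout for Conv1d
        pooled = []
        for conv in self.convs:
            c = torch.relu(conv(x))          # convolution result of one scale
            pooled.append(c.max(dim=2).values)  # maximum pooling -> y_i
        return torch.cat(pooled, dim=1)      # splice y_2, y_3, y_4 into f

extractor = TextFeatureExtractor()
f = extractor(torch.randint(0, 10000, (1, 20)))  # 20 word segments of a description text
print(f.shape)  # torch.Size([1, 192])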
If the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set is determined based on both the hyponym to be processed and its corresponding description text, then when determining the text features, the word vector corresponding to the hyponym can be determined first, and the text features are obtained based on that word vector together with the word vectors of the word segments of the description text. For example, following the example above, a row containing the word vector of the hyponym to be processed can be added to the word vector matrix.
In an alternative of the present application, determining, based on the description text corresponding to the hyponym to be processed, a probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set may include:
and determining the probability of the to-be-processed hyponym corresponding to each candidate hypernym in the hypernym set through the hypernym determination model based on the description text corresponding to the to-be-processed hyponym.
The hypernym determination model is trained in advance; its input may be the description text or the text features corresponding to the description text, and its output is the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set. If the input of the hypernym determination model is the text features, the text features need to be extracted based on the description text corresponding to the hyponym to be processed and then input to the hypernym determination model.
In the scheme of the application, the hypernym determination model is obtained by training in the following way:
acquiring training samples, wherein each training sample comprises a sample description text corresponding to a sample hyponym and sample hypernyms corresponding to the sample hyponym, the sample hypernyms include at least one correct hypernym and at least one error hypernym, each sample hypernym is marked with a first label, the first label represents the hypernym labeling result of the sample hypernym with respect to the sample hyponym, and the hypernym set comprises each sample hypernym;
training a hypernym prediction model of the initial neural network model based on the sample description text corresponding to the sample hyponym of each training sample until a total loss function of the initial neural network model converges, wherein the total loss function comprises a first loss function, and the hypernym prediction model at the end of training is used as the hypernym determination model;
the output of the hypernym prediction model is the hypernym prediction result of the training sample's hyponym with respect to each candidate hypernym, and the value of the first loss function represents the difference between the hypernym labeling result corresponding to the sample hyponym and the hypernym prediction result.
A training sample comprises the description text corresponding to a sample hyponym and the sample hypernyms corresponding to that hyponym. One sample hyponym can correspond to at least one sample hypernym, and if at least two sample hypernyms correspond to the sample hyponym, both correct hypernyms and error hypernyms can be included among them. A correct hypernym is a hypernym that correctly corresponds to the hyponym; for example, for the hyponym "apple" and the hypernym "fruit", "fruit" is a correct hypernym of "apple". An error hypernym is a hypernym that does not correctly correspond to the hyponym; for example, for the hyponym "apple" and the hypernym "vegetable", "vegetable" is an error hypernym of "apple".
In the scheme of the application, the hypernym prediction model inputs the sample description text corresponding to the sample hyponym of each training sample, and outputs the hypernym prediction result of the hyponym of the training sample corresponding to each candidate hypernym. Each sample hypernym in the training samples comprises a correct hypernym and an error hypernym, and the accuracy of model training can be improved based on the correct hypernym and the error hypernym.
The hypernyms of the samples in the training samples can be used as a hypernym set, the hypernym set comprises correct hypernyms and wrong hypernyms, and target hypernyms of hyponyms are determined from the candidate hypernyms in the hypernym set.
In the scheme of the application, each sample hypernym corresponding to the sample hyponym can be constructed in a manual construction mode. The same hypernym may exist in the sample hypernyms corresponding to different sample hyponyms.
In the scheme of the application, the number of constructed sample hypernyms corresponding to each sample hypernym may be the same or different.
Each sample hypernym is marked with a first label, and the first label represents the hypernym labeling result of the sample hypernym, i.e., it indicates which hyponym the marked sample hypernym is a hypernym of. The first label can be labeled manually, or obtained by mapping the sample hypernyms to a plurality of first labels. The first label may be a character string, characters, numbers, etc., and the specific representation form of the first label is not limited in this application.
In the scheme of the application, the input of the hypernym prediction model may be a sample description text corresponding to a sample hyponym of each training sample, or may be based on text features extracted from a sample description text corresponding to a sample hyponym of a training sample. It is understood that the extraction of the text features may be implemented outside the hypernym prediction model or may be implemented inside the hypernym prediction model.
In the solution of the present application, the hypernym prediction result may be the probability that the hyponym corresponds to each candidate hypernym in the hypernym set, and the output of the hypernym prediction model may be represented as [p_1, p_2, ..., p_m], where m is the number of candidate hypernyms (sample hypernyms) in the hypernym set and p_i (1 ≤ i ≤ m) denotes the probability corresponding to the i-th hypernym in the set. In practical application, the set value is α, and the hypernyms whose probability is greater than α are used as the target hypernyms.
In the scheme of the application, the hypernym prediction model comprises a fully-connected layer and a normalization layer; the fully-connected layer maps the text features to an m-dimensional vector, the normalization layer normalizes the m-dimensional vector, and the hypernym prediction result is represented by the normalization result. As an alternative, softmax may be used as the normalization layer, and the normalization result may be expressed as [p_1, p_2, ..., p_m].
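A minimal sketch of such a prediction head (the feature dimension and the number of candidate hypernyms m are assumed values for illustration) is:

import torch
import torch.nn as nn

class HypernymPredictionHead(nn.Module):
    def __init__(self, feature_dim=192, num_hypernyms=1000):  # m = 1000 is an assumption
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_hypernyms)  # fully-connected layer

    def forward(self, text_feature):
        logits = self.fc(text_feature)                   # m-dimensional vector
        # Normalization layer: [p_1, ..., p_m], the probability of each candidate hypernym.
        return torch.softmax(logits, dim=-1)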
In the scheme of the application, the value of the first loss function represents the difference between the hypernym labeling result and the hypernym prediction result corresponding to the sample hyponym. For a positive training sample, the smaller the difference, the closer the hypernym prediction result is to the corresponding hypernym labeling result, i.e., the higher the probability that the hypernym corresponding to the prediction result is a correct hypernym of the sample hyponym (the hypernym corresponding to the labeling result); conversely, the larger the difference, the lower that probability. For a negative training sample, the smaller the difference, the closer the prediction result is to the labeling result, i.e., the higher the probability that the hypernym corresponding to the prediction result is an error hypernym of the sample hyponym; conversely, the larger the difference, the lower that probability.
In the scheme of the present application, for one training sample, the first loss function may be a cross-entropy loss function (cross-entropy), specifically expressed as:
l = -∑_{i=1}^{m} y_i · log(p_i)
where m denotes the number of sample hypernyms in the hypernym set, p_i denotes the probability corresponding to the i-th hypernym in the set (the hypernym prediction result), and y_i indicates whether the sample's labeled hypernym set contains the i-th hypernym of the hypernym set: y_i = 1 if it does, and y_i = 0 otherwise.
It is to be understood that the first loss function is a loss function for one training sample, and for the total loss function, the first loss function needs to be summed based on the number of training samples.
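A minimal sketch of the first loss summed over a batch of training samples (the tensor shapes and the small constant added for numerical stability are assumptions) is:

import torch

def first_loss(pred_probs, labels):
    # pred_probs: (batch, m) hypernym prediction results p_i.
    # labels: (batch, m) hypernym labeling results y_i (1.0 if the i-th candidate
    # hypernym is a labeled hypernym of the sample hyponym, 0.0 otherwise).
    per_sample = -(labels * torch.log(pred_probs + 1e-12)).sum(dim=1)  # cross-entropy per sample
    return per_sample.sum()  # summed over the training samples for the total loss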
In an alternative of the present application, the training samples further include sample hyponyms, and the training of the hypernym prediction model of the initial neural network model is performed based on the sample description texts corresponding to the sample hyponyms of the training samples, including:
and training the hypernym prediction model of the initial neural network model based on the sample hyponyms of the training samples and the sample description texts corresponding to the sample hyponyms.
That is, if the training samples also include the sample hyponyms, the input of the hypernym prediction model is the sample hyponyms of the training samples and the sample description texts corresponding to them, and the output is the hypernym prediction results of the training samples' hyponyms with respect to each candidate hypernym.
In an alternative scheme of the application, for a training sample, a second label is further labeled on a sample hypernym, the second label represents a semantic matching labeling result of the sample hypernym and a sample description text, and the initial neural network model further comprises a semantic matching result prediction model;
training an hypernym prediction model of the initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample, including:
training an hypernym prediction model of the initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample;
training a semantic matching result prediction model based on each training sample;
the output of the semantic matching result prediction model is a semantic matching prediction result of a sample description text of a training sample and each candidate hypernym, the total loss function further comprises a second loss function, and the value of the second loss function represents the difference between a semantic matching labeling result and the semantic matching prediction result corresponding to the sample description text.
During the training of the hypernym determination model, the semantic matching result prediction model can be trained based on the semantic similarity between the sample hypernyms and the sample description texts corresponding to the sample hyponyms. That is, while the hypernym prediction model of the initial neural network model is trained based on the sample description texts of the training samples, the semantic matching result prediction model can be trained based on the same training samples, so that the hypernym determination model obtained by training has higher precision.
Each sample hypernym is also labeled with a second label, and the second label represents the semantic matching labeling result between the sample hypernym and the sample description text, i.e., the semantic similarity between the sample description text corresponding to a sample hyponym and the sample hypernym marked with the second label. The second label can also be labeled manually, or obtained by mapping the sample hypernyms to a plurality of second labels. The second label may be a character string, characters, numbers, etc., and the specific representation form of the second label is not limited in this application.
In the scheme of the present application, the output of the semantic matching result prediction model is the semantic matching prediction result, i.e., the semantic similarity, between the sample description text of a training sample and each candidate hypernym. The degree of similarity between the two (a hypernym and a sample description text) can be represented by a probability; for example, a probability greater than a set threshold indicates that the two are semantically similar, and a probability not greater than the threshold indicates that they are not. Whether the two are similar can also be represented by a binary classification result; for example, a result of 0 indicates that they are semantically similar and a result of 1 indicates that they are not. It should be noted that the specific representation form of the semantic matching prediction result is not limited in the present application, and all forms are within the scope of the present application.
As an example, if the semantic matching prediction result is represented by probabilities, it can be expressed as [p_1, p_2], where p_1 and p_2 respectively denote the probability that the two are similar and the probability that they are not, with p_1 + p_2 = 1. Assume the threshold is set to β: if p_1 is greater than β and p_2 is not greater than β, the two are semantically similar; otherwise, if p_1 is not greater than β and p_2 is greater than β, the two are not similar.
In addition, a binary classification can be performed based on the probability to obtain a binary classification result: when the probability is greater than β, the binary result is 0, indicating that the two are similar; when the probability is not greater than β, the binary result is 1, indicating that they are not similar. The binary classification result can be recorded as y', where y' = 0 indicates similar and y' = 1 indicates dissimilar.
It is understood that the above [p_1, p_2] is the semantic similarity between one sample description text and one sample hypernym; if a training sample has one sample hyponym and k sample hypernyms, k such results [p_1, p_2] are obtained correspondingly.
In the scheme of the application, the semantic matching result prediction model comprises a fully-connected layer and a normalization layer; the fully-connected layer maps the associated features to a two-dimensional vector, the normalization layer normalizes the two-dimensional vector, and the semantic matching prediction result is represented by the normalization result. As an alternative, softmax may be used as the normalization layer, and the normalization result may be expressed as [p_1, p_2].
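A minimal sketch of such a semantic matching head (the dimension of the associated feature is an assumed value) is:

import torch
import torch.nn as nn

class SemanticMatchHead(nn.Module):
    def __init__(self, assoc_dim=384):
        super().__init__()
        self.fc = nn.Linear(assoc_dim, 2)  # fully-connected layer to a two-dimensional vector

    def forward(self, assoc_feature):
        # Normalization layer: [p_1, p_2], the probabilities that the sample description
        # text and the sample hypernyms are / are not semantically similar.
        return torch.softmax(self.fc(assoc_feature), dim=-1)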
In an alternative of the present application, the value of the second loss function represents the difference between the semantic matching labeling result and the semantic matching prediction result corresponding to the sample description text. For a correct hypernym in a training sample, the smaller the difference, the closer the prediction is to the labeling result (similar), i.e., the predicted semantic similarity between that hypernym and the sample description text is closer to the labeling result; conversely, the larger the difference, the less the prediction agrees with the labeling result (similar). For an error hypernym in a training sample, the smaller the difference, the closer the prediction is to the labeling result (dissimilar); conversely, the larger the difference, the less the prediction agrees with the labeling result (dissimilar).
In an alternative aspect of the present application, the second loss function may be a maximum-interval loss function (max-margin) for one training sample, specifically expressed as l_p = max(0, λ − y'·y), where λ is a hyperparameter, p denotes the p-th sample description text, and y and y' denote the semantic matching labeling result and the semantic matching prediction result, respectively.
It will be appreciated that the second loss function is a loss function for one training sample, and that for the total loss function, the second loss function needs to be summed based on the number of training samples.
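A minimal sketch of the second loss (assuming, as one common convention, that the labeling result y and the prediction score y' are encoded as signed values; the patent's exact encoding may differ) is:

def second_loss(y, y_hat, lam=1.0):
    # l_p = max(0, lambda - y' * y) for one training sample
    return max(0.0, lam - y_hat * y)

def total_second_loss(pairs, lam=1.0):
    # Sum the per-sample losses over the training samples.
    return sum(second_loss(y, y_hat, lam) for y, y_hat in pairs)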
The input of the semantic matching result prediction model may be each training sample. If a training sample includes the sample description text corresponding to a sample hyponym and the sample hypernyms corresponding to that hyponym, those are the inputs of the model; if a training sample additionally includes the sample hyponym itself, then the inputs of the model are the sample hyponym, its sample description text, and the sample hypernyms corresponding to it.
In an alternative aspect of the present application, training the semantic matching result prediction model based on training samples may include:
for any training sample, respectively extracting a first feature of the sample description text in the training sample and a second feature corresponding to each sample hypernym in the training sample;
and determining the associated features between the sample hyponym and the sample hypernyms through the semantic matching result prediction model based on the first feature and the second feature, and obtaining the semantic matching prediction result corresponding to the training sample based on the associated features.
The input of the semantic matching result prediction model may be training samples, each training sample includes a sample description text corresponding to a sample hyponym and each sample hypernym corresponding to the sample hyponym, and the semantic matching result prediction model may be trained based on the associated features extracted from the training samples in the training of the semantic matching result prediction model. The correlation characteristics can embody semantic correlation between the sample description text and the sample hypernyms, and the model is trained based on the correlation characteristics, so that the model training precision can be further improved.
It is understood that the extraction of the associated features may be implemented outside the semantic matching result prediction model or may be implemented inside the semantic matching result prediction model.
The first feature is extracted according to the same principle as the text feature described above and is not repeated here; the text feature extracted during training of the hypernym prediction model can be used directly as the first feature. That is, the hypernym prediction model and the semantic matching result prediction model can share the text feature during training, so the first feature only needs to be extracted once for the hypernym prediction model and can be reused by the semantic matching result prediction model without being extracted again, which saves computation and speeds up model training.
It is understood that the above is a description of extracting the associated features from one training sample, and the associated features may be extracted in the same manner for other training samples, and then the semantic matching result prediction model is trained based on the associated features corresponding to each training sample.
The second feature refers to a feature corresponding to each sample hypernym in one training sample, and as an example, if there are 5 sample hypernyms in one training sample, the second feature refers to a feature corresponding to the 5 sample hypernyms. For a training sample, the semantic matching prediction result refers to semantic similarity between a sample description text in the training sample and all sample hypernyms in the training sample, that is, all sample hypernyms in the training sample are regarded as a whole. If the semantic matching prediction results are similar, the semantic matching prediction results indicate that the semantics between the sample description text and the hypernyms of the samples are similar in the training sample, otherwise, if the semantic matching prediction results are not similar, the semantic matching prediction results indicate that the semantics between the sample description text and the hypernyms of the samples are not similar in the training sample.
In the scheme of the present application, for any training sample, extracting the second feature corresponding to the hypernym in each sample may include:
splicing the hypernyms of all samples in the training samples to obtain hypernym texts;
performing word segmentation on the hypernym text to obtain the word segments corresponding to the hypernym text;
and obtaining the second feature based on the word segments corresponding to the hypernym text.
The second feature may also be extracted based on the convolution processing module described above, and the principle of the second feature is consistent with the principle of extracting the text feature, which is not described herein again.
As an example, consider the sample hypernyms corresponding to one sample hyponym, e.g., one sample hyponym corresponding to 5 sample hypernyms. The 5 sample hypernyms are first spliced into a hypernym text, i.e., one piece of text, and word segmentation is performed on it; the word sequence of the word segments can be expressed as W' = {w'_1, w'_2, ..., w'_k}, where k is the number of word segments in the hypernym text (k = 5 in this example), w'_i (1 ≤ i ≤ k) denotes the word sequence corresponding to the i-th word segment, and W' denotes the matrix formed by the word sequences. Based on the word sequences of the word segments, the text word vector matrix is determined, which can be expressed as X'_{1:L} = [x'_1; x'_2; ...; x'_k], where x'_i denotes the word vector corresponding to the i-th word segment, L denotes the dimension of each word vector, each row of the matrix is the word vector of one word segment, and the number of columns of the matrix is L. Then, based on the word vectors of the word segments, the second feature is obtained by the text feature extraction method described above and recorded as f' = [y'_2; y'_3; y'_4], where y'_2, y'_3 and y'_4 denote the features corresponding to the convolution kernels of lengths 2, 3 and 4 respectively, and f' denotes the second feature obtained by splicing y'_2, y'_3 and y'_4.
Further, the semantic matching result prediction model further includes a feature interaction layer, and determines, based on the first feature and the second feature, an association feature between the sample hypernym and the sample hyponym, which may specifically be:
and determining the association characteristics between the sample hyponyms and the sample hypernyms through the characteristic interaction layer based on the first characteristics and the second characteristics.
As an example, the first feature is denoted f and the second feature f'; the associated feature is obtained by the feature interaction layer combining f and f'.
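As a purely illustrative assumption (the disclosed interaction formulas may differ), a common interaction scheme that splices f and f' with their element-wise product and absolute difference can be sketched as:

import torch

def associated_feature(f, f_prime):
    # f: first feature of the sample description text; f_prime: second feature of the
    # sample hypernyms; both of shape (batch, d) with equal dimension d (an assumption).
    return torch.cat([f, f_prime, f * f_prime, (f - f_prime).abs()], dim=1)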
in an alternative of the present application, an alternative training mode may be adopted, where the two models (the hypernym prediction model and the semantic matching result prediction model) are trained alternately, specifically, each task training may be trained iteratively n times, where n is 50, and then the alternative training mode is adopted, and after the hypernym prediction model is trained x times, the semantic matching result prediction model is trained y times, where x is less than n, and y is less than n, and the training is stopped until each task training reaches n times. For the case of multi-task training, an alternating training mode is adopted, and the model parameters can be correspondingly adjusted based on each training result, so that the accuracy of the trained model is higher.
It should be noted that, in the solution of the present application, the two models may be trained in an alternating training manner, or may also be trained in a non-alternating training manner, for example, after training the hypernym prediction model n times, the semantic matching result prediction model is trained n times.
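A minimal sketch of the alternating schedule (the step counts x and y and the callback names are assumptions) is:

def alternate_training(train_hypernym_step, train_matching_step, n=50, x=5, y=5):
    # train_hypernym_step / train_matching_step: placeholder callbacks that each run
    # one training iteration of the corresponding task.
    done_h, done_m = 0, 0
    while done_h < n or done_m < n:
        for _ in range(min(x, n - done_h)):   # train the hypernym prediction model x times
            train_hypernym_step()
            done_h += 1
        for _ in range(min(y, n - done_m)):   # then the semantic matching model y times
            train_matching_step()
            done_m += 1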
In an alternative aspect of the present application, each sample hypernym of a training sample is determined by:
acquiring an initial sample hypernym set corresponding to each training sample, wherein the initial sample hypernym set comprises at least two initial correct hypernyms;
for each initial correct hypernym, determining the negative sampling probability of the initial correct hypernym based on the number of times that the initial correct hypernym appears in all initial sample hypernym sets;
for any training sample, determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, and taking other initial correct hypernyms except the initial correct hypernym corresponding to the at least one error hypernym in the initial sample hypernym set corresponding to the training sample as the correct hypernyms corresponding to the training sample.
Determining the sample hypernyms of a training sample means determining the correct hypernyms and the error hypernyms corresponding to that training sample. Each training sample corresponds to an initial sample hypernym set, which may include initial correct hypernyms and initial error hypernyms; to enrich the data of the initial sample hypernyms, the initial sample hypernym set includes at least two initial correct hypernyms.
When determining the sample hypernyms of the training samples, they may be determined manually or based on all the initial correct hypernyms corresponding to the sample hyponyms. Since the initial correct hypernyms corresponding to different sample hyponyms may repeat, i.e., the initial sample hypernym sets contain repeated initial correct hypernyms, the number of times each initial correct hypernym appears in all the initial sample hypernym sets can be counted and denoted N_i, representing the number of occurrences of the i-th initial correct hypernym in all the initial sample hypernym sets.
Then, based on the number of times each initial correct hypernym appears in all the initial sample hypernym sets, the negative sampling probability of each initial correct hypernym is determined by the sampling probability formula p_i = N_i / N, where N is the total number of initial correct hypernyms in all the initial sample hypernym sets.
Then, for any training sample, based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, at least one error hypernym corresponding to the training sample is determined, and the initial correct hypernyms except the initial correct hypernym corresponding to the at least one error hypernym in the initial sample hypernym set corresponding to the training sample are used as the correct hypernym corresponding to the training sample.
In an alternative of the present application, according to the magnitude of the negative sampling probability of each initial correct hypernym corresponding to the training sample, at least one wrong hypernym may be determined from the initial correct hypernyms with higher negative sampling probabilities (e.g., greater than a set threshold).
It is understood that the error hypernym corresponding to the training sample can be the correct hypernym corresponding to other training samples.
In the scheme of the application, for a training sample, the number of the error hypernyms and the number of the correct hypernyms may be the same, that is, if the number of the correct hypernyms corresponding to a sample hyponym in the training sample is k, the number of the error hypernyms corresponding to the sample hyponym is also k.
In an alternative aspect of the present application, determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample includes:
determining at least one of the initial correct hypernyms corresponding to the training sample as at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, and/or replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one error hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample.
When determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, at least one of the following methods may be included:
the first method comprises the following steps: and determining at least one of the initial correct hypernyms corresponding to the training sample as at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample.
As an example, the training sample corresponds to a initial correct hypernyms, where a is a positive integer not less than 2, and the initial correct hypernyms of the training sample refer to the initial correct hypernyms in all the initial sample hypernym sets. Among the a initial correct hypernyms, at least one can be determined to be a wrong hypernym based on the negative sampling probabilities of the a initial correct hypernyms. For example, according to the magnitude of the negative sampling probability, the initial correct hypernyms with a negative sampling probability greater than the set threshold may be selected as the wrong hypernyms corresponding to the training sample.
In this scheme, the number of correct hypernyms and the number of incorrect hypernyms corresponding to the training sample may be the same or different.
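A minimal Python sketch of this first manner, assuming the negative sampling probabilities have already been computed (e.g., as in the previous sketch) and using an illustrative threshold value:

    def select_wrong_hypernyms(sample_hypernyms, probs, threshold=0.2):
        # Mark the initial correct hypernyms whose negative sampling
        # probability exceeds the threshold as wrong hypernyms for this
        # training sample; the remaining ones stay correct. The threshold
        # value is illustrative only.
        wrong = [h for h in sample_hypernyms if probs[h] > threshold]
        correct = [h for h in sample_hypernyms if probs[h] <= threshold]
        return correct, wrong

    # Illustrative probabilities (e.g. produced by the previous sketch).
    probs = {"person": 0.5, "athlete": 0.17, "actor": 0.17, "singer": 0.16}
    correct, wrong = select_wrong_hypernyms(["person", "athlete"], probs)
    # correct == ["athlete"], wrong == ["person"]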
And the second method comprises the following steps: and replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one error hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample.
As an example, the training sample corresponds to b initial correct hypernyms, where b is a positive integer not less than 2, and the initial correct hypernyms of the training sample refer to the initial correct hypernyms in all the initial sample hypernym sets. In the b initial correct hypernyms, at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample may be replaced with at least one wrong hypernym based on the negative sampling probability of the b initial correct hypernyms.
In an alternative aspect of the present application, replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one wrong hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, includes:
based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, replacing a set number of initial correct hypernyms that have higher negative sampling probabilities in the initial sample hypernym set corresponding to the training sample with wrong hypernyms.
The set number is not more than the number of sample hypernyms in the initial sample hypernym set. The set number may be determined based on a set ratio, which may be configured based on actual demand. As an example, if the set ratio is 80%, the number of sample hypernyms in the initial sample hypernym set is 10, and all 10 are initial correct hypernyms, then based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, the 8 initial correct hypernyms with higher probabilities may be replaced with wrong hypernyms corresponding to the training sample, and the remaining 2 initial correct hypernyms may be used as the correct hypernyms corresponding to the training sample.
Replacing initial correct hypernyms in the initial sample hypernym set by a set number ensures differences among the wrong hypernyms corresponding to different training samples, that is, different training samples correspond to different wrong hypernyms, which improves the generalization capability of the model.
In this scheme, the number of correct hypernyms and the number of incorrect hypernyms corresponding to the training sample may be the same or different.
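A rough Python sketch of this second manner (replacement by a set ratio), in which the set ratio, the pool of wrong hypernyms drawn from other samples, and all example values are assumptions made for illustration:

    def replace_by_ratio(sample_hypernyms, probs, wrong_pool, ratio=0.8):
        # Replace the share of initial correct hypernyms given by the set
        # ratio, picking the ones with the highest negative sampling
        # probabilities; wrong_pool (hypernyms of other samples) and ratio
        # are illustrative assumptions.
        k = int(len(sample_hypernyms) * ratio)            # e.g. 8 of 10
        ranked = sorted(sample_hypernyms, key=lambda h: probs[h], reverse=True)
        to_replace = set(ranked[:k])
        correct = [h for h in sample_hypernyms if h not in to_replace]
        wrong = wrong_pool[:k]
        return correct, wrong

    # Illustrative usage (set ratio 50% here so the lists stay short).
    probs = {"person": 0.4, "athlete": 0.3, "actor": 0.2, "singer": 0.1}
    correct, wrong = replace_by_ratio(
        ["person", "athlete", "actor", "singer"], probs,
        wrong_pool=["place", "organization", "event"], ratio=0.5)
    # correct == ["actor", "singer"], wrong == ["place", "organization"]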
And thirdly, determining at least one of the initial correct hypernyms corresponding to the training sample as at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, and replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one error hypernym.
As an example, the training sample corresponds to c initial correct hypernyms, where c is a positive integer not less than 2, and the initial correct hypernyms of the training sample refer to the initial correct hypernyms in all the initial sample hypernym sets. In the c initial correct hypernyms, at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample may be replaced with at least one error hypernym based on the negative sampling probability of the c initial correct hypernyms, and at the same time, at least one initial correct hypernym is determined from the c initial correct hypernyms as an error hypernym.
At this time, if the number of correct hypernyms and the number of wrong hypernyms corresponding to the training sample are the same, the sum of the number of wrong hypernyms determined by the replacement method and the number of wrong hypernyms determined by the non-replacement method is equal to the number of correct hypernyms.
Also in this example, the initial correct hypernym having a negative sampling probability greater than the set threshold may be selected as the wrong hypernym by the magnitude of the negative sampling probability, and the initial correct hypernym having a negative sampling probability greater than the set threshold may be replaced with the wrong hypernym.
In an alternative of the present application, for any training sample, if there are at least two same probabilities in the negative sampling probabilities of each initial correct hypernym, and the negative sampling probability is greater than a set threshold;
determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, including:
and determining at least one of the initial correct hypernyms corresponding to the at least two same probabilities as at least one error hypernym.
Wherein, in the negative sampling probability of each initial correct hypernym, if the same negative sampling probability exists, it indicates that the times of occurrence of at least two initial correct hypernyms are the same in all the initial sample hypernym sets. When the same probability is greater than the set threshold, it indicates that at least one wrong hypernym needs to be determined from the initial correct hypernyms corresponding to the same probability, and at this time, based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, the determined at least one wrong hypernym corresponding to the training sample includes the initial correct hypernyms corresponding to the at least two same probabilities.
It can be understood that, if there are at least two same probabilities in the negative sampling probabilities of the initial correct hypernyms for any training sample, and the negative sampling probability is not greater than the set threshold; the erroneous hypernym is not determined from the at least two initial correct hypernyms corresponding to the same probability.
When at least two same probabilities exist in the negative sampling probabilities of the initial correct hypernyms and that probability is greater than the set threshold, the wrong hypernyms can be determined, with a certain probability, from the initial correct hypernyms corresponding to the at least two same probabilities. In this way, when the wrong hypernyms of at least two training samples are determined based on the initial correct hypernyms corresponding to the same probabilities, the differences between the wrong hypernyms corresponding to those training samples can still be ensured.
It can be understood that, for any training sample, when there are at least two same probabilities in the negative sampling probabilities of the initial correct hypernyms and that probability is greater than the set threshold, this handling is applicable to any of the above manners of determining the sample hypernyms of the training sample.
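By way of illustration only, the handling of equal negative sampling probabilities described above may be sketched as follows; the random splitting strategy is an assumption and merely illustrates determining at least one wrong hypernym from the tied initial correct hypernyms with a certain probability.

    import random

    def pick_wrong_from_ties(tied_hypernyms):
        # tied_hypernyms: initial correct hypernyms sharing the same negative
        # sampling probability above the set threshold. At least one of them
        # is randomly marked as a wrong hypernym, so different training
        # samples can end up with different wrong hypernyms (the random split
        # is an assumed strategy, not the only possible one).
        shuffled = list(tied_hypernyms)
        random.shuffle(shuffled)
        split = random.randint(1, len(shuffled))   # how many become wrong
        return shuffled[:split]

    wrong = pick_wrong_from_ties(["creature", "animal"])   # e.g. ["animal"]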
With reference to fig. 2 and the following specific examples, the training process of the hypernym determination model of the present application is described in detail, and the specific scheme is as follows:
step 1, constructing training samples, wherein one training sample comprises a sample hyponym, a sample description text corresponding to the sample hyponym, and each sample hypernym corresponding to the sample hyponym.
In constructing the training samples, one sample hyponym may correspond to a plurality of sample hypernyms, in this example, one sample hyponym corresponds to k sample hypernyms, and each sample hypernym includes at least one correct hypernym and at least one incorrect hypernym.
Step 2, after the training samples are constructed, the text features of the sample description texts corresponding to the sample hyponyms can be extracted through the text feature extraction method described above. In this example, as shown in fig. 2, the sample description texts can be subjected to word segmentation and word embedding processing, the features of the processed word segmentation texts are extracted through a text classification model TextCNN (Text Convolutional Neural Network), and the extracted text features are used as the bottom-layer features for subsequently training the hypernym prediction model and the semantic matching result prediction model.
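By way of illustration only, a minimal TextCNN feature extractor of the kind described in step 2 may be sketched with PyTorch as follows; all dimensions, kernel sizes and other hyperparameters are illustrative assumptions rather than values from the original disclosure.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextCNNEncoder(nn.Module):
        # Word embeddings are passed through convolutions of several kernel
        # sizes (features of at least two scales); the pooled outputs are
        # concatenated (spliced) into one text feature vector.
        def __init__(self, vocab_size=10000, embed_dim=128,
                     num_filters=64, kernel_sizes=(2, 3, 4)):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])

        def forward(self, token_ids):                 # (batch, seq_len)
            x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
            x = x.transpose(1, 2)                     # (batch, embed_dim, seq_len)
            pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return torch.cat(pooled, dim=1)           # (batch, num_filters * 3)

    encoder = TextCNNEncoder()
    features = encoder(torch.randint(0, 10000, (8, 50)))   # 8 texts, 50 tokens each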
And step 3, for the sample hypernyms (the hypernyms shown in fig. 2) in the training samples, a first label (the hypernym shown in fig. 2) is marked on each sample hypernym, wherein the first label characterizes the hypernym labeling result of whether the sample hypernym is a hypernym corresponding to the sample hyponym.
Step 4, training the hypernym prediction model of the initial neural network model (the multi-label classification task shown in fig. 2) based on the text features in step 2 until the total loss function of the initial neural network model converges, wherein the total loss function comprises a first loss function (the cross entropy loss shown in fig. 2), and the hypernym prediction model at the end of training is used as the hypernym determination model. In this example, the output of the hypernym prediction model is the hypernym prediction result of the sample hyponym of the training sample with respect to each candidate hypernym (all sample hypernyms), and the value of the first loss function characterizes the difference between the hypernym labeling result and the hypernym prediction result corresponding to the sample hyponym.
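By way of illustration only, the multi-label classification head and its cross-entropy loss in step 4 may be sketched as follows; the dimensions are illustrative placeholders, and the text features would come from a feature extractor such as the TextCNN sketch above.

    import torch
    import torch.nn as nn

    # Multi-label hypernym prediction head with a per-label cross entropy
    # (binary cross entropy over all candidate hypernyms).
    num_hypernyms, feature_dim, batch = 500, 192, 8
    hypernym_head = nn.Linear(feature_dim, num_hypernyms)
    criterion = nn.BCEWithLogitsLoss()

    text_features = torch.randn(batch, feature_dim)               # bottom-layer features
    labels = torch.randint(0, 2, (batch, num_hypernyms)).float()  # first labels (0/1)

    logits = hypernym_head(text_features)       # hypernym prediction results
    first_loss = criterion(logits, labels)      # gap between labeling and prediction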
And step 5, labeling the sample hypernym with a second label, wherein the second label characterizes the semantic matching labeling result of the sample hypernym and the sample description text, and the initial neural network model further comprises a semantic matching result prediction model.
For each sample hypernym in a training sample, word segmentation and word embedding processing are performed on the hypernym text corresponding to each sample hypernym (a section of text obtained by splicing the sample hypernyms), and the features of the processed word segmentation text (second features) are extracted through the text classification model TextCNN. The text features extracted in step 2 are used as first features. Based on the first features and the second features, the association features between the sample description text and the sample hypernyms are determined through the feature interaction layer, and the semantic matching result prediction model (the semantic matching task shown in fig. 2) is trained based on the association features. The output of the semantic matching result prediction model is the semantic matching prediction result of the sample description text of the training sample and each candidate hypernym (the semantic matching result between the sample description text in one training sample and the sample hypernyms in that training sample). The total loss function further includes a second loss function (semantic similarity loss), and the value of the second loss function characterizes the difference between the semantic matching labeling result and the semantic matching prediction result corresponding to the sample description text.
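By way of illustration only, the semantic matching result prediction with a feature interaction layer in step 5 may be sketched as follows; the specific interaction (concatenation with an element-wise product) and all dimensions are assumptions introduced for illustration.

    import torch
    import torch.nn as nn

    class SemanticMatchingHead(nn.Module):
        # The description-text feature (first feature) and the hypernym-text
        # feature (second feature) interact; the head predicts whether the
        # pair matches. The interaction used here is an assumed example.
        def __init__(self, feature_dim=192):
            super().__init__()
            self.scorer = nn.Sequential(
                nn.Linear(feature_dim * 3, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, first_feature, second_feature):
            interaction = torch.cat(
                [first_feature, second_feature, first_feature * second_feature],
                dim=1)
            return self.scorer(interaction).squeeze(1)   # one matching logit per pair

    match_head = SemanticMatchingHead()
    desc_feat = torch.randn(8, 192)                      # first features
    hyper_feat = torch.randn(8, 192)                     # second features
    match_labels = torch.randint(0, 2, (8,)).float()     # second labels
    second_loss = nn.BCEWithLogitsLoss()(match_head(desc_feat, hyper_feat), match_labels)
    # total loss = first loss (multi-label task) + second loss (matching task)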
It should be noted that the execution sequence of the above steps 4 and 5 is not limited, and the configuration may be performed based on actual requirements, for example, the steps 4 and 5 may be executed simultaneously to accelerate the model training speed.
In this example, the two models (the hypernym prediction model and the semantic matching result prediction model) may be trained in an alternating training manner; specifically, each task may be iteratively trained 50 times to obtain the trained hypernym determination model.
Based on the same principle as the method shown in fig. 1, the embodiment of the present application further provides an hypernym determining apparatus 20 for a hyponym, as shown in fig. 3, the hypernym determining apparatus 20 for a hyponym may include an information to be processed obtaining module 210, an information processing module 220, and a target hypernym determining module 230, where:
a to-be-processed information obtaining module 210, configured to obtain a description text corresponding to a to-be-processed hyponym;
the information processing module 220 is configured to determine, based on the description text corresponding to the to-be-processed hyponym, a probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set;
and a target hypernym determining module 230, configured to determine, based on the probability corresponding to each candidate hypernym, a target hypernym corresponding to the to-be-processed hyponym from each candidate hypernym.
According to the scheme of the embodiment of the application, when the hypernym of the hyponym to be processed needs to be determined, the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set can be determined based on the description text corresponding to the hyponym to be processed; the description text is not required to meet any set rule, and the target hypernym of the hyponym to be processed can be determined from the candidate hypernyms based on the probability corresponding to each candidate hypernym.
Optionally, the apparatus further comprises:
the to-be-processed hyponym acquisition module is used for acquiring to-be-processed hyponyms;
when determining, based on the description text corresponding to the hyponym to be processed, the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set, the information processing module 220 is specifically configured to:
and determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the hyponym to be processed and the description text corresponding to the hyponym to be processed.
Optionally, when determining, based on the description text corresponding to the hyponym to be processed, the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set, the information processing module 220 is specifically configured to:
performing word segmentation processing on the description text to obtain each word segmentation corresponding to the description text;
extracting features of at least two scales based on the word vector of each participle;
and splicing the features of all scales to obtain the text features corresponding to the description text.
Optionally, when determining, based on the description text corresponding to the hyponym to be processed, the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set, the information processing module 220 is specifically configured to:
determining the probability of the to-be-processed hyponym corresponding to each candidate hypernym in the hypernym set through a hypernym determination model based on the description text corresponding to the to-be-processed hyponym;
the device also comprises a model training module, wherein the model training module is used for training the hypernym determination model, and the hypernym determination model is obtained by training in the following way:
acquiring training samples, wherein the training samples comprise sample description texts corresponding to sample hyponyms and sample hypernyms corresponding to the sample hyponyms, the sample hypernyms comprise at least one correct hypernym and at least one error hypernym, each sample hypernym is marked with a first label, the first label characterizes the hypernym labeling result of whether the sample hypernym is a hypernym corresponding to the sample hyponym, and the hypernym set comprises each sample hypernym;
training an hypernym prediction model of the initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample until a total loss function of the initial neural network model converges, wherein the total loss function comprises a first loss function, and the hypernym prediction model at the end of training is used as a hypernym determination model;
the output of the hypernym prediction model is that the sample description text of the training sample corresponds to the hypernym prediction result of each candidate hypernym, and the value of the first loss function represents the difference between the hypernym labeling result and the hypernym prediction result corresponding to the sample description text.
Optionally, the training samples further include sample hyponyms, and the model training module is specifically configured to, when training the hypernym prediction model of the initial neural network model based on the sample description text corresponding to the sample hyponyms of the training samples:
and training the hypernym prediction model of the initial neural network model based on the sample hyponyms of the training samples and the sample description texts corresponding to the sample hyponyms.
Optionally, for a training sample, the sample hypernym is further labeled with a second label, the second label represents a semantic matching labeling result of the sample hypernym and the sample description text, and the initial neural network model further includes a semantic matching result prediction model;
when the model training module trains the hypernym prediction model of the initial neural network model based on the sample description text corresponding to the hyponym of each training sample, the model training module is specifically configured to:
training an hypernym prediction model of the initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample;
training a semantic matching result prediction model based on each training sample;
the output of the semantic matching result prediction model is a semantic matching prediction result of a sample description text of a training sample and each candidate hypernym, the total loss function further comprises a second loss function, and the value of the second loss function represents the difference between a semantic matching labeling result and the semantic matching prediction result corresponding to the sample description text.
Optionally, the hypernym prediction model and the semantic matching result prediction model are trained in an alternative training mode.
Optionally, when the model training module trains the semantic matching result prediction model based on each training sample, the model training module is specifically configured to:
for any training sample, respectively extracting a first feature of a sample description text in the training sample and a second feature corresponding to a hypernym of each sample in the training sample;
and determining the associated features between the hyponyms of the samples and the hypernyms of the samples through a semantic matching result prediction model based on the first features and the second features, and obtaining the semantic matching prediction result corresponding to the training sample based on the associated features.
Optionally, the apparatus further includes a sample hypernym determining module, configured to determine each sample hypernym of the training sample, where each sample hypernym of the training sample is determined in the following manner:
acquiring an initial sample hypernym set corresponding to each training sample, wherein the initial sample hypernym set comprises at least two initial correct hypernyms;
for each initial correct hypernym, determining the negative sampling probability of the initial correct hypernym based on the number of times that the initial correct hypernym appears in all initial sample hypernym sets;
for any training sample, determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample;
and taking other initial correct hypernyms except the initial correct hypernym corresponding to at least one error hypernym in the initial sample hypernym set corresponding to the training sample as the correct hypernym corresponding to the training sample.
Optionally, the sample hypernym determining module is specifically configured to, when determining at least one wrong hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample:
determining at least one of the initial correct hypernyms corresponding to the training sample as at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, and/or replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one error hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample.
Optionally, the sample hypernym determining module is specifically configured to, when replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with at least one wrong hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample:
replacing, based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, a set number of initial correct hypernyms that have higher negative sampling probabilities in the initial sample hypernym set corresponding to the training sample with wrong hypernyms.
Optionally, for any training sample, if at least two identical probabilities exist in the negative sampling probability of each initial correct hypernym, and the negative sampling probability is greater than a set threshold;
the sample hypernym determination module is specifically configured to, when determining at least one erroneous hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample:
and determining at least one of the initial correct hypernyms corresponding to the at least two same probabilities as at least one error hypernym.
Optionally, for a training sample, the number of erroneous hypernyms is the same as the number of correct hypernyms.
Since the hypernym determining apparatus for a hyponym provided in the embodiment of the present application is an apparatus capable of executing the hypernym determining method for a hyponym in the embodiment of the present application, a person skilled in the art can understand, based on the hypernym determining method provided in the embodiment of the present application, the specific implementation manner and various variations of the hypernym determining apparatus; therefore, how the apparatus implements the method is not described in detail here. Any apparatus adopted by a person skilled in the art to implement the hypernym determining method in the embodiment of the present application falls within the protection scope of the present application.
Based on the same principle as the hypernym determination method and the hypernym determination device for the hyponym provided in the embodiments of the present application, an embodiment of the present application also provides an electronic device, which may include a processor and a memory. The memory stores therein readable instructions, which when loaded and executed by the processor, may implement the method shown in any of the embodiments of the present application.
As an example, fig. 4 shows a schematic structural diagram of an electronic device 4000 to which the solution of the embodiment of the present application is applied, and as shown in fig. 4, the electronic device 4000 includes: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application specific integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (extended industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The Memory 4003 may be a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in any of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations should also be regarded as the protection scope of the present application.

Claims (15)

1. A hypernym determination method for a hyponym, comprising:
obtaining a description text corresponding to a hyponym to be processed;
determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the hyponym to be processed;
and determining a target hypernym corresponding to the hyponym to be processed from each candidate hypernym based on the corresponding probability of each candidate hypernym.
2. The method of claim 1, further comprising:
acquiring the hyponym to be processed;
the determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the hyponym to be processed includes:
and determining the probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set based on the to-be-processed hyponym and the description text corresponding to the to-be-processed hyponym.
3. The method according to claim 1, wherein the determining the probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the to-be-processed hyponym comprises:
performing word segmentation processing on the description text to obtain each word segmentation corresponding to the description text;
extracting features of at least two scales based on the word vector of each word segmentation;
and splicing the features of all the scales to obtain the text features corresponding to the description text.
4. The method according to any one of claims 1 to 3, wherein the determining the probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the to-be-processed hyponym comprises:
determining the probability that the hyponym to be processed corresponds to each candidate hypernym in the hypernym set through a hypernym determination model based on the description text corresponding to the hyponym to be processed;
wherein the hypernym determination model is trained on the basis of the following modes:
obtaining training samples, wherein the training samples comprise sample description texts corresponding to sample hyponyms and sample hypernyms corresponding to the sample hyponyms, the sample hypernyms comprise at least one correct hypernym and at least one error hypernym, each sample hypernym is marked with a first label, the first label characterizes the hypernym labeling result of whether the sample hypernym is a hypernym corresponding to the sample hyponym, and the hypernym set comprises the sample hypernyms;
training an hypernym prediction model of an initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample until a total loss function of the initial neural network model converges, wherein the total loss function comprises a first loss function, and the hypernym prediction model at the end of training is used as the hypernym determination model;
the output of the hypernym prediction model is that the sample description text of the training sample corresponds to the hypernym prediction result of each candidate hypernym, and the value of the first loss function represents the difference between the hypernym labeling result and the hypernym prediction result corresponding to the sample description text.
5. The method of claim 4, wherein the training samples further include sample hyponyms, and the training of the hypernym prediction model of the initial neural network model based on the sample description texts corresponding to the sample hyponyms of the training samples comprises:
and training an hypernym prediction model of the initial neural network model based on the sample hyponyms of the training samples and the sample description texts corresponding to the sample hyponyms.
6. The method of claim 4, wherein for one of the training samples, the sample hypernym is further labeled with a second label, the second label characterizes a semantic matching labeling result of the sample hypernym and the sample description text, and the initial neural network model further comprises a semantic matching result prediction model;
training an hypernym prediction model of an initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample, including:
training an hypernym prediction model of the initial neural network model based on a sample description text corresponding to a sample hyponym of each training sample;
training the semantic matching result prediction model based on each training sample;
the output of the semantic matching result prediction model is the semantic matching prediction result of the sample description text of the training sample and each candidate hypernym, the total loss function further comprises a second loss function, and the value of the second loss function represents the difference between the semantic matching labeling result and the semantic matching prediction result corresponding to the sample description text.
7. The method of claim 6, wherein training the semantic matching result prediction model based on each of the training samples comprises:
for any training sample, respectively extracting a first feature of a sample description text in the training sample and a second feature corresponding to a hypernym of each sample in the training sample;
and determining the associated features between the sample hyponyms and the hypernyms of the samples through the semantic matching result prediction model based on the first features and the second features, and obtaining the semantic matching prediction result corresponding to the training sample based on the associated features.
8. The method of claim 6, wherein the hypernym prediction model and the semantic matching result prediction model are trained in an alternating training manner.
9. The method of claim 4, wherein each sample hypernym of the training samples is determined by:
acquiring an initial sample hypernym set corresponding to each training sample, wherein the initial sample hypernym set comprises at least two initial correct hypernyms;
for each initial correct hypernym, determining the negative sampling probability of the initial correct hypernym based on the number of times that the initial correct hypernym appears in all initial sample hypernym sets;
for any training sample, determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample;
and taking other initial correct hypernyms except the initial correct hypernym corresponding to the at least one error hypernym in the initial sample hypernym set corresponding to the training sample as the correct hypernym corresponding to the training sample.
10. The method of claim 9, wherein determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample comprises:
determining at least one of the initial correct hypernyms corresponding to the training sample as at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, and/or replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with the at least one error hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample.
11. The method according to claim 10, wherein replacing at least one initial correct hypernym in the initial sample hypernym set corresponding to the training sample with the at least one wrong hypernym based on the negative sampling probability of each initial correct hypernym corresponding to the training sample comprises:
replacing, based on the negative sampling probability of each initial correct hypernym corresponding to the training sample, a set number of initial correct hypernyms that have higher negative sampling probabilities in the initial sample hypernym set corresponding to the training sample with the wrong hypernyms.
12. The method according to claim 9, wherein for any of the training samples, if there are at least two same probabilities in the negative sampling probabilities of the initial correct hypernyms, and the negative sampling probability is greater than a set threshold;
the determining at least one error hypernym corresponding to the training sample based on the negative sampling probability of each initial correct hypernym corresponding to the training sample includes:
and determining at least one of the initial correct hypernyms corresponding to the at least two same probabilities as the at least one error hypernym.
13. The method of claim 4, wherein the number of erroneous hypernyms is the same as the number of correct hypernyms for one of the training samples.
14. An hypernym determination apparatus for a hyponym, comprising:
the information processing device comprises a to-be-processed information acquisition module, a processing module and a processing module, wherein the to-be-processed information acquisition module is used for acquiring a description text corresponding to a to-be-processed hyponym;
the information processing module is used for determining the probability that the to-be-processed hyponym corresponds to each candidate hypernym in the hypernym set based on the description text corresponding to the to-be-processed hyponym;
and the target hypernym determining module is used for determining the target hypernym corresponding to the hyponym to be processed from each candidate hypernym based on the corresponding probability of each candidate hypernym.
15. An electronic device comprising a memory and a processor;
the memory has stored therein a computer program;
the processor for executing the computer program to implement the method of any one of claims 1 to 13.
CN202010430946.7A 2020-05-20 2020-05-20 Hypernym determination method and device for hyponym, electronic device and storage medium Pending CN111611796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430946.7A CN111611796A (en) 2020-05-20 2020-05-20 Hypernym determination method and device for hyponym, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430946.7A CN111611796A (en) 2020-05-20 2020-05-20 Hypernym determination method and device for hyponym, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN111611796A true CN111611796A (en) 2020-09-01

Family

ID=72205033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430946.7A Pending CN111611796A (en) 2020-05-20 2020-05-20 Hypernym determination method and device for hyponym, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111611796A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114020880A (en) * 2022-01-06 2022-02-08 杭州费尔斯通科技有限公司 Method, system, electronic device and storage medium for extracting hypernym
CN115879468A (en) * 2022-12-30 2023-03-31 北京百度网讯科技有限公司 Text element extraction method, device and equipment based on natural language understanding
CN115879468B (en) * 2022-12-30 2023-11-14 北京百度网讯科技有限公司 Text element extraction method, device and equipment based on natural language understanding

Similar Documents

Publication Publication Date Title
CN111737476B (en) Text processing method and device, computer readable storage medium and electronic equipment
CN112966074B (en) Emotion analysis method and device, electronic equipment and storage medium
US20190130249A1 (en) Sequence-to-sequence prediction using a neural network model
CN112183577A (en) Training method of semi-supervised learning model, image processing method and equipment
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN113095415B (en) Cross-modal hashing method and system based on multi-modal attention mechanism
CN111738001B (en) Training method of synonym recognition model, synonym determination method and equipment
CN112711953A (en) Text multi-label classification method and system based on attention mechanism and GCN
CN110457718B (en) Text generation method and device, computer equipment and storage medium
CN111782826A (en) Knowledge graph information processing method, device, equipment and storage medium
US11900250B2 (en) Deep learning model for learning program embeddings
CN111881671B (en) Attribute word extraction method
CN112100377B (en) Text classification method, apparatus, computer device and storage medium
CN110245353B (en) Natural language expression method, device, equipment and storage medium
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN114358203A (en) Training method and device for image description sentence generation module and electronic equipment
US11948078B2 (en) Joint representation learning from images and text
CN111611796A (en) Hypernym determination method and device for hyponym, electronic device and storage medium
CN111241271B (en) Text emotion classification method and device and electronic equipment
KR102448044B1 (en) Aspect based sentiment analysis method using aspect map and electronic device
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
CN116306606A (en) Financial contract term extraction method and system based on incremental learning
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN115359296A (en) Image recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40028898

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination