CN111310475B - Training method and device of word sense disambiguation model - Google Patents

Info

Publication number: CN111310475B (application number CN202010079725.XA)
Authority: CN (China)
Prior art keywords: word, text, node, semantic, graph
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111310475A
Inventor: 钱隽夫
Assignee (current and original): Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N3/044: Neural networks; Architecture, e.g. interconnection topology; Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08: Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of this specification provide a method and apparatus for training a word sense disambiguation model. A first word is selected from a training text, and positive and negative example samples corresponding to the first word are obtained. The similarity between each word in the training text and the word represented by each node of each semantic association graph is calculated, and a target association graph is selected based on the similarity. The semantic vector of the first word is determined based on the target association graph, and the word vectors of the other words are determined based on a word co-occurrence graph; the training text is then encoded with an encoder according to these vectors. For the two samples, the word vectors of their words are determined based on the word co-occurrence graph, and the samples are encoded with the encoder accordingly. Based on the encoding results, a first text distance between the training text and the positive example sample and a second text distance between the training text and the negative example sample are calculated, and the encoder is trained with the goal of making the first text distance smaller than the second text distance.

Description

Training method and device of word sense disambiguation model
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for training a word sense disambiguation model.
Background
Word sense disambiguation refers to automatically determining the current meaning of an ambiguous word based on its context. In conventional techniques, word sense disambiguation is generally performed with a supervised learning method. For example, based on the context C, the posterior probability P(s_i | C) of each sense s_i of the word to be disambiguated is obtained by supervised learning, and the sense with the maximum posterior probability, s_k = argmax_i P(s_i | C), is taken as the sense determined after disambiguation.
However, when word sense disambiguation is performed by this method, the disambiguation result is often not accurate enough. Therefore, there is a need to provide a more accurate word sense disambiguation method.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and an apparatus for training a word sense disambiguation model, where the trained word sense disambiguation model can more accurately implement word sense disambiguation.
In a first aspect, a method for training a word sense disambiguation model is provided, including:
acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a first semantic vector;
selecting a polysemous first word from a training text;
acquiring a first explanation text and a second explanation text of the first word; wherein the first interpretation text is used for interpreting the word sense of the first word corresponding to the training text, and the second interpretation text is used for interpreting other word senses of the first word;
for the training text, calculating the similarity between each word in the training text and the word represented by each node in each semantic association diagram, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
determining semantic vectors of the first words at least based on first semantic vectors corresponding to the nodes in the target association diagram, and determining word vectors of other words based on word vectors corresponding to the nodes in the word co-occurrence diagram; encoding the training text by using the encoder according to the semantic vector of the first word and the word vectors of other words;
respectively determining word vectors of words in the first interpretation text and the second interpretation text based on the word vectors corresponding to the nodes in the word co-occurrence graph; respectively encoding the first interpretation text and the second interpretation text by using the encoder according to the word vectors of the words in the first interpretation text and the second interpretation text;
calculating a first text distance between the training text and the first interpretation text and calculating a second text distance between the training text and the second interpretation text based on the encoding result;
and training the encoder with the goal that the first text distance is smaller than the second text distance.
In a second aspect, a word sense disambiguation method is provided, comprising:
acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a semantic vector;
acquiring a text to be disambiguated, and selecting a polysemous first word from the text to be disambiguated;
acquiring a plurality of interpretation texts of the first word, wherein each interpretation text is used for interpreting one word sense in a plurality of word senses of the first word;
for the text to be disambiguated, determining word vectors of words in the text to be disambiguated based on the word vectors corresponding to the nodes in the word co-occurrence graph; encoding the text to be disambiguated by utilizing an encoder in a pre-trained word sense disambiguation model according to the word vector of each word in the text to be disambiguated;
for each interpretation text, calculating the similarity between each word in the interpretation text and the word represented by each node in each semantic association diagram, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
determining semantic vectors of words in the interpretation text based on the semantic vectors corresponding to the nodes in the target association graph; encoding the interpretation text by utilizing the encoder according to the semantic vector of each word in the interpretation text;
determining a text distance between the text to be disambiguated and the plurality of interpreted texts based on the encoding result;
selecting a target text with the minimum corresponding text distance from the plurality of interpretation texts;
determining a word sense of the first word in the text to be disambiguated based on the target text.
In a third aspect, a training apparatus for word sense disambiguation model is provided, including:
the acquisition unit is used for acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a first semantic vector;
the selecting unit is used for selecting a polysemous first word from the training text;
the obtaining unit is further configured to obtain a first interpretation text and a second interpretation text of the first word selected by the selecting unit; wherein the first interpretation text is used for interpreting the word sense of the first word corresponding to the training text, and the second interpretation text is used for interpreting other word senses of the first word;
the calculation unit is used for calculating the similarity between each word in the training text and the word represented by each node in each semantic association diagram for the training text, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
the determining unit is used for determining the semantic vector of the first word at least based on the first semantic vector corresponding to each node in the target association diagram, and determining the word vectors of other words based on the word vectors corresponding to each node in the word co-occurrence diagram; encoding the training text by using the encoder according to the semantic vector of the first word and the word vectors of other words;
the determining unit is further configured to determine word vectors of words in the first interpretation text and the second interpretation text respectively based on the word vectors corresponding to the nodes in the word co-occurrence graph; respectively encoding the first interpretation text and the second interpretation text by using the encoder according to the word vectors of the words in the first interpretation text and the second interpretation text;
the calculating unit is further configured to calculate a first text distance between the training text and the first interpretation text and calculate a second text distance between the training text and the second interpretation text based on the encoding result determined by the determining unit;
and the training unit is used for training the encoder with the goal that the first text distance calculated by the calculating unit is smaller than the second text distance.
In a fourth aspect, a word sense disambiguation apparatus is provided, comprising:
the acquisition unit is used for acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a semantic vector;
the acquiring unit is further used for acquiring a text to be disambiguated and selecting a polysemous first word from the text to be disambiguated;
the obtaining unit is further configured to obtain a plurality of interpretation texts of the first word, where each interpretation text is used to interpret one of a plurality of word senses of the first word;
a determining unit, configured to determine, for the text to be disambiguated obtained by the obtaining unit, word vectors of words in the text to be disambiguated based on word vectors corresponding to nodes in the word co-occurrence graph; and to encode the text to be disambiguated by utilizing an encoder in a pre-trained word sense disambiguation model according to the word vector of each word in the text to be disambiguated;
the calculation unit is used for calculating the similarity between each term in the interpretation text and each term represented by each node in each semantic association diagram for each interpretation text acquired by the acquisition unit, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
the determining unit is further configured to determine a semantic vector of each word in the interpretation text based on the semantic vector corresponding to each node in the target association graph, and to encode the interpretation text by utilizing the encoder according to the semantic vector of each word in the interpretation text;
the determining unit is further used for determining text distances between the text to be disambiguated and the plurality of explanatory texts based on the encoding result;
the selecting unit is used for selecting a target text with the minimum corresponding text distance from the plurality of interpretation texts;
the determining unit is further configured to determine a word sense of the first word in the text to be disambiguated based on the target text selected by the selecting unit.
In a fifth aspect, there is provided a computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
In a sixth aspect, there is provided a computing device comprising a memory having stored therein executable code, and a processor that when executing the executable code, implements the method of the first or second aspect.
According to the training method and apparatus for a word sense disambiguation model provided by one or more embodiments of this specification, the training text may be encoded by an encoder according to the semantic vector of a polysemous target word and the word vectors of the other words. Because the semantic vector of the target word represents a single word sense, encoding the training text based on it greatly improves the accuracy with which the training text is expressed. Furthermore, when the training text can be accurately expressed, the trained word sense disambiguation model becomes more accurate, so that word sense disambiguation can be realized more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present disclosure, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for training a word sense disambiguation model provided in an embodiment of the present description;
FIG. 3 is a schematic diagram of a word co-occurrence graph provided in the present specification;
FIG. 4a is a schematic diagram of an initial segmentation graph of a word co-occurrence graph provided in the present specification;
FIG. 4b is a schematic diagram of a semantic association graph provided in the present specification;
FIG. 4c is a second schematic diagram of a semantic association graph provided in the present specification;
FIG. 5 is a flow diagram of a word sense disambiguation method provided by one embodiment of the present description;
FIG. 6 is a diagram illustrating an apparatus for training a word sense disambiguation model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a word sense disambiguation apparatus according to an embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As described in the background, when word sense disambiguation is performed based on a supervised learning method, the disambiguation result is often not accurate enough. To improve the accuracy of the disambiguation result, those skilled in the art have also tried two further methods: a) Word sense disambiguation based on the sense definitions in a dictionary. Specifically, if the definition of sense s_i of a word w in the dictionary contains the word e, and e also appears in a sentence containing w, then the sense of w in that sentence is considered to be s_i. However, this method depends strongly on the dictionary's definitions: if e does not appear in the sentence and only synonyms of e appear, no result can be obtained. b) Word sense disambiguation based on topics. Specifically, for each sense of each word, a corresponding topic or category may be defined manually (e.g., "tennis" corresponds to the topic "sports"), so that multiple senses correspond to multiple topics. Among the topics of the words in the context C of w, the most frequent topic is selected as the current topic of w, and once the topic of w is determined, the corresponding sense can be determined. However, since the definition of topics relies on a great deal of manual work, if new words appear in the context, little topic information is available in the context, and accurate disambiguation is likewise impossible.
In summary, the conventional word sense disambiguation methods are all low in accuracy, and therefore the inventor of the present application proposes a word sense disambiguation method based on a word sense disambiguation model. In the method, the word sense disambiguation model may be trained first, and then word sense disambiguation may be performed based on the trained word sense disambiguation model.
The training process of the word sense disambiguation model, which may be as shown in fig. 1, may include the following steps:
1) Construct a word co-occurrence graph (co-occurrence graph for short). For example, a co-occurrence graph may be constructed based on the co-occurrence relationships between words in a text corpus; co-occurrence here means that two words appear in the same text. Each node in the constructed co-occurrence graph represents a word and corresponds to a word vector, which represents the average word sense of the corresponding word. The word vector can be obtained by a graph node vectorization method or a word vectorization method.
2) Segment the word co-occurrence graph to obtain semantic association graphs. Graph partitioning here refers to mapping an original graph (e.g., the word co-occurrence graph) into a plurality of subgraphs associated with it. For example, the EGO-SPLITTING algorithm (a graph partitioning algorithm) can be used to partition the word co-occurrence graph into multiple subgraphs with overlapping nodes (i.e., the semantic association graphs), where each node of a subgraph represents a single sense of a word.
3) Represent the nodes in the semantic association graphs as semantic vectors. Because the semantic association graphs are obtained by segmenting the word co-occurrence graph with a graph segmentation algorithm, each node in a semantic association graph corresponds to a word vector. The word vector corresponding to each node is initialized as its semantic vector. Then, for each node, the semantic vector of the node is adjusted based on the association relationships (e.g., similarity) between the node and its neighboring nodes (also called associated nodes) in the semantic association graph. It should be understood that the adjustment of the semantic vectors is performed iteratively; the termination condition is that the semantic vectors of two adjacent adjustments are similar or identical. After the iteration finishes, the current semantic vector of each node after the last adjustment is taken as its first semantic vector.
4) Train the word sense disambiguation model. For example, the word sense disambiguation model may be trained based on the word co-occurrence graph and the semantic association graphs described above. The word sense disambiguation model here may be, for example, a twin neural network, which may include one encoder or multiple encoders. The function of an encoder is to encode a sequence into a fixed-length vector; it may include, but is not limited to, a Recurrent Neural Network (RNN), a Long Short-Term Memory network (LSTM), a Convolutional Neural Network (CNN), and the like.
When the word sense disambiguation model includes multiple encoders, the multiple encoders have the same structure and parameters. The above-described training process of the word sense disambiguation model may also be understood as a process of adjusting parameters of the respective encoders. The training process of the word sense disambiguation model is described subsequently.
Fig. 2 is a flowchart of a training method of a word sense disambiguation model according to an embodiment of the present disclosure. The method may be executed by any device with processing capability, such as a server, a system, or an apparatus. As shown in fig. 2, the method may include:
step 202, acquiring a word co-occurrence graph and a plurality of semantic association graphs.
Here, acquiring the word co-occurrence diagram may be understood as reading a word co-occurrence diagram constructed in advance, or may be understood as constructing the word co-occurrence diagram in real time.
In the present specification, the word co-occurrence diagram may be constructed by the following steps:
Step a: for each word in the text corpus, take the word as the current node, determine from the text corpus the words appearing within the word's context window, and take the determined words as associated words.
Of course, in practical applications, before step a is executed, each text in the text corpus may be segmented into words, and each word may then be tagged with its part of speech, for example verb, noun, or adjective. After word segmentation and part-of-speech tagging have been performed on each text in the text corpus, step a may be executed.
In step a, the size of the context window may be set manually. It is understood that the context window may include the preceding N words and/or the following M words of the current word, where N and M are both positive integers.
Step b: take the associated words as associated nodes of the current node, and construct connecting edges between the current node and each associated node.
Step c: determine the weight of the connecting edge between a word and its associated word at least according to the distance between them.
Of course, in practical applications, the weights of the connecting edges may also be determined in combination with parts of speech. For example, when both words are verbs, a first weight smaller than a threshold may be set for the connecting edge; when one of the two words is an adjective and the other is a verb, a second weight larger than the threshold may be set for the connecting edge; and so on.
Step d: after a connecting edge has been established between each word and its associated nodes and its weight determined, the word co-occurrence graph is obtained.
It should be noted that steps a–c are performed iteratively; the process terminates once connecting edges have been established, and their weights determined, for every word.
In practical applications, after the word co-occurrence graph is obtained, the following filtering may further be performed: for example, nodes corresponding to stop words, together with their connecting edges, may be removed from the word co-occurrence graph; and/or connecting edges with weights less than a threshold may be eliminated. It will be appreciated that after this filtering, the finally used word co-occurrence graph is obtained.
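As a concrete illustration of steps a–d and the filtering pass, the following is a minimal sketch using networkx; the function name, window size, distance-based weight function, and thresholds are illustrative assumptions rather than this specification's exact choices.

```python
# Minimal sketch of steps a-d, assuming `tokenized_texts` is a list of
# token lists produced by word segmentation of the text corpus.
import networkx as nx

def build_cooccurrence_graph(tokenized_texts, window=3, stopwords=(), min_weight=0.5):
    g = nx.Graph()
    for tokens in tokenized_texts:
        for i, word in enumerate(tokens):                    # step a: current node
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                assoc = tokens[j]                            # associated word in window
                if assoc == word:
                    continue
                w = 1.0 / (j - i)                            # step c: closer => heavier
                if g.has_edge(word, assoc):                  # steps b/d: accumulate edges
                    g[word][assoc]["weight"] += w
                else:
                    g.add_edge(word, assoc, weight=w)
    # filtering: drop stop-word nodes and low-weight connecting edges
    g.remove_nodes_from([n for n in g if n in set(stopwords)])
    g.remove_edges_from([(u, v) for u, v, d in g.edges(data=True)
                         if d["weight"] < min_weight])
    return g
```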
In one example, the word co-occurrence map constructed by steps a-d above may be as shown in FIG. 3. In fig. 3, the word co-occurrence graph may include nodes: a-k, wherein each node represents a word and corresponds to a word vector, and the word vector is used for representing the average word sense of the corresponding word.
The word vectors can be obtained by a graph node vectorization method or a word vectorization method. The graph node vectorization method includes any one of Node2Vec, DeepWalk, LINE, and the like; the word vectorization method includes any one of Word2Vec, GloVe, and the like.
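As one concrete (and hedged) way to realize the Word2Vec option, the corpus and graph of the previous sketch can be reused with gensim; the library choice and all hyper-parameters here are illustrative.

```python
# Attach "average word sense" vectors to the graph nodes by training Word2Vec
# on the same corpus.
from gensim.models import Word2Vec

g = build_cooccurrence_graph(tokenized_texts)
model = Word2Vec(sentences=tokenized_texts, vector_size=128, window=5, min_count=1)
for node in g.nodes:
    g.nodes[node]["vec"] = model.wv[node]   # word vector of the node
```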
The above explains the process of constructing the word co-occurrence graph; the semantic association graphs obtained by segmenting it are explained below.
Similarly, the obtaining of the plurality of semantic association graphs may be understood as reading a pre-segmented semantic association graph, or may be understood as segmenting a constructed word co-occurrence graph to obtain a semantic association graph by using a graph segmentation algorithm in real time.
When the graph segmentation algorithm is the EGO-SPLITTING algorithm, the segmentation process for the word co-occurrence graph may be as follows. For each node in the word co-occurrence graph, perform a breadth-first search centered on the node to obtain the initial segmentation graph associated with the node. Perform maximum connected subgraph segmentation on the initial segmentation graph to obtain at least one connected component. If more than one connected component is obtained, split the node so that the split nodes correspond one-to-one to the connected components. Then construct connecting edges between each split node and its corresponding connected component based on the connecting edges of the node before splitting, thereby forming the semantic association graph related to the node.
Taking the word co-occurrence graph shown in fig. 3 as an example, when the current node is node a, the initial segmentation graph associated with the node may be as shown in fig. 4 a. After performing maximum connected subgraph segmentation on the initial segmentation graph, two connected components can be obtained: g-h-i and b-c-e. Since the number of the connected components is two (i.e. more than 1), the node a can be split to obtain the nodes a1 and a2. The purpose of splitting node a here is to make it possible for each connected component to correspond to one node a. Thus, in this specification, the number of split nodes coincides with the number of connected components. After the nodes a1 and a2 are obtained by splitting, based on the connecting edges of the node a before splitting and each node in each connected component, the connecting edges between the nodes a1 and a2 and each node in the corresponding connected component can be constructed to form a semantic association diagram related to the node a as shown in fig. 4 b.
For the word co-occurrence graph shown in fig. 3, after the above operation of forming a semantic association graph is performed for each node corresponding to a polysemous word (referred to as a multi-meaning node), the finally used semantic association graph shown in fig. 4c can be obtained. That is, in this specification, the finally used semantic association graph is composed of the semantic association graphs related to the multi-meaning nodes.
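A hedged sketch of this splitting step, continuing the networkx-based sketches above; the (word, sense index) persona naming is an illustrative convention, not this specification's exact algorithm.

```python
# Sketch of the EGO-SPLITTING step: for each node, take its ego network
# without the node itself (the initial segmentation graph), find the
# connected components, and create one split copy of the node per component.
import networkx as nx

def ego_split(g):
    # pass 1: for each node, assign each neighbor to one of the node's copies
    persona_of = {}                       # (node, neighbor) -> sense index
    for node in g:
        ego = nx.ego_graph(g, node)       # breadth-first neighborhood of `node`
        ego.remove_node(node)             # the initial segmentation graph
        for k, comp in enumerate(nx.connected_components(ego)):
            for nb in comp:               # each component becomes one copy
                persona_of[(node, nb)] = k
    # pass 2: rebuild edges between split copies, keeping original weights
    personas = nx.Graph()
    for u, v, d in g.edges(data=True):
        pu = (u, persona_of[(u, v)])
        pv = (v, persona_of[(v, u)])
        personas.add_edge(pu, pv, weight=d["weight"])
    return personas
```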
After the semantic association graph is obtained, the following step of determining the corresponding first semantic vector may be further performed for each node in the semantic association graph.
It should be understood that, since the semantic association graph is segmented based on the word co-occurrence graph, each node in the semantic association graph may correspond to one word vector. Thus, the step of determining the first semantic vector may be: for each node in the semantic association graph, the word vector corresponding to the node can be used as the current semantic vector of the node, and then the current semantic vector of the node is adjusted by analyzing the association relationship between the node and the adjacent node (or association node) in the semantic association graph. And when the current semantic vector meets the predefined convergence condition, determining the current semantic vector as a corresponding first semantic vector. The step of adjusting the current semantic vector of each node may specifically be:
and step x, for each node in the semantic association graph, taking the node as a starting point, and performing multiple random walks based on the weight of the connecting edge between the node and other nodes to sample the corresponding node sequence.
In one example, for a certain node, the sampled node sequence may be represented as V = {v1, v2, …, vt}, where t is a positive integer. It should be understood that the greater the weight of a connecting edge, the greater the probability that the node on that connecting edge is sampled. The node sequence here may be determined based on the results of multiple random walks, or based on the result of a single random walk. When each node sequence is determined from a single random walk, the number of node sequences may be more than one.
Step y: take each node in the node sequence in turn as the current node, and calculate a first probability value at least according to the similarity between the current semantic vector of the current node and the current semantic vectors of its neighboring nodes in the node sequence; calculate a second probability value at least according to the similarity between the word vector of the current node and the current semantic vectors of its neighboring nodes in the node sequence.
Taking node v_i in the node sequence as an example, its neighboring nodes in the node sequence can be represented as {v[i−s], v[i−s+1], …, v[i+s−1], v[i+s]}, where s is a positive integer. Hereinafter, these neighboring nodes are collectively denoted u_GP.
In one example, the first probability value may be calculated based on equation 1 as follows:
P(u_GP | v_i) = exp(Φ_GP(v_i) · Φ_GP(u_GP)) / Σ_{v ∈ GP} exp(Φ_GP(v_i) · Φ_GP(v))   (equation 1)
where v_i is the i-th (1 ≤ i ≤ t) node in the node sequence, u_GP is a neighboring node of v_i, and v ranges over all nodes in the semantic association graph GP. Φ_GP(v_i) is the current semantic vector of node v_i, Φ_GP(u_GP) is the current semantic vector of the neighboring node of v_i, and Φ_GP(v) is the current semantic vector of each node in the semantic association graph. The meaning of this formula is: calculate the similarity between the current semantic vector of node v_i and the current semantic vectors of its neighboring nodes u_GP.
Further, the second probability value may also be calculated based on the following equation 2:
P(u_G | v_i) = exp(Φ_G(v_i) · Φ_GP(u_GP)) / Σ_{v ∈ G} exp(Φ_G(v_i) · Φ_G(v))   (equation 2)
where v_i is the i-th (1 ≤ i ≤ t) node in the node sequence, u_GP is a neighboring node of v_i (written u_G when scored against the word co-occurrence graph G), and v ranges over all nodes in the word co-occurrence graph. Φ_G(v_i) is the word vector of node v_i, Φ_GP(u_GP) is the current semantic vector of the neighboring node of v_i, and Φ_G(v) is the word vector of each node in the word co-occurrence graph. The meaning of this formula is: calculate the similarity between the word vector of node v_i and the current semantic vectors of its neighboring nodes u_GP.
Of course, in practical applications, the calculation of the first probability value and the second probability value may be converted into the calculation of the logarithmic probability value for convenience of calculation.
In one example, the calculation of equation 1 above may be transformed into the calculation J_GP = −log P(u_GP | v_i), and the calculation of equation 2 may be transformed into the calculation J_G = −log P(u_G | v_i). Here, J_GP may be referred to as the transformed first probability value, and J_G as the transformed second probability value.
It should be understood that in the transformed formulas, J_GP varies inversely with P(u_GP | v_i), and J_G varies inversely with P(u_G | v_i).
Step z: adjust the current semantic vector of the current node with the goal of maximizing the first probability value and the second probability value.
That is, the current semantic vector of the current node is adjusted with the goal of minimizing J_GP and J_G.
In one example, the current semantic vector of the current node may be adjusted based on the following iterative formula.
Φ_GP(v_i) ← Φ_GP(v_i) − α · ∂(J_GP + λ · J_G) / ∂Φ_GP(v_i)   (equation 3)
where Φ_GP(v_i) is the current semantic vector of node v_i, J_GP is the transformed first probability value corresponding to node v_i, J_G is the transformed second probability value corresponding to node v_i, and α and λ are manually defined iteration parameters, both greater than 0.
It can be understood that when the current semantic vector of a node is adjusted based on equation 3, the current semantic vector of the node and the current semantic vectors of its neighboring nodes are made as close as possible. In addition, the current semantic vector of the node is mapped into the same vector space as the word vectors of the nodes in the word co-occurrence graph.
Steps x–z are executed iteratively until the current semantic vector of each node in the semantic association graph meets a predefined convergence condition, for example, that the semantic vectors of two adjacent adjustments are similar or identical. The current semantic vector of each node in the semantic association graph is then taken as its first semantic vector.
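The loop of steps x–z can be sketched as follows. Note that the softmax objectives of equations 1 and 2 are approximated here by simple attraction updates, and the walk length, window, alpha, and lam values are illustrative assumptions.

```python
# Sketch of steps x-z: weighted random walks over the semantic association
# graph, then updates that pull each node's current semantic vector toward
# its walk neighbors (cf. equation 1) and toward the word-vector space
# (cf. equation 2). `sem` and `wordvec` map nodes to numpy vectors.
import random
import numpy as np

def weighted_walk(g, start, length=10):
    walk, node = [start], start
    for _ in range(length - 1):
        nbrs = list(g[node])
        if not nbrs:
            break
        weights = [g[node][n]["weight"] for n in nbrs]        # step x: heavier edge,
        node = random.choices(nbrs, weights=weights, k=1)[0]  # higher sampling odds
        walk.append(node)
    return walk

def adjust(sem, wordvec, walk, window=2, alpha=0.025, lam=0.5):
    for i, v in enumerate(walk):
        for u in walk[max(0, i - window): i + window + 1]:    # step y: walk neighbors
            if u == v:
                continue
            sem[v] += alpha * (sem[u] - sem[v])               # step z: approach neighbors
            sem[v] += alpha * lam * (wordvec[v] - sem[v])     # stay near word-vector space
    return sem
```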
Step 204, a polysemous first word is selected from the training text.
The training text can be any text in the text corpus. For example, the first word selected may be "meter", which has two senses: first, a person's appearance or bearing; second, an instrument that measures temperature, air pressure, and the like.
Step 206, a first interpretation text and a second interpretation text of the first word are obtained.
The first interpretation text is used for interpreting the word sense that the first word has in the training text, and the second interpretation text is used for interpreting the other word senses of the first word. For example, if the sense of the first word "meter" in the training text is a person's appearance, then the first interpretation text is text explaining the sense of a person's appearance, and the second interpretation text is text explaining the sense of an instrument that measures temperature, air pressure, and the like. It should be understood that in practical applications, the number of second interpretation texts may be more than one. In the following description of this specification, the number of second interpretation texts is taken to be 1 as an example.
It should be noted that the sources of the first interpretation text and the second interpretation text are not limited in this specification. For example, the two texts may be obtained from a dictionary, the Baidu Baike encyclopedia, or a knowledge graph.
It should be appreciated that in training the word sense disambiguation model, the first interpretation text may be regarded as a positive example sample, and the second interpretation text as a negative example sample. Further, the label of the training text may be the identification of the first interpretation text.
And 208, calculating the similarity between each word in the training text and each word represented by each node in each semantic association diagram for the training text, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity.
Specifically, for each of the semantic association graphs, the similarity between each word in the training text and the words represented by the nodes of that graph is calculated. The similarity here may be determined based on the edit distance between two words, a cosine value, or the like. Based on the calculated similarities, the number of similar words corresponding to the semantic association graph is counted. It can be understood that the number of words similar to the training text can be counted for each semantic association graph. The semantic association graph with the largest number of corresponding similar words is then selected from the plurality of semantic association graphs and taken as the target association graph.
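This selection rule, in a short sketch; nodes are assumed to be (word, sense index) tuples as in the ego_split sketch, and `similar` stands for the edit-distance or cosine test mentioned above.

```python
# Pick the semantic association graph with the largest number of words
# similar to the words of the text.
def pick_target_graph(text_words, assoc_graphs, similar):
    def n_similar(graph):
        graph_words = {word for (word, _sense) in graph.nodes}
        return sum(1 for w in text_words
                   if any(similar(w, gw) for gw in graph_words))
    return max(assoc_graphs, key=n_similar)
```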
And step 210, determining a semantic vector of the first word at least based on the first semantic vector corresponding to each node in the target association diagram, determining word vectors of other words based on the word vectors corresponding to each node in the word co-occurrence diagram, and encoding the training text by using an encoder according to the semantic vector of the first word and the word vectors of the other words.
It should be understood that after the target association graph is selected, in one implementation, a first semantic vector of a node (or a similar node for short) in the target association graph, where a term represented by the target association graph is similar to the first term, may be used as the semantic vector of the first term. The similarity node can be selected based on the similarity calculated above.
In another implementation, the semantic vector of the first word may also be determined as follows. Acquire the first semantic vector of the node in the target association graph whose represented word is similar to the first word. Acquire the second semantic vector of the node in a pre-constructed personalized semantic association graph whose represented query word is similar to the first word; the personalized semantic association graph is constructed based on a number of historical query words of a user, where each node represents a single sense of one query word and corresponds to a second semantic vector. Then fuse the first semantic vector and the second semantic vector, and take the fusion result as the semantic vector of the first word.
Fusing the first semantic vector and the second semantic vector includes performing a maximum pooling operation, an averaging operation, or a summing operation on the two vectors to obtain the fusion result.
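As a small illustration, the three fusion operations over two equal-length numpy vectors:

```python
# Maximum pooling, averaging, or summing of the first and second semantic vectors.
import numpy as np

def fuse(v1, v2, mode="avg"):
    stacked = np.stack([v1, v2])
    if mode == "max":
        return stacked.max(axis=0)    # maximum pooling operation
    if mode == "avg":
        return stacked.mean(axis=0)   # averaging operation
    return stacked.sum(axis=0)        # summing operation
```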
In the embodiments of this specification, determining the semantic vector of the first word based on both the first semantic vector and the second semantic vector makes the results personalized to the user.
For any second word among the other words, the similarity between the second word and the words represented by the nodes in the word co-occurrence graph can be calculated. Based on the calculated similarity, a node whose represented word is similar to the second word is selected from the word co-occurrence graph, and the word vector of the selected node is taken as the word vector of the second word.
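A short sketch of this lookup; `similar_score` is a placeholder for the similarity measure, and the node attribute "vec" follows the earlier sketches.

```python
# Word-vector lookup for a "second word": pick the node in the word
# co-occurrence graph whose word is most similar to it.
def lookup_word_vector(word, g, similar_score):
    best = max(g.nodes, key=lambda node: similar_score(word, node))
    return g.nodes[best]["vec"]
```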
When the word sense disambiguation model includes multiple encoders, assume the three encoders are a first encoder, a second encoder, and a third encoder. After the semantic vector of the first word and the word vectors of the other words are determined, the determined vectors can be spliced, and the splicing result (which can be viewed as a sequence) can be input to the first encoder for encoding. The first encoding result of the training text is then obtained from the output of the first encoder.
In one example, if the training text is denoted q and its first encoding result is denoted O_q, step 210 can be expressed as O_q = Encoder1(q), where Encoder1 is the first encoder.
And step 212, respectively determining word vectors of words in the first explanation text and the second explanation text based on the word vectors corresponding to the nodes in the word co-occurrence graph, and respectively encoding the first explanation text and the second explanation text by using an encoder according to the word vectors of the words in the first explanation text and the second explanation text.
Here, the method for determining the word vector of each word in the first explanatory text and the second explanatory text is similar to the method for determining the word vector of other words in the training text, and is not repeated here.
It should be noted that, for the first interpretation text, after the word vector of each word is determined, the word vectors of each word may be spliced, and then the splicing result is input to the second encoder, and the second encoding result of the first interpretation text is obtained through the output of the second encoder.
In one example, if the first interpretation text is denoted q_sk and its second encoding result is denoted O_sk, the encoding of the first interpretation text can be expressed as O_sk = Encoder2(q_sk), where Encoder2 is the second encoder.
Similarly, for the second interpretation text, after the word vector of each word is determined, the word vectors of the words may be spliced, and then the splicing result is input to the third encoder, and the third encoding result of the second interpretation text is obtained through the output of the third encoder.
In one example, if the second interpretation text is denoted q_sk' and its third encoding result is denoted O_sk', the encoding of the second interpretation text can be expressed as O_sk' = Encoder3(q_sk'), where Encoder3 is the third encoder.
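As a hedged sketch of steps 210–212, a shared LSTM encoder in PyTorch; `q_vectors`, `sk_vectors`, and `sk_neg_vectors` are assumed (batch, seq_len, dim) tensors obtained by stacking the vectors determined above, and all dimensions are illustrative. In a twin network the three encoders share structure and parameters, so one module instance can serve as Encoder1, Encoder2, and Encoder3.

```python
# Encode a sequence of word/semantic vectors into a fixed-length vector.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, dim_in=128, dim_hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(dim_in, dim_hidden, batch_first=True)

    def forward(self, vectors):            # vectors: (batch, seq_len, dim_in)
        _, (h, _) = self.lstm(vectors)
        return h[-1]                       # fixed-length encoding of the text

encoder = TextEncoder()                    # shared parameters (twin network)
O_q = encoder(q_vectors)                   # training text (Encoder1)
O_sk = encoder(sk_vectors)                 # first interpretation text (Encoder2)
O_sk_neg = encoder(sk_neg_vectors)         # second interpretation text (Encoder3)
```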
Based on the encoding result, a first text distance between the training text and the first interpretation text is calculated, and a second text distance between the training text and the second interpretation text is calculated, step 214.
For example, a first text distance between the training text and the first interpretation text may be calculated based on the first encoding result and the second encoding result. A second text distance between the training text and the second explanatory text may be calculated based on the first encoding result and the third encoding result.
In this specification, the first text distance or the second text distance may include, but is not limited to, a cosine similarity, a manhattan distance, a euclidean distance, and the like.
Step 216, training the encoder with the first text distance being less than the second text distance as a target.
In one example, the encoder may be trained based on the following loss function.
L = max(||O_q − O_sk||_2 − ||O_q − O_sk'||_2 + α, 0)   (equation 4)
where O_q is the first encoding result of the training text, O_sk is the second encoding result of the first interpretation text, O_sk' is the third encoding result of the second interpretation text, ||O_q − O_sk||_2 is the first text distance, ||O_q − O_sk'||_2 is the second text distance, and α is a hyper-parameter.
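Equation 4 as a loss function, continuing the PyTorch sketch above; alpha is the margin hyper-parameter.

```python
# Hinge (triplet) loss over the three encodings from the encoder sketch.
import torch

def triplet_loss(O_q, O_sk, O_sk_neg, alpha=1.0):
    d_pos = torch.norm(O_q - O_sk, p=2, dim=-1)       # first text distance
    d_neg = torch.norm(O_q - O_sk_neg, p=2, dim=-1)   # second text distance
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()
```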
It should be understood that steps 204-216 are performed iteratively, and the last trained encoder is used as the final encoder.
In summary, the training method of the word sense disambiguation model provided in the embodiments of this specification encodes the training text with an encoder according to the semantic vector of the polysemous target word and the word vectors of the other words. Because the semantic vector of the target word represents a single word sense, encoding the training text based on it greatly improves the accuracy with which the training text is expressed. Furthermore, when the training text can be accurately expressed, the trained word sense disambiguation model becomes more accurate, so that word sense disambiguation can be realized more accurately.
It should be noted that the word sense disambiguation model obtained through training in the embodiments of this specification can be applied to various natural language processing systems, such as search systems, translation systems, and recommendation systems for synonyms and related words.
The following describes a process for performing word sense disambiguation based on a trained word sense disambiguation model.
FIG. 5 is a flowchart of a word sense disambiguation method provided in one embodiment of this specification. The method may be executed by any device with processing capability, such as a server, a system, or an apparatus. As shown in fig. 5, the method may include:
step 502 is the same as step 202, and will not be repeated herein.
Step 504, obtaining the text to be disambiguated, and selecting a polysemous first word from the text to be disambiguated.
Step 506, a number of explanatory texts of the first word are obtained.
Wherein each of the interpretation texts is for interpreting one of a plurality of word senses of the first word.
The sources of these interpretation texts are not limited in this specification. For example, the interpretation texts may be obtained from a dictionary, the Baidu Baike encyclopedia, or a knowledge graph.
And step 508, for the text to be disambiguated, determining word vectors of the words in the text to be disambiguated based on the word vectors corresponding to the nodes in the word co-occurrence graph, and encoding the text to be disambiguated by utilizing an encoder in a pre-trained word sense disambiguation model according to the word vectors of the words in the text to be disambiguated.
Here, the method for determining the word vector of each word in the text to be disambiguated may refer to the method for determining the word vector of other words in the training text, which is not repeated herein.
It should be noted that, for the text to be disambiguated, after the word vectors of the words in the text to be disambiguated are determined, the word vectors of the words may be spliced, and then the splicing result is input to the first encoder trained in advance, and the encoding result of the text to be disambiguated is obtained through the output of the first encoder.
And step 510, for each interpretation text, calculating the similarity between each word in the interpretation text and the word represented by each node in each semantic association diagram, and selecting a target association diagram from a plurality of semantic association diagrams based on the calculated similarity.
Here, for the method for explaining the text to select the target association diagram, reference may be made to the method for selecting the target association diagram for the training text in step 208, which is not repeated herein.
And step 512, determining the semantic vector of each word in the interpretation text based on the semantic vector corresponding to each node in the target association diagram, and encoding the interpretation text by using a pre-trained encoder according to the semantic vector of each word in the interpretation text.
Here, the method for determining the semantic vector of each term in the explanatory text may refer to the method for determining the semantic vector of the first term in step 210, which is not repeated herein.
When the number of the encoders is plural, the respective interpretation texts may be encoded based on different encoders, so that a corresponding encoding result may be obtained for each interpretation text.
Taking one of several interpretations as an example, the encoding process may be: and splicing the semantic vectors of the words in the explanation text, inputting the splicing result into a pre-trained second encoder, and obtaining the encoding result corresponding to the explanation text through the output of the second encoder.
Step 514, determining text distance between the text to be disambiguated and the plurality of interpreted texts based on the encoding result.
For example, the text distance between the text to be disambiguated and the several interpreted texts may be determined based on the encoding result of the text to be disambiguated and the respective encoding result of each interpreted text. Text distances herein may include, but are not limited to, cosine similarity, manhattan distance, euclidean distance, and the like.
And 516, selecting a target text with the minimum corresponding text distance from the plurality of interpretation texts.
Step 518, based on the target text, determines the word sense of the first word in the text to be disambiguated.
In one example, the word sense of the first word may be determined based on the following formula.
SK = argmin_{s_i} ||O_q − O_si||_2
where O_q is the encoding result of the text to be disambiguated, O_si is the encoding result of the i-th interpretation text, ||O_q − O_si||_2 is the text distance between the text to be disambiguated and the i-th interpretation text, and SK is the finally determined word sense of the first word.
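The argmin above in code, continuing the PyTorch sketches; `interpretation_encodings` is an assumed list of (sense label, encoding) pairs.

```python
# Pick the word sense whose interpretation text is closest to the text to be
# disambiguated, per the formula above.
import torch

def disambiguate(O_q, interpretation_encodings):
    _, sense = min((torch.norm(O_q - O_si, p=2).item(), label)
                   for label, O_si in interpretation_encodings)
    return sense
```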
In summary, the word sense disambiguation method provided by the embodiment of the present specification can more accurately disambiguate words.
Corresponding to the above training method for the word sense disambiguation model, an embodiment of the present specification further provides a training apparatus for a word sense disambiguation model, where the word sense disambiguation model includes an encoder. As shown in fig. 6, the apparatus may include:
an obtaining unit 602, configured to obtain a word co-occurrence graph and a plurality of semantic association graphs. The word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, each node represents a word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word. The semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a first semantic vector.
The word vectors of the nodes in the word co-occurrence graph may be determined based on a graph node vectorization method or a word vectorization method. The graph node vectorization method includes any one of the following: Node2Vec, DeepWalk, and LINE. The word vectorization method here includes any one of the following: Word2Vec and GloVe.
A selecting unit 604, configured to select a polysemous first word from the training text.
The obtaining unit 602 is further configured to obtain a first interpretation text and a second interpretation text of the first word selected by the selecting unit 604. Wherein the first interpretation text is used for interpreting the word sense of the first word corresponding to the training text, and the second interpretation text is used for interpreting the other word sense of the first word.
The calculating unit 606 is configured to calculate, for the training text, a similarity between each word in the training text and a word represented by each node in each semantic association diagram, and select a target association diagram from the plurality of semantic association diagrams based on the calculated similarity.
The computing unit 606 may be further specifically configured to:
and for each semantic association diagram in the semantic association diagrams, calculating the similarity between each word in the training text and the word represented by each node in the semantic association diagram.
And counting the number of similar words corresponding to the semantic association diagram based on the similarity.
And selecting the semantic association diagram with the maximum number of corresponding similar words from the plurality of semantic association diagrams, and taking the selected semantic association diagram as a target association diagram.
The determining unit 608 is configured to determine a semantic vector of the first word based on at least the first semantic vector corresponding to each node in the target association map, determine word vectors of other words based on the word vectors corresponding to each node in the word co-occurrence map, and encode the training text by using an encoder according to the semantic vector of the first word and the word vectors of the other words.
The determining unit 608 may specifically be configured to:
Acquire a first semantic vector of a node in the target association diagram whose represented word is similar to the first word.
Acquire a second semantic vector of a node in a pre-constructed personalized semantic association diagram whose represented query word is similar to the first word, wherein the personalized semantic association diagram is constructed based on a plurality of historical query words of a user, and each node therein represents a single word sense of one query word and corresponds to one second semantic vector.
Fuse the first semantic vector and the second semantic vector, and take the fusion result as the semantic vector of the first word.
The determining unit 608 may further specifically be configured to:
Perform a maximum pooling operation, averaging operation, or summing operation on the first semantic vector and the second semantic vector to obtain the fusion result.
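The three fusion options can be sketched as follows (element-wise operations over equal-length vectors; the function name is illustrative):

```python
import numpy as np

def fuse(v1, v2, mode="max"):
    """Fuse the first and second semantic vectors of the first word."""
    if mode == "max":    # maximum pooling
        return np.maximum(v1, v2)
    if mode == "mean":   # averaging
        return (v1 + v2) / 2.0
    return v1 + v2       # summing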
The encoder herein may include any one of: a recurrent neural network RNN, a long short term memory network LSTM, and a convolutional neural network CNN.
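As a hedged sketch of the LSTM option (PyTorch assumed; the class name and dimensions are illustrative), the encoder maps a sequence of word/semantic vectors to a fixed-size text encoding:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encodes a sequence of word/semantic vectors into one text vector."""
    def __init__(self, dim_in=64, dim_out=128):
        super().__init__()
        self.lstm = nn.LSTM(dim_in, dim_out, batch_first=True)

    def forward(self, x):           # x: (batch, seq_len, dim_in)
        _, (h, _) = self.lstm(x)    # final hidden state
        return h.squeeze(0)         # (batch, dim_out) text encoding
```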
The determining unit 608 is further configured to determine word vectors of words in the first interpretation text and the second interpretation text respectively based on the word vectors corresponding to the nodes in the word co-occurrence graph, and encode the first interpretation text and the second interpretation text respectively by using an encoder according to the word vectors of the words in the first interpretation text and the second interpretation text.
The calculating unit 606 is further configured to calculate a first text distance between the training text and the first interpretation text, and calculate a second text distance between the training text and the second interpretation text, based on the encoding result determined by the determining unit 608.
A training unit 610, configured to train the encoder with the first text distance calculated by the calculating unit 606 being smaller than the second text distance as a target.
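One plausible realization of this training target is a triplet-style margin loss, sketched below; the margin value and the use of L2 distance are assumptions, since the embodiment only requires the first text distance to end up smaller than the second.

```python
import torch
import torch.nn.functional as F

def disambiguation_loss(enc_text, enc_first, enc_second, margin=1.0):
    d1 = F.pairwise_distance(enc_text, enc_first)   # first text distance
    d2 = F.pairwise_distance(enc_text, enc_second)  # second text distance
    # zero loss once d1 is smaller than d2 by at least the margin
    return torch.clamp(d1 - d2 + margin, min=0).mean()
```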
Optionally, the apparatus may further include: a first construction unit (not shown in the figure).
The determining unit 608 is further configured to, for each word in the text corpus, take the word as a current node, determine, from the text corpus, the words that appear within the context window of the word, and take the determined words as associated words.
A first constructing unit, configured to use the associated word determined by the determining unit 608 as an associated node of the current node, and construct a connecting edge between the two.
The determining unit 608 is further configured to determine a weight of a connecting edge between the word and the associated word according to at least a distance between the word and the associated word.
The obtaining unit 602 is further configured to obtain a term co-occurrence graph after establishing, for each term, a connection edge with the associated node and determining a weight of the connection edge.
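A minimal sketch of this construction, assuming a symmetric context window and an inverse-distance edge weight (one plausible reading of "at least according to the distance"):

```python
from collections import defaultdict

def build_cooccurrence_graph(sentences, window=2):
    """Returns {(word, associated_word): accumulated edge weight}."""
    weights = defaultdict(float)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # nearer associated words get larger weights
                    weights[(w, sent[j])] += 1.0 / abs(i - j)
    return weights

g = build_cooccurrence_graph([["he", "ate", "an", "apple"],
                              ["apple", "released", "a", "phone"]])
```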
Optionally, the apparatus may further include: a search unit (not shown in the figure), a segmentation unit (not shown in the figure), a splitting unit (not shown in the figure) and a second construction unit (not shown in the figure).
The searching unit is configured to perform, for each node in the word co-occurrence graph, a breadth-first search centered on the node to obtain an initial segmentation graph associated with the node.
The segmentation unit is configured to perform maximum connected subgraph segmentation on the initial segmentation graph to obtain at least one connected component.
The splitting unit is configured to split the node if the number of connected components obtained by the segmentation unit is greater than 1, so that the split nodes correspond one-to-one to the connected components.
The second construction unit is configured to construct, based on the connecting edges of the node before splitting, connecting edges between each split node and its corresponding connected component, so as to form the semantic association graph related to the node.
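The four units above can be sketched as one routine (networkx assumed; treating "maximum connected subgraph segmentation" as connected-component extraction is an interpretation, and the sense-label format is hypothetical):

```python
import networkx as nx

def split_senses(G, node, radius=2):
    """Breadth-first neighborhood -> connected components -> split node."""
    ego = nx.ego_graph(G, node, radius=radius)  # initial segmentation graph
    ego.remove_node(node)
    components = list(nx.connected_components(ego))
    sense_graphs = []
    for k, comp in enumerate(components):       # one sense per component;
        H = ego.subgraph(comp).copy()           # a single component means
        sense = f"{node}#{k}"                   # the node keeps one sense
        H.add_node(sense)
        for nbr in comp:
            if G.has_edge(node, nbr):           # reuse pre-split edges
                H.add_edge(sense, nbr, **G[node][nbr])
        sense_graphs.append(H)
    return sense_graphs
```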
Optionally, the apparatus may further include: a sampling unit (not shown), an adjusting unit (not shown) and an executing unit (not shown).
The sampling unit is configured to sample a corresponding node sequence for each node in the semantic association graph.
The adjusting unit is configured to take each node in the node sequence obtained by the sampling unit as a current node in turn, and adjust the current semantic vector of the current node.
The adjusting unit may specifically be configured to:
Calculate a first probability value based at least on the similarity between the current semantic vector of the current node and the current semantic vectors of the neighboring nodes of the current node in the node sequence; and calculate a second probability value based at least on the similarity between the word vector of the current node and the current semantic vectors of the neighboring nodes of the current node in the node sequence.
Adjust the current semantic vector of the current node with the goal of maximizing the first probability value and the second probability value.
The execution unit is configured to iteratively invoke the sampling unit and the adjusting unit to execute the above steps until the current semantic vector of each node in the semantic association graph meets the predefined convergence condition.
The determining unit 608 is further configured to use the current semantic vector of each node in the semantic association graph as the first semantic vector thereof.
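A heavily hedged sketch of one adjustment step for the units above: both probability values are modeled as sigmoids of dot products and the current semantic vector is updated by gradient ascent. The objective form, the symmetrization of the second probability so that it acts on the current node's vector, and the learning rate are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adjust_current(sem, word_vec, node, neighbors, lr=0.05):
    """sem: {node: current semantic vector}; word_vec: {node: word vector}."""
    grad = np.zeros_like(sem[node])
    for nbr in neighbors:  # neighbors of the current node in the sequence
        p1 = sigmoid(np.dot(sem[node], sem[nbr]))       # first probability
        grad += (1.0 - p1) * sem[nbr]
        p2 = sigmoid(np.dot(word_vec[nbr], sem[node]))  # second probability
        grad += (1.0 - p2) * word_vec[nbr]              # (symmetrized form)
    sem[node] += lr * grad  # ascend log p1 + log p2 toward both targets
```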
The functions of the functional modules of the device in the foregoing embodiments of the present specification may be implemented through the steps of the foregoing method embodiments, and therefore, detailed working processes of the device provided in an embodiment of the present specification are not described herein again.
In the training device for the word sense disambiguation model provided in one embodiment of the present specification, the trained word sense disambiguation model can more accurately implement word sense disambiguation.
Corresponding to the word sense disambiguation method, an embodiment of the present specification further provides a word sense disambiguation apparatus. As shown in fig. 7, the apparatus may include:
an obtaining unit 702 is configured to obtain a word co-occurrence graph and a plurality of semantic association graphs. The word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, each node represents a word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word. The semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a semantic vector.
The obtaining unit 702 is further configured to obtain a text to be disambiguated, and select a first word with word ambiguity from the text to be disambiguated.
The obtaining unit 702 is further configured to obtain a plurality of interpretation texts of the first word, where each interpretation text is used to interpret one of a plurality of word senses of the first word.
A determining unit 704, configured to determine, for the text to be disambiguated acquired by the obtaining unit 702, the word vectors of the words in the text to be disambiguated based on the word vectors corresponding to the nodes in the word co-occurrence graph, and to encode the text to be disambiguated by using an encoder in a pre-trained word sense disambiguation model according to the word vectors of the words in the text to be disambiguated.
The calculating unit 706 is configured to calculate, for each interpretation text acquired by the acquiring unit 702, a similarity between each term in the interpretation text and a term represented by each node in each semantic association diagram, and select a target association diagram from the plurality of semantic association diagrams based on the calculated similarity.
The determining unit 704 is further configured to determine the semantic vectors of the words in the interpretation text based on the semantic vectors corresponding to the nodes in the target association diagram, and to encode the interpretation text by using the encoder according to the semantic vectors of the words in the interpretation text.
The determining unit 704 is further configured to determine, based on the encoding results, the text distances between the text to be disambiguated and each of the plurality of interpretation texts.
The selecting unit 708 is configured to select a target text with a minimum corresponding text distance from the plurality of interpretation texts.
The determining unit 704 is further configured to determine a word sense of a first word in the text to be disambiguated based on the target text selected by the selecting unit 708.
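Putting these units together, inference reduces to a nearest-interpretation search, sketched below under the L2 text distance described earlier; the encode callable and the data layout are assumptions.

```python
import numpy as np

def disambiguate(encode, query_vecs, interpretations):
    """interpretations: list of (sense_label, word_vector_sequence)."""
    o_q = encode(query_vecs)  # encoding of the text to be disambiguated
    dists = [(np.linalg.norm(o_q - encode(vecs)), sense)
             for sense, vecs in interpretations]
    return min(dists)[1]  # SK: word sense with the minimum text distance
```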
The functions of each functional module of the device in the above embodiments of the present description may be implemented through each step of the above method embodiments, and therefore, a specific working process of the device provided in one embodiment of the present description is not repeated herein.
The word sense disambiguation device provided by one embodiment of the specification can more accurately disambiguate words.
In another aspect, embodiments of the present specification provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method shown in fig. 2 or fig. 5.
In another aspect, embodiments of the present specification provide a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method illustrated in fig. 2 or fig. 5.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a server. Of course, the processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above-mentioned embodiments, objects, technical solutions and advantages of the present specification are further described in detail, it should be understood that the above-mentioned embodiments are only specific embodiments of the present specification, and are not intended to limit the scope of the present specification, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.

Claims (22)

1. A method of training a word sense disambiguation model, the word sense disambiguation model comprising an encoder; the method comprises the following steps:
acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a first semantic vector;
selecting a first word with word ambiguity from a training text;
acquiring a first explanation text and a second explanation text of the first word; wherein the first interpretation text is used for interpreting the word sense of the first word corresponding to the training text, and the second interpretation text is used for interpreting other word senses of the first word;
for the training text, calculating the similarity between each word in the training text and the word represented by each node in each semantic association diagram, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
determining semantic vectors of the first words at least based on first semantic vectors corresponding to the nodes in the target association diagram, and determining word vectors of other words based on word vectors corresponding to the nodes in the word co-occurrence diagram; encoding the training text by using the encoder according to the semantic vector of the first word and the word vectors of other words;
respectively determining word vectors of words in the first interpretation text and the second interpretation text based on the word vectors corresponding to the nodes in the word co-occurrence graph; respectively encoding the first explanation text and the second explanation text by utilizing the encoder according to the word vectors of all words in the first explanation text and the second explanation text;
calculating a first text distance between the training text and the first interpretation text and calculating a second text distance between the training text and the second interpretation text based on the encoding result;
and training the encoder by taking the first text distance smaller than the second text distance as a target.
2. The method of claim 1, the word co-occurrence map obtained by:
regarding each word in the text corpus as a current node, determining, from the text corpus, words appearing within the context window of the word, and regarding the determined words as associated words;
taking the associated word as an associated node of the current node, and constructing a connecting edge of the associated word and the current node;
determining the weight of a connecting edge between the word and the associated word at least according to the distance between the word and the associated word;
and obtaining the word co-occurrence graph after establishing a connecting edge between each word and the associated node and determining the weight of the connecting edge.
3. The method of claim 1, wherein the semantic associations are obtained by:
carrying out breadth-first search on each node in the word co-occurrence graph by taking the node as a center to obtain an initial segmentation graph associated with the node;
performing maximum connected subgraph segmentation on the initial segmentation graph to obtain at least one connected component;
if the number of the connected components is more than 1, splitting the nodes so that each split node corresponds to each connected component one by one;
and constructing a connecting edge between each node and the corresponding connected component based on the connecting edge of the node before splitting so as to form a semantic association graph related to the node in the plurality of semantic association graphs.
4. The method of claim 1, wherein the word vector of each node in the word co-occurrence graph is determined based on a graph node vectorization method or a word vectorization method; the graph node vectorization method comprises any one of the following: node2vec, DeepWalk, and LINE; and the word vectorization method comprises any one of the following: Word2Vec and GloVe.
5. The method of claim 1, wherein the first semantic vector of each node in the semantic dependency graph is obtained by:
for each node in the semantic association graph, sampling a corresponding node sequence;
and taking each node in the node sequence as a current node in turn, and adjusting the current semantic vector of the current node, wherein the adjusting step comprises the following steps:
calculating a first probability value at least according to the similarity of the current semantic vector of the current node and the current semantic vectors of the adjacent nodes of the current node in the node sequence; calculating a second probability value at least according to the word vector of the current node and the similarity of the current semantic vector of the adjacent node of the current node in the node sequence;
adjusting a current semantic vector of the current node with a goal of maximizing the first probability value and the second probability value;
iteratively executing the steps until the current semantic vector of each node in the semantic association diagram meets a predefined convergence condition;
and taking the current semantic vector of each node in the semantic association graph as the first semantic vector.
6. The method of claim 1, the encoder comprising any of: a recurrent neural network RNN, a long short term memory network LSTM, and a convolutional neural network CNN.
7. The method of claim 1, wherein determining the semantic vector of the first term based on at least the first semantic vector corresponding to each node in the target dependency graph comprises:
acquiring a first semantic vector of a node of a term similar to the first term in the target association graph;
acquiring a second semantic vector of a node of which the query word represented in the pre-constructed personalized semantic association diagram is similar to the first word; the personalized semantic association graph is constructed on the basis of a plurality of historical query words of a user, wherein each node represents a single word meaning of one query word and corresponds to a second semantic vector;
and fusing the first semantic vector and the second semantic vector, and taking a fusion result as the semantic vector of the first word.
8. The method of claim 7, the fusing the first semantic vector and the second semantic vector comprising:
and performing maximum pooling operation, averaging operation or summing operation on the first semantic vector and the second semantic vector to obtain the fusion result.
9. The method according to claim 1, wherein for the training text, calculating similarity between each word in the training text and a word represented by each node in each semantic association graph, and selecting a target association graph from the plurality of semantic association graphs based on the similarity comprises:
for each semantic association diagram in the semantic association diagrams, calculating the similarity between each word in the training text and the word represented by each node in the semantic association diagram;
counting the number of similar words corresponding to the semantic association diagram based on the similarity;
and selecting the semantic association diagram with the largest number of corresponding similar words from the plurality of semantic association diagrams, and taking the selected semantic association diagram as the target association diagram.
10. A word sense disambiguation method comprising:
acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a semantic vector;
acquiring a text to be disambiguated, and selecting a first word with word ambiguity from the text to be disambiguated;
acquiring a plurality of interpretation texts of the first word, wherein each interpretation text is used for interpreting one word sense in a plurality of word senses of the first word;
for the text to be disambiguated, determining word vectors of words in the text to be disambiguated based on the word vectors corresponding to the nodes in the word co-occurrence graph; coding the text to be disambiguated by utilizing a coder in a pre-trained word sense disambiguation model according to the word vector of each word in the text to be disambiguated;
for each interpretation text, calculating the similarity between each word in the interpretation text and the word represented by each node in each semantic association diagram, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
determining semantic vectors of the words in the interpretation text based on the semantic vectors corresponding to the nodes in the target association diagram; and coding the interpretation text by utilizing the coder according to the semantic vectors of the words in the interpretation text;
determining text distances between the text to be disambiguated and the plurality of interpretation texts based on the encoding results;
selecting a target text with the minimum corresponding text distance from the plurality of interpretation texts;
determining a word sense of the first word in the text to be disambiguated based on the target text.
11. A training apparatus for a word sense disambiguation model, the word sense disambiguation model comprising an encoder; the device comprises:
the acquisition unit is used for acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a first semantic vector;
the selecting unit is used for selecting a first word with word ambiguity from the training text;
the acquisition unit is further used for acquiring a first explanation text and a second explanation text of the first word selected by the selection unit; wherein the first interpretation text is used for interpreting the word sense of the first word corresponding to the training text, and the second interpretation text is used for interpreting other word senses of the first word;
the calculation unit is used for calculating the similarity between each word in the training text and the word represented by each node in each semantic association diagram for the training text, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
the determining unit is used for determining the semantic vector of the first word at least based on the first semantic vector corresponding to each node in the target association diagram, and determining the word vectors of other words based on the word vectors corresponding to each node in the word co-occurrence diagram; encoding the training text by using the encoder according to the semantic vector of the first word and the word vectors of other words;
the determining unit is further configured to determine word vectors of words in the first interpretation text and the second interpretation text respectively based on the word vectors corresponding to the nodes in the word co-occurrence graph; respectively encoding the first interpretation text and the second interpretation text by using the encoder according to the word vectors of the words in the first interpretation text and the second interpretation text;
the calculating unit is further configured to calculate a first text distance between the training text and the first interpretation text and calculate a second text distance between the training text and the second interpretation text based on the encoding result determined by the determining unit;
and the training unit is used for training the encoder with the goal that the first text distance calculated by the calculating unit is smaller than the second text distance.
12. The apparatus of claim 11, the apparatus further comprising: a first building element;
the determining unit is further configured to, for each word in the text corpus, take the word as a current node, determine, from the text corpus, words appearing within the context window of the word, and take the determined words as associated words;
the first construction unit is used for taking the associated word determined by the determination unit as an associated node of the current node and constructing a connecting edge of the associated word and the current node;
the determining unit is further configured to determine a weight of a connecting edge between the word and the associated word at least according to a distance between the word and the associated word;
the obtaining unit is further configured to obtain the term co-occurrence graph after establishing, for each term, a connection edge with the associated node and determining a weight of the connection edge.
13. The apparatus of claim 11, the apparatus further comprising: the system comprises a searching unit, a dividing unit, a splitting unit and a second constructing unit;
the searching unit is used for carrying out breadth-first searching on each node in the word co-occurrence graph by taking the node as a center to obtain an initial segmentation graph associated with the node;
the segmentation unit is used for performing maximum connected subgraph segmentation on the initial segmentation graph to obtain at least one connected component;
the splitting unit is configured to split the node if the number of the connected components obtained by the segmentation unit is more than 1, so that each split node corresponds to each connected component one to one;
and the second construction unit is used for constructing the connection edges between each node and the corresponding connected components based on the connection edges of the nodes before splitting so as to form a semantic association diagram related to the node.
14. The apparatus of claim 11, wherein the word vector of each node in the word co-occurrence graph is determined based on a graph node vectorization method or a word vectorization method; the graph node vectorization method comprises any one of the following: node2vec, DeepWalk, and LINE; and the word vectorization method comprises any one of the following: Word2Vec and GloVe.
15. The apparatus of claim 11, the apparatus further comprising: the device comprises a sampling unit, an adjusting unit and an executing unit;
the sampling unit is used for sampling a corresponding node sequence for each node in the semantic association diagram;
the adjusting unit is used for taking each node in the node sequence obtained by sampling of the sampling unit as a current node in sequence and adjusting the current semantic vector of the current node;
the adjusting unit is specifically configured to:
calculating a first probability value at least according to the similarity of the current semantic vector of the current node and the current semantic vectors of the adjacent nodes of the current node in the node sequence; calculating a second probability value at least according to the word vector of the current node and the similarity of the current semantic vector of the adjacent node of the current node in the node sequence;
adjusting a current semantic vector of the current node with a goal of maximizing the first probability value and the second probability value;
the execution unit is used for iteratively invoking the sampling unit and the adjusting unit to execute the above steps until the current semantic vector of each node in the semantic association diagram meets a predefined convergence condition;
the determining unit is further configured to use the current semantic vector of each node in the semantic association graph as the first semantic vector thereof.
16. The apparatus of claim 11, the encoder comprising any of: a recurrent neural network RNN, a long-short term memory network LSTM, and a convolutional neural network CNN.
17. The apparatus according to claim 11, wherein the determining unit is specifically configured to:
acquiring a first semantic vector of a node of a term similar to the first term in the target association graph;
acquiring a second semantic vector of a node of which the query word represented in a pre-constructed personalized semantic association graph is similar to the first word; the personalized semantic association graph is constructed on the basis of a plurality of historical query words of a user, wherein each node represents a single word meaning of one query word and corresponds to a second semantic vector;
and fusing the first semantic vector and the second semantic vector, and taking a fusion result as the semantic vector of the first word.
18. The apparatus of claim 17, wherein the determining unit is further specifically configured to:
and performing maximum pooling operation, averaging operation or summing operation on the first semantic vector and the second semantic vector to obtain the fusion result.
19. The apparatus according to claim 11, wherein the computing unit is specifically configured to:
for each semantic association diagram in the semantic association diagrams, calculating the similarity between each word in the training text and the word represented by each node in the semantic association diagram;
counting the number of similar words corresponding to the semantic association diagram based on the similarity;
and selecting the semantic association diagram with the maximum number of corresponding similar words from the plurality of semantic association diagrams, and taking the selected semantic association diagram as a target association diagram.
20. A word sense disambiguation apparatus comprising:
the acquisition unit is used for acquiring a word co-occurrence graph and a plurality of semantic association graphs; the word co-occurrence graph is constructed based on the co-occurrence relation among words in the text corpus, wherein each node represents one word and corresponds to a word vector, and the word vector is used for representing the average word meaning of the corresponding word; the semantic association graph is obtained by segmenting the word co-occurrence graph by adopting a graph segmentation algorithm, wherein each node represents a single word meaning of a word and corresponds to a semantic vector;
the acquiring unit is further used for acquiring a text to be disambiguated and selecting a first word with word ambiguity from the text to be disambiguated;
the obtaining unit is further configured to obtain a plurality of interpretation texts of the first word, where each interpretation text is used to interpret one of a plurality of word senses of the first word;
a determining unit, configured to determine, for the text to be disambiguated obtained by the obtaining unit, word vectors of words in the text to be disambiguated based on word vectors corresponding to nodes in the word co-occurrence graph; coding the text to be disambiguated by utilizing a coder in a pre-trained word sense disambiguation model according to the word vector of each word in the text to be disambiguated;
the calculation unit is used for calculating the similarity between each term in the interpretation text and each term represented by each node in each semantic association diagram for each interpretation text acquired by the acquisition unit, and selecting a target association diagram from the plurality of semantic association diagrams based on the similarity;
the determining unit is further configured to determine the semantic vectors of the words in the interpretation text based on the semantic vectors corresponding to the nodes in the target association diagram, and encode the interpretation text by using the coder according to the semantic vectors of the words in the interpretation text;
the determining unit is further used for determining text distances between the text to be disambiguated and the plurality of interpretation texts based on the encoding results;
the selecting unit is used for selecting a target text with the minimum corresponding text distance from the plurality of interpretation texts;
the determining unit is further configured to determine a word sense of the first word in the text to be disambiguated based on the target text selected by the selecting unit.
21. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to perform the method of any one of claims 1-9 or the method of claim 10.
22. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-9 or the method of claim 10.
CN202010079725.XA 2020-02-04 2020-02-04 Training method and device of word sense disambiguation model Active CN111310475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010079725.XA CN111310475B (en) 2020-02-04 2020-02-04 Training method and device of word sense disambiguation model


Publications (2)

Publication Number Publication Date
CN111310475A (en) 2020-06-19
CN111310475B (en) 2023-03-10

Family

ID=71159898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010079725.XA Active CN111310475B (en) 2020-02-04 2020-02-04 Training method and device of word sense disambiguation model

Country Status (1)

Country Link
CN (1) CN111310475B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797204A (en) * 2020-07-01 2020-10-20 北京三快在线科技有限公司 Text matching method and device, computer equipment and storage medium
CN113763014A (en) * 2021-01-05 2021-12-07 北京沃东天骏信息技术有限公司 Article co-occurrence relation determining method and device and judgment model obtaining method and device
CN112949319B (en) * 2021-03-12 2023-01-06 江南大学 Method, device, processor and storage medium for marking ambiguous words in text
CN113158687B (en) * 2021-04-29 2021-12-28 新声科技(深圳)有限公司 Semantic disambiguation method and device, storage medium and electronic device
CN113095087B (en) * 2021-04-30 2022-11-25 哈尔滨理工大学 Chinese word sense disambiguation method based on graph convolution neural network
CN113407717B (en) * 2021-05-28 2022-12-20 数库(上海)科技有限公司 Method, device, equipment and storage medium for eliminating ambiguity of industrial words in news
CN113688245B (en) * 2021-08-31 2023-09-26 中国平安人寿保险股份有限公司 Processing method, device and equipment of pre-training language model based on artificial intelligence
CN114330359A (en) * 2021-11-30 2022-04-12 青岛海尔科技有限公司 Semantic recognition method and device and electronic equipment
CN114912449B (en) * 2022-07-18 2022-09-30 山东大学 Technical feature keyword extraction method and system based on code description text
CN117709367A (en) * 2022-09-15 2024-03-15 华为技术有限公司 Translation method and related equipment
CN116756347B (en) * 2023-08-21 2023-10-27 中国标准化研究院 Semantic information retrieval method based on big data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844473A (en) * 2017-09-25 2018-03-27 沈阳航空航天大学 Word sense disambiguation method based on linguistic context Similarity Measure
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text handling method based on ambiguity entity word and device
CN108446269A (en) * 2018-03-05 2018-08-24 昆明理工大学 A kind of Word sense disambiguation method and device based on term vector
CN108874772A (en) * 2018-05-25 2018-11-23 太原理工大学 A kind of polysemant term vector disambiguation method
CN109033307A (en) * 2018-07-17 2018-12-18 华北水利水电大学 Word polyarch vector based on CRP cluster indicates and Word sense disambiguation method
CN109117471A (en) * 2017-06-23 2019-01-01 中国移动通信有限公司研究院 A kind of calculation method and terminal of the word degree of correlation
CN109359303A (en) * 2018-12-10 2019-02-19 枣庄学院 A kind of Word sense disambiguation method and system based on graph model
WO2019085640A1 (en) * 2017-10-31 2019-05-09 株式会社Ntt都科摩 Word meaning disambiguation method and device, word meaning expansion method, apparatus and device, and computer-readable storage medium
CN110321434A (en) * 2019-06-27 2019-10-11 厦门美域中央信息科技有限公司 A kind of file classification method based on word sense disambiguation convolutional neural networks


Also Published As

Publication number Publication date
CN111310475A (en) 2020-06-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant