CN108021544B - Method and device for classifying semantic relation of entity words and electronic equipment - Google Patents

Method and device for classifying semantic relation of entity words and electronic equipment

Info

Publication number
CN108021544B
CN108021544B
Authority
CN
China
Prior art keywords
matrix
words
text sequence
attention
predetermined number
Prior art date
Legal status
Active
Application number
CN201610929103.5A
Other languages
Chinese (zh)
Other versions
CN108021544A (en)
Inventor
张姝
杨铭
孙俊
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201610929103.5A
Publication of CN108021544A
Application granted
Publication of CN108021544B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method, an apparatus, and an electronic device for classifying semantic relations of entity words in a text sequence. The apparatus comprises: a first obtaining unit that represents each word in the text sequence by a word vector to construct a first matrix; a second obtaining unit that processes the first matrix using a deep learning model to obtain a second matrix; a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and a classification unit that determines the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model. According to the embodiments, classification efficiency can be improved.

Description

Method and device for classifying semantic relation of entity words and electronic equipment
Technical Field
The present application relates to the field of information technology, and in particular to a method and an apparatus for classifying semantic relations of entity words in a text sequence, and an electronic device.
Background
Semantic relation classification of entity words refers to determining which of a set of predetermined semantic relations, such as the hypernym-hyponym (upper-lower concept) relation or the cause-effect relation, holds between the entity words in a text sequence. For example, in the sentence "the <e1>machine</e1> generates a large amount of <e2>noise</e2>", the relation between entity word e1 and entity word e2 is determined to be: Cause-Effect(e1, e2).
In the field of natural language processing, the semantic relation classification of entity words has received considerable attention, because it has important application value in tasks such as information extraction, information retrieval, machine translation, question answering, knowledge base construction, and semantic disambiguation.
In an existing semantic relation classification method for entity words, classification can be performed using a recurrent neural network (RNN) model based on long short-term memory (LSTM) units. Such a model can effectively exploit long-distance dependencies in sequence data, and is therefore very effective for processing text sequences.
It should be noted that the above background description is only for the convenience of clear and complete description of the technical solutions of the present application and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the present application.
Disclosure of Invention
The inventors of the present application found that, in the semantic relation classification task, other words in a sentence differ in their importance to the entity words, and therefore differ in their influence on the classification result. When the number of words in a text sequence is small, the existing semantic relation classification methods for entity words can classify efficiently; when the number of words is large, however, classification efficiency drops, because many words have little influence on the classification result.
Embodiments of the present application provide a method, an apparatus, and an electronic device for classifying semantic relations of entity words, in which an attention model is introduced to determine the attention degree of the words in a text sequence, and the semantic relations between entity words are then classified based on the attention degree, so that classification efficiency can be improved.
According to a first aspect of embodiments of the present application, there is provided an apparatus for classifying semantic relationships of entity words in a text sequence, the apparatus including:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree;
a classification unit determining semantic relationships between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
According to a second aspect of the embodiments of the present application, there is provided a method for classifying semantic relationships of entity words in a text sequence, the method including:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix by using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree; and
determining semantic relationships between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
According to a third aspect of the embodiments of the present application, an electronic device is provided, which includes the apparatus for classifying semantic relationships of entity words in a text sequence according to the first aspect of the embodiments of the present application.
The beneficial effect of the present application lies in improving the efficiency of classifying the semantic relations of entity words.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a classification method according to embodiment 1 of the present application;
FIG. 2 is a schematic diagram of a method for obtaining a third matrix in embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a method of selecting a predetermined number of words according to embodiment 1 of the present application;
FIG. 4 is a schematic diagram of the classification apparatus according to embodiment 2 of the present application;
FIG. 5 is a schematic view of a third obtaining unit of embodiment 2 of the present application;
FIG. 6 is a schematic view of a selecting unit according to embodiment 2 of the present application;
fig. 7 is a schematic diagram of a configuration of an electronic device according to embodiment 3 of the present application.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
Example 1
Embodiment 1 of the present application provides a classification method for classifying the semantic relations of entity words in a text sequence.
FIG. 1 is a schematic diagram of the classification method of embodiment 1. As shown in FIG. 1, the method includes:
s101, representing each word in the text sequence by a word vector to construct a first matrix (namely, word embedding);
s102, processing the first matrix by using a deep learning model to obtain a second matrix (namely, output of BLSTM), wherein rows or columns of the second matrix correspond to words in the text sequence;
s103, processing the second matrix by utilizing more than 2 Attention models (Attention models) to determine the Attention degree of words in the text sequence, and obtaining a third matrix (result of Attention) of the text sequence based on the Attention degree;
s104, determining semantic relations among entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
In this embodiment, at least two attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between entity words are then classified based on the attention degree, so that classification efficiency can be improved.
In step S101 of this embodiment, a word may be represented as a word vector (word Embedding) according to the feature of the word, and the word vector may be a multidimensional floating point number vector.
The features of a word may include features of the word itself and position features of the word in the text sequence; for example, the features of the word itself may be represented as a 50-dimensional or 100-dimensional vector, and the position features as a 5-dimensional vector. Of course, this embodiment is not limited thereto: in addition to the features of the word itself and its position features, the word vector may also be constructed in consideration of features such as hypernyms, part of speech, named entities, and the parse tree.
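As a concrete illustration of this construction, the following minimal Python/NumPy sketch builds a first matrix for a toy sentence. The 50-dimensional word feature and 5-dimensional position feature follow the example above, but the table contents and all names (word_table, pos_table) are invented stand-ins for trained embeddings, not part of the patent.

```python
import numpy as np

# Minimal sketch of step S101 (word embedding); random tables stand in for
# trained embeddings, and positions are taken relative to entity word e1.
rng = np.random.default_rng(0)

words = ["the", "machine", "generates", "a", "lot", "of", "noise"]
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}

WORD_DIM, POS_DIM = 50, 5
word_table = rng.normal(size=(len(vocab), WORD_DIM))        # word-feature embeddings
pos_table = rng.normal(size=(2 * len(words) + 1, POS_DIM))  # relative-position embeddings

e1 = words.index("machine")   # entity word e1
e2 = words.index("noise")     # entity word e2

rows = []
for i, w in enumerate(words):
    pos_vec = pos_table[i - e1 + len(words)]  # offset keeps the table index non-negative
    rows.append(np.concatenate([word_table[vocab[w]], pos_vec]))

M1 = np.stack(rows)   # first matrix: one row per word
print(M1.shape)       # (7, 55)
```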
In this embodiment, each word in the text sequence is represented by a word vector, whereby the word vectors of all words in the entire text sequence are constructed as a first matrix, which corresponds to the text sequence. For example, one row or column of the first matrix corresponds to a word vector for a word in the text sequence.
In step S102 of this embodiment, the first matrix may be processed by using a deep learning model to obtain a second matrix. For example, the first matrix obtained in step S101 may be processed using a Bi-directional long-short term memory (Bi-LSTM) model. In addition, the first matrix may be processed using other deep learning models, such as long-short term memory (LSTM) models.
In this embodiment, the row or column vectors of the second matrix may correspond to the words in the text sequence. For example, the second matrix M2 may be represented as M2 = {F1, …, Fi, …, Ft}, where i and t are both integers, 1 ≤ i ≤ t, t is the number of words in the text sequence, and Fi is the vector corresponding to the i-th word in the text sequence. Assuming the entity words e1 and e2 are the i_e1-th and i_e2-th words of the text sequence, respectively, the vectors F_{i_e1} and F_{i_e2} are the vectors corresponding to the entity words e1 and e2 in the sequence.
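Continuing the toy sketch, the following stands in for step S102. A plain bidirectional tanh recurrence replaces the trained Bi-LSTM purely to keep the example self-contained; what matters for the rest of the sketch is only the shape of the result, one row of M2 per word.

```python
# Bidirectional recurrent pass over M1 producing the second matrix M2.
# Plain tanh cells are a stand-in for the patent's (trained) Bi-LSTM.
HIDDEN = 8
IN_DIM = M1.shape[1]

def rnn_pass(X, W_in, W_hh):
    """Run a simple tanh recurrence over the rows of X, returning all states."""
    h, states = np.zeros(HIDDEN), []
    for x in X:
        h = np.tanh(x @ W_in + h @ W_hh)
        states.append(h)
    return np.stack(states)

W_in_f, W_hh_f = rng.normal(size=(IN_DIM, HIDDEN)) * 0.1, rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
W_in_b, W_hh_b = rng.normal(size=(IN_DIM, HIDDEN)) * 0.1, rng.normal(size=(HIDDEN, HIDDEN)) * 0.1

fwd = rnn_pass(M1, W_in_f, W_hh_f)              # left-to-right states
bwd = rnn_pass(M1[::-1], W_in_b, W_hh_b)[::-1]  # right-to-left states, re-aligned
M2 = np.concatenate([fwd, bwd], axis=1)         # second matrix: row i corresponds to word i
print(M2.shape)                                 # (7, 16)
```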
In step S103 of this embodiment, two or more attention models are used to determine the attention degree of the words in the text sequence, and the second matrix is processed to obtain the third matrix of the text sequence. The attention degree of a word reflects the importance of the word to the entity words in the text sequence, so the third matrix can represent the words of higher attention degree in the second matrix, which makes the classification in step S104 more efficient; moreover, because two or more attention models are employed, the words of higher attention degree can be selected more effectively.
FIG. 2 is a schematic diagram of a method for obtaining the third matrix in this embodiment. As shown in FIG. 2, the method may include:
S201, determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
S202, combining the vectors corresponding to the selected words in the second matrix to form the third matrix.
In the present embodiment, through steps S201 and S202, a predetermined number of words can be extracted from the text sequence to compose the third matrix, so the size of the third matrix can be smaller than that of the second matrix.
FIG. 3 is a schematic diagram of a method for selecting a predetermined number of words according to the present embodiment, used to implement step S201. As shown in FIG. 3, the method includes:
S301, combining the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
S302, performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
S303, merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting a first predetermined number of words from the text sequence again, wherein the total number of words selected across all selections equals the predetermined number.
In this embodiment, since the information contained in the entity words themselves is important for classifying the semantic relation, in step S301 the vectors corresponding to the entity words in the second matrix are merged with the second matrix to form a fourth matrix. The fourth matrix may thus be composed of two parts: one part is the second matrix M2, and the other part is a joining unit U composed of the vectors corresponding to the entity words that are appended to the second matrix. The joining unit U may be a vector, or a matrix composed of a plurality of vectors; for example, it may be the matrix {F_{i_e1}, F_{i_e2}} formed by the vector F_{i_e1} corresponding to entity word e1 and the vector F_{i_e2} corresponding to entity word e2. The fourth matrix M4 may thus be denoted as M4 = M2 + U = {F1, …, Fi, …, Ft, F_{i_e1}, F_{i_e2}}.
In step S302, the fourth matrix may be processed to determine the attention degree of each word in the text sequence. The processing includes at least nonlinear processing, which may be neural network processing based on an attention mechanism, such as sigma (σ), tangent (tanh), ReLU, or sigmoid processing; however, this embodiment is not limited thereto, and other nonlinear processing may also be used. In addition, the processing may include linear processing.
For example, in the present embodiment, the fourth matrix M4 may be processed by the following equation (1), which includes both linear and nonlinear processing:
y = W X(W_f M2 + W_u U)    (1)
where X is the function corresponding to the nonlinear processing, W_f is the linear coefficient corresponding to the second matrix M2, W_u is the linear coefficient corresponding to the joining unit U, and W is the linear coefficient corresponding to the fourth matrix M4.
After the fourth matrix is processed, a weight corresponding to each word in the text sequence can be obtained, and the weight represents the attention degree of the word.
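The sketch below shows one possible reading of equation (1). The patent does not pin down how W_u U is combined with W_f M2 when U and M2 have different numbers of rows, so the joining unit is summarized by its row mean here; that choice, and tanh as the nonlinear function X, are assumptions of this sketch.

```python
# One reading of equation (1): a scalar attention weight per word,
# y = w . X(W_f M2 + W_u u), with u a summary of the joining unit U.
D = M2.shape[1]
ATT_H = 8

def attention_scores(M2, U, W_f, W_u, w):
    u = U.mean(axis=0)                    # summary of the joining unit U (assumption)
    hidden = np.tanh(M2 @ W_f + u @ W_u)  # nonlinear processing X(.)
    return hidden @ w                     # attention degree of each word

W_f = rng.normal(size=(D, ATT_H)) * 0.1
W_u = rng.normal(size=(D, ATT_H)) * 0.1
w_out = rng.normal(size=ATT_H) * 0.1

U = M2[[e1, e2]]                          # joining unit built from the entity words
scores = attention_scores(M2, U, W_f, W_u, w_out)
print(scores.shape)                       # (7,): one weight per word
```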
In step S302, a first predetermined number of words is selected from the text sequence based on the attention degree of each word; for example, the top-ranking words are selected in descending order of attention degree. The first predetermined number may be, for example, 1 or 2.
In step S303, the vectors in the second matrix corresponding to the first predetermined number of words selected in step S302 may be merged with the fourth matrix to form an updated fourth matrix. For example, if the j-th and k-th words of the text sequence are selected in step S302, then in step S303 the vectors Fj and Fk corresponding to these words in the second matrix M2 may be merged with the fourth matrix M4 to form an updated fourth matrix M4', which may be expressed as M4' = {F1, …, Fi, …, Ft, F_{i_e1}, F_{i_e2}, Fj, Fk} = M2 + U', where {F_{i_e1}, F_{i_e2}, Fj, Fk} may be regarded as an updated joining unit U'. That is, the vectors in the second matrix corresponding to the first predetermined number of words selected in step S302 may be merged with the original joining unit U to form the updated joining unit U', which is merged with the second matrix M2 to form the updated fourth matrix M4'.
In step S303, the updated fourth matrix may be subjected to a non-linear process in the same manner as in step S302 to re-determine the degree of attention of the words in the text sequence, and to re-select the first predetermined number of words from the text sequence based on the re-determined degree of attention.
In this embodiment, step S303 may be performed at least once, until the total of the first predetermined number of words selected in step S302 and the first predetermined number of words selected in each execution of step S303 equals the predetermined number of words required in step S201.
It should be noted that the vectors corresponding to the first predetermined number of words selected in each execution of step S303 may be used to form the updated fourth matrix the next time step S303 is executed.
In the present embodiment, the processing performed on the fourth matrix M4 in step S302 and the processing performed on the updated fourth matrix M4' in each execution of step S303 may be collectively expressed, for example, as the following equation (2):
y_m = W_m X_m(W_fm M2 + W_um U_m)    (2)
where m is an integer with 0 ≤ m ≤ N, and N is the number of times step S303 is performed, N being a natural number. For 1 ≤ m ≤ N, equation (2) corresponds to the m-th execution of step S303, in which the updated fourth matrix M4' of that execution is processed; for m = 0, it corresponds to the processing of the initial fourth matrix M4 in step S302. For 1 ≤ m ≤ N, W_fm, W_um, and W_m are, respectively, the linear coefficient corresponding to the second matrix M2, the linear coefficient corresponding to the updated joining unit U_m in the m-th execution of step S303, and the linear coefficient corresponding to the updated fourth matrix M4'; X_m is the function corresponding to the nonlinear processing. For m = 0, W_f0, W_u0, and W_0 are, respectively, the linear coefficient corresponding to the second matrix M2, the linear coefficient corresponding to the initial joining unit U_0 in step S302, and the linear coefficient corresponding to the initial fourth matrix M4; X_0 is the function corresponding to the nonlinear processing. The type of nonlinear processing may be the same or different in each execution; that is, as m changes, the type of X_m may or may not change.
In equation (2), the fourth matrix and the updated fourth matrix differ in scale, and the fourth matrix differs in scale after each update: when m = 0 the fourth matrix is smallest, and as m increases the updated fourth matrix grows, so the corresponding W_fm, W_um, and W_m change accordingly. The attention model jointly determined by the function X_m and the linear coefficients W_fm, W_um, and W_m therefore also changes, so that an attention model corresponding to the scale of the fourth matrix, or of the updated fourth matrix, can be used to determine the attention degree of the words in the text sequence multiple times, which improves accuracy. It should be noted that equation (2) is only an example, and other forms of formula may be adopted in this embodiment.
In addition, the parameters of the attention models, such as the linear coefficients W_fm, W_um, and W_m, may be obtained by training on a large number of training samples.
Further, in steps S302 and S303 of the present embodiment, the words selected each time do not repeat any words already selected; that is, a word that already exists in the joining unit or in the updated joining unit will not be added to it again.
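Putting the pieces together, the sketch below runs the selection loop of steps S302 and S303: each round uses a parameter set associated with the current joining-unit scale (random stand-ins here for the per-scale trained models of equation (2)), scores all words, and adds the top unselected words to the joining unit, respecting the no-repeat rule just described.

```python
def make_params(scale):
    # Stand-in for the trained attention model of equation (2); in this
    # simplified reading the shapes do not depend on the scale, so the
    # argument would only select which trained parameter set to use.
    return (rng.normal(size=(D, ATT_H)) * 0.1,
            rng.normal(size=(D, ATT_H)) * 0.1,
            rng.normal(size=ATT_H) * 0.1)

def select_words(M2, entity_idx, first_k, n_rounds):
    selected, unit_idx = [], list(entity_idx)   # joining unit U starts from the entity words
    for m in range(n_rounds + 1):               # m = 0 is step S302; m >= 1 is step S303
        W_f, W_u, w = make_params(len(unit_idx))
        s = attention_scores(M2, M2[unit_idx], W_f, W_u, w)
        fresh = [i for i in np.argsort(-s) if i not in unit_idx][:first_k]  # no repeats
        selected.extend(fresh)
        unit_idx.extend(fresh)                  # U -> U': grow the joining unit
    return selected

chosen = select_words(M2, entity_idx=[e1, e2], first_k=2, n_rounds=1)
print(chosen)   # predetermined number = first_k * (n_rounds + 1) = 4 words
```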
In this embodiment, the number of the two or more attention models may be determined according to the number of words in the text sequence. For example, the greater the number of words in the text sequence, the greater the number of attention models used.
In this embodiment, before step S301, the number of attention models may be set according to the number of words in the text sequence; for example, a correspondence between the number of words in a text sequence and the number of attention models may be preset, and the number of attention models set according to that correspondence and the number of words in the text sequence to be processed.
In this embodiment, the correspondence between the number of words in a text sequence and the number of attention models may be obtained in advance by training on a large number of training samples. For example, the text sequences serving as training samples are divided into a plurality of training sets according to the number of words they contain, so that the training samples in each set fall into a specific word-count interval (for example, the samples in the first training set may contain 1 to 10 words). Each training sample in each set is then classified for semantic relations multiple times, using a different number of attention models each time; the optimal number of attention models for that set can be determined from the classification results, which fixes the correspondence between the set's word-count interval and the optimal number of attention models. Of course, this embodiment is not limited thereto, and the correspondence may also be determined in other ways.
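Such a preset correspondence can be as simple as a lookup over word-count intervals, as in the sketch below; the interval boundaries and model counts are invented for illustration only.

```python
# Hypothetical word-count intervals mapped to numbers of attention models.
LENGTH_TO_MODELS = [(10, 2), (25, 3), (float("inf"), 4)]

def num_attention_models(n_words):
    for upper, n_models in LENGTH_TO_MODELS:
        if n_words <= upper:
            return n_models

print(num_attention_models(len(words)))   # 2 attention models for the 7-word toy sentence
```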
According to step S201 of the present embodiment, at least 2 attention models can be employed to determine the attention degree of a word in a text sequence, and thus, the result of determining the attention degree can be made more accurate.
In step S202 of this embodiment, the vectors in the second matrix corresponding to the predetermined number of words selected in step S201 are combined to form the third matrix. For example, if the j-th, k-th, l-th, m-th, n-th, and o-th words of the text sequence are selected as the predetermined number of words in step S201, then in step S202 the vectors Fj, Fk, Fl, Fm, Fn, and Fo corresponding to these words in the second matrix M2 are combined to form the third matrix M3, which may be denoted as M3 = {Fj, Fk, Fl, Fm, Fn, Fo}.
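In the running sketch, step S202 is then a single gather of rows from M2:

```python
# Third matrix: stack the M2 rows of the selected words (step S202).
M3 = M2[chosen]
print(M3.shape)   # (4, 16): smaller than M2 whenever fewer than all words are kept
```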
FIGS. 2 and 3 above show a method of extracting the vectors of a predetermined number of words from the second matrix to form the third matrix; however, the present embodiment is not limited thereto, and other methods may also be used to form the third matrix.
The method of obtaining the third matrix in step S103 has been described above with reference to FIGS. 2 and 3, but the embodiment is not limited thereto, and a method different from that of FIGS. 2 and 3 may also be adopted to obtain the third matrix.
In step S104 of this embodiment, the semantic relation between the entity words in the text sequence is determined according to at least the third matrix obtained in step S103 and a pre-stored classification model. For example, regardless of the number of words in the text sequence, the third matrix may be processed by a hidden layer to generate a feature vector, and the feature vector classified according to the pre-stored classification model to obtain the category of the semantic relation; for the hidden-layer processing, reference may be made to the prior art, and it is not described here.
In step S104 of the present embodiment, the semantic relation may also be determined from both the third matrix M3 and the second matrix M2, based on the pre-stored classification model.
In the present embodiment, the classification model used in step S104 may be, for example, a softmax, maximum-entropy, Bayes, or support vector machine classifier. The classification model may be obtained by training and stored for use in step S104. In this embodiment, the method corresponding to steps S101-S104 may be applied to the training samples of a training set to train the classification model; the training process is not described again here.
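A final sketch of step S104, under stated assumptions: mean-pooling makes the feature vector length independent of how many words were kept, a hidden layer produces the feature vector, and a softmax yields the relation category. The class count of 9 (a SemEval-2010 Task 8 style label set) and the pooling choice are illustrative, not prescribed by the patent.

```python
# Hidden layer plus softmax classifier over the third matrix (step S104).
N_CLASSES = 9
W_hid = rng.normal(size=(M3.shape[1], ATT_H)) * 0.1
W_cls = rng.normal(size=(ATT_H, N_CLASSES)) * 0.1

feat = np.tanh(M3.mean(axis=0) @ W_hid)   # fixed-length feature vector
logits = feat @ W_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over semantic-relation classes
print(int(probs.argmax()))                # index of the predicted relation class
```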
In this embodiment, two or more attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between the entity words are then classified based on the attention degree, so that classification efficiency can be improved.
Example 2
Embodiment 2 of the present application provides a device for classifying semantic relationships of entity words in a text sequence, which corresponds to the method in embodiment 1.
Fig. 4 is a schematic diagram of the classification apparatus of this embodiment 2, and as shown in fig. 4, the apparatus 400 includes a first obtaining unit 401, a second obtaining unit 402, a third obtaining unit 403, and a classification unit 404.
The first obtaining unit 401 represents each word in the text sequence by a word vector to construct a first matrix; the second obtaining unit 402 processes the first matrix using a deep learning model to obtain a second matrix; the third obtaining unit 403 processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and the classification unit 404 determines the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Fig. 5 is a schematic diagram of a third obtaining unit of this embodiment 2, and as shown in fig. 5, the third obtaining unit 403 may include a selecting unit 501 and a combining unit 502.
The selection unit 501 determines the attention degree of each word in the text sequence using two or more attention models and selects a predetermined number of words from the text sequence based on the attention degree; the merging unit 502 merges the vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix.
Fig. 6 is a schematic diagram of the selecting unit of this embodiment 2, and as shown in fig. 6, the selecting unit 501 may include a first combining sub-unit 601, a first processing sub-unit 602, and a second processing sub-unit 603.
The first merging subunit 601 merges the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix; the first processing subunit 602 performs nonlinear processing on the fourth matrix to determine the attention degree of each word in the text sequence and selects a first predetermined number of words from the text sequence based on the attention degree; and the second processing subunit 603 merges the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, and selects a first predetermined number of words from the text sequence again based on the updated fourth matrix, wherein the total number of words selected across all selections equals the predetermined number.
In this embodiment, two or more attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between the entity words are then classified based on the attention degree, so that classification efficiency can be improved.
Example 3
Embodiment 3 of the present application provides an electronic device including the apparatus for classifying semantic relations of entity words in a text sequence described in embodiment 2.
Fig. 7 is a schematic diagram of a configuration of an electronic device according to embodiment 3 of the present application. As shown in fig. 7, the electronic device 700 may include: a Central Processing Unit (CPU)701 and a memory 702; the memory 702 is coupled to the central processor 701. Wherein the memory 702 can store various data; a program for classifying semantic relationships of entity words in a text sequence is also stored and executed under the control of the central processor 701.
In one embodiment, the functions of the classification apparatus may be integrated into the central processor 701.
Wherein, the central processor 701 may be configured to:
representing each word in the text sequence by a word vector to construct a first matrix; processing the first matrix using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to the words in the text sequence; processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix (the result of attention) of the text sequence based on the attention degree; and determining the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Wherein, the central processor 701 may be further configured to:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and merging the vectors in the second matrix corresponding to the selected predetermined number of words to form the third matrix.
Wherein, the central processor 701 may be further configured to:
combining vectors corresponding to entity words in the second matrix with the second matrix to form a fourth matrix;
performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix (i.e., merging the selected words with the BLSTM output and the previous context information) to form an updated fourth matrix;
performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting a first predetermined number of words from the text sequence again;
and updating the fourth matrix at least once according to the vectors corresponding to the first predetermined number of words selected the previous time, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model, wherein the total number of words selected across all selections equals the predetermined number.
Wherein, the central processor 701 may be further configured to:
each time a first predetermined number of words is selected, the newly selected words do not repeat any words that have already been selected.
Wherein, the central processor 701 may be further configured to:
determining the semantic relationship from the third matrix and the second matrix, and the classification model.
Wherein, the central processor 701 may be further configured to:
the number of the two or more attention models is determined according to the number of words in the text sequence.
Further, as shown in fig. 7, the electronic device 700 may further include: an input/output unit 703, a display unit 704, and the like; the functions of the above components are similar to those of the prior art, and are not described in detail here. It is noted that the electronic device 700 does not necessarily include all of the components shown in fig. 7; furthermore, the electronic device 700 may also comprise components not shown in fig. 7, reference being made to the prior art.
The embodiments of the present application further provide a computer-readable program which, when executed in a classification apparatus or an electronic device, causes the classification apparatus or the electronic device to execute the classification method described in embodiment 1.
The embodiments of the present application further provide a storage medium storing a computer-readable program which causes a classification apparatus or an electronic device to execute the classification method described in embodiment 1.
The classification means described in connection with the embodiments of the invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional block diagrams and/or one or more combinations of the functional block diagrams illustrated in fig. 4-6 may correspond to individual software modules of a computer program flow or individual hardware modules. These software modules may correspond to the respective steps shown in embodiment 1. These hardware modules may be implemented, for example, by solidifying these software modules using a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional block diagrams and/or one or more combinations of the functional block diagrams described with respect to FIGS. 4-6 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. They may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in conjunction with specific embodiments, but it should be understood by those skilled in the art that these descriptions are intended to be illustrative, and not limiting. Various modifications and adaptations of the present application may occur to those skilled in the art based on the teachings herein and are within the scope of the present application.
With respect to the embodiments including the above embodiments, the following supplementary notes are also disclosed:
Supplementary Note 1. An apparatus for classifying semantic relations of entity words in a text sequence, the apparatus comprising:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and
a classification unit that determines the semantic relations between entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Supplementary Note 2. The apparatus according to Supplementary Note 1, wherein the third obtaining unit comprises:
a selection unit that determines the attention degree of each word in the text sequence using two or more attention models, and selects a predetermined number of words from the text sequence based on the attention degree; and
a merging unit, configured to merge the vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix.
Supplementary Note 3. The apparatus according to Supplementary Note 2, wherein the selection unit comprises:
a first merging subunit, configured to merge the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
a first processing subunit that performs at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selects a first predetermined number of words from the text sequence based on the attention degree; and
a second processing subunit that merges the vectors in the second matrix corresponding to the selected first predetermined number of words with the fourth matrix to form an updated fourth matrix, and performs at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix to select the first predetermined number of words again from the text sequence,
wherein the second processing subunit updates the fourth matrix at least once according to the vectors corresponding to the selected first predetermined number of words, and selects the first predetermined number of words again from the updated fourth matrix using the attention model, the total number of words selected across all selections being equal to the predetermined number.
Supplementary Note 4. The apparatus according to Supplementary Note 3, wherein
the first predetermined number of words selected each time by the first processing subunit or the second processing subunit do not repeat any of the words that have already been selected.
Supplementary Note 5. The apparatus according to Supplementary Note 1, wherein
the number of the two or more attention models is determined according to the number of words in the text sequence.
Supplementary Note 6. The apparatus according to Supplementary Note 1, wherein
the classification unit determines the semantic relation according to the third matrix and the second matrix, and the classification model.
Supplementary Note 7. An electronic device comprising the apparatus according to any one of Supplementary Notes 1 to 6.
Supplementary Note 8. A method of classifying semantic relations of entity words in a text sequence, the method comprising:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree; and
determining the semantic relations between entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Supplementary Note 9. The method according to Supplementary Note 8, wherein obtaining the third matrix using two or more attention models comprises:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
merging the vectors in the second matrix corresponding to the selected predetermined number of words to form the third matrix.
Supplementary Note 10. The method according to Supplementary Note 9, wherein selecting a predetermined number of words from the text sequence comprises:
combining the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting the first predetermined number of words again from the text sequence,
wherein the fourth matrix is updated at least once according to the vectors corresponding to the first predetermined number of words selected the previous time, and the first predetermined number of words is selected again from the updated fourth matrix using the attention model, the total number of words selected across all selections being equal to the predetermined number.
Supplementary Note 11. The method according to Supplementary Note 10, wherein
each time a first predetermined number of words is selected, the newly selected words do not repeat any words that have already been selected.
Supplementary Note 12. The method according to Supplementary Note 8, wherein
the number of the two or more attention models is determined according to the number of words in the text sequence.
Supplementary Note 13. The method according to Supplementary Note 8, wherein determining the semantic relation according to at least the third matrix and the classification model comprises:
determining the semantic relation according to the third matrix and the second matrix, and the classification model.

Claims (6)

1. An apparatus for classifying semantic relationships of entity words in a text sequence, the apparatus comprising:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree;
a classification unit, which determines semantic relations between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model;
wherein the third obtaining unit includes:
a selection unit that determines a degree of attention of each word in the text sequence using 2 or more attention models, and selects a predetermined number of words from the text sequence based on the degree of attention; and
a merging unit, configured to merge vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix;
wherein the selection unit includes:
a first merging subunit, configured to merge a vector corresponding to an entity word in the second matrix with the second matrix to form a fourth matrix;
a first processing subunit that performs at least nonlinear processing on the fourth matrix using an attention model corresponding to a scale of the fourth matrix to determine a degree of attention of each word in the text sequence, and selects a first predetermined number of words from the text sequence based on the degree of attention; and
a second processing subunit that merges the vectors in the second matrix corresponding to the selected first predetermined number of words with the fourth matrix to form an updated fourth matrix, and performs at least a nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix to select again the first predetermined number of words from the text sequence,
wherein the second processing subunit performs the following processing N times: updating the fourth matrix according to the vectors corresponding to the first predetermined number of words that have been selected, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model,
wherein the sum of the number of words selected by the second processing subunit over the N times and the number of words selected by the first processing subunit equals the predetermined number,
N being a natural number of 1 or more.
2. The apparatus of claim 1, wherein,
the first predetermined number of words selected each time by the first processing subunit or the second processing subunit are not repeated with the words of the first predetermined number of words that have already been selected.
3. The apparatus of claim 1, wherein,
the number of the more than 2 attention models is determined according to the number of words in the text sequence.
4. The apparatus of claim 1, wherein,
the classification unit determines the semantic relationship according to the third matrix and the second matrix, and the classification model.
5. An electronic device comprising the apparatus of any of claims 1-4.
6. A method of classifying semantic relationships of entity words in a text sequence, the method comprising:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix by using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree;
determining semantic relations between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model;
wherein obtaining the third matrix using two or more attention models comprises:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
merging vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix;
wherein selecting a predetermined number of words from the text sequence comprises:
combining vectors corresponding to entity words in the second matrix with the second matrix to form a fourth matrix;
performing a first process, the first process comprising: performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
performing a second process, the second process comprising: combining the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting the first predetermined number of words again from the text sequence;
wherein the second process includes performing the following processing N times: updating the fourth matrix according to the vectors corresponding to the first predetermined number of words selected the previous time, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model, and the sum of the number of words selected by the second process over the N times and the number of words selected by the first process equals the predetermined number,
N being a natural number of 1 or more.
CN201610929103.5A 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment Active CN108021544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610929103.5A CN108021544B (en) 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment


Publications (2)

Publication Number Publication Date
CN108021544A CN108021544A (en) 2018-05-11
CN108021544B (en) 2021-07-06

Family

ID=62069665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610929103.5A Active CN108021544B (en) 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment

Country Status (1)

Country Link
CN (1) CN108021544B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376222B (en) * 2018-09-27 2021-05-25 国信优易数据股份有限公司 Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111177383B (en) * 2019-12-24 2024-01-16 上海大学 Text entity relation automatic classification method integrating text grammar structure and semantic information
CN112085837B (en) * 2020-09-10 2022-04-26 哈尔滨理工大学 Three-dimensional model classification method based on geometric shape and LSTM neural network
CN112417156B (en) * 2020-11-30 2024-05-14 百度国际科技(深圳)有限公司 Multi-task learning method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716921A (en) * 2004-06-30 2006-01-04 微软公司 When-free messaging
CN101044470A (en) * 2003-06-30 2007-09-26 微软公司 Positioning and rendering notification heralds based on user's focus of attention and activity
CN102111601A (en) * 2009-12-23 2011-06-29 大猩猩科技股份有限公司 Content-based adaptive multimedia processing system and method
CN104298651A (en) * 2014-09-09 2015-01-21 大连理工大学 Biomedicine named entity recognition and protein interactive relationship extracting on-line system based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7274741B2 (en) * 2002-11-01 2007-09-25 Microsoft Corporation Systems and methods for generating a comprehensive user attention model
US8758018B2 (en) * 2009-12-31 2014-06-24 Teledyne Scientific & Imaging, Llc EEG-based acceleration of second language learning
US9201864B2 (en) * 2013-03-15 2015-12-01 Luminoso Technologies, Inc. Method and system for converting document sets to term-association vector spaces on demand
US20150095017A1 (en) * 2013-09-27 2015-04-02 Google Inc. System and method for learning word embeddings using neural language models
US9721002B2 (en) * 2013-11-29 2017-08-01 Sap Se Aggregating results from named entity recognition services
CN104834747B (en) * 2015-05-25 2018-04-27 中国科学院自动化研究所 Short text classification method based on convolutional neural networks
CN105183720B (en) * 2015-08-05 2019-07-09 百度在线网络技术(北京)有限公司 Machine translation method and device based on RNN model


Also Published As

Publication number Publication date
CN108021544A (en) 2018-05-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant