CN108021544B - Method and device for classifying semantic relation of entity words and electronic equipment - Google Patents

Method and device for classifying semantic relation of entity words and electronic equipment

Info

Publication number
CN108021544B
CN108021544B
Authority
CN
China
Prior art keywords
matrix
words
text sequence
attention
predetermined number
Prior art date
Legal status
Active
Application number
CN201610929103.5A
Other languages
Chinese (zh)
Other versions
CN108021544A (en)
Inventor
张姝
杨铭
孙俊
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201610929103.5A
Publication of CN108021544A
Application granted
Publication of CN108021544B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method, an apparatus, and an electronic device for classifying semantic relations of entity words in a text sequence. The apparatus comprises: a first obtaining unit that represents each word in the text sequence by a word vector to construct a first matrix; a second obtaining unit that processes the first matrix using a deep learning model to obtain a second matrix; a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and a classification unit that determines the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model. According to the embodiments, classification efficiency can be improved.

Description

Method and device for classifying semantic relation of entity words and electronic equipment
Technical Field
The present application relates to the field of information technology, and in particular to a method and an apparatus for classifying semantic relations of entity words in a text sequence, and an electronic device.
Background
Semantic relation classification of entity words refers to determining which of a set of predetermined semantic relations, such as the hypernym-hyponym (upper-lower concept) relation or the cause-effect relation, holds between the entity words in a text sequence. For example, in the sentence "the <e1>machine</e1> generates a large amount of <e2>noise</e2>", the relation between entity word e1 and entity word e2 is determined to be: Cause-Effect(e1, e2).
In the field of natural language processing, the semantic relation classification of entity words has received considerable attention, because it has important application value in tasks such as information extraction, information retrieval, machine translation, question answering, knowledge base construction, and semantic disambiguation.
In an existing semantic relation classification method for entity words, classification can be performed using a recurrent neural network (RNN) model based on long short-term memory (LSTM) units. Such a model can effectively exploit long-distance dependencies in sequence data, and is therefore very effective for processing text sequences.
It should be noted that the above background description is only for the convenience of clear and complete description of the technical solutions of the present application and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the present application.
Disclosure of Invention
The inventors of the present application found that, in the semantic relation classification task, other words in a sentence differ in their importance to the entity words, and therefore differ in their influence on the classification result. When the number of words in a text sequence is small, the existing semantic relation classification methods for entity words can classify efficiently; when the number of words is large, however, classification efficiency drops, because many words have little influence on the classification result.
Embodiments of the present application provide a method, an apparatus, and an electronic device for classifying semantic relations of entity words, in which an attention model is introduced to determine the attention degree of the words in a text sequence, and the semantic relations between entity words are then classified based on the attention degree, so that classification efficiency can be improved.
According to a first aspect of embodiments of the present application, there is provided an apparatus for classifying semantic relationships of entity words in a text sequence, the apparatus including:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree;
a classification unit determining semantic relationships between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
According to a second aspect of the embodiments of the present application, there is provided a method for classifying semantic relationships of entity words in a text sequence, the method including:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix by using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree; and
determining semantic relationships between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
According to a third aspect of the embodiments of the present application, an electronic device is provided, which includes the apparatus for classifying semantic relationships of entity words in a text sequence according to the first aspect of the embodiments of the present application.
The beneficial effect of the present application lies in improving the efficiency of classifying the semantic relations of entity words.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a classification method according to embodiment 1 of the present application;
FIG. 2 is a schematic diagram of a method for obtaining a third matrix in embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a method of selecting a predetermined number of words according to embodiment 1 of the present application;
FIG. 4 is a schematic diagram of the classification apparatus according to embodiment 2 of the present application;
FIG. 5 is a schematic view of a third obtaining unit of embodiment 2 of the present application;
FIG. 6 is a schematic view of a selecting unit according to embodiment 2 of the present application;
fig. 7 is a schematic diagram of a configuration of an electronic device according to embodiment 3 of the present application.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
Example 1
Embodiment 1 of the present application provides a classification method for classifying the semantic relations of entity words in a text sequence.
FIG. 1 is a schematic diagram of the classification method of embodiment 1. As shown in FIG. 1, the method includes:
s101, representing each word in the text sequence by a word vector to construct a first matrix (namely, word embedding);
s102, processing the first matrix by using a deep learning model to obtain a second matrix (namely, output of BLSTM), wherein rows or columns of the second matrix correspond to words in the text sequence;
s103, processing the second matrix by utilizing more than 2 Attention models (Attention models) to determine the Attention degree of words in the text sequence, and obtaining a third matrix (result of Attention) of the text sequence based on the Attention degree;
s104, determining semantic relations among entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model.
In this embodiment, at least two attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between entity words are then classified based on the attention degree, so that classification efficiency can be improved.
In step S101 of this embodiment, a word may be represented as a word vector (word Embedding) according to the feature of the word, and the word vector may be a multidimensional floating point number vector.
The features of a word may include features of the word itself and position features of the word in the text sequence; for example, the features of the word itself may be represented as a 50-dimensional or 100-dimensional vector, and the position features as a 5-dimensional vector. Of course, this embodiment is not limited thereto: in addition to the features of the word itself and its position features, the word vector may also be constructed in consideration of features such as hypernyms, part of speech, named entities, and the parse tree.
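As a concrete illustration of this construction, the following minimal Python/NumPy sketch builds a first matrix for a toy sentence. The 50-dimensional word feature and 5-dimensional position feature follow the example above, but the table contents and all names (word_table, pos_table) are invented stand-ins for trained embeddings, not part of the patent.

```python
import numpy as np

# Minimal sketch of step S101 (word embedding); random tables stand in for
# trained embeddings, and positions are taken relative to entity word e1.
rng = np.random.default_rng(0)

words = ["the", "machine", "generates", "a", "lot", "of", "noise"]
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}

WORD_DIM, POS_DIM = 50, 5
word_table = rng.normal(size=(len(vocab), WORD_DIM))        # word-feature embeddings
pos_table = rng.normal(size=(2 * len(words) + 1, POS_DIM))  # relative-position embeddings

e1 = words.index("machine")   # entity word e1
e2 = words.index("noise")     # entity word e2

rows = []
for i, w in enumerate(words):
    pos_vec = pos_table[i - e1 + len(words)]  # offset keeps the table index non-negative
    rows.append(np.concatenate([word_table[vocab[w]], pos_vec]))

M1 = np.stack(rows)   # first matrix: one row per word
print(M1.shape)       # (7, 55)
```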
In this embodiment, each word in the text sequence is represented by a word vector, whereby the word vectors of all words in the entire text sequence are constructed as a first matrix, which corresponds to the text sequence. For example, one row or column of the first matrix corresponds to a word vector for a word in the text sequence.
In step S102 of this embodiment, the first matrix may be processed by using a deep learning model to obtain a second matrix. For example, the first matrix obtained in step S101 may be processed using a Bi-directional long-short term memory (Bi-LSTM) model. In addition, the first matrix may be processed using other deep learning models, such as long-short term memory (LSTM) models.
In this embodiment, the row or column vectors of the second matrix may correspond to the words in the text sequence. For example, the second matrix M2 may be represented as M2 = {F1, …, Fi, …, Ft}, where i and t are both integers, 1 ≤ i ≤ t, t is the number of words in the text sequence, and Fi is the vector corresponding to the i-th word in the text sequence. Assuming the entity words e1 and e2 are the i_e1-th and i_e2-th words of the text sequence, respectively, the vectors F_{i_e1} and F_{i_e2} are the vectors corresponding to the entity words e1 and e2 in the sequence.
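Continuing the toy sketch, the following stands in for step S102. A plain bidirectional tanh recurrence replaces the trained Bi-LSTM purely to keep the example self-contained; what matters for the rest of the sketch is only the shape of the result, one row of M2 per word.

```python
# Bidirectional recurrent pass over M1 producing the second matrix M2.
# Plain tanh cells are a stand-in for the patent's (trained) Bi-LSTM.
HIDDEN = 8
IN_DIM = M1.shape[1]

def rnn_pass(X, W_in, W_hh):
    """Run a simple tanh recurrence over the rows of X, returning all states."""
    h, states = np.zeros(HIDDEN), []
    for x in X:
        h = np.tanh(x @ W_in + h @ W_hh)
        states.append(h)
    return np.stack(states)

W_in_f, W_hh_f = rng.normal(size=(IN_DIM, HIDDEN)) * 0.1, rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
W_in_b, W_hh_b = rng.normal(size=(IN_DIM, HIDDEN)) * 0.1, rng.normal(size=(HIDDEN, HIDDEN)) * 0.1

fwd = rnn_pass(M1, W_in_f, W_hh_f)              # left-to-right states
bwd = rnn_pass(M1[::-1], W_in_b, W_hh_b)[::-1]  # right-to-left states, re-aligned
M2 = np.concatenate([fwd, bwd], axis=1)         # second matrix: row i corresponds to word i
print(M2.shape)                                 # (7, 16)
```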
In step S103 of this embodiment, two or more attention models are used to determine the attention degree of the words in the text sequence, and the second matrix is processed to obtain the third matrix of the text sequence. The attention degree of a word reflects the importance of the word to the entity words in the text sequence, so the third matrix can represent the words of higher attention degree in the second matrix, which makes the classification in step S104 more efficient; moreover, because two or more attention models are employed, the words of higher attention degree can be selected more effectively.
FIG. 2 is a schematic diagram of a method for obtaining the third matrix in this embodiment. As shown in FIG. 2, the method may include:
S201, determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
S202, combining the vectors corresponding to the selected words in the second matrix to form the third matrix.
In the present embodiment, through steps S201 and S202, a predetermined number of words can be extracted from the text sequence to compose the third matrix, so the size of the third matrix can be smaller than that of the second matrix.
FIG. 3 is a schematic diagram of a method for selecting a predetermined number of words according to the present embodiment, used to implement step S201. As shown in FIG. 3, the method includes:
S301, combining the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
S302, performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
S303, merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting a first predetermined number of words from the text sequence again, wherein the total number of words selected across all selections equals the predetermined number.
In this embodiment, since the information contained in the entity words themselves is important for classifying the semantic relation, in step S301 the vectors corresponding to the entity words in the second matrix are merged with the second matrix to form a fourth matrix. The fourth matrix may thus be composed of two parts: one part is the second matrix M2, and the other part is a joining unit U composed of the vectors corresponding to the entity words that are appended to the second matrix. The joining unit U may be a vector, or a matrix composed of a plurality of vectors; for example, it may be the matrix {F_{i_e1}, F_{i_e2}} formed by the vector F_{i_e1} corresponding to entity word e1 and the vector F_{i_e2} corresponding to entity word e2. The fourth matrix M4 may thus be denoted as M4 = M2 + U = {F1, …, Fi, …, Ft, F_{i_e1}, F_{i_e2}}.
In step S302, the fourth matrix may be processed to determine the attention degree of each word in the text sequence. The processing includes at least nonlinear processing, which may be neural network processing based on an attention mechanism, such as sigma (σ), tangent (tanh), ReLU, or sigmoid processing; however, this embodiment is not limited thereto, and other nonlinear processing may also be used. In addition, the processing may include linear processing.
For example, in the present embodiment, the fourth matrix M4 may be processed by the following equation (1), which includes both linear and nonlinear processing:
y = W X(W_f M2 + W_u U)    (1)
where X is the function corresponding to the nonlinear processing, W_f is the linear coefficient corresponding to the second matrix M2, W_u is the linear coefficient corresponding to the joining unit U, and W is the linear coefficient corresponding to the fourth matrix M4.
After the fourth matrix is processed, a weight corresponding to each word in the text sequence can be obtained, and the weight represents the attention degree of the word.
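The sketch below shows one possible reading of equation (1). The patent does not pin down how W_u U is combined with W_f M2 when U and M2 have different numbers of rows, so the joining unit is summarized by its row mean here; that choice, and tanh as the nonlinear function X, are assumptions of this sketch.

```python
# One reading of equation (1): a scalar attention weight per word,
# y = w . X(W_f M2 + W_u u), with u a summary of the joining unit U.
D = M2.shape[1]
ATT_H = 8

def attention_scores(M2, U, W_f, W_u, w):
    u = U.mean(axis=0)                    # summary of the joining unit U (assumption)
    hidden = np.tanh(M2 @ W_f + u @ W_u)  # nonlinear processing X(.)
    return hidden @ w                     # attention degree of each word

W_f = rng.normal(size=(D, ATT_H)) * 0.1
W_u = rng.normal(size=(D, ATT_H)) * 0.1
w_out = rng.normal(size=ATT_H) * 0.1

U = M2[[e1, e2]]                          # joining unit built from the entity words
scores = attention_scores(M2, U, W_f, W_u, w_out)
print(scores.shape)                       # (7,): one weight per word
```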
In step S302, a first predetermined number of words is selected from the text sequence based on the attention degree of each word; for example, the top-ranking words are selected in descending order of attention degree. The first predetermined number may be, for example, 1 or 2.
In step S303, the vectors in the second matrix corresponding to the first predetermined number of words selected in step S302 may be merged with the fourth matrix to form an updated fourth matrix. For example, if the j-th and k-th words of the text sequence are selected in step S302, then in step S303 the vectors Fj and Fk corresponding to these words in the second matrix M2 may be merged with the fourth matrix M4 to form an updated fourth matrix M4', which may be expressed as M4' = {F1, …, Fi, …, Ft, F_{i_e1}, F_{i_e2}, Fj, Fk} = M2 + U', where {F_{i_e1}, F_{i_e2}, Fj, Fk} may be regarded as an updated joining unit U'. That is, the vectors in the second matrix corresponding to the first predetermined number of words selected in step S302 may be merged with the original joining unit U to form the updated joining unit U', which is merged with the second matrix M2 to form the updated fourth matrix M4'.
In step S303, the updated fourth matrix may be subjected to a non-linear process in the same manner as in step S302 to re-determine the degree of attention of the words in the text sequence, and to re-select the first predetermined number of words from the text sequence based on the re-determined degree of attention.
In this embodiment, step S303 may be performed at least once, until the total of the first predetermined number of words selected in step S302 and the first predetermined number of words selected in each execution of step S303 equals the predetermined number of words required in step S201.
It should be noted that the vectors corresponding to the first predetermined number of words selected in each execution of step S303 may be used to form the updated fourth matrix the next time step S303 is executed.
In the present embodiment, the processing performed on the fourth matrix M4 in step S302 and the processing performed on the updated fourth matrix M4' in each execution of step S303 may be collectively expressed, for example, as the following equation (2):
y_m = W_m X_m(W_fm M2 + W_um U_m)    (2)
where m is an integer with 0 ≤ m ≤ N, and N is the number of times step S303 is performed, N being a natural number. For 1 ≤ m ≤ N, equation (2) corresponds to the m-th execution of step S303, in which the updated fourth matrix M4' of that execution is processed; for m = 0, it corresponds to the processing of the initial fourth matrix M4 in step S302. For 1 ≤ m ≤ N, W_fm, W_um, and W_m are, respectively, the linear coefficient corresponding to the second matrix M2, the linear coefficient corresponding to the updated joining unit U_m in the m-th execution of step S303, and the linear coefficient corresponding to the updated fourth matrix M4'; X_m is the function corresponding to the nonlinear processing. For m = 0, W_f0, W_u0, and W_0 are, respectively, the linear coefficient corresponding to the second matrix M2, the linear coefficient corresponding to the initial joining unit U_0 in step S302, and the linear coefficient corresponding to the initial fourth matrix M4; X_0 is the function corresponding to the nonlinear processing. The type of nonlinear processing may be the same or different in each execution; that is, as m changes, the type of X_m may or may not change.
In equation (2), the fourth matrix and the updated fourth matrix differ in scale, and the fourth matrix differs in scale after each update: when m = 0 the fourth matrix is smallest, and as m increases the updated fourth matrix grows, so the corresponding W_fm, W_um, and W_m change accordingly. The attention model jointly determined by the function X_m and the linear coefficients W_fm, W_um, and W_m therefore also changes, so that an attention model corresponding to the scale of the fourth matrix, or of the updated fourth matrix, can be used to determine the attention degree of the words in the text sequence multiple times, which improves accuracy. It should be noted that equation (2) is only an example, and other forms of formula may be adopted in this embodiment.
In addition, the parameters of the attention models, such as the linear coefficients W_fm, W_um, and W_m, may be obtained by training on a large number of training samples.
Further, in steps S302 and S303 of the present embodiment, the words selected each time do not repeat any words already selected; that is, a word that already exists in the joining unit or in the updated joining unit will not be added to it again.
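Putting the pieces together, the sketch below runs the selection loop of steps S302 and S303: each round uses a parameter set associated with the current joining-unit scale (random stand-ins here for the per-scale trained models of equation (2)), scores all words, and adds the top unselected words to the joining unit, respecting the no-repeat rule just described.

```python
def make_params(scale):
    # Stand-in for the trained attention model of equation (2); in this
    # simplified reading the shapes do not depend on the scale, so the
    # argument would only select which trained parameter set to use.
    return (rng.normal(size=(D, ATT_H)) * 0.1,
            rng.normal(size=(D, ATT_H)) * 0.1,
            rng.normal(size=ATT_H) * 0.1)

def select_words(M2, entity_idx, first_k, n_rounds):
    selected, unit_idx = [], list(entity_idx)   # joining unit U starts from the entity words
    for m in range(n_rounds + 1):               # m = 0 is step S302; m >= 1 is step S303
        W_f, W_u, w = make_params(len(unit_idx))
        s = attention_scores(M2, M2[unit_idx], W_f, W_u, w)
        fresh = [i for i in np.argsort(-s) if i not in unit_idx][:first_k]  # no repeats
        selected.extend(fresh)
        unit_idx.extend(fresh)                  # U -> U': grow the joining unit
    return selected

chosen = select_words(M2, entity_idx=[e1, e2], first_k=2, n_rounds=1)
print(chosen)   # predetermined number = first_k * (n_rounds + 1) = 4 words
```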
In this embodiment, the number of the two or more attention models may be determined according to the number of words in the text sequence. For example, the greater the number of words in the text sequence, the greater the number of attention models used.
In this embodiment, before step S301, the number of attention models may be set according to the number of words in the text sequence; for example, a correspondence between the number of words in a text sequence and the number of attention models may be preset, and the number of attention models set according to that correspondence and the number of words in the text sequence to be processed.
In this embodiment, the correspondence between the number of words in a text sequence and the number of attention models may be obtained in advance by training on a large number of training samples. For example, the text sequences serving as training samples are divided into a plurality of training sets according to the number of words they contain, so that the training samples in each set fall into a specific word-count interval (for example, the samples in the first training set may contain 1 to 10 words). Each training sample in each set is then classified for semantic relations multiple times, using a different number of attention models each time; the optimal number of attention models for that set can be determined from the classification results, which fixes the correspondence between the set's word-count interval and the optimal number of attention models. Of course, this embodiment is not limited thereto, and the correspondence may also be determined in other ways.
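Such a preset correspondence can be as simple as a lookup over word-count intervals, as in the sketch below; the interval boundaries and model counts are invented for illustration only.

```python
# Hypothetical word-count intervals mapped to numbers of attention models.
LENGTH_TO_MODELS = [(10, 2), (25, 3), (float("inf"), 4)]

def num_attention_models(n_words):
    for upper, n_models in LENGTH_TO_MODELS:
        if n_words <= upper:
            return n_models

print(num_attention_models(len(words)))   # 2 attention models for the 7-word toy sentence
```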
According to step S201 of the present embodiment, at least 2 attention models can be employed to determine the attention degree of a word in a text sequence, and thus, the result of determining the attention degree can be made more accurate.
In step S202 of this embodiment, the vectors in the second matrix corresponding to the predetermined number of words selected in step S201 are combined to form the third matrix. For example, if the j-th, k-th, l-th, m-th, n-th, and o-th words of the text sequence are selected as the predetermined number of words in step S201, then in step S202 the vectors Fj, Fk, Fl, Fm, Fn, and Fo corresponding to these words in the second matrix M2 are combined to form the third matrix M3, which may be denoted as M3 = {Fj, Fk, Fl, Fm, Fn, Fo}.
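In the running sketch, step S202 is then a single gather of rows from M2:

```python
# Third matrix: stack the M2 rows of the selected words (step S202).
M3 = M2[chosen]
print(M3.shape)   # (4, 16): smaller than M2 whenever fewer than all words are kept
```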
FIGS. 2 and 3 above show a method of extracting the vectors of a predetermined number of words from the second matrix to form the third matrix; however, the present embodiment is not limited thereto, and other methods may also be used to form the third matrix.
The method of obtaining the third matrix in step S103 has been described above with reference to FIGS. 2 and 3, but the embodiment is not limited thereto, and a method different from that of FIGS. 2 and 3 may also be adopted to obtain the third matrix.
In step S104 of this embodiment, the semantic relation between the entity words in the text sequence is determined according to at least the third matrix obtained in step S103 and a pre-stored classification model. For example, regardless of the number of words in the text sequence, the third matrix may be processed by a hidden layer to generate a feature vector, and the feature vector classified according to the pre-stored classification model to obtain the category of the semantic relation; for the hidden-layer processing, reference may be made to the prior art, and it is not described here.
In step S104 of the present embodiment, the semantic relation may also be determined from both the third matrix M3 and the second matrix M2, based on the pre-stored classification model.
In the present embodiment, the classification model used in step S104 may be, for example, a softmax, maximum-entropy, Bayes, or support vector machine classifier. The classification model may be obtained by training and stored for use in step S104. In this embodiment, the method corresponding to steps S101-S104 may be applied to the training samples of a training set to train the classification model; the training process is not described again here.
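A final sketch of step S104, under stated assumptions: mean-pooling makes the feature vector length independent of how many words were kept, a hidden layer produces the feature vector, and a softmax yields the relation category. The class count of 9 (a SemEval-2010 Task 8 style label set) and the pooling choice are illustrative, not prescribed by the patent.

```python
# Hidden layer plus softmax classifier over the third matrix (step S104).
N_CLASSES = 9
W_hid = rng.normal(size=(M3.shape[1], ATT_H)) * 0.1
W_cls = rng.normal(size=(ATT_H, N_CLASSES)) * 0.1

feat = np.tanh(M3.mean(axis=0) @ W_hid)   # fixed-length feature vector
logits = feat @ W_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over semantic-relation classes
print(int(probs.argmax()))                # index of the predicted relation class
```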
In this embodiment, two or more attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between the entity words are then classified based on the attention degree, so that classification efficiency can be improved.
Example 2
Embodiment 2 of the present application provides a device for classifying semantic relationships of entity words in a text sequence, which corresponds to the method in embodiment 1.
Fig. 4 is a schematic diagram of the classification apparatus of this embodiment 2, and as shown in fig. 4, the apparatus 400 includes a first obtaining unit 401, a second obtaining unit 402, a third obtaining unit 403, and a classification unit 404.
The first obtaining unit 401 represents each word in the text sequence by a word vector to construct a first matrix; the second obtaining unit 402 processes the first matrix using a deep learning model to obtain a second matrix; the third obtaining unit 403 processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and the classification unit 404 determines the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Fig. 5 is a schematic diagram of a third obtaining unit of this embodiment 2, and as shown in fig. 5, the third obtaining unit 403 may include a selecting unit 501 and a combining unit 502.
The selection unit 501 determines the attention degree of each word in the text sequence using two or more attention models and selects a predetermined number of words from the text sequence based on the attention degree; the merging unit 502 merges the vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix.
Fig. 6 is a schematic diagram of the selecting unit of this embodiment 2, and as shown in fig. 6, the selecting unit 501 may include a first combining sub-unit 601, a first processing sub-unit 602, and a second processing sub-unit 603.
The first merging subunit 601 merges the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix; the first processing subunit 602 performs nonlinear processing on the fourth matrix to determine the attention degree of each word in the text sequence and selects a first predetermined number of words from the text sequence based on the attention degree; and the second processing subunit 603 merges the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, and selects a first predetermined number of words from the text sequence again based on the updated fourth matrix, wherein the total number of words selected across all selections equals the predetermined number.
In this embodiment, two or more attention models are introduced to determine the attention degree of the words in the text sequence, and the semantic relations between the entity words are then classified based on the attention degree, so that classification efficiency can be improved.
Example 3
Embodiment 3 of the present application provides an electronic device including the apparatus for classifying semantic relations of entity words in a text sequence described in embodiment 2.
Fig. 7 is a schematic diagram of a configuration of an electronic device according to embodiment 3 of the present application. As shown in fig. 7, the electronic device 700 may include: a Central Processing Unit (CPU)701 and a memory 702; the memory 702 is coupled to the central processor 701. Wherein the memory 702 can store various data; a program for classifying semantic relationships of entity words in a text sequence is also stored and executed under the control of the central processor 701.
In one embodiment, the functions of the classification apparatus may be integrated into the central processor 701.
Wherein, the central processor 701 may be configured to:
representing each word in the text sequence by a word vector to construct a first matrix; processing the first matrix using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to the words in the text sequence; processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix (the result of attention) of the text sequence based on the attention degree; and determining the semantic relations between the entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Wherein, the central processor 701 may be further configured to:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and merging the vectors in the second matrix corresponding to the selected predetermined number of words to form the third matrix.
Wherein, the central processor 701 may be further configured to:
combining vectors corresponding to entity words in the second matrix with the second matrix to form a fourth matrix;
performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix (i.e., merging the selected words with the BLSTM output and the previous context information) to form an updated fourth matrix;
performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting a first predetermined number of words from the text sequence again;
and updating the fourth matrix at least once according to the vectors corresponding to the first predetermined number of words selected the previous time, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model, wherein the total number of words selected across all selections equals the predetermined number.
Wherein, the central processor 701 may be further configured to:
each time a first predetermined number of words is selected, the newly selected words do not repeat any words that have already been selected.
Wherein, the central processor 701 may be further configured to:
determining the semantic relationship from the third matrix and the second matrix, and the classification model.
Wherein, the central processor 701 may be further configured to:
the number of the two or more attention models is determined according to the number of words in the text sequence.
Further, as shown in fig. 7, the electronic device 700 may further include: an input/output unit 703, a display unit 704, and the like; the functions of the above components are similar to those of the prior art, and are not described in detail here. It is noted that the electronic device 700 does not necessarily include all of the components shown in fig. 7; furthermore, the electronic device 700 may also comprise components not shown in fig. 7, reference being made to the prior art.
The embodiments of the present application further provide a computer-readable program which, when executed in a classification apparatus or an electronic device, causes the classification apparatus or the electronic device to execute the classification method described in embodiment 1.
The embodiments of the present application further provide a storage medium storing a computer-readable program which causes a classification apparatus or an electronic device to execute the classification method described in embodiment 1.
The classification means described in connection with the embodiments of the invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional block diagrams and/or one or more combinations of the functional block diagrams illustrated in fig. 4-6 may correspond to individual software modules of a computer program flow or individual hardware modules. These software modules may correspond to the respective steps shown in embodiment 1. These hardware modules may be implemented, for example, by solidifying these software modules using a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional block diagrams and/or one or more combinations of the functional block diagrams described with respect to FIGS. 4-6 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. They may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in conjunction with specific embodiments, but it should be understood by those skilled in the art that these descriptions are intended to be illustrative, and not limiting. Various modifications and adaptations of the present application may occur to those skilled in the art based on the teachings herein and are within the scope of the present application.
With respect to the embodiments including the above embodiments, the following supplementary notes are also disclosed:
Supplementary Note 1. An apparatus for classifying semantic relations of entity words in a text sequence, the apparatus comprising:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree; and
a classification unit that determines the semantic relations between entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Supplementary Note 2. The apparatus according to Supplementary Note 1, wherein the third obtaining unit comprises:
a selection unit that determines the attention degree of each word in the text sequence using two or more attention models, and selects a predetermined number of words from the text sequence based on the attention degree; and
a merging unit, configured to merge the vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix.
Supplementary Note 3. The apparatus according to Supplementary Note 2, wherein the selection unit comprises:
a first merging subunit, configured to merge the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
a first processing subunit that performs at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selects a first predetermined number of words from the text sequence based on the attention degree; and
a second processing subunit that merges the vectors in the second matrix corresponding to the selected first predetermined number of words with the fourth matrix to form an updated fourth matrix, and performs at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix to select the first predetermined number of words again from the text sequence,
wherein the second processing subunit updates the fourth matrix at least once according to the vectors corresponding to the selected first predetermined number of words, and selects the first predetermined number of words again from the updated fourth matrix using the attention model, the total number of words selected across all selections being equal to the predetermined number.
Supplementary Note 4. The apparatus according to Supplementary Note 3, wherein
the first predetermined number of words selected each time by the first processing subunit or the second processing subunit do not repeat any of the words that have already been selected.
Supplementary Note 5. The apparatus according to Supplementary Note 1, wherein
the number of the two or more attention models is determined according to the number of words in the text sequence.
Supplementary Note 6. The apparatus according to Supplementary Note 1, wherein
the classification unit determines the semantic relation according to the third matrix and the second matrix, and the classification model.
Supplementary Note 7. An electronic device comprising the apparatus according to any one of Supplementary Notes 1 to 6.
Supplementary Note 8. A method of classifying semantic relations of entity words in a text sequence, the method comprising:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree; and
determining the semantic relations between entity words in the text sequence according to at least the third matrix of the text sequence and a pre-stored classification model.
Supplementary Note 9. The method according to Supplementary Note 8, wherein obtaining the third matrix using two or more attention models comprises:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
merging the vectors in the second matrix corresponding to the selected predetermined number of words to form the third matrix.
Supplementary Note 10. The method according to Supplementary Note 9, wherein selecting a predetermined number of words from the text sequence comprises:
combining the vectors corresponding to the entity words in the second matrix with the second matrix to form a fourth matrix;
performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
merging the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting the first predetermined number of words again from the text sequence,
wherein the fourth matrix is updated at least once according to the vectors corresponding to the first predetermined number of words selected the previous time, and the first predetermined number of words is selected again from the updated fourth matrix using the attention model, the total number of words selected across all selections being equal to the predetermined number.
Supplementary Note 11. The method according to Supplementary Note 10, wherein
each time a first predetermined number of words is selected, the newly selected words do not repeat any words that have already been selected.
Supplementary Note 12. The method according to Supplementary Note 8, wherein
the number of the two or more attention models is determined according to the number of words in the text sequence.
Supplementary Note 13. The method according to Supplementary Note 8, wherein determining the semantic relation according to at least the third matrix and the classification model comprises:
determining the semantic relation according to the third matrix and the second matrix, and the classification model.

Claims (6)

1. An apparatus for classifying semantic relationships of entity words in a text sequence, the apparatus comprising:
a first obtaining unit, configured to represent each word in the text sequence by a word vector to construct a first matrix;
a second obtaining unit, configured to process the first matrix using a deep learning model to obtain a second matrix, where rows or columns of the second matrix correspond to words in the text sequence;
a third obtaining unit that processes the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtains a third matrix of the text sequence based on the attention degree;
a classification unit, which determines semantic relations between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model;
wherein the third obtaining unit includes:
a selection unit that determines a degree of attention of each word in the text sequence using 2 or more attention models, and selects a predetermined number of words from the text sequence based on the degree of attention; and
a merging unit, configured to merge vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix;
wherein the selection unit includes:
a first merging subunit, configured to merge a vector corresponding to an entity word in the second matrix with the second matrix to form a fourth matrix;
a first processing subunit that performs at least nonlinear processing on the fourth matrix using an attention model corresponding to a scale of the fourth matrix to determine a degree of attention of each word in the text sequence, and selects a first predetermined number of words from the text sequence based on the degree of attention; and
a second processing subunit that merges the vectors in the second matrix corresponding to the selected first predetermined number of words with the fourth matrix to form an updated fourth matrix, and performs at least a nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix to select again the first predetermined number of words from the text sequence,
wherein the second processing subunit performs the following processing N times: updating the fourth matrix according to the vectors corresponding to the first predetermined number of words that have been selected, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model,
wherein the sum of the number of words selected by the second processing subunit over the N times and the number of words selected by the first processing subunit equals the predetermined number,
N being a natural number of 1 or more.
2. The apparatus of claim 1, wherein,
the first predetermined number of words selected each time by the first processing subunit or the second processing subunit are not repeated with the words of the first predetermined number of words that have already been selected.
3. The apparatus of claim 1, wherein,
the number of the more than 2 attention models is determined according to the number of words in the text sequence.
4. The apparatus of claim 1, wherein,
the classification unit determines the semantic relationship according to the third matrix and the second matrix, and the classification model.
5. An electronic device comprising the apparatus of any of claims 1-4.
6. A method of classifying semantic relationships of entity words in a text sequence, the method comprising:
representing each word in the text sequence by a word vector to construct a first matrix;
processing the first matrix by using a deep learning model to obtain a second matrix, wherein rows or columns of the second matrix correspond to words in the text sequence;
processing the second matrix using two or more attention models to determine the attention degree of the words in the text sequence, and obtaining a third matrix of the text sequence based on the attention degree;
determining semantic relations between entity words in the text sequence at least according to the third matrix of the text sequence and a pre-stored classification model;
wherein obtaining the third matrix using two or more attention models comprises:
determining the attention degree of each word in the text sequence using two or more attention models, and selecting a predetermined number of words from the text sequence based on the attention degree; and
merging vectors corresponding to the selected predetermined number of words in the second matrix to form the third matrix;
wherein selecting a predetermined number of words from the text sequence comprises:
combining vectors corresponding to entity words in the second matrix with the second matrix to form a fourth matrix;
performing a first process, the first process comprising: performing at least nonlinear processing on the fourth matrix using an attention model corresponding to the scale of the fourth matrix to determine the attention degree of each word in the text sequence, and selecting a first predetermined number of words from the text sequence based on the attention degree; and
performing a second process, the second process comprising: combining the vectors corresponding to the selected first predetermined number of words in the second matrix with the fourth matrix to form an updated fourth matrix, performing at least nonlinear processing on the updated fourth matrix using an attention model corresponding to the scale of the updated fourth matrix, and selecting the first predetermined number of words again from the text sequence;
wherein the second process includes performing the following processing N times: updating the fourth matrix according to the vectors corresponding to the first predetermined number of words selected the previous time, and selecting the first predetermined number of words again from the updated fourth matrix using the attention model, and the sum of the number of words selected by the second process over the N times and the number of words selected by the first process equals the predetermined number,
N being a natural number of 1 or more.
CN201610929103.5A 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment Active CN108021544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610929103.5A CN108021544B (en) 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment


Publications (2)

Publication Number Publication Date
CN108021544A CN108021544A (en) 2018-05-11
CN108021544B (en) 2021-07-06

Family

ID=62069665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610929103.5A Active CN108021544B (en) 2016-10-31 2016-10-31 Method and device for classifying semantic relation of entity words and electronic equipment

Country Status (1)

Country Link
CN (1) CN108021544B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376222B (en) * 2018-09-27 2021-05-25 国信优易数据股份有限公司 Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111177383B (en) * 2019-12-24 2024-01-16 上海大学 Text entity relation automatic classification method integrating text grammar structure and semantic information
CN112085837B (en) * 2020-09-10 2022-04-26 哈尔滨理工大学 Three-dimensional model classification method based on geometric shape and LSTM neural network
CN112417156B (en) * 2020-11-30 2024-05-14 百度国际科技(深圳)有限公司 Multi-task learning method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716921A (en) * 2004-06-30 2006-01-04 微软公司 When-free messaging
CN101044470A (en) * 2003-06-30 2007-09-26 微软公司 Positioning and rendering notification heralds based on user's focus of attention and activity
CN102111601A (en) * 2009-12-23 2011-06-29 大猩猩科技股份有限公司 Content-based adaptive multimedia processing system and method
CN104298651A (en) * 2014-09-09 2015-01-21 大连理工大学 Biomedicine named entity recognition and protein interactive relationship extracting on-line system based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7274741B2 (en) * 2002-11-01 2007-09-25 Microsoft Corporation Systems and methods for generating a comprehensive user attention model
US8758018B2 (en) * 2009-12-31 2014-06-24 Teledyne Scientific & Imaging, Llc EEG-based acceleration of second language learning
US9201864B2 (en) * 2013-03-15 2015-12-01 Luminoso Technologies, Inc. Method and system for converting document sets to term-association vector spaces on demand
US20150095017A1 (en) * 2013-09-27 2015-04-02 Google Inc. System and method for learning word embeddings using neural language models
US9721002B2 (en) * 2013-11-29 2017-08-01 Sap Se Aggregating results from named entity recognition services
CN104834747B (en) * 2015-05-25 2018-04-27 中国科学院自动化研究所 Short text classification method based on convolutional neural networks
CN105183720B (en) * 2015-08-05 2019-07-09 百度在线网络技术(北京)有限公司 Machine translation method and device based on RNN model


Also Published As

Publication number Publication date
CN108021544A (en) 2018-05-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant