WO2022042125A1 - A named entity recognition method - Google Patents
A named entity recognition method
- Publication number
- WO2022042125A1 (PCT/CN2021/106650)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- target text
- matrix
- array
- target
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Definitions
- the present disclosure relates to the technical field of artificial intelligence algorithms, and in particular, to a named entity recognition method.
- NER: Named Entity Recognition
- word slots can be understood as entities with specific meanings in the text, such as person names, place names, institution names, and proper nouns; person names, place names, and the like are the word slot types, and the position of a word slot refers to the position of each word belonging to the same slot within that slot, for example, the beginning of the slot, the middle of the slot, or the end of the slot.
- the NER tag of each word in the target text is determined based on the text content of the target text to be subjected to named entity recognition, and the named entity recognition result of the target text is obtained.
- the representation matrix of the target text to be subjected to named entity recognition is input into the pre-trained model for recognition, and the recognition result is obtained.
- One or more embodiments of the present invention provide a method for identifying a named entity, the method comprising:
- the first classification label is used to represent the user intent corresponding to the target text
- a named entity recognition NER tag of each word in the target text is determined, and a named entity recognition result of the target text is obtained.
- the step of constructing a target representation matrix using the target text and the first classification label includes:
- each element in the fusion array is: the index value of each word in the target text, or the index value of the virtual word represented by the first classification label;
- a matrix corresponding to the fusion array is generated as a target representation matrix, wherein each element in the target representation matrix is: a word vector corresponding to each index value in the fusion array.
- the step of generating a fusion array of the target text and the first classification label includes:
- each element in the first array is: the index value of each word in the target text
- each element in the second array is: the preset value of one of the classification labels, the first classification label is one of those classification labels, the value of the first classification label is a first set value, and the value of each other classification label except the first classification label is a second set value;
- the target index value is added to a first specified position in the first array to obtain a fusion array, wherein the first specified position includes: before the first element in the first array, or after the last element in the first array.
- the step of constructing a target representation matrix using the target text and the first classification label includes:
- each element in the first array is: the index value of each word in the target text
- a matrix corresponding to the first array is generated as the initial matrix of the target text, wherein each element in the initial matrix is: a word vector corresponding to each index value in the first array;
- a second array about the first classification label is constructed, wherein each element in the second array is: the preset value of each classification label, the first classification label is one of the classification labels, the value of the first classification label is the first set value, and the value of each other classification label except the first classification label is the second set value;
- the initial matrix is extended with the second array to generate a target representation matrix.
- the step of using the second array to expand the initial matrix to generate a target representation matrix includes:
- the third specified position includes: before the first element in the one-dimensional array representing the word vector, or after the last element in the one-dimensional array representing the word vector.
- the step of determining the named entity recognition NER tag of each word in the target text based on the target representation matrix, and obtaining the named entity recognition result of the target text includes:
- a word feature matrix of the target text is determined based on the target representation matrix, wherein the word feature matrix includes: a word vector of each word in the target text in the forward order of the target text, and the word vector of each word in the target text in the reverse order of the target text;
- a label matrix of the target text is determined, wherein the label matrix is used to represent: the probability that each word in the target text has a respective NER label;
- the step of using the target text and the first classification label to construct a target representation matrix, and the step of determining the NER label of each word in the target text based on the target representation matrix are implemented by a pre-trained named entity recognition model;
- the named entity recognition model includes: an input layer, a fusion layer, a word embedding layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the fusion layer and the word embedding layer are used to realize the step of constructing a target representation matrix by using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
- the input layer is used to generate a first array about the target text
- the fusion layer is used to construct a second array about the first classification label, determine the index value of the virtual word represented by the second array as a target index value, and add the target index value to the first specified position in the first array to obtain a fusion array;
- the word embedding layer is used to generate a matrix corresponding to the fusion array as a target representation matrix
- the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
- the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, wherein the label matrix is used to represent the probability that each word in the target text has a respective NER label;
- the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
- the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
- the step of constructing a target representation matrix by using the target text and the first classification label, and the step of determining the NER label of each word in the target text based on the target representation matrix, are implemented by a pre-trained named entity recognition model;
- the named entity recognition model includes: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the word embedding layer and the fusion layer are used to realize the step of constructing a target representation matrix by using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
- the input layer is used to generate a first array about the target text
- the word embedding layer is used to generate a matrix corresponding to the first array as an initial matrix of the target text
- the fusion layer is used to construct a second array about the first classification label, and use the second array to expand the initial matrix to generate a target representation matrix;
- the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
- the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, wherein the label matrix is used to represent the probability that each word in the target text has a respective NER label;
- the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
- the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
- the training method of the named entity recognition model includes:
- based on the true value of the NER label and the predicted value of the NER label of each word in the sample text, it is determined whether the named entity recognition model converges; if so, the training ends and the trained named entity recognition model is obtained; otherwise, the model parameters in the named entity recognition model are adjusted, and the process returns to the step of obtaining the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text.
- when named entity recognition is performed on the target text, after the target text to be subjected to named entity recognition is acquired, a first classification label used to represent the user intent corresponding to the target text is determined; then, the target representation matrix is constructed using the target text and the first classification label; finally, the NER label of each word in the target text can be determined based on the target representation matrix, and the named entity recognition result of the target text is obtained.
- the above-mentioned target representation matrix is formed by using the target text and the first classification label representing the user's intention
- the above-mentioned target representation matrix can represent the fusion result of the text content of the target text and the label content of the user's intention.
- the information of the user's intention dimension can be added, so that in the recognition process, the relationship between the target text and the user's intention can be learned, the user's intention expressed by the target text can be determined, and then,
- the NER tags of each word in the target text can be comprehensively considered from the two dimensions of text content and user intent.
- the influence of the user intent corresponding to the target text on the word slot type can be considered, and further, texts in which the same word slot expresses different user intentions can be identified.
- thereby, when identifying the word slot type in the target text, the accuracy of the recognition results obtained when performing named entity recognition on the target text can be improved.
- FIG. 1 shows a schematic flowchart of a method for identifying a named entity according to one or more embodiments of the present invention.
- FIG. 2 shows a schematic flowchart of a manner of generating a fusion array of target text and a first classification label according to one or more embodiments of the present invention.
- FIG. 3 shows a schematic flowchart of a training manner of a named entity recognition model according to one or more embodiments of the present invention.
- FIG. 4 shows a schematic structural diagram of a pre-trained named entity recognition model according to one or more embodiments of the present invention.
- FIG. 5 shows a schematic structural diagram of a pre-trained named entity recognition model according to one or more embodiments of the present invention.
- FIG. 6 shows a schematic structural diagram of an intent classification model according to one or more embodiments of the present invention.
- FIG. 7 shows a schematic structural diagram of an electronic device according to one or more embodiments of the present invention.
- the named entity recognition method according to one or more embodiments of the present invention can be applied to any type of electronic device, for example, a desktop computer, a notebook computer, a tablet computer, etc., which is not specifically limited in the embodiments of the present invention; such devices are hereinafter referred to as the electronic device.
- this method can be applied to any scenario that requires named entity recognition.
- the named entity recognition result of the annual report can be obtained, and the result is: the annual report text after adding NER tags to each word in the annual report; as another example, in the field of traffic management, for a traffic accident report, word slots such as name, location, time, and number of casualties can be identified, and NER tags can be added to each word in the report.
- the named entity recognition result of the report can be obtained, and the result is the report text after adding NER tags to each word in the report.
- NER tags added to each word can be as shown in Table 1 below:
- Table 1 (NER tag: meaning):
  - B-<word slot type>: B is short for Begin, indicating the beginning of a word slot
  - I-<word slot type>: I is short for Internal, indicating the middle of a word slot
  - L-<word slot type>: L is short for Last, indicating the end of a word slot
  - U-<word slot type>: U is short for Unique, indicating a single-word slot
  - O: O is short for Other, indicating a word that is not part of any word slot
- a "B-word slot type” can be added to each word in the word slot based on the word slot type of the word slot and the position of each word in the word slot ” label, “I-slot type” label, or “L-slot type” label; for words that are not recognized as word slots, a “non-word slot” label can be added to the word; for words that are recognized Word slot, you can add a "single word slot label" to the word.
- the first classification label is used to represent the user intent corresponding to the target text
- a named entity recognition NER tag of each word in the target text is determined, and a named entity recognition result of the target text is obtained.
- when named entity recognition is performed on the target text, after the target text to be subjected to named entity recognition is acquired, a first classification label used to represent the user intent corresponding to the target text is determined; then, the target representation matrix is constructed using the target text and the first classification label; finally, the NER label of each word in the target text can be determined based on the target representation matrix, and the named entity recognition result of the target text is obtained.
- the above-mentioned target representation matrix is formed by using the target text and the first classification label representing the user's intention
- the above-mentioned target representation matrix can represent the fusion result of the text content of the target text and the label content of the user's intention.
- the information of the user's intention dimension can be added, so that in the recognition process, the relationship between the target text and the user's intention can be learned, the user's intention expressed by the target text can be determined, and then,
- the NER tags of each word in the target text can be comprehensively considered from the two dimensions of text content and user intent.
- the influence of the user intent corresponding to the target text on the word slot type can be considered, and further, texts in which the same word slot expresses different user intentions can be identified.
- thereby, when identifying the word slot type in the target text, the accuracy of the recognition results obtained when performing named entity recognition on the target text can be improved.
- FIG. 1 shows a schematic flowchart of a method for identifying a named entity according to one or more embodiments of the present invention. As shown in Figure 1, the method may include the following steps:
- the target text to be recognized by the named entity is first acquired.
- the above-mentioned target text can be obtained in various ways: for example, the target text manually input by the user can be obtained; or, for example, the user's voice information can be collected and converted into the target text; or, for example, the target text can be obtained from another device that is communicatively connected; any of these is reasonable.
- the first classification label is used to represent the user intent corresponding to the target text
- a first classification label used to represent the user's intention corresponding to the target text may be further determined.
- the above-mentioned first classification label can be determined in various ways.
- the user intent corresponding to the above-mentioned target text input by the user can be obtained, so that the user's intention can be determined as the first classification label of the target text;
- as another example, the first classification label of the target text can be determined by inputting the target text into a pre-trained intent classification model, and the intent classification model can be, for example, a CNN (Convolutional Neural Network) classification model; any of these is reasonable.
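- The patent only names a CNN classification model as one possible intent classification model; the sketch below is a minimal, non-authoritative text-CNN intent classifier in PyTorch, with every hyperparameter (vocabulary size, embedding width, kernel sizes, number of intent classes) assumed purely for illustration.

```python
import torch
import torch.nn as nn

class IntentCNN(nn.Module):
    """Minimal text-CNN intent classifier; all sizes are assumptions."""
    def __init__(self, vocab_size=5000, embed_dim=128, num_intents=10,
                 kernel_sizes=(2, 3, 4), num_filters=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_intents)

    def forward(self, token_ids):                 # (batch, seq_len) of index values
        x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # (batch, embed_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # one score per intent class

# Usage sketch: the arg-max over the intent scores plays the role of the
# first classification label of the target text.
model = IntentCNN()
token_ids = torch.randint(0, 5000, (1, 20))       # 20 hypothetical word index values
first_classification_label = model(token_ids).argmax(dim=1)
```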
- a target representation matrix can be constructed by using the target text and the first classification label.
- the target representation matrix constructed above can represent the fusion result of the text content of the target text and the label content of the first classification label.
- S104 Determine the named entity recognition NER tag of each word in the target text based on the target representation matrix, and obtain the named entity recognition result of the target text.
- the NER label of each word in the target text can be determined based on the target representation matrix, and the named entity recognition result of the target text can be obtained.
- the above-mentioned target representation matrix can fuse the text content of the target text and the label content of the user's intention.
- the relationship between the target text and the user's intent can be learned, and the user's intent expressed by the target text can be determined.
- the NER tag of each word in the target text can be comprehensively considered from the two dimensions of text content and user intent.
- in this way, when identifying the word slot type of a word slot, the word slot type can be comprehensively considered from the two dimensions of the text content of the word slot and the user intent corresponding to the target text, thereby improving the accuracy of the named entity recognition result of the target text.
- a first classification label used to represent the user intent corresponding to the target text is determined; then, the target text and the first classification label are used to construct a target representation matrix; and then, based on the target representation matrix, the NER label of each word in the target text can be determined, and the named entity recognition result of the target text can be obtained.
- the above-mentioned target representation matrix is formed by using the target text and the first classification label representing the user's intention
- the above-mentioned target representation matrix can represent the fusion result of the text content of the target text and the label content of the user's intention.
- the information of the user's intention dimension can be added, so that in the recognition process, the relationship between the target text and the user's intention can be learned, the user's intention expressed by the target text can be determined, and then,
- the NER tags of each word in the target text can be comprehensively considered from the two dimensions of text content and user intent.
- the influence of the user intent corresponding to the target text on the word slot type can be considered, and further, texts in which the same word slot expresses different user intentions can be identified.
- thereby, when identifying the word slot type in the target text, the accuracy of the recognition results obtained when performing named entity recognition on the target text can be improved.
- the above step S103 (the step of constructing a target representation matrix by using the target text and the first classification label) may include the following steps 11-12:
- Step 11 Generate a fusion array about the target text and the first category label
- each element in the fusion array is: the index value of each word in the target text, and the index value of the virtual word represented by the first classification label;
- the obtained target text and the first classification label are represented in the form of words; therefore, when performing named entity recognition on the target text, the target text and the first classification label need to be converted into mathematical representations.
- a virtual word represented by each classification label can be preset, and the virtual word can be regarded as a certain word corresponding to the index value of the intent category; furthermore, an index value can be set in advance for each word that a text may involve and for the virtual word represented by each classification label, so that after the above-mentioned target text and the first classification label are obtained, the index value of each word in the target text and the index value of the virtual word represented by the first classification label can be determined; after each preset index value is determined, a fusion array including each of the above index values can be generated.
- the index value of each word in the target text and the index value of the virtual word represented by the first classification label are respectively used as an element in the fusion array.
- the number of index values included in the fusion array may be: the sum of the number of words included in the target text and the number of virtual words represented by the first classification label.
- the number of virtual words represented by the first classification label is one.
- the generated fusion array about the target text and the first classification label is a one-dimensional array.
- Step 12 generate the matrix corresponding to the fusion array as the target representation matrix
- each element in the target representation matrix is: a word vector corresponding to each index value in the fusion array.
- the word vector corresponding to each index value in the fusion array can be determined; therefore, after the word vectors corresponding to all the index values in the fusion array are obtained, a matrix corresponding to the fusion array can be generated based on the determined word vectors, thereby obtaining the target representation matrix.
- the word vector corresponding to each index value may be a one-dimensional array, and the type of the one-dimensional array is floating-point data.
- the generated target representation matrix includes multiple one-dimensional arrays, and the number of one-dimensional arrays included in the target representation matrix is the same as the number of index values included in the above fusion array.
- each one-dimensional array in the target representation matrix may include a preset number of elements, for example, the preset number may be 128, and each one-dimensional array in the target representation matrix may include 128 elements.
- the above-mentioned preset number may also be other values, which are not specifically limited in the embodiment of the present invention.
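- A minimal sketch of this lookup, assuming an index space of 5000 entries (words plus the virtual words represented by the classification labels) and 128-element word vectors; the embedding table is filled with random numbers here, whereas in practice it would be learned.

```python
import numpy as np

# Assumed sizes: 5000 index values in total, 128 elements per word vector.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(5000, 128)).astype(np.float32)

# Hypothetical fusion array: word index values plus one virtual-word index value.
fusion_array = [412, 7, 903, 55, 4096]

# Each index value is replaced by its word vector, one row per element.
target_representation_matrix = embedding_table[fusion_array]
print(target_representation_matrix.shape)  # (5, 128)
```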
- the above-mentioned step 11 (generating a fusion array about the target text and the first classification label) may include the following steps:
- Step S111 generating a first array about the target text
- each element in the first array is: the index value of each word in the target text
- the index value of each word in the target text can be determined first, so that a first array about the target text is generated according to the obtained index value.
- the index value of each word in the target text can be used as an element in the first array respectively.
- the number of index values included in the first array obtained above may be: the number of words included in the target text.
- the maximum length of the target text can be set, so that when the number of words included in the obtained target text to be subjected to named entity recognition exceeds the set maximum length, the words exceeding the set maximum length are discarded, and the retained words are then used to generate the first array. For example, since one utterance of a person usually does not exceed 70 words, the maximum length of the target text can be set to 70.
- in this case, the target text used to form the above target representation matrix is no longer the originally obtained target text, but the target text remaining after the words exceeding the set maximum length are discarded, and the number of words included in the remaining target text is the same as the set maximum length of the target text.
- for example, it is assumed that the target text includes N (N≥1) words and the set maximum length of the target text is P (P<N); then, the (P+1)-th to N-th words of the obtained target text to be subjected to named entity recognition can be discarded, and the index values of the 1st to P-th words in the target text are determined, so that the first array is generated based on the determined P index values.
- Step S112 constructing a second array about the first classification label, and determining the index value of the virtual word represented by the second array as the target index value;
- each element in the second array is: the preset value of one of the classification labels; the first classification label is one of the classification labels, the value of the first classification label is the first set value, and the value of each other classification label except the first classification label is the second set value.
- each classification label can be preset, and the above-determined first classification label of the target text is one of the preset classification labels.
- the preset values of each classification label can be determined.
- the value of the first classification label is the first set value
- the value of each other classification label except the first classification label is the second set value; therefore, the obtained second array is a one-dimensional array composed of one first set value and at least one second set value, and the number of elements included in the second array is the total number of the above preset classification labels.
- the first set value may be 1, and the second set value may be 0.
- each classification label corresponds to an element in the second array.
- the element corresponding to the first classification label in the second array can be set to the first set value, and each other element in the second array except the element corresponding to the first classification label is set to the second set value.
- a second array related to the first classification label is obtained.
- the virtual word represented by the second array can be determined according to the preset correspondence between arrays and virtual words, so that the index value of the virtual word represented by the second array can be further determined, thereby obtaining the target index value.
- Step S113 adding the target index value to the first specified position in the first array to obtain a fusion array
- the first specified position includes: before the first element in the first array, or after the last element in the first array.
- the target index value can be added to the first designated position in the first array to obtain a fusion array.
- the target index value can be added before the first element in the first array, or the target index value can be added after the last element in the first array; of course, the target index value can also be added at other specified positions in the first array.
- the number of index values in the fusion array is increased by one, and the newly-added index value is the above-mentioned target index value.
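- A minimal sketch of steps S111-S113 under assumed data: the word-to-index vocabulary, the array-to-virtual-word correspondence and the 10 preset classification labels below are hypothetical, and the target index value is added before the first element of the first array.

```python
# Hypothetical vocabulary and array-to-virtual-word correspondence.
word_index = {"play": 3, "some": 17, "jazz": 942}
virtual_word_index_of_label = {4: 4096}        # label position -> virtual-word index value

target_text = ["play", "some", "jazz"]
first_classification_label = 4                 # the 5th of 10 preset classification labels
num_labels = 10

# Step S111: first array of word index values.
first_array = [word_index[w] for w in target_text]                       # [3, 17, 942]

# Step S112: second array (one-hot over the preset classification labels),
# then the target index value of the virtual word it represents.
second_array = [1 if i == first_classification_label else 0 for i in range(num_labels)]
target_index_value = virtual_word_index_of_label[first_classification_label]

# Step S113: add the target index value at the first specified position,
# here before the first element of the first array.
fusion_array = [target_index_value] + first_array                        # [4096, 3, 17, 942]
print(second_array, fusion_array)
```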
- the above-mentioned step S103 (the step of constructing a target representation matrix by using the target text and the first classification label) may include the following steps 21-23:
- Step 21 generate a first array about the target text
- each element in the first array is: the index value of each word in the target text
- the index value of each word in the target text can be determined first, so that a first array about the target text is generated according to the obtained index value.
- the index value of each word in the target text can be used as an element in the first array respectively.
- the number of index values included in the first array obtained above may be: the number of words included in the target text.
- the maximum length of the target text can be set, so that when the number of words included in the obtained target text to be subjected to named entity recognition exceeds the set maximum length, the words exceeding the set maximum length are discarded, and the retained words are then used to generate the first array. For example, since one utterance of a person usually does not exceed 70 words, the maximum length of the target text can be set to 70.
- in this case, the target text used to form the above target representation matrix is no longer the originally obtained target text, but the target text remaining after the words exceeding the set maximum length are discarded, and the number of words included in the remaining target text is the same as the set maximum length of the target text.
- for example, it is assumed that the target text includes N (N≥1) words and the set maximum length of the target text is P (P<N); then, the (P+1)-th to N-th words of the obtained target text to be subjected to named entity recognition can be discarded, and the index values of the 1st to P-th words in the target text are determined, so that the first array is generated based on the determined P index values.
- Step 22 generate the matrix corresponding to the first array as the initial matrix of the target text
- each element in the initial matrix is: the word vector corresponding to each index value in the first array
- the word vector corresponding to each index value in the first array can be determined; therefore, after the word vectors corresponding to all the index values in the first array are obtained, a matrix corresponding to the first array can be generated based on the determined word vectors, thereby obtaining the initial matrix of the target text.
- the number of rows of the obtained initial matrix of the target text is: the number of index values included in the first array; the number of columns of the obtained initial matrix of the target text is: the number of elements included in the word vector corresponding to each determined index value. That is to say, each row of the initial matrix of the target text may be one of the word vectors determined above, so that each row of the initial matrix corresponds to an index value in the above-mentioned first array and, further, to a word in the target text.
- the obtained initial matrix of the target text includes the word vector corresponding to each index value in the first array, that is, the word vector corresponding to each word in the target text.
- the number of word vectors included in the initial matrix of the target text is the same as the number of index values included in the above-mentioned first array, that is, the same as the number of words included in the target text.
- Step 23 Construct a second array about the first classification label
- each element in the second array is: the preset value of each classification label
- the first classification label is a label in each classification label
- the value of the first classification label is the first set value, and the value of each other classification label except the first classification label is the second set value.
- each classification label can be preset, and the above-determined first classification label of the target text is one of the preset classification labels.
- the preset values of each classification label can be determined.
- the value of the first classification label is the first set value
- the value of each other classification label except the first classification label is the second set value.
- the obtained second array is: a one-dimensional array composed of one first set value and at least one second set value, and the number of elements included in the second array is the total number of the above preset classification labels.
- the first set value may be 1, and the second set value may be 0.
- each classification label corresponds to an element in the second array.
- the element corresponding to the first classification label in the second array can be set to the first set value, and each other element in the second array except the element corresponding to the first classification label is set to the second set value.
- a second array related to the first classification label is obtained.
- Step 24 Expand the initial matrix with the second array to generate a target representation matrix.
- the above-mentioned initial matrix can be expanded by using the above-mentioned second array, thereby generating a target representation matrix.
- the above-mentioned step 24 may include the following steps 241A-242A:
- Step 241A Determine the word vector corresponding to the index value of the virtual word represented by the second array
- Step 242A adding the determined word vector to the second specified position in the initial matrix to obtain a target representation matrix
- the second specified position includes: before the first element in the initial matrix, or after the last element in the initial matrix.
- the number of rows of the initial matrix is: the number of index values included in the first array; the number of columns of the initial matrix is: the number of elements contained in the word vector corresponding to each determined index value.
- each row of the initial matrix of the target text can be one of the word vectors determined above; therefore, "before the first element in the initial matrix" means before the first row of the initial matrix, and "after the last element in the initial matrix" means after the last row of the initial matrix.
- the determined word vector can be used as the first row of the obtained target representation matrix, with each row of the initial matrix moved down by one row in the target representation matrix, so that the target representation matrix has one more row than the initial matrix; alternatively, the determined word vector can be used as the last row of the obtained target representation matrix, with the row positions of the rows of the initial matrix unchanged, so that the target representation matrix likewise has one more row than the initial matrix.
- for example, if the initial matrix includes 20 word vectors, each word vector is a one-dimensional array, and the number of elements included in each one-dimensional array is 128, then the word vector corresponding to the index value of the virtual word represented by the second array can be added before the first row of the initial matrix or after the last row of the initial matrix, so that the resulting target representation matrix includes 21 one-dimensional arrays, each containing 128 elements. That is, the initial matrix is a matrix with 20 rows and 128 columns, and the target representation matrix is a matrix with 21 rows and 128 columns.
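- A minimal NumPy sketch of steps 241A-242A under the 20-row, 128-column example above; the matrices are random placeholders for the learned word vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
initial_matrix = rng.normal(size=(20, 128)).astype(np.float32)       # 20 words x 128-dim vectors
virtual_word_vector = rng.normal(size=(1, 128)).astype(np.float32)   # vector of the intent's virtual word

# Second specified position: before the first row, or after the last row.
target_before = np.vstack([virtual_word_vector, initial_matrix])     # 21 x 128
target_after = np.vstack([initial_matrix, virtual_word_vector])      # 21 x 128
print(target_before.shape, target_after.shape)
```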
- when the word vector corresponding to the index value of the virtual word represented by the second array is added at the specified position of the initial matrix, it can also be repeatedly added at that position multiple times.
- for example, each row in the initial matrix can be moved down by T (T>1) rows, so that the word vector corresponding to the index value of the virtual word represented by the second array fills each of the T rows before the original first row of the initial matrix; in this way, the obtained target representation matrix has T more rows than the initial matrix, rows 1 to T of the target representation matrix are the same, and each of them is the word vector corresponding to the index value of the virtual word represented by the second array, that is, T copies of that word vector may exist in the target representation matrix.
- for example, if the initial matrix includes 20 word vectors, each being a one-dimensional array of 128 elements, then the word vector corresponding to the index value of the virtual word represented by the second array can be added before the first row of the initial matrix and repeated for 3 rows.
- the obtained target representation matrix includes 23 one-dimensional arrays, and the number of elements included in each one-dimensional array is 128. That is to say, the initial matrix is a matrix with 20 rows and 128 columns, the target representation matrix is a matrix with 23 rows and 128 columns, and the 1st, 2nd and 3rd rows of the target representation matrix are the same.
- the 1st, 2nd, and 3rd rows are all the word vector corresponding to the index value of the virtual word represented by the second array.
- similarly, the word vector corresponding to the index value of the virtual word represented by the second array may be repeatedly added T (T>1) times after the last row of the initial matrix; in this way, the obtained target representation matrix has T more rows than the initial matrix, and the last T rows of the target representation matrix are the same, each being the word vector corresponding to the index value of the virtual word represented by the second array, that is, T copies of that word vector may exist in the target representation matrix.
- for example, if the initial matrix includes 20 word vectors, each being a one-dimensional array of 128 elements, then the word vector corresponding to the index value of the virtual word represented by the second array can be added after the last row of the initial matrix and repeated for 3 rows.
- the obtained target representation matrix includes 23 one-dimensional arrays, and the number of elements included in each one-dimensional array is 128. That is to say, the initial matrix is a matrix with 20 rows and 128 columns, the target representation matrix is a matrix with 23 rows and 128 columns, and the 21st, 22nd and 23rd rows of the target representation matrix are the same.
- the 21st, 22nd, and 23rd rows are all the word vector corresponding to the index value of the virtual word represented by the second array.
- the virtual word represented by the above-mentioned second array can be determined according to the preset correspondence between arrays and virtual words, so that the index value of the virtual word represented by the second array can be further determined; further, the word vector corresponding to that index value can be determined.
- the number of elements included in the word vector corresponding to the index value of the virtual word represented by the second array is the same as the number of elements included in the word vector in the initial matrix of the target text.
- the word vector corresponding to the index value of the virtual word represented by the second array can be added at the second specified position in the initial matrix of the target text, thereby realizing the expansion of the initial matrix of the target text, and the expanded initial matrix is the target representation matrix.
- the word vector corresponding to the index value of the virtual word represented by the second array may be added before the first element in the initial matrix, or after the last element in the initial matrix; of course, it may also be added at other specified positions in the initial matrix.
- the number of word vectors in the obtained target representation matrix is increased by at least one, and each newly added word vector is the word vector corresponding to the index value of the virtual word represented by the second array.
- the above step 24 may include the following step 241B:
- Step 241B adding the second array to the third specified position in the initial matrix to obtain the target representation matrix
- the third specified position includes: before the first element in the one-dimensional array representing the word vector, or after the last element in the one-dimensional array representing the word vector.
- the initial matrix includes a plurality of word vectors, each word vector is represented by a one-dimensional array, and each one-dimensional array representing a word vector can contain multiple elements.
- the number of rows of the initial matrix is: the number of index values included in the first array; the number of columns of the initial matrix is: the number of elements included in the word vector corresponding to each determined index value; thus, each row of the initial matrix may be one of the one-dimensional arrays representing word vectors as determined above.
- "before the first element in the one-dimensional array representing the word vector" means before the element located in the first column of each row of the initial matrix; that is, assuming the second array includes Q (Q>0) elements, the Q elements of the second array become the elements in columns 1 to Q of each row of the obtained target representation matrix, and each column of the initial matrix is shifted right by Q columns, so that the target representation matrix has Q more columns than the initial matrix;
- "after the last element in the one-dimensional array representing the word vector" means after the element located in the last column of each row of the initial matrix; that is, assuming the number of columns of the initial matrix is R (R>0) and the second array includes Q elements, the Q elements of the second array become the elements in columns R+1 to R+Q of each row of the obtained target representation matrix, and the column positions of the columns of the initial matrix remain unchanged, so that the target representation matrix has Q more columns than the initial matrix.
- the initial matrix of the target text may be expanded directly by using the second array. That is, the second array can be directly added to the third specified position in the initial matrix to obtain the target representation matrix.
- the second array can be added before the first element in each one-dimensional array representing a word vector in the initial matrix, that is, before the element located in the first column of each row of the initial matrix; or the second array can be added after the last element in each one-dimensional array representing a word vector, that is, after the element in the last column of each row of the initial matrix; of course, the second array can also be added at other specified positions in the initial matrix.
- each word vector included in the initial matrix of the target text is the word vector corresponding to a word in the target text; in this way, when the second array is added to each one-dimensional array representing a word vector in the initial matrix, the user intent corresponding to the target text is added to the target text, thereby realizing the fusion of the text content of the target text and the label content representing the user's intent.
- in each one-dimensional array representing a word vector in the obtained target representation matrix, the number of included elements increases by the total number of the above-mentioned preset classification labels, and the newly added elements are the above-mentioned second array.
- for example, if the initial matrix includes 20 one-dimensional arrays representing word vectors, each containing 128 elements, and the total number of preset classification labels is 10, then the obtained target representation matrix includes 20 one-dimensional arrays representing word vectors, each containing 138 elements.
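- A minimal NumPy sketch of step 241B under the 20 x 128 example above: the 10-element second array (a one-hot over the preset classification labels) is added after the last element of every word vector, growing each row from 128 to 138 elements; the random initial matrix stands in for learned word vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
initial_matrix = rng.normal(size=(20, 128)).astype(np.float32)   # 20 word vectors of 128 elements

num_labels = 10
first_classification_label = 4
second_array = np.zeros(num_labels, dtype=np.float32)
second_array[first_classification_label] = 1.0                   # one-hot second array

# Append the second array after the last element of every word vector.
target_representation_matrix = np.hstack(
    [initial_matrix, np.tile(second_array, (initial_matrix.shape[0], 1))])
print(target_representation_matrix.shape)                        # (20, 138)
```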
- the above-mentioned step S104 (determining the named entity recognition NER tag of each word in the target text based on the target representation matrix and obtaining the named entity recognition result of the target text) may include the following steps 31-34:
- Step 31 Determine the word feature matrix of the target text based on the target representation matrix
- the word feature matrix includes: the word vector of each word in the target text in the forward order of the target text, and the word vector of each word in the target text in the reverse order of the target text;
- the word feature matrix of the target text can be determined based on the target representation matrix.
- the content characterizing each word of the target text and the content characterizing the first classification label of the target text are obtained by analyzing the target representation matrix; therefore, according to the forward order and reverse order of the words in the target text and the contextual semantic relationship between each word and the other words, the analyzed content characterizing each word included in the target text can be used to determine the word vector of each word in the target text in the forward order of the target text and the word vector of each word in the target text in the reverse order of the target text, thus obtaining the word feature matrix of the target text.
- Step 32 Determine the label matrix in the target text based on the word feature matrix
- the label matrix is used to represent: the probability that each word in the target text has each NER label;
- after the word feature matrix of the target text is obtained, the possible NER tags of each word, and the probability that the word has each NER tag, can be determined according to the content characterizing the first classification label of the target text analyzed from the target representation matrix and the determined word feature matrix of the target text; thus, a label matrix characterizing the probability that each word in the target text has each NER tag is obtained.
- in this process, the relationship between the content representing each word in the target text and the content representing the first classification label of the target text, obtained from the analysis of the target representation matrix, is used; that is to say, the probability, represented by the determined label matrix, that each word in the target text has each NER label is determined under the dimension of user intent.
- Step 33 Based on the label matrix, determine the NER label index of each word in the target text;
- the final NER label of each word in the target text can be determined based on the label matrix, and the NER label index of that final NER label can then be determined, thereby obtaining the NER tag index of each word.
- Step 34 Convert the NER label index of each word in the target text to the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
- further, the NER tag index of each word in the target text can be converted into the NER tag of each word in the target text, thereby improving readability and enabling the user to finally obtain the named entity recognition result of the target text.
- the named entity recognition method may be implemented by a pre-trained named entity recognition model; that is, according to one or more embodiments, the above-mentioned step S103 (using the target text and the first classification label to construct the target representation matrix) and the above-mentioned step S104 (determining the NER label of each word in the target text based on the target representation matrix) are implemented by the above named entity recognition model.
- the above named entity recognition model includes, connected in series: an input layer, a fusion layer, a word embedding layer, a bidirectional LSTM (Long Short-Term Memory) layer, a fully connected layer, a CRF (Conditional Random Field) layer, and an output layer;
- LSTM: Long Short-Term Memory
- CRF: Conditional Random Field
- the input layer, the fusion layer and the word embedding layer are used to realize the step of constructing the target representation matrix using the target text and the first classification label; the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining, based on the target representation matrix, the NER label of each word in the target text;
- the input layer is used to generate the first array about the target text according to the target text
- the fusion layer is used to construct a second array about the first classification label, determine the index value of the virtual word represented by the second array as the target index value, and add the target index value to the first designated position in the first array, get the fusion array;
- the word embedding layer is used to generate the matrix corresponding to the fusion array as the target representation matrix
- the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
- the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, and the label matrix is used to represent the probability that each word in the target text has each NER label;
- the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
- the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
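The following is a minimal PyTorch sketch of the layer arrangement described above (input → fusion → word embedding → bidirectional LSTM → fully connected → CRF decoding). It is an illustrative assumption, not the disclosed implementation: the vocabulary size, embedding size, hidden size and tag count are placeholder values, and the CRF/Viterbi decoding of the emission scores is sketched separately further below.

```python
import torch
import torch.nn as nn

class BiLstmNerModel(nn.Module):
    def __init__(self, vocab_size=6000, embed_dim=128, hidden=256, num_tags=11):
        super().__init__()
        # word embedding layer: one word vector per index value in the fusion array
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional LSTM layer producing the word feature matrix
        self.bilstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        # fully connected layer mapping word features to per-tag scores (label matrix)
        self.fc = nn.Linear(2 * hidden, num_tags)

    def forward(self, fusion_array):
        # fusion_array: LongTensor [batch, 1 + X] of index values
        # (virtual word for the intent label + each word of the target text)
        emb = self.embedding(fusion_array)   # [batch, 1+X, embed_dim]
        feats, _ = self.bilstm(emb)          # [batch, 1+X, 2*hidden]
        return self.fc(feats)                # [batch, 1+X, num_tags]
```

The per-position tag scores returned by this sketch would then be decoded into one NER tag index per position by a CRF/Viterbi step, as described later.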
- the above-mentioned pre-trained named entity recognition model is: a pre-trained NER model based on bidirectional LSTM+CRF. Specifically:
- the target text and the first classification label can be input into the pre-trained named entity recognition model.
- an index value of each word is preset in the input layer, wherein the format of the index value of each word may be a one-hot (one-hot code) format.
- the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
- thus, the input layer can generate a first array about the target text.
- the first array includes an index value of each word in the target text, and the number of the index values included is the same as the number of words included in the target text.
- the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text to be subjected to named entity recognition exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text to be recognized by the named entity exceeds 70, the obtained first array includes 70 index values.
- the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
- each index value in the generated first array is an integer value.
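As a small illustrative sketch (not the patented implementation), the input layer's behavior could look like the following, mapping the target text to the first array of per-word index values and truncating at an assumed maximum length of 70 words; the vocabulary shown is hypothetical.

```python
MAX_LEN = 70
char_to_index = {"给": 17, "我": 5, "放": 88, "一": 2, "首": 91}  # hypothetical vocabulary

def build_first_array(target_text: str, vocab: dict, unk_index: int = 1) -> list:
    indices = [vocab.get(ch, unk_index) for ch in target_text]
    return indices[:MAX_LEN]  # words after the 70th are discarded

print(build_first_array("给我放一首", char_to_index))  # -> [17, 5, 88, 2, 91]
```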
- the input layer can use the first array as an output, so as to input the first array into the fusion layer.
- when the above-mentioned target text and the first classification label are input into the above-mentioned pre-trained named entity recognition model, they may both be input into the above-mentioned input layer.
- since the input layer does not process the first classification label, the input layer can also pass the first classification label through as an output, thereby inputting the first classification label to the fusion layer; alternatively, the target text and the first classification label can be input to the above input layer and fusion layer respectively.
- the fusion layer can first construct a second array about the first classification label.
- for example, the above-mentioned first set value can be set to 1 and the above-mentioned second set value can be set to 0; if the total number of classification labels is 10 and the first classification label is the 5th classification label among the 10 classification labels, then
- the second array obtained is: [0,0,0,0,1,0,0,0,0,0].
- next, the virtual word represented by the above-mentioned second array can be determined, and then the one-hot index value of that virtual word can be determined; thereby, the index value of the virtual word represented by the second array of the first classification label, that is, the target index value, is obtained.
- the fusion layer can add the above target index value to the first specified position in the first array to obtain a fusion array.
- the obtained fusion result is still a one-dimensional array, and the number of index values included in the fusion result is the sum of the number of words included in the target text and the number of virtual words represented by the second array. Typically, the number of virtual words represented by the second array is one.
- the above-mentioned first array includes the index value of each word in the target text, and
- the above-mentioned target index value is the index value of the virtual word represented by the second array of the first classification label. Therefore, the obtained fusion array includes: the index value of each word in the target text and the index value of the virtual word represented by the first classification label.
- the fusion method provided by this embodiment is equivalent to: taking the first classification label as a virtual word, thereby expanding the target text into "virtual word + target text", and then performing index value conversion on the expanded "virtual word + target text" to obtain the above fusion array.
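A hedged sketch of this fusion follows: the first classification label is turned into a one-hot second array, the index value of the virtual word it represents is looked up, and that index value is added before the first element of the first array. The index range reserved for virtual words is an assumption made only for illustration.

```python
NUM_LABELS = 10           # total number of preset classification labels (example)
VIRTUAL_WORD_BASE = 5000  # assumed index range reserved for virtual words

def build_second_array(label_position: int, num_labels: int = NUM_LABELS) -> list:
    # value 1 for the first classification label, 0 for every other label
    return [1 if i == label_position else 0 for i in range(num_labels)]

def fuse(first_array: list, label_position: int) -> list:
    target_index = VIRTUAL_WORD_BASE + label_position  # index of the virtual word
    # first specified position: before the first element of the first array
    return [target_index] + first_array

second_array = build_second_array(4)         # 5th label -> [0,0,0,0,1,0,0,0,0,0]
fusion_array = fuse([17, 5, 88, 2, 91], 4)   # -> [5004, 17, 5, 88, 2, 91]
print(second_array, fusion_array)
```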
- the fusion layer can use the fusion array obtained above as an output, so that the fusion array is input to the word embedding layer.
- the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, wherein each element is a number. For example, a one-dimensional array including 128 elements may be used to represent each word, that is, a one-dimensional array of 128 numbers represents each word.
- the word embedding layer can determine the word vector corresponding to each index value in the obtained fusion array, thereby generating a target representation matrix based on the determined word vectors.
- the number of elements included in the word vector corresponding to each determined index value is a preset number.
- the word embedding layer can use the obtained target representation matrix as an output, thereby inputting the above target representation matrix to the above-mentioned bidirectional LSTM layer.
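A minimal NumPy sketch of the word embedding step is given below: each index value in the fusion array is replaced by its word vector, giving the target representation matrix of shape [1 + X, Y]. The embedding table here is random and the dimensions are assumptions for illustration only.

```python
import numpy as np

Y = 128                                            # assumed elements per word vector
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((6000, Y))   # hypothetical vocabulary of 6000

fusion_array = [5004, 17, 5, 88, 2, 91]            # virtual word + 5 words of the text
target_representation = embedding_table[fusion_array]
print(target_representation.shape)                 # (6, 128), i.e. [1 + X, Y]
```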
- the LSTM layer is a neural network model that considers every word in the target text when processing the text. For example, when the LSTM layer processes the text "I want to listen to Andy Lau's Wang Qing Shui", the last words it receives are "Wang Qing Shui", and before "Wang Qing Shui" it has already received "I want to listen" and "Andy Lau". Therefore, the LSTM layer takes "I want to listen" and "Andy Lau" into account when recognizing the word slot for "Wang Qing Shui", and thus, combining the context in the text, recognizes that "Wang Qing Shui" is likely a song name.
- if a one-way LSTM is used, information about the order of the characters and words in the text may be lost; for example, there is no way to differentiate between "I love you" and "You love me". Therefore, according to one or more embodiments, a bidirectional LSTM layer is adopted, so that the recognition results in the forward direction and the reverse direction can be combined to obtain the order relationship of each character and word in the text.
- the input to the bidirectional LSTM layer is the target representation matrix obtained by the word embedding layer described above.
- the target representation matrix may be represented as [1+X,Y].
- X represents the number of words in the target text
- Y represents the number of elements included in each word vector in the target representation matrix.
- the output of the bidirectional LSTM layer is the word feature matrix of the target text.
- the target representation matrix is represented as [1+X, Y]
- the word feature matrix of the target text output by the bidirectional LSTM layer can be represented as: [2*(1+X), HIDDENUNIT].
- the word feature matrix of the target text output by the bidirectional LSTM layer includes 2*(1+X) one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT.
- the number of one-dimensional arrays included in the obtained word feature matrix of the target text is twice the number of one-dimensional arrays included in the target representation matrix, and each one-dimensional array included in the word feature matrix of the target text is a one-dimensional array with a set length, that is, each one-dimensional array includes a preset number of elements.
- each one-dimensional array included in the word feature matrix of the target text is a word vector
- the word feature matrix of the target text includes: the word vector of each word in the target text in the forward order of the target text, and the word vector for each word in the target text in the reverse order of the target text.
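The following is a hedged sketch of how a bidirectional LSTM's forward-order and reverse-order word vectors can be stacked along the sequence axis to give the [2*(1+X), HIDDENUNIT] word feature matrix described above. PyTorch's bidirectional LSTM returns both directions concatenated per position, so the reshaping below is one possible way to match this layout; the sizes are placeholder values.

```python
import torch
import torch.nn as nn

HIDDENUNIT, Y, seq_len = 256, 128, 6        # 1 + X = 6 in the running example
bilstm = nn.LSTM(Y, HIDDENUNIT, bidirectional=True, batch_first=True)

x = torch.randn(1, seq_len, Y)              # target representation matrix [1, 1+X, Y]
out, _ = bilstm(x)                          # [1, 1+X, 2*HIDDENUNIT]
forward_feats = out[0, :, :HIDDENUNIT]      # word vectors in forward order
backward_feats = out[0, :, HIDDENUNIT:]     # word vectors in reverse order
word_feature_matrix = torch.cat([forward_feats, backward_feats], dim=0)
print(word_feature_matrix.shape)            # torch.Size([12, 256]) = [2*(1+X), HIDDENUNIT]
```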
- the bidirectional LSTM layer can use the word feature matrix as an output, so that the word feature matrix is input to the fully connected layer.
- the fully connected layer includes two functions: dimension transformation and feature extraction. In this way, after obtaining the word feature matrix of the target text, the fully connected layer can determine the label matrix of the target text based on the word feature matrix.
- taking the word feature matrix [2*(1+X), HIDDENUNIT] of the target text output by the bidirectional LSTM as an example, after the fully connected layer, the word feature matrix [2*(1+X), HIDDENUNIT] can be converted into the label matrix of the target text, and the label matrix can be expressed as [(1+X), OUTPUTDIM].
- the label matrix [(1+X), OUTPUTDIM] of the above target text includes (1+X) one-dimensional vectors, namely: the label vector of the virtual word represented by the second array of the first classification label, and the label vector of each word in the target text.
- the NER label of the virtual word represented by the second array of the first classification label is a fixed label, for example "O" in Table 1 above; that is, the virtual word is a non-word-slot.
- the label vector of each word in the target text represents: OUTPUTDIM values corresponding to the word, one for each NER label the word may have; that is, the word has OUTPUTDIM probability values, where the size of each probability value represents the possibility that the word belongs to the NER label corresponding to that probability value. The larger the probability value, the greater the probability that the word has the corresponding NER label.
- the fully connected layer can determine the label matrix of the target text based on the word feature matrix of the target text, and the label matrix can be used to represent: the probability that each word in the target text has a respective NER label. That is, according to the label matrix of the target text, the probability that each word in the target text has a respective NER label can be determined.
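As a sketch of the dimension transformation performed by the fully connected layer, the forward and reverse feature vectors of each position can be joined and mapped to OUTPUTDIM per-tag scores, yielding the label matrix [(1+X), OUTPUTDIM]. The joining strategy and OUTPUTDIM value below are assumptions for illustration.

```python
import torch
import torch.nn as nn

HIDDENUNIT, OUTPUTDIM, seq_len = 256, 11, 6
fc = nn.Linear(2 * HIDDENUNIT, OUTPUTDIM)

word_feature_matrix = torch.randn(2 * seq_len, HIDDENUNIT)   # [2*(1+X), HIDDENUNIT]
forward_feats, backward_feats = word_feature_matrix[:seq_len], word_feature_matrix[seq_len:]
joined = torch.cat([forward_feats, backward_feats], dim=1)   # [(1+X), 2*HIDDENUNIT]
label_matrix = fc(joined)                                    # [(1+X), OUTPUTDIM]
print(label_matrix.shape)                                    # torch.Size([6, 11])
```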
- the fully connected layer can take the label matrix of the target text as an output, so that the label matrix of the target text is input to the CRF layer.
- the CRF layer can be understood as a Viterbi decoding layer. After receiving the label matrix of the target text, the CRF layer can use the preset transition matrix to compute the score of each possible label path through the label matrix and take the path with the largest total score, that is, the most probable path. In this way, the NER label of each word in the target text can be determined, thereby obtaining the NER label index of each word in the target text.
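A minimal Viterbi-decoding sketch of this role is shown below: given the label matrix (per-word tag scores) and a transition matrix, it finds the highest-scoring tag path and returns its tag indices. The scores and matrices used here are random placeholders, not trained values.

```python
import numpy as np

def viterbi_decode(label_matrix: np.ndarray, transitions: np.ndarray) -> list:
    seq_len, num_tags = label_matrix.shape
    score = label_matrix[0].copy()               # best score ending in each tag so far
    backpointers = []
    for t in range(1, seq_len):
        # score of moving from every previous tag to every current tag
        total = score[:, None] + transitions + label_matrix[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    best_last = int(score.argmax())
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]                            # NER tag index of each position

emissions = np.random.rand(6, 11)                # label matrix [(1+X), OUTPUTDIM]
trans = np.random.rand(11, 11)                   # preset transition matrix
print(viterbi_decode(emissions, trans))
```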
- the CRF layer can use the NER tag index of each word in the target text as an output, thereby inputting the NER tag index of each word in the target text into the output layer.
- the output layer can convert the NER tag index of each word in the target text into the NER tag of each word in the target text,
- wherein the NER tags of each word in the target text can be represented by NER tag strings, so that readability can be improved.
- the NER label of each word can be represented in the form of Table 2 above.
- a named entity recognition method provided by the embodiments of the present invention may also be implemented by a pre-trained named entity recognition model with a different layer arrangement. That is, according to one or more embodiments, the above-mentioned step S103 (constructing a target representation matrix using the target text and the first classification label) and the above-mentioned step S104 (determining the NER label of each word in the target text based on the target representation matrix) are implemented by the named entity recognition model described below.
- the above-mentioned named entity recognition model includes: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series;
- the input layer, the word embedding layer and the fusion layer are used to realize the step of constructing the target representation matrix using the target text and the first classification label;
- the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
- the input layer is used to generate the first array about the target text according to the target text
- the word embedding layer is used to generate the matrix corresponding to the first array as the initial matrix of the target text
- the fusion layer is used to construct a second array about the first classification label, and use the second array to expand the initial matrix to generate a target representation matrix;
- the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
- the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, and the label matrix is used to represent the probability that each word in the target text has each NER label;
- the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
- the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
- the above-mentioned pre-trained named entity recognition model is: a pre-trained NER model based on bidirectional LSTM+CRF. Specifically:
- the target text and the first classification label can be input into the pre-trained named entity recognition model.
- an index value of each word is preset in the input layer, and the format of the index value of each word may be a one-hot format.
- the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
- thus, the input layer can generate a first array about the target text.
- the first array includes an index value of each word in the target text, and the number of the index values included is the same as the number of words included in the target text.
- the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text to be subjected to named entity recognition exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text to be recognized by the named entity exceeds 70, the obtained first array includes 70 index values.
- the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
- each index value in the generated first array is an integer value.
- the input layer can use the first array as an output, so that the first array is input to the word embedding layer.
- the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, wherein each element is a number. For example, a one-dimensional array including 128 elements may be used to represent each word, that is, a one-dimensional array of 128 numbers represents each word.
- the word embedding layer can determine the word vector corresponding to each index value in the obtained first array, so as to generate the initial matrix of the target text based on each determined word vector.
- the number of elements included in the word vector corresponding to each determined index value is a preset number.
- the word embedding layer can take the initial matrix of the target text as an output, thereby inputting the initial matrix of the target text to the above-mentioned fusion layer.
- when the above-mentioned target text and the first classification label are input into the above-mentioned pre-trained named entity recognition model, they may both be input into the above-mentioned input layer.
- since the input layer does not process the first classification label, the input layer can also pass the first classification label through as an output, so that the first classification label is input to the word embedding layer; further, because the word embedding layer does not process
- the first classification label either, the word embedding layer can likewise pass the first classification label through as an output, so that the first classification label is input into the fusion layer; alternatively, the target text and the first classification label can be input into the above input layer and fusion layer respectively.
- the fusion layer can first construct a second array of the first classification label.
- for example, the above-mentioned first set value can be set to 1 and the above-mentioned second set value can be set to 0; if the total number of classification labels is 10 and the first classification label is the 5th classification label among the 10 classification labels, then
- the second array obtained is: [0,0,0,0,1,0,0,0,0,0].
- the initial matrix of the target text can be expanded by using the second array, thereby obtaining the target representation matrix.
- the fusion layer may perform steps 241A-242A described above.
- in this case, the virtual word represented by the second array can be determined, and further, the index value of the virtual word in one-hot format can be determined, so as to obtain the index value of the virtual word represented by the second array of the first classification label. Further, the word vector corresponding to the index value of the virtual word represented by the second array can be determined.
- then, the word vector corresponding to the index value of the virtual word represented by the second array can be added to the second specified position in the initial matrix of the target text, thereby realizing the expansion of the initial matrix of the target text, and
- the expanded initial matrix is the target representation matrix.
- the initial matrix of the target text includes a word vector corresponding to the index value of each word in the target text. Therefore, the above-mentioned target representation matrix includes: the word vector corresponding to each word in the target text and the word vector corresponding to the virtual word represented by the first classification label.
- the fusion method provided by this embodiment is equivalent to: taking the first classification label as a virtual word, thereby expanding the target text into "virtual word + target text", and then performing a word embedding transformation on the expanded "virtual word + target text" to obtain the above target representation matrix, where the NER label of the "virtual word" is a fixed label, for example "O" in Table 1 above; that is, the virtual word represented by the second array of the first classification label
- is a non-word-slot.
- the fusion layer may perform step 241B described above.
- the initial matrix of the target text can be expanded directly by using the second array. That is, the second array can be directly added to the third specified position in the initial matrix to obtain the target representation matrix.
- the initial matrix of the target text includes a word vector corresponding to the index value of each word in the target text, that is, the initial matrix of the target text includes the word vector corresponding to each word in the target text. Therefore, when the second array is added to each one-dimensional array representing a word vector in the initial matrix of the target text, the user intent corresponding to the target text is incorporated into the representation of the target text.
- the fusion method provided in this embodiment is equivalent to: taking the first classification label as additional elements in the word vector corresponding to each word of the target text, so as to expand the word vector corresponding to each word of the target text, as illustrated in the sketch below.
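Below is a NumPy sketch of this second expansion manner (step 241B): the one-hot second array for the first classification label is appended to every word vector of the initial matrix, so the target representation matrix has shape [X, Y+CLASS]. The dimensions are illustrative assumptions.

```python
import numpy as np

X, Y, CLASS = 5, 128, 10
initial_matrix = np.random.rand(X, Y)      # one word vector per word of the target text
second_array = np.eye(CLASS)[4]            # 5th label -> one-hot of length CLASS

# third specified position: after the last element of each word-vector row
target_representation = np.hstack([initial_matrix, np.tile(second_array, (X, 1))])
print(target_representation.shape)         # (5, 138) = [X, Y + CLASS]
```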
- the LSTM layer is a neural network model that considers every word in the target text when processing the text. For example, when the LSTM layer processes the text "I want to listen to Andy Lau's Wang Qing Shui", the last words it receives are "Wang Qing Shui", and before "Wang Qing Shui" it has already received "I want to listen" and "Andy Lau". Therefore, the LSTM layer takes "I want to listen" and "Andy Lau" into account when recognizing the word slot for "Wang Qing Shui", and thus, combining the context in the text, recognizes that "Wang Qing Shui" is likely a song name.
- if a one-way LSTM is used, information about the order of the characters and words in the text may be lost; for example, there is no way to differentiate between "I love you" and "You love me". Therefore, according to one or more embodiments, a bidirectional LSTM layer is adopted, so that the recognition results in the forward direction and the reverse direction can be combined to obtain the order relationship of each character and word in the text.
- the input to the bidirectional LSTM layer is the target representation matrix obtained by the fusion layer described above.
- when the fusion layer adds the word vector of the virtual word to the initial matrix (steps 241A-242A above), the target representation matrix may be represented as [1+X,Y].
- X represents the number of words in the target text
- Y represents the number of elements included in each word vector in the target representation matrix.
- the output of the bidirectional LSTM layer is the word feature matrix of the target text.
- the word feature matrix of the target text output by the bidirectional LSTM layer can be represented as: [2*(1+X), HIDDENUNIT] .
- the word feature matrix of the target text output by the bidirectional LSTM layer includes 2*(1+X) one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT.
- when the fusion layer instead appends the second array to each word vector (step 241B above), the target representation matrix is represented as [X, Y+CLASS], where X represents the number of words in the target text, Y represents the number of elements included in each word vector in the initial matrix of the target text, and CLASS represents the total number of preset classification labels.
- the output of the bidirectional LSTM layer is the word feature matrix of the target text, which can be expressed as: [2*X, HIDDENUNIT'].
- the matrix output by the bidirectional LSTM layer includes 2*X one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT'.
- the number of one-dimensional arrays included in the obtained word feature matrix of the target text is twice the number of one-dimensional arrays included in the target representation matrix, and each one-dimensional array included in the word feature matrix of the target text is a one-dimensional array with a set length, that is, each one-dimensional array includes a preset number of elements.
- each one-dimensional array included in the word feature matrix of the target text is a word vector
- the word feature matrix of the target text includes: the word vector of each word in the target text in the forward order of the target text, and the word vector for each word in the target text in the reverse order of the target text.
- the bidirectional LSTM layer can use the word feature matrix as an output, so that the word feature matrix is input to the fully connected layer.
- the fully connected layer includes two functions: dimension transformation and feature extraction. In this way, after obtaining the word feature matrix of the target text, the fully connected layer can determine the label matrix of the target text based on the word feature matrix.
- taking the word feature matrix [2*(1+X), HIDDENUNIT] of the target text output by the bidirectional LSTM as an example, after the fully connected layer, the word feature matrix [2*(1+X), HIDDENUNIT] can be converted into the label matrix of the target text, and the label matrix can be expressed as [(1+X), OUTPUTDIM].
- the label matrix [(1+X), OUTPUTDIM] of the above target text includes (1+X) one-dimensional vectors, namely: the label vector of the virtual word represented by the second array of the first classification label, and the label vector of each word in the target text.
- the NER label of the virtual word represented by the second array of the first classification label is a fixed label, for example "O" in Table 1 above; that is, the virtual word is a non-word-slot.
- the label vector of each word in the target text represents: OUTPUTDIM values corresponding to the word, one for each NER label the word may have; that is, the word has OUTPUTDIM probability values, where the size of each probability value represents the possibility that the word belongs to the NER label corresponding to that probability value. The larger the probability value, the greater the probability that the word has the corresponding NER label.
- the fully connected layer can determine the label matrix of the target text based on the word feature matrix of the target text, and the label matrix can be used to represent: the probability that each word in the target text has a respective NER label. That is, according to the label matrix of the target text, the probability that each word in the target text has a respective NER label can be determined.
- the fully connected layer can take the label matrix of the target text as an output, so that the label matrix of the target text is input to the CRF layer.
- the CRF layer can be understood as a Viterbi decoding layer. After receiving the label matrix of the target text, the CRF layer can use the preset transition matrix to compute the score of each possible label path through the label matrix and take the path with the largest total score, that is, the most probable path. In this way, the NER label of each word in the target text can be determined, thereby obtaining the NER label index of each word in the target text.
- the CRF layer can use the NER tag index of each word in the target text as an output, thereby inputting the NER tag index of each word in the target text into the output layer.
- the output layer can convert the NER tag index of each word in the target text into the NER tag of each word in the target text,
- wherein the NER tags of each word in the target text can be represented by NER tag strings, so that readability can be improved.
- the NER label of each word can be represented in the form of Table 2 above.
- when the named entity recognition method according to one or more embodiments of the present invention is implemented by using a pre-trained named entity recognition model, the named entity recognition model needs to be obtained through training.
- the training method of the above named entity recognition model includes:
- S301: Obtain the sample text to be utilized, the second classification label of the sample text, and the true value of the NER label of each word in the sample text,
- wherein the second classification label is used to represent the user intent corresponding to the sample text;
- S302: Input the sample text and the second classification label of the sample text into the named entity recognition model, so that the named entity recognition model uses the sample text and the second classification label to construct a sample representation matrix, and uses the sample representation matrix to predict the NER label predicted value of each word in the sample text;
- S303: Based on the NER label true value and the NER label predicted value of each word of the sample text, determine whether the named entity recognition model has converged; if so, go to step S304 (end the training and obtain the trained named entity recognition model); otherwise, go to S305;
- S305: Adjust the model parameters in the named entity recognition model, and return to the step of acquiring the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text (a minimal training-loop sketch is given after these steps).
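The following hedged sketch mirrors steps S301–S305: per-word tags are predicted for a sample, agreement with the ground-truth tags is measured, and the loop either stops (converged) or adjusts parameters and fetches the next sample. The model, data source, loss and matching threshold are placeholders, not values fixed by the disclosed embodiment.

```python
import torch
import torch.nn as nn

MATCH_THRESHOLD = 0.98   # assumed "preset matching degree"

def train(model, optimizer, sample_loader, max_steps=10000):
    loss_fn = nn.CrossEntropyLoss()
    for step, (fusion_array, true_tags) in enumerate(sample_loader):
        if step >= max_steps:
            break
        # S302: the fusion array already combines the sample text and its intent label
        tag_scores = model(fusion_array)                     # [batch, seq, num_tags]
        pred_tags = tag_scores.argmax(dim=-1)
        match = (pred_tags == true_tags).float().mean().item()
        if match > MATCH_THRESHOLD:                          # S303/S304: converged, stop
            return model
        loss = loss_fn(tag_scores.flatten(0, 1), true_tags.flatten())  # S305: adjust
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```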
- the named entity recognition model may be obtained by training on any type of electronic device, for example, a laptop computer, a desktop computer, a tablet computer, etc., which is not specifically limited in the embodiment of the present invention and is hereinafter referred to as the training device.
- the training device may be the same as the electronic device that executes the named entity recognition method described above, or may be different.
- when the training device is the same device, the above named entity recognition model can be obtained by training in that device, and then a named entity recognition method provided by the embodiment of the present invention is implemented on it by using the obtained named entity recognition model;
- when the training device is a different device, after it obtains the above-mentioned named entity recognition model, it can send the obtained named entity recognition model to the device that executes the named entity recognition method. In this way, after the named entity recognition model is obtained, the named entity recognition method provided by the embodiment of the present invention can be implemented by using the obtained named entity recognition model.
- the training device can first obtain the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text, and then, based on the obtained sample text, the second classification label of the sample text, and the true value of the NER label of each word in the sample text, train the named entity recognition model to obtain the trained named entity recognition model.
- the sample text can be a sentence, or a phrase or word group composed of multiple words, both of which are reasonable; in addition, the second classification label of the sample text can be determined by classifying the sample text with the preset intention classification model.
- the sample text to be utilized, the second classification label of the sample text, and the true value of the NER label of each word in the sample text can be obtained in various ways.
- for example, the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text stored in the local storage space can be directly obtained; they can also be obtained from other, non-local storage spaces. Both are reasonable.
- furthermore, multiple sample texts can be used, together with the second classification label of each sample text and
- the ground-truth NER label of each word in each sample text. Therefore, it is possible to obtain multiple sets of sample texts to be utilized, the second classification labels of the sample texts, and the true values of the NER labels of each word in the sample texts.
- the number of training samples can be set according to requirements in practical applications, which is not specifically limited in the present invention.
- the type of the sample text may include only sentences, only phrases, or only word groups, or may include at least two of sentences, phrases and word groups; all of these are reasonable.
- after obtaining the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text, the sample text, the second classification label of the sample text, and the ground-truth NER label of each word in the sample text can be input into the named entity recognition model.
- the named entity recognition model includes a preprocessing sub-network and a named entity recognition sub-network.
- the above-mentioned preprocessing sub-network can use the sample text and the second classification label to construct a sample representation matrix.
- the above named entity recognition sub-network can use the sample representation matrix to predict the NER label prediction value of each word in the sample text.
- the manner in which the preprocessing sub-network uses the sample text and the second classification label to construct the sample representation matrix is similar to the manner, described above, of using the target text and the first classification label to construct the target representation matrix, and will not be repeated here.
- after the predicted value of the NER label for each word in the sample text is obtained, whether the named entity recognition model has converged can be judged based on the true value of the NER label and the predicted value of the NER label of each word of the sample text.
- the named entity recognition model converges, it means that the named entity recognition model has been trained, and the training can be stopped to obtain the trained named entity recognition model.
- otherwise, the model parameters in the named entity recognition model can be adjusted, the sample text to be used, the second classification label of the sample text and the NER label true value of each word in the sample text are obtained again, and the newly obtained sample text, second classification label and NER label true values are used to continue training the parameter-adjusted named entity recognition model, until it is judged that the named entity recognition model converges and the trained named entity recognition model is obtained.
- for example, when the matching degree between the true value of the NER tag and the predicted value of the NER tag of each word of the sample text is greater than a preset matching degree, it can be judged that the named entity recognition model has converged; otherwise, it is judged that the named entity recognition model has not converged.
- next, the pre-trained intent classification model used to obtain the above-mentioned first classification label and second classification label is illustrated by way of example.
- the above-mentioned intent classification model may be a CNN classification model.
- the CNN classification model can include: input layer, word embedding layer, convolution layer, pooling layer, fusion layer, fully connected layer and output layer.
- the target text can be input into the above CNN classification model.
- the index value of each word is preset in the input layer, and the format of the index value of each word may be one-hot format.
- the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
- thus, the input layer can generate a first array about the target text.
- the first array includes an index value of each word in the target text, and the number of the index values included is the same as the number of words included in the target text.
- the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text exceeds 70, the obtained first array includes 70 index values.
- the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
- each index value in the generated first array is an integer value.
- the input layer can take the first array as an output, so that the first array is input into the word embedding layer.
- the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, wherein each element is a number. For example, a one-dimensional array including 128 elements may be used to represent each word, that is, a one-dimensional array of 128 numbers represents each word.
- the word embedding layer can determine the word vector corresponding to each index value in the obtained index array, thereby generating a target matrix based on each determined word vector.
- the number of elements included in the word vector corresponding to each determined index value is a preset number.
- the word embedding layer can take the generated target matrix as output and feed this target matrix into the convolutional layer.
- the role of the convolutional layer is to amplify and extract certain features in the target text, thereby outputting a feature matrix describing those features of the target text.
- the size of this feature matrix is related to the convolution kernel of the convolutional layer.
- the convolution kernel can be expressed as [K, Length], where K indicates that features are extracted over spans of K words, that is, K consecutive words in the target text are taken as a feature of interest, so that those K consecutive words are processed as a whole. When the K consecutive words form a word or phrase, they can be considered as a whole; when they are single words, the context of each word among the K consecutive words needs to be considered. Length indicates the number of convolution kernels of length K.
- multiple convolution kernels may be included in the convolution layer, so that a feature matrix may be obtained for each convolution kernel.
- the purpose of the pooling layer is to ignore the unimportant features in the features extracted by the convolution kernel and retain only the most important features.
- the pooling layer can adopt the "down-sampling" method.
- the so-called “down-sampling” method is to find the maximum value in each matrix for each matrix output by the convolution layer, so that the maximum value is used to replace the matrix.
- each convolutional layer is followed by a pooling layer, so that the output of the pooling layer is the maximum value in the matrix output by the adjacent convolutional layer.
- the fusion layer is used to combine the outputs of multiple pooling layers to obtain a new one-dimensional array.
- the input of the fully connected layer is the one-dimensional array output by the fusion layer. The fully connected layer is used to convert each number in the one-dimensional array into a number of probability values equal to the preset total number of classification labels, wherein the converted probability values may be floating-point values. The magnitude of each probability value represents the likelihood that the target text corresponds to the respective classification label: the larger the probability value, the greater the possibility that the target text corresponds to the classification label represented by that probability value.
- the obtained probability values have relatively large numerical values. Therefore, the obtained probability values may be normalized so that the sum of the normalized probability values is 1.
- the output layer receives each probability value output by the fully connected layer, that is, it receives a one-dimensional array whose length is the total number of classification labels.
- the subscript of each number in the one-dimensional array represents the classification number of a classification label, and
- the output layer can convert the classification number of the classification label into a user-recognizable classification label, that is, into an intention that the user can recognize.
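Below is an illustrative PyTorch sketch of a text-CNN intent classifier of the kind described above: word embedding, convolutions of several kernel widths K, max-pooling ("down-sampling"), fusion of the pooled features, and a fully connected layer producing one normalized probability per classification label. All sizes and the use of softmax for normalization are assumptions for illustration, not values fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCnnIntentClassifier(nn.Module):
    def __init__(self, vocab_size=6000, embed_dim=128, num_labels=10,
                 kernel_widths=(2, 3, 4), kernels_per_width=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # one convolution per kernel width K; Length = kernels_per_width
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, kernels_per_width, k) for k in kernel_widths])
        self.fc = nn.Linear(kernels_per_width * len(kernel_widths), num_labels)

    def forward(self, first_array):
        emb = self.embedding(first_array).transpose(1, 2)   # [batch, embed_dim, seq]
        # convolution + max-pooling: keep only the strongest response per kernel
        pooled = [F.relu(conv(emb)).max(dim=2).values for conv in self.convs]
        fused = torch.cat(pooled, dim=1)                     # fusion layer
        return F.softmax(self.fc(fused), dim=1)              # one probability per label

model = TextCnnIntentClassifier()
probs = model(torch.randint(0, 6000, (1, 70)))               # text padded/truncated to 70
print(probs.shape, probs.sum().item())                       # torch.Size([1, 10]), ~1.0
```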
- the embodiment of the present invention further provides an electronic device, as shown in FIG. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702, and the memory 703 communicate with each other through the communication bus 704;
- the processor 701 is configured to implement the steps of any of the named entity identification methods provided by the foregoing embodiments of the present invention when executing the program stored in the memory 703 .
- the communication bus mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
- the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface is used for communication between the above electronic device and other devices.
- the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage. According to one or more embodiments, the memory may also be at least one storage device located remotely from the aforementioned processor.
- the above-mentioned processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (Digital Signal Processing, DSP), dedicated integrated Circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
- a computer-readable storage medium is also provided, and a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of any of the above named entity identification methods are implemented .
- a computer program product comprising instructions, which, when executed on a computer, cause the computer to perform the steps of any of the named entity recognition methods in the above embodiments.
- in the above-mentioned embodiments, it may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
- when implemented by software, it can be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be downloaded from a website site, computer, server, or data center Transmission to another website site, computer, server, or data center is by wire (eg, coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (eg, infrared, wireless, microwave, etc.).
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
NER label | Meaning
---|---
B-slot type | B is short for Begin, indicating the beginning of a word slot
I-slot type | I is short for Internal, indicating the middle of a word slot
L-slot type | L is short for Last, indicating the end of a word slot
U-slot type | U is short for Unique, indicating a single-word slot
O | O is short for Other, indicating a non-word-slot

Word | NER label
---|---
给 | O
我 | O
放 | O
一 | O
首 | O
刘 | B-person name
德 | I-person name
华 | L-person name
的 | O
忘 | B-song name
情 | I-song name
水 | L-song name
Claims (18)
- A named entity recognition method, the method comprising: acquiring a target text to be subjected to named entity recognition; determining a first classification label of the target text, wherein the first classification label is used to characterize the user intent corresponding to the target text; constructing a target representation matrix using the target text and the first classification label; and determining a named entity recognition (NER) label of each word in the target text based on the target representation matrix, to obtain a named entity recognition result of the target text.
- The method according to claim 1, wherein the step of constructing a target representation matrix using the target text and the first classification label comprises: generating a fusion array about the target text and the first classification label, wherein the elements in the fusion array are: the index value of each word in the target text and the index value of the virtual word characterized by the first classification label; and generating the matrix corresponding to the fusion array as the target representation matrix, wherein the elements in the target representation matrix are: the word vector corresponding to each index value in the fusion array.
- The method according to claim 2, wherein the step of generating a fusion array about the target text and the first classification label comprises: generating a first array about the target text, wherein the elements in the first array are: the index value of each word in the target text; constructing a second array about the first classification label, and determining the index value of the virtual word characterized by the second array as a target index value, wherein the elements in the second array are: the values of the preset classification labels, the first classification label is one of the classification labels, the value of the first classification label is a first set value, and the value of each other classification label except the first classification label is a second set value; and adding the target index value to a first specified position in the first array to obtain the fusion array, wherein the first specified position comprises: before the first element in the first array, or after the last element in the first array.
- The method according to claim 1, wherein the step of constructing a target representation matrix using the target text and the first classification label comprises: generating a first array about the target text, wherein the elements in the first array are: the index value of each word in the target text; generating the matrix corresponding to the first array as an initial matrix of the target text, wherein the elements in the initial matrix are: the word vector corresponding to each index value in the first array; constructing a second array about the first classification label, wherein the elements in the second array are: the values of the preset classification labels, the first classification label is one of the classification labels, the value of the first classification label is a first set value, and the value of each other classification label except the first classification label is a second set value; and expanding the initial matrix using the second array to generate the target representation matrix.
- The method according to claim 4, wherein the step of expanding the initial matrix using the second array to generate the target representation matrix comprises: determining the word vector corresponding to the index value of the virtual word characterized by the second array; and adding the determined word vector to a second specified position in the initial matrix to obtain the target representation matrix, wherein the second specified position comprises: before the first element in the initial matrix, or after the last element in the initial matrix.
- The method according to claim 4, wherein the step of expanding the initial matrix using the second array to generate the target representation matrix comprises: adding the second array to a third specified position in the initial matrix to obtain the target representation matrix, wherein the third specified position comprises: before the first element in a one-dimensional array representing a word vector, or after the last element in a one-dimensional array representing a word vector.
- The method according to claim 3 or 4, wherein the step of determining the named entity recognition NER label of each word in the target text based on the target representation matrix to obtain the named entity recognition result of the target text comprises: determining a word feature matrix of the target text based on the target representation matrix, wherein the word feature matrix comprises: the word vector of each word in the target text in the forward order of the target text, and the word vector of each word in the target text in the reverse order of the target text; determining a label matrix of the target text based on the word feature matrix, wherein the label matrix is used to characterize: the probability that each word in the target text has each NER label; determining an NER label index of each word in the target text based on the label matrix; and converting the NER label index of each word in the target text into the NER label of each word in the target text, to obtain the named entity recognition result of the target text.
- The method according to claim 3, wherein the step of constructing a target representation matrix using the target text and the first classification label and the step of determining the NER label of each word in the target text based on the target representation matrix are implemented by a pre-trained named entity recognition model; the named entity recognition model comprises: an input layer, a fusion layer, a word embedding layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the fusion layer and the word embedding layer are used to implement the step of constructing the target representation matrix using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to implement the step of determining the NER label of each word in the target text based on the target representation matrix; wherein the input layer is used to generate the first array about the target text; the fusion layer is used to construct the second array about the first classification label, determine the index value of the virtual word characterized by the second array as the target index value, and add the target index value to the first specified position in the first array to obtain the fusion array; the word embedding layer is used to generate the matrix corresponding to the fusion array as the target representation matrix; the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix; the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix; the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix; and the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, to obtain the named entity recognition result of the target text.
- The method according to claim 4, wherein the step of constructing a target representation matrix using the target text and the first classification label and the step of determining the NER label of each word in the target text based on the target representation matrix are implemented by a pre-trained named entity recognition model; the named entity recognition model comprises: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the word embedding layer and the fusion layer are used to implement the step of constructing the target representation matrix using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to implement the step of determining the NER label of each word in the target text based on the target representation matrix; wherein the input layer is used to generate the first array about the target text; the word embedding layer is used to generate the matrix corresponding to the first array as the initial matrix of the target text; the fusion layer is used to construct the second array about the first classification label, and expand the initial matrix using the second array to generate the target representation matrix; the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix; the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix; the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix; and the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, to obtain the named entity recognition result of the target text.
- The method according to claim 8 or 9, wherein the training manner of the named entity recognition model comprises: acquiring a sample text to be utilized, a second classification label of the sample text, and an NER label true value of each word in the sample text, wherein the second classification label is used to characterize the user intent corresponding to the sample text; inputting the sample text and the second classification label of the sample text into the named entity recognition model, so that the named entity recognition model constructs a sample representation matrix using the sample text and the second classification label, and predicts an NER label predicted value of each word in the sample text using the sample representation matrix; judging, based on the NER label true value and the NER label predicted value of each word of the sample text, whether the named entity recognition model has converged; if so, ending the training to obtain the trained named entity recognition model; otherwise, adjusting model parameters in the named entity recognition model and returning to the step of acquiring the sample text to be utilized, the second classification label of the sample text, and the NER label true value of each word in the sample text.
- The method according to claim 3 or 4, wherein the step of generating the first array about the target text comprises: when the number of words included in the acquired target text exceeds a preset maximum length, discarding the words in the acquired target text that exceed the preset maximum length, and generating the first array using the retained words.
- The method according to claim 3 or 4, wherein the first set value is 1 and the second set value is 0.
- The method according to claim 5, wherein the word vector corresponding to the index value of the determined virtual word is added multiple times at the second specified position.
- The method according to claim 8 or 9, wherein each index value in the generated first array is an integer value.
- The method according to claim 9, wherein the step of the fusion layer expanding the initial matrix using the second array to generate the target representation matrix comprises: determining the word vector corresponding to the index value of the virtual word characterized by the second array; and adding the determined word vector to a second specified position in the initial matrix to obtain the target representation matrix, wherein the second specified position comprises: before the first element in the initial matrix, or after the last element in the initial matrix.
- The method according to claim 9, wherein the step of the fusion layer expanding the initial matrix using the second array to generate the target representation matrix comprises: adding the second array to a third specified position in the initial matrix to obtain the target representation matrix, wherein the third specified position comprises: before the first element in a one-dimensional array representing a word vector, or after the last element in a one-dimensional array representing a word vector.
- The method according to claim 1, wherein the step of determining the first classification label of the target text is implemented by a pre-trained intent classification model.
- The method according to claim 8 or 9, wherein the label matrix is used to characterize the probability that each word in the target text has each NER label.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010870524.1A CN111967264B (zh) | 2020-08-26 | 2020-08-26 | 一种命名实体识别方法 |
CN202010870524.1 | 2020-08-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022042125A1 true WO2022042125A1 (zh) | 2022-03-03 |
Family
ID=73390759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/106650 WO2022042125A1 (zh) | 2020-08-26 | 2021-07-16 | 一种命名实体识别方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111967264B (zh) |
WO (1) | WO2022042125A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI817921B (zh) * | 2023-05-31 | 2023-10-01 | 明合智聯股份有限公司 | 模型建模指令生成方法及其系統 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967264B (zh) * | 2020-08-26 | 2021-09-24 | 湖北亿咖通科技有限公司 | 一种命名实体识别方法 |
CN112765984A (zh) * | 2020-12-31 | 2021-05-07 | 平安资产管理有限责任公司 | 命名实体识别方法、装置、计算机设备和存储介质 |
CN113515946B (zh) * | 2021-06-22 | 2024-01-05 | 亿咖通(湖北)技术有限公司 | 信息处理方法及装置 |
CN113571052A (zh) * | 2021-07-22 | 2021-10-29 | 湖北亿咖通科技有限公司 | 一种噪声提取及指令识别方法和电子设备 |
CN114282538A (zh) * | 2021-11-24 | 2022-04-05 | 重庆邮电大学 | 基于bie位置词列表的中文文本数据字向量表征方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287283A (zh) * | 2019-05-22 | 2019-09-27 | 中国平安财产保险股份有限公司 | 意图模型训练方法、意图识别方法、装置、设备及介质 |
CN110502755A (zh) * | 2019-08-27 | 2019-11-26 | 湖北亿咖通科技有限公司 | 基于融合模型的字符串识别方法及计算机存储介质 |
CN110516247A (zh) * | 2019-08-27 | 2019-11-29 | 湖北亿咖通科技有限公司 | 基于神经网络的命名实体识别方法及计算机存储介质 |
CN110569332A (zh) * | 2019-09-09 | 2019-12-13 | 腾讯科技(深圳)有限公司 | 一种语句特征的提取处理方法及装置 |
CN111177394A (zh) * | 2020-01-03 | 2020-05-19 | 浙江大学 | 基于句法注意力神经网络的知识图谱关系数据分类方法 |
CN111967264A (zh) * | 2020-08-26 | 2020-11-20 | 湖北亿咖通科技有限公司 | 一种命名实体识别方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080052262A1 (en) * | 2006-08-22 | 2008-02-28 | Serhiy Kosinov | Method for personalized named entity recognition |
CN106874256A (zh) * | 2015-12-11 | 2017-06-20 | 北京国双科技有限公司 | 识别领域命名实体的方法及装置 |
CN109165384A (zh) * | 2018-08-23 | 2019-01-08 | 成都四方伟业软件股份有限公司 | 一种命名实体识别方法及装置 |
CN109902307B (zh) * | 2019-03-15 | 2023-06-02 | 北京金山数字娱乐科技有限公司 | 命名实体识别方法、命名实体识别模型的训练方法及装置 |
CN110807324A (zh) * | 2019-10-09 | 2020-02-18 | 四川长虹电器股份有限公司 | 一种基于IDCNN-crf与知识图谱的影视实体识别方法 |
Also Published As
Publication number | Publication date |
---|---|
CN111967264B (zh) | 2021-09-24 |
CN111967264A (zh) | 2020-11-20 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21859965; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21859965; Country of ref document: EP; Kind code of ref document: A1
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.06.2023)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21859965; Country of ref document: EP; Kind code of ref document: A1