WO2022042125A1 - A Named Entity Recognition Method - Google Patents

A Named Entity Recognition Method

Info

Publication number
WO2022042125A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
target text
matrix
array
target
Prior art date
Application number
PCT/CN2021/106650
Other languages
English (en)
French (fr)
Inventor
米良
黄海荣
李林峰
孔晓泉
宋寒风
Original Assignee
湖北亿咖通科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 湖北亿咖通科技有限公司
Publication of WO2022042125A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • the present disclosure relates to the technical field of artificial intelligence algorithms, and in particular, to a named entity recognition method.
  • NER: Named Entity Recognition
  • Word slots can be understood as entities with specific meanings in the text, such as person names, place names, institution names, and proper nouns; accordingly, person names, place names, and so on are word slot types, and the position of a word slot refers to the position of each word belonging to the same slot within that slot, for example the beginning, the middle, or the end of the slot.
  • the NER tag of each word in the target text is determined based on the text content of the target text on which named entity recognition is to be performed, and the named entity recognition result of the target text is obtained.
  • That is, the representation matrix of the target text is input into a pre-trained model for recognition, and the recognition result is obtained.
  • One or more embodiments of the present invention provide a method for identifying a named entity, the method comprising:
  • the first classification label is used to represent the user intent corresponding to the target text
  • a named entity recognition NER tag of each word in the target text is determined, and a named entity recognition result of the target text is obtained.
  • the step of constructing a target representation matrix using the target text and the first classification label includes:
  • each element in the fusion array is: the index value of a word in the target text, or the index value of the virtual word represented by the first classification label;
  • a matrix corresponding to the fusion array is generated as a target representation matrix, wherein each element in the target representation matrix is: a word vector corresponding to each index value in the fusion array.
  • the step of generating a fusion array of the target text and the first classification label includes:
  • each element in the first array is: the index value of each word in the target text
  • each element in the second array is: the value of a respective preset classification label, the first classification label being one of the preset classification labels; the value of the first classification label is a first set value, and the value of each classification label other than the first classification label is a second set value;
  • the target index value is added to a first specified position in the first array to obtain a fusion array, wherein the first specified position includes: before the first element in the first array, or after the last element in the first array.
  • the step of constructing a target representation matrix using the target text and the first classification label includes:
  • each element in the first array is: the index value of each word in the target text
  • a matrix corresponding to the first array is generated as the initial matrix of the target text, wherein each element in the initial matrix is: a word vector corresponding to each index value in the first array;
  • each element in the second array is: the value of a respective preset classification label, the first classification label being one of the preset classification labels; the value of the first classification label is the first set value, and the value of each classification label other than the first classification label is the second set value;
  • the initial matrix is extended with the second array to generate a target representation matrix.
  • the step of using the second array to expand the initial matrix to generate a target representation matrix includes:
  • the third specified position includes: before the first element in the one-dimensional array representing the word vector, or after the last element in the one-dimensional array representing the word vector.
  • the step of determining the named entity recognition NER tag of each word in the target text based on the target representation matrix, and obtaining the named entity recognition result of the target text includes:
  • a word feature matrix of the target text is determined based on the target representation matrix, wherein the word feature matrix includes: a word vector of each word in the target text in the forward order of the target text, and the word vector of each word in the target text in the reverse order of the target text;
  • a label matrix of the target text is determined, wherein the label matrix is used to represent: the probability that each word in the target text has a respective NER label;
  • the step of using the target text and the first classification label to construct a target representation matrix, and the step of determining the NER label of each word in the target text based on the target representation matrix are implemented by a pre-trained named entity recognition model;
  • the named entity recognition model includes: an input layer, a fusion layer, a word embedding layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the fusion layer and the word embedding layer are used to realize the step of constructing a target representation matrix by using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
  • the input layer is used to generate a first array about the target text
  • the fusion layer is used to construct a second array about the first classification label, determine the index value of the virtual word represented by the second array as a target index value, and add the target index value to a first specified position in the first array to obtain a fusion array;
  • the word embedding layer is used to generate a matrix corresponding to the fusion array as a target representation matrix
  • the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
  • the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, wherein the label matrix is used to represent the probability that each word in the target text has a respective NER label;
  • the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
  • the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
  • the step of constructing a target representation matrix by using the target text and the first classification label, and the step of determining the NER label of each word in the target text based on the target representation matrix, are implemented by a pre-trained named entity recognition model;
  • the named entity recognition model includes: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer and an output layer connected in series, wherein the input layer, the word embedding layer and the fusion layer are used to realize the step of constructing a target representation matrix by using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
  • the input layer is used to generate a first array about the target text
  • the word embedding layer is used to generate a matrix corresponding to the first array as an initial matrix of the target text
  • the fusion layer is used to construct a second array about the first classification label, and use the second array to expand the initial matrix to generate a target representation matrix;
  • the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
  • the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, wherein the label matrix is used to represent the probability that each word in the target text has a respective NER label;
  • the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
  • the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
  • the training method of the named entity recognition model includes:
  • based on the true value of the NER label and the predicted value of the NER label for each word of the sample text, it is determined whether the named entity recognition model converges; if so, the training ends to obtain the trained named entity recognition model; otherwise, the model parameters in the named entity recognition model are adjusted, and the method returns to the step of obtaining the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text. A minimal training-loop sketch follows.
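  • A minimal sketch of the convergence-checked training loop described above, assuming a generic PyTorch model whose forward pass returns the NER training loss (as in the model sketch given later); the data source, optimizer choice, and tolerance are our own assumptions:

```python
# Minimal sketch of the described training procedure (not the patent's code).
# Assumes `model(fusion_array, true_tags)` returns the NER training loss.
import torch

def train(model, sample_batches, max_epochs=50, tol=1e-3):
    optimizer = torch.optim.Adam(model.parameters())    # optimizer is an assumption
    prev_loss = float("inf")
    for _ in range(max_epochs):
        for fusion_array, true_tags in sample_batches:  # sample text + NER truth
            loss = model(fusion_array, true_tags)       # predicted vs. true labels
            optimizer.zero_grad()
            loss.backward()                             # adjust model parameters
            optimizer.step()
        if abs(prev_loss - loss.item()) < tol:          # convergence check
            break                                       # converged: end training
        prev_loss = loss.item()                         # else return to sampling
    return model
```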
  • When named entity recognition is performed on target text, after the target text is acquired, a first classification label used to represent the user intent corresponding to the target text is determined; then, a target representation matrix is constructed by using the target text and the first classification label; then, the NER label of each word in the target text can be determined based on the target representation matrix, and the named entity recognition result of the target text can be obtained.
  • the above-mentioned target representation matrix is formed by using the target text and the first classification label representing the user's intention
  • the above-mentioned target representation matrix can represent the fusion result of the text content of the target text and the label content of the user's intention.
  • the information of the user-intent dimension can be added, so that in the recognition process the relationship between the target text and the user intent can be learned and the user intent expressed by the target text can be determined; then, the NER tags of each word in the target text can be considered comprehensively from the two dimensions of text content and user intent.
  • the influence of the user intent corresponding to the target text on the word slot type can be considered, and texts in which the same word slot expresses different user intents can be distinguished; thus, when identifying the word slot type in the target text, the accuracy of the recognition results obtained from named entity recognition on the target text is improved.
  • FIG. 1 shows a schematic flowchart of a method for identifying a named entity according to one or more embodiments of the present invention.
  • FIG. 2 shows a schematic flowchart of a manner of generating a fusion array of target text and a first classification label according to one or more embodiments of the present invention.
  • FIG. 3 shows a schematic flowchart of a training manner of a named entity recognition model according to one or more embodiments of the present invention.
  • FIG. 4 shows a schematic structural diagram of a pre-trained named entity recognition model according to one or more embodiments of the present invention.
  • FIG. 5 shows a schematic structural diagram of a pre-trained named entity recognition model according to one or more embodiments of the present invention.
  • FIG. 6 shows a schematic structural diagram of an intent classification model according to one or more embodiments of the present invention.
  • FIG. 7 shows a schematic structural diagram of an electronic device according to one or more embodiments of the present invention.
  • The named entity recognition method according to one or more embodiments of the present invention can be applied to any type of electronic device, for example, a desktop computer, a notebook computer, or a tablet computer, which is not specifically limited in the embodiments of the present invention; such devices are hereinafter referred to as electronic devices.
  • this method can be applied to any scenario that requires named entity recognition.
  • the named entity recognition result of the annual report can be obtained, namely the annual report text after NER tags have been added to each word in it; as another example, in the field of traffic management, word slots such as name, location, time, and number of casualties can be identified in a traffic accident report, and NER tags added to each word in the report.
  • The named entity recognition result of the report can then be obtained, namely the report text after NER tags have been added to each word in it.
  • NER tags added to each word can be as shown in Table 1 below:
  • Table 1:
    B-<word slot type>: B is short for Begin, indicating the beginning of the word slot
    I-<word slot type>: I is short for Internal, indicating the middle of the word slot
    L-<word slot type>: L is short for Last, indicating the end of the word slot
    U-<word slot type>: U is short for Unique, indicating a single-word slot
    O: O is short for Other, indicating a non-word-slot word
  • a "B-word slot type” can be added to each word in the word slot based on the word slot type of the word slot and the position of each word in the word slot ” label, “I-slot type” label, or “L-slot type” label; for words that are not recognized as word slots, a “non-word slot” label can be added to the word; for words that are recognized Word slot, you can add a "single word slot label" to the word.
  • FIG. 1 shows a schematic flowchart of a method for identifying a named entity according to one or more embodiments of the present invention. As shown in Figure 1, the method may include the following steps:
  • the target text on which named entity recognition is to be performed is first acquired.
  • The target text can be obtained in various ways: for example, target text manually input by the user can be obtained; or the user's voice information can be collected and converted into the target text; or the target text can be obtained from another communicatively connected device. Any of these approaches is reasonable.
  • the first classification label is used to represent the user intent corresponding to the target text
  • a first classification label used to represent the user's intention corresponding to the target text may be further determined.
  • the above-mentioned first classification label can be determined in various ways.
  • the user intent corresponding to the target text, as input by the user, can be obtained, so that this user intent is determined as the first classification label of the target text; alternatively, the target text can be input into a pre-trained intent classification model to obtain the first classification label.
  • The intent classification model can be, for example, a CNN (Convolutional Neural Network) classification model. Any of these approaches is reasonable; a sketch of such a classifier follows.
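  • The disclosure only names a CNN classification model as one option; the following is a minimal sketch of such a text-CNN intent classifier in PyTorch, where all layer sizes and names are our own assumptions:

```python
# Hedged sketch of a CNN intent classifier producing the first classification label.
import torch
import torch.nn as nn

class IntentCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, num_intents=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, num_intents)

    def forward(self, word_ids):                  # word_ids: (batch, seq_len)
        x = self.embed(word_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=2).values  # global max pooling
        return self.fc(x)                         # logits over intent labels

logits = IntentCNN()(torch.randint(0, 5000, (1, 70)))
print(logits.argmax(dim=1))  # index of the predicted user intent
```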
  • a target representation matrix can be constructed by using the target text and the first classification label.
  • the target representation matrix constructed above can represent the fusion result of the text content of the target text and the label content of the first classification label.
  • S104 Determine the named entity recognition NER tag of each word in the target text based on the target representation matrix, and obtain the named entity recognition result of the target text.
  • the NER label of each word in the target text can be determined based on the target representation matrix, and the named entity recognition result of the target text can be obtained.
  • the above-mentioned target representation matrix can fuse the text content of the target text and the label content of the user's intention.
  • the relationship between the target text and the user's intent can be learned, and the user's intent expressed by the target text can be determined.
  • the NER tag of each word in the target text can be comprehensively considered from the two dimensions of text content and user intent.
  • when identifying the word slot type of a word slot, the word slot type can be considered comprehensively from the two dimensions of the text content of the word slot and the user intent corresponding to the target text, thereby improving the accuracy of the named entity recognition result of the target text.
  • a first classification label for characterizing the user intent corresponding to the target text is determined; then, the target text and the first classification label are used to construct a target representation matrix; then, based on the target representation matrix, the NER label of each word in the target text can be determined, and the named entity recognition result of the target text can be obtained.
  • the above step S103 (the step of constructing a target representation matrix by using the target text and the first classification label) may include the following steps 11-12:
  • Step 11: Generate a fusion array about the target text and the first classification label;
  • each element in the fusion array is: the index value of each word in the target text, and the index value of the virtual word represented by the first classification label;
  • the obtained target text and the first classification label can be represented in the form of words; furthermore, when performing named entity recognition on the target text, the target text and the first classification label need to be converted into mathematical representations.
  • a virtual word represented by each classification label can be preset, where the virtual word can be considered a certain word corresponding to the index value of the intent category. Furthermore, an index value can be set in advance for each word that the text may involve and for the virtual word represented by each classification label, so that after the target text and the first classification label are obtained, the index value of each word in the target text and the index value of the virtual word represented by the first classification label can be determined; after each of these preset index values is determined, a fusion array including all of the above index values can be generated.
  • the index value of each word in the target text and the index value of the virtual word represented by the first classification label are respectively used as an element in the fusion array.
  • the number of index values included in the fusion array may be: the sum of the number of words included in the target text and the number of virtual words represented by the first classification label.
  • the number of virtual words represented by the first classification label is one.
  • the generated fusion array about the target text and the first classification label is a one-dimensional array.
  • Step 12 generate the matrix corresponding to the fusion array as the target representation matrix
  • each element in the target representation matrix is: a word vector corresponding to each index value in the fusion array.
  • the word vector corresponding to each index value in the fusion array can be determined; therefore, after the word vectors corresponding to all the index values in the fusion array are obtained, a matrix corresponding to the fusion array can be generated based on the determined word vectors, thereby obtaining the target representation matrix.
  • the word vector corresponding to each index value may be a one-dimensional array, and the type of the one-dimensional array is floating-point data.
  • the generated target representation matrix includes multiple one-dimensional arrays, and the number of one-dimensional arrays included in the target representation matrix is the same as the number of index values included in the above fusion array.
  • each one-dimensional array in the target representation matrix may include a preset number of elements, for example, the preset number may be 128, and each one-dimensional array in the target representation matrix may include 128 elements.
  • the above-mentioned preset number may also be other values, which are not specifically limited in the embodiment of the present invention.
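  • A minimal numpy sketch of step 12, where the embedding table and the index values are our own stand-ins:

```python
# Sketch of step 12: looking up each index value of the fusion array in an
# embedding table to form the target representation matrix.
import numpy as np

vocab_size, embed_dim = 5000, 128                      # assumed sizes
embedding_table = np.random.rand(vocab_size, embed_dim).astype(np.float32)

# Fusion array: the virtual word's index for the intent, then word indices.
fusion_array = np.array([4999, 12, 7, 305, 42])

target_representation = embedding_table[fusion_array]  # one row per index value
print(target_representation.shape)                     # (5, 128)
```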
  • the above-mentioned step 11 (generating a fusion array about the target text and the first classification label) may include the following steps:
  • Step S111 generating a first array about the target text
  • each element in the first array is: the index value of each word in the target text
  • the index value of each word in the target text can be determined first, so that a first array about the target text is generated according to the obtained index value.
  • the index value of each word in the target text can be used as an element in the first array respectively.
  • the number of index values included in the first array obtained above may be: the number of words included in the target text.
  • the maximum length of the target text can be set, so that when the number of words included in the obtained target text exceeds the set maximum length, the words that exceed the set maximum length can be discarded, and the retained words are then used to generate the first array. For example, since a single utterance of a person usually does not exceed 70 words, the maximum length of the target text can be set to 70.
  • In general, the target text used to form the above target representation matrix is the obtained target text.
  • When words exceeding the set maximum length are discarded from the target text, the target text used is the remaining target text, and the number of words included in the remaining target text equals the set maximum length.
  • For example, suppose the target text includes N (N≥1) words and the set maximum length is P (P<N); then the P+1th through Nth words of the obtained target text can be discarded, the index values of the 1st through Pth words can be determined, and the first array is generated based on the determined P index values.
  • Step S112 constructing a second array about the first classification label, and determining the index value of the virtual word represented by the second array as the target index value;
  • each element in the second array is: the value of a respective preset classification label; the first classification label is one of the preset classification labels, the value of the first classification label is the first set value, and the value of each classification label other than the first classification label is the second set value.
  • each classification label can be preset, and the above-determined first classification label of the target text is one of the preset classification labels.
  • the preset values of each classification label can be determined.
  • the value of the first classification label is the first set value
  • the value of each classification label other than the first classification label is the second set value; therefore, the obtained second array is a one-dimensional array composed of one first set value and at least one second set value, and the number of elements included in the second array is the total number of the preset classification labels.
  • the first set value may be 1, and the second set value may be 0.
  • each classification label corresponds to an element in the second array.
  • the element corresponding to the first classification label in the second array can be set to the first set value, and each element other than the element corresponding to the first classification label can be set to the second set value.
  • a second array related to the first classification label is obtained.
  • the virtual word represented by the second array can be determined according to the preset correspondence between arrays and virtual words, so that the index value of the virtual word represented by the second array can be further determined to obtain the target index value.
  • Step S113 adding the target index value to the first specified position in the first array to obtain a fusion array
  • the first specified position includes: before the first element in the first array, or after the last element in the first array.
  • the target index value can be added to the first designated position in the first array to obtain a fusion array.
  • the target index value can be added before the first element in the first array, or after the last element in the first array; of course, the target index value can also be added at other specified positions in the first array.
  • the number of index values in the fusion array is increased by one, and the newly-added index value is the above-mentioned target index value.
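  • Steps S111 to S113 can be sketched as follows (the vocabulary, the intent position, and the array-to-virtual-word mapping are our own assumptions):

```python
# Sketch of steps S111-S113: first array of word indices, one-hot second array
# for the first classification label, and the target index value prepended.
import numpy as np

word_index = {"打": 3, "开": 9, "空": 27, "调": 41}         # hypothetical vocabulary
first_array = [word_index[w] for w in "打开空调"]           # S111

num_labels, intent_position = 10, 4
second_array = [0] * num_labels
second_array[intent_position] = 1                           # S112: one-hot label
virtual_word_index = {tuple(second_array): 4999}            # array -> virtual word
target_index = virtual_word_index[tuple(second_array)]

fusion_array = np.array([target_index] + first_array)       # S113: prepend
print(fusion_array)  # [4999    3    9   27   41]
```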
  • the above-mentioned step S103 (the step of constructing a target representation matrix by using the target text and the first classification label) may include the following steps 21-23:
  • Step 21 generate a first array about the target text
  • each element in the first array is: the index value of each word in the target text
  • the index value of each word in the target text can be determined first, so that a first array about the target text is generated according to the obtained index value.
  • the index value of each word in the target text can be used as an element in the first array respectively.
  • the number of index values included in the first array obtained above may be: the number of words included in the target text.
  • the maximum length of the target text can be set, so that when the number of words included in the obtained target text exceeds the set maximum length, the words that exceed the set maximum length can be discarded, and the retained words are then used to generate the first array. For example, since a single utterance of a person usually does not exceed 70 words, the maximum length of the target text can be set to 70.
  • In general, the target text used to form the above target representation matrix is the obtained target text to be identified.
  • When words exceeding the set maximum length are discarded from the target text, the target text used is the remaining target text, and the number of words included in the remaining target text equals the set maximum length.
  • For example, suppose the target text includes N (N≥1) words and the set maximum length is P (P<N); then the P+1th through Nth words of the obtained target text can be discarded, the index values of the 1st through Pth words can be determined, and the first array is generated based on the determined P index values.
  • Step 22 generate the matrix corresponding to the first array as the initial matrix of the target text
  • each element in the initial matrix is: the word vector corresponding to each index value in the first array
  • the word vector corresponding to each index value in the first array can be determined. Therefore, after the word vector corresponding to all the index values in the above-mentioned first array is obtained, it can be determined based on the determined word vector. Each word vector generates a matrix corresponding to the above-mentioned first array, thereby obtaining an initial matrix of the target text.
  • the number of rows of the obtained initial matrix of the target text is: the number of index values included in the first array; the number of columns of the obtained initial matrix of the target text is: the number of elements included in the determined word vector corresponding to each index value. That is to say, each row of the initial matrix of the target text may be one of the word vectors determined above, so that each row of the initial matrix corresponds to an index value in the first array and, further, to a word in the target text.
  • the obtained initial matrix of the target text includes the word vector corresponding to each index value in the first array, that is, the word vector corresponding to each word in the target text.
  • The number of word vectors included in the initial matrix of the target text is the same as the number of index values included in the first array, that is, the same as the number of words included in the target text.
  • Step 23 Construct a second array about the first classification label
  • each element in the second array is: the value of a respective preset classification label;
  • the first classification label is one of the preset classification labels;
  • the value of the first classification label is the first set value, and the value of each classification label other than the first classification label is the second set value.
  • each classification label can be preset, and the above-determined first classification label of the target text is one of the preset classification labels.
  • the preset values of each classification label can be determined.
  • the value of the first classification label is the first set value
  • the value of each other classification label except the first classification label is the second set value.
  • the obtained second array is: a one-dimensional array composed of one first set value and at least one second set value, and the number of elements included in the second array is the total number of the preset classification labels.
  • the first set value may be 1, and the second set value may be 0.
  • each classification label corresponds to an element in the second array.
  • the element corresponding to the first classification label in the second array can be set to the first set value, and each element other than the element corresponding to the first classification label can be set to the second set value.
  • a second array related to the first classification label is obtained.
  • Step 24 Expand the initial matrix with the second array to generate a target representation matrix.
  • the above-mentioned initial matrix can be expanded by using the above-mentioned second array, thereby generating a target representation matrix.
  • the above-mentioned step 24 may include the following steps 241A-242A:
  • Step 241A Determine the word vector corresponding to the index value of the virtual word represented by the second array
  • Step 242A adding the determined word vector to the second specified position in the initial matrix to obtain a target representation matrix
  • the second specified position includes: before the first element in the initial matrix, or after the last element in the initial matrix.
  • the number of rows of the initial matrix is: the number of index values included in the first array; the number of columns of the initial matrix is: each index value determined The number of elements contained in the corresponding word vector.
  • each row of the initial matrix of the target text can be one of the word vectors determined above; therefore, "before the first element in the initial matrix" means before the first row of the initial matrix, and "after the last element" means after the last row of the initial matrix.
  • The word vector determined in step 242A can be used as the first row of the obtained target representation matrix, with each row of the initial matrix moved down by one row, so that the target representation matrix has one more row than the initial matrix;
  • alternatively, the word vector determined in step 242A can be used as the last row of the obtained target representation matrix, with the row positions of the initial matrix unchanged, so that the target representation matrix likewise has one more row than the initial matrix.
  • For example, if the initial matrix includes 20 word vectors, each being a one-dimensional array of 128 elements, the word vector corresponding to the index value of the virtual word represented by the second array can be added before the first row or after the last row of the initial matrix, so that the resulting target representation matrix includes 21 one-dimensional arrays of 128 elements each. That is, the initial matrix is a matrix with 20 rows and 128 columns, and the target representation matrix is a matrix with 21 rows and 128 columns, as sketched below.
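  • A numpy sketch of this row expansion, where the matrix contents are random stand-ins:

```python
# Sketch of steps 241A-242A: adding the virtual word's vector before the first
# row or after the last row of a 20x128 initial matrix.
import numpy as np

initial_matrix = np.random.rand(20, 128)  # one 128-element vector per word
virtual_vector = np.random.rand(1, 128)   # vector of the intent's virtual word

prepended = np.vstack([virtual_vector, initial_matrix])  # rows shift down by one
appended = np.vstack([initial_matrix, virtual_vector])   # row positions unchanged
print(prepended.shape, appended.shape)                   # (21, 128) (21, 128)
```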
  • When the word vector corresponding to the index value of the virtual word represented by the second array is added at the specified position of the initial matrix, it can be added repeatedly at that position multiple times.
  • For example, each row of the initial matrix can be moved down by T (T>1) rows, and the word vector corresponding to the index value of the virtual word represented by the second array used as the content of each of the rows before the original first row; in this way, the obtained target representation matrix has T more rows than the initial matrix, and rows 1 through T of the target representation matrix are identical, all being the word vector corresponding to the index value of the virtual word represented by the second array; that is, T copies of that word vector may exist in the target representation matrix.
  • Suppose the initial matrix includes 20 word vectors, each being a one-dimensional array of 128 elements; the word vector corresponding to the index value of the virtual word represented by the second array can then be added before the first row of the initial matrix, repeated for 3 rows.
  • In this case, the obtained target representation matrix includes 23 one-dimensional arrays, each containing 128 elements. That is to say, the initial matrix is a matrix with 20 rows and 128 columns, the target representation matrix is a matrix with 23 rows and 128 columns, and rows 1, 2 and 3 of the target representation matrix are identical, all being the word vector corresponding to the index value of the virtual word represented by the second array.
  • Similarly, the word vector corresponding to the index value of the virtual word represented by the second array may be added repeatedly, T (T>1) times, after the last row of the initial matrix.
  • In this case, the obtained target representation matrix has T more rows than the initial matrix, and the last T rows of the target representation matrix are identical, all corresponding to the index value of the virtual word represented by the second array; that is, T copies of that word vector may exist in the target representation matrix.
  • For example, suppose the initial matrix includes 20 word vectors, each being a one-dimensional array of 128 elements; the word vector corresponding to the index value of the virtual word represented by the second array can be added after the last row of the initial matrix, repeated for 3 rows.
  • In this case, the obtained target representation matrix includes 23 one-dimensional arrays, each containing 128 elements. That is to say, the initial matrix is a matrix with 20 rows and 128 columns, the target representation matrix is a matrix with 23 rows and 128 columns, and rows 21, 22 and 23 of the target representation matrix are identical, all being the word vector corresponding to the index value of the virtual word represented by the second array.
  • the virtual word represented by the above-mentioned second array can be determined according to the preset correspondence between arrays and virtual words, so that the index value of the virtual word represented by the second array can be further determined; further, the word vector corresponding to the index value of the virtual word represented by the second array can be determined.
  • the number of elements included in the word vector corresponding to the index value of the virtual word represented by the second array is the same as the number of elements included in the word vector in the initial matrix of the target text.
  • the word vector corresponding to the index value of the virtual word represented by the second array can be added to the second specified position in the initial matrix of the target text, thereby realizing the expansion of the initial matrix of the target text, and the expanded initial matrix is the target representation matrix.
  • the word vector corresponding to the index value of the virtual word represented by the second array may be added before the first element in the initial matrix, or the word vector corresponding to the index value of the virtual word represented by the second array may be added. After the last element in the initial matrix; of course, the word vector corresponding to the index value of the virtual word represented by the second array may also be added to other specified positions in the initial matrix.
  • the number of word vectors in the obtained target representation matrix is increased by at least one, and each newly added word vector is the word vector corresponding to the index value of the virtual word represented by the second array.
  • the above step 24 may include the following step 241B:
  • Step 241B adding the second array to the third specified position in the initial matrix to obtain the target representation matrix
  • the third specified position includes: before the first element in the one-dimensional array representing the word vector, or after the last element in the one-dimensional array representing the word vector.
  • the initial matrix includes a plurality of word vectors, each word vector is represented by a one-dimensional array, and each such one-dimensional array can contain multiple elements.
  • the number of rows of the initial matrix is: the number of index values included in the first array; the number of columns of the initial matrix is: the number of elements included in the determined word vector corresponding to each index value, then the initial Each row of the matrix may be a one-dimensional array representing word vectors as determined above.
  • "Before the first element in the one-dimensional array representing the word vector" means before the element in the first column of each row of the initial matrix. That is, assuming the second array includes Q (Q>0) elements, the Q elements of the second array become the elements in columns 1 through Q of each row of the obtained target representation matrix, and each column of the initial matrix is shifted right by Q columns, so that the target representation matrix has Q more columns than the initial matrix.
  • "After the last element in the one-dimensional array representing the word vector" means after the element in the last column of each row of the initial matrix. That is, assuming the initial matrix has R (R>0) columns and the second array includes Q elements, the Q elements of the second array become the elements in columns R+1 through R+Q of each row of the obtained target representation matrix, and the column positions of the initial matrix remain unchanged, so that the target representation matrix again has Q more columns than the initial matrix.
  • the initial matrix of the target text may be expanded directly by using the second array. That is, the second array can be directly added to the third specified position in the initial matrix to obtain the target representation matrix.
  • the second array can be added before the first element in each one-dimensional array representing a word vector in the initial matrix, that is, before the element in the first column of each row of the initial matrix; or the second array can be added after the last element in each such one-dimensional array, that is, after the element in the last column of each row of the initial matrix; of course, the second array can also be added at other specified positions in the initial matrix.
  • each word vector included in the initial matrix of the target text is a word vector corresponding to each word in the target text; in this way, when a second array is added to each one-dimensional array representing word vectors in the initial matrix, the The user intent corresponding to the target text can be added to the target text, thereby realizing the fusion of the text content of the target text and the label content representing the user's intent.
  • In each one-dimensional array representing a word vector in the obtained target representation matrix, the number of included elements increases by the total number of the preset classification labels, and the newly added elements are the elements of the second array.
  • For example, if the initial matrix includes 20 one-dimensional arrays representing word vectors, each containing 128 elements, and the total number of preset classification labels is 10, then the obtained target representation matrix includes 20 one-dimensional arrays representing word vectors, each containing 138 elements, as sketched below.
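  • A numpy sketch of this column expansion, matching the 20x128 plus 10-label example above (matrix contents are random stand-ins):

```python
# Sketch of step 241B: appending the 10-element second array to every
# 128-element word-vector row, giving a 20x138 target representation matrix.
import numpy as np

initial_matrix = np.random.rand(20, 128)
second_array = np.zeros(10)
second_array[4] = 1                                          # one-hot intent label

tiled = np.tile(second_array, (initial_matrix.shape[0], 1))  # (20, 10)
target_before = np.hstack([tiled, initial_matrix])           # label columns first
target_after = np.hstack([initial_matrix, tiled])            # label columns last
print(target_after.shape)                                    # (20, 138)
```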
  • the above-mentioned step S104 determines the named entity recognition NER tag of each word in the target text, and the step of obtaining the named entity recognition result of the target text may include the following steps 31-34:
  • Step 31 Determine the word feature matrix of the target text based on the target representation matrix
  • the word feature matrix includes: the word vector of each word in the target text in the forward order of the target text, and the word vector of each word in the target text in the reverse order of the target text;
  • the word feature matrix of the target text can be determined based on the target representation matrix.
  • By analyzing the target representation matrix, the content characterizing each word of the target text and the content characterizing the first classification label of the target text are obtained. Therefore, according to the forward order and reverse order of the words in the target text, and the contextual semantic relationship between each word and the other words, the word vector of each word in the forward order of the target text and the word vector of each word in the reverse order of the target text can be determined from the analyzed content characterizing each word, thereby obtaining the word feature matrix of the target text.
  • Step 32 Determine the label matrix in the target text based on the word feature matrix
  • the label matrix is used to represent: the probability that each word in the target text has each NER label;
  • After the word feature matrix of the target text is obtained, the possible NER tags of each word, and the probability that each word has each NER tag, can be determined according to the content characterizing the first classification label of the target text analyzed from the target representation matrix together with the determined word feature matrix; thus, a label matrix characterizing the probability that each word in the target text has each NER label is obtained.
  • In this process, the relationship between the content representing each word of the target text and the content representing the first classification label of the target text, both obtained from the analysis of the target representation matrix, is used; that is to say, the probability that each word in the target text has each NER label, as represented by the determined label matrix, is determined under the dimension of user intent.
  • Step 33 Based on the label matrix, determine the NER label index of each word in the target text;
  • the final NER label of each word in the target text can be determined based on the label matrix, and the NER label index of the final NER label of each word can then be determined.
  • Step 34 Convert the NER label index of each word in the target text to the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
  • the NER tag index of each word in the target text can be further converted into the NER tag of each word in the target text, thereby improving readability and enabling the user to finally obtain the named entity recognition result of the target text, as sketched below.
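  • Steps 33 and 34 can be sketched as follows, where the tag inventory is a hypothetical example:

```python
# Sketch of steps 33-34: converting CRF-produced NER label indices into tags.
id2tag = {0: "O", 1: "B-place", 2: "I-place", 3: "L-place", 4: "U-time"}

label_indices = [1, 2, 3, 0, 4]                # step 33: output of the CRF layer
ner_tags = [id2tag[i] for i in label_indices]  # step 34: index -> readable tag
print(ner_tags)  # ['B-place', 'I-place', 'L-place', 'O', 'U-time']
```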
  • the named entity recognition method may be implemented by a pre-trained named entity recognition model. That is, according to one or more embodiments, the above-mentioned step S103 (using the target text and the first classification label to construct the target representation matrix) and the above-mentioned step S104 (determining the NER label of each word in the target text based on the target representation matrix) are performed by the above naming. Implemented by the entity recognition model.
  • the above named entity recognition model includes: an input layer connected in series, a fusion layer, a word embedding layer, and a bidirectional LSTM (Long Short-Term Memory) layer , fully connected layer, CRF (conditional random field, conditional random field) layer and output layer;
  • LSTM: Long Short-Term Memory
  • CRF: Conditional Random Field
  • the input layer, the fusion layer and the word embedding layer are used to realize the step of constructing the target representation matrix using the target text and the first classification label; the bidirectional LSTM layer, the fully connected layer, the CRF layer and the output layer are used to realize the step of determining the NER label of each word in the target text based on the target representation matrix;
  • the input layer is used to generate the first array about the target text according to the target text
  • the fusion layer is used to construct a second array about the first classification label, determine the index value of the virtual word represented by the second array as the target index value, and add the target index value to the first designated position in the first array, get the fusion array;
  • the word embedding layer is used to generate the matrix corresponding to the fusion array as the target representation matrix
  • the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
  • the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, and the label matrix is used to represent the probability that each word in the target text has each NER label;
  • the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
  • the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
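  • The serial structure just listed can be sketched in PyTorch as follows (a hedged sketch, not the disclosed implementation: it relies on the third-party pytorch-crf package for the CRF layer, realizes the fusion as a virtual intent word prepended to the input indices, and all sizes and names are our own assumptions):

```python
# Hedged sketch of the input -> embedding -> bidirectional LSTM -> fully
# connected -> CRF -> output pipeline (pip install pytorch-crf).
import torch
import torch.nn as nn
from torchcrf import CRF

class NerModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden=64, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # word embedding layer
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)         # bidirectional LSTM layer
        self.fc = nn.Linear(2 * hidden, num_tags)         # fully connected layer
        self.crf = CRF(num_tags, batch_first=True)        # CRF layer

    def forward(self, fusion_array, true_tags=None):
        # fusion_array: (batch, seq_len) index values, virtual intent word included
        emissions = self.fc(self.bilstm(self.embed(fusion_array))[0])  # label matrix
        if true_tags is not None:                 # training: negative log-likelihood
            return -self.crf(emissions, true_tags)
        return self.crf.decode(emissions)         # inference: best NER label indices

model = NerModel()
ids = torch.randint(0, 5000, (1, 21))  # 20 words plus the virtual intent word
print(model(ids))                      # NER label index for each position
```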
  • the above-mentioned pre-trained named entity recognition model is: a pre-trained NER model based on bidirectional LSTM+CRF. Specifically:
  • the target text and the first classification label can be input into the pre-trained named entity recognition model.
  • an index value of each word is preset in the input layer, wherein the format of the index value of each word may be a one-hot (one-hot code) format.
  • the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
  • In this way, the input layer can generate a first array about the target text.
  • the first array includes an index value of each word in the target text, and the number of the index values included is the same as the number of words included in the target text.
  • the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text to be subjected to named entity recognition exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text to be recognized by the named entity exceeds 70, the obtained first array includes 70 index values.
  • the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
  • each index value in the generated first array is an integer value.
  • the input layer can use the first array as an output, so as to input the first array into the fusion layer.
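  • As an illustration of the input layer's behavior, the following is a minimal sketch (not the patent's code) of converting a text into the first array of per-word integer index values with the maximum length of 70; the toy vocabulary and the out-of-vocabulary index 0 are assumptions:

```python
# Minimal sketch: build the "first array" of per-character index values,
# truncating to a preset maximum text length. Each integer index is the
# position that would be set in the character's one-hot representation.
MAX_LEN = 70  # maximum target-text length used in the description above

def build_first_array(text: str, vocab: dict) -> list:
    """Map each character to its preset integer index, discarding
    every character after the MAX_LEN-th one."""
    return [vocab.get(ch, 0) for ch in text[:MAX_LEN]]  # 0 = out-of-vocabulary (an assumption)

# Toy vocabulary covering the example sentence's characters
vocab = {ch: i + 1 for i, ch in enumerate("我想听刘德华的忘情水")}
print(build_first_array("我想听刘德华的忘情水", vocab))  # one integer per character
```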
  • when the above-mentioned target text and the first classification label are input into the above-mentioned pre-trained named entity recognition model, both may be input into the above-mentioned input layer.
  • since the input layer does not process the first classification label, the input layer can also pass the first classification label through as an output, thereby inputting the first classification label into the fusion layer; alternatively, the target text and the first classification label can be input into the above input layer and fusion layer respectively.
  • the fusion layer can first construct a second array about the first classification label.
  • for example, suppose the above-mentioned first set value is set to 1, the above-mentioned second set value is set to 0, the total number of classification labels is 10, and the first classification label is the 5th classification label among the 10; then the second array obtained is: [0,0,0,0,1,0,0,0,0,0].
  • the virtual word represented by the above-mentioned second array can be determined, and then the one-hot-format index value of that virtual word can be determined; thereby, the index value of the virtual word represented by the second array of the first classification label, that is, the target index value, is obtained.
  • the fusion layer can add the above target index value to the first specified position in the first array to obtain a fusion array.
  • the obtained fusion array is still a one-dimensional array, and the number of index values it includes is the sum of the number of words included in the target text and the number of virtual words represented by the second array. Typically, the number of virtual words represented by the second array is one.
  • the above-mentioned first array includes the index value of each word in the target text, and the above-mentioned target index value is the index value of the virtual word represented by the second array of the first classification label. Therefore, the obtained fusion array includes: the index value of each word in the target text and the index value of the virtual word represented by the first classification label.
  • the fusion method provided by this embodiment is equivalent to: taking the first classification label as a virtual word, thereby expanding the target text into "virtual word + target text", and then performing index value conversion on the expanded "virtual word + target text" to obtain the above fusion array.
  • the fusion layer can use the fusion array obtained above as an output, so that the fusion array is input to the word embedding layer.
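  • The fusion step can be sketched as follows, under the same toy assumptions: the second array is a one-hot array over the classification labels, the virtual word it represents has a preset index value, and that target index value is added before the first element (or after the last element) of the first array. The helper names and the virtual word index are hypothetical:

```python
# Sketch of the fusion layer: build the one-hot "second array" for the
# classification label and prepend the virtual word's index to the first array.
def build_second_array(label_idx: int, num_labels: int) -> list:
    second = [0] * num_labels
    second[label_idx] = 1          # first set value 1, second set value 0
    return second

def fuse(first_array: list, target_index: int, at_front: bool = True) -> list:
    # "first specified position": before the first element or after the last one
    return [target_index] + first_array if at_front else first_array + [target_index]

second = build_second_array(4, 10)   # 5th of 10 labels -> [0,0,0,0,1,0,0,0,0,0]
virtual_word_index = 7001            # hypothetical preset index of the virtual word
print(fuse([12, 5, 33], virtual_word_index))  # [7001, 12, 5, 33]
```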
  • the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, where each element is a number; for example, a one-dimensional array including 128 elements may represent each word, that is, each word is represented by a one-dimensional array of 128 numbers.
  • the word embedding layer can determine the word vector corresponding to each index value in the obtained fusion array, thereby generating a target representation matrix based on the determined word vectors.
  • the number of elements included in the word vector corresponding to each determined index value is a preset number.
  • the word embedding layer can use the obtained target representation matrix as an output, thereby inputting the above target representation matrix to the above-mentioned bidirectional LSTM layer.
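  • A hedged sketch of the word embedding layer: each index value of the fusion array is looked up in an embedding table, yielding one word vector of a preset number of elements (128 here) per index; numpy and the table sizes are assumptions for illustration:

```python
# Sketch: index values -> word vectors via an embedding table lookup.
import numpy as np

VOCAB_SIZE, EMB_DIM = 8000, 128  # assumed sizes for illustration only
embedding_table = np.random.rand(VOCAB_SIZE, EMB_DIM).astype(np.float32)

def embed(fusion_array: list) -> np.ndarray:
    """Rows = one word vector per index value; shape [1 + X, Y] in the text's notation."""
    return embedding_table[np.asarray(fusion_array)]

matrix = embed([7001, 12, 5, 33])
print(matrix.shape)  # (4, 128): 1 virtual word + 3 text characters
```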
  • the LSTM layer is a neural network model that considers every word in the target text when processing the text. For example, when the LSTM layer processes the text "I want to listen to Andy Lau's Wangqingshui", the last phrase it receives is "Wangqingshui", and before "Wangqingshui" it has already received the two segments "I want to listen" and "Andy Lau". Therefore, when recognizing the word slot for "Wangqingshui", the LSTM layer takes "I want to listen" and "Andy Lau" into account and, combining the context in the text, recognizes that "Wangqingshui" is likely a song name.
  • if a unidirectional LSTM were used, information about the order of the characters and words in the text might be lost; for example, there would be no way to distinguish "I love you" from "You love me". Therefore, according to one or more embodiments, a bidirectional LSTM layer is adopted, so that the recognition results in the forward and reverse directions can be combined to obtain the order relationship of each character and word in the text.
  • the input to the bidirectional LSTM layer is the target representation matrix obtained by the word embedding layer described above.
  • the target representation matrix may be represented as [1+X, Y], where X represents the number of words in the target text and Y represents the number of elements included in each word vector of the target representation matrix.
  • the output of the bidirectional LSTM layer is the word feature matrix of the target text.
  • when the target representation matrix is represented as [1+X, Y], the word feature matrix of the target text output by the bidirectional LSTM layer can be represented as [2*(1+X), HIDDENUNIT]; that is, it includes 2*(1+X) one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT.
  • in other words, the number of one-dimensional arrays included in the word feature matrix of the target text is twice the number of one-dimensional arrays included in the target representation matrix, and each one-dimensional array included in the word feature matrix is a one-dimensional array of a set length, that is, each one-dimensional array includes a preset number of elements.
  • each one-dimensional array included in the word feature matrix of the target text is a word vector; the word feature matrix includes the word vector of each word of the target text in the forward order of the target text, and the word vector of each word of the target text in the reverse order of the target text.
  • the bidirectional LSTM layer can use the word feature matrix as an output, so that the word feature matrix is input to the fully connected layer.
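  • The shape transformation performed by the bidirectional LSTM layer can be illustrated with the minimal numpy LSTM cell below: the [1+X, Y] target representation matrix is processed once in forward order and once in reverse order, and the hidden states are stacked into a [2*(1+X), HIDDENUNIT] word feature matrix. Sharing one weight matrix between the two directions is a simplification, not the patent's parameterization:

```python
# Shape-only sketch of a bidirectional LSTM pass over the representation matrix.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(seq: np.ndarray, W: np.ndarray, hidden: int) -> np.ndarray:
    h = np.zeros(hidden); c = np.zeros(hidden); outs = []
    for x in seq:
        z = W @ np.concatenate([x, h])            # all four gates in one matmul
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

Y, HIDDENUNIT = 128, 64
rep = np.random.rand(1 + 3, Y)                    # [1+X, Y] with X = 3 words
W = np.random.rand(4 * HIDDENUNIT, Y + HIDDENUNIT) * 0.1
feat = np.vstack([lstm_pass(rep, W, HIDDENUNIT),        # forward order
                  lstm_pass(rep[::-1], W, HIDDENUNIT)])  # reverse order
print(feat.shape)                                 # (2 * (1 + 3), 64)
```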
  • the fully connected layer includes two functions: dimension transformation and feature extraction. In this way, after obtaining the word feature matrix of the target text, the fully connected layer can determine the label matrix of the target text based on the word feature matrix.
  • taking the word feature matrix [2*(1+X), HIDDENUNIT] output by the bidirectional LSTM as an example, after the fully connected layer, the word feature matrix [2*(1+X), HIDDENUNIT] can be converted into the label matrix of the target text, which can be expressed as [(1+X), OUTPUTDIM].
  • the label matrix [(1+X), OUTPUTDIM] of the above target text includes (1+X) one-dimensional vectors: the label vector of the virtual word represented by the second array of the first classification label, and the label vector of each word in the target text.
  • the NER label of the virtual word represented by the second array of the first classification label is a fixed placeholder, for example, "O" in Table 1 above; that is, the virtual word represented by the second array of the first classification label is a non-word-slot.
  • the label vector of each word in the target text represents OUTPUTDIM values for that word, OUTPUTDIM being the number of NER labels the word may have; that is, the word has OUTPUTDIM probability values, where the size of each probability value represents the possibility that the word has the NER label corresponding to that probability value: the larger the probability value, the more likely the word has the corresponding NER label.
  • the fully connected layer can determine the label matrix of the target text based on the word feature matrix of the target text, and the label matrix can be used to represent: the probability that each word in the target text has a respective NER label. That is, according to the label matrix of the target text, the probability that each word in the target text has a respective NER label can be determined.
  • the fully connected layer can take the label matrix of the target text as an output, so that the label matrix of the target text is input to the CRF layer.
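  • One common realization of this dimension transformation, assumed here for illustration only, is to pair each position's forward and reverse feature vectors, concatenate them, and project the result to OUTPUTDIM per-label scores, turning [2*(1+X), HIDDENUNIT] into [(1+X), OUTPUTDIM]:

```python
# Sketch of the fully connected layer's dimension transformation.
import numpy as np

def fully_connected(feat: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = feat.shape[0] // 2
    fwd, bwd = feat[:n], feat[n:][::-1]     # align reverse-order rows with forward rows
    return np.concatenate([fwd, bwd], axis=1) @ W + b

HIDDENUNIT, OUTPUTDIM, n = 64, 9, 4        # e.g. 9 NER labels; 1 virtual word + 3 characters
feat = np.random.rand(2 * n, HIDDENUNIT)
W = np.random.rand(2 * HIDDENUNIT, OUTPUTDIM)
b = np.zeros(OUTPUTDIM)
label_matrix = fully_connected(feat, W, b)
print(label_matrix.shape)                  # (4, 9): one score per NER label per position
```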
  • the CRF layer can be understood as a Viterbi decoding layer. After receiving the label matrix of the target text, the CRF layer can use the preset transition matrix to score each possible label path through the label matrix and select the path with the largest total score, that is, the most probable path. In this way, the NER label of each word in the target text can be determined, thereby obtaining the NER label index of each word in the target text.
  • the CRF layer can use the NER tag index of each word in the target text as an output, thereby inputting the NER tag index of each word in the target text into the output layer.
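  • A minimal Viterbi decoding sketch for the CRF layer is shown below; it assumes a preset transition matrix trans[i][j] holding the score of moving from label i to label j, and selects the label path with the largest total score, as described above:

```python
# Minimal Viterbi decoding over a label matrix with a preset transition matrix.
import numpy as np

def viterbi(label_matrix: np.ndarray, trans: np.ndarray) -> list:
    n, k = label_matrix.shape
    score = label_matrix[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans + label_matrix[t][None, :]  # [from, to] scores
        back[t] = cand.argmax(axis=0)          # best previous label for each current label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):              # follow back-pointers to recover the path
        path.append(int(back[t][path[-1]]))
    return path[::-1]                          # one NER label index per position

labels = np.random.rand(4, 5)                  # 4 positions, 5 candidate labels
trans = np.random.rand(5, 5)
print(viterbi(labels, trans))
```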
  • the output layer can convert the NER tag index of each word in the target text into the NER tag of each word in the target text, where the NER tags of each word can be represented by NER tag strings, so that readability can be improved.
  • the NER label of each word can be represented in the form of Table 2 above.
  • a named entity recognition method provided by the embodiments of the present invention may be implemented by a pre-trained named entity recognition model. That is, according to one or more embodiments, the above-mentioned step S103 (constructing the target representation matrix using the target text and the first classification label) and the above-mentioned step S104 (determining the NER label of each word in the target text based on the target representation matrix) are implemented by the above-mentioned named entity recognition model.
  • the above-mentioned named entity recognition model includes, connected in series: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer, and an output layer;
  • the input layer, the word embedding layer, and the fusion layer are used to implement the step of constructing the target representation matrix using the target text and the first classification label;
  • the bidirectional LSTM layer, the fully connected layer, the CRF layer, and the output layer are used to implement the step of determining, based on the target representation matrix, the NER label of each word in the target text;
  • the input layer is used to generate the first array about the target text according to the target text
  • the word embedding layer is used to generate the matrix corresponding to the first array as the initial matrix of the target text
  • the fusion layer is used to construct a second array about the first classification label, and use the second array to expand the initial matrix to generate a target representation matrix;
  • the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix
  • the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, and the label matrix is used to represent the probability that each word in the target text has each NER label;
  • the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix
  • the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text, and obtain the named entity recognition result of the target text.
  • the above-mentioned pre-trained named entity recognition model is a pre-trained NER model based on bidirectional LSTM+CRF. Specifically:
  • the target text and the first classification label can be input into the pre-trained named entity recognition model.
  • an index value of each word is preset in the input layer, and the format of the index value of each word may be a one-hot format.
  • the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
  • the input layer can then generate a first array about the target text.
  • the first array includes an index value of each word in the target text, and the number of the index values included is the same as the number of words included in the target text.
  • the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text to be subjected to named entity recognition exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text to be recognized by the named entity exceeds 70, the obtained first array includes 70 index values.
  • the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
  • each index value in the generated first array is an integer value.
  • the input layer can use the first array as an output, so that the first array is input to the word embedding layer.
  • the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, where each element is a number; for example, a one-dimensional array including 128 elements may represent each word, that is, each word is represented by a one-dimensional array of 128 numbers.
  • the word embedding layer can determine the word vector corresponding to each index value in the obtained first array, so as to generate the initial matrix of the target text based on the determined word vectors.
  • the number of elements included in the word vector corresponding to each determined index value is a preset number.
  • the word embedding layer can take the initial matrix of the target text as an output, thereby inputting the initial matrix of the target text to the above-mentioned fusion layer.
  • when the above-mentioned target text and the first classification label are input into the above-mentioned pre-trained named entity recognition model, both may be input into the above-mentioned input layer.
  • since neither the input layer nor the word embedding layer processes the first classification label, each can pass the first classification label through as an output, so that the first classification label reaches the fusion layer; alternatively, the target text and the first classification label can be input into the above input layer and fusion layer respectively.
  • the fusion layer can first construct a second array of the first classification label.
  • for example, suppose the above-mentioned first set value is set to 1, the above-mentioned second set value is set to 0, the total number of classification labels is 10, and the first classification label is the 5th classification label among the 10; then the second array obtained is: [0,0,0,0,1,0,0,0,0,0].
  • the initial matrix of the target text can be expanded by using the second array, thereby obtaining the target representation matrix.
  • the fusion layer may perform steps 241A-242A described above.
  • the virtual word represented by the second array can be determined, and further, the index value of the virtual word in one-hot format can be determined, so as to obtain the index value of the virtual word represented by the second array of the first classification label. Further, the word vector corresponding to the index value of the virtual word represented by the second array can be determined.
  • the word vector corresponding to the index value of the virtual word represented by the second array can be added at the second specified position in the initial matrix of the target text, thereby expanding the initial matrix; the expanded initial matrix is the target representation matrix.
  • the initial matrix of the target text includes the word vector corresponding to the index value of each word in the target text. Therefore, the target representation matrix includes: the word vector corresponding to each word in the target text and the word vector corresponding to the virtual word represented by the first classification label.
  • the fusion method provided by this embodiment is equivalent to: taking the first classification label as a virtual word, thereby expanding the target text into "virtual word + target text", and then performing word embedding conversion on the expanded "virtual word + target text" to obtain the above target representation matrix. The NER label of the "virtual word" is a fixed placeholder, for example, "O" in Table 1 above; that is, the virtual word represented by the second array of the first classification label is a non-word-slot.
  • the fusion layer may perform step 241B described above.
  • the initial matrix of the target text can be expanded directly by using the second array. That is, the second array can be directly added to the third specified position in the initial matrix to obtain the target representation matrix.
  • the initial matrix of the target text includes the word vector corresponding to the index value of each word in the target text, that is, the initial matrix includes the word vector corresponding to each word in the target text. Therefore, when the second array is appended to each one-dimensional array representing a word vector in the initial matrix, the user intent corresponding to the target text is added to the representation of every word of the target text.
  • the fusion method provided in this embodiment is equivalent to: taking the first classification label as additional elements of the word vector corresponding to each word of the target text, thereby expanding the word vector of each word of the target text (a small sketch of both expansion variants follows).
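  • The two expansion variants can be sketched as follows under the same toy assumptions: step 241A adds the virtual word's word vector as an extra row ([X, Y] becomes [1+X, Y]), while step 241B appends the second array to every row ([X, Y] becomes [X, Y+CLASS]):

```python
# Sketch of the two initial-matrix expansion variants described above.
import numpy as np

def expand_241a(initial: np.ndarray, virtual_vec: np.ndarray, at_front: bool = True) -> np.ndarray:
    """Add the virtual word's vector as one extra row: [X, Y] -> [1+X, Y]."""
    row = virtual_vec[None, :]
    return np.vstack([row, initial]) if at_front else np.vstack([initial, row])

def expand_241b(initial: np.ndarray, second_array: np.ndarray, at_front: bool = False) -> np.ndarray:
    """Append the second array to every word vector: [X, Y] -> [X, Y+CLASS]."""
    tiled = np.tile(second_array, (initial.shape[0], 1))
    return np.hstack([tiled, initial]) if at_front else np.hstack([initial, tiled])

initial = np.random.rand(3, 128)              # X = 3 characters, Y = 128
second = np.eye(10)[4]                        # 5th of CLASS = 10 labels
print(expand_241a(initial, np.random.rand(128)).shape)  # (4, 128)
print(expand_241b(initial, second).shape)               # (3, 138)
```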
  • the LSTM layer is a neural network model that considers every word in the target text when processing the text. For example, when the LSTM layer processes the text "I want to listen to Andy Lau's Wangqingshui", the last phrase it receives is "Wangqingshui", and before "Wangqingshui" it has already received the two segments "I want to listen" and "Andy Lau". Therefore, when recognizing the word slot for "Wangqingshui", the LSTM layer takes "I want to listen" and "Andy Lau" into account and, combining the context in the text, recognizes that "Wangqingshui" is likely a song name.
  • if a unidirectional LSTM were used, information about the order of the characters and words in the text might be lost; for example, there would be no way to distinguish "I love you" from "You love me". Therefore, according to one or more embodiments, a bidirectional LSTM layer is adopted, so that the recognition results in the forward and reverse directions can be combined to obtain the order relationship of each character and word in the text.
  • the input to the bidirectional LSTM layer is the target representation matrix obtained by the fusion layer described above.
  • in the case of step 241A, the target representation matrix may be represented as [1+X, Y], where X represents the number of words in the target text and Y represents the number of elements included in each word vector of the target representation matrix.
  • in this case, the output of the bidirectional LSTM layer is the word feature matrix of the target text, which can be represented as [2*(1+X), HIDDENUNIT]; that is, it includes 2*(1+X) one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT.
  • in the case of step 241B, the target representation matrix is represented as [X, Y+CLASS], where X represents the number of words in the target text, Y represents the number of elements included in each word vector of the initial matrix of the target text, and CLASS represents the total number of preset classification labels.
  • in this case, the output of the bidirectional LSTM layer is the word feature matrix of the target text, which can be expressed as [2*X, HIDDENUNIT']; that is, the matrix output by the bidirectional LSTM layer includes 2*X one-dimensional arrays, and the number of elements included in each one-dimensional array is HIDDENUNIT'.
  • in either case, the number of one-dimensional arrays included in the word feature matrix of the obtained target text is twice the number of one-dimensional arrays included in the target representation matrix, and each one-dimensional array included in the word feature matrix of the target text is a one-dimensional array of a set length, that is, each one-dimensional array includes a preset number of elements.
  • each one-dimensional array included in the word feature matrix of the target text is a word vector; the word feature matrix includes the word vector of each word of the target text in the forward order of the target text, and the word vector of each word of the target text in the reverse order of the target text.
  • the bidirectional LSTM layer can use the word feature matrix as an output, so that the word feature matrix is input to the fully connected layer.
  • the fully connected layer includes two functions: dimension transformation and feature extraction. In this way, after obtaining the word feature matrix of the target text, the fully connected layer can determine the label matrix of the target text based on the word feature matrix.
  • taking the word feature matrix [2*(1+X), HIDDENUNIT] output by the bidirectional LSTM as an example, after the fully connected layer, the word feature matrix [2*(1+X), HIDDENUNIT] can be converted into the label matrix of the target text, which can be expressed as [(1+X), OUTPUTDIM].
  • the label matrix [(1+X), OUTPUTDIM] of the above target text includes (1+X) one-dimensional vectors: the label vector of the virtual word represented by the second array of the first classification label, and the label vector of each word in the target text.
  • the NER label of the virtual word represented by the second array of the first classification label is a fixed placeholder, for example, "O" in Table 1 above; that is, the virtual word represented by the second array of the first classification label is a non-word-slot.
  • the label vector of each word in the target text represents OUTPUTDIM values for that word, OUTPUTDIM being the number of NER labels the word may have; that is, the word has OUTPUTDIM probability values, where the size of each probability value represents the possibility that the word has the NER label corresponding to that probability value: the larger the probability value, the more likely the word has the corresponding NER label.
  • the fully connected layer can determine the label matrix of the target text based on the word feature matrix of the target text, and the label matrix can be used to represent: the probability that each word in the target text has a respective NER label. That is, according to the label matrix of the target text, the probability that each word in the target text has a respective NER label can be determined.
  • the fully connected layer can take the label matrix of the target text as an output, so that the label matrix of the target text is input to the CRF layer.
  • the CRF layer can be understood as a Viterbi decoding layer. After receiving the label matrix of the target text, the CRF layer can use the preset transition matrix to score each possible label path through the label matrix and select the path with the largest total score, that is, the most probable path. In this way, the NER label of each word in the target text can be determined, thereby obtaining the NER label index of each word in the target text.
  • the CRF layer can use the NER tag index of each word in the target text as an output, thereby inputting the NER tag index of each word in the target text into the output layer.
  • the output layer can convert the NER tag index of each word in the target text into the NER tag of each word in the target text, where the NER tags of each word can be represented by NER tag strings, so that readability can be improved.
  • the NER label of each word can be represented in the form of Table 2 above.
  • when the named entity recognition method according to one or more embodiments of the present invention is implemented using a pre-trained named entity recognition model, that named entity recognition model needs to be obtained through training first.
  • the training method of the above named entity recognition model includes:
  • S301: Obtain the sample text to be utilized, the second classification label of the sample text, and the true value of the NER label of each word in the sample text;
  • the second classification label is used to represent the user intent corresponding to the sample text;
  • S302: Input the sample text and the second classification label of the sample text into the named entity recognition model, so that the named entity recognition model uses the sample text and the second classification label to construct a sample representation matrix, and uses the sample representation matrix to predict the NER label predicted value of each word in the sample text;
  • S303: Based on the true value and the predicted value of the NER label of each word in the sample text, determine whether the named entity recognition model has converged; if so, go to step S304; otherwise, go to step S305;
  • S304: End the training to obtain the trained named entity recognition model;
  • S305: Adjust the model parameters in the named entity recognition model, and return to the step of acquiring the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text.
  • the named entity recognition model may be trained by any type of device, for example, a laptop computer, a desktop computer, a tablet computer, etc., which is not specifically limited in the embodiments of the present invention; such a device is hereinafter referred to as the training device.
  • the training device may be the same as, or different from, the electronic device that performs recognition.
  • when they are the same device, the above named entity recognition model can be trained on that device, which then uses the obtained model to implement a named entity recognition method provided by the embodiments of the present invention;
  • when they are different devices, after the training device obtains the above-mentioned named entity recognition model, it can send the obtained model to the electronic device that performs recognition. In this way, after the named entity recognition model is received, the named entity recognition method provided by the embodiments of the present invention can be implemented using the obtained model.
  • the training device can first obtain the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text; then, based on the obtained sample text, the second classification label of the sample text, and the true value of the NER label of each word in the sample text, it trains the named entity recognition model and obtains the trained named entity recognition model.
  • the sample text can be a sentence, or a phrase or word group composed of multiple words, both of which are reasonable; in addition, the second classification label of the sample text can be determined by classifying the sample text with a preset intent classification model.
  • the sample text to be utilized, the second classification label of the sample text, and the true value of the NER label of each word in the sample text can be obtained in various ways.
  • for example, the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text stored in a local storage space can be obtained directly; they can also be obtained from another, non-local storage space. Both are reasonable.
  • during training, multiple sample texts may be used, each with its own second classification label and ground-truth NER label for each word; therefore, multiple sets of sample texts to be utilized, second classification labels of the sample texts, and true values of the NER labels of each word in the sample texts can be obtained.
  • the number of training samples can be set according to requirements in practical applications, which is not specifically limited in the present invention.
  • the type of the sample text may include only sentences, phrases, or word groups, or may include at least two of sentences, phrases, and word groups; all of these are reasonable.
  • after obtaining the sample text, the second classification label of the sample text, and the true value of the NER label of each word in the sample text, the sample text and its second classification label can be input into the named entity recognition model.
  • the named entity recognition model includes a preprocessing sub-network and a named entity recognition sub-network.
  • the above-mentioned preprocessing sub-network can use the sample text and the second classification label to construct a sample representation matrix.
  • the above named entity recognition sub-network can use the sample representation matrix to predict the NER label prediction value of each word in the sample text.
  • the way the preprocessing sub-network uses the sample text and the second classification label to construct the sample representation matrix is similar to the way the target representation matrix is constructed from the target text and the first classification label described above, and is not repeated here.
  • after the predicted value of the NER label of each word in the sample text is obtained, whether the named entity recognition model has converged can be judged based on the true value and the predicted value of the NER label of each word in the sample text.
  • the named entity recognition model converges, it means that the named entity recognition model has been trained, and the training can be stopped to obtain the trained named entity recognition model.
  • otherwise, the model parameters in the named entity recognition model can be adjusted, the sample text to be used, the second classification label of the sample text, and the true value of the NER label of each word in the sample text can be obtained again, and the obtained data can be used to continue training the parameter-adjusted named entity recognition model until the named entity recognition model is judged to have converged, yielding the trained named entity recognition model.
  • for example, when the matching degree between the true values and the predicted values of the NER labels of each word in the sample text is greater than a preset matching degree, it can be judged that the named entity recognition model has converged; otherwise, it is judged that the model has not converged. A minimal sketch of this procedure follows.
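  • The sketch below outlines steps S301-S305 under heavy assumptions: model, get_training_sample, and the matching-degree threshold are hypothetical, and in practice convergence would be judged over many samples rather than a single one:

```python
# Hedged sketch of the training procedure S301-S305; the model object and the
# sample source are placeholders, not the patent's actual implementation.
def train(model, get_training_sample, match_threshold: float = 0.99, max_steps: int = 100000):
    for _ in range(max_steps):
        # S301: sample text, its second classification label, ground-truth NER labels
        text, intent_label, gold = get_training_sample()
        # S302: the model fuses text + intent label and predicts per-word NER labels
        pred = model.predict(text, intent_label)
        # S303: convergence test via matching degree between prediction and truth
        match = sum(p == g for p, g in zip(pred, gold)) / len(gold)
        if match > match_threshold:
            return model                      # S304: training finished
        model.adjust_parameters(pred, gold)   # S305: adjust parameters, loop back to S301
    return model
```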
  • the pre-trained intent classification model used to obtain the above-mentioned first classification label and second classification label is illustrated below by way of example.
  • the above-mentioned intent classification model may be a CNN classification model.
  • the CNN classification model can include: input layer, word embedding layer, convolution layer, pooling layer, fusion layer, fully connected layer and output layer.
  • the target text can be input into the above CNN classification model.
  • the index value of each word is preset in the input layer, and the format of the index value of each word may be one-hot format.
  • the input layer may first start from the first word of the target text, and sequentially determine the index value of each word in the target text in one-hot format.
  • the input layer can then generate a first array about the target text.
  • the first array includes an index value of each word in the target text, and the number of index values included is the same as the number of words included in the target text.
  • the maximum length of the target text may be set to 70. In this way, when the number of words included in the acquired target text exceeds 70, each word after the 70th word is discarded. Therefore, when the number of words included in the obtained target text exceeds 70, the obtained first array includes 70 index values.
  • the maximum length of the above-mentioned target text may also be set to other specific values, which are not specifically limited in this embodiment of the present invention.
  • each index value in the generated first array is an integer value.
  • the input layer can take the first array as an output, so that the first array is input into the word embedding layer.
  • the so-called word embedding means that each word is represented by a one-dimensional array including multiple elements, where each element is a number; for example, a one-dimensional array including 128 elements may represent each word, that is, each word is represented by a one-dimensional array of 128 numbers.
  • the word embedding layer can determine the word vector corresponding to each index value in the obtained first array, thereby generating a target matrix based on each determined word vector.
  • the number of elements included in the word vector corresponding to each determined index value is a preset number.
  • the word embedding layer can take the generated target matrix as output and feed this target matrix into the convolutional layer.
  • the role of the convolutional layer is to amplify and extract certain features in the target text, thereby outputting a feature matrix describing the features of the target text.
  • the size of this feature matrix is related to the convolution kernel of the convolutional layer.
  • the convolution kernel can be expressed as [K, Length], where K means that features are extracted over spans of K words, that is, K consecutive words in the target text are taken as the feature of interest, so that the K consecutive words can be processed as a whole. When the K consecutive words form a word or phrase, they can be considered as a whole; when they are single characters, the context of each character among the K consecutive words needs to be considered. Length indicates the number of convolution kernels of word-length K.
  • multiple convolution kernels may be included in the convolution layer, so that a feature matrix may be obtained for each convolution kernel.
  • the purpose of the pooling layer is to ignore the unimportant features in the features extracted by the convolution kernel and retain only the most important features.
  • the pooling layer can adopt the "down-sampling" method.
  • the so-called “down-sampling” method is to find the maximum value in each matrix for each matrix output by the convolution layer, so that the maximum value is used to replace the matrix.
  • each convolution output is followed by a pooling layer, so that the output of the pooling layer is the maximum value in the matrix output by the adjacent convolution.
  • the fusion layer is used to combine the outputs of multiple pooling layers to obtain a new one-dimensional array.
  • the input of the fully connected layer is the one-dimensional array output by the fusion layer. It is used to convert the numbers in the one-dimensional array into probability values, one for each of the preset total number of classification labels, where the converted probability values may be floating-point values. The magnitude of each probability value represents the likelihood that the target text corresponds to the respective classification label: the larger the probability value, the greater the possibility that the target text corresponds to the classification label represented by that probability value.
  • the obtained probability values have relatively large numerical values. Therefore, the obtained probability values may be normalized so that the sum of the normalized probability values is 1.
  • the output layer receives the probability values output by the fully connected layer, that is, a one-dimensional array whose length is the total number of classification labels.
  • the subscript of each number in the one-dimensional array represents the classification number of a classification label, and the output layer can convert the classification number of the classification label into a user-recognizable classification label, that is, into an intent that the user can recognize. A toy sketch of this CNN classifier follows.
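  • The sketch below illustrates this CNN intent classifier with numpy: a bank of word-length-K convolution kernels, max-value "down-sampling" per kernel, fusion of the pooled values into one one-dimensional array, a fully connected projection, and normalization so the label probabilities sum to 1. All sizes are assumptions:

```python
# Toy sketch of the CNN intent classification pipeline described above.
import numpy as np

def cnn_intent(matrix: np.ndarray, kernels: list, W: np.ndarray) -> np.ndarray:
    pooled = []
    for kern in kernels:                          # each kern has shape [K, EMB_DIM]
        k = kern.shape[0]
        feats = [np.sum(matrix[i:i + k] * kern)   # slide over K consecutive words
                 for i in range(matrix.shape[0] - k + 1)]
        pooled.append(max(feats))                 # "down-sampling": keep only the maximum
    fused = np.array(pooled)                      # fusion layer: one one-dimensional array
    logits = fused @ W                            # fully connected: one value per label
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                    # normalized so the probabilities sum to 1

EMB_DIM, CLASS = 128, 10
text_matrix = np.random.rand(20, EMB_DIM)         # 20 words after word embedding
kernels = [np.random.rand(k, EMB_DIM) for k in (2, 3, 4)]  # Length = 3 kernels here
W = np.random.rand(len(kernels), CLASS)
print(cnn_intent(text_matrix, kernels, W))        # one probability per classification label
```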
  • the embodiment of the present invention further provides an electronic device, as shown in FIG. 7, including a processor 701, a communication interface 702, a memory 703, and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 communicate with each other through the communication bus 704;
  • the processor 701 is configured to implement the steps of any of the named entity identification methods provided by the foregoing embodiments of the present invention when executing the program stored in the memory 703 .
  • the communication bus mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above electronic device and other devices.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage. According to one or more embodiments, the memory may also be at least one storage device located remotely from the aforementioned processor.
  • the above-mentioned processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a computer-readable storage medium is also provided, and a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of any of the above named entity identification methods are implemented .
  • a computer program product comprising instructions, which, when executed on a computer, cause the computer to perform the steps of any of the named entity recognition methods in the above embodiments.
  • in the above-mentioned embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, it can be realized in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.


Abstract

A named entity recognition method and an electronic device, relating to the technical field of artificial intelligence algorithms. The method includes: acquiring a target text to be subjected to named entity recognition; determining a first classification label of the target text, where the first classification label is used to represent the user intent corresponding to the target text; constructing a target representation matrix using the target text and the first classification label; and determining, based on the target representation matrix, the named entity recognition NER label of each word in the target text to obtain the named entity recognition result of the target text. The method can improve the accuracy of the recognition result obtained when performing named entity recognition on a text.

Description

Summary of the Invention
One aspect of one or more embodiments of the present invention provides a named entity recognition method, the method including:
acquiring a target text to be subjected to named entity recognition;
determining a first classification label of the target text, where the first classification label is used to represent the user intent corresponding to the target text;
constructing a target representation matrix using the target text and the first classification label;
determining, based on the target representation matrix, the named entity recognition NER label of each word in the target text to obtain the named entity recognition result of the target text.
According to one or more embodiments, the step of constructing the target representation matrix using the target text and the first classification label includes:
generating a fusion array regarding the target text and the first classification label, where the elements of the fusion array are: the index value of each word in the target text, and the index value of the virtual word represented by the first classification label;
generating the matrix corresponding to the fusion array as the target representation matrix, where the elements of the target representation matrix are: the word vectors corresponding to each index value in the fusion array.
According to one or more embodiments, the step of generating the fusion array regarding the target text and the first classification label includes:
generating a first array regarding the target text, where the elements of the first array are: the index value of each word in the target text;
constructing a second array regarding the first classification label, and determining the index value of the virtual word represented by the second array as the target index value, where the elements of the second array are: the values of the preset classification labels, the first classification label being one of the classification labels, the value of the first classification label being a first set value, and the value of each classification label other than the first classification label being a second set value;
adding the target index value at a first specified position in the first array to obtain the fusion array, where the first specified position includes: before the first element of the first array, or after the last element of the first array.
According to one or more embodiments, the step of constructing the target representation matrix using the target text and the first classification label includes:
generating a first array regarding the target text, where the elements of the first array are: the index value of each word in the target text;
generating the matrix corresponding to the first array as the initial matrix of the target text, where the elements of the initial matrix are: the word vectors corresponding to each index value in the first array;
constructing a second array regarding the first classification label, where the elements of the second array are: the values of the preset classification labels, the first classification label being one of the classification labels, the value of the first classification label being a first set value, and the value of each classification label other than the first classification label being a second set value;
expanding the initial matrix using the second array to generate the target representation matrix.
According to one or more embodiments, the step of expanding the initial matrix using the second array to generate the target representation matrix includes:
determining the word vector corresponding to the index value of the virtual word represented by the second array;
adding the determined word vector at a second specified position in the initial matrix to obtain the target representation matrix, where the second specified position includes: before the first element of the initial matrix, or after the last element of the initial matrix.
According to one or more embodiments, the step of expanding the initial matrix using the second array to generate the target representation matrix includes:
adding the second array at a third specified position in the initial matrix to obtain the target representation matrix,
where the third specified position includes: before the first element of a one-dimensional array representing a word vector, or after the last element of a one-dimensional array representing a word vector.
According to one or more embodiments, the step of determining, based on the target representation matrix, the named entity recognition NER label of each word in the target text to obtain the named entity recognition result of the target text includes:
determining the word feature matrix of the target text based on the target representation matrix, where the word feature matrix includes: the word vector of each word of the target text in the forward order of the target text, and the word vector of each word of the target text in the reverse order of the target text;
determining the label matrix of the target text based on the word feature matrix, where the label matrix is used to represent: the probability that each word in the target text has each NER label;
determining the NER label index of each word in the target text based on the label matrix;
converting the NER label index of each word in the target text into the NER label of each word in the target text to obtain the named entity recognition result of the target text.
According to one or more embodiments, the step of constructing the target representation matrix using the target text and the first classification label and the step of determining, based on the target representation matrix, the NER label of each word in the target text are implemented by a pre-trained named entity recognition model;
the named entity recognition model includes, connected in series: an input layer, a fusion layer, a word embedding layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer, and an output layer, where the input layer, the fusion layer, and the word embedding layer are used to implement the step of constructing the target representation matrix using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer, and the output layer are used to implement the step of determining, based on the target representation matrix, the NER label of each word in the target text;
where the input layer is used to generate the first array regarding the target text;
the fusion layer is used to construct the second array regarding the first classification label, determine the index value of the virtual word represented by the second array as the target index value, and add the target index value at the first specified position in the first array to obtain the fusion array;
the word embedding layer is used to generate the matrix corresponding to the fusion array as the target representation matrix;
the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix;
the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, where the label matrix is used to represent the probability that each word in the target text has each NER label;
the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix;
the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text to obtain the named entity recognition result of the target text.
According to one or more embodiments, the step of constructing the target representation matrix using the target text and the first classification label and the step of determining, based on the target representation matrix, the NER label of each word in the target text are implemented by a pre-trained named entity recognition model;
the named entity recognition model includes, connected in series: an input layer, a word embedding layer, a fusion layer, a bidirectional LSTM layer, a fully connected layer, a CRF layer, and an output layer, where the input layer, the word embedding layer, and the fusion layer are used to implement the step of constructing the target representation matrix using the target text and the first classification label, and the bidirectional LSTM layer, the fully connected layer, the CRF layer, and the output layer are used to implement the step of determining, based on the target representation matrix, the NER label of each word in the target text;
where the input layer is used to generate the first array regarding the target text;
the word embedding layer is used to generate the matrix corresponding to the first array as the initial matrix of the target text;
the fusion layer is used to construct the second array regarding the first classification label and expand the initial matrix using the second array to generate the target representation matrix;
the bidirectional LSTM layer is used to determine the word feature matrix of the target text based on the target representation matrix;
the fully connected layer is used to determine the label matrix of the target text based on the word feature matrix, where the label matrix is used to represent the probability that each word in the target text has each NER label;
the CRF layer is used to determine the NER label index of each word in the target text based on the label matrix;
the output layer is used to convert the NER label index of each word in the target text into the NER label of each word in the target text to obtain the named entity recognition result of the target text.
According to one or more embodiments, the training method of the named entity recognition model includes:
acquiring a sample text to be utilized, a second classification label of the sample text, and the true value of the NER label of each word in the sample text, where the second classification label is used to represent the user intent corresponding to the sample text;
inputting the sample text and the second classification label of the sample text into the named entity recognition model, so that the named entity recognition model constructs a sample representation matrix using the sample text and the second classification label and predicts, using the sample representation matrix, the NER label predicted value of each word in the sample text;
judging, based on the true value and the predicted value of the NER label of each word in the sample text, whether the named entity recognition model has converged; if so, ending the training to obtain the trained named entity recognition model; otherwise, adjusting the model parameters in the named entity recognition model and returning to the step of acquiring the sample text to be utilized, the second classification label of the sample text, and the true value of the NER label of each word in the sample text.
One or more embodiments of the present invention can achieve one or more of the following beneficial effects:
According to one or more embodiments of the present invention, when performing named entity recognition on a target text, after the target text to be subjected to named entity recognition is acquired, the first classification label used to represent the user intent corresponding to the target text is determined; a target representation matrix is then constructed using the target text and the first classification label; and the NER label of each word in the target text can then be determined based on the target representation matrix, yielding the named entity recognition result of the target text.
Since the above target representation matrix is constructed from the target text and the first classification label representing the user intent, it can represent the fusion of the text content of the target text and the label content of the user intent. In this way, information of the user-intent dimension is added when performing named entity recognition on the target text, so that during recognition the association between the target text and the user intent can be learned, the user intent expressed by the target text can be determined, and the NER label of each word in the target text can be considered comprehensively from the two dimensions of text content and user intent.
On this basis, when recognizing the word slot type of each word slot in the target text, the influence of the user intent corresponding to the target text on the word slot type can be taken into account, so that the word slot types of the same word slot in texts expressing different user intents can be recognized, improving the accuracy of the recognition result obtained when performing named entity recognition on the target text.
Brief Description of the Drawings
To explain the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a named entity recognition method according to one or more embodiments of the present invention.
FIG. 2 is a schematic flowchart of a way of generating a fusion array regarding the target text and the first classification label according to one or more embodiments of the present invention.
FIG. 3 is a schematic flowchart of a training method of a named entity recognition model according to one or more embodiments of the present invention.
FIG. 4 is a schematic structural diagram of a pre-trained named entity recognition model according to one or more embodiments of the present invention.
FIG. 5 is a schematic structural diagram of another pre-trained named entity recognition model according to one or more embodiments of the present invention.
FIG. 6 is a schematic structural diagram of an intent classification model according to one or more embodiments of the present invention.
FIG. 7 is a schematic structural diagram of an electronic device according to one or more embodiments of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The named entity recognition method according to one or more embodiments of the present invention can be applied to any type of electronic device, for example, a desktop computer, a laptop computer, a tablet computer, etc., which is not specifically limited in the embodiments of the present invention; such a device is hereinafter referred to simply as the electronic device.
In addition, the method is applicable to any scenario requiring named entity recognition. For example, in the financial field, for the annual report of a financial company, word slots such as dates, currency types, and percentages in the report can be recognized, and an NER label can be added to each word in the report; the named entity recognition result of the report is then the report text with an NER label added to each word. As another example, in the field of traffic management, for a traffic accident report, word slots such as person names, places, times, and casualty counts in the report can be recognized, and an NER label can be added to each word in the report; the named entity recognition result of the report is then the report text with an NER label added to each word.
According to one or more embodiments, when performing named entity recognition on a text, the NER labels that can be added to each word may be as shown in Table 1 below:
Table 1
NER label | Meaning
B-slot type | B is short for Begin, indicating the beginning of a word slot
I-slot type | I is short for Internal, indicating the middle of a word slot
L-slot type | L is short for Last, indicating the end of a word slot
U-slot type | U is short for Unique, indicating a single-word slot
O | O is short for Other, indicating a non-word-slot
On this basis, for a recognized multi-word slot, a "B-slot type", "I-slot type", or "L-slot type" label can be added to each word of the slot based on the slot type of the slot and the position of each word within the slot; for a word not recognized as part of any word slot, a "non-word-slot" label can be added to the word; and for a recognized single-word slot, a "single-word-slot" label can be added to the word.
Exemplarily, the named entity recognition result of the text "给我放一首刘德华的忘情水" ("Play me Andy Lau's Wangqingshui") can be as shown in Table 2 below:
Table 2
Word | NER label
给 | O
我 | O
放 | O
一 | O
首 | O
刘 | B-person name
德 | I-person name
华 | L-person name
的 | O
忘 | B-song name
情 | I-song name
水 | L-song name
A named entity recognition method according to one or more embodiments of the present invention may include the following steps:
acquiring a target text to be subjected to named entity recognition;
determining a first classification label of the target text, where the first classification label is used to represent the user intent corresponding to the target text;
constructing a target representation matrix using the target text and the first classification label;
determining, based on the target representation matrix, the named entity recognition NER label of each word in the target text to obtain the named entity recognition result of the target text.
According to one or more embodiments of the present invention, when performing named entity recognition on a target text, after the target text is acquired, the first classification label used to represent the user intent corresponding to the target text is determined; a target representation matrix is then constructed using the target text and the first classification label; and the NER label of each word in the target text can then be determined based on the target representation matrix, yielding the named entity recognition result of the target text.
Since the above target representation matrix is constructed from the target text and the first classification label representing the user intent, it can represent the fusion of the text content of the target text and the label content of the user intent. In this way, information of the user-intent dimension is added when performing named entity recognition on the target text, so that during recognition the association between the target text and the user intent can be learned, the user intent expressed by the target text can be determined, and the NER label of each word in the target text can be considered comprehensively from the two dimensions of text content and user intent.
On this basis, when recognizing the word slot type of each word slot in the target text, the influence of the user intent corresponding to the target text on the word slot type can be taken into account, so that the word slot types of the same word slot in texts expressing different user intents can be recognized, improving the accuracy of the recognition result obtained when performing named entity recognition on the target text.
The named entity recognition method according to one or more embodiments of the present invention is described in detail below.
FIG. 1 is a schematic flowchart of a named entity recognition method according to one or more embodiments of the present invention. As shown in FIG. 1, the method may include the following steps:
S101: Acquire a target text to be subjected to named entity recognition;
When executing the named entity recognition method according to one or more embodiments, the target text to be subjected to named entity recognition is first acquired.
The target text can be acquired in various ways. For example, a target text manually input by the user can be acquired; as another example, the user's voice information can be collected and converted into the target text; as yet another example, the target text can be acquired from another communicatively connected device. All of these are reasonable.
S102: Determine a first classification label of the target text;
where the first classification label is used to represent the user intent corresponding to the target text.
After the target text is acquired, the first classification label used to represent the user intent corresponding to the target text can be further determined.
The first classification label can be determined in various ways. For example, the user intent corresponding to the target text input by the user can be acquired, and that user intent can be determined as the first classification label of the target text; as another example, the acquired target text can be input into a pre-trained intent classification model, and the output of that model can be determined as the first classification label of the target text. Exemplarily, the intent classification model may be a CNN (Convolutional Neural Networks) classification model, etc. All of these are reasonable.
It should be noted that, for clarity of presentation, the pre-trained intent classification model is introduced by way of example later.
S103: Construct a target representation matrix using the target text and the first classification label;
After the target text and the first classification label are obtained, the target representation matrix can be constructed using the target text and the first classification label.
The constructed target representation matrix can represent the fusion of the text content of the target text and the label content of the first classification label.
S104: Determine, based on the target representation matrix, the named entity recognition NER label of each word in the target text to obtain the named entity recognition result of the target text.
After the target representation matrix is constructed, the NER label of each word in the target text can be determined based on it, yielding the named entity recognition result of the target text.
Since the target representation matrix is constructed from the target text and the first classification label representing the user intent, it fuses the text content of the target text with the label content of the user intent. In this way, the association between the target text and the user intent can be learned, the user intent expressed by the target text can be determined, and the NER label of each word in the target text can be considered comprehensively from the two dimensions of text content and user intent. Then, for each word slot in the target text, when recognizing the slot type of the slot, the slot type can be considered comprehensively from both the text content of the slot and the user intent corresponding to the target text, thereby improving the accuracy of the named entity recognition result of the target text.
According to one or more embodiments of the present invention, when performing named entity recognition on a target text, after the target text is acquired, the first classification label used to represent the user intent corresponding to the target text is determined; a target representation matrix is then constructed using the target text and the first classification label; and the NER label of each word in the target text can then be determined based on the target representation matrix, yielding the named entity recognition result of the target text.
Since the above target representation matrix is constructed from the target text and the first classification label representing the user intent, it can represent the fusion of the text content of the target text and the label content of the user intent. In this way, information of the user-intent dimension is added when performing named entity recognition on the target text, so that during recognition the association between the target text and the user intent can be learned, the user intent expressed by the target text can be determined, and the NER label of each word in the target text can be considered comprehensively from the two dimensions of text content and user intent.
On this basis, when recognizing the word slot type of each word slot in the target text, the influence of the user intent corresponding to the target text on the word slot type can be taken into account, so that the word slot types of the same word slot in texts expressing different user intents can be recognized, improving the accuracy of the recognition result obtained when performing named entity recognition on the target text.
According to one or more embodiments, the above step S103 (the step of constructing the target representation matrix using the target text and the first classification label) may include the following steps 11-12:
Step 11: Generate a fusion array regarding the target text and the first classification label;
where the elements of the fusion array are: the index value of each word in the target text, and the index value of the virtual word represented by the first classification label.
Understandably, the obtained target text and first classification label are usually expressed in text form; when performing named entity recognition on the target text, the target text and the first classification label need to be converted into content expressed in mathematical form.
On this basis, the virtual word represented by each classification label can be preset; the virtual word can be regarded as a certain word corresponding to the index value of the intent category. Index values can then be preset for every word the text may involve and for the virtual word represented by each classification label. Thus, after the target text and the first classification label are obtained, the index value of each word in the target text and the index value of the virtual word represented by the first classification label can be determined, and after these preset index values are determined, a fusion array including them can be generated.
That is to say, in the fusion array, the index value of each word in the target text and the index value of the virtual word represented by the first classification label each serve as one element of the fusion array. The number of index values included in the fusion array can thus be: the sum of the number of words included in the target text and the number of virtual words represented by the first classification label. Usually, the number of virtual words represented by the first classification label is 1.
Since the index value of each word in the target text and the index value of the virtual word represented by the first classification label are each a single numeric value, the generated fusion array regarding the target text and the first classification label is a one-dimensional array.
Step 12: Generate the matrix corresponding to the fusion array as the target representation matrix;
where the elements of the target representation matrix are: the word vectors corresponding to each index value in the fusion array.
After the fusion array is generated, the word vector corresponding to each index value in the fusion array can be determined; after the word vectors corresponding to all index values in the fusion array are obtained, the matrix corresponding to the fusion array can be generated based on the determined word vectors, yielding the target representation matrix.
The word vector corresponding to each index value can be a one-dimensional array whose element type is floating-point data.
Since the fusion array includes multiple index values, each index value corresponds to one word vector, and a word vector can be a one-dimensional array, the generated target representation matrix includes multiple one-dimensional arrays, and the number of one-dimensional arrays included in the target representation matrix is the same as the number of index values included in the fusion array. In addition, each one-dimensional array in the target representation matrix can include a preset number of elements; for example, if the preset number is 128, each one-dimensional array in the target representation matrix includes 128 elements. Of course, the preset number can also be another value, which is not specifically limited in the embodiments of the present invention.
According to one or more embodiments, as shown in FIG. 2, the above step 11 (generating the fusion array regarding the target text and the first classification label) may include the following steps:
Step S111: Generate a first array regarding the target text;
where the elements of the first array are: the index value of each word in the target text.
After the target text is obtained, the index value of each word in the target text can first be determined, and the first array regarding the target text can be generated from the obtained index values, with the index value of each word in the target text serving as one element of the first array.
The number of index values included in the obtained first array can be: the number of words included in the target text.
However, in consideration of processing speed and other factors, a maximum length of the target text can be set, so that when the number of words included in the acquired target text to be subjected to named entity recognition exceeds the set maximum length, the words exceeding the set maximum length can be discarded and the first array generated from the retained words. For example, since a single utterance usually includes no more than 70 words, the maximum length of the target text can be set to 70.
That is to say, when the number of words included in the acquired target text to be subjected to named entity recognition exceeds the set maximum length, the target text used to construct the target representation matrix is the text remaining after the words exceeding the set maximum length are discarded, and the number of words included in the remaining target text is the same as the set maximum length.
For example, when the target text includes N (N ≥ 1) words and the set maximum length of the target text is P (P < N), the (P+1)-th to N-th words of the acquired target text can be discarded, the index values of the 1st to P-th words can be determined, and the first array can be generated based on the P determined index values.
Besides the above examples, other ways of generating the first array also fall within the protection scope of the embodiments of the present invention.
Step S112: Construct a second array regarding the first classification label, and determine the index value of the virtual word represented by the second array as the target index value;
where the elements of the second array are: the values of the preset classification labels, the first classification label being one of the classification labels, the value of the first classification label being a first set value, and the value of each classification label other than the first classification label being a second set value.
Understandably, in practical scenarios a user can express various user intents through text, for example, asking about the weather, querying a song, querying a place name, etc. Therefore, classification labels can be preset according to the needs of the practical scenario, and the determined first classification label of the target text is one of the preset classification labels.
In this way, after the first classification label of the target text is determined, the value of each preset classification label can be determined: the value of the first classification label is the first set value, and the value of each other classification label is the second set value. The obtained second array is thus a one-dimensional array consisting of one first set value and at least one second set value, and the number of elements included in the second array is the total number of preset classification labels.
According to one or more embodiments, the first set value can be 1, and the second set value can be 0.
On this basis, each classification label corresponds to one element of the second array. After the first classification label of the target text is obtained, the element of the second array corresponding to the first classification label can be set to the first set value and each other element to the second set value; once all elements of the second array are set, the second array regarding the first classification label is obtained.
Then, after the second array is obtained, the virtual word represented by the second array can be determined according to a preset correspondence between arrays and virtual words, and the index value of that virtual word can be further determined, yielding the target index value.
Step S113: Add the target index value at the first specified position in the first array to obtain the fusion array;
where the first specified position includes: before the first element of the first array, or after the last element of the first array.
After the first array and the target index value are obtained, the target index value can be added at the first specified position in the first array to obtain the fusion array.
The target index value can be added before the first element of the first array, or after the last element of the first array; of course, it can also be added at another specified position in the first array.
In this way, compared with the first array, the number of index values in the fusion array increases by one, and the newly added index value is the target index value.
根据一个或多个实施例,上述步骤S103(利用目标文本和第一分类标签构建目标表征矩阵的步骤)可以包括如下步骤21-23:
步骤21:生成关于目标文本的的第一数组;
其中,第一数组中的各元素为:目标文本中每个字的索引值;
在得到上述目标文本后,便可以首先确定目标文本中的各个字的索引值,从而,根据所得到的索引值,生成关于目标文本的第一数组。其中,目标文本中每个字的索引值可以分别作为第一数组中的一个元素。
其中,上述所得到的第一数组所包括的索引值的数量可以为:目标文本中所包括的字的数量。
然而,考虑到处理速度等因素,可以设置目标文本的最大长度,从而,当获取到的待进行命名实体识别的目标文本中所包括的字的数量超过上述所设置的最大长度时,可以丢弃该待进行命名实体识别的目标文本中超过上述所设 置的最大长度的字,进而,利用所保留的各个字生成上述第一数组。例如,由于通常人一次讲话所包括的字的数量不超过70个字,则可以设置目标文本的最大长度为70。
也就是说,当获取到的待进行命名实体识别的目标文本中所包括的字的数量超过所设置的目标文本的最大长度时,用于构成上述目标表征矩阵的目标文本为上述获取到的待进行命名实体识别的目标文本丢弃超过所设置的最大长度的字后,剩余的目标文本,则该剩余的目标文本中所包括字的数量与上述所设置的目标文本的最大长度相同。
例如,当目标文本中包括N(N≥1)个字,且所设置的目标文本的最大长度为P(P<N)时,那么,可以丢弃获取到的待进行命名实体识别的目标文本中的第P+1个字至第N个字,进而,便可以确定目标文本中第1个至第P个字的索引值,从而,基于所确定的P个索引值,生成第一数组。
其中,除上述所举的例子之外,其他的能够生成上述第一数组的方式也处于本发明实施例所保护的范围内。
步骤22:生成第一数组对应的矩阵作为目标文本的初始矩阵;
其中,初始矩阵中中的各元素为:第一数组中的每个索引值对应的字向量;
在生成上述第一数组后,便可以确定该第一数组中的每个索引值对应的字向量,从而,在得到上述第一数组中全部索引值对应的字向量后,便可以基于所确定的各个字向量生成上述第一数组对应的矩阵,从而得到目标文本的初始矩阵。
其中,所得到的目标文本的初始矩阵的行数为:第一数组中包括的索引值的数量;所得到的目标文本的初始矩阵的列数为:所确定的每个索引值对应的字向量所包括的元素的数量。也就是说,目标文本的初始矩阵的每一行可以为上述所确定的一个字向量,从而,目标文本的初始矩阵的每一行可以对应上述第一数组中的一个索引值,进而,目标文本的初始矩阵的每一行可以对应目标文本中的一个字。
其中,由于上述第一数组中包括目标文本中每个字的索引值,而所得到的目标文本的初始矩阵中包括第一数组中每个索引值对应的字向量,因此,可以认为,所得到的目标文本的初始矩阵中包括目标文本中每个字对应的字向量。进一步的,目标文本的初始矩阵中包括的字向量的数量与上述第一数组中包括的索引值的数量相同,即目标文本的初始矩阵中包括的字向量的数量与目标文 本中包括的字的数量相同。
步骤23:构建关于第一分类标签的第二数组;
其中,第二数组中的各元素为:预设的各个分类标签的取值,第一分类标签为各个分类标签中的一个标签,第一分类标签的取值为第一设定值,除第一分类标签以为的各个其他分类标签的取值为第二设定值。
可以理解的,在实际场景中,用户可以通过文本表达多种用户意图,例如,询问天气、查询歌曲、查询地名等。因此,根据实际场景的需求,可以预设各个分类标签,并且,上述所确定的目标文本的第一分类标签即为所预设的各个分类标签中的一个标签。
这样,在确定出目标文本的第一分类标签后,便可以确定所预设的各个分类标签的取值。其中,第一分类标签的取值为第一设定值,除第一分类标签以外的各个其他分类标签的取值为第二设定值。从而,所得到的第二数组即为:由一个第一设定值和至少一个第二设定值构成的一维数组,并且,第二数组所包括的元素的数量为上述预设的各个分类标签的总量。
根据一个或多个实施例,上述第一设定值可以为1,且上述第二设定值为0。
基于此,每个分类标签与第二数组中的一个元素存在对应关系。这样,在得到目标文本的第一分类标签后,便可以将第二数组中与该第一分类标签对应的元素设定为第一设定值,将第二数组中除与该第一分类标签对应的元素以外的各个其他元素设定为第二设定值。进而,在第二数组中各个元素设定完成后,得到关于第一分类标签的第二数组。
步骤24:利用第二数组对初始矩阵进行扩展,生成目标表征矩阵。
在得到上述第二数组和上述目标文本的初始矩阵后,便可以利用该第二数组对上述初始矩阵进行扩展,从而,生成目标表征矩阵。
根据一个或多个实施例,上述步骤24可以包括如下步骤241A-242A:
步骤241A:确定第二数组所表征的虚拟字的索引值对应的字向量;
步骤242A:将所确定的字向量添加至初始矩阵中的第二指定位置处,得到目标表征矩阵;
其中,第二指定位置处包括:初始矩阵中的第一个元素之前,或者,初始矩阵中的最后一个元素之后。
需要说明的是,针对上述所得到的目标文本的初始矩阵,该初始矩阵的行数为:第一数组中包括的索引值的数量;该初始矩阵的列数为:所确定的每个索引值对应的字向量所包括的元素的数量。
也就是说,目标文本的初始矩阵的每一行可以为上述所确定的一个字向量,因此,上述初始矩阵中的第一个元素之前为:初始矩阵中的第一行之前;上述初始矩阵中的最后一个元素之后可以为:初始矩阵中的最后一行之后。
根据一个或多个实施例,可以将上述步骤242A中所确定的字向量作为所得到的目标表征矩阵中的第一行,并在目标表征矩阵中,将初始矩阵中的各个行向下移动一行,从而,目标表征矩阵比初始矩阵的行数多一行;
或者,可以将上述步骤242A中所确定的字向量作为所得到的目标表征矩阵中的最后一行,并在目标表征矩阵中,初始矩阵中的各个行的行数保持不变,从而,目标表征矩阵比初始矩阵的行数多一行。
例如,初始矩阵包括20个字向量,且每个字向量可以为一维数组,一维数组包括的元素的数量可以为128,则可以将第二数组所表征的虚拟字的索引值对应的字向量添加到该初始矩阵的第一行之前或者添加到该初始矩阵的最后一行之后,从而,所得到的目标表征矩阵包括21个一维数组,且每个一维数组中包括的元素的数量为128。也就是说,初始矩阵为一个20行,128列的矩阵,目标表征矩阵为一个21行,128列的矩阵。
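步骤241A-242A的行扩展方式可用如下代码片段示意(仅为假设性实现;虚拟字的字向量同样可按前文示意的嵌入表查得,repeat 参数对应下文中在指定位置处重复添加多次的做法):

```python
import numpy as np

def extend_rows(initial_matrix, virtual_char_vector, prepend=True, repeat=1):
    """将第二数组所表征的虚拟字的字向量添加到初始矩阵的第二指定位置处。

    initial_matrix: [X, Y] 的目标文本初始矩阵,每行为一个字向量。
    virtual_char_vector: 长度为 Y 的一维字向量。
    repeat: 重复添加的次数,repeat > 1 时对应提高标签内容权重的做法。
    """
    rows = np.tile(virtual_char_vector, (repeat, 1))
    if prepend:
        return np.vstack([rows, initial_matrix])   # 添加到第一行之前
    return np.vstack([initial_matrix, rows])       # 添加到最后一行之后

# 例如:20 行 128 列的初始矩阵扩展后为 21 行 128 列的目标表征矩阵
init = np.zeros((20, 128), dtype=np.float32)
target = extend_rows(init, np.ones(128, dtype=np.float32))
assert target.shape == (21, 128)
```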
根据一个或多个实施例,为了提高表征用户意图的标签内容在目标表征矩阵中的权重,以进一步提高所得到的目标文本的命名实体识别结果的准确性,在将第二数组所表征的虚拟字的索引值对应的字向量添加到该初始矩阵的指定位置时,可以在该指定位置处重复添加多次。
基于此,当上述第二指定位置处为初始矩阵中的第一个元素之前时,可以将初始矩阵中的各个行向下移动T(T>1)行,从而,将第二数组所表征的虚拟字的索引值对应的字向量作为该初始矩阵的第一行之前的每一行中的内容;这样,所得到的目标表征矩阵比初始矩阵的行数多T行,且该目标表征矩阵中第1行至第T行是相同的,均为第二数组所表征的虚拟字的索引值对应的字向量,也就是说,目标表征矩阵中可以存在T个第二数组所表征的虚拟字的索引值对应的字向量。
例如,假设T=3,初始矩阵包括20个字向量,且每个字向量可以为一维数组,一维数组包括的元素的数量可以为128,则可以将第二数组所表征的虚拟字的索引值对应的字向量添加到该初始矩阵的第一行之前,并重复添加3行。从而,所得到的目标表征矩阵包括23个一维数组,且每个一维数组中包括的元素的数量为128。也就是说,初始矩阵为一个20行,128列的矩阵,目标表征矩阵为一个23行,128列的矩阵,并且目标表征矩阵的第1行、第2行和第3行是相同的,均为第二数组所表征的虚拟字的索引值对应的字向量。
此外,当上述第二指定位置处为初始矩阵中的最后一行之后时,可以在初始矩阵中的最后一行之后,重复添加T(T>1)个第二数组所表征的虚拟字的索引值对应的字向量;这样,所得到的目标表征矩阵比初始矩阵的行数多T行,且该目标表征矩阵中的最后T行是相同的,均为第二数组所表征的虚拟字的索引值对应的字向量,也就是说,目标表征矩阵中可以存在T个第二数组所表征的虚拟字的索引值对应的字向量。
例如,假设T=3,初始矩阵包括20个字向量,且每个字向量可以为一维数组,一维数组包括的元素的数量可以为128,则可以将第二数组所表征的虚拟字的索引值对应的字向量添加到该初始矩阵的最后一行之后,并重复添加3行。从而,所得到的目标表征矩阵包括23个一维数组,且每个一维数组中包括的元素的数量为128。也就是说,初始矩阵为一个20行,128列的矩阵,目标表征矩阵为一个23行,128列的矩阵,并且目标表征矩阵的第21行、第22行和第23行是相同的,均为第二数组所表征的虚拟字的索引值对应的字向量。
根据一个或多个实施例,在得到上述关于第一分类标签的第二数组后,可以根据预设的数组与虚拟字的对应关系,确定上述第二数组所表征的虚拟字,从而,可以进一步确定上述第二数组所表征的虚拟字的索引值。进而,便可以确定上述第二数组所表征的虚拟字的索引值对应的字向量。
其中,上述第二数组所表征的虚拟字的索引值对应的字向量中包括的元素的数量与上述目标文本的初始矩阵中的字向量中包括的元素的数量相同。
基于此,便可以将第二数组所表征的虚拟字的索引值对应的字向量添加至目标文本的初始矩阵中的第二指定位置处,从而,实现对目标文本的初始矩阵的扩展,并且,扩展后的初始矩阵即为目标表征矩阵。
其中,可以将第二数组所表征的虚拟字的索引值对应的字向量添加到初始矩阵中的第一个元素之前,也可以将第二数组所表征的虚拟字的索引值对应的字向量添加到初始矩阵中的最后一个元素之后;当然,还可以将第二数组所表征的虚拟字的索引值对应的字向量添加至初始矩阵中的其他指定位置处。
这样,相比于初始矩阵,所得到的目标表征矩阵中的字向量的个数增加了至少一个,其新增的至少一个字向量均为第二数组所表征的虚拟字的索引值对应的字向量。
根据一个或多个实施例,上述步骤24可以包括如下步骤241B:
步骤241B:将第二数组添加至初始矩阵中的第三指定位置处,得到目标表征矩阵;
其中,第三指定位置包括:表示字向量的一维数组中的第一个元素之前,或者,表示字向量的一维数组中的最后一个元素之后。
需要说明的是,针对上述所得到的目标文本的初始矩阵,该初始矩阵中包括多个字向量,并且,每个字向量通过一个一维数组表示,而表示每个字向量的每个一维数组中可以包括多个元素。进一步的,该初始矩阵的行数为:第一数组中包括的索引值的数量;该初始矩阵的列数为:所确定的每个索引值对应的字向量所包括的元素的数量,则初始矩阵的每一行可以为上述所确定的一个表示字向量的一维数组。
因此,上述表示字向量的一维数组中的第一个元素之前为:该初始矩阵中的每一行中位于第一列的元素之前,即假设第二数组中包括Q(Q>0)个元素时,则将上述第二数组中所包括的Q个元素作为所得到的目标表征矩阵中的每一行中位于第一列至第Q列的元素,并在目标表征矩阵中,将初始矩阵中的各个列向右移动Q列,从而,目标表征矩阵比初始矩阵多Q列;
上述表示字向量的一维数组中的最后一个元素之后为:该初始矩阵中每一行中位于最后一列的元素之后,即假设初始矩阵的列数为R(R>0),且第二数组中包括Q个元素时,则将上述第二数组中所包括的Q个元素作为所得到的目标表征矩阵中的每一行中位于第R+1列至第R+Q列的元素,并在目标表征矩阵中,初始矩阵中的各个列的列数保持不变,从而,目标表征矩阵比初始矩阵多Q列。
根据一个或多个实施例,在得到上述关于第一分类标签的第二数组后,可以直接利用第二数组对目标文本的初始矩阵进行扩展。也就是说,可以直接将第二数组添加至初始矩阵中的第三指定位置处,得到目标表征矩阵。
其中,可以将第二数组添加到初始矩阵中表示字向量的一维数组中的第一个元素之前,即在初始矩阵中每一行中位于第一列的元素之前添加第二数组,也可以将第二数组添加到初始矩阵中表示字向量的一维数组中的最后一个元素之后,即在初始矩阵中每一行中位于最后一列的元素之后添加第二数组;当然,还可以将第二数组添加至初始矩阵中的其他指定位置处。
其中,可以理解的,不同的第一分类标签对应于不同的第二数组,因此,可以通过不同的第二数组表征不同的第一分类标签。进而,目标文本的初始矩阵所包括的各个字向量为目标文本中每个字对应的字向量;这样,当在初始矩阵中的每个表示字向量的一维数组中添加第二数组时,便可以将目标文本所对应的用户意图添加至目标文本中,从而,实现目标文本的文本内容和表征用户意图的标签内容的融合。
这样,由于第二数组所包括的元素的数量为上述预设的各个分类标签的总量,因此,相比于初始矩阵,所得到的目标表征矩阵中的每个表示字向量的一维数组中所包括的元素的数量增加了上述预设的分类标签的总量,该新增的各个元素即为上述第二数组中的元素。
例如,初始矩阵包括20个表示字向量的一维数组,且每个表示字向量的一维数组中包括的元素的数量为128,预设的分类标签的总量为10,则所得到的目标表征矩阵包括20个表示字向量的一维数组,且每个表示字向量的一维数组中包括的元素的数量为138。
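步骤241B的列扩展方式可用如下代码片段示意(仅为假设性实现,即把第二数组拼接到每个表示字向量的一维数组的指定位置处):

```python
import numpy as np

def extend_columns(initial_matrix, second_array, prepend=False):
    """将第二数组添加到初始矩阵中每个表示字向量的一维数组的指定位置处。

    initial_matrix: [X, R] 的初始矩阵。
    second_array: 包括 Q 个元素的第二数组(预设的各个分类标签的取值)。
    返回: [X, R+Q] 的目标表征矩阵。
    """
    rows = initial_matrix.shape[0]
    expanded = np.tile(np.asarray(second_array, dtype=initial_matrix.dtype),
                       (rows, 1))
    if prepend:
        return np.hstack([expanded, initial_matrix])  # 第一个元素之前
    return np.hstack([initial_matrix, expanded])      # 最后一个元素之后

# 例如:20 行 128 列的初始矩阵、10 个分类标签时,得到 20 行 138 列的目标表征矩阵
init = np.zeros((20, 128), dtype=np.float32)
target = extend_columns(init, [0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
assert target.shape == (20, 138)
```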
根据一个或多个实施例,上述步骤S104,基于目标表征矩阵确定目标文本中每个字的命名实体识别NER标签,得到目标文本的命名实体识别结果的步骤,可以包括如下步骤31-34:
步骤31:基于目标表征矩阵确定目标文本的字特征矩阵;
其中,字特征矩阵中包括:目标文本中每个字在目标文本的正向顺序中的字向量,以及目标文本中每个字在目标文本的反向顺序中的字向量;
在得到目标表征矩阵后,便可以基于目标表征矩阵确定目标文本的字特征矩阵。
其中,首先根据目标表征矩阵的构建方式,从目标表征矩阵中分析得到用于表征目标文本中包括的每个字的内容和表征目标文本的第一分类标签的内容。从而,便可以根据目标文本中每个字的正向顺序和反向顺序,以及目标文本中每个字与其他字之间的上下文语义关系,利用分析得到的表征目标文本中包括的每个字的内容,确定目标文本中每个字在目标文本的正向顺序中的字向量,以及目标文本中每个字在目标文本的反向顺序中的字向量,从而,得到目标文本的字特征矩阵。
步骤32:基于字特征矩阵,确定目标文本中的标签矩阵;
其中,标签矩阵用于表征:目标文本中每个字具有各个NER标签的概率;
进而,在得到目标文本的字特征矩阵后,便可以根据从目标表征矩阵中分析得到的表征目标文本的第一分类标签的内容,以及所确定的目标文本的字特征矩阵,确定目标文本中的每个字所可能具有的NER标签,以及该字具有每个NER标签的可能性,从而,得到用于表征目标文本中每个字具有每个NER标签的概率的标签矩阵。
其中,可以理解的,在确定目标文本的标签矩阵时,利用了从目标表征矩阵中分析得到的表征目标文本中每个字的内容和表征目标文本的第一分类标签的内容之间的关联关系,也就是说,所确定的目标文本的标签矩阵所表征的目标文本中每个字具有每个NER标签的概率,是在考虑了用户意图的维度下所确定的。
步骤33:基于标签矩阵,确定目标文本中每个字的NER标签索引;
进而,在得到目标文本的标签矩阵后,便可以基于该标签矩阵,确定目标文本中每个字最终的NER标签,并进一步确定该最终的NER标签的NER标签索引,从而,可以确定目标文本中每个字的NER标签索引。
步骤34:将目标文本中每个字的NER标签索引转换为目标文本中每个字的NER标签,得到目标文本的命名实体识别结果。
由于在上述步骤33中所确定的目标文本中每个字的NER标签索引的可读性较差,用户不能直观地得到目标文本中每个字的NER标签,因此,可以进一步将目标文本中每个字的NER标签索引转换为目标文本中每个字的NER标签,从而,提高可读性,使得用户最终得到目标文本的命名实体识别结果。
根据一个或多个实施例的命名实体识别方法可以通过预先训练的命名实体识别模型实现。即根据一个或多个实施例,上述步骤S103(利用目标文本和第一分类标签构建目标表征矩阵)以及上述步骤S104(基于目标表征矩阵确定目标文本中每个字的NER标签)是通过上述命名实体识别模型实现的。
根据一个或多个实施例,如图4所示,上述命名实体识别模型包括:串联相接的输入层、融合层、字嵌入层、双向LSTM(Long Short-Term Memory,长短期记忆网络)层、全连接层、CRF(conditional random field,条件随机场)层和输出层;
输入层、融合层和字嵌入层用于实现利用目标文本和第一分类标签构建目标表征矩阵的步骤;双向LSTM层、全连接层、CRF层和输出层用于实现基于目标表征矩阵确定目标文本中每个字的NER标签的步骤;
其中,输入层,用于根据目标文本生成关于该目标文本的第一数组;
融合层,用于构建关于第一分类标签的第二数组,确定第二数组所表征的虚拟字的索引值作为目标索引值,将目标索引值添加到第一数组中的第一指定位置处,得到融合数组;
字嵌入层,用于生成融合数组对应的矩阵作为目标表征矩阵;
双向LSTM层,用于基于目标表征矩阵确定目标文本的字特征矩阵;
全连接层,用于基于字特征矩阵,确定目标文本的标签矩阵,该标签矩阵用于表征目标文本中每个字具有各个NER标签的概率;
CRF层,用于基于标签矩阵,确定目标文本中每个字的NER标签索引;
输出层,用于将目标文本中每个字的NER标签索引转换为目标文本中每个字的NER标签,得到目标文本的命名实体识别结果。
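作为参考,图4所示的网络结构可以用如下代码片段示意性地搭建(仅为一种假设性实现:此处以 PyTorch 为例,并假设安装了第三方的 pytorch-crf 包以提供CRF层;vocab_size、hidden_unit、output_dim 等取值均为假设,输入层与融合层生成融合数组的逻辑见前文示意代码):

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # 假设:使用第三方 pytorch-crf 包提供的 CRF 层

class BiLstmCrfNer(nn.Module):
    """图4所示命名实体识别模型的示意实现:
    字嵌入层 -> 双向LSTM层 -> 全连接层 -> CRF层。"""

    def __init__(self, vocab_size=5000, embed_dim=128,
                 hidden_unit=256, output_dim=12):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)           # 字嵌入层
        self.bilstm = nn.LSTM(embed_dim, hidden_unit, batch_first=True,
                              bidirectional=True)                      # 双向LSTM层
        self.fc = nn.Linear(2 * hidden_unit, output_dim)               # 全连接层
        self.crf = CRF(output_dim, batch_first=True)                   # CRF层

    def forward(self, fused_array):
        # fused_array: [batch, 1+X] 的索引值张量(含虚拟字的索引值)
        emb = self.embedding(fused_array)      # [batch, 1+X, 128] 目标表征矩阵
        feats, _ = self.bilstm(emb)            # [batch, 1+X, 2*HIDDENUNIT] 字特征
        emissions = self.fc(feats)             # [batch, 1+X, OUTPUTDIM] 标签矩阵
        return self.crf.decode(emissions)      # 每个字的NER标签索引序列
```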
根据一个或多个实施例,上述预先训练的命名实体识别模型即为:预先训练的基于双向LSTM+CRF的NER模型。具体的:
针对输入层:
在得到上述目标文本和第一分类标签后,便可以将该目标文本和第一分类标签输入到上述预先训练的命名实体识别模型中。
进而,在上述命名实体识别模型中,输入层中预先设定了每个字的索引值,其中,每个字的索引值的格式可以为one-hot(独热码)格式。
这样,输入层在接收到目标文本后,可以首先从目标文本的第一个字开始,依次确定目标文本中的每个字的one-hot格式的索引值。进而,输入层便可以生成关于目标文本的第一数组。
其中,该第一数组中包括上述目标文本中每个字的索引值,且所包括的索引值的数量与上述目标文本中包括的字的数量相同。
此外,根据一个或多个实施例,由于用户讲话的时候,通常一句话不会超过70个字,因此,可以设置目标文本的最大长度为70。这样,当获取到的待进行命名实体识别的目标文本中包括的字的数量超过70时,丢弃第70个字之后的各个字。从而,获取到的待进行命名实体识别的目标文本中包括的字的数量超过70时,所得到的第一数组中包括70个索引值。
当然,上述目标文本的最大长度也可以设置为其他具体数值,对此,本发明实施例不做具体限定。
根据一个或多个实施例,所生成的第一数组中的每个索引值为整数数值。
进而,在生成上述第一数组后,输入层便可以将该第一数组作为输出,从而,将该第一数组输入到融合层中。
针对融合层:
其中,在将上述目标文本和第一分类标签输入到上述预先训练的命名实体识别模型中时,可以将上述目标文本和第一分类标签输入到上述输入层中。这样,由于输入层不对第一分类标签进行处理,因此,输入层还可以将第一分类标签作为输出,从而,将第一分类标签输入到融合层。此外,也可以将上述目标文本和第一分类标签分别输入到上述输入层和融合层中。
这样,融合层在得到上述第一数组和第一分类标签后,便可以首先构建关于该第一分类标签的第二数组。
例如,可以将上述第一设定值设为1,将上述第二设定值设为0,且分类标签总数为10,第一分类标签为10个分类标签中的第5个分类标签,则可以得到第二数组:[0,0,0,0,1,0,0,0,0,0]。
其中,可以确定上述第二数组所表征的虚拟字,进而,确定该虚拟字的one-hot格式的索引值,从而,得到关于第一分类标签的第二数组所表征的虚拟字的索引值,即得到目标索引值。
进而,融合层便可以将上述目标索引值添加到第一数组中的第一指定位置处,得到融合数组。其中,所得到的融合数组仍然为一维数组,且融合数组所包括的索引值的数量为:目标文本所包括的字的数量和第二数组所表征的虚拟字的数量的和值。通常,第二数组所表征的虚拟字的数量为1。
其中,需要说明的是,上述第一数组包括目标文本中的各个字的索引值,上述目标索引值为:关于第一分类标签的第二数组所表征的虚拟字的索引值,因此,上述所得到的融合数组包括:目标文本中的各个字的索引值和第一分类标签所表征的虚拟字的索引值。
基于此,本实施例所提供的融合方式相当于:将第一分类标签作为虚拟字,从而,将目标文本扩展为“虚拟字+目标文本”,进而,对该扩展后的“虚拟字+目标文本”进行索引值转换,得到上述融合数组。
接着,融合层便可以将上述所得到的融合数组作为输出,从而,将上述融合数组输入到上述字嵌入层。
针对字嵌入层:
所谓字嵌入是指用一个包括多个元素的一维数组表示每个字,其中,每个元素为一个数字,例如,利用包括128个元素的一维数组表示每个字,即利用包括128个数字的一维数组表示每个字。
这样,由于每个字的索引值对应一字向量,因此,字嵌入层可以确定所得到的融合数组中的各个索引值对应的字向量,从而,基于所确定的各个字向量,生成目标表征矩阵。其中,所确定的每个索引值对应的字向量中所包括的元素的数量均为预设数量。
接着,字嵌入层便可以将上述所得到的目标表征矩阵作为输出,从而,将上述目标表征矩阵输入到上述双向LSTM层。
针对双向LSTM层:
LSTM层是一个神经网络模型,其在处理文本时,会考虑目标文本中的每一个字。例如,LSTM层处理文本“我要听刘德华的忘情水”时,所得到的最后一个词是“忘情水”,并且在“忘情水”之前,还得到了“我要听”和“刘德华”两个词,从而,LSTM层在对“忘情水”进行词槽识别时,考虑了“我要听”和“刘德华”的因素,从而,结合文本中的上下文,识别出“忘情水”可能是一首歌名。
如果采用单向LSTM,可能会丢失文本中关于字、词顺序的信息,例如,无法区分“我爱你”和“你爱我”。因此,根据一个或多个实施例,采用双向LSTM层,从而可以将正向和反向两个方向的识别结果结合,获得文本中每个字、词的顺序关系。
根据一个或多个实施例,双向LSTM层的输入为上述字嵌入层得到的目标表征矩阵。
根据一个或多个实施例,目标表征矩阵可以表示为[1+X,Y]。其中,X表示目标文本中字的数量,Y表示目标表征矩阵中每个字向量中所包括的元素的数量。
进而,双向LSTM层的输出即为目标文本的字特征矩阵。其中,当目标表征矩阵表示为[1+X,Y]时,双向LSTM层所输出的目标文本的字特征矩阵可以表示为:[2*(1+X),HIDDENUNIT]。
也就是说,双向LSTM层输出的目标文本的字特征矩阵中包括2*(1+X)个一维数组,每个一维数组所包括的元素的数量为HIDDENUNIT。
由于双向LSTM层需要进行正向和反向两个方向的识别,因此,所得到的目标文本的字特征矩阵中包括的一维数组的数量为目标表征矩阵中包括的一维数组的数量的两倍,并且,目标文本的字特征矩阵中包括的每个一维数组均为一个设定长度的一维数组,即每个一维数组中均包括预设数量个元素。
其中,目标文本的字特征矩阵中包括的每个一维数组为一个字向量,从而,目标文本的字特征矩阵中包括:目标文本中每个字在目标文本的正向顺序中的字向量,以及目标文本中每个字在目标文本的反向顺序中的字向量。
进而,在得到上述目标文本的字特征矩阵后,双向LSTM层便可以将该字特征矩阵作为输出,从而,将该字特征矩阵输入到全连接层。
针对全连接层:
全连接层包括维度变换和特征提取两个作用,这样,全连接层在得到上述目标文本的字特征矩阵后,便可以基于该字特征矩阵,确定目标文本的标签矩阵。
以双向LSTM输出的目标文本的字特征矩阵[2*(1+X),HIDDENUNIT]为例,经过全连接层后,该字特征矩阵[2*(1+X),HIDDENUNIT]可以转换为目标文本的标签矩阵,且该标签矩阵可以表示为[(1+X),OUTPUTDIM]。
上述目标文本的标签矩阵[(1+X),OUTPUTDIM]中包括(1+X)个一维向量,每个一维向量可以表示:关于第一分类标签的第二数组所表征的虚拟字的标签向量和目标文本中每个字的标签向量。
其中,关于第一分类标签的第二数组所表征的虚拟字的NER标签为“XX”,例如,可以为上述表1中的“O”,即关于第一分类标签的第二数组所表征的虚拟字为非词槽。相应的,目标文本中每个字的标签向量可以表示:该字对应OUTPUTDIM个概率值,OUTPUTDIM为该字可能具有的NER标签的数目;其中,每个概率值的大小代表这个字属于该概率值所对应的NER标签的可能性,概率值越大,则说明该字具有该概率值所对应的NER标签的概率越大。
基于此,全连接层可以基于目标文本的字特征矩阵,确定目标文本的标签矩阵,该标签矩阵可以用于表征:目标文本中每个字具有各个NER标签的概率。也就是说,根据目标文本的标签矩阵,可以确定目标文本中每个字具有各个NER标签的概率。
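上述维度变换可以理解为:先把正向与反向两组字特征按字对齐拼接,再做线性映射。以下代码片段给出对该维度关系的一种假设性解读(各维度取值仅为示意):

```python
import torch
import torch.nn as nn

X_PLUS_1, HIDDENUNIT, OUTPUTDIM = 21, 256, 12

# 字特征矩阵 [2*(1+X), HIDDENUNIT]:假设前一半为正向字向量,后一半为反向字向量
char_features = torch.randn(2 * X_PLUS_1, HIDDENUNIT)

# 维度变换:按字对齐拼接正向与反向特征,得到 [(1+X), 2*HIDDENUNIT]
aligned = torch.cat([char_features[:X_PLUS_1], char_features[X_PLUS_1:]], dim=1)

# 特征提取:线性映射到NER标签维度,得到标签矩阵 [(1+X), OUTPUTDIM]
label_matrix = nn.Linear(2 * HIDDENUNIT, OUTPUTDIM)(aligned)
assert label_matrix.shape == (X_PLUS_1, OUTPUTDIM)
```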
进一步的,全连接层便可以将上述目标文本的标签矩阵作为输出,从而,将该目标文本的标签矩阵输入到上述CRF层。
针对CRF层:
CRF层可以理解为维特比解码层(Viterbi decode),进而,CRF层在接收到目标文本的标签矩阵后,便可以利用预设的转移矩阵,计算该标签矩阵的每条链路的和值,并得到和值最大的链路,从而,得到可能性最大的路径。这样,便可以确定目标文本中的每个字的NER标签,从而,得到目标文本中每个字的NER标签索引。
进而,在得到上述目标文本中每个字的NER标签索引后,CRF层便可以将目标文本中每个字的NER标签索引作为输出,从而,将目标文本中每个字的NER标签索引输入到输出层。
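CRF层中利用转移矩阵计算和值最大链路的过程,即经典的维特比解码,可用如下纯numpy代码片段示意(仅为示意性实现,转移矩阵在实际中由训练得到):

```python
import numpy as np

def viterbi_decode(label_matrix, transition):
    """利用预设的转移矩阵,在标签矩阵上寻找和值最大的链路。

    label_matrix: [L, OUTPUTDIM],每个字具有各个NER标签的分值。
    transition:  [OUTPUTDIM, OUTPUTDIM],标签间的转移分值。
    返回: 长度为 L 的NER标签索引序列。
    """
    length, num_tags = label_matrix.shape
    score = label_matrix[0].copy()
    backpointers = []
    for t in range(1, length):
        # 每条链路的和值 = 上一步和值 + 转移分值 + 当前标签分值
        total = score[:, None] + transition + label_matrix[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for bp in reversed(backpointers):   # 自后向前回溯和值最大的链路
        best.append(int(bp[best[-1]]))
    return best[::-1]
```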
针对输出层:
由于CRF的输出为目标文本中每个字的NER标签索引,因此,对于用户而言,其可读性较差,这样,输出层便可以将目标文本中每个字的NER标签索引转换成该字的NER标签,其中,目标文本中每个字的NER标签可以是通过NER标签字符串表示的,从而,可以提高可读性。例如,每个字的NER标签可以通过上述表2形式表示。
根据一个或多个实施例,本发明实施例提供的一种命名实体识别方法可以通过预先训练的命名实体识别模型实现。即根据一个或多个实施例,上述步骤S103(利用目标文本和第一分类标签构建目标表征矩阵)以及上述步骤S104(基于目标表征矩阵确定目标文本中每个字的NER标签)是通过上述命名实体识别模型实现的。
根据一个或多个实施例,如图5所示,上述命名实体识别模型包括:串联相接的输入层、字嵌入层、融合层、双向LSTM层、全连接层、CRF层和输出层;
输入层、字嵌入层和融合层用于实现利用目标文本和第一分类标签构建目标表征矩阵的步骤;双向LSTM层、全连接层、CRF层和输出层用于实现基于目标表征矩阵确定目标文本中每个字的NER标签的步骤;
其中,输入层,用于根据目标文本生成关于该目标文本的第一数组;
字嵌入层,用于生成第一数组对应的矩阵作为目标文本的初始矩阵;
融合层,用于构建关于第一分类标签的第二数组,利用第二数组对初始矩阵进行扩展,生成目标表征矩阵;
双向LSTM层,用于基于目标表征矩阵确定目标文本的字特征矩阵;
全连接层,用于基于字特征矩阵,确定目标文本的标签矩阵,该标签矩阵用于表征目标文本中每个字具有各个NER标签的概率;
CRF层,用于基于标签矩阵,确定目标文本中每个字的NER标签索引;
输出层,用于将目标文本中每个字的NER标签索引转换为目标文本中每个字的NER标签,得到目标文本的命名实体识别结果。
根据一个或多个实施例,上述预先训练的命名实体识别模型即为:预先训练的基于双向LSTM+CRF的NER模型。具体的:
针对输入层:
在得到上述目标文本和第一分类标签后,便可以将该目标文本和第一分类标签输入到上述预先训练的命名实体识别模型中。
进而,在上述命名实体识别模型中,输入层中预先设定了每个字的索引值,其中,每个字的索引值的格式可以为one-hot格式。
这样,输入层在接收到目标文本后,可以首先从目标文本的第一个字开始,依次确定目标文本中的每个字的one-hot格式的索引值。进而,输入层便可以生成关于目标文本的第一数组。
其中,该第一数组中包括上述目标文本中每个字的索引值,且所包括的索引值的数量与上述目标文本中包括的字的数量相同。
此外,根据一个或多个实施例,由于用户讲话的时候,通常一句话不会超过70个字,因此,可以设置目标文本的最大长度为70。这样,当获取到的待进行命名实体识别的目标文本中包括的字的数量超过70时,丢弃第70个字之后的各个字。从而,获取到的待进行命名实体识别的目标文本中包括的字的数量超过70时,所得到的第一数组中包括70个索引值。
当然,上述目标文本的最大长度也可以设置为其他具体数值,对此,本发明实施例不做具体限定。
根据一个或多个实施例,所生成的第一数组中的每个索引值为整数数值。
进而,在生成上述第一数组后,输入层便可以将该第一数组作为输出,从而,将该第一数组输入到上述字嵌入层。
针对字嵌入层:
所谓字嵌入是指用一个包括多个元素的一维数组表示每个字,其中,每个元素为一个数字,例如,利用包括128个元素的一维数组表示每个字,即利用包括128个数字的一维数组表示每个字。
这样,由于每个字的索引值对应一字向量,因此,字嵌入层可以确定所得到的第一数组中的各个索引值对应的字向量,从而,基于所确定的各个字向量,生成目标文本的初始矩阵。其中,所确定的每个索引值对应的字向量中所包括的元素的数量均为预设数量。
进而,在生成上述初始矩阵后,字嵌入层便可以将该目标文本的初始矩阵作为输出,从而,将该目标文本的初始矩阵输入到上述融合层。
针对融合层:
其中,在将上述目标文本和第一分类标签输入到上述预先训练的命名实体识别模型中时,可以将上述目标文本和第一分类标签输入到上述输入层中。这样,由于输入层不对第一分类标签进行处理,因此,输入层还可以将第一分类标签作为输出,从而,将第一分类标签输入到上述字嵌入层;进而,由于上述字嵌入层也不对第一分类标签进行处理,上述字嵌入层还可以将第一分类标签作为输出,从而,将第一分类标签输入到上述融合层。此外,也可以将上述目标文本和第一分类标签分别输入到上述输入层和融合层中。
这样,融合层在得到上述初始矩阵和第一分类标签后,便可以首先构建关于该第一分类标签的第二数组。
例如,可以将上述第一设定值设为1,将上述第二设定值设为0,且分类标签总数为10,第一分类标签为10个分类标签中的第5个分类标签,则可以得到第二数组:[0,0,0,0,1,0,0,0,0,0]。
进而,便可以利用该第二数组对目标文本的初始矩阵进行扩展,从而,得到目标表征矩阵。
根据一个或多个实施例,融合层可以执行上述步骤241A-242A。
其中,可以确定上述第二数组所表征的虚拟字,进而,确定该虚拟字的one-hot格式的索引值,从而,得到关于第一分类标签的第二数组所表征的虚拟字的索引值。进而,便可以确定上述第二数组所表征的虚拟字的索引值对应的字向量。
基于此,便可以将第二数组所表征的虚拟字的索引值对应的字向量添加至目标文本的初始矩阵中的第二指定位置处,从而,实现对目标文本的初始矩阵的扩展,并且,扩展后的初始矩阵即为目标表征矩阵。
其中,需要说明的是,上述目标文本的初始矩阵包括目标文本中的每个字的索引值对应的字向量,因此,上述目标表征矩阵包括:目标文本中的每个字的索引值对应的字向量与第一分类标签所表征的虚拟字的索引值对应的字向量。从而,上述目标表征矩阵包括:目标文本中的每个字对应的字向量与第一分类标签所表征的虚拟字对应的字向量。
基于此,本实施例所提供的融合方式相当于:将第一分类标签作为虚拟字,从而,将目标文本扩展为“虚拟字+目标文本”,进而,对该扩展后的“虚拟字+目标文本”进行字嵌入转换,得到上述目标表征矩阵,并且,该“虚拟字”的NER标签为“XX”,例如,可以为上述表1中的“O”,即关于第一分类标签的第二数组所表征的虚拟字为非词槽。
根据一个或多个实施例,融合层可以执行上述步骤241B。
在得到上述关于第一分类标签的第二数组后,可以直接利用第二数组对目标文本的初始矩阵进行扩展。也就是说,可以直接将第二数组添加至初始矩阵中的第三指定位置处,得到目标表征矩阵。
其中,需要说明的是,上述目标文本的初始矩阵包括目标文本中的每个字的索引值对应的字向量,也就是说,上述目标文本的初始矩阵包括目标文本中的每个字对应的字向量,因此,当在目标文本的初始矩阵中的每个表示字向量的一维数组添加第二数组时,便可以将目标文本所对应的用户意图添加至目标文本中。
基于此,本实施例所提供的融合方式相当于:将第一分类标签作为目标文本的各个字对应的字向量中的多个元素,从而,将目标文本的各个字对应的字向量进行扩展。
针对双向LSTM层:
LSTM层是一个神经网络模型,其在处理文本时,会考虑目标文本中的每一个字。例如,LSTM层处理文本“我要听刘德华的忘情水”时,所得到的最后一个词是“忘情水”,并且在“忘情水”之前,还得到了“我要听”和“刘德华”两个词,从而,LSTM层在对“忘情水”进行词槽识别时,考虑了“我要听”和“刘德华”的因素,从而,结合文本中的上下文,识别出“忘情水”可能是一首歌名。
如果采用单向LSTM,可能会丢失文本中关于字、词顺序的信息,例如,无法区分“我爱你”和“你爱我”。因此,根据一个或多个实施例,采用双向LSTM层,从而可以将正向和反向两个方向的识别结果结合,获得文本中每个字、词的顺序关系。
根据一个或多个实施例,双向LSTM层的输入为上述字嵌入层得到的目标表征矩阵。
根据一个或多个实施例,目标表征矩阵可以表示为[1+X,Y]。其中,X表示目标文本中字的数量,Y表示目标表征矩阵中每个字向量中所包括的元素的数量。
进而,双向LSTM层的输出即为目标文本的字特征矩阵。
根据一个或多个实施例,当目标表征矩阵表示为[1+X,Y]时,双向LSTM层所输出的目标文本的字特征矩阵可以表示为:[2*(1+X),HIDDENUNIT]。
也就是说,双向LSTM层输出的目标文本的字特征矩阵中包括2*(1+X)个一维数组,每个一维数组中所包括的元素的数量为HIDDENUNIT。
根据一个或多个实施例,当目标表征矩阵表示为[X,Y+CLASS]时,其中,X表示目标文本中字的数量,Y表示目标文本的初始矩阵中的每个字向量所包括的元素的数量,CLASS表示预设的分类标签的总量,则双向LSTM层的输出即为目标文本的字特征矩阵,可以表示为:[2*X,HIDDENUNIT’]。
也就是说,双向LSTM层输出的矩阵中包括2*X个一维数组,每个一维数组中所包括的元素的数量为HIDDENUNIT’。
由于双向LSTM层需要进行正向和反向两个方向的识别,因此,所得到的目标文本的字特征矩阵中包括的一维数组的数量为目标表征矩阵中包括的一维数组的数量的两倍,并且,目标文本的字特征矩阵中包括的每个一维数组均为一个设定长度的一维数组,即每个一维数组中均包括预设数量个元素。
其中,目标文本的字特征矩阵中包括的每个一维数组为一个字向量,从而,目标文本的字特征矩阵中包括:目标文本中每个字在目标文本的正向顺序中的字向量,以及目标文本中每个字在目标文本的反向顺序中的字向量。
进而,在得到上述目标文本的字特征矩阵后,双向LSTM层便可以将该字特征矩阵作为输出,从而,将该字特征矩阵输入到全连接层。
针对全连接层:
全连接层包括维度变换和特征提取两个作用,这样,全连接层在得到上述目标文本的字特征矩阵后,便可以基于该字特征矩阵,确定目标文本的标签矩阵。
以双向LSTM输出的目标文本的字特征矩阵[2*(1+X),HIDDENUNIT]为例,经过全连接层后,该字特征矩阵[2*(1+X),HIDDENUNIT]可以转换为目标文本的标签矩阵,且该标签矩阵可以表示为[(1+X),OUTPUTDIM]。
上述目标文本的标签矩阵[(1+X),OUTPUTDIM]中包括(1+X)个一维向量,每个一维向量可以表示:关于第一分类标签的第二数组所表征的虚拟字的标签向量和目标文本中每个字的标签向量。
其中,关于第一分类标签的第二数组所表征的虚拟字的NER标签为“XX”,例如,可以为上述表1中的“O”,即关于第一分类标签的第二数组所表征的虚拟字为非词槽。相应的,目标文本中每个字的标签向量可以表示:该字对应OUTPUTDIM个概率值,OUTPUTDIM为该字可能具有的NER标签的数目;其中,每个概率值的大小代表这个字属于该概率值所对应的NER标签的可能性,概率值越大,则说明该字具有该概率值所对应的NER标签的概率越大。
基于此,全连接层可以基于目标文本的字特征矩阵,确定目标文本的标签矩阵,该标签矩阵可以用于表征:目标文本中每个字具有各个NER标签的概率。也就是说,根据目标文本的标签矩阵,可以确定目标文本中每个字具有各个NER标签的概率。
进一步的,全连接层便可以将上述目标文本的标签矩阵作为输出,从而,将该目标文本的标签矩阵输入到上述CRF层。
针对CRF层:
CRF层可以理解为维特比解码层(Viterbi decode),进而,CRF层在接收到目标文本的标签矩阵后,便可以利用预设的转移矩阵,计算该标签矩阵的每条链路的和值,并得到和值最大的链路,从而,得到可能性最大的路径。这样,便可以确定目标文本中的每个字的NER标签,从而,得到目标文本中每个字的NER标签索引。
进而,在得到上述目标文本中每个字的NER标签索引后,CRF层便可以将目标文本中每个字的NER标签索引作为输出,从而,将目标文本中每个字的NER标签索引输入到输出层。
针对输出层:
由于CRF的输出为目标文本中每个字的NER标签索引,因此,对于用户而言,其可读性较差,这样,输出层便可以将目标文本中每个字的NER标签索引转换成该字的NER标签,其中,目标文本中每个字的NER标签可以是通过NER标签字符串表示的,从而,可以提高可读性。例如,每个字的NER标签可以通过上述表2形式表示。
可以理解的,当通过预先训练的命名实体识别模型实现根据本发明一个或多个实施例的命名实体识别方法时,需要预先训练得到该命名实体识别模型。
根据一个或多个实施例,如图3所示,上述命名实体识别模型的训练方式,包括:
S301:获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值;
其中,第二分类标签用于表征样本文本所对应的用户意图;
S302:将样本文本和样本文本的第二分类标签输入命名实体识别模型,以使命名实体识别模型利用样本文本和第二分类标签,构建样本表征矩阵,并利用样本表征矩阵预测样本文本中每个字的NER标签预测值;
S303:基于样本文本中每个字的NER标签真值和NER标签预测值,判断命名实体识别模型是否收敛,如果是,执行步骤S304;否则,执行S305,并返回上述步骤S301;
S304:结束训练,得到训练完成的命名实体识别模型;
S305:调整命名实体识别模型中的模型参数,返回获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值的步骤。
其中,该命名实体识别模型可以是通过任一类型的电子设备训练得到的,例如,笔记本电脑、台式电脑、平板电脑等,对此,本发明实施例不做具体限定,以下简称训练设备。其中,训练设备与上述执行命名实体识别方法的电子设备可以是同一设备,也可以是不同的设备。当训练设备和该电子设备是同一设备时,即可以在同一设备中训练得到上述命名实体识别模型,进而,在该设备上,利用所得到的命名实体识别模型实现本发明实施例提供的一种命名实体识别方法;当上述训练设备和该电子设备不是同一设备时,训练设备在训练得到上述命名实体识别模型后,可以将所得到的命名实体识别模型发送给该电子设备。这样,该电子设备在得到命名实体识别模型后,便可以利用所得到的命名实体识别模型实现本发明实施例提供的一种命名实体识别方法。
其中,训练设备可以首先获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值,进而,便可以基于所获取的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值,对命名实体识别模型进行训练,得到训练完成的命名实体识别模型。
其中,样本文本可以是句子,也可以是由多个词语组成的词组或者短语,这都是合理的;另外,样本文本的第二分类标签可以是利用预设的意图分类模型对样本文本进行意图分类所确定的。
可以通过多种方式获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值。例如,可以直接获取保存在本地存储空间中的待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值;也可以从其他非本地的存储空间中获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值。这都是合理的。
此外,在本发明实施例中,为了保证训练得到的命名实体识别模型的准确率,在命名实体识别模型的训练过程中,可以利用大量的样本文本、每个样本文本的第二分类标签和每个样本文本中每个字的NER标签真值。因此,训练设备可以获取多组待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值。
其中,训练样本的数量可以根据实际应用中的需求进行设定,本发明中不做具体限定。且样本文本的类型可以仅仅包括句子、短语或者词组,也可以包括句子、短语和词组中的至少两类。这都是合理的。
在获取到上述待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值后,便可以将这些样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值输入到命名实体识别模型中。
其中,命名实体识别模型包括预处理子网络和命名实体识别子网络。这样,在获取到上述样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值后,上述预处理子网络便可以利用样本文本和第二分类标签,构建样本表征矩阵。进而,上述命名实体识别子网络便可以利用样本表征矩阵预测样本文本中每个字的NER标签预测值。
其中,预处理子网络利用样本文本和第二分类标签构建样本表征矩阵的方式,与上述利用目标文本和第一分类标签构建目标表征矩阵的方式类似,在此不再赘述。
在得到样本文本中每个字的NER标签预测值后,便可以基于样本文本中每个字的NER标签真值和NER标签预测值,判断命名实体识别模型是否收敛。
其中,如果判断得到命名实体识别模型收敛,则说明命名实体识别模型已经训练完成,则可以停止训练,得到训练完成的命名实体识别模型。
如果判断出命名实体识别模型未收敛,则说明命名实体识别模型还未训练完成,需要继续训练。此时,可以调整命名实体识别模型中的模型参数,从而,再次获取待利用的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值,并利用所获取的样本文本、样本文本的第二分类标签和样本文本中每个字的NER标签真值,对参数调整后的命名实体识别模型进行继续训练,直至判断出命名实体识别模型收敛,得到训练完成的命名实体识别模型。
根据一个或多个实施例,可以基于样本文本中每个字的NER标签真值和NER标签预测值之间的匹配度,判断命名实体识别模型是否收敛。
例如,当样本文本中每个字的NER标签真值和NER标签预测值之间的匹配度大于预设匹配度时,可以判定命名实体识别模型收敛;否则,判定命名实体识别模型未收敛。
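图3所示的训练流程可以用如下代码片段概括(沿用前文示意代码中假设的 BiLstmCrfNer 模型与 pytorch-crf 的对数似然损失;收敛判据以标签匹配度为例,epochs、match_threshold 等阈值均为假设取值):

```python
import torch

def train_ner_model(model, data_loader, epochs=50, match_threshold=0.98):
    """命名实体识别模型的示意训练循环。

    data_loader 假设按批产出 (融合数组, 每个字的NER标签真值) 对。
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        matched, total = 0, 0
        for fused_array, gold_tags in data_loader:
            emissions = model.fc(model.bilstm(model.embedding(fused_array))[0])
            loss = -model.crf(emissions, gold_tags)   # CRF 对数似然作为损失
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                          # 调整模型参数
            # 统计NER标签预测值与真值的匹配度,作为收敛判据
            for pred, gold in zip(model.crf.decode(emissions), gold_tags.tolist()):
                matched += sum(p == g for p, g in zip(pred, gold))
                total += len(gold)
        if total and matched / total > match_threshold:
            break  # 判定为收敛,结束训练
    return model
```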
下面,对预先训练的用于得到上述第一分类标签和第二分类标签的意图分类模型进行举例说明。
根据一个或多个实施例,上述意图分类模型可以为CNN分类模型。
其中,如图6所示,该CNN分类模型可以包括:输入层、字嵌入层、卷积层、池化层、融合层、全连接层和输出层。
其中,在得到上述目标文本后,便可以将该目标文本输入到上述CNN分类模型中。
进而,在上述CNN分类模型中,输入层中预先设定了每个字的索引值,其中,每个字的索引值的格式可以为one-hot格式。
这样,输入层在接收到目标文本后,可以首先从目标文本的第一个字开始,依次确定目标文本中的每个字的one-hot格式的索引值。进而,输入层便可以生成关于目标文本的第一数组。
其中,该第一数组中包括上述目标文本中每个字的索引值,且所包括的索引值的数量与上述目标文本中包括的字的数量相同。
此外,根据一个或多个实施例,由于用户讲话的时候,通常一句话不会超过70个字,因此,可以设置目标文本的最大长度为70。这样,当获取到的目标文本中包括的字的数量超过70时,丢弃第70个字之后的各个字。从而,获取到的目标文本中包括的字的数量超过70时,所得到的第一数组中包括70个索引值。
当然,上述目标文本的最大长度也可以设置为其他具体数值,对此,本发明实施例不做具体限定。
根据一个或多个实施例,所生成的第一数组中的每个索引值为整数数值。
进而,在生成上述第一数组后,输入层便可以将该第一数组作为输出,从而,将该第一数组输入到字嵌入层中。
所谓字嵌入是指用一个包括多个元素的一维数组表示每个字,其中,每个元素为一个数字,例如,利用包括128个元素的一维数组表示每个字,即利用包括128个数字的一维数组表示每个字。
这样,由于每个字的索引值对应一字向量,因此,字嵌入层可以确定所得到的第一数组中的各个索引值对应的字向量,从而,基于所确定的各个字向量,生成目标矩阵。其中,所确定的每个索引值对应的字向量中所包括的元素的数量均为预设数量。
进而,字嵌入层便可以将所生成的目标矩阵作为输出,并将该目标矩阵输入到卷积层中。
卷积层的作用是放大并提取目标文本中的某些特征,从而,输出一个关于目标文本的特征的特征矩阵。该特征矩阵的大小与卷积层的卷积核有关。
其中,卷积核可以表示为[K,Length],其中,K表示使用K字长的特征提取,即把目标文本中连续的K个字作为感兴趣的特征,从而,能够把目标文本中的连续的K个字进行整体处理。其中,当该连续的K个字是词语或短语时,可以将该连续的K个字作为一个整体考虑;当该连续的K个字是单字时,需要考虑该连续的K个字中,各个字的前后关系。Length表示K字长的卷积核的数量。
根据一个或多个实施例,卷积层中可以包括多个卷积核,从而,可以针对每个卷积核,得到一个特征矩阵。
池化层的目的是忽略卷积核提取出来的特征中的不重要的特性,只保留最重要的特征。
其中,池化层可以采用“下采样”方法,所谓“下采样”方式是针对卷积层输出的各个矩阵,找到每个矩阵中的最大值,从而,用该最大值代替该矩阵。
并且,根据一个或多个实施例,由于可以存在多个卷积层,因此,可以存在多个池化层,即每个卷积层后相接一个池化层,从而,池化层的输出即为相接的卷积层输出的矩阵中的最大值。
融合层用于将多个池化层的输出进行组合,得到一个新的一维数组。
全连接层的输入为融合层输出的一维数组。其用于将该一维数组中的各个数字转换为预设的分类标签总数个概率值,其中,所转换得到的概率值可以为浮点数值。并且,每个概率值的大小代表目标文本对应于每个分类标签的可能性。其中,概率值越大,目标文本对应该概率值代表的分类标签的可能性越大。
通常,所得到的各个概率值的数值较大,因此,可以对所得到的各个概率值进行归一化,使归一化后的各个概率值的和值为1。
输出层接收全连接层输出的各个概率值,即接收一个包括分类标签总数个数字的一维数组。其中,该一维数组中每个数字的下标表示一个分类标签的分类号,并且,输出层可以将该分类标签的分类号转换为用户可识别的分类标签,即转换为用户可以识别的意图。
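图6所示的CNN意图分类模型可以用如下代码片段示意性地搭建(仍以 PyTorch 为例,卷积核尺寸 kernel_sizes、卷积核数量 length 等取值均为假设):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CnnIntentClassifier(nn.Module):
    """图6所示意图分类模型的示意实现:
    字嵌入层 -> 多个卷积层 -> 池化层 -> 融合层 -> 全连接层 -> 归一化输出。"""

    def __init__(self, vocab_size=5000, embed_dim=128,
                 kernel_sizes=(2, 3, 4), length=64, num_labels=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 每个卷积层使用 K 字长的卷积核,共 Length 个
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, length, kernel_size=k) for k in kernel_sizes)
        self.fc = nn.Linear(length * len(kernel_sizes), num_labels)

    def forward(self, first_array):
        # first_array: [batch, X] 的索引值张量(即第一数组)
        emb = self.embedding(first_array).transpose(1, 2)   # [batch, 128, X]
        pooled = [F.relu(conv(emb)).max(dim=2).values       # “下采样”:取最大值
                  for conv in self.convs]
        fused = torch.cat(pooled, dim=1)                    # 融合层:组合各池化输出
        return F.softmax(self.fc(fused), dim=1)             # 各分类标签的归一化概率
```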
相应于上述本发明实施例提供的一种命名实体识别方法,本发明实施例还提供了一种电子设备,如图7所示,包括处理器701、通信接口702、存储器703和通信总线704,其中,处理器701,通信接口702,存储器703通过通信总线704完成相互间的通信,
存储器703,用于存放计算机程序;
处理器701,用于执行存储器703上所存放的程序时,实现上述本发明实施例提供的任一命名实体识别方法的步骤。
上述电子设备提到的通信总线可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
通信接口用于上述电子设备与其他设备之间的通信。
存储器可以包括随机存取存储器(Random Access Memory,RAM),也可以包括非易失性存储器(Non-Volatile Memory,NVM),例如至少一个磁盘存储器。根据一个或多个实施例,存储器还可以是至少一个位于远离前述处理器的存储装置。
上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
根据本发明的另一方面,还提供了一种计算机可读存储介质,该计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现上述任一命名实体识别方法的步骤。
根据本发明的另一方面,还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述实施例中任一命名实体识别方法的步骤。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中的各个实施例均采用相关的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
尽管已经针对有限数量的实施例描述了本发明,但是受益于本公开的本领域技术人员将理解,可以设计其他实施例而不脱离本文所公开的本发明的范围。 因此,本发明的范围应仅由所附权利要求书限制。

Claims (18)

  1. 一种命名实体识别方法,所述方法包括:
    获取待进行命名实体识别的目标文本;
    确定所述目标文本的第一分类标签,其中,所述第一分类标签用于表征所述目标文本所对应的用户意图;
    利用所述目标文本和所述第一分类标签构建目标表征矩阵;
    基于所述目标表征矩阵确定所述目标文本中每个字的命名实体识别NER标签,得到所述目标文本的命名实体识别结果。
  2. 根据权利要求1所述的方法,其中,所述利用所述目标文本和所述第一分类标签构建目标表征矩阵的步骤,包括:
    生成关于所述目标文本和所述第一分类标签的融合数组,其中,所述融合数组中的各元素为:所述目标文本中的每个字的索引值,以及所述第一分类标签所表征的虚拟字的索引值;
    生成所述融合数组对应的矩阵作为目标表征矩阵,其中,所述目标表征矩阵中的各元素为:所述融合数组中的每个索引值对应的字向量。
  3. 根据权利要求2所述的方法,其中,所述生成关于所述目标文本和所述第一分类标签的融合数组的步骤,包括:
    生成关于所述目标文本的第一数组,其中,所述第一数组中的各元素为:所述目标文本中每个字的索引值;
    构建关于所述第一分类标签的第二数组,确定所述第二数组所表征的虚拟字的索引值作为目标索引值,其中,所述第二数组中的各元素为:预设的各个分类标签的取值,所述第一分类标签为各个所述分类标签中的一个标签,所述第一分类标签的取值为第一设定值,除所述第一分类标签以外的各个其他分类标签的取值为第二设定值;
    将所述目标索引值添加到所述第一数组中的第一指定位置处,得到融合数组,其中,所述第一指定位置处包括:所述第一数组中的第一个元素之前,或者,所述第一数组中的最后一个元素之后。
  4. 根据权利要求1所述的方法,其中,所述利用所述目标文本和所述第一分类标签构建目标表征矩阵的步骤,包括:
    生成关于所述目标文本的第一数组,其中,所述第一数组中的各元素为:所述目标文本中每个字的索引值;
    生成所述第一数组对应的矩阵作为所述目标文本的初始矩阵,其中,所述初始矩阵中的各元素为:所述第一数组中的每个索引值对应的字向量;
    构建关于所述第一分类标签的第二数组,其中,所述第二数组中的各元素为:预设的各个分类标签的取值,所述第一分类标签为所述各个分类标签中的一个标签,所述第一分类标签的取值为第一设定值,除所述第一分类标签以外的各个其他分类标签的取值为第二设定值;
    利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵。
  5. 根据权利要求4所述的方法,其中,所述利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵的步骤,包括:
    确定所述第二数组所表征的虚拟字的索引值对应的字向量;
    将所确定的字向量添加至所述初始矩阵中的第二指定位置处,得到目标表征矩阵,其中,所述第二指定位置处包括:所述初始矩阵中的第一个元素之前,或者,所述初始矩阵中的最后一个元素之后。
  6. 根据权利要求4所述的方法,其中,所述利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵的步骤,包括:
    将所述第二数组添加至所述初始矩阵中的第三指定位置处,得到目标表征矩阵,其中,所述第三指定位置包括:表示字向量的一维数组中的第一个元素之前,或者,表示字向量的一维数组中的最后一个元素之后。
  7. 根据权利要求3或4所述的方法,其中,所述基于所述目标表征矩阵确定所述目标文本中每个字的命名实体识别NER标签,得到所述目标文本的命名实体识别结果的步骤,包括:
    基于所述目标表征矩阵确定所述目标文本的字特征矩阵,其中,所述字特征矩阵中包括:所述目标文本中每个字在所述目标文本的正向顺序中的字向量,以及所述目标文本中每个字在所述目标文本的反向顺序中的字向量;
    基于所述字特征矩阵,确定所述目标文本的标签矩阵,其中,所述标签矩阵用于表征:所述目标文本中每个字具有各个NER标签的概率;
    基于所述标签矩阵,确定所述目标文本中每个字的NER标签索引;
    将所述目标文本中每个字的NER标签索引转换为所述目标文本中每个字的NER标签,得到所述目标文本的命名实体识别结果。
  8. 根据权利要求3所述的方法,其中,所述利用所述目标文本和所述第一分类标签构建目标表征矩阵的步骤,以及所述基于所述目标表征矩阵确定所述目标文本中每个字的NER标签的步骤是通过预先训练的命名实体识别模型实现的;
    所述命名实体识别模型包括:串联相接的输入层、融合层、字嵌入层、双向LSTM层、全连接层、CRF层和输出层,其中,所述输入层、所述融合层和所述字嵌入层用于实现所述利用所述目标文本和所述第一分类标签构建目标表征矩阵的步骤,所述双向LSTM层、所述全连接层、所述CRF层和所述输出层用于实现所述基于所述目标表征矩阵确定所述目标文本中每个字的NER标签的步骤;
    其中,所述输入层,用于生成关于所述目标文本的第一数组;
    所述融合层,用于构建关于所述第一分类标签的第二数组,确定所述第二数组所表征的虚拟字的索引值作为目标索引值,将所述目标索引值添加到所述第一数组中的第一指定位置处,得到融合数组;
    所述字嵌入层,用于生成所述融合数组对应的矩阵作为目标表征矩阵;
    所述双向LSTM层,用于基于所述目标表征矩阵确定所述目标文本的字特征矩阵;
    所述全连接层,用于基于所述字特征矩阵,确定所述目标文本的标签矩阵;
    所述CRF层,用于基于所述标签矩阵,确定所述目标文本中每个字的NER标签索引;
    所述输出层,用于将所述目标文本中每个字的NER标签索引转换为所述目标文本中每个字的NER标签,得到所述目标文本的命名实体识别结果。
  9. 根据权利要求4所述的方法,其中,所述利用所述目标文本和所述第一分类标签,构建目标表征矩阵的步骤,以及所述基于所述目标表征矩阵,确定所述目标文本中每个字的NER标签的步骤是通过预先训练的命名实体识别模型实现的;
    所述命名实体识别模型包括:串联相接的输入层、字嵌入层、融合层、双向LSTM层、全连接层、CRF层和输出层,其中,所述输入层、所字嵌入层和所述融合层用于实现所述利用所述目标文本和所述第一分类标签构建目标表征矩阵的步骤,所述双向LSTM层、所述全连接层、所述CRF层和所述输出层用于实现所述基于所述目标表征矩阵确定所述目标文本中每个字的NER标签的步骤;
    其中,所述输入层,用于生成关于所述目标文本的第一数组;
    所述字嵌入层,用于生成所述第一数组对应的矩阵作为所述目标文本的初始矩阵;
    所述融合层,用于构建关于所述第一分类标签的第二数组,利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵;
    所述双向LSTM层,用于基于所述目标表征矩阵确定所述目标文本的字特征矩阵;
    所述全连接层,用于基于所述字特征矩阵,确定所述目标文本的标签矩阵;
    所述CRF层,用于基于所述标签矩阵,确定所述目标文本中每个字的NER标签索引;
    所述输出层,用于将所述目标文本中每个字的NER标签索引转换为所述目标文本中每个字的NER标签,得到所述目标文本的命名实体识别结果。
  10. 根据权利要求8或9所述的方法,其中,所述命名实体识别模型的训练方式,包括:
    获取待利用的样本文本、所述样本文本的第二分类标签和所述样本文本中每个字的NER标签真值,其中,所述第二分类标签用于表征所述样本文本所对应的用户意图;
    将所述样本文本和所述样本文本的第二分类标签输入所述命名实体识别模型,以使所述命名实体识别模型利用所述样本文本和所述第二分类标签,构建样本表征矩阵,并利用所述样本表征矩阵预测所述样本文本中每个字的NER标签预测值;
    基于所述样本文本中每个字的NER标签真值和NER标签预测值,判断所述命名实体识别模型是否收敛,如果是,结束训练,得到训练完成的所述命名实体识别模型;否则,调整所述命名实体识别模型中的模型参数,返回所述获取待利用的样本文本、所述样本文本的第二分类标签和所述样本文本中每个字的NER标签真值的步骤。
  11. 根据权利要求3或4所述的方法,其中,
    所述生成关于所述目标文本的第一数组的步骤包括:
    当所获取的目标文本中所包括的字的数量超过预设的最大长度时,丢弃所获取的目标文本中超过所述预设的最大长度的字,利用所保留的各个字生成所述第一数组。
  12. 根据权利要求3或4所述的方法,其中,
    所述第一设定值为1,并且所述第二设定值为0。
  13. 根据权利要求5所述的方法,其中,
    在所述第二指定位置处多次添加所确定的所述虚拟字的索引值对应的字向量。
  14. 根据权利要求8或9所述的方法,其中,
    所生成的第一数组中的每个索引值为整数数值。
  15. 根据权利要求9所述的方法,其中,
    所述融合层利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵的步骤包括:
    确定所述第二数组所表征的虚拟字的索引值对应的字向量;
    将所确定的字向量添加至所述初始矩阵中的第二指定位置处,得到目标表征矩阵,其中,所述第二指定位置处包括:所述初始矩阵中的第一个元素之前,或者,所述初始矩阵中的最后一个元素之后。
  16. 根据权利要求9所述的方法,其中,
    所述融合层利用所述第二数组对所述初始矩阵进行扩展,生成目标表征矩阵的步骤包括:
    将所述第二数组添加至所述初始矩阵中的第三指定位置处,得到目标表征矩阵,其中,所述第三指定位置包括:表示字向量的一维数组中的第一个元素之前,或者,表示字向量的一维数组中的最后一个元素之后。
  17. 根据权利要求1所述的方法,其中,
    所述确定所述目标文本的第一分类标签的步骤是通过预先训练的意图分类模型实现的。
  18. 根据权利要求8或9所述的方法,其中,
    所述标签矩阵用于表征所述目标文本中每个字具有各个NER标签的概率。