CN111444719B - Entity identification method and device and computing equipment - Google Patents
Entity identification method and device and computing equipment
- Publication number
- CN111444719B, CN202010187932.7A
- Authority
- CN
- China
- Prior art keywords
- entity
- character
- input sentence
- name
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Machine Translation (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses an entity identification method, executed in a computing device, comprising the following steps: generating a semantic feature vector of a user input sentence as the current semantic feature vector; performing entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized in this pass, together with the character position and entity type of each entity name; for each entity name: generating an entity position feature according to whether each character belongs to the entity name, and generating an entity type feature according to the entity name's own entity type and its upper-layer entity type; splicing the entity position feature, the entity type feature and the semantic feature vector of the input sentence into a spliced vector; updating the current semantic feature vector to the spliced vector and re-executing the entity identification step until no entity exists in the entity identification result; and summarizing the entity names identified in each pass as the final entity identification result. The invention also discloses a corresponding entity identification device and a corresponding computing device.
Description
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a method, an apparatus, and a computing device for entity identification.
Background
Nested named entity recognition is a major component of the named entity recognition task and one of the most basic and core technologies in many research areas (such as question-answering systems, knowledge graphs and artificial intelligence), and the related recognition methods are widely applied in practice. Because of the complexity of Chinese, nested named entities frequently appear within text. Existing named entity recognition methods can recognize basic named entities with relatively simple structures fairly well, but have difficulty recognizing structurally complex nested named entities completely and accurately, and existing work concentrates on named entity recognition in conventional texts.
Current nested named entity recognition mainly falls into two categories: recognition from fine granularity to coarse granularity and recognition from coarse granularity to fine granularity. Fine-to-coarse nested entity recognition often ignores the constraint that the coarse-grained type places on fine-grained entity types, and during the fine-to-coarse process semantic features must be continually re-learned for each newly recognized fine-grained entity, which requires a large amount of training cost. Coarse-to-fine nested entity recognition mostly adopts a strategy combining deep learning with rule dictionaries, which requires a certain labor cost, and performing entity recognition with dynamic neural networks tends to greatly increase training and prediction time, making the model heavier.
Disclosure of Invention
In view of the foregoing, the present invention provides an entity identification method, apparatus and computing device in an effort to solve, or at least alleviate, the above problems.
According to one aspect of the present invention, there is provided an entity identification method adapted to be performed in a computing device, the method comprising the steps of: semantically encoding a user's input sentence and generating a semantic feature vector for the input sentence as the current semantic feature vector; performing entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized in this pass, together with the character position and entity type of each entity name; for each entity name: generating an entity position feature according to whether each character of the input sentence belongs to the entity name, and generating an entity type feature according to the entity name's own entity type and its upper-layer entity type; splicing the semantic feature vector of the input sentence with the entity position feature and the entity type feature to obtain a spliced vector; updating the current semantic feature vector to the spliced vector and re-executing the entity identification step until no entity exists in the entity identification result; and summarizing the entity names obtained from each entity identification pass as the final entity identification result of the input sentence.
Optionally, in the entity recognition method according to the present invention, the semantic feature vector of the input sentence is [T1, T2, ..., Tm], where m is the maximum character length of the input sentence and Tm is the word vector of the m-th character; the entity position feature is [L1, L2, ..., Lm], where Lm is a character mark indicating whether the m-th character belongs to the corresponding entity name; and the entity type feature is [C1, C2, ..., Cn], where n is the total number of entity types and Cn is the mark of the n-th entity type.
Optionally, in the entity identification method according to the present invention, in the entity position feature, a character belonging to the entity name is marked 1 and otherwise is marked 0; in the entity type feature, the entity name's own entity type and its upper-layer entity type are marked 1 and the other entity types are marked 0.
optionally, in the entity recognition method according to the present invention, the step of stitching the semantic feature vector of the input sentence with the entity location feature and the entity type feature includes: and for each character of the input sentence, splicing the character vector, the character mark and the entity type characteristic of the input sentence to obtain a character vector after each character is spliced, and further obtaining a spliced vector of the input sentence.
Optionally, in the entity recognition method according to the present invention, the step of semantically encoding the input sentence of the user includes: an input sentence is input into the Bert model, and a semantic feature vector for the input sentence is generated.
Optionally, in the entity recognition method according to the present invention, the step of performing entity recognition on the input sentence based on the current semantic feature includes: and inputting the current semantic feature vector into the conditional random field model to obtain an entity identification result of the input sentence.
Optionally, in the entity recognition method according to the present invention, a trained entity recognition model is stored in the computing device, the model comprising: a Bert model layer adapted to output the semantic feature vector of the input sentence; and p layers of conditional random field models adapted to output the entity identification results of the input sentence, where p equals the number of nested entity layers of the input sentence plus one.
Optionally, in the entity recognition method according to the present invention, m = 128, and the character position of the entity name includes a start character position and an end character position of the entity name.
According to another aspect of the present invention, there is provided a nested entity identification apparatus adapted to reside in a computing device, the apparatus comprising: a semantic encoding module adapted to semantically encode a user's input sentence and generate a semantic feature vector for the input sentence as the current semantic feature vector; an entity recognition module adapted to perform entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized in this pass, together with the character position and entity type of each entity name; an entity re-identification module adapted to, for each entity name: generate an entity position feature according to whether each character of the input sentence belongs to the entity name, and generate an entity type feature according to the entity name's own entity type and its upper-layer entity type; splice the semantic feature vector of the input sentence with the entity position feature and the entity type feature to obtain a spliced vector; and update the current semantic feature vector to the spliced vector and restart the entity identification step until no entity exists in the entity identification result; and an entity summarizing module adapted to summarize the entity names obtained from each entity recognition pass as the final entity recognition result of the input sentence.
According to yet another aspect of the present invention, there is provided a computing device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the entity identification method as described above.
According to yet another aspect of the present invention, there is provided a readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, implement the steps of the entity identification method as described above.
According to the technical solution of the present invention, a coarse-grained-to-fine-grained entity identification process is adopted, and during dynamic fine-grained entity identification the semantic features generated from the input sentence are combined with the coarse-grained entity type and position features, which improves the accuracy of nested entity identification and ensures that semantic features are not lost during entity identification. Furthermore, in the dynamic entity identification process, the Bert model is used as the semantic feature generation unit and the CRF algorithm as the nested entity identification unit, which improves the efficiency of nested entity identification.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the contents of the specification, and in order that the above and other objects, features and advantages of the present invention may become more readily apparent, specific embodiments of the present invention are set forth below.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which set forth the various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to fall within the scope of the claimed subject matter. The above, as well as additional objects, features, and advantages of the present disclosure will become more apparent from the following detailed description when read in conjunction with the accompanying drawings. Like reference numerals generally refer to like parts or elements throughout the present disclosure.
FIG. 1 illustrates a block diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow chart of an entity identification method 200 according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of an entity identification method according to another embodiment of the present invention;
FIG. 4A illustrates an example of input and output of a Bert model according to one embodiment of the invention;
FIG. 4B illustrates an example of entity recognition of semantic features of an input sentence based on a CRF model according to one embodiment of the present invention;
FIG. 4C illustrates an example of feature splicing according to one embodiment of the present invention;
FIG. 4D illustrates an example of entity identification of splice features based on a CRF model in accordance with another embodiment of the present invention;
FIGS. 4E and 4F illustrate examples of multi-level nested entity recognition of an input sentence according to one embodiment of the present invention; and
fig. 5 shows a block diagram of an entity recognition apparatus 500 according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 is a block diagram of a computing device 100 according to one embodiment of the invention. In a basic configuration 102, computing device 100 typically includes a system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a first-level cache 110 and a second-level cache 112, a processor core 114, and registers 116. The example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some implementations, the application 122 may be arranged to operate on an operating system with program data 124. The program data 124 includes instructions and in the computing device 100 according to the present invention, the program data 124 includes instructions for performing the entity identification method 200.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 via one or more communication ports 164 over a network communication link.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its data set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio Frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server or a WEB server, or as part of a small-sized portable (or mobile) electronic device, such as a cellular telephone, a personal digital assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, the computing device 100 is configured to perform the entity identification method 200.
Fig. 2 shows a flow diagram of an entity identification method 200 according to one embodiment of the invention. The method 200 is performed in a computing device, such as the computing device 100, to identify multiple levels of nested entities of an input sentence. The entity identification method 200 can be understood in conjunction with the simplified flow chart of fig. 3.
As shown in fig. 2, the method starts at step S210. In step S210, the input sentence of the user is semantically encoded, and a semantic feature vector for the input sentence is generated as the current semantic feature vector.
In general, the input sentence may be input into a Bert model to generate a semantic feature vector for the input sentence. The semantic feature vector of the input sentence comprises a word vector for each character and may be expressed as [T1, T2, ..., Tm], where m is the maximum character length of the input sentence and Tm is the word vector of the m-th character. Preferably, m = 128, that is, the default input/output length of the model is 128 positions; if the sentence has fewer characters, the remaining positions may be represented by zero vectors.
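As an illustration only, this encoding step could be sketched as follows; the use of the HuggingFace transformers library, the bert-base-chinese checkpoint and the function name encode_sentence are assumptions for this sketch rather than the patent's implementation, and the [CLS]/[SEP] tokens added by the tokenizer (which shift character positions slightly) are glossed over here:

```python
# Minimal sketch of step S210 (assumptions: HuggingFace transformers,
# a Chinese BERT checkpoint; names such as encode_sentence are illustrative).
import torch
from transformers import BertTokenizer, BertModel

MAX_LEN = 128  # m = 128, the maximum character length used in the patent

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def encode_sentence(sentence: str) -> torch.Tensor:
    """Return a [MAX_LEN, hidden_size] matrix of per-character word vectors.

    Positions beyond the sentence length are zero vectors, matching the
    zero-padding described above.
    """
    inputs = tokenizer(sentence, max_length=MAX_LEN, padding="max_length",
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state      # [1, MAX_LEN, hidden]
    mask = inputs["attention_mask"].unsqueeze(-1)      # zero out padded positions
    return (hidden * mask).squeeze(0)
```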
Fig. 4A shows an input-output example of the Bert model according to an embodiment of the present invention, in which the input sentence is "BMW 320Li luxury version is a popular car." The corresponding sentence characters Tok 1 to Tok m are generated, and after passing through the semantic conversion layer the semantic features T1 to Tm are output.
Subsequently, in step S220, entity recognition is performed on the input sentence based on the current semantic feature vector, so as to obtain one or more entity names recognized in this pass, together with the character position and own entity type of each entity name.
The character position of each entity name includes the start character position, intermediate character positions and end character position of the entity name. The invention may define a plurality of entity types, and the number of entity types may be set differently for different industries and specific business lines; the invention is not limited in this respect. For example, the automotive field may define 56 entity types, such as "car series", "vehicle model", "version", "brand", "city", "release time", "dealer", "manufacturer", "4S store", and so on.
According to one embodiment, the current semantic feature vector may be input into a conditional random field (CRF) model to obtain the entity recognition result of the input sentence. Fig. 4B illustrates an example of entity recognition on the semantic features of an input sentence based on a CRF model according to one embodiment of the present invention, in which the sentence semantic feature vector [T1, T2, ..., Tm] of Fig. 4A is used as the input of the CRF algorithm to perform the first-layer coarse-grained entity identification. The resulting tag sequence is: [B_s_spec B_m_spec ... O], where "B_s_spec" denotes the start character position of a "vehicle model" entity, "B_m_spec" denotes an intermediate character position of the "vehicle model" entity, the corresponding "B_e_spec" denotes the end character position of the "vehicle model" entity, and "O" denotes that the character at the current position is not within any entity. Parsing this entity sequence yields: the entity "BMW 320Li luxury version", with start character position 0, end character position 9 and entity type "vehicle model". The character position may be represented by the character index, and the marking may also start from 1; the invention is not limited in this respect.
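By way of illustration, a tag sequence of this form can be decoded into (start position, end position, entity type) triples roughly as sketched below; the helper name decode_tags is hypothetical and the B_s_/B_m_/B_e_/O scheme simply mirrors the example above:

```python
# Sketch: decode a CRF tag sequence such as
# ["B_s_spec", "B_m_spec", ..., "B_e_spec", "O", ...] into entities.
def decode_tags(tags):
    """Yield (start, end, entity_type) with inclusive character positions."""
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B_s_"):                # start character of an entity
            start, ent_type = i, tag[len("B_s_"):]
        elif tag.startswith("B_e_") and start is not None:
            yield (start, i, ent_type)            # end character of the entity
            start, ent_type = None, None
        elif tag == "O":                          # outside any entity
            start, ent_type = None, None
        # "B_m_" tags (intermediate characters) leave the state unchanged

# For the sequence of Fig. 4B this yields one triple covering characters 0-9
# with type "spec" (the "vehicle model" type).
```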
Subsequently, in step S230, for each entity name, an entity location feature is generated according to whether each character of the input sentence belongs to the entity name, and an entity type feature is generated according to the own entity type and the upper layer entity type of the entity name.
The entity location feature comprises a character mark indicating whether each character belongs to the corresponding entity name and may be represented as [L1, L2, ..., Lm], where Lm is the character mark indicating whether the m-th character belongs to the corresponding entity name. Typically, a character belonging to the entity name is marked 1 and otherwise is marked 0. If the character length of the sentence is less than m, the character marks of the remaining positions are all 0.
For example, the "BMW 320Li luxury version" is a popular vehicle for the input sentence. The first character position of the entity 'BMW 320Li luxury' identified for the first time is 0, the ending index is 9, so the positions from 0 to 9 are all '1', the other positions are all '0', so the character mark at the ellipses is 0, and the actual position is characterized by [1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 … … ].
It should be appreciated that each entity name has its own corresponding entity location feature. For example, for the entity name "BMW 320Li luxury version", characters 0-9 are marked 1 and the other characters are marked 0. For the entity name "BMW 320", the start character position is 0 and the end character position is 5, so in the corresponding entity location feature the characters at positions 0-5 are marked 1 and the other characters are marked 0.
The entity type feature comprises marks for the entity name's own entity type and its upper-layer entity type and may be expressed as [C1, C2, ..., Cn], where n is the total number of entity types (e.g., n = 56) and Cn is the mark of the n-th entity type. The entity name's own entity type and upper-layer entity type are marked 1, and the other entity types are marked 0. That is, the positions of the own entity type and the upper-layer entity type within the full set of entity types are marked 1, and the other positions are marked 0.
The upper-layer entity type refers to the entity type of the upper-layer entity name to which the entity name belongs. For example, the entity type of "BMW 320Li luxury version" is "vehicle model"; the entity type of the lower-level entity name "BMW 3" identified within it is "car series"; and the entity type of the lower-level entity name "BMW" is "brand". Therefore, in practical application, in the entity type feature generated for the entity name "BMW 3", the two entity types "vehicle model" and "car series" are marked 1 and the other entity types are marked 0, where "car series" is the own entity type of "BMW 3" and "vehicle model" is its upper-layer entity type. In the entity type feature generated for the entity name "BMW", the three entity types "brand", "vehicle model" and "car series" are marked 1 and the other entity types are marked 0, where "brand" is its own entity type, and "vehicle model" and "car series" are its upper-layer entity types.
In one implementation, the upper-layer entity names to which an entity name belongs may refer only to the upper-layer entity names to which it directly belongs, excluding entity names in a parallel relationship with those upper-layer entity names. For example, the upper-layer entity names of "BMW 3" may include "BMW 320Li" and "BMW 320Li luxury version" but will not include "luxury version". In another implementation, the upper-layer entity names to which an entity name belongs may include both the upper-layer entity names in its direct line and the parallel entity names of those upper-layer entity names. The present invention does not limit the actual scope of the upper-layer entity names.
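A minimal sketch of how the entity location feature [L1, ..., Lm] and the multi-hot entity type feature [C1, ..., Cn] described above could be built is given below; the small ENTITY_TYPES list is an illustrative stand-in for the full inventory of n types (e.g., n = 56), and the function names are hypothetical:

```python
# Sketch: build the entity location feature and the entity type feature
# for one recognized entity name (step S230).
MAX_LEN = 128  # m
# Illustrative subset of the n entity types (the patent's example uses n = 56).
ENTITY_TYPES = ["vehicle_model", "car_series", "brand", "version"]

def location_feature(start: int, end: int, max_len: int = MAX_LEN) -> list[int]:
    """1 for characters inside the entity span (inclusive), 0 elsewhere."""
    return [1 if start <= i <= end else 0 for i in range(max_len)]

def type_feature(own_type: str, upper_types: list[str]) -> list[int]:
    """Multi-hot vector: the own type and all upper-layer types are 1, the rest 0."""
    marked = {own_type, *upper_types}
    return [1 if t in marked else 0 for t in ENTITY_TYPES]

# "BMW 3": own type "car_series", upper-layer type "vehicle_model".
loc = location_feature(0, 2)                          # characters 0-2 are marked 1
typ = type_feature("car_series", ["vehicle_model"])   # two type positions are marked 1
```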
After step S230, the entity location feature and the entity type feature corresponding to each entity name are obtained.
Subsequently, in step S240, the semantic feature vector of the input sentence is spliced with the entity location feature and the entity type feature to obtain a spliced vector.
Specifically, for each character of the input sentence, the word vector, the character mark and the entity type feature of that character are spliced to obtain a spliced word vector for each character, and the spliced vector of the input sentence is thereby obtained. Fig. 4C illustrates an example of feature splicing according to one embodiment of the present invention. For the first character, its word vector in the input sentence is T1, its character mark in the entity location feature is L1, and the entity type feature is the multidimensional array TY; splicing the three yields the spliced word vector of the first character. The spliced word vectors of all characters together form the spliced vector of the input sentence.
It should be appreciated that the present invention performs vector splicing for the entity names at each level in order to further determine whether entity names of a new level can be identified from the spliced vector. At each splice, the word vector Ti of each character i is taken from the original semantic feature vector of the input sentence, while the character mark Li differs depending on the entity name being used. For example, when the entity name is "BMW 320Li luxury version", the character marks at positions 0-9 are all 1; when the entity name is "BMW 3", only the character marks at positions 0-2 are 1.
Similarly, the entity type feature differs depending on the entity name being used. For example, when the entity name is "BMW 320Li luxury version", only the entity type "vehicle model" is marked 1; when the entity name is "BMW 3", the three entity types "vehicle model", "car series" and "brand" are all marked 1.
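The per-character splice of step S240 can then be sketched as below, reusing the semantic matrix from the encoding sketch and the two feature vectors from the previous sketch; this is an illustration under those assumptions, not the patent's implementation:

```python
# Sketch: per-character concatenation [T_i ; L_i ; C_1..C_n] (step S240).
import torch

def splice(semantic: torch.Tensor, loc: list[int], typ: list[int]) -> torch.Tensor:
    """semantic: [m, hidden]; loc: length m; typ: length n, shared by every character."""
    m = semantic.shape[0]
    loc_col = torch.tensor(loc, dtype=semantic.dtype).unsqueeze(1)   # [m, 1]
    typ_rows = torch.tensor(typ, dtype=semantic.dtype).repeat(m, 1)  # [m, n]
    return torch.cat([semantic, loc_col, typ_rows], dim=1)           # [m, hidden + 1 + n]
```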
Subsequently, in step S250, the current semantic feature vector is updated to the spliced vector, and step S220 is restarted until no entity exists in the entity recognition result.
Here, after updating the current semantic feature vector, the current semantic feature vector is input into a conditional random field model (CRF model), and the next-layer entity recognition is performed again, so as to obtain the entity recognition result of the input sentence.
For example, after the entity "BMW 320Li luxury version" is identified at the first layer, the semantic feature vector, the entity location feature and the entity type feature of this entity are spliced. The spliced vector is input into the CRF model, and the resulting tag sequence is as shown in Fig. 4D: [B_s_spec B_m_spec B_m_spec ... O]. Parsing this entity sequence yields two entities: the entity "BMW 320Li", with start character position 0, end character position 6 and entity type "vehicle model"; and the entity "luxury version", with start character position 7, end character position 9 and entity type "version".
The entity recognition result is then examined; if entities exist at this layer, the processes of steps S220-S240 are repeated until no entity can be recognized in the entity recognition result of the last layer. Two entities exist in the second-layer entity identification, namely the entity "BMW 320Li" and the entity "luxury version", so the processes of S220-S240 are repeated for each of the two entities to perform dynamic nested entity recognition.
As shown in Fig. 4E, for the entity "BMW 320Li" existing at the second layer, with start character position 0, end character position 6 and entity type "vehicle model", the spliced feature of "BMW 320Li" is used as input to perform third-layer nested entity recognition. The entity recognition result contains the entity "BMW 3", with start character position 0, end character position 2 and entity type "car series", so the spliced feature of "BMW 3" is in turn used as input to perform fourth-layer nested entity recognition. That result contains the entity "BMW", with start character position 0, end character position 1 and entity type "brand", so the spliced feature of "BMW" is used as input to perform fifth-layer nested entity recognition; at this point no entity exists in the entity recognition result, so nested entity recognition along this branch stops.
For the entity "luxury version" existing at the second layer, with start character position 7, end character position 9 and entity type "version", the spliced feature of "luxury version" is used as input to perform third-layer nested entity recognition; no entity exists in the entity recognition result, so nested entity recognition along this branch stops.
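Putting the pieces together, the dynamic recognition of steps S220-S250 can be sketched as a simple work-list loop, reusing the helpers from the sketches above; crf_recognize stands in for the CRF layer(s) of the trained model and, like the rest of this sketch, is an assumption rather than the patent's implementation:

```python
# Sketch of the dynamic nested-recognition loop (steps S220-S260).
def recognize_nested(sentence, encode_sentence, crf_recognize):
    """Return every entity found at every nesting level as (start, end, type).

    crf_recognize(features) is assumed to return a list of
    (start, end, entity_type) triples, or an empty list when the
    recognition result contains no entity.  In the trained model the first
    call sees only the semantic features, later calls see spliced features.
    """
    semantic = encode_sentence(sentence)                     # step S210
    results = []
    pending = [(semantic, [])]                               # (features, upper-layer types)
    while pending:
        features, upper_types = pending.pop()
        for start, end, ent_type in crf_recognize(features): # step S220
            results.append((start, end, ent_type))
            loc = location_feature(start, end)               # step S230
            typ = type_feature(ent_type, upper_types)
            spliced = splice(semantic, loc, typ)             # step S240
            pending.append((spliced, [ent_type, *upper_types]))  # step S250
    return results                                           # step S260: summary
```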
Finally, in step S260, the entity names obtained after each entity recognition are summarized as the final entity recognition result of the input sentence.
Fig. 4F illustrates the final result of the multi-level nested entity recognition of the input sentence of Fig. 4A. First-layer entity recognition result: "BMW 320Li luxury version" is a vehicle-model entity (SPE), start character position 0, end character position 9. Second-layer entity recognition results: "BMW 320Li" is a vehicle-model entity (SPE), start character position 0, end character position 6; "luxury version" is a version entity (STE), start character position 7, end character position 9. Third-layer entity recognition result: "BMW 3" is a car-series entity (SER), start character position 0, end character position 2. Fourth-layer entity recognition result: "BMW" is a brand entity (BRA), start character position 0, end character position 1.
According to one embodiment of the invention, a trained entity recognition model may also be stored in the computing device, the model comprising: a Bert model layer adapted to output the semantic feature vector of the input sentence; and p layers of conditional random field models adapted to output the entity identification results of the input sentence, where p equals the number of nested entity layers of the input sentence plus one. The structure and parameters of the entity recognition model may be set as needed by those skilled in the art, and the invention is not limited in this respect. The following are some example parameters of the model:
Sentence maximum length max_seq_length=128
Training batch size train_batch_size=32
Evaluating batch size eval_batch_size=8
Predicted batch size prediction_batch_size=8
Learning rate learning_rate=5e-5
Training data cycle number num_train_epochs=3.0
Warmup proportion wakeup_report=0.1
Model save frequency save_checkpoints_steps=1000
Evaluation frequency candidates_per_loop=1000
Dropout rate droupout_rate=0.5
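As a hedged illustration only, these parameters and the Bert-plus-CRF structure described above could be organized roughly as follows; the pytorch-crf package, the class name BertCrfTagger and the configuration dictionary are assumptions for this sketch (the keys simply restate the list above), and only one of the p CRF layers is shown:

```python
# Sketch: one Bert + CRF tagging layer of the entity recognition model,
# using the example parameters listed above (assumption: pytorch-crf).
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

CONFIG = {
    "max_seq_length": 128,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "prediction_batch_size": 8,
    "learning_rate": 5e-5,
    "num_train_epochs": 3.0,
    "wakeup_report": 0.1,           # warmup proportion
    "save_checkpoints_steps": 1000,
    "candidates_per_loop": 1000,    # evaluation frequency
    "droupout_rate": 0.5,           # dropout rate
}

class BertCrfTagger(nn.Module):
    """Bert encoder feeding one CRF layer; the patent stacks p such CRF layers."""

    def __init__(self, num_tags: int, extra_dims: int = 0):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        hidden = self.bert.config.hidden_size
        self.dropout = nn.Dropout(CONFIG["droupout_rate"])
        # extra_dims = 1 + n when the layer consumes spliced vectors [T_i ; L_i ; C_1..C_n].
        self.emissions = nn.Linear(hidden + extra_dims, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def decode(self, features, mask):
        """features: [batch, seq, hidden + extra_dims]; mask: [batch, seq] bool."""
        scores = self.emissions(self.dropout(features))
        return self.crf.decode(scores, mask=mask)
```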
In summary, the present invention uses the Bert model to encode the input sentence and generate the semantic feature vector of the sentence. The semantic feature vector is input into the CRF algorithm to obtain the first-layer entity recognition result (i.e., the coarse-grained entity recognition result). Second-layer entity recognition is then performed dynamically based on the first-layer entity result and the semantic feature vector: if an entity exists in the first-layer result, second-layer entity recognition is performed; if an entity exists in the second layer, third-layer entity recognition is performed; and this entity recognition process is repeated dynamically until no entity exists in the recognition result of the last layer.
In this process, the present invention performs entity identification from coarse-grained entities to fine-grained entities. The first layer is the coarsest-grained entity recognition for the current sentence, and relatively finer-grained entity recognition is then performed under each coarse-grained recognition result until no finer-grained entity can be recognized.
In addition, when performing relatively fine-grained entity identification under a coarse-grained entity, the present invention adds the result features of the coarse-grained (i.e., upper-layer) entity identification to constrain the fine-grained identification result and thereby improve its accuracy. The start and end character positions of the coarse-grained entity effectively limit the start and end character positions of the fine-grained result (the start and end positions of a fine-grained entity must lie within the range of the start and end positions of the coarse-grained entity). The current coarse-grained entity type likewise constrains the fine-grained entity type, which improves the accuracy of the fine-grained identification result. For example, within a "vehicle model" entity, the fine-grained entities contain no "city" entity, but may contain entity types such as "brand", "manufacturer", "car series", "model" and "year". Therefore, when identifying nested fine-grained entities under a coarse-grained "vehicle model" entity, adding the type feature of the coarse-grained entity effectively constrains the fine-grained entity types and thus increases the accuracy of the entity recognition result.
Meanwhile, to ensure that the semantic features of the current sentence can be incorporated during dynamic nested entity recognition, the semantic feature vector generated by Bert for the sentence is added each time nested entity recognition is performed at a layer. Except for the first-layer coarse-grained recognition, which uses only the semantic feature vector, each layer takes the semantic feature vector together with the character position feature and entity type feature of the upper-layer entity recognition result as the input for the next layer of fine-grained nested entity recognition. In addition, the CRF algorithm is used to dynamically recognize the nested entities, which greatly reduces model training and prediction cost compared with deep learning algorithms. Because the semantic feature vector produced by Bert pre-training is combined each time the CRF algorithm performs dynamic nested entity recognition, the problem of poor entity recognition results caused by missing semantics does not arise.
Fig. 5 illustrates a block diagram of an entity identification apparatus 500 according to one embodiment of the invention, which apparatus 500 may reside in a computing device, such as computing device 100. As shown in fig. 5, the apparatus 500 includes: a semantic encoding module 510, an entity identification module 520, an entity re-identification module 530, and an entity summarization module 540.
The semantic coding module 510 performs semantic coding on an input sentence of a user, and generates a semantic feature vector for the input sentence as a current semantic feature vector. Specifically, the semantic encoding module 510 inputs an input sentence into the Bert model, generating a semantic feature vector for the input sentence. The semantic coding module 510 may perform a process corresponding to the process described above in step S210, and a detailed description will not be repeated here.
The entity recognition module 520 performs entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized in this pass, together with the character position and own entity type of each entity name. Specifically, the entity recognition module 520 inputs the current semantic feature vector into the conditional random field model to obtain the entity recognition result of the input sentence. The entity recognition module 520 may perform processing corresponding to that described above in step S220, and a detailed description is not repeated here.
For each entity name, the entity re-recognition module 530 generates an entity location feature according to whether each character of the input sentence belongs to the entity name, and generates an entity type feature according to the self entity type and the upper entity type of the entity name. Then, the entity re-recognition module 530 splices the semantic feature vector of the input sentence with the entity location feature and the entity type feature to obtain a spliced vector. Finally, the entity re-recognition module 530 updates the current semantic feature vector to a stitching vector and re-triggers the entity recognition module 520 to begin performing the entity recognition step until no entity exists in the entity recognition result. The entity re-identification module 530 may perform the processes corresponding to the processes described above in steps S230, S240, S250, and will not be described in detail herein.
The entity summarization module 540 summarizes the entity names obtained after each entity recognition as the final entity recognition result of the input sentence. The entity summarizing module 540 may perform a process corresponding to the process described above in step S260, and a detailed description thereof will not be repeated here.
According to the technical solution of the present invention, the problem of low accuracy when nested entities in a vertical domain are identified using only the semantic features of the sentence for fine-grained entity identification is solved, as is the problem of continually accumulating training cost caused by repeatedly using a deep-learning model unit for semantic feature extraction during dynamic nested entity identification.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions of the methods and apparatus of the present invention, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, U-drives, floppy diskettes, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the invention in accordance with instructions in said program code stored in the memory.
By way of example, and not limitation, readable media comprise readable storage media and communication media. The readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with examples of the invention. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the described functions. Thus, a processor with the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of apparatus for carrying out the functions performed by those elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denote different instances of like objects, and are not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.
Claims (8)
1. A method of entity identification adapted to be performed in a computing device, the method comprising the steps of:
carrying out semantic coding on an input sentence of a user, and generating a semantic feature vector aiming at the input sentence as a current semantic feature vector;
performing entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized at the time, and the character position and the entity type of each entity name;
for each entity name:
generating an entity position feature according to whether each character of the input sentence belongs to the entity name, and generating an entity type feature according to the self entity type and the upper entity type of the entity name;
for each character of the input sentence, splicing the character vector, the character mark and the entity type characteristic of the character to obtain a character vector after each character is spliced, and further obtaining a spliced vector of the input sentence;
updating the current semantic feature vector to the spliced vector, and restarting the entity identification step until no entity exists in the entity identification result; summarizing the entity names obtained after entity identification each time, and taking the entity names as the final entity identification result of the input sentence;
wherein the semantic feature vector of the input sentence is [T1, T2, ..., Tm], m is the maximum character length of the input sentence, Tm is the word vector of the m-th character; the entity position feature is [L1, L2, ..., Lm], Lm is the character mark of whether the m-th character belongs to the corresponding entity name; the entity type feature is [C1, C2, ..., Cn], n is the total number of entity types, Cn is the mark of the n-th entity type;
in the entity position feature, a character belonging to the entity name is marked as 1 and a character not belonging to the entity name is marked as 0; in the entity type feature, the entity name's own entity type and upper-layer entity type are marked as 1 and the other entity types are marked as 0, wherein the upper-layer entity type refers to the entity type of the upper-layer entity name to which the entity name belongs.
2. The method of claim 1, wherein the step of semantically encoding the user's input sentence comprises:
inputting the input sentence into the Bert model, and generating a semantic feature vector for the input sentence.
3. The method of claim 1 or 2, wherein the step of entity recognition of the input sentence based on the current semantic feature comprises:
and inputting the current semantic feature vector into a conditional random field model to obtain an entity identification result of the input sentence.
4. The method of claim 1 or 2, wherein the computing device has stored therein a trained entity recognition model comprising:
the Bert model layer is suitable for outputting semantic feature vectors of input sentences; and
and p layers of conditional random field models adapted to output the entity identification results of the input sentence, wherein p equals the number of nested entity layers of the input sentence plus one.
5. The method of claim 1 or 2, wherein m = 128, the character positions of the entity name include a start character position and an end character position of the entity name.
6. An entity identification apparatus adapted to reside in a computing device, the apparatus comprising:
the semantic coding module is suitable for carrying out semantic coding on an input sentence of a user, and generating a semantic feature vector aiming at the input sentence as a current semantic feature vector;
the entity recognition module is suitable for carrying out entity recognition on the input sentence based on the current semantic feature vector to obtain one or more entity names recognized at the time, and the character position and the entity type of each entity name;
the entity re-identification module is adapted to, for each entity name:
generating an entity position feature according to whether each character of the input sentence belongs to the entity name, and generating an entity type feature according to the self entity type and the upper entity type of the entity name;
For each character of the input sentence, splicing the character vector, the character mark and the entity type characteristic of the character to obtain a character vector after each character is spliced, and further obtaining a spliced vector of the input sentence;
updating the current semantic feature vector to the spliced vector, and restarting the entity identification step until no entity exists in the entity identification result; the entity summarizing module is suitable for summarizing entity names obtained after entity recognition each time and used as a final entity recognition result of the input sentence;
wherein the semantic feature vector of the input sentence is [T1, T2, ..., Tm], m is the maximum character length of the input sentence, Tm is the word vector of the m-th character; the entity position feature is [L1, L2, ..., Lm], Lm is the character mark of whether the m-th character belongs to the corresponding entity name; the entity type feature is [C1, C2, ..., Cn], n is the total number of entity types, Cn is the mark of the n-th entity type;
in the entity position feature, a character belonging to the entity name is marked as 1 and a character not belonging to the entity name is marked as 0; in the entity type feature, the entity name's own entity type and upper-layer entity type are marked as 1 and the other entity types are marked as 0, wherein the upper-layer entity type refers to the entity type of the upper-layer entity name to which the entity name belongs.
7. A computing device, comprising:
a memory;
one or more processors;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-5.
8. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010187932.7A CN111444719B (en) | 2020-03-17 | 2020-03-17 | Entity identification method and device and computing equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010187932.7A CN111444719B (en) | 2020-03-17 | 2020-03-17 | Entity identification method and device and computing equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111444719A CN111444719A (en) | 2020-07-24 |
CN111444719B true CN111444719B (en) | 2023-10-20 |
Family
ID=71629277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010187932.7A Active CN111444719B (en) | 2020-03-17 | 2020-03-17 | Entity identification method and device and computing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111444719B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101023B (en) * | 2020-10-29 | 2022-12-06 | 深圳市欢太科技有限公司 | Text processing method and device and electronic equipment |
CN112257421B (en) * | 2020-12-21 | 2021-04-23 | 完美世界(北京)软件科技发展有限公司 | Nested entity data identification method and device and electronic equipment |
CN112685549B (en) * | 2021-01-08 | 2022-07-29 | 昆明理工大学 | Document-related news element entity identification method and system integrating discourse semantics |
CN113326701A (en) * | 2021-06-17 | 2021-08-31 | 广州华多网络科技有限公司 | Nested entity recognition method and device, computer equipment and storage medium |
US12100387B2 (en) | 2022-04-26 | 2024-09-24 | Walmart Apollo, Llc | Systems and methods to identify products from verbal utterances |
CN117077682B (en) * | 2023-05-06 | 2024-06-07 | 西安公路研究院南京院 | Document analysis method and system based on semantic recognition |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2619193C1 (en) * | 2016-06-17 | 2017-05-12 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Multi stage recognition of the represent essentials in texts on the natural language on the basis of morphological and semantic signs |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN109165384A (en) * | 2018-08-23 | 2019-01-08 | 成都四方伟业软件股份有限公司 | A kind of name entity recognition method and device |
CN110298019A (en) * | 2019-05-20 | 2019-10-01 | 平安科技(深圳)有限公司 | Name entity recognition method, device, equipment and computer readable storage medium |
CN110502738A (en) * | 2018-05-18 | 2019-11-26 | 阿里巴巴集团控股有限公司 | Chinese name entity recognition method, device, equipment and inquiry system |
CN110705302A (en) * | 2019-10-11 | 2020-01-17 | 掌阅科技股份有限公司 | Named entity recognition method, electronic device and computer storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10936557B2 (en) * | 2018-07-26 | 2021-03-02 | International Business Machines Corporation | Relational database schema generation |
-
2020
- 2020-03-17 CN CN202010187932.7A patent/CN111444719B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2619193C1 (en) * | 2016-06-17 | 2017-05-12 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Multi stage recognition of the represent essentials in texts on the natural language on the basis of morphological and semantic signs |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN110502738A (en) * | 2018-05-18 | 2019-11-26 | 阿里巴巴集团控股有限公司 | Chinese name entity recognition method, device, equipment and inquiry system |
CN109165384A (en) * | 2018-08-23 | 2019-01-08 | 成都四方伟业软件股份有限公司 | A kind of name entity recognition method and device |
CN110298019A (en) * | 2019-05-20 | 2019-10-01 | 平安科技(深圳)有限公司 | Name entity recognition method, device, equipment and computer readable storage medium |
CN110705302A (en) * | 2019-10-11 | 2020-01-17 | 掌阅科技股份有限公司 | Named entity recognition method, electronic device and computer storage medium |
Non-Patent Citations (2)
Title |
---|
Peng Xiaonan; Zhou Lanjiang; Zhang Jian'an; Zhou Feng. Named entity recognition of Lao person and place names fusing multiple features. China Water Transport (second half of the month), 2020, (No. 03), full text. *
Wei Xiao; Qin Yongbin; Chen Yanping. A network security named entity recognition method based on component CNN. Computer and Digital Engineering, 2020, (No. 01), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN111444719A (en) | 2020-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444719B (en) | Entity identification method and device and computing equipment | |
US11544474B2 (en) | Generation of text from structured data | |
WO2020224219A1 (en) | Chinese word segmentation method and apparatus, electronic device and readable storage medium | |
CN112329465A (en) | Named entity identification method and device and computer readable storage medium | |
CN111062217B (en) | Language information processing method and device, storage medium and electronic equipment | |
CN111680168A (en) | Text feature semantic extraction method and device, electronic equipment and storage medium | |
CN110866098B (en) | Machine reading method and device based on transformer and lstm and readable storage medium | |
CN111651990B (en) | Entity identification method, computing device and readable storage medium | |
CN112613293B (en) | Digest generation method, digest generation device, electronic equipment and storage medium | |
CN110678882A (en) | Selecting answer spans from electronic documents using machine learning | |
CN112199473A (en) | Multi-turn dialogue method and device in knowledge question-answering system | |
CN113887229A (en) | Address information identification method and device, computer equipment and storage medium | |
CN110941951A (en) | Text similarity calculation method, text similarity calculation device, text similarity calculation medium and electronic equipment | |
CN107844531B (en) | Answer output method and device and computer equipment | |
CN112686053A (en) | Data enhancement method and device, computer equipment and storage medium | |
CN109657127B (en) | Answer obtaining method, device, server and storage medium | |
CN112100328B (en) | Intent judgment method based on multi-round dialogue | |
CN112214994B (en) | Word segmentation method, device and equipment based on multi-level dictionary and readable storage medium | |
CN116680381A (en) | Document retrieval method, device, electronic equipment and storage medium | |
CN116521825A (en) | Method for generating text matching model, computing device and storage medium | |
CN115033683B (en) | Digest generation method, digest generation device, digest generation equipment and storage medium | |
KR20240128104A (en) | Generating output sequences with inline evidence using language model neural networks | |
CN111160033B (en) | Named entity identification method based on neural network, computing equipment and storage medium | |
CN111414483B (en) | Document processing device and method | |
CN117235205A (en) | Named entity recognition method, named entity recognition device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |