CN114281937A - Training method of nested entity recognition model, and nested entity recognition method and device - Google Patents


Info

Publication number
CN114281937A
Authority
CN
China
Prior art keywords: entity, nested, character, sub-entity, probability
Prior art date
Legal status
Pending
Application number
CN202111173085.XA
Other languages
Chinese (zh)
Inventor
谢润泉
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111173085.XA priority Critical patent/CN114281937A/en
Publication of CN114281937A publication Critical patent/CN114281937A/en
Pending legal-status Critical Current

Abstract

The application discloses a training method of a nested entity recognition model, a nested entity recognition method and a nested entity recognition device, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring label information of a plurality of first nested entities; for any first nested entity, determining first prediction information of the first nested entity according to a first network model; training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model; and in response to a first condition being satisfied, treating the second network model as a nested entity recognition model, where the nested entity recognition model is used for recognizing the entity type of a nested entity. The nested entity recognition model can accurately recognize, among the characters of a nested entity, the characters that can serve as starting characters of sub-entities and the characters that can serve as ending characters of sub-entities, so as to combine the sub-entities of the nested entity and thereby accurately recognize the entity type of each sub-entity of the nested entity.

Description

Training method of nested entity recognition model, and nested entity recognition method and device
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method of a nested entity recognition model, a nested entity recognition method and a nested entity recognition device.
Background
Natural language analysis is an important technology covering fields such as information retrieval, information extraction and natural language question answering. Entity recognition, the technology of identifying the entity types of entities, is an important component of natural language analysis.
In the related art, an entity may be a nested entity. At least one entity is nested inside a nested entity, so that the nested entity corresponds to at least one internal entity and one external entity, that is, the nested entity corresponds to at least two sub-entities. For example, for the nested entity "chronic tonsillitis", its internal entities are "tonsils" and "tonsillitis", and the external entity is "chronic tonsillitis", so that the nested entity "chronic tonsillitis" corresponds to the three sub-entities "tonsils", "tonsillitis" and "chronic tonsillitis". Since a nested entity corresponds to at least two sub-entities, it is difficult to identify the entity type of the nested entity; therefore, a nested entity recognition model is urgently needed to accurately identify the entity type of a nested entity.
Disclosure of Invention
The embodiment of the application provides a training method of a nested entity recognition model, a nested entity recognition method and a device, which can be used for accurately recognizing the entity type of a nested entity.
In one aspect, an embodiment of the present application provides a training method for a nested entity recognition model, where the method includes:
acquiring label information of a plurality of first nested entities, wherein the label information of the first nested entities comprises a first label of each character in the first nested entity, a second label of each character in the first nested entity and a third label of each sub-entity of the first nested entity, the first label of the character represents whether the character is a start character of the sub-entity, the second label of the character represents whether the character is an end character of the sub-entity, and the third label of the sub-entity represents an entity type of the sub-entity;
for any first nested entity, determining first prediction information of the any first nested entity according to a first network model, wherein the first prediction information of the any first nested entity comprises a first probability that each character in the any first nested entity is a beginning character of a sub-entity, a first probability that each character in the any first nested entity is an ending character of the sub-entity, and a first entity type probability of each sub-entity of the any first nested entity;
training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model;
in response to a first condition being met, treating the second network model as a nested entity identification model for identifying an entity type of a nested entity.
In another aspect, an embodiment of the present application provides a method for identifying a nested entity, where the method includes:
acquiring a target nested entity;
determining the probability that each character in the target nested entity is the starting character of a sub-entity, the probability that each character in the target nested entity is the ending character of the sub-entity and the entity type probability of each sub-entity of the target nested entity according to a nested entity recognition model, wherein the nested entity recognition model is obtained by training according to any one of the above training methods of the nested entity recognition models;
determining an entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
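The identification flow above — predict boundary probabilities, then combine boundary characters into sub-entities — can be sketched as a simple span decoder. This is a minimal illustration, not the patented implementation; pairing every above-threshold start with every above-threshold end at or after it, and the 0.5 threshold itself, are assumed details.

```python
def decode_sub_entities(start_probs, end_probs, threshold=0.5):
    """Pair every above-threshold start character with every above-threshold
    end character at or after it to form candidate sub-entity spans.
    The 0.5 threshold is an assumed detail, not taken from the patent."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(s, e) for s in starts for e in ends if s <= e]
```

For a four-character nested entity whose first and third characters look like starts and whose second and fourth look like ends, the decoder recovers the two internal spans and the outer span, mirroring the nesting structure described above.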
In another aspect, an embodiment of the present application provides a training apparatus for a nested entity recognition model, where the apparatus includes:
an obtaining module, configured to obtain tag information of a plurality of first nested entities, where the tag information of a first nested entity includes a first tag of each character in the first nested entity, a second tag of each character in the first nested entity, and a third tag of each sub-entity of the first nested entity, where the first tag of the character represents whether the character is a start character of the sub-entity, the second tag of the character represents whether the character is an end character of the sub-entity, and the third tag of the sub-entity represents an entity type of the sub-entity;
a determining module, configured to determine, for any first nested entity, first prediction information of the any first nested entity according to a first network model, where the first prediction information of the any first nested entity includes a first probability that each character in the any first nested entity is a start character of a child entity, a first probability that each character in the any first nested entity is an end character of a child entity, and a first entity type probability of each child entity of the any first nested entity;
the training module is used for training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model;
the determining module is further configured to treat the second network model as a nested entity identification model in response to a first condition being met, the nested entity identification model being configured to identify an entity type of a nested entity.
In a possible implementation manner, the determining module is configured to determine, according to a first network model, a first probability that each character in any one of the first nested entities is a start character of a child entity and a first probability that each character in any one of the first nested entities is an end character of the child entity; determining, according to the first network model, respective sub-entities of the any first nested entity based on a first probability that respective characters in the any first nested entity are a beginning character of a sub-entity and a first probability that respective characters in the any first nested entity are an ending character of a sub-entity; determining a first entity type probability for each sub-entity of said any first nested entity based on said first network model.
In a possible implementation manner, the determining module is configured to determine, for any sub-entity of the any first nested entity, a sub-entity characteristic of the any sub-entity based on a character characteristic of a start character of the any sub-entity and a character characteristic of an end character of the any sub-entity; determining a first entity type probability for said any sub-entity based on sub-entity characteristics of said any sub-entity.
In a possible implementation manner, the determining module is configured to determine a character relation feature based on a character feature of a start character of the any sub-entity and a character feature of an end character of the any sub-entity, where the character relation feature is used to characterize a character relation between the start character of the any sub-entity and the end character of the any sub-entity; and determining the child entity characteristics of any child entity based on the character characteristics of the starting character of any child entity, the character characteristics of the ending character of any child entity and the character relation characteristics.
In one possible implementation, the character relation feature includes a character difference feature for characterizing a character difference between a start character of the any sub-entity and an end character of the any sub-entity;
the determining module is used for determining the difference value between the character features of the starting character of any sub-entity and the character features of the ending character of any sub-entity to obtain the character difference features.
In a possible implementation manner, the character relation feature includes a character similarity feature, and the character similarity feature is used for characterizing the character similarity between a starting character of any sub-entity and an ending character of any sub-entity;
the determining module is used for determining dot products between the character features of the starting character of any sub-entity and the character features of the ending character of any sub-entity to obtain character similar features.
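The two implementations above — a character difference feature from the element-wise difference, and a character similarity feature from the dot product — can be sketched as follows. Concatenation as the way the pieces are combined into the sub-entity feature is an assumption; the claims only say the sub-entity feature is determined "based on" these features.

```python
import numpy as np

def sub_entity_feature(h_start, h_end):
    """Build a sub-entity feature from the start- and end-character features,
    the character difference feature (element-wise difference) and the
    character similarity feature (dot product), as described in the claims."""
    diff = h_start - h_end          # character difference feature
    sim = np.dot(h_start, h_end)    # character similarity feature (scalar)
    return np.concatenate([h_start, h_end, diff, [sim]])
```

With 2-dimensional character features the result is a 7-dimensional sub-entity feature (2 + 2 + 2 + 1), which would then feed the entity-type classifier.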
In a possible implementation manner, the training module is configured to determine, for any first nested entity, a loss value of the any first nested entity based on the label information and the first prediction information of the any first nested entity; and training the first network model based on the loss values of the plurality of first nested entities to obtain a second network model.
In a possible implementation manner, the training module is configured to determine a first loss value of any one of the first nested entities based on the first label of each character in the any one of the first nested entities and the first probability that each character in the any one of the first nested entities is a start character of a sub-entity; determining a second loss value of any first nested entity based on the second label of each character in any first nested entity and the first probability that each character in any first nested entity is a terminal character of a sub-entity; determining a third loss value for any of the first nested entities based on the third label of each sub-entity of the any first nested entity and the first entity type probability of each sub-entity of the any first nested entity; determining a penalty value for the any first nested entity based on the first penalty value, the second penalty value, and the third penalty value for the any first nested entity.
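The three-part loss above can be sketched numerically. Summation as the combination rule and cross-entropy as the loss form are assumptions — the claims only say the loss value is determined "based on" the three components.

```python
import numpy as np

def binary_ce(labels, probs):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def nested_entity_loss(first_labels, start_probs,
                       second_labels, end_probs,
                       type_labels, type_probs):
    """Sketch of the claimed loss: a start-character loss (first loss value),
    an end-character loss (second loss value) and a sub-entity type loss
    (third loss value), combined here by summation."""
    start_loss = binary_ce(first_labels, start_probs)
    end_loss = binary_ce(second_labels, end_probs)
    # third loss value: negative log-probability of each sub-entity's gold type
    type_loss = float(-np.mean([np.log(max(p[t], 1e-7))
                                for t, p in zip(type_labels, type_probs)]))
    return start_loss + end_loss + type_loss
```

Perfect boundary and type predictions drive the loss to (numerically) zero, while noisier predictions yield a strictly larger value, which is what the training module minimizes over the plurality of first nested entities.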
In a possible implementation manner, the determining module is further configured to determine, for any first nested entity, in response to the first condition not being satisfied, second prediction information of the any first nested entity according to the second network model, where the second prediction information of the any first nested entity includes a second probability that each character in the any first nested entity is a start character of a sub-entity, a second probability that each character in the any first nested entity is an end character of the sub-entity, and a second entity type probability of each sub-entity of the any first nested entity;
the training module is further configured to train the second network model based on the label information and the second prediction information of the plurality of first nested entities to obtain a third network model;
the determining module is further configured to treat the third network model as the nested entity identification model in response to the first condition being satisfied.
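The iterative scheme above — second model, then third model, and so on until the first condition holds — reduces to a simple training loop. The concrete form of the first condition (e.g. an iteration cap or a converged loss) is an assumed detail; the toy test below just counts rounds.

```python
def train_until_first_condition(model, data, first_condition, train_round):
    """Sketch of the claimed iteration: while the first condition is not
    satisfied, retrain the latest model on the labelled nested entities;
    once it holds, the current model is the nested entity recognition model."""
    while not first_condition(model):
        model = train_round(model, data)   # second model, third model, ...
    return model
```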
In a possible implementation manner, the obtaining module is further configured to obtain tag information of a plurality of second nested entities, where the tag information of the second nested entities includes a first tag of each character in the second nested entity, a second tag of each character in the second nested entity, and a third tag of each sub-entity of the second nested entity;
the determining module is further configured to determine, for any second nested entity, first prediction information of the any second nested entity according to the first network model, where the first prediction information of the any second nested entity includes a first probability that each character in the any second nested entity is a beginning character of a child entity, a first probability that each character in the any second nested entity is an ending character of a child entity, and a first entity type probability of each child entity of the any second nested entity;
the training module is further configured to train the first network model based on the label information and the first prediction information of the plurality of first nested entities and the label information and the first prediction information of the plurality of second nested entities to obtain a second network model.
In a possible implementation manner, the determining module is further configured to determine, for any first nested entity, third prediction information of the any first nested entity according to a fourth network model, where the third prediction information of the any first nested entity includes a third probability that each character in the any first nested entity is a start character of a child entity, a third probability that each character in the any first nested entity is an end character of the child entity, and a third entity type probability of each child entity of the any first nested entity;
the training module is further configured to train the fourth network model based on the label information and the third prediction information of the plurality of first nested entities to obtain a fifth network model;
the determining module is further configured to take the fifth network model as a teacher model in response to a second condition being met;
the obtaining module is used for obtaining label information of a plurality of second nested entities based on the teacher model.
In one possible implementation, the teacher model includes a transformer-based bi-directional encoder representation network model, and the nested entity recognition model includes a long-short term memory network model.
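The teacher–student pairing above (a transformer-based bi-directional encoder teacher labelling data for a lighter long short-term memory student) can be sketched with classic soft-label distillation. This is one common realisation, not necessarily the patent's; the temperature value and the use of raw logits are assumptions.

```python
import numpy as np

def soften(teacher_logits, temperature=2.0):
    """Temperature-softened teacher probabilities used as pseudo labels."""
    z = np.asarray(teacher_logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_probs, teacher_probs):
    """Cross-entropy of the student's predictions against the teacher's
    soft labels: the LSTM student is trained to mimic the BERT-style teacher."""
    s = np.clip(np.asarray(student_probs, dtype=float), 1e-7, 1.0)
    return float(-np.sum(np.asarray(teacher_probs) * np.log(s)))
```

This matches the role split in the claims: the heavier teacher produces label information for the second nested entities, and the student that ships as the nested entity recognition model is trained against it.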
In another aspect, an embodiment of the present application provides a nested entity identifying apparatus, where the apparatus includes:
the acquisition module is used for acquiring a target nested entity;
a determining module, configured to determine, according to a nested entity recognition model, a probability that each character in the target nested entity is a start character of a sub-entity, a probability that each character in the target nested entity is an end character of the sub-entity, and an entity type probability of each sub-entity of the target nested entity, where the nested entity recognition model is obtained by training according to any one of the above training methods for the nested entity recognition model;
the determining module is configured to determine an entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
In a possible implementation manner, the determining module is configured to determine, according to the nested entity recognition model, a probability that each character in the target nested entity is a start character of a sub-entity and a probability that each character in the target nested entity is an end character of the sub-entity;
determining each sub-entity of the target nested entity based on the probability that each character in the target nested entity is a start character of the sub-entity and the probability that each character in the target nested entity is an end character of the sub-entity according to the nested entity recognition model;
and determining entity type probability of each sub-entity of the target nested entity according to the nested entity recognition model.
In a possible implementation manner, the target nested entity is a nested entity in the media information;
the device further comprises:
the recommending module is used for recommending the media information to a target object in response to the existence of a target entity type in the entity types of the sub-entities of the target nested entity;
and the filtering module is used for filtering the media information in response to the fact that the target entity type does not exist in the entity types of the sub-entities corresponding to the target nested entity.
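The recommend-or-filter behaviour of the two modules above can be sketched as one routing function. The data shapes and the `identify_types` callback are illustrative assumptions standing in for the nested entity recognition model.

```python
def route_media(media_items, target_types, identify_types):
    """Recommend media whose nested entities contain at least one target
    entity type; filter out the rest, as the two modules describe."""
    recommended, filtered = [], []
    for item in media_items:
        found = set()
        for entity in item["nested_entities"]:
            found |= identify_types(entity)   # entity types of the sub-entities
        (recommended if found & target_types else filtered).append(item)
    return recommended, filtered
```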
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so that the electronic device implements any one of the above-mentioned training methods for a nested entity recognition model or any one of the above-mentioned nested entity recognition methods.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor, so as to enable a computer to implement any one of the above-mentioned training methods for a nested entity recognition model or any one of the above-mentioned nested entity recognition methods.
In another aspect, a computer program or a computer program product is provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement any one of the above methods for training a nested entity recognition model or any one of the above methods for recognizing a nested entity.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
the technical scheme provided by the embodiment of the application is that a nested entity recognition model is obtained based on the probability that each character in a first nested entity is the starting character of the sub-entity, the first label of each character in the first nested entity, the probability that each character in the first nested entity is the ending character of the sub-entity and the second label of each character in the first nested entity, so that the nested entity recognition model can accurately recognize the character which can be used as the starting character of the sub-entity and the character which can be used as the ending character of the sub-entity from each character of the nested entity so as to combine each sub-entity of the nested entity, and each sub-entity of the nested entity can be accurately recognized. The nested entity recognition model is obtained based on the entity type probability and the third label of each sub-entity of the first nested entity, so that the nested entity recognition model can accurately recognize the entity type of each sub-entity of the nested entity, that is, the nested entity recognition model can accurately recognize the nested entity.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a training method of a nested entity recognition model or a nested entity recognition method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a training method of a nested entity recognition model according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a first nested entity provided in an embodiment of the present application;
fig. 4 is a schematic diagram of another first nested entity provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of labels of characters in a first nested entity according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a teacher model distillation training student model provided by an embodiment of the present application;
fig. 7 is a schematic diagram of identification of a medical nested entity provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of another medical nested entity identification provided by an embodiment of the present application;
fig. 9 is a flowchart of a method for identifying a nested entity according to an embodiment of the present application;
fig. 10 is a schematic diagram of an entity type of each fine-grained entity in a target nested entity according to an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating identification of a single granularity entity identification model according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a training apparatus for nested entity recognition models according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a nested entity identifying apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a training method of a nested entity recognition model or a nested entity recognition method provided in an embodiment of the present application, where the implementation environment includes an electronic device 11 as shown in fig. 1, and the training method of the nested entity recognition model or the nested entity recognition method in the embodiment of the present application may be executed by the electronic device 11. Illustratively, the electronic device 11 may include at least one of a terminal device or a server.
The terminal device may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer.
The server may be one server, or a server cluster formed by multiple servers, or any one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application. The server can be in communication connection with the terminal device through a wired network or a wireless network. The server may have functions of data processing, data storage, data transceiving, and the like, and is not limited in the embodiment of the present application.
The training method of the nested entity recognition model and the nested entity recognition method provided by the embodiment of the application are realized based on an Artificial Intelligence (AI) technology, wherein the AI is a theory, a method, a technology and an application system which simulate, extend and expand human Intelligence by using a digital computer or a machine controlled by the digital computer, sense the environment, acquire knowledge and obtain the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject that involves a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; research in this field therefore involves natural language, i.e. the language that people use every day, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and researched in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, internet of vehicles, automatic driving, smart traffic and the like. The scheme provided by the embodiment of the application relates to the technologies of a training method of an artificial intelligence nested entity recognition model, a nested entity recognition method and the like, and will be described in detail in the following embodiments.
Based on the foregoing implementation environment, an embodiment of the present application provides a training method for a nested entity recognition model, which may be executed by the electronic device 11 in fig. 1. Fig. 2 is a flowchart of the training method for the nested entity recognition model provided in the embodiment of the present application. As shown in fig. 2, the method includes steps 201 to 204.
Step 201, obtaining label information of a plurality of first nested entities.
The label information of the first nested entity comprises a first label of each character in the first nested entity, a second label of each character in the first nested entity and a third label of each sub-entity of the first nested entity, wherein the first label of the character represents whether the character is a start character of the sub-entity, the second label of the character represents whether the character is an end character of the sub-entity, and the third label of the sub-entity represents an entity type of the sub-entity.
In an embodiment of the application, at least one entity is nested inside a first nested entity, the entity nested inside the first nested entity is an internal entity, the first nested entity is an external entity, and both the internal entity and the external entity are sub-entities of the first nested entity.
Referring to fig. 3, fig. 3 is a schematic diagram of a first nested entity according to an embodiment of the present disclosure. The first nested entity is "lower left abdominal pain", inside which the following entities are nested: "abdomen", "pain" and "lower left abdomen". Thus, the internal entities of "lower left abdominal pain" include "abdomen", "pain" and "lower left abdomen", and "lower left abdominal pain" is an external entity. That is, "lower left abdominal pain" corresponds to four sub-entities: "abdomen", "pain", "lower left abdomen" and "lower left abdominal pain".
Referring next to fig. 4, fig. 4 is a schematic diagram of another first nested entity provided in the present application. The first nested entity is "chronic tonsillitis", inside which the following entities are nested: "tonsils" and "tonsillitis". Thus, the internal entities of "chronic tonsillitis" include "tonsils" and "tonsillitis", and "chronic tonsillitis" is an external entity. That is, "chronic tonsillitis" corresponds to three sub-entities: "tonsils", "tonsillitis" and "chronic tonsillitis".
It should be noted that any two sub-entities in the first nested entity may be entities of the same granularity or entities of different granularities. For example, the sub-entities "abdomen" and "pain" in "lower left abdominal pain" are entities of the same granularity; the sub-entities "pain", "lower left abdomen", and "lower left abdominal pain" in "lower left abdominal pain" are entities of different granularities; and the sub-entities "tonsil", "tonsillitis", and "chronic tonsillitis" in "chronic tonsillitis" are entities of different granularities.
For any sub-entity of the first nested entity, the sub-entity comprises a start character and an end character, and the start character and the end character may be the same character or different characters. In this case, a character of the first nested entity may be a start character of a sub-entity, an end character of a sub-entity, or an intermediate character of a sub-entity, and the relationship between a character and the sub-entities is referred to as the boundary information of the character.
For example, the first nested entity "abdominal pain" (in the original Chinese, "腹部疼痛") includes four characters: "腹", "部", "疼", and "痛". The boundary information of these four characters is shown in table 1 below.
TABLE 1

Character | Boundary information of the character
腹 ("abdomen") | Start character of the sub-entities "abdomen" and "abdominal pain"
部 ("part") | End character of the sub-entity "abdomen"
疼 ("ache") | Start character of the sub-entity "pain"
痛 ("pain") | End character of the sub-entities "pain" and "abdominal pain"
In the embodiment of the application, a first label of each character in a first nested entity and a second label of each character in the first nested entity are obtained. Illustratively, the first label of any character is 0 or 1, where 0 indicates that the character is not a start character of a sub-entity and 1 indicates that the character is a start character of a sub-entity. The second label of any character is also 0 or 1, where 0 indicates that the character is not an end character of a sub-entity and 1 indicates that the character is an end character of a sub-entity.
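To make the two boundary labels concrete, the following minimal sketch (illustrative code, not part of the application; all names are invented) derives the first and second 0/1 label sequences from annotated sub-entity spans given as inclusive (start index, end index) pairs:

```python
def build_boundary_labels(num_chars, spans):
    """Derive the 0/1 first (start) and second (end) labels for each
    character from annotated sub-entity spans (inclusive index pairs)."""
    first_labels = [0] * num_chars   # 1 if the character starts some sub-entity
    second_labels = [0] * num_chars  # 1 if the character ends some sub-entity
    for start, end in spans:
        first_labels[start] = 1
        second_labels[end] = 1
    return first_labels, second_labels

# "abdominal pain" example (four characters); sub-entity spans:
# chars 0-1 ("abdomen"), 2-3 ("pain"), 0-3 ("abdominal pain")
first, second = build_boundary_labels(4, [(0, 1), (2, 3), (0, 3)])
# first  -> [1, 0, 1, 0]
# second -> [0, 1, 0, 1]
```

Note that a character keeps the same single label even when it bounds several sub-entities, which is what allows the labels to stay 0/1.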
Because the characters in the first nested entity are ordered, the character strings corresponding to all (start character, end character) intervals are enumerated through these two labels, yielding each sub-entity in the first nested entity. The start character either strictly precedes the end character, i.e., "start character < end character", or the start character and the end character may be the same character, i.e., "start character ≤ end character".
Referring to fig. 5, fig. 5 is a schematic diagram of the labels of each character in a first nested entity provided in an embodiment of the present application, where the first nested entity is "very sore neck", which includes four characters: "neck", "child", "very", and "pain". The first labels of these four characters are 1, 0, 1, 0 in order, and the second labels of these four characters are 0, 1, 0, 1 in order. Since the four characters themselves are ordered, according to the first labels and the second labels of the four characters, by enumerating the character strings corresponding to all the (1, 1) intervals, it can be obtained that "very sore neck" includes three sub-entities: "neck", "very painful", and "very sore neck".
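The interval enumeration described above can be sketched as follows (an illustrative sketch, not the application's code; abstract single-letter characters stand in for the Chinese characters):

```python
def enumerate_sub_entities(chars, first_labels, second_labels):
    """Enumerate the character strings of all (start, end) intervals in
    which the start character has first label 1, the end character has
    second label 1, and start <= end."""
    sub_entities = []
    for i, fl in enumerate(first_labels):
        if fl != 1:
            continue
        for j in range(i, len(chars)):  # start <= end preserves character order
            if second_labels[j] == 1:
                sub_entities.append("".join(chars[i:j + 1]))
    return sub_entities

# Four characters with first labels 1,0,1,0 and second labels 0,1,0,1
# (as in the "very sore neck" example): three sub-entities result.
subs = enumerate_sub_entities(["a", "b", "c", "d"], [1, 0, 1, 0], [0, 1, 0, 1])
# -> ["ab", "abcd", "cd"]
```

A character whose start and end labels are both 1 would yield a single-character sub-entity, matching the "start character ≤ end character" case.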
In this embodiment of the application, any sub-entity of the first nested entity corresponds to a third tag, which may also be referred to as an entity type tag; the entity type tag of a sub-entity is used to characterize the entity type of the sub-entity, and a sub-entity has at least one entity type.
For example, for a first nested entity "abdominal pain," the first nested entity includes three sub-entities, each having an entity type tag as shown in table 2 below.
TABLE 2

Sub-entity | Entity type tag
abdomen | part
pain | symptom
abdominal pain | symptom
The first nested entity includes at least two sub-entities, and the entity type labels corresponding to any two sub-entities of the first nested entity may be the same or different.
As shown in fig. 3, the entity type label corresponding to "abdomen" is "part", the entity type label corresponding to "pain" is "symptom", the entity type label corresponding to "lower left abdomen" is "part", and the entity type label corresponding to "lower left abdominal pain" is "symptom". It can be seen that "abdomen" and "lower left abdomen" correspond to the same entity type label, "pain" and "lower left abdominal pain" correspond to the same entity type label, and the entity type labels of "abdomen" and "lower left abdomen" are different from those of "pain" and "lower left abdominal pain".
As shown in fig. 4, the entity type label corresponding to "tonsil" is "part", the entity type label corresponding to "tonsillitis" is "symptom", and the entity type label corresponding to "chronic tonsillitis" is "symptom". "tonsillitis" and "chronic tonsillitis" correspond to the same entity type label, while "tonsil" and "tonsillitis" correspond to different entity type labels.
Step 202, for any first nested entity, first prediction information of the first nested entity is determined according to the first network model.
The first prediction information of any first nesting entity comprises a first probability that each character in any first nesting entity is a starting character of a sub-entity, a first probability that each character in any first nesting entity is an ending character of the sub-entity and a first entity type probability of each sub-entity of any first nesting entity.
In the embodiment of the present application, for any first nested entity, the first nested entity is input to a first network model, and first prediction information of the first nested entity is output by the first network model, where the first prediction information of the first nested entity includes three parts, which are a first part, a second part, and a third part, respectively, and the model structure and size of the first network model are not limited in the embodiment of the present application.
The first part is a first probability that each character in the first nested entity is a starting character of a child entity, the first probability that any character is a starting character of a child entity is 0 or more and 1 or less, and a higher probability value indicates that the character has a higher probability of being a starting character of a child entity.
The second part is a first probability that each character in the first nested entity is an ending character of a child entity, the first probability that any character is an ending character of a child entity is greater than or equal to 0 and less than or equal to 1, and the higher the probability value, the higher the probability that the character is an ending character of a child entity.
The third component is a first entity type probability for each sub-entity of the first nested entity, the first entity type probability for any sub-entity being a probability that any sub-entity belongs to each entity type. The probability that any sub-entity belongs to a certain entity type is more than or equal to 0 and less than or equal to 1, and the higher the probability value is, the higher the probability that the sub-entity belongs to the entity type is.
In one possible implementation, determining first prediction information of any first nested entity according to the first network model includes: determining a first probability that each character in any first nesting entity is a beginning character of a sub-entity and a first probability that each character in any first nesting entity is an ending character of the sub-entity according to a first network model; determining, according to a first network model, each child entity of any first nested entity based on a first probability that each character in any first nested entity is a beginning character of the child entity and a first probability that each character in any first nested entity is an ending character of the child entity; first entity type probabilities for respective sub-entities of any of the first nested entities are determined according to a first network model.
In the embodiment of the application, the first nested entity is input into the first network model, and the character features of each character in the first nested entity are extracted by the first network model. For any character in the first nested entity, the first network model determines and outputs a first probability that the character is a beginning character of a child entity and a first probability that the character is an ending character of the child entity based on character features of the character.
The first probability that any character in the first nested entity is the beginning character of the sub-entity is shown in formula (1), and the first probability that any character in the first nested entity is the ending character of the sub-entity is shown in formula (2).
p_start(a) = softmax(emb_a · T_start)    Formula (1)

where p_start(a) is the first probability that the character a in the first nested entity is a start character of a sub-entity, softmax is the softmax function, emb_a is the character feature of the character a in the first nested entity, and T_start is a model parameter of the first network model.
p_end(a) = softmax(emb_a · T_end)    Formula (2)

where p_end(a) is the first probability that the character a in the first nested entity is an end character of a sub-entity, softmax is the softmax function, emb_a is the character feature of the character a in the first nested entity, and T_end is another model parameter of the first network model.
Thereafter, the first network model determines respective sub-entities of the first nested entity based on a first probability that respective characters in the first nested entity are a beginning character of the sub-entity, a first probability that respective characters in the first nested entity are an ending character of the sub-entity, a first probability threshold, and a second probability threshold. The first probability threshold and the second probability threshold may be the same or different.
Optionally, for any character, if the first probability that the character is a start character of a sub-entity is greater than the first probability threshold, the character may be regarded as a start character of a sub-entity, and if the first probability that the character is an end character of a sub-entity is greater than the second probability threshold, the character may be regarded as an end character of a sub-entity. Since the characters in the first nested entity are ordered, each sub-entity in the first nested entity is obtained by enumerating the character strings corresponding to all (start character, end character) intervals.
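The threshold-based decoding just described can be sketched as follows (illustrative only; the function name and threshold values are invented, with both thresholds defaulting to 0.5):

```python
def decode_spans(p_start, p_end, t_start=0.5, t_end=0.5):
    """Treat characters whose start probability exceeds the first
    threshold as candidate start characters, characters whose end
    probability exceeds the second threshold as candidate end
    characters, and enumerate all (start, end) intervals with
    start <= end."""
    starts = [i for i, p in enumerate(p_start) if p > t_start]
    ends = [j for j, p in enumerate(p_end) if p > t_end]
    return [(i, j) for i in starts for j in ends if i <= j]

# Candidate starts {0, 2} and candidate ends {1, 3} give three spans.
spans = decode_spans([0.9, 0.1, 0.8, 0.2], [0.1, 0.7, 0.1, 0.95])
# -> [(0, 1), (0, 3), (2, 3)]
```

Raising the thresholds trades recall for precision, since fewer characters qualify as candidate boundaries.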
The first network model then determines sub-entity characteristics of each sub-entity of the first nested entity, and for any sub-entity of the first nested entity, the first network model determines and outputs a first entity type probability for that sub-entity based on the sub-entity characteristics of that sub-entity. Wherein the first network model determines a first entity type probability for any sub-entity of the first nested entity as shown in equation (3).
p_type(e_ij) = softmax(emb_e_ij · T_type)    Formula (3)

where p_type(e_ij) is the first entity type probability of the sub-entity e_ij of the first nested entity, softmax is the softmax function, emb_e_ij is the sub-entity feature of the sub-entity e_ij, and T_type is a further model parameter of the first network model.
In one possible implementation, determining the first entity type probabilities for respective sub-entities of any of the first nested entities comprises: for any sub-entity of any first nested entity, determining a sub-entity characteristic of any sub-entity based on a character characteristic of a beginning character of any sub-entity and a character characteristic of an ending character of any sub-entity; a first entity type probability is determined for any sub-entity based on the sub-entity characteristics of any sub-entity.
In an embodiment of the application, for any sub-entity of the first nested entity, the first network model determines a sub-entity characteristic of the sub-entity based on a character characteristic of a beginning character of the sub-entity and a character characteristic of an ending character of the sub-entity.
For example, for the sub-entity "tonsil", the first network model determines the sub-entity feature of "tonsil" based on the character feature of the beginning character "flat" of "tonsil" and the character feature of the ending character "body" of "tonsil".
Optionally, determining the child entity characteristics of any child entity based on the character characteristics of the starting character of any child entity and the character characteristics of the ending character of any child entity, includes: determining a character relation characteristic based on the character characteristic of the starting character of any sub-entity and the character characteristic of the ending character of any sub-entity, wherein the character relation characteristic is used for representing the character relation between the starting character of any sub-entity and the ending character of any sub-entity; and determining the child entity characteristics of any child entity based on the character characteristics of the starting character of any child entity, the character characteristics of the ending character of any child entity and the character relation characteristics.
In the embodiment of the present application, for any sub-entity of the first nested entity, a character relationship characteristic, which is a characteristic of a character relationship between a start character of the sub-entity and an end character of the sub-entity, is determined based on a character characteristic of the start character of the sub-entity and a character characteristic of the end character of the sub-entity. The character relationship features include, but are not limited to, character difference features and character similarity features, which are described below (see implementation a1) and (see implementation a2), respectively.
Implementation a1, the character relationship features include character difference features for characterizing character differences between a start character of any sub-entity and an end character of any sub-entity; determining character relationship features based on character features of a beginning character of any sub-entity and character features of an ending character of any sub-entity, comprising: and determining the difference value between the character characteristic of the starting character of any sub-entity and the character characteristic of the ending character of any sub-entity to obtain the character difference characteristic.
The character difference feature is used for representing the character difference between the start character of a sub-entity and the end character of the sub-entity, and the feature representing this character difference is the character difference feature. In the embodiment of the application, for any sub-entity of the first nested entity, the first network model calculates the difference between the character feature of the start character of the sub-entity and the character feature of the end character of the sub-entity to obtain the character difference feature of the sub-entity. In this way, the first network model is able to determine the character difference features of the individual sub-entities of the first nested entity.
For example, for the sub-entity "tonsil", the first network model calculates the difference between the character feature of the beginning character "flat" of "tonsil" and the character feature of the ending character "body" of "tonsil" to obtain the character difference feature of "tonsil".
In the implementation mode a2, the character relation features include character similarity features, and the character similarity features are used for characterizing the character similarity between the start character of any sub-entity and the end character of any sub-entity; determining character relationship features based on character features of a beginning character of any sub-entity and character features of an ending character of any sub-entity, comprising: and determining dot products between the character features of the starting character of any sub-entity and the character features of the ending character of any sub-entity to obtain character similarity features.
The character similarity characteristic is used for representing the character similarity between the starting character of the sub-entity and the ending character of the sub-entity, and the characteristic representing the character similarity is the character similarity characteristic. In the embodiment of the present application, for any sub-entity of the first nested entity, the first network model calculates a dot product between a character feature of a start character of the sub-entity and a character feature of an end character of the sub-entity to obtain a character similarity feature of the sub-entity. In this way, the first network model is able to determine character similarity characteristics of the individual sub-entities of the first nested entity.
For example, for the sub-entity "tonsil", the first network model calculates the dot product between the character feature of the beginning character "flat" of "tonsil" and the character feature of the ending character "body" of "tonsil" to obtain the character similarity feature of "tonsil".
It should be noted that the character relation feature may be other features besides the character difference feature and the character similarity feature mentioned above, for example, the character relation feature is a character containing feature, and the character containing feature is used to characterize an inclusion relation or an included relation between a start character of any sub-entity and an end character of any sub-entity, and the embodiment of the present application does not limit a calculation manner of the character containing feature.
After determining the character relationship features of any of the sub-entities in the first nested entity, the first network model determines the sub-entity features of the sub-entity based on the character relationship features of the sub-entity, the character features of the beginning character of the sub-entity, and the character features of the ending character of the sub-entity.
When the character relation feature is a character difference feature, the first network model determines a child entity feature of the child entity according to the character difference feature of the child entity, the character feature of the beginning character of the child entity, and the character feature of the ending character of the child entity, wherein the child entity feature of the child entity is shown in formula (4).
emb_e_ij = [h_i ; h_j ; h_i − h_j]    Formula (4)

where e_ij is the sub-entity whose first character is i and whose last character is j, emb_e_ij is the sub-entity feature, h_i is the character feature of the start character of the sub-entity, h_j is the character feature of the end character of the sub-entity, h_i − h_j is the character difference feature of the sub-entity, and [· ; ·] denotes feature concatenation.
When the character relation feature is a character similarity feature, the first network model determines a sub-entity feature of the sub-entity according to the character similarity feature of the sub-entity, the character feature of the start character of the sub-entity, and the character feature of the end character of the sub-entity, wherein the sub-entity feature of the sub-entity is shown in formula (5).
emb_e_ij = [h_i ; h_j ; h_i ⊙ h_j]    Formula (5)

where e_ij is the sub-entity whose first character is i and whose last character is j, emb_e_ij is the sub-entity feature, h_i is the character feature of the start character of the sub-entity, h_j is the character feature of the end character of the sub-entity, h_i ⊙ h_j is the character similarity feature of the sub-entity, and [· ; ·] denotes feature concatenation.
When the character relation features include both the character difference feature and the character similarity feature, the first network model determines the sub-entity feature of the sub-entity according to the character difference feature of the sub-entity, the character similarity feature of the sub-entity, the character feature of the start character of the sub-entity, and the character feature of the end character of the sub-entity, as shown in formula (6).

emb_e_ij = [h_i ; h_j ; h_i − h_j ; h_i ⊙ h_j]    Formula (6)

where e_ij is the sub-entity whose first character is i and whose last character is j, emb_e_ij is the sub-entity feature, h_i is the character feature of the start character of the sub-entity, h_j is the character feature of the end character of the sub-entity, h_i − h_j is the character difference feature of the sub-entity, and h_i ⊙ h_j is the character similarity feature of the sub-entity.
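The feature constructions of formulas (4) to (6) can be sketched with numpy as follows (an illustrative sketch, not the application's code; the ⊙ operation is rendered as an element-wise product, and the concatenation order is assumed):

```python
import numpy as np

def sub_entity_feature(h_i, h_j, use_diff=True, use_sim=True):
    """Build emb_e_ij from the start-character feature h_i and the
    end-character feature h_j; optionally append the character
    difference feature h_i - h_j and the character similarity feature
    h_i * h_j, covering formulas (4), (5), and (6)."""
    parts = [h_i, h_j]
    if use_diff:
        parts.append(h_i - h_j)   # character difference feature
    if use_sim:
        parts.append(h_i * h_j)   # character similarity feature (element-wise)
    return np.concatenate(parts)

h_i = np.array([1.0, 2.0])        # start-character feature (toy values)
h_j = np.array([0.5, 1.0])        # end-character feature (toy values)
emb = sub_entity_feature(h_i, h_j)  # shape (8,) when both extra features are used
```

With 2-dimensional character features, formula (4) or (5) alone would give a 6-dimensional sub-entity feature, and formula (6) the 8-dimensional one shown here.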
It is to be understood that, when applied, the first network model may determine the sub-entity characteristics of the sub-entity based on the character relationship characteristics of the sub-entity, the character characteristics of the start character of the sub-entity, the character characteristics of the end character of the sub-entity, and the character characteristics of the respective middle characters of the sub-entity, i.e., the first network model may determine the sub-entity characteristics of the sub-entity based on the character relationship characteristics of the sub-entity and the character characteristics of the respective characters of the sub-entity.
For example, for the sub-entity "tonsil", the first network model determines the sub-entity feature of "tonsil" based on the character feature of the starting character "flat" of "tonsil", the character feature of the middle character "peach" of "tonsil", the character feature of the ending character "body" of "tonsil", and the character relationship feature between "flat" and "body".
In this way, the sub-entity feature of each sub-entity in the first nested entity is determined. Thereafter, the first entity type probability of each sub-entity is determined based on the sub-entity feature of that sub-entity.
In summary, the first network model can determine a first probability that each character in any one of the first nested entities is a beginning character of a child entity, a first probability that each character in the first nested entity is an ending character of a child entity, and a first entity type probability of each child entity of the first nested entity, i.e., determine first prediction information of any one of the first nested entities. In this way, the first network model is able to determine first prediction information for each first nested entity.
Step 203, training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model.
In the embodiment of the application, a loss value of the first network model is determined based on label information of a plurality of first nested entities and first prediction information of the plurality of first nested entities, and model parameters of the first network model are adjusted according to the loss value of the first network model, so that one-time training of the first network model is realized, and the second network model is obtained.
In a possible implementation manner, training a first network model based on label information and first prediction information of a plurality of first nested entities to obtain a second network model includes: for any first nested entity, determining a loss value of any first nested entity based on the tag information and the first prediction information of any first nested entity; and training the first network model based on the loss values of the plurality of first nested entities to obtain a second network model.
The loss function used in training the first network model is not limited in the embodiment of the present application; exemplarily, the loss function is a Cross-Entropy (CE) loss function, a Focal Loss (FL) function, or the like. For any first nested entity, the loss value of the first nested entity is determined according to the loss function formula based on the label information of the first nested entity and the first prediction information of the first nested entity.
Optionally, determining a loss value of any first nested entity based on the tag information and the first prediction information of any first nested entity includes: determining a first loss value for any first nested entity based on the first label of each character in any first nested entity and the first probability that each character in any first nested entity is the starting character of a child entity; determining a second loss value of any first nested entity based on the second label of each character in any first nested entity and the first probability that each character in any first nested entity is the ending character of the sub-entity; determining a third loss value for any of the first nested entities based on the third label of each sub-entity of any of the first nested entities and the first entity type probability of each sub-entity of any of the first nested entities; determining a penalty value for any first nested entity based on the first penalty value, the second penalty value, and the third penalty value for any first nested entity.
In this embodiment, for any first nested entity, a first loss value of the first nested entity is determined based on the first label of each character in the first nested entity and the first probability that each character in the first nested entity is the starting character of the sub-entity, and the first loss value of the first nested entity is determined as shown in formula (7).
L_start = CE(P_start, Y_start)    Formula (7)

where L_start is the first loss value of the first nested entity, CE denotes the cross-entropy loss function, P_start is the first probability that each character in the first nested entity is a start character of a sub-entity, and Y_start is the first label of each character in the first nested entity.
In this embodiment, for any first nested entity, a second loss value of the first nested entity is determined based on the second label of each character in the first nested entity and the first probability that each character in the first nested entity is the last character of the child entity, and the determination manner of the second loss value of the first nested entity is shown in formula (8).
L_end = CE(P_end, Y_end)    Formula (8)

where L_end is the second loss value of the first nested entity, CE denotes the cross-entropy loss function, P_end is the first probability that each character in the first nested entity is an end character of a sub-entity, and Y_end is the second label of each character in the first nested entity.
In the embodiment of the present application, for any first nested entity, a third loss value of the first nested entity is determined based on the third label of each sub-entity of the first nested entity and the first entity type probability of each sub-entity of the first nested entity, and the third loss value of the first nested entity is determined as shown in formula (9).
L_type = CE(P_start,end, Y_start,end)    Formula (9)

where L_type is the third loss value of the first nested entity, CE denotes the cross-entropy loss function, P_start,end is the first entity type probability of each sub-entity of the first nested entity, and Y_start,end is the third label of each sub-entity of the first nested entity.
After determining the first loss value, the second loss value, and the third loss value of the first nested entity, the loss value of the first nested entity is determined based on the first loss value and its weight of the first nested entity, the second loss value and its weight of the first nested entity, and the third loss value and its weight of the first nested entity, the loss value of the first nested entity being determined in the manner shown in equation (10).
L = α*L_start + β*L_end + γ*L_type    Formula (10)

where L is the loss value of the first nested entity, α is the weight of the first loss value of the first nested entity, L_start is the first loss value of the first nested entity, β is the weight of the second loss value of the first nested entity, L_end is the second loss value of the first nested entity, γ is the weight of the third loss value of the first nested entity, and L_type is the third loss value of the first nested entity.
Note that the weight of the first loss value of the first nested entity, the weight of the second loss value of the first nested entity, and the weight of the third loss value of the first nested entity are all greater than or equal to 0 and less than or equal to 1, i.e., α, β, γ ∈ [0, 1] in formula (10). Optionally, the sum of the weight of the first loss value, the weight of the second loss value, and the weight of the third loss value is 1.
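Formulas (7) to (10) can be combined into a short sketch (illustrative only; a binary cross-entropy stands in for the CE terms, and all function names and default weights are invented):

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-9):
    """Mean binary cross-entropy between predicted probabilities and
    0/1 labels; stands in for the CE terms in formulas (7)-(9)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean(-labels * np.log(probs)
                         - (1 - labels) * np.log(1 - probs)))

def nested_entity_loss(p_start, y_start, p_end, y_end, p_type, y_type,
                       alpha=1/3, beta=1/3, gamma=1/3):
    """L = alpha*L_start + beta*L_end + gamma*L_type, as in formula (10)."""
    l_start = cross_entropy(p_start, y_start)  # formula (7)
    l_end = cross_entropy(p_end, y_end)        # formula (8)
    l_type = cross_entropy(p_type, y_type)     # formula (9)
    return alpha * l_start + beta * l_end + gamma * l_type
```

Perfect boundary and type predictions drive all three terms, and hence the weighted sum, toward zero; the weights control how much boundary errors are penalized relative to type errors.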
In the above manner, the loss value of each first nested entity can be determined. And then, determining the loss value of the first network model based on the loss values of the plurality of first nested entities, and adjusting the model parameters of the first network model based on the loss value of the first network model to realize one-time training of the first network model to obtain a second network model. The method for determining the loss value of the first network model based on the loss values of the plurality of first nested entities is not limited in the embodiment of the present application.
Step 204, in response to the first condition being met, treating the second network model as a nested entity recognition model.
The nested entity recognition model is used for recognizing the entity type of the nested entity.
In response to the first condition being satisfied, the second network model is used as the nested entity recognition model. The manner in which the first condition is satisfied is not limited; illustratively, the first condition is that a target number of training iterations is reached. The target number of training iterations is not limited and is flexibly set based on experience or the scenario; illustratively, the target number of training iterations is 500.
In one possible implementation, the method further includes: in response to the first condition not being satisfied, for any of the first nested entities, determining second prediction information for any of the first nested entities according to the second network model, the second prediction information for any of the first nested entities including a second probability that each character in any of the first nested entities is a beginning character of a child entity, a second probability that each character in any of the first nested entities is an ending character of the child entity, and a second entity type probability for each child entity of any of the first nested entities; training the second network model based on the label information and the second prediction information of the plurality of first nested entities to obtain a third network model; responsive to the first condition being satisfied, the third network model is treated as a nested entity recognition model.
In response to the first condition not being satisfied, for any first nested entity, the first nested entity is input into the second network model, and the second network model outputs the second prediction information of the first nested entity; the second prediction information of each first nested entity is obtained in this way. The second prediction information of the first nested entity is similar to the first prediction information of the first nested entity and is not described herein again. Then, a loss value of the second network model is determined based on the label information of the plurality of first nested entities and the second prediction information of the plurality of first nested entities, and the model parameters of the second network model are adjusted based on that loss value, so as to complete one training iteration of the second network model and obtain a third network model. When the first condition is satisfied, the third network model is the nested entity recognition model; when the first condition is not satisfied, the third network model continues to be trained according to the method of the embodiment of the present application until the first condition is satisfied, thereby obtaining the nested entity recognition model. The related description is given in steps 201 to 204; the implementation principles are similar and are not described herein again.
It can be understood that the process of training the nested entity recognition model is a process of iteratively optimizing the model parameters a plurality of times. Optionally, an Adam Optimizer is used for the optimization. The Adam optimizer is an adaptive learning rate method that dynamically adjusts the learning rate of each parameter by using first-moment and second-moment estimates of the gradient, so that the learning rate of each iteration stays within a definite range and the model parameters change stably.
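The Adam update described above can be sketched in a few lines. This is a generic single-parameter illustration using the commonly cited default hyperparameters, not values fixed by the embodiment:

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: first- and second-moment estimates of the gradient,
    with bias correction, dynamically scale the effective step size."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)             # bias-corrected second moment
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x^2 (gradient 2x): x moves steadily toward 0, with a
# bounded per-step change regardless of the raw gradient magnitude.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```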
Optionally, before training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain the second network model, the method further includes: acquiring label information of a plurality of second nested entities, where the label information of a second nested entity includes a first label of each character in the second nested entity, a second label of each character in the second nested entity, and a third label of each sub-entity of the second nested entity; and for any second nested entity, determining first prediction information of the second nested entity according to the first network model, where the first prediction information of the second nested entity includes a first probability that each character in the second nested entity is a start character of a sub-entity, a first probability that each character in the second nested entity is an end character of a sub-entity, and a first entity type probability of each sub-entity of the second nested entity. Training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain the second network model then includes: training the first network model based on the label information and the first prediction information of the plurality of first nested entities and the label information and the first prediction information of the plurality of second nested entities to obtain the second network model.
In the embodiment of the application, a plurality of second nested entities are obtained. On one hand, for any one second nesting entity, the tag information of the second nesting entity is obtained, and the tag information of the second nesting entity is similar to the tag information of the first nesting entity, for detailed description, see the related description of step 201, and is not described herein again. On the other hand, for any one of the second nested entities, the second nested entity is input to the first network model, and the first prediction information of the second nested entity is output by the first network model, so that the first prediction information of each second nested entity is obtained, and the first prediction information of the second nested entity is similar to the first prediction information of the first nested entity, which is described in detail in step 202, and is not described herein again. The acquisition mode of the tag information of any one second nested entity is not limited. Illustratively, the label information of the second nested entity can be manually marked or output by the teacher model.
Then, for any one of the first nested entities, a loss value for the first nested entity is determined based on the tag information of the first nested entity and the first prediction information of the first nested entity. For any second nested entity, determining a loss value of the second nested entity based on the label information of the second nested entity and the first prediction information of the second nested entity, where a manner of determining the loss value of the second nested entity is similar to that of determining the loss value of the first nested entity, and is not described herein again.
In this way, the loss value of each first nested entity and the loss value of each second nested entity can be determined. And then, determining the loss value of the first network model based on the loss values of the first nested entities and the loss values of the second nested entities, and adjusting the model parameters of the first network model according to the loss value of the first network model to realize one-time training of the first network model to obtain the second network model. The method for training the first network model to obtain the second network model based on the label information and the first prediction information of the plurality of first nested entities and the label information and the first prediction information of the plurality of second nested entities is similar to the method for training the first network model to obtain the second network model based on the label information and the first prediction information of the plurality of first nested entities, and the implementation principles of the two methods are the same and are not repeated herein.
The embodiment of the application trains the first network model through a plurality of first nested entities and a plurality of second nested entities to obtain a second network model. And when the second network model meets the first condition, the second network model is a nested entity recognition model, and when the second network model does not meet the first condition, the second network model is trained based on the plurality of first nested entities and the plurality of second nested entities until the first condition is met, so that the nested entity recognition model is obtained.
In the embodiment of the present application, in order to improve the operation speed of the nested entity recognition model and the recognition effect on nested entities, the first network model is trained using a model distillation technique to obtain the nested entity recognition model. The principle of model distillation is to use a simple model to approximate the output of a complex model, reducing the amount of computation in the prediction process while preserving the prediction effect. The complex model is generally called the Teacher Model, the simple model is generally called the Student Model, and the student model is what is obtained after model distillation. In the embodiment of the present application, the student model is the nested entity recognition model; that is, the embodiment uses a model distillation technique so that the output of the nested entity recognition model approaches the output of the teacher model, which ensures the recognition effect on nested entities while reducing the amount of computation.
In a possible implementation manner, before obtaining the tag information of the plurality of second nested entities, the method further includes: for any first nested entity, determining third prediction information of any first nested entity according to a fourth network model, wherein the third prediction information of any first nested entity comprises a third probability that each character in any first nested entity is a start character of a sub-entity, a third probability that each character in any first nested entity is an end character of the sub-entity, and a third entity type probability of each sub-entity of any first nested entity; training the fourth network model based on the label information and the third prediction information of the plurality of first nested entities to obtain a fifth network model; in response to a second condition being satisfied, taking the fifth network model as a teacher model; obtaining label information of a plurality of second nested entities, comprising: label information of a plurality of second nested entities is obtained based on the teacher model.
In the embodiment of the application, the teacher model is obtained by training through the label information of a plurality of first nested entities. Optionally, for any first nested entity, the first nested entity is input to the fourth network model, and the third prediction information of the first nested entity is output by the fourth network model, so that the third prediction information of each first nested entity can be obtained, where the third prediction information of the first nested entity is similar to the first prediction information of the first nested entity, and is not described herein again.
And then, for any one first nested entity, determining the loss value of the first nested entity according to a loss function formula based on the label information of the first nested entity and the third prediction information of the first nested entity, and obtaining the loss value of each first nested entity in this way. And then, obtaining the loss value of the fourth network model based on the loss value of each first nested entity. And adjusting the model parameters of the fourth network model according to the loss value of the fourth network model to obtain a fifth network model. And when the second condition is met, the fifth network model is a teacher model, and when the second condition is not met, the fifth network model is trained based on the label information of the plurality of first nested entities until the second condition is met, so that the teacher model is obtained.
The manner of determining the loss value of the first nested entity based on the label information of the first nested entity and the third prediction information of the first nested entity is not limited. For example, the loss value of the first nested entity may be determined according to equations (7) to (10), or according to another loss function. The manner in which the second condition is satisfied is also not limited; illustratively, the second condition is that a certain number of training iterations has been reached. The number of training iterations is not limited and can be set flexibly according to manual experience or the scenario; illustratively, the number of training iterations is 200.
After the teacher model is trained, for any one of the second nested entities, the second nested entity is input to the teacher model, and the teacher model outputs the prediction information of the second nested entity. The prediction information of the second nested entity includes a probability that each character in the second nested entity is a beginning character of a child entity, a probability that each character in the second nested entity is an ending character of the child entity, and an entity type probability of each child entity of the second nested entity.
Then, based on the probability that each character in the second nested entity is the starting character of the sub-entity, the first label of each character in the second nested entity is determined. Determining a second label for each character in the second nested entity based on the probability that each character in the second nested entity is a terminal character of a child entity. Determining a third label for each sub-entity of the second nested entity based on the entity type probabilities for each sub-entity of the second nested entity.
Optionally, for any character in the second nested entity, when the probability that the character is the beginning character of the child entity in the second nested entity is greater than the probability threshold of the beginning character, it is determined that the first label of the character in the second nested entity is 1, and when the probability that the character is the beginning character of the child entity in the second nested entity is not greater than the probability threshold of the beginning character, it is determined that the first label of the character in the second nested entity is 0. Based on the same principle, when the probability that the character in the second nested entity is the ending character of the sub-entity is greater than the probability threshold of the ending character, the second label of the character in the second nested entity is determined to be 1, and when the probability that the character in the second nested entity is the ending character of the sub-entity is not greater than the probability threshold of the ending character, the second label of the character in the second nested entity is determined to be 0. In this way, the first label and the second label of each character in the second nested entity can be obtained.
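The thresholding described above can be sketched as follows; the threshold values and probabilities are hypothetical, and the function names are introduced only for illustration:

```python
def boundary_labels(start_probs, end_probs, start_thresh=0.5, end_thresh=0.5):
    """First label = 1 when the start-character probability exceeds the
    start-character threshold; second label = 1 when the end-character
    probability exceeds the end-character threshold.

    Threshold values are illustrative assumptions, not fixed by the text.
    """
    first = [1 if p > start_thresh else 0 for p in start_probs]
    second = [1 if p > end_thresh else 0 for p in end_probs]
    return first, second

# Per-character probabilities from the teacher model (illustrative).
first, second = boundary_labels([0.9, 0.2, 0.7], [0.1, 0.3, 0.95])
print(first, second)  # [1, 0, 1] [0, 0, 1]
```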
For any sub-entity of the second nested entity, the entity type probability of the sub-entity is the probability that the sub-entity belongs to the respective entity type. And determining a probability value which is greater than the reference probability from the probabilities that the sub-entity belongs to the entity types, and taking the entity type corresponding to the probability value which is greater than the reference probability as a third label of the sub-entity. The probability value greater than the reference probability may be a maximum probability value, a maximum probability value and a next maximum probability value, or even a probability value greater than a certain fixed probability (e.g., 0.75). In this way, the third tags of the respective sub-entities of the second nested entity can be determined.
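A hedged sketch of the third-label selection above, using the fixed-threshold variant (0.75 as the reference probability); the entity type names are hypothetical, and the maximum or top-two variants mentioned in the text would follow the same pattern:

```python
def third_labels(entity_type_probs, reference_prob=0.75):
    """Keep every entity type whose probability exceeds the reference
    probability. entity_type_probs maps entity type -> probability for
    one sub-entity; names and values here are illustrative."""
    return [t for t, p in entity_type_probs.items() if p > reference_prob]

labels = third_labels({"disease": 0.92, "symptom": 0.40, "drug": 0.05})
print(labels)  # ['disease']
```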
Optionally, the teacher model comprises a transformer-based bidirectional encoder representation network model, and the nested entity recognition model comprises a long short-term memory network model.
The Bidirectional Encoder Representations from Transformers (BERT) network model has high model complexity, so its computation amount is large and its operation speed is slow, which is not conducive to deployment; however, the BERT network model performs well and is therefore suitable for the teacher model. Although the recognition effect of the Long Short-Term Memory (LSTM) network model is not as good as that of State-Of-The-Art (SOTA) models, its model complexity is low, its computation amount is small, and its operation speed is fast, so it is suitable for the student model. Thus, the teacher model of the embodiment of the present application comprises a BERT network model, and the student model comprises an LSTM network model.
Optionally, the BERT network model included in the teacher model and the LSTM network model included in the student model are both used to determine character features of each character in the nested entities and sub-entity features of each sub-entity of the nested entities.
In an embodiment of the present application, a student model is distilled and trained using a teacher model based on a plurality of first nested entities and a plurality of second nested entities. Referring to fig. 6, fig. 6 is a schematic diagram of a teacher model distillation training student model according to an embodiment of the present application. And the teacher model is used for sample expansion, namely, each second nested entity is input into the teacher model, and the teacher model outputs the label information of each second nested entity. And then, training the student model by distillation by using the label information of each second nested entity and the label information of each first nested entity.
In the embodiment of the present application, in one aspect, the first network model includes an LSTM network model, and the first network model is trained on the plurality of first nested entities to obtain a nested entity recognition model (denoted as the nested entity recognition model obtained by ordinary training). In another aspect, the fourth network model includes a BERT network model, and the fourth network model is trained on the plurality of first nested entities to obtain the teacher model. In yet another aspect, the teacher model is used to distill and train the student model on the plurality of first nested entities and the plurality of second nested entities, obtaining a nested entity recognition model (denoted as the nested entity recognition model obtained by distillation training). The embodiment of the present application evaluates the recognition effect of the three models on nested entities using accuracy, recall, and F1 Score, as shown in table 3 below.
TABLE 3
Model | Accuracy | Recall | F1 score
Nested entity recognition model obtained by ordinary training | 65.54% | 76.87% | 70.75%
Teacher model | 79.32% | 76.45% | 77.86%
Nested entity recognition model obtained by distillation training | 75.21% | 75.86% | 75.53%
As is apparent from table 3, although the recall of the nested entity recognition model obtained by distillation training is slightly lower than that of the nested entity recognition model obtained by ordinary training, its accuracy and F1 score are significantly higher, so the distillation-trained model has a better overall recognition effect on nested entities. Compared with the teacher model, the nested entity recognition model obtained by distillation training differs only slightly in accuracy, recall, and F1 score, so it achieves a comparable recognition effect on nested entities while running faster.
It is understood that the teacher model may contain other network models besides the BERT network model, and the student model may contain other network models besides the LSTM network model. The teacher model and the student model may include the same model, that is, the teacher model and the student model may each include a BERT network model or an LSTM network model or other network models except the BERT network model and the LSTM network model.
Because the embodiment of the present application can conveniently acquire a large number of second nested entities, the embodiment acquires the label information of each second nested entity based on the teacher model and trains the first network model according to the label information of each second nested entity. In application, when the number of second nested entities is small, any second nested entity can be input into the teacher model, the teacher model outputs logits, and the first network model is trained according to the logits of each second nested entity. When the number of second nested entities is large, any second nested entity is input into the teacher model, the teacher model outputs a predicted probability distribution, and the first network model is trained according to the predicted probability distribution of each second nested entity.
It should be noted that, for any second nested entity, the output information of the fully connected layer of the teacher model is the logits of the second nested entity. After the logits of the second nested entity are processed by a normalization function (such as the Softmax function), the predicted probability distribution of the second nested entity is obtained, that is, the probability that the second nested entity belongs to each entity type. The label information of the second nested entity can be derived from this predicted probability distribution.
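The logits-to-distribution step above can be sketched with the standard Softmax function; the logit values are illustrative:

```python
import math

def softmax(logits):
    """Normalize logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative fully-connected-layer outputs for three entity types.
probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))  # probabilities sum to 1; the largest follows logit 2.0
```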
In the embodiment of the present application, the nested entity recognition model is obtained based on the probability that each character in the first nested entity is a start character of a sub-entity, the first label of each character in the first nested entity, the probability that each character in the first nested entity is an end character of a sub-entity, and the second label of each character in the first nested entity. The nested entity recognition model can therefore accurately recognize, among the characters of a nested entity, the characters that can serve as start characters of sub-entities and the characters that can serve as end characters of sub-entities, so as to combine them into the sub-entities of the nested entity and accurately recognize each sub-entity. The nested entity recognition model is also obtained based on the entity type probability and the third label of each sub-entity of the first nested entity, so it can accurately recognize the entity type of each sub-entity of the nested entity; that is, the nested entity recognition model can accurately recognize the nested entity.
The above method for training the nested entity recognition model according to the embodiment of the present application is explained in detail from the perspective of method steps, and the method for training the nested entity recognition model will be specifically described below with reference to a scenario. In this scenario, the first nested entity is a medical nested entity, and correspondingly, the nested entity recognition model is a medical nested entity recognition model.
Firstly, label information of a plurality of medical nested entities is obtained, wherein the label information of the medical nested entities comprises first labels of all characters in the medical nested entities, second labels of all characters in the medical nested entities and third labels of all sub-entities of the medical nested entities. Please refer to the description of step 201 for details, which are not repeated herein.
Then, the medical nested entity is input into the first network model, and the first network model outputs a medical nested entity recognition result (that is, the first prediction information of the medical nested entity). The first network model is a two-stage model that divides the recognition task for the medical nested entity into two sequential subtasks: sub-entity boundary recognition and sub-entity type recognition. Sub-entity boundary recognition identifies the first probability that each character in the medical nested entity is a start character of a sub-entity and the first probability that each character is an end character of a sub-entity; sub-entity type recognition identifies the first entity type probability of each sub-entity of the medical nested entity.
As shown in fig. 7, fig. 7 is a schematic diagram of identification of medical nested entities according to an embodiment of the present application, where for any medical nested entity, when the first network model identifies the medical nested entity, a boundary of a sub-entity is identified first, a first probability that each character in the medical nested entity is a start character of the sub-entity and a first probability that each character in the medical nested entity is an end character of the sub-entity are identified, and then sub-entity type identification is performed to identify a first entity type probability of each sub-entity in the medical nested entity, so as to obtain a medical nested entity identification result.
Referring to fig. 8, fig. 8 is a schematic view illustrating identification of another medical nesting entity provided in the embodiment of the present application. The first network model includes a first coding network, a second coding network, a sub-entity boundary prediction network, a sub-entity feature determination network, and a sub-entity type prediction network. The first coding network and the sub-entity boundary prediction network are used for executing sub-tasks of sub-entity boundary identification, and the second coding network, the sub-entity characteristic determination network and the sub-entity type prediction network are used for executing sub-tasks of sub-entity type identification.
The medical nested entity is input into a first coding network, character features of all characters in the medical nested entity are determined by the first coding network, and then a sub-entity boundary prediction network determines the probability (namely, a first probability) that all characters are beginning characters of a sub-entity and the probability (namely, the first probability) that all characters are ending characters of the sub-entity based on the character features of all characters, so that sub-entity boundary recognition is achieved.
The medical nested entity is input into a second coding network, character features of all characters in the medical nested entity are determined by the second coding network, then the sub-entity feature determination network determines all sub-entities of the medical nested entity based on the probability that all characters are starting characters of the sub-entities and the probability that all characters are ending characters of the sub-entities, and determines sub-entity features of all sub-entities of the medical nested entity based on the character features of all characters. Thereafter, the sub-entity type prediction network determines an entity type probability (i.e., a first entity type probability) of each sub-entity based on the sub-entity characteristics of each sub-entity.
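As a hedged illustration of the boundary-recognition stage above, the following sketch enumerates candidate sub-entity spans from per-character start and end probabilities. Pairing every predicted start with every predicted end at or after it is one common strategy, not necessarily the exact combination rule of the embodiment, and the thresholds and probabilities are hypothetical:

```python
def candidate_spans(start_probs, end_probs, start_thresh=0.5, end_thresh=0.5):
    """Enumerate (start, end) index pairs of candidate sub-entities.

    A character is a predicted start (end) when its probability exceeds
    the corresponding threshold; every predicted start is paired with
    every predicted end at or after it. Thresholds are illustrative.
    """
    starts = [i for i, p in enumerate(start_probs) if p > start_thresh]
    ends = [j for j, p in enumerate(end_probs) if p > end_thresh]
    return [(i, j) for i in starts for j in ends if j >= i]

# Four-character medical nested entity (illustrative probabilities).
spans = candidate_spans([0.9, 0.1, 0.8, 0.1], [0.2, 0.9, 0.1, 0.95])
print(spans)  # [(0, 1), (0, 3), (2, 3)]
```

Each resulting span would then be passed to the sub-entity type prediction stage.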
And then training the first network model based on the label information of the medical nested entities and the identification result of the medical nested entities to obtain a second network model, and taking the second network model as the nested entity identification model in response to the first condition being met, wherein the detailed description is given in steps 201 to 204, and is not repeated herein.
Based on the foregoing implementation environment, an embodiment of the present application further provides a nested entity identification method, which may be executed by the electronic device 11 in fig. 1, taking a flowchart of the nested entity identification method provided in the embodiment of the present application shown in fig. 9 as an example. As shown in fig. 9, the method includes steps 901 to 903.
Step 901, acquiring a target nested entity.
The embodiment of the application does not limit the acquisition mode and the number of the target nested entities, for example, the target nested entities are media type tags, titles, words in texts and the like of media information, and the number of the target nested entities is one or five.
And step 902, determining the probability that each character in the target nested entity is the starting character of the sub-entity, the probability that each character in the target nested entity is the ending character of the sub-entity and the entity type probability of each sub-entity of the target nested entity according to the nested entity recognition model.
The nested entity recognition model is obtained by training according to the training method of the nested entity recognition model in the optional embodiments.
In the embodiment of the application, the target nested entity is input into the nested entity recognition model, and the nested entity recognition model outputs the probability that each character in the target nested entity is the starting character of the sub-entity, the probability that each character in the target nested entity is the ending character of the sub-entity and the entity type probability of each sub-entity of the target nested entity.
Optionally, determining, according to the nested entity recognition model, a probability that each character in the target nested entity is a start character of a child entity, a probability that each character in the target nested entity is an end character of the child entity, and an entity type probability of each child entity of the target nested entity includes: determining the probability that each character in the target nested entity is the starting character of the sub-entity and the probability that each character in the target nested entity is the ending character of the sub-entity according to the nested entity recognition model; determining each sub-entity of the target nested entity based on the probability that each character in the target nested entity is a start character of the sub-entity and the probability that each character in the target nested entity is an end character of the sub-entity according to the nested entity recognition model; and determining entity type probability of each sub-entity of the target nested entity according to the nested entity recognition model.
In the embodiment of the application, the target nested entity is input into the nested entity recognition model, and the nested entity recognition model extracts the character features of each character in the target nested entity. For any character in the target nested entity, the nested entity recognition model determines and outputs the probability that the character is the beginning character of the sub-entity and the probability that the character is the ending character of the sub-entity based on the character features of the character. Then, the nested entity recognition model determines each sub-entity of the target nested entity based on the probability that each character in the target nested entity is the beginning character of the sub-entity, the probability that each character in the target nested entity is the ending character of the sub-entity, the first probability threshold and the second probability threshold. Then, for any sub-entity of the target nested entity, the nested entity recognition model determines and outputs the entity type probability of the sub-entity based on the sub-entity characteristics of the sub-entity, thereby obtaining the entity type probability of each sub-entity of the target nested entity. The related description is given in step 202, and is not repeated here.
Step 903, determining the entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
In the embodiment of the application, for any sub-entity of the target nested entity, the entity type probability of the sub-entity is the probability that the sub-entity belongs to each entity type. And determining a probability value which is greater than the reference probability from the probabilities of the sub-entities belonging to the entity types, and taking the entity type corresponding to the probability value which is greater than the reference probability as the entity type of the sub-entity. The probability value greater than the reference probability may be a maximum probability value, a maximum probability value and a next maximum probability value, or even a probability value greater than a certain fixed probability (e.g., 0.75). In this way, the entity type of the respective sub-entity can be determined.
In one possible implementation, the target nested entity is a nested entity in the media information; after determining the entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity, the method further comprises the following steps: recommending the media information to the target object in response to the target entity type existing in the entity types of the sub-entities of the target nested entity; and filtering the media information in response to the fact that the target entity type does not exist in the entity types of the sub-entities corresponding to the target nested entity.
In the embodiment of the application, at least one nested entity is extracted from the media type label, title, text, and the like of the media information to obtain a target nested entity. The target nested entity comprises at least two sub-entities, and the entity types of the sub-entities of the target nested entity are obtained according to steps 901 to 903. In response to the target entity type existing in the entity types of the sub-entities of the target nested entity, the media information corresponding to the target nested entity is the expected media information, and the media information is recommended to the target object. In response to the target entity type not existing in the entity types of the sub-entities of the target nested entity, the media information corresponding to the target nested entity is not the expected media information, and the media information is filtered out.
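The recommend-or-filter branch above amounts to partitioning media items by whether any sub-entity carries the target entity type. The data shapes below (media paired with a set of sub-entity types) are illustrative assumptions:

```python
def recall_media(media_items, target_type):
    """Partition (media, sub_entity_types) pairs: recommend media whose
    sub-entity types contain the target type, filter out the rest."""
    recommended, filtered = [], []
    for media, sub_entity_types in media_items:
        if target_type in sub_entity_types:
            recommended.append(media)
        else:
            filtered.append(media)
    return recommended, filtered

items = [("article-1", {"symptom", "site"}), ("article-2", {"drug"})]
print(recall_media(items, "symptom"))  # recommends article-1, filters article-2
```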
By the method, the media information is recommended to the target object or filtered out, so that the media information can be recalled. The method can be applied to various scenarios such as media information search and intelligent question answering, and to various fields such as medical treatment and maps.
In application, the entity type of each sub-entity of the target nested entity can also be determined based on other entity recognition models. The other entity recognition models are not limited herein, and may be, for example, an entity recognition model based on label layering, an entity recognition model based on entity layering, or the like.
Inputting the target nested entity into an entity identification model based on label layering, identifying the entity type of each fine-grained entity of the target nested entity by the model, and obtaining the entity type of the target nested entity by combining the entity types of each fine-grained entity. Wherein the fine-grained entity is the finest-grained entity of the target nested entity.
As shown in fig. 10, fig. 10 is a schematic diagram of the entity type of each fine-grained entity in a target nested entity according to an embodiment of the present application. The target nested entity "lower left abdominal pain" is input into the entity recognition model based on label layering, and the model outputs the entity types of the three fine-grained entities "lower left", "abdomen", and "pain". The entity type of "lower left" is "B-site|B-symptom": "B-site" means that "lower left" can be the starting entity of a sub-entity (e.g., "lower left abdomen") whose entity type is "site", and "B-symptom" means that "lower left" can be the starting entity of another sub-entity (e.g., "lower left abdominal pain") whose entity type is "symptom". The entity type of "abdomen" is "E-site|B-symptom": "E-site" means that "abdomen" can be the ending entity of a sub-entity (e.g., "lower left abdomen") whose entity type is "site", and "B-symptom" means that "abdomen" can be the starting entity of another sub-entity (e.g., "abdominal pain") whose entity type is "symptom". The entity type of "pain" is "S-symptom|E-symptom": "S-symptom" means that "pain" can be a single entity whose entity type is "symptom", and "E-symptom" means that "pain" can be the ending entity of a sub-entity (e.g., "abdominal pain", "lower left abdominal pain") whose entity type is "symptom". Then, the entity types of "lower left", "abdomen", and "pain" are combined to obtain the entity type of "lower left abdominal pain".
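The layered labels in this example can be decoded mechanically. The sketch below pairs each `B-type` with every later `E-type` of the same type and treats `S-type` as a single-token entity; this pairing rule is a simplification for illustration, not the model's actual decoding logic:

```python
def decode_layered_labels(tokens, labels):
    """Decode entities from per-token labels such as "B-site|B-symptom"
    (layers joined by "|"). Each E-type closes every open B-type of the
    same type, which reproduces nested entities that share one end token."""
    entities = []
    open_starts = {}  # entity type -> start indices of unclosed B- tokens
    for i, label in enumerate(labels):
        for part in label.split("|"):
            prefix, etype = part.split("-")
            if prefix == "B":
                open_starts.setdefault(etype, []).append(i)
            elif prefix == "E":
                for start in open_starts.pop(etype, []):
                    entities.append(("".join(tokens[start:i + 1]), etype))
            elif prefix == "S":
                entities.append((tokens[i], etype))
    return entities

tokens = ["左下", "腹", "痛"]  # "lower left", "abdomen", "pain"
labels = ["B-site|B-symptom", "E-site|B-symptom", "S-symptom|E-symptom"]
print(decode_layered_labels(tokens, labels))
```

On this input the decoder recovers "左下腹" (site), "痛" (symptom), "左下腹痛" (symptom), and "腹痛" (symptom), matching the combinations described above.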
Inputting a target nested entity into an entity identification model based on entity layering, determining entity type probability of each fine-grained entity of the target nested entity by the model, combining at least two fine-grained entities to obtain a first combined entity, and determining entity type probability of the first combined entity based on the entity type probability of each of the at least two fine-grained entities. The method can also be used for combining at least two first combined entities to obtain a second combined entity, and determining the entity type probability of the second combined entity based on the entity type probabilities of the at least two first combined entities. By the method, entity type probabilities of the fine-grained entities, the first combined entity, the second combined entity, the target nested entity and the like in the target nested entity are determined, namely, the entity type probabilities of the sub-entities with different granularities from low granularity to high granularity of the target nested entity are determined, and finally the entity type probability of the target nested entity is obtained. And then, obtaining the entity type of the target nested entity based on the entity type probability of the target nested entity. The entity identification model based on entity layering is a single-granularity entity identification model, and the entity type of the target nested entity can be identified.
As shown in fig. 11, fig. 11 is a schematic diagram of recognition by a single-granularity entity recognition model according to an embodiment of the present application. The single-granularity entity recognition model recognizes the entity type probability of the target nested entity "lower left abdominal pain" as follows. First, "lower left abdominal pain" is divided into the fine-grained entities "lower left", "abdomen", and "pain", and the entity type probability of each fine-grained entity is determined. Then, the fine-grained entities "lower left" and "abdomen" are combined to obtain the first combined entity "lower left abdomen", and the entity type probability of the first combined entity "lower left abdomen" is determined based on the entity type probabilities of the fine-grained entities "lower left" and "abdomen"; the fine-grained entities "abdomen" and "pain" are combined to obtain the first combined entity "abdominal pain", and the entity type probability of the first combined entity "abdominal pain" is determined based on the entity type probabilities of the fine-grained entities "abdomen" and "pain". Then, the first combined entities "lower left abdomen" and "abdominal pain" are combined to obtain the second combined entity (i.e., the target nested entity) "lower left abdominal pain", and the entity type probability of the second combined entity "lower left abdominal pain" is determined based on the entity type probabilities of the first combined entities "lower left abdomen" and "abdominal pain", thereby obtaining the entity type probability of the target nested entity. Then, the entity type of the target nested entity is obtained based on the entity type probability of the target nested entity.
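The bottom-up combination in fig. 11 can be sketched as follows. The real model learns how to fuse child probabilities; the elementwise-maximum combiner and all probability values here are stand-in assumptions for illustration only:

```python
def combine(prob_a, prob_b):
    """Stand-in combiner: elementwise maximum of two entity-type
    probability dicts (replacing the model's learned fusion)."""
    return {t: max(prob_a.get(t, 0.0), prob_b.get(t, 0.0))
            for t in set(prob_a) | set(prob_b)}

# Fine-grained entity probabilities (made-up numbers).
lower_left = {"site": 0.6, "symptom": 0.3}   # "左下"
abdomen    = {"site": 0.8, "symptom": 0.4}   # "腹"
pain       = {"site": 0.1, "symptom": 0.9}   # "痛"

# First combined entities.
lower_left_abdomen = combine(lower_left, abdomen)  # "左下腹"
abdominal_pain     = combine(abdomen, pain)        # "腹痛"

# Second combined entity, i.e. the target nested entity "左下腹痛".
target = combine(lower_left_abdomen, abdominal_pain)
print(max(target, key=target.get))  # "symptom"
```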
In the embodiment of the application, the entity types of all the sub-entities of the target nested entity are determined jointly based on the nested entity recognition model and other entity recognition models, and the media information is recommended based on the entity types of the sub-entities of the target nested entity. Because the sub-entities are entities of different granularities, and two models are adopted for entity recognition of the target nested entity, the media information can be recalled based on the entity types of multi-granularity entities, different requirements on recall granularity under different amounts or different types of media information can be met, and the recall quantity of the media information is increased. Moreover, determining the entity types of the sub-entities of the target nested entity jointly based on the nested entity recognition model of the embodiment of the present application and other entity recognition models adds only a small amount of running time; for example, at a query rate of 100 queries per second (QPS), the running time increases from 1.83 milliseconds to 2.6 milliseconds, an increment of only 0.77 milliseconds.
The nested entity recognition model in the embodiment of the application is obtained based on the probability that each character in the first nested entity is the starting character of the sub-entity, the first label of each character in the first nested entity, the probability that each character in the first nested entity is the ending character of the sub-entity and the second label of each character in the first nested entity, so that the nested entity recognition model can accurately recognize the character which can be used as the starting character of the sub-entity and the character which can be used as the ending character of the sub-entity from each character of the nested entity so as to combine each sub-entity of the nested entity, and each sub-entity of the nested entity can be accurately recognized. The nested entity recognition model is obtained based on the entity type probability and the third label of each sub-entity of the first nested entity, so that the nested entity recognition model can accurately recognize the entity type of each sub-entity of the nested entity, that is, the nested entity recognition model can accurately recognize the nested entity.
Fig. 12 is a schematic structural diagram of a training apparatus for a nested entity recognition model according to an embodiment of the present application, and as shown in fig. 12, the apparatus includes:
an obtaining module 1201, configured to obtain tag information of a plurality of first nested entities, where the tag information of the first nested entities includes a first tag of each character in the first nested entity, a second tag of each character in the first nested entity, and a third tag of each sub-entity of the first nested entity, where the first tag of a character represents whether the character is a start character of the sub-entity, the second tag of the character represents whether the character is an end character of the sub-entity, and the third tag of the sub-entity represents an entity type of the sub-entity;
a determining module 1202, configured to determine, for any first nested entity, first prediction information of any first nested entity according to the first network model, where the first prediction information of any first nested entity includes a first probability that each character in any first nested entity is a beginning character of a child entity, a first probability that each character in any first nested entity is an ending character of the child entity, and a first entity type probability of each child entity of any first nested entity;
a training module 1203, configured to train the first network model based on the label information and the first prediction information of the multiple first nested entities to obtain a second network model;
the determining module 1202 is further configured to treat the second network model as a nested entity recognition model in response to the first condition being satisfied, the nested entity recognition model being configured to recognize an entity type of the nested entity.
In a possible implementation manner, the determining module 1202 is configured to determine, according to the first network model, a first probability that each character in any first nested entity is a start character of a child entity and a first probability that each character in any first nested entity is an end character of the child entity; determining, according to a first network model, each child entity of any first nested entity based on a first probability that each character in any first nested entity is a beginning character of the child entity and a first probability that each character in any first nested entity is an ending character of the child entity; first entity type probabilities for respective sub-entities of any of the first nested entities are determined according to a first network model.
In one possible implementation, the determining module 1202 is configured to determine, for any sub-entity of any first nested entity, a sub-entity characteristic of any sub-entity based on a character characteristic of a start character of any sub-entity and a character characteristic of an end character of any sub-entity; a first entity type probability is determined for any sub-entity based on the sub-entity characteristics of any sub-entity.
In a possible implementation manner, the determining module 1202 is configured to determine a character relation feature based on a character feature of a start character of any sub-entity and a character feature of an end character of any sub-entity, where the character relation feature is used to characterize a character relation between the start character of any sub-entity and the end character of any sub-entity; and determining the child entity characteristics of any child entity based on the character characteristics of the starting character of any child entity, the character characteristics of the ending character of any child entity and the character relation characteristics.
In one possible implementation, the character relationship features include character difference features for characterizing character differences between a beginning character of any of the sub-entities and an ending character of any of the sub-entities;
the determining module 1202 is configured to determine a difference between a character feature of a start character of any sub-entity and a character feature of an end character of any sub-entity, so as to obtain a character difference feature.
In one possible implementation, the character relation features include character similarity features, and the character similarity features are used for representing the character similarity between the starting character of any sub-entity and the ending character of any sub-entity;
the determining module 1202 is configured to determine a dot product between a character feature of a start character of any one of the sub-entities and a character feature of an end character of any one of the sub-entities, to obtain a character similarity feature.
In a possible implementation manner, the training module 1203 is configured to determine, for any first nested entity, a loss value of any first nested entity based on the label information and the first prediction information of any first nested entity; and training the first network model based on the loss values of the plurality of first nested entities to obtain a second network model.
In a possible implementation manner, the training module 1203 is configured to determine a first loss value of any first nested entity based on the first label of each character in any first nested entity and the first probability that each character in any first nested entity is a starting character of a sub-entity; determining a second loss value of any first nested entity based on the second label of each character in any first nested entity and the first probability that each character in any first nested entity is the ending character of the sub-entity; determining a third loss value for any of the first nested entities based on the third label of each sub-entity of any of the first nested entities and the first entity type probability of each sub-entity of any of the first nested entities; determining a penalty value for any first nested entity based on the first penalty value, the second penalty value, and the third penalty value for any first nested entity.
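A minimal sketch of this loss composition, assuming binary cross-entropy for the start/end character labels and for one-vs-rest entity-type labels, with the three terms weighted equally (both the loss form and the equal weighting are assumptions):

```python
import math

def bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(labels, probs)) / len(labels)

def nested_entity_loss(start_labels, start_probs,
                       end_labels, end_probs,
                       type_labels, type_probs):
    """First loss (start characters) + second loss (end characters)
    + third loss (sub-entity types), summed with equal weights."""
    return (bce(start_labels, start_probs)
            + bce(end_labels, end_probs)
            + bce(type_labels, type_probs))

# Near-perfect predictions give a small loss; poor ones give a larger one.
good = nested_entity_loss([1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1], [0.95])
bad  = nested_entity_loss([1, 0], [0.5, 0.5], [0, 1], [0.5, 0.5], [1], [0.50])
print(good < bad)  # True
```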
In a possible implementation manner, the determining module 1202 is further configured to determine, for any first nested entity, in response to that the first condition is not satisfied, second prediction information of any first nested entity according to the second network model, where the second prediction information of any first nested entity includes a second probability that each character in any first nested entity is a start character of a child entity, a second probability that each character in any first nested entity is an end character of the child entity, and a second entity type probability of each child entity of any first nested entity;
the training module 1203 is further configured to train the second network model based on the label information and the second prediction information of the plurality of first nested entities to obtain a third network model;
the determining module 1202 is further configured to treat the third network model as a nested entity recognition model in response to the first condition being satisfied.
In a possible implementation manner, the obtaining module 1201 is further configured to obtain tag information of a plurality of second nested entities, where the tag information of the second nested entities includes a first tag of each character in the second nested entity, a second tag of each character in the second nested entity, and a third tag of each sub-entity of the second nested entity;
the determining module 1202 is further configured to determine, for any second nested entity, first prediction information of any second nested entity according to the first network model, where the first prediction information of any second nested entity includes a first probability that each character in any second nested entity is a start character of a child entity, a first probability that each character in any second nested entity is an end character of the child entity, and a first entity type probability of each child entity of any second nested entity;
the training module 1203 is further configured to train the first network model based on the label information and the first prediction information of the plurality of first nested entities, and the label information and the first prediction information of the plurality of second nested entities, so as to obtain a second network model.
In a possible implementation manner, the determining module 1202 is further configured to determine, for any first nested entity, third prediction information of any first nested entity according to the fourth network model, where the third prediction information of any first nested entity includes a third probability that each character in any first nested entity is a start character of a child entity, a third probability that each character in any first nested entity is an end character of the child entity, and a third entity type probability of each child entity of any first nested entity;
the training module 1203 is further configured to train the fourth network model based on the label information and the third prediction information of the plurality of first nested entities to obtain a fifth network model;
a determining module 1202, further configured to, in response to a second condition being met, treat the fifth network model as a teacher model;
an obtaining module 1201, configured to obtain tag information of a plurality of second nested entities based on the teacher model.
In one possible implementation, the teacher model includes a transformer-based bi-directional encoder representation network model, and the nested entity recognition model includes a long-short term memory network model.
The device obtains the nested entity recognition model based on the probability that each character in the first nested entity is the starting character of the sub-entity, the first label of each character in the first nested entity, the probability that each character in the first nested entity is the ending character of the sub-entity and the second label of each character in the first nested entity, so that the nested entity recognition model can accurately recognize the character which can be used as the starting character of the sub-entity and the character which can be used as the ending character of the sub-entity from each character of the nested entity so as to combine each sub-entity of the nested entity, and each sub-entity of the nested entity is accurately recognized. The nested entity recognition model is obtained based on the entity type probability and the third label of each sub-entity of the first nested entity, so that the nested entity recognition model can accurately recognize the entity type of each sub-entity of the nested entity, that is, the nested entity recognition model can accurately recognize the nested entity.
It should be understood that, when the apparatus provided in fig. 12 implements its functions, it is only illustrated by the division of the functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 13 is a schematic structural diagram of a nested entity identifying device according to an embodiment of the present application, and as shown in fig. 13, the device includes:
an obtaining module 1301, configured to obtain a target nested entity;
a determining module 1302, configured to determine, according to a nested entity recognition model, a probability that each character in a target nested entity is a start character of a sub-entity, a probability that each character in the target nested entity is an end character of the sub-entity, and an entity type probability of each sub-entity of the target nested entity, where the nested entity recognition model is obtained by training according to any one of the above training methods of the nested entity recognition model;
a determining module 1302, configured to determine an entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
In a possible implementation manner, the determining module 1302 is configured to determine, according to the nested entity recognition model, a probability that each character in the target nested entity is a start character of the sub-entity and a probability that each character in the target nested entity is an end character of the sub-entity;
determining each sub-entity of the target nested entity based on the probability that each character in the target nested entity is a start character of the sub-entity and the probability that each character in the target nested entity is an end character of the sub-entity according to the nested entity recognition model;
and determining entity type probability of each sub-entity of the target nested entity according to the nested entity recognition model.
In one possible implementation, the target nested entity is a nested entity in the media information;
the device still includes:
the recommendation module is used for responding to the existence of a target entity type in the entity types of all the sub-entities of the target nested entity and recommending the media information to the target object;
and the filtering module is used for filtering the media information in response to the fact that the target entity type does not exist in the entity types of the sub-entities corresponding to the target nested entity.
The nested entity recognition model in the device is obtained based on the probability that each character in the first nested entity is the starting character of the sub-entity, the first label of each character in the first nested entity, the probability that each character in the first nested entity is the ending character of the sub-entity and the second label of each character in the first nested entity, so that the nested entity recognition model can accurately recognize the character which can be used as the starting character of the sub-entity and the character which can be used as the ending character of the sub-entity from each character of the nested entity so as to combine each sub-entity of the nested entity, and each sub-entity of the nested entity can be accurately recognized. The nested entity recognition model is obtained based on the entity type probability and the third label of each sub-entity of the first nested entity, so that the nested entity recognition model can accurately recognize the entity type of each sub-entity of the nested entity, that is, the nested entity recognition model can accurately recognize the nested entity.
It should be understood that, when the apparatus provided in fig. 13 implements its functions, it is only illustrated by the division of the functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 14 shows a block diagram of a terminal device 1400 according to an exemplary embodiment of the present application. The terminal device 1400 may be a portable mobile terminal, such as: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal device 1400 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.
In general, terminal device 1400 includes: a processor 1401, and a memory 1402.
Processor 1401 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 1401 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, processor 1401 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1402 may include one or more computer-readable storage media, which may be non-transitory. Memory 1402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1402 is used to store at least one instruction for execution by processor 1401 to implement a method of training a nested entity recognition model or a method of nested entity recognition provided by method embodiments herein.
In some embodiments, terminal device 1400 may further optionally include: a peripheral device interface 1403 and at least one peripheral device. The processor 1401, the memory 1402, and the peripheral device interface 1403 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1403 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1404, a display 1405, a camera assembly 1406, audio circuitry 1407, a positioning assembly 1408, and a power supply 1409.
The peripheral device interface 1403 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1401 and the memory 1402. In some embodiments, the processor 1401, memory 1402, and peripheral interface 1403 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1401, the memory 1402, and the peripheral device interface 1403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 1404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1404 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 1404 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1404 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1404 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, the display screen 1405 also has the ability to capture touch signals at or above the surface of the display screen 1405. The touch signal may be input to the processor 1401 for processing as a control signal. At this point, the display 1405 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 1405 may be one, disposed on the front panel of the terminal device 1400; in other embodiments, the display 1405 may be at least two, and is disposed on different surfaces of the terminal 1400 or in a foldable design; in other embodiments, the display 1405 may be a flexible display disposed on a curved surface or a folded surface of the terminal device 1400. Even further, the display 1405 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display 1405 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 1406 is used to capture images or video. Optionally, camera assembly 1406 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1406 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1407 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 1401 for processing or to the radio frequency circuit 1404 to realize voice communication. For stereo sound collection or noise reduction, multiple microphones may be provided at different positions of the terminal device 1400. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1401 or the radio frequency circuit 1404 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1407 may also include a headphone jack.
The positioning component 1408 is used to locate the current geographic location of the terminal device 1400 for navigation or LBS (Location Based Service). The positioning component 1408 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1409 is used to supply power to the various components in the terminal device 1400. The power supply 1409 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 1409 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, terminal device 1400 also includes one or more sensors 1410. The one or more sensors 1410 include, but are not limited to: acceleration sensor 1411, gyroscope sensor 1412, pressure sensor 1413, fingerprint sensor 1414, optical sensor 1415, and proximity sensor 1416.
The acceleration sensor 1411 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal device 1400. For example, the acceleration sensor 1411 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1401 can control the display screen 1405 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1411. The acceleration sensor 1411 may also be used to collect motion data of a game or of the user.
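As an illustration of the landscape/portrait decision described above, one possible rule can be sketched as follows. This sketch is not taken from the patent; the axis convention and the simple magnitude comparison are assumptions.

```python
# Illustrative sketch (assumptions, not from the patent): pick the display
# orientation from the gravity components on the x and y axes of the
# device coordinate system.

def choose_orientation(gx: float, gy: float) -> str:
    """Return 'portrait' when gravity lies mainly along the y axis,
    'landscape' when it lies mainly along the x axis."""
    return "portrait" if abs(gy) >= abs(gx) else "landscape"

choose_orientation(0.1, 9.7)   # device held upright: 'portrait'
choose_orientation(9.6, 0.3)   # device on its side: 'landscape'
```

A real implementation would additionally debounce the signal so that the UI does not flip at the 45-degree boundary.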
The gyro sensor 1412 may detect a body direction and a rotation angle of the terminal device 1400, and the gyro sensor 1412 and the acceleration sensor 1411 may cooperate to collect a 3D motion of the user on the terminal device 1400. The processor 1401 can realize the following functions according to the data collected by the gyro sensor 1412: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 1413 may be disposed on a side frame of the terminal device 1400 and/or beneath the display screen 1405. When the pressure sensor 1413 is disposed on the side frame of the terminal device 1400, a holding signal of the user on the terminal device 1400 can be detected, and the processor 1401 performs left/right-hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 1413. When the pressure sensor 1413 is disposed beneath the display screen 1405, the processor 1401 controls operability controls on the UI according to the user's pressure operation on the display screen 1405. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1414 is used to collect the user's fingerprint, and the processor 1401 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1414, or the fingerprint sensor 1414 identifies the user's identity according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1401 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1414 may be disposed on the front, back, or side of the terminal device 1400. When a physical button or vendor logo is provided on the terminal device 1400, the fingerprint sensor 1414 may be integrated with the physical button or vendor logo.
The optical sensor 1415 is used to collect the ambient light intensity. In one embodiment, the processor 1401 may control the display brightness of the display screen 1405 according to the ambient light intensity collected by the optical sensor 1415: when the ambient light intensity is high, the display brightness of the display screen 1405 is increased; when the ambient light intensity is low, the display brightness of the display screen 1405 is reduced. In another embodiment, the processor 1401 may also dynamically adjust the shooting parameters of the camera assembly 1406 according to the ambient light intensity collected by the optical sensor 1415.
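The brightness adjustment described here amounts to mapping ambient light onto a brightness level. A minimal sketch follows; the lux thresholds, the linear mapping, and the brightness floor are all assumptions, not values from the patent.

```python
# Illustrative sketch (assumptions, not from the patent): map ambient
# light intensity (lux) linearly onto a 0.0-1.0 display brightness level.

def display_brightness(ambient_lux: float, lo: float = 10.0, hi: float = 1000.0) -> float:
    """Brighter ambient light -> higher display brightness, clamped at
    both ends, with a floor so the screen stays readable in the dark."""
    if ambient_lux <= lo:
        return 0.2
    if ambient_lux >= hi:
        return 1.0
    # Linear interpolation between the floor and full brightness.
    return 0.2 + 0.8 * (ambient_lux - lo) / (hi - lo)
```

A usage example: `display_brightness(505)` falls exactly halfway between the thresholds and yields 0.6.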
The proximity sensor 1416, also called a distance sensor, is usually arranged on the front panel of the terminal device 1400. The proximity sensor 1416 is used to collect the distance between the user and the front face of the terminal device 1400. In one embodiment, when the proximity sensor 1416 detects that the distance between the user and the front face of the terminal device 1400 is gradually decreasing, the processor 1401 controls the display screen 1405 to switch from the screen-on state to the screen-off state; when the proximity sensor 1416 detects that the distance between the user and the front face of the terminal device 1400 is gradually increasing, the processor 1401 controls the display screen 1405 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is not limiting of terminal device 1400 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Fig. 15 is a schematic structural diagram of a server 1500 according to an embodiment of the present application. The server 1500 may vary considerably in configuration or performance, and may include one or more processors 1501 (for example, CPUs) and one or more memories 1502. The one or more memories 1502 store at least one program code, and the at least one program code is loaded and executed by the one or more processors 1501 to implement the training method of the nested entity recognition model or the nested entity recognition method provided by the foregoing method embodiments. Of course, the server 1500 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 1500 may further include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer readable storage medium is further provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor, so as to enable an electronic device to implement any one of the above-mentioned training method for a nested entity recognition model or the nested entity recognition method.
Alternatively, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program or a computer program product is further provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement any one of the above-mentioned training method for a nested entity recognition model or the nested entity recognition method.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (20)

1. A method of training a nested entity recognition model, the method comprising:
acquiring label information of a plurality of first nested entities, wherein the label information of the first nested entities comprises a first label of each character in the first nested entity, a second label of each character in the first nested entity and a third label of each sub-entity of the first nested entity, the first label of the character represents whether the character is a start character of the sub-entity, the second label of the character represents whether the character is an end character of the sub-entity, and the third label of the sub-entity represents an entity type of the sub-entity;
for any first nested entity, determining first prediction information of the any first nested entity according to a first network model, wherein the first prediction information of the any first nested entity comprises a first probability that each character in the any first nested entity is a beginning character of a sub-entity, a first probability that each character in the any first nested entity is an ending character of the sub-entity, and a first entity type probability of each sub-entity of the any first nested entity;
training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model;
in response to a first condition being met, treating the second network model as a nested entity identification model for identifying an entity type of a nested entity.
2. The method of claim 1, wherein determining the first prediction information for any of the first nested entities based on the first network model comprises:
determining a first probability that each character in any one first nesting entity is a beginning character of a sub-entity and a first probability that each character in any one first nesting entity is an ending character of the sub-entity according to a first network model;
determining, according to the first network model, respective sub-entities of the any first nested entity based on a first probability that respective characters in the any first nested entity are a beginning character of a sub-entity and a first probability that respective characters in the any first nested entity are an ending character of a sub-entity;
determining a first entity type probability for each sub-entity of said any first nested entity based on said first network model.
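Outside the claim language, the boundary-pairing step of claim 2 can be sketched as follows: every character whose start probability clears a threshold is paired with every character at or after it whose end probability clears the threshold. The threshold value and all names are assumptions, not part of the claim.

```python
# Hypothetical sketch of claim 2's sub-entity enumeration. The 0.5
# threshold is an assumption; the patent does not fix one.

def candidate_sub_entities(start_probs, end_probs, threshold=0.5):
    """Return (start_index, end_index) spans whose boundary
    probabilities both reach the threshold."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    # A valid sub-entity must end at or after the character it starts on.
    return [(i, j) for i in starts for j in ends if j >= i]

spans = candidate_sub_entities(
    start_probs=[0.9, 0.1, 0.8, 0.2],
    end_probs=[0.1, 0.95, 0.2, 0.7],
)
# Starts 0 and 2 pair with later ends 1 and 3: [(0, 1), (0, 3), (2, 3)]
```

Nested sub-entities arise naturally here: span (0, 3) contains span (2, 3), which a flat NER tagger could not output.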
3. The method of claim 2, wherein determining a first entity type probability for each sub-entity of any of the first nested entities comprises:
for any sub-entity of said any first nested entity, determining a sub-entity characteristic of said any sub-entity based on a character characteristic of a beginning character of said any sub-entity and a character characteristic of an ending character of said any sub-entity;
determining a first entity type probability for said any sub-entity based on sub-entity characteristics of said any sub-entity.
4. The method of claim 3, wherein said determining the sub-entity characteristics of said any sub-entity based on the character characteristics of the beginning character of said any sub-entity and the character characteristics of the ending character of said any sub-entity comprises:
determining a character relation characteristic based on the character characteristic of the starting character of any sub-entity and the character characteristic of the ending character of any sub-entity, wherein the character relation characteristic is used for representing the character relation between the starting character of any sub-entity and the ending character of any sub-entity;
and determining the child entity characteristics of any child entity based on the character characteristics of the starting character of any child entity, the character characteristics of the ending character of any child entity and the character relation characteristics.
5. The method according to claim 4, wherein the character relation feature comprises a character difference feature for characterizing a character difference between a start character of the any sub-entity and an end character of the any sub-entity;
the determining character relation characteristics based on the character characteristics of the start character of any sub-entity and the character characteristics of the end character of any sub-entity comprises:
and determining the difference value between the character characteristic of the starting character of any sub-entity and the character characteristic of the ending character of any sub-entity to obtain the character difference characteristic.
6. The method according to claim 4, wherein the character relation feature comprises a character similarity feature for characterizing character similarity between a start character of the any sub-entity and an end character of the any sub-entity;
the determining character relation characteristics based on the character characteristics of the start character of any sub-entity and the character characteristics of the end character of any sub-entity comprises:
and determining dot products between the character features of the starting character of any sub-entity and the character features of the ending character of any sub-entity to obtain character similarity features.
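A minimal sketch of the feature construction in claims 4 to 6: the sub-entity feature combines the start-character feature, the end-character feature, their element-wise difference (the character difference feature of claim 5), and their dot product (the character similarity feature of claim 6). The concatenation order and the use of plain Python lists are assumptions; an implementation would typically use tensors.

```python
# Hypothetical sketch of claims 4-6. Feature dimensions and the
# concatenation order are assumptions, not fixed by the claims.

def sub_entity_feature(start_feat, end_feat):
    """Combine start/end character features with their relation features."""
    diff = [s - e for s, e in zip(start_feat, end_feat)]    # claim 5: difference feature
    sim = sum(s * e for s, e in zip(start_feat, end_feat))  # claim 6: dot-product similarity
    # Claim 4: sub-entity feature from start feature, end feature,
    # and the character relation features.
    return start_feat + end_feat + diff + [sim]

feat = sub_entity_feature([1.0, 2.0], [0.5, 1.0])
# diff = [0.5, 1.0]; sim = 0.5 + 2.0 = 2.5
# feat = [1.0, 2.0, 0.5, 1.0, 0.5, 1.0, 2.5]
```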
7. The method of claim 1, wherein training the first network model based on the label information and the first prediction information of the first plurality of nested entities to obtain a second network model comprises:
for any first nested entity, determining a loss value of the any first nested entity based on the tag information and the first prediction information of the any first nested entity;
and training the first network model based on the loss values of the plurality of first nested entities to obtain a second network model.
8. The method of claim 7, wherein determining the loss value of any first nested entity based on the tag information and the first prediction information of the any first nested entity comprises:
determining a first loss value for any first nested entity based on a first tag of each character in the any first nested entity and a first probability that each character in the any first nested entity is a starting character of a child entity;
determining a second loss value of any first nested entity based on the second label of each character in any first nested entity and the first probability that each character in any first nested entity is an end character of a sub-entity;
determining a third loss value for any of the first nested entities based on the third label of each sub-entity of the any first nested entity and the first entity type probability of each sub-entity of the any first nested entity;
determining a penalty value for the any first nested entity based on the first penalty value, the second penalty value, and the third penalty value for the any first nested entity.
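The per-entity loss of claim 8 combines three terms: a start-character loss, an end-character loss, and an entity-type loss. A sketch follows; the use of binary cross-entropy and equal, unweighted summation are assumptions, since the claim fixes neither the loss function nor the combination weights.

```python
# Hypothetical sketch of claim 8's loss combination. BCE and equal
# weighting are assumptions.
import math

def bce(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, probs)
    ) / len(labels)

def nested_entity_loss(start_labels, start_probs,
                       end_labels, end_probs,
                       type_labels, type_probs):
    first = bce(start_labels, start_probs)   # first loss value (start characters)
    second = bce(end_labels, end_probs)      # second loss value (end characters)
    third = bce(type_labels, type_probs)     # third loss value (entity types)
    return first + second + third
```

With perfect predictions all three terms vanish, so the combined loss is zero; any mispredicted boundary or type pushes it above zero.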
9. The method of claim 1, further comprising:
in response to the first condition not being satisfied, for any first nested entity, determining second prediction information for the any first nested entity from the second network model, the second prediction information for the any first nested entity including a second probability that each character in the any first nested entity is a beginning character of a child entity, a second probability that each character in the any first nested entity is an ending character of a child entity, and a second entity type probability for each child entity of the any first nested entity;
training the second network model based on the label information and the second prediction information of the plurality of first nested entities to obtain a third network model;
in response to the first condition being satisfied, treating the third network model as the nested entity recognition model.
10. The method according to any one of claims 1 to 9, wherein before the training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain the second network model, the method further comprises:
acquiring label information of a plurality of second nested entities, wherein the label information of the second nested entities comprises first labels of all characters in the second nested entities, second labels of all characters in the second nested entities and third labels of all sub-entities of the second nested entities;
for any second nested entity, determining first prediction information of the any second nested entity according to the first network model, wherein the first prediction information of the any second nested entity comprises a first probability that each character in the any second nested entity is a beginning character of a sub-entity, a first probability that each character in the any second nested entity is an ending character of the sub-entity, and a first entity type probability of each sub-entity of the any second nested entity;
the training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model, including:
and training the first network model based on the label information and the first prediction information of the plurality of first nested entities and the label information and the first prediction information of the plurality of second nested entities to obtain a second network model.
11. The method of claim 10, wherein before obtaining the tag information of the second plurality of nested entities, further comprising:
for any first nested entity, determining third prediction information of the any first nested entity according to a fourth network model, wherein the third prediction information of the any first nested entity comprises a third probability that each character in the any first nested entity is a beginning character of a sub-entity, a third probability that each character in the any first nested entity is an ending character of the sub-entity, and a third entity type probability of each sub-entity of the any first nested entity;
training the fourth network model based on the label information and the third prediction information of the plurality of first nested entities to obtain a fifth network model;
in response to a second condition being met, treating the fifth network model as a teacher model;
the acquiring tag information of a plurality of second nested entities includes:
tag information for a plurality of second nested entities is obtained based on the teacher model.
12. The method of claim 11, wherein the teacher model comprises a transformer-based bi-directional encoder representation network model and the nested entity recognition model comprises a long-short term memory network model.
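Claims 10 to 12 describe a distillation-style setup: a heavy teacher (the transformer-based bidirectional encoder of claim 12, i.e. a BERT-style model) labels additional nested entities, and the lighter LSTM student is trained on the union of human labels and teacher labels. The teacher-labeling step can be sketched as follows; all names, the confidence filter, and the toy teacher are assumptions for illustration only.

```python
# Hypothetical sketch of the teacher-labeling step of claim 11.
# The confidence threshold is an assumption; the claim does not fix one.

def pseudo_label(teacher_predict, unlabeled, threshold=0.9):
    """Keep only teacher-produced labels whose confidence clears the
    threshold; these become extra training data for the student."""
    kept = []
    for text in unlabeled:
        label, confidence = teacher_predict(text)
        if confidence >= threshold:
            kept.append((text, label))
    return kept

# Toy stand-in for the BERT-based teacher of claim 12.
def toy_teacher(text):
    return ("PER", 0.95) if text == "anna smith" else ("ORG", 0.40)

extra = pseudo_label(toy_teacher, ["anna smith", "acme corp"])
# Only the confident prediction survives: [("anna smith", "PER")]
```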
13. A method of nested entity identification, the method comprising:
acquiring a target nested entity;
determining the probability that each character in the target nested entity is the beginning character of a sub-entity, the probability that each character in the target nested entity is the ending character of the sub-entity and the entity type probability of each sub-entity of the target nested entity according to a nested entity recognition model, wherein the nested entity recognition model is obtained by training according to the training method of the nested entity recognition model of any one of claims 1 to 11;
determining an entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
14. The method of claim 13, wherein determining the probability that each character in the target nested entity is a beginning character of a child entity, the probability that each character in the target nested entity is an ending character of a child entity, and the entity type probability of each child entity of the target nested entity according to a nested entity recognition model comprises:
determining the probability that each character in the target nested entity is the starting character of the sub-entity and the probability that each character in the target nested entity is the ending character of the sub-entity according to the nested entity recognition model;
determining each sub-entity of the target nested entity based on the probability that each character in the target nested entity is a start character of the sub-entity and the probability that each character in the target nested entity is an end character of the sub-entity according to the nested entity recognition model;
and determining entity type probability of each sub-entity of the target nested entity according to the nested entity recognition model.
15. The method of claim 13, wherein the target nested entity is a nested entity in media information;
after determining the entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity, the method further includes:
recommending the media information to a target object in response to the existence of a target entity type in the entity types of the sub-entities of the target nested entity;
and filtering the media information in response to the target entity type not existing in the entity types of the sub-entities of the target nested entity.
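The recommend/filter branch of claim 15 reduces to a set-membership check on the recognized entity types. A sketch, with function and label names chosen for illustration:

```python
# Hypothetical sketch of claim 15's recommend/filter decision.

def route_media(sub_entity_types, target_types):
    """Recommend the media information when any sub-entity carries a
    target entity type; otherwise filter it out."""
    return "recommend" if set(sub_entity_types) & set(target_types) else "filter"

route_media(["game", "person"], {"game"})   # -> "recommend"
route_media(["place"], {"game"})            # -> "filter"
```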
16. An apparatus for training a nested entity recognition model, the apparatus comprising:
an obtaining module, configured to obtain tag information of a plurality of first nested entities, where the tag information of a first nested entity includes a first tag of each character in the first nested entity, a second tag of each character in the first nested entity, and a third tag of each sub-entity of the first nested entity, where the first tag of the character represents whether the character is a start character of the sub-entity, the second tag of the character represents whether the character is an end character of the sub-entity, and the third tag of the sub-entity represents an entity type of the sub-entity;
a determining module, configured to determine, for any first nested entity, first prediction information of the any first nested entity according to a first network model, where the first prediction information of the any first nested entity includes a first probability that each character in the any first nested entity is a start character of a child entity, a first probability that each character in the any first nested entity is an end character of a child entity, and a first entity type probability of each child entity of the any first nested entity;
the training module is used for training the first network model based on the label information and the first prediction information of the plurality of first nested entities to obtain a second network model;
the determining module is further configured to treat the second network model as a nested entity identification model in response to a first condition being met, the nested entity identification model being configured to identify an entity type of a nested entity.
17. An apparatus for nested entity identification, the apparatus comprising:
the acquisition module is used for acquiring a target nested entity;
a determining module, configured to determine, according to a nested entity recognition model, a probability that each character in the target nested entity is a start character of a sub-entity, a probability that each character in the target nested entity is an end character of a sub-entity, and an entity type probability of each sub-entity of the target nested entity, where the nested entity recognition model is obtained by training according to the training method of the nested entity recognition model according to any one of claims 1 to 11;
the determining module is configured to determine an entity type of each sub-entity of the target nested entity based on the entity type probability of each sub-entity of the target nested entity.
18. An electronic device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to cause the electronic device to implement the method for training a nested entity recognition model according to any one of claims 1 to 12 or to implement the method for nested entity recognition according to any one of claims 13 to 15.
19. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor, to cause a computer to implement a method of training a nested entity recognition model according to any one of claims 1 to 12 or to implement a method of nested entity recognition according to any one of claims 13 to 15.
20. A computer program product having stored therein at least one computer instruction which is loaded and executed by a processor to cause a computer to implement a method of training a nested entity recognition model according to any one of claims 1 to 12 or to implement a method of nested entity recognition according to any one of claims 13 to 15.
CN202111173085.XA 2021-10-08 2021-10-08 Training method of nested entity recognition model, and nested entity recognition method and device Pending CN114281937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111173085.XA CN114281937A (en) 2021-10-08 2021-10-08 Training method of nested entity recognition model, and nested entity recognition method and device

Publications (1)

Publication Number Publication Date
CN114281937A true CN114281937A (en) 2022-04-05

Family

ID=80868686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111173085.XA Pending CN114281937A (en) 2021-10-08 2021-10-08 Training method of nested entity recognition model, and nested entity recognition method and device

Country Status (1)

Country Link
CN (1) CN114281937A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316372A (en) * 2023-11-30 2023-12-29 天津大学 Ear disease electronic medical record analysis method based on deep learning
CN117316372B (en) * 2023-11-30 2024-04-09 天津大学 Ear disease electronic medical record analysis method based on deep learning

Similar Documents

Publication Publication Date Title
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN110544272B (en) Face tracking method, device, computer equipment and storage medium
CN110413837B (en) Video recommendation method and device
CN113610750B (en) Object identification method, device, computer equipment and storage medium
CN111243668B (en) Method and device for detecting molecule binding site, electronic device and storage medium
CN112884770B (en) Image segmentation processing method and device and computer equipment
CN111930964B (en) Content processing method, device, equipment and storage medium
CN112749728A (en) Student model training method and device, computer equipment and storage medium
CN111091166A (en) Image processing model training method, image processing device, and storage medium
CN110555102A (en) media title recognition method, device and storage medium
CN114281956A (en) Text processing method and device, computer equipment and storage medium
WO2022193973A1 (en) Image processing method and apparatus, electronic device, computer readable storage medium, and computer program product
CN113257412B (en) Information processing method, information processing device, computer equipment and storage medium
CN113505256A (en) Feature extraction network training method, image processing method and device
CN112287070A (en) Method and device for determining upper and lower position relation of words, computer equipment and medium
CN114281937A (en) Training method of nested entity recognition model, and nested entity recognition method and device
CN115168643B (en) Audio processing method, device, equipment and computer readable storage medium
CN110990549A (en) Method and device for obtaining answers, electronic equipment and storage medium
CN113763931B (en) Waveform feature extraction method, waveform feature extraction device, computer equipment and storage medium
CN113569894B (en) Training method of image classification model, image classification method, device and equipment
CN111597823B (en) Method, device, equipment and storage medium for extracting center word
CN114328948A (en) Training method of text standardization model, text standardization method and device
CN113821658A (en) Method, device and equipment for training encoder and storage medium
CN114328815A (en) Text mapping model processing method and device, computer equipment and storage medium
CN113569042A (en) Text information classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination