CN114580413A - Model training and named entity recognition method and device, electronic equipment and storage medium


Info

Publication number
CN114580413A
Authority
CN
China
Prior art keywords
text, named entity, picture, entity recognition, fused
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210137920.2A
Other languages
Chinese (zh)
Inventor
王新宇
蒋勇
王涛
黄忠强
谢朋峻
屠可伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
ShanghaiTech University
Original Assignee
Alibaba China Co Ltd
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-06-03
Application filed by Alibaba China Co Ltd, ShanghaiTech University
Priority to CN202210137920.2A
Publication of CN114580413A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium. The model training method comprises the following steps: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text. The embodiment of the invention improves the training effect and the recognition effect of the named entity recognition model.

Description

Model training and named entity recognition method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present invention relate to the field of computer technology, and in particular to a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium.
Background
Named Entity Recognition (NER) refers to recognizing entities with specific meanings in text, mainly including names of people, places, and organizations, proper nouns, and the like. NER is an important basic tool for natural language processing tasks such as information extraction, question-answering systems, syntactic analysis, and machine translation.
The accuracy of named entity recognition determines the effect of downstream natural language processing tasks. However, current named entity recognition does not fully consider the contextual semantic factors of the text to be recognized, so its recognition accuracy is limited.
Disclosure of Invention
Embodiments of the present invention provide a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium, so as to at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, there is provided a model training method, including: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text.
In another implementation manner of the present invention, obtaining the picture description text of the associated picture includes: inputting the associated picture into a pre-trained picture description model to obtain the picture description text.
In another implementation manner of the present invention, fusing the target text and the picture description text to obtain a fused text includes: splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
In another implementation of the invention, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. Training the named entity recognition model based on the fused text and the named entity labels of the target text comprises: training the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In another implementation of the invention, the named entity recognition model comprises a dimension alignment layer, via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
In another implementation of the present invention, the context fusion layer is a Transformer encoder.
According to a second aspect of the embodiments of the present invention, there is provided a named entity recognition method, including: acquiring a text to be recognized and an associated picture matched with the text to be recognized; extracting a picture description text of the associated picture; fusing the text to be recognized and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a third aspect of the embodiments of the present invention, there is provided a named entity recognition method, including: acquiring a commodity introduction text and a commodity picture of the commodity introduction text; extracting a picture description text of the commodity picture; fusing the commodity introduction text and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the commodity introduction text, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is provided a model training apparatus, including: an acquisition module, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; a fusion module, configured to fuse the target text and the picture description text to obtain a fused text; and a training module, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
According to a fifth aspect of the embodiments of the present invention, there is provided a named entity recognition apparatus, including: an acquisition module, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized; an extraction module, configured to extract a picture description text of the associated picture; a fusion module, configured to fuse the text to be recognized and the picture description text to obtain a fused text; and a recognition module, configured to input the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a sixth aspect of an embodiment of the present invention, there is provided an electronic apparatus including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the corresponding operation of the method according to the first aspect.
According to a seventh aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description cover only some embodiments of the present invention, and a person skilled in the art can obtain other drawings based on these drawings.
FIG. 1 is a schematic block diagram of an example named entity identification method.
FIG. 2A is a flow chart of steps of a model training method according to one embodiment of the present invention.
Fig. 2B is a flowchart illustrating steps of a named entity recognition method corresponding to the model training method of Fig. 2A.
Fig. 3A is a schematic diagram of an image description generation method according to another embodiment of the present invention.
FIG. 3B is a schematic block diagram of an example text processing flow of the embodiment of FIGS. 2A and 2B.
Fig. 4 is a block diagram of an apparatus according to another embodiment of the present invention.
Fig. 5 is a block diagram of an apparatus according to another embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these solutions are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
FIG. 1 is a schematic block diagram of an example named entity identification method. The named entity recognition process of fig. 1 employs a pre-trained named entity recognition model 120, and specifically, the target text 110 is input into the named entity recognition model 120 to obtain named entity information 130.
NER is a sequence labeling problem, and its data labeling follows the conventions of sequence labeling, mainly the BIO and BIOES schemes. As an example, each notation in BIOES is interpreted as follows:
B (Begin) denotes the start of an entity; I (Intermediate) denotes the interior of an entity; E (End) denotes the end of an entity; S (Single) represents a single-character entity; O (Other) marks extraneous characters belonging to no entity.
Specifically, in the case that the target text is the sentence "[Bob and Alice posing for a picture]", the named entity information obtained through the above named entity tagging process is "[B-PER, E-PER, O, B-PER, E-PER, O, O]", and subsequent natural language processing of the target text may be performed based on this named entity information.
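As an illustration of the BIOES scheme, the following minimal sketch converts labeled entity spans into per-token tags; the (start, end, type) span format and the example tokens are assumptions for demonstration only, not part of the embodiment.

```python
# Minimal BIOES tagger: converts labeled entity spans into per-token tags.
def spans_to_bioes(tokens, spans):
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:   # end is inclusive (an assumed convention)
        if start == end:
            tags[start] = f"S-{etype}"            # Single-token entity
        else:
            tags[start] = f"B-{etype}"            # Begin
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"            # Intermediate
            tags[end] = f"E-{etype}"              # End
    return tags

tokens = ["Bank", "of", "China", "opened", "in", "Paris"]
print(spans_to_bioes(tokens, [(0, 2, "ORG"), (5, 5, "LOC")]))
# ['B-ORG', 'I-ORG', 'E-ORG', 'O', 'O', 'S-LOC']
```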
FIG. 2A is a flow chart of steps of a model training method according to one embodiment of the present invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. For example, in the model training phase, the model may be trained on training samples by a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) plus GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed by a computing device configured with a CPU-plus-GPU architecture.
The model training method of this embodiment comprises the following steps:
S210: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text.
It should be understood that the text in the embodiments of the present invention includes text in the form of characters (including letters, Chinese characters, etc.), sentences, paragraphs, chapters, and the like. The text in a training sample may be raw text that has not been processed by word embedding (taking characters as units), in which case word embedding is carried out before the text participates in training; alternatively, the text in a training sample may already have been through word embedding processing, and such text can directly participate in model training.
It should also be understood that the target text and the associated picture have a matching relationship. In one example, the description object of the target text matches or coincides with the description object of the associated picture, where the description object may be an abstract event or a concrete object. For example, the target text may indicate introduction information of the target objects or commodities contained in the associated picture, the positional relationships between target objects, event relationships, and the like.
It should also be understood that the picture description text may be obtained by manual labeling based on the associated picture, by performing target detection on the associated picture to obtain a text description of the target objects, or by recognizing the associated picture with a pre-trained picture description model. The picture description text may be at least one sentence, at least one paragraph, at least one character, or a text obtained by combining a plurality of characters or sentences in a predetermined manner, where the predetermined manner may be a random ordering or an ordering following the contextual semantics. The picture description model will be described in detail below with reference to Fig. 3A.
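As one possible implementation, a pre-trained image captioning model can serve as the picture description model. The sketch below uses the Hugging Face image-to-text pipeline as a stand-in; the checkpoint name is an assumption, not the model specified by this embodiment (which may be, e.g., a VinVL-style model as described with Fig. 3A).

```python
# Sketch: obtain a picture description text from an associated picture.
# The "image-to-text" pipeline and the checkpoint name are stand-ins
# (assumptions), not the picture description model of this embodiment.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def picture_description_text(image_path: str) -> str:
    # The pipeline returns a list like [{"generated_text": "..."}].
    return captioner(image_path)[0]["generated_text"]

# Example with a hypothetical file:
# print(picture_description_text("associated_picture.jpg"))
```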
S220: fusing the target text and the picture description text to obtain a fused text.
It should be understood that the fusion processing may adopt addition of the dimensional representations of the two texts: the target text is first aligned with the picture description text, and then each element in the dimensional representation of the target text is added to the corresponding element in the dimensional representation of the picture description text to obtain the dimensional representation of the fused text. Alternatively, a splicing (concatenation) mode may be adopted: the elements in the dimensional representation of the target text are concatenated with the elements in the dimensional representation of the picture description text, and the number of dimensions of the resulting fused text is the sum of the number of dimensions of the target text and the number of dimensions of the picture description text.
It should also be understood that when each text is represented by at least one word vector (word embedding), the dimensions of the text refer to the number of word vectors, not the dimensionality of each word vector itself; in other words, the number of word vectors in the text corresponds to the number of dimensions of the text.
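The two fusion modes can be sketched as follows. The shapes (9 target word vectors, 4 caption word vectors, 768-dimensional embeddings) follow the running example later in this description; zero-padding as the alignment step is an assumption, since the embodiment does not fix one.

```python
# Sketch of the two fusion modes: concatenation (splicing) and element-wise
# addition after alignment. Shapes are illustrative.
import torch
import torch.nn.functional as F

d = 768
target = torch.randn(9, d)   # target text: 9 dimensions (9 word vectors)
caption = torch.randn(4, d)  # picture description text: 4 word vectors

# Splicing: the fused text has 9 + 4 = 13 dimensions.
fused_concat = torch.cat([target, caption], dim=0)         # shape (13, d)

# Addition: align the caption to the target length first (zero-padding is
# one simple choice, assumed here), then add element-wise.
caption_aligned = F.pad(caption, (0, 0, 0, target.size(0) - caption.size(0)))
fused_add = target + caption_aligned                       # shape (9, d)
```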
S230: training a named entity recognition model based on the fused text and the named entity labels of the target text.
It should be understood that the training process of the present embodiment may be supervised training, and the model to be trained may be any neural network model, such as a feedforward neural network for classification, a Transformer-based neural network, or an RNN-, CNN-, or LSTM-based neural network.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Fig. 2B is a flowchart illustrating steps of a named entity recognition method corresponding to the model training method of Fig. 2A. The named entity recognition method of this embodiment comprises the following steps:
S260: acquiring a text to be recognized and an associated picture matched with the text to be recognized.
S270: extracting a picture description text of the associated picture.
S280: fusing the text to be recognized and the picture description text to obtain a fused text.
S290: inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained by the model training method above.
In other words, the text to be recognized in the named entity recognition stage corresponds to the target text in the model training stage, and the associated picture is related to the text to be recognized.
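The inference flow of steps S260-S290 can be sketched as follows; picture_description_text, fuse, and ner_model are hypothetical placeholders for the components described in this embodiment, not APIs it defines.

```python
# End-to-end inference sketch for S260-S290, using hypothetical helpers.
def recognize(text_to_recognize: str, associated_picture_path: str):
    caption = picture_description_text(associated_picture_path)  # S270
    fused = fuse(text_to_recognize, caption)                      # S280
    return ner_model(fused)                                       # S290: named entity info
```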
In one scenario, the text to be recognized may be a commodity introduction text and the associated picture a commodity picture. A user provides the commodity picture of a target commodity and its commodity introduction text to a server on which a named entity recognition model is deployed, and named entity recognition is performed on the commodity introduction text based on the commodity introduction text and the commodity picture. Knowledge information such as multimedia information can then be constructed based on the commodity introduction text labeled with named entities; accordingly, when a search request for the commodity is received, an accurate user search intention can be derived from the knowledge information, so as to make recommendations for the user or provide accurate search results.
In other examples, obtaining the picture description text of the associated picture includes: inputting the associated picture into a pre-trained picture description model to obtain the picture description text. Thereby, accurate information about the associated picture is obtained, and information fusion of the target text and the associated picture can be performed in the text space.
In other examples, fusing the target text and the picture description text to obtain a fused text includes: splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
The splicing operation improves the efficiency of fusing the target text and the picture description text, and provides more flexibility than adding the two dimensional representations after alignment processing.
In other examples, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. Further, the named entity recognition model may be trained with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In other examples, the named entity recognition model includes a dimension alignment layer via which the output of the context fusion layer is connected to the input of the Conditional Random Field (CRF) processing layer; the dimension alignment layer extracts, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text. Thus, the dimensions of the features input into the conditional random field processing layer match the dimensions of the output named entity labels, so that the context fusion of the target text and the image description text becomes independent of the conditional random field processing layer; in other words, the dimension alignment layer decouples the context fusion layer from the conditional random field processing layer.
More specifically, the context fusion layer is a Transformer encoder, which has a strong ability to process the characters in a text sequence; the attention mechanism in the Transformer encoder can effectively fuse the dimensions in the features for context fusion. Thus, the training process can be significantly facilitated compared with aligning the target text with the image description text and then adding them. In addition, since the image description text lies in the text space rather than the image space, the context fusion effect can be improved, which benefits the generalization capability of the trained named entity recognition model.
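A minimal sketch of such a model is given below, assuming PyTorch and the third-party pytorch-crf package for the CRF layer (both assumptions, not dependencies stated by this embodiment); hyperparameters are illustrative.

```python
# Sketch: Transformer encoder (context fusion layer) + dimension alignment
# (keep target-text positions) + CRF processing layer.
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class MultimodalNER(nn.Module):
    def __init__(self, num_tags: int, d_model: int = 768, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.context_fusion = nn.TransformerEncoder(layer, num_layers)  # context fusion layer
        self.emission = nn.Linear(d_model, num_tags)
        self.crf = CRF(num_tags, batch_first=True)                      # CRF processing layer

    def forward(self, fused_embeds, target_len: int, tags=None):
        h = self.context_fusion(fused_embeds)   # (batch, fused_len, d_model)
        h = h[:, :target_len, :]                # dimension alignment: keep target positions
        emissions = self.emission(h)
        if tags is not None:                    # training: negative log-likelihood
            return -self.crf(emissions, tags)
        return self.crf.decode(emissions)       # inference: best tag sequence per sample

# Usage with the running example: 13-dimensional fused text, 9 target positions.
model = MultimodalNER(num_tags=13)              # e.g. BIOES over 3 entity types plus O
fused = torch.randn(2, 13, 768)
labels = torch.randint(0, 13, (2, 9))
loss = model(fused, target_len=9, tags=labels)  # scalar training loss
```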
The image description generation method shown in Fig. 3A may be adopted to process the associated picture to obtain the picture description text. Referring to Fig. 3A, the target text 301 is "[Bob and Alice posing for a picture]", and the picture matched with the target text is the associated picture 302. The associated picture 302 is input into the pre-trained picture description model 3000, obtaining the corresponding picture description text 303 "[Alice hair Bob tie]" or "[Bob wearing a tie next to Alice with hair]".
It is to be understood that the picture description model 3000 may be, for example, a VinVL model. The number of dimensions of the target text 301 and the number of dimensions of the picture description text 303 may be the same or different. In the present example, the number of dimensions of the target text 301 is 9, that is, the target text 301 includes 9 words (an example of characters); as an example, the target text 301 may be a text vector obtained after word embedding processing of the 9 words. In addition, the number of dimensions of the picture description text 303 is 4, i.e., the picture description text 303 includes 4 words. In fact, although the words included in the picture description text 303 have contextual semantic relationships, they do not necessarily form a well-formed sentence, and various orderings are possible; preferably, an ordering that better reflects the contextual semantic relationships may be selected. In this example, "[Alice hair Bob tie]" is one such ordering, and other orderings may be adopted. For example, because the degree of association between "tie" and "Bob" is greater than that between "tie" and "Alice", "[tie Bob Alice hair]" reflects the contextual semantic relationships more accurately than "[tie Alice Bob hair]". Further, "Alice" and "Bob" are more important features than "tie" and "hair", and thus "[Alice hair Bob tie]" reflects the contextual semantic relationships more accurately than "[tie Bob Alice hair]".
Further, fig. 3B illustrates an exemplary text processing framework, and the text processing flow of the present example is illustrated and described below in connection with the framework of fig. 3B for the training phase and the inference phase, respectively.
The text processing framework of the present example comprises a Transformer encoder 310, a dimension alignment layer 320, and a CRF layer 330, which are connected in sequence.
In the training phase, the target text "[Bob and Alice posing for a picture]" and the picture description text "[Alice hair Bob tie]" are spliced to obtain the fused text "[Bob and Alice posing for a picture <X> Alice hair Bob tie]".
Accordingly, in the model inference phase, the text to be recognized may be taken as the target text in the text processing framework of the present example, and the associated picture may be matched with the text to be recognized.
It should be understood that in the spliced fused text, the positions of the target text are distinguished from the positions of the picture description text by a specific symbol; for example, each character in the picture description text may be marked with the symbol <X> before or after it, or a segmentation based on a specific symbol may be performed between the character string of the target text and the character string of the picture description text.
It should also be understood that the characters of the picture description text may form a continuous character string or discrete character strings; for example, the picture description text may include a first portion and a second portion, with the target text located between the first portion and the second portion in the fused text.
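A sketch of this splicing with a boundary marker follows, using the single-delimiter layout described above; the tokens are those of the running example.

```python
# Sketch: splice target tokens and caption tokens, marking the boundary with
# the special symbol "<X>" (one of the layouts described above).
def build_fused_tokens(target_tokens, caption_tokens, marker="<X>"):
    return target_tokens + [marker] + caption_tokens

fused = build_fused_tokens(
    ["Bob", "and", "Alice", "posing", "for", "a", "picture"],
    ["Alice", "hair", "Bob", "tie"],
)
# ['Bob', 'and', 'Alice', 'posing', 'for', 'a', 'picture', '<X>',
#  'Alice', 'hair', 'Bob', 'tie']
```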
In this example, the number of dimensions of the target text is 9, the number of dimensions of the picture description text is 4, and accordingly the number of dimensions of the fused text is 13. As a specific example, the fused text is "[Bob and Alice posing for a picture <X> Alice hair Bob tie]".
Then, the fused text is input into the named entity recognition model and passes through the Transformer encoder 310, the dimension alignment layer 320, and the CRF layer 330 in sequence to obtain the named entity information of the target text, "[B-PER, E-PER, O, B-PER, E-PER, O, O, O, O]". Specifically, the 13-dimensional fused text is input into the Transformer encoder 310, contextual semantic fusion is performed between the characters of the dimensions, and 13-dimensional text features are output accordingly.
Then, the 13-dimensional text features are input into the dimension alignment layer 320 for dimension alignment, i.e., alignment to the number of dimensions of the target text, which is the input dimension number of the CRF layer 330. In this example, 9-dimensional text features are determined from the 13-dimensional text features as the input of the CRF layer 330. At this point, the 9-dimensional text features have been fused with the features of the picture description text, that is, with the information of the associated picture.
Then, the CRF layer 330 performs processing based on the input 9-dimensional text features and outputs the 9-dimensional named entity information "[B-PER, E-PER, O, B-PER, E-PER, O, O, O, O]".
Specifically, the word features are fed into the CRF layer to obtain the conditional probability:

$$p_\theta(y \mid x) = \frac{\prod_{t=1}^{n} \psi(y_{t-1}, y_t, x)}{\sum_{y' \in Y} \prod_{t=1}^{n} \psi(y'_{t-1}, y'_t, x)}$$

where ψ is a potential function, θ represents the model parameters, Y represents the set of all possible tag sequences for the given sentence, and y_0 is defined as a special start symbol.
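The conditional probability above can be checked with a tiny brute-force computation, assuming the common linear-chain convention that the potential ψ is the exponential of an emission score plus a transition score (an assumption; the embodiment does not spell out ψ). All scores below are made up for illustration.

```python
# Brute-force evaluation of the linear-chain CRF conditional probability.
import itertools
import math

num_tags, length = 3, 4
emission = [[0.5, 1.0, -0.2], [0.1, 0.3, 0.9],
            [1.2, -0.5, 0.0], [0.0, 0.4, 0.7]]   # emission[t][y]: hypothetical scores
transition = [[0.2, -0.1, 0.0], [0.0, 0.3, -0.2],
              [0.1, 0.0, 0.2]]                    # transition[y_prev][y]

def score(seq):
    s = sum(emission[t][y] for t, y in enumerate(seq))
    s += sum(transition[a][b] for a, b in zip(seq, seq[1:]))
    return s

# Partition function: sum of exp(score) over all possible tag sequences Y.
Z = sum(math.exp(score(seq)) for seq in itertools.product(range(num_tags), repeat=length))
y = (0, 1, 1, 2)
print(math.exp(score(y)) / Z)  # p_theta(y | x) for this tag sequence
```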
The dimension alignment layer extracts, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text; that is, it processes the 13-dimensional text features to obtain the 9-dimensional text features. For example, based on the positions indicated by the specific symbols described above, the dimensions of the target text may be truncated from the dimensions of the fused text; in this example, the first 9 characters are truncated from the 13-dimensional text features as the 9-dimensional text features.
Fig. 4 is a block diagram of an apparatus according to another embodiment of the present invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. For example, in the model training phase, the model may be trained on training samples by a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) plus GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed by a computing device configured with a CPU-plus-GPU architecture.
The model training device of the embodiment comprises:
an acquisition module 410, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
a fusion module 420, configured to fuse the target text and the picture description text to obtain a fused text;
and a training module 430, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
In other examples, the acquisition module is specifically configured to: input the associated picture into a pre-trained picture description model to obtain the picture description text.
In other examples, the fusion module is specifically configured to: splice the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
In other examples, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. The training module is specifically configured to: train the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In other examples, the named entity recognition model includes a dimension alignment layer via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Fig. 5 is a block diagram of an apparatus according to another embodiment of the present invention. The named entity recognition apparatus of this embodiment includes:
an acquisition module 510, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized;
an extraction module 520, configured to extract the picture description text of the associated picture;
a fusion module 530, configured to fuse the text to be recognized and the picture description text to obtain a fused text;
and a recognition module 540, configured to input the fused text into a named entity recognition model to obtain the named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of any one of claims 1-6.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Referring to fig. 6, a schematic structural diagram of an electronic device according to another embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 6, the electronic device may include: a processor (processor)602, a communication Interface 604, a memory 606, and a communication bus 608.
Wherein:
the processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608.
A communication interface 604 for communicating with other electronic devices or servers.
The processor 602 is configured to execute the program 610, and may specifically perform relevant steps in the foregoing method embodiments.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 606 for storing a program 610. Memory 606 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to perform the following operations: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text.
Alternatively, the program 610 may specifically be configured to cause the processor 602 to perform the following operations: acquiring a text to be recognized and an associated picture matched with the text to be recognized; extracting a picture description text of the associated picture; fusing the text to be recognized and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained by the model training method above.
In addition, for specific implementation of each step in the program 610, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to the embodiments of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk), or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (12)

1. A model training method, comprising:
acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
fusing the target text and the picture description text to obtain a fused text;
and training a named entity recognition model based on the fused text and the named entity labels of the target text.
2. The method of claim 1, wherein obtaining the picture description text of the associated picture comprises:
inputting the associated picture into a pre-trained picture description model to obtain the picture description text.
3. The method of claim 1, wherein fusing the target text and the picture description text to obtain a fused text comprises:
splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
4. The method of claim 1, wherein the named entity recognition model comprises a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer,
wherein training the named entity recognition model based on the fused text and the named entity labels of the target text comprises:
training the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
5. The method of claim 4, wherein the named entity recognition model comprises a dimension alignment layer via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
6. The method of claim 4, wherein the context fusion layer is a Transformer encoder.
7. A named entity recognition method, comprising:
acquiring a text to be recognized and an associated picture matched with the text to be recognized;
extracting a picture description text of the associated picture;
fusing the text to be recognized and the picture description text to obtain a fused text;
inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is obtained by training according to the method of any one of claims 1-6.
8. A named entity recognition method, comprising:
acquiring a commodity introduction text and a commodity picture of the commodity introduction text;
extracting a picture description text of the commodity picture;
fusing the commodity introduction text and the picture description text to obtain a fused text;
inputting the fused text into a named entity recognition model to obtain named entity information of the commodity introduction text, wherein the named entity recognition model is obtained by training according to the method of any one of claims 1-6.
9. A model training apparatus comprising:
an acquisition module, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
a fusion module, configured to fuse the target text and the picture description text to obtain a fused text;
and a training module, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
10. A named entity recognition apparatus comprising:
an acquisition module, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized;
an extraction module, configured to extract a picture description text of the associated picture;
a fusion module, configured to fuse the text to be recognized and the picture description text to obtain a fused text;
and a recognition module, configured to input the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of any one of claims 1-6.
11. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction which causes the processor to execute the corresponding operation of the method according to any one of claims 1-8.
12. A computer storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210137920.2A 2022-02-15 2022-02-15 Model training and named entity recognition method and device, electronic equipment and storage medium Pending CN114580413A (en)

Priority Applications (1)

Application Number: CN202210137920.2A
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Title: Model training and named entity recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202210137920.2A
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Title: Model training and named entity recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN114580413A
Publication Date: 2022-06-03

Family

ID=81773255

Family Applications (1)

Application Number: CN202210137920.2A
Title: Model training and named entity recognition method and device, electronic equipment and storage medium
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Status: Pending

Country Status (1)

Country Link
CN (1) CN114580413A


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341555A * 2023-05-26 2023-06-27 华东交通大学 Named entity recognition method and system
CN116341555B * 2023-05-26 2023-08-04 华东交通大学 Named entity recognition method and system

Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
CN107679039B (en) Method and device for determining statement intention
US11914959B2 (en) Entity linking method and apparatus
CN113283551B (en) Training method and training device of multi-mode pre-training model and electronic equipment
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN111985229A (en) Sequence labeling method and device and computer equipment
CN111046656A (en) Text processing method and device, electronic equipment and readable storage medium
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
CN112287095A (en) Method and device for determining answers to questions, computer equipment and storage medium
CN111737990B (en) Word slot filling method, device, equipment and storage medium
CN113051380B (en) Information generation method, device, electronic equipment and storage medium
CN111859093A (en) Sensitive word processing method and device and readable storage medium
CN112417878A (en) Entity relationship extraction method, system, electronic equipment and storage medium
CN114580413A (en) Model training and named entity recognition method and device, electronic equipment and storage medium
CN112633007A (en) Semantic understanding model construction method and device and semantic understanding method and device
CN113761923A (en) Named entity recognition method and device, electronic equipment and storage medium
CN115115432B (en) Product information recommendation method and device based on artificial intelligence
CN113779202B (en) Named entity recognition method and device, computer equipment and storage medium
CN114528851B (en) Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN115759293A (en) Model training method, image retrieval device and electronic equipment
CN114490993A (en) Small sample intention recognition method, system, equipment and storage medium
WO2022262080A1 (en) Dialogue relationship processing method, computer and readable storage medium
Joshi et al. Optical Text Translator from Images using Machine Learning
CN114722823B (en) Method and device for constructing aviation knowledge graph and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination