CN114580413A - Model training and named entity recognition method and device, electronic equipment and storage medium


Info

Publication number
CN114580413A
Authority
CN
China
Prior art keywords
text, named entity, picture, entity recognition, fused
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210137920.2A
Other languages
Chinese (zh)
Inventor
王新宇
蒋勇
王涛
黄忠强
谢朋峻
屠可伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
ShanghaiTech University
Original Assignee
Alibaba China Co Ltd
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-06-03
Application filed by Alibaba China Co Ltd, ShanghaiTech University
Priority to CN202210137920.2A
Publication of CN114580413A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium. The model training method comprises the following steps: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text. The embodiment of the invention improves the training effect and the recognition effect of the named entity recognition model.

Description

Model training and named entity recognition method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present invention relate to the field of computer technology, and in particular to a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium.
Background
Named Entity Recognition (NER) refers to recognizing entities with specific meanings in text, mainly including names of people, places, and organizations, proper nouns, and the like. NER is an important basic tool for natural language processing tasks such as information extraction, question-answering systems, syntactic analysis, and machine translation.
The accuracy of named entity recognition determines the effect of downstream natural language processing tasks. However, current named entity recognition does not fully consider the contextual semantic factors of the text to be recognized, so its recognition accuracy is limited.
Disclosure of Invention
Embodiments of the present invention provide a model training method, a named entity recognition method, corresponding apparatuses, an electronic device, and a storage medium, so as to at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, there is provided a model training method, including: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text.
In another implementation manner of the present invention, obtaining the picture description text of the associated picture includes: inputting the associated picture into a pre-trained picture description model to obtain the picture description text.
In another implementation manner of the present invention, fusing the target text and the picture description text to obtain a fused text includes: splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
In another implementation of the invention, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. Training the named entity recognition model based on the fused text and the named entity labels of the target text comprises: training the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In another implementation of the invention, the named entity recognition model comprises a dimension alignment layer, via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
In another implementation of the present invention, the context fusion layer is a Transformer encoder.
According to a second aspect of the embodiments of the present invention, there is provided a named entity recognition method, including: acquiring a text to be recognized and an associated picture matched with the text to be recognized; extracting a picture description text of the associated picture; fusing the text to be recognized and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a third aspect of the embodiments of the present invention, there is provided a named entity recognition method, including: acquiring a commodity introduction text and a commodity picture of the commodity introduction text; extracting a picture description text of the commodity picture; fusing the commodity introduction text and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the commodity introduction text, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is provided a model training apparatus, including: an acquisition module, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; a fusion module, configured to fuse the target text and the picture description text to obtain a fused text; and a training module, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
According to a fifth aspect of the embodiments of the present invention, there is provided a named entity recognition apparatus, including: an acquisition module, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized; an extraction module, configured to extract a picture description text of the associated picture; a fusion module, configured to fuse the text to be recognized and the picture description text to obtain a fused text; and a recognition module, configured to input the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of the first aspect.
According to a sixth aspect of an embodiment of the present invention, there is provided an electronic apparatus including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the corresponding operation of the method according to the first aspect.
According to a seventh aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description cover only some embodiments of the present invention, and a person skilled in the art can obtain other drawings based on these drawings.
FIG. 1 is a schematic block diagram of an example named entity identification method.
FIG. 2A is a flow chart of steps of a model training method according to one embodiment of the present invention.
Fig. 2B is a flowchart illustrating steps of a named entity recognition method corresponding to the model training method of Fig. 2A.
Fig. 3A is a schematic diagram of an image description generation method according to another embodiment of the present invention.
FIG. 3B is a schematic block diagram of an example text processing flow of the embodiment of FIGS. 2A and 2B.
Fig. 4 is a block diagram of an apparatus according to another embodiment of the present invention.
Fig. 5 is a block diagram of an apparatus according to another embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these solutions are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
FIG. 1 is a schematic block diagram of an example named entity identification method. The named entity recognition process of fig. 1 employs a pre-trained named entity recognition model 120, and specifically, the target text 110 is input into the named entity recognition model 120 to obtain named entity information 130.
NER is a sequence labeling problem, and its data labeling follows the conventions of sequence labeling, mainly the BIO and BIOES schemes. As an example, each notation in BIOES is interpreted as follows:
B (Begin) denotes the start of an entity; I (Intermediate) denotes the interior of an entity; E (End) denotes the end of an entity; S (Single) represents a single-character entity; O (Other) marks extraneous characters belonging to no entity.
Specifically, in the case that the target text is the sentence "[Bob and Alice posing for a picture]", the named entity information obtained through the above named entity tagging process is "[B-PER, E-PER, O, B-PER, E-PER, O, O]", and subsequent natural language processing of the target text may be performed based on this named entity information.
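As an illustration of the BIOES scheme, the following minimal sketch converts labeled entity spans into per-token tags; the (start, end, type) span format and the example tokens are assumptions for demonstration only, not part of the embodiment.

```python
# Minimal BIOES tagger: converts labeled entity spans into per-token tags.
def spans_to_bioes(tokens, spans):
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:   # end is inclusive (an assumed convention)
        if start == end:
            tags[start] = f"S-{etype}"            # Single-token entity
        else:
            tags[start] = f"B-{etype}"            # Begin
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"            # Intermediate
            tags[end] = f"E-{etype}"              # End
    return tags

tokens = ["Bank", "of", "China", "opened", "in", "Paris"]
print(spans_to_bioes(tokens, [(0, 2, "ORG"), (5, 5, "LOC")]))
# ['B-ORG', 'I-ORG', 'E-ORG', 'O', 'O', 'S-LOC']
```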
FIG. 2A is a flow chart of steps of a model training method according to one embodiment of the present invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. For example, in the model training phase, the model may be trained on training samples by a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) plus GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed by a computing device configured with a CPU-plus-GPU architecture.
The model training method of this embodiment comprises the following steps:
S210: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text.
It should be understood that the text in the embodiments of the present invention includes text in the form of characters (including letters, Chinese characters, etc.), sentences, paragraphs, chapters, and the like. The text in a training sample may be raw text that has not been processed by word embedding (taking characters as units), in which case word embedding is carried out before the text participates in training; alternatively, the text in a training sample may already have been through word embedding processing, and such text can directly participate in model training.
It should also be understood that the target text and the associated picture have a matching relationship. In one example, the description object of the target text matches or coincides with the description object of the associated picture, where the description object may be an abstract event or a concrete object. For example, the target text may indicate introduction information of the target objects or commodities contained in the associated picture, the positional relationships between target objects, event relationships, and the like.
It should also be understood that the picture description text may be obtained by manual labeling based on the associated picture, by performing target detection on the associated picture to obtain a text description of the target objects, or by recognizing the associated picture with a pre-trained picture description model. The picture description text may be at least one sentence, at least one paragraph, at least one character, or a text obtained by combining a plurality of characters or sentences in a predetermined manner, where the predetermined manner may be a random ordering or an ordering following the contextual semantics. The picture description model will be described in detail below with reference to Fig. 3A.
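As one possible implementation, a pre-trained image captioning model can serve as the picture description model. The sketch below uses the Hugging Face image-to-text pipeline as a stand-in; the checkpoint name is an assumption, not the model specified by this embodiment (which may be, e.g., a VinVL-style model as described with Fig. 3A).

```python
# Sketch: obtain a picture description text from an associated picture.
# The "image-to-text" pipeline and the checkpoint name are stand-ins
# (assumptions), not the picture description model of this embodiment.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def picture_description_text(image_path: str) -> str:
    # The pipeline returns a list like [{"generated_text": "..."}].
    return captioner(image_path)[0]["generated_text"]

# Example with a hypothetical file:
# print(picture_description_text("associated_picture.jpg"))
```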
S220: fusing the target text and the picture description text to obtain a fused text.
It should be understood that the fusion processing may adopt addition of the dimensional representations of the two texts: the target text is first aligned with the picture description text, and then each element in the dimensional representation of the target text is added to the corresponding element in the dimensional representation of the picture description text to obtain the dimensional representation of the fused text. Alternatively, a splicing (concatenation) mode may be adopted: the elements in the dimensional representation of the target text are concatenated with the elements in the dimensional representation of the picture description text, and the number of dimensions of the resulting fused text is the sum of the number of dimensions of the target text and the number of dimensions of the picture description text.
It should also be understood that when each text is represented by at least one word vector (word embedding), the dimensions of the text refer to the number of word vectors, not the dimensionality of each word vector itself; in other words, the number of word vectors in the text corresponds to the number of dimensions of the text.
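The two fusion modes can be sketched as follows. The shapes (9 target word vectors, 4 caption word vectors, 768-dimensional embeddings) follow the running example later in this description; zero-padding as the alignment step is an assumption, since the embodiment does not fix one.

```python
# Sketch of the two fusion modes: concatenation (splicing) and element-wise
# addition after alignment. Shapes are illustrative.
import torch
import torch.nn.functional as F

d = 768
target = torch.randn(9, d)   # target text: 9 dimensions (9 word vectors)
caption = torch.randn(4, d)  # picture description text: 4 word vectors

# Splicing: the fused text has 9 + 4 = 13 dimensions.
fused_concat = torch.cat([target, caption], dim=0)         # shape (13, d)

# Addition: align the caption to the target length first (zero-padding is
# one simple choice, assumed here), then add element-wise.
caption_aligned = F.pad(caption, (0, 0, 0, target.size(0) - caption.size(0)))
fused_add = target + caption_aligned                       # shape (9, d)
```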
S230: training a named entity recognition model based on the fused text and the named entity labels of the target text.
It should be understood that the training process of the present embodiment may be supervised training, and the model to be trained may be any neural network model, such as a feedforward neural network for classification, a Transformer-based neural network, or an RNN-, CNN-, or LSTM-based neural network.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Fig. 2B is a flowchart illustrating steps of a named entity recognition method corresponding to the model training method of Fig. 2A. The named entity recognition method of this embodiment comprises the following steps:
S260: acquiring a text to be recognized and an associated picture matched with the text to be recognized.
S270: extracting a picture description text of the associated picture.
S280: fusing the text to be recognized and the picture description text to obtain a fused text.
S290: inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained by the model training method above.
In other words, the text to be recognized in the named entity recognition stage corresponds to the target text in the model training stage, and the associated picture is related to the text to be recognized.
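The inference flow of steps S260-S290 can be sketched as follows; picture_description_text, fuse, and ner_model are hypothetical placeholders for the components described in this embodiment, not APIs it defines.

```python
# End-to-end inference sketch for S260-S290, using hypothetical helpers.
def recognize(text_to_recognize: str, associated_picture_path: str):
    caption = picture_description_text(associated_picture_path)  # S270
    fused = fuse(text_to_recognize, caption)                      # S280
    return ner_model(fused)                                       # S290: named entity info
```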
In one scenario, the text to be recognized may be a commodity introduction text and the associated picture a commodity picture. A user provides the commodity picture of a target commodity and its commodity introduction text to a server on which a named entity recognition model is deployed, and named entity recognition is performed on the commodity introduction text based on the commodity introduction text and the commodity picture. Knowledge information such as multimedia information can then be constructed based on the commodity introduction text labeled with named entities; accordingly, when a search request for the commodity is received, an accurate user search intention can be derived from the knowledge information, so as to make recommendations for the user or provide accurate search results.
In other examples, obtaining the picture description text of the associated picture includes: inputting the associated picture into a pre-trained picture description model to obtain the picture description text. Thereby, accurate information about the associated picture is obtained, and information fusion of the target text and the associated picture can be performed in the text space.
In other examples, fusing the target text and the picture description text to obtain a fused text includes: splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
The splicing operation improves the efficiency of fusing the target text and the picture description text, and provides more flexibility than adding the two dimensional representations after alignment processing.
In other examples, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. Further, the named entity recognition model may be trained with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In other examples, the named entity recognition model includes a dimension alignment layer via which the output of the context fusion layer is connected to the input of the Conditional Random Field (CRF) processing layer; the dimension alignment layer extracts, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text. Thus, the dimensions of the features input into the conditional random field processing layer match the dimensions of the output named entity labels, so that the context fusion of the target text and the image description text becomes independent of the conditional random field processing layer; in other words, the dimension alignment layer decouples the context fusion layer from the conditional random field processing layer.
More specifically, the context fusion layer is a Transformer encoder, which has a strong ability to process the characters in a text sequence; the attention mechanism in the Transformer encoder can effectively fuse the dimensions in the features for context fusion. Thus, the training process can be significantly facilitated compared with aligning the target text with the image description text and then adding them. In addition, since the image description text lies in the text space rather than the image space, the context fusion effect can be improved, which benefits the generalization capability of the trained named entity recognition model.
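A minimal sketch of such a model is given below, assuming PyTorch and the third-party pytorch-crf package for the CRF layer (both assumptions, not dependencies stated by this embodiment); hyperparameters are illustrative.

```python
# Sketch: Transformer encoder (context fusion layer) + dimension alignment
# (keep target-text positions) + CRF processing layer.
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class MultimodalNER(nn.Module):
    def __init__(self, num_tags: int, d_model: int = 768, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.context_fusion = nn.TransformerEncoder(layer, num_layers)  # context fusion layer
        self.emission = nn.Linear(d_model, num_tags)
        self.crf = CRF(num_tags, batch_first=True)                      # CRF processing layer

    def forward(self, fused_embeds, target_len: int, tags=None):
        h = self.context_fusion(fused_embeds)   # (batch, fused_len, d_model)
        h = h[:, :target_len, :]                # dimension alignment: keep target positions
        emissions = self.emission(h)
        if tags is not None:                    # training: negative log-likelihood
            return -self.crf(emissions, tags)
        return self.crf.decode(emissions)       # inference: best tag sequence per sample

# Usage with the running example: 13-dimensional fused text, 9 target positions.
model = MultimodalNER(num_tags=13)              # e.g. BIOES over 3 entity types plus O
fused = torch.randn(2, 13, 768)
labels = torch.randint(0, 13, (2, 9))
loss = model(fused, target_len=9, tags=labels)  # scalar training loss
```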
The image description generation method shown in Fig. 3A may be adopted to process the associated picture to obtain the picture description text. Referring to Fig. 3A, the target text 301 is "[Bob and Alice posing for a picture]", and the picture matched with the target text is the associated picture 302. The associated picture 302 is input into the pre-trained picture description model 3000, obtaining the corresponding picture description text 303 "[Alice hair Bob tie]" or "[Bob wearing a tie next to Alice with hair]".
It is to be understood that the picture description model 3000 may be, for example, a VinVL model. The number of dimensions of the target text 301 and the number of dimensions of the picture description text 303 may be the same or different. In the present example, the number of dimensions of the target text 301 is 9, that is, the target text 301 includes 9 words (an example of characters); as an example, the target text 301 may be a text vector obtained after word embedding processing of the 9 words. In addition, the number of dimensions of the picture description text 303 is 4, i.e., the picture description text 303 includes 4 words. In fact, although the words included in the picture description text 303 have contextual semantic relationships, they do not necessarily form a well-formed sentence, and various orderings are possible; preferably, an ordering that better reflects the contextual semantic relationships may be selected. In this example, "[Alice hair Bob tie]" is one such ordering, and other orderings may be adopted. For example, because the degree of association between "tie" and "Bob" is greater than that between "tie" and "Alice", "[tie Bob Alice hair]" reflects the contextual semantic relationships more accurately than "[tie Alice Bob hair]". Further, "Alice" and "Bob" are more important features than "tie" and "hair", and thus "[Alice hair Bob tie]" reflects the contextual semantic relationships more accurately than "[tie Bob Alice hair]".
Further, fig. 3B illustrates an exemplary text processing framework, and the text processing flow of the present example is illustrated and described below in connection with the framework of fig. 3B for the training phase and the inference phase, respectively.
The text processing framework of the present example comprises a Transformer encoder 310, a dimension alignment layer 320, and a CRF layer 330, which are connected in sequence.
In the training phase, the target text "[Bob and Alice posing for a picture]" and the picture description text "[Alice hair Bob tie]" are spliced to obtain the fused text "[Bob and Alice posing for a picture <X> Alice hair Bob tie]".
Accordingly, in the model inference phase, the text to be recognized may be taken as the target text in the text processing framework of the present example, and the associated picture may be matched with the text to be recognized.
It should be understood that in the spliced fused text, the positions of the target text are distinguished from the positions of the picture description text by a specific symbol; for example, each character in the picture description text may be marked with the symbol <X> before or after it, or a segmentation based on a specific symbol may be performed between the character string of the target text and the character string of the picture description text.
It should also be understood that the characters of the picture description text may form a continuous character string or discrete character strings; for example, the picture description text may include a first portion and a second portion, with the target text located between the first portion and the second portion in the fused text.
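A sketch of this splicing with a boundary marker follows, using the single-delimiter layout described above; the tokens are those of the running example.

```python
# Sketch: splice target tokens and caption tokens, marking the boundary with
# the special symbol "<X>" (one of the layouts described above).
def build_fused_tokens(target_tokens, caption_tokens, marker="<X>"):
    return target_tokens + [marker] + caption_tokens

fused = build_fused_tokens(
    ["Bob", "and", "Alice", "posing", "for", "a", "picture"],
    ["Alice", "hair", "Bob", "tie"],
)
# ['Bob', 'and', 'Alice', 'posing', 'for', 'a', 'picture', '<X>',
#  'Alice', 'hair', 'Bob', 'tie']
```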
In this example, the number of dimensions of the target text is 9, the number of dimensions of the picture description text is 4, and accordingly the number of dimensions of the fused text is 13. As a specific example, the fused text is "[Bob and Alice posing for a picture <X> Alice hair Bob tie]".
Then, the fused text is input into the named entity recognition model and passes through the Transformer encoder 310, the dimension alignment layer 320, and the CRF layer 330 in sequence to obtain the named entity information of the target text, "[B-PER, E-PER, O, B-PER, E-PER, O, O, O, O]". Specifically, the 13-dimensional fused text is input into the Transformer encoder 310, contextual semantic fusion is performed between the characters of the dimensions, and 13-dimensional text features are output accordingly.
Then, the 13-dimensional text features are input into the dimension alignment layer 320 for dimension alignment, i.e., alignment to the number of dimensions of the target text, which is the input dimension number of the CRF layer 330. In this example, 9-dimensional text features are determined from the 13-dimensional text features as the input of the CRF layer 330. At this point, the 9-dimensional text features have been fused with the features of the picture description text, that is, with the information of the associated picture.
Then, the CRF layer 330 performs processing based on the input 9-dimensional text features and outputs the 9-dimensional named entity information "[B-PER, E-PER, O, B-PER, E-PER, O, O, O, O]".
Specifically, the word features are fed into the CRF layer to obtain the conditional probability:

$$p_\theta(y \mid x) = \frac{\prod_{t=1}^{n} \psi(y_{t-1}, y_t, x)}{\sum_{y' \in Y} \prod_{t=1}^{n} \psi(y'_{t-1}, y'_t, x)}$$

where ψ is a potential function, θ represents the model parameters, Y represents the set of all possible tag sequences for the given sentence, and y_0 is defined as a special start symbol.
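The conditional probability above can be checked with a tiny brute-force computation, assuming the common linear-chain convention that the potential ψ is the exponential of an emission score plus a transition score (an assumption; the embodiment does not spell out ψ). All scores below are made up for illustration.

```python
# Brute-force evaluation of the linear-chain CRF conditional probability.
import itertools
import math

num_tags, length = 3, 4
emission = [[0.5, 1.0, -0.2], [0.1, 0.3, 0.9],
            [1.2, -0.5, 0.0], [0.0, 0.4, 0.7]]   # emission[t][y]: hypothetical scores
transition = [[0.2, -0.1, 0.0], [0.0, 0.3, -0.2],
              [0.1, 0.0, 0.2]]                    # transition[y_prev][y]

def score(seq):
    s = sum(emission[t][y] for t, y in enumerate(seq))
    s += sum(transition[a][b] for a, b in zip(seq, seq[1:]))
    return s

# Partition function: sum of exp(score) over all possible tag sequences Y.
Z = sum(math.exp(score(seq)) for seq in itertools.product(range(num_tags), repeat=length))
y = (0, 1, 1, 2)
print(math.exp(score(y)) / Z)  # p_theta(y | x) for this tag sequence
```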
The dimension alignment layer extracts, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text; that is, it processes the 13-dimensional text features to obtain the 9-dimensional text features. For example, based on the positions indicated by the specific symbols described above, the dimensions of the target text may be truncated from the dimensions of the fused text; in this example, the first 9 characters are truncated from the 13-dimensional text features as the 9-dimensional text features.
Fig. 4 is a block diagram of an apparatus according to another embodiment of the present invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. For example, in the model training phase, the model may be trained on training samples by a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) plus GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed by a computing device configured with a CPU-plus-GPU architecture.
The model training device of the embodiment comprises:
an acquisition module 410, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
a fusion module 420, configured to fuse the target text and the picture description text to obtain a fused text;
and a training module 430, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
In other examples, the acquisition module is specifically configured to: input the associated picture into a pre-trained picture description model to obtain the picture description text.
In other examples, the fusion module is specifically configured to: splice the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
In other examples, the named entity recognition model includes a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer. The training module is specifically configured to: train the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
In other examples, the named entity recognition model includes a dimension alignment layer via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Fig. 5 is a block diagram of an apparatus according to another embodiment of the present invention. The named entity recognition apparatus of this embodiment includes:
an acquisition module 510, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized;
an extraction module 520, configured to extract the picture description text of the associated picture;
a fusion module 530, configured to fuse the text to be recognized and the picture description text to obtain a fused text;
and a recognition module 540, configured to input the fused text into a named entity recognition model to obtain the named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of any one of claims 1-6.
In the solution of the embodiment of the present invention, the target text and the information in its associated picture are fused into the fused text, and training is performed based on the fused text. Compared with training on the target text alone, contextual semantic factors are added to the training, which improves the training effect and the recognition effect of the named entity recognition model.
Referring to fig. 6, a schematic structural diagram of an electronic device according to another embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 6, the electronic device may include: a processor (processor)602, a communication Interface 604, a memory 606, and a communication bus 608.
Wherein:
the processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608.
A communication interface 604 for communicating with other electronic devices or servers.
The processor 602 is configured to execute the program 610, and may specifically perform relevant steps in the foregoing method embodiments.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 606 for storing a program 610. Memory 606 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to perform the following operations: acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text; fusing the target text and the picture description text to obtain a fused text; and training a named entity recognition model based on the fused text and the named entity labels of the target text.
Alternatively, the program 610 may specifically be configured to cause the processor 602 to perform the following operations: acquiring a text to be recognized and an associated picture matched with the text to be recognized; extracting a picture description text of the associated picture; fusing the text to be recognized and the picture description text to obtain a fused text; and inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained by the model training method above.
In addition, for specific implementation of each step in the program 610, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to the embodiments of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk), or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (12)

1. A model training method, comprising:
acquiring a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
fusing the target text and the picture description text to obtain a fused text;
and training a named entity recognition model based on the fused text and the named entity labels of the target text.
2. The method of claim 1, wherein obtaining the picture description text of the associated picture comprises:
inputting the associated picture into a pre-trained picture description model to obtain the picture description text.
3. The method of claim 1, wherein fusing the target text and the picture description text to obtain a fused text comprises:
splicing the dimensional representation of the target text and the dimensional representation of the picture description text to obtain the fused text.
4. The method of claim 1, wherein the named entity recognition model comprises a context fusion layer and a conditional random field processing layer, an output of the context fusion layer being connected to an input of the conditional random field processing layer,
wherein training the named entity recognition model based on the fused text and the named entity labels of the target text comprises:
training the named entity recognition model with the fused text as the input of the context fusion layer and the named entity labels of the target text as the output of the conditional random field processing layer.
5. The method of claim 4, wherein the named entity recognition model comprises a dimension alignment layer via which the output of the context fusion layer is connected to the input of the conditional random field processing layer, the dimension alignment layer being configured to extract, from the context-fused features in the dimensions of the fused text, the features in the dimensions of the target text.
6. The method of claim 4, wherein the context fusion layer is a Transformer encoder.
7. A named entity recognition method, comprising:
acquiring a text to be recognized and an associated picture matched with the text to be recognized;
extracting a picture description text of the associated picture;
fusing the text to be recognized and the picture description text to obtain a fused text;
inputting the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is obtained by training according to the method of any one of claims 1-6.
8. A named entity recognition method, comprising:
acquiring a commodity introduction text and a commodity picture of the commodity introduction text;
extracting a picture description text of the commodity picture;
fusing the commodity introduction text and the picture description text to obtain a fused text;
inputting the fused text into a named entity recognition model to obtain named entity information of the commodity introduction text, wherein the named entity recognition model is obtained by training according to the method of any one of claims 1-6.
9. A model training apparatus comprising:
an acquisition module, configured to acquire a target text and a picture description text of an associated picture, wherein the associated picture is matched with the target text;
a fusion module, configured to fuse the target text and the picture description text to obtain a fused text;
and a training module, configured to train a named entity recognition model based on the fused text and the named entity labels of the target text.
10. A named entity recognition apparatus comprising:
an acquisition module, configured to acquire a text to be recognized and an associated picture matched with the text to be recognized;
an extraction module, configured to extract a picture description text of the associated picture;
a fusion module, configured to fuse the text to be recognized and the picture description text to obtain a fused text;
and a recognition module, configured to input the fused text into a named entity recognition model to obtain named entity information of the text to be recognized, wherein the named entity recognition model is trained according to the method of any one of claims 1-6.
11. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction which causes the processor to execute the corresponding operation of the method according to any one of claims 1-8.
12. A computer storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210137920.2A 2022-02-15 2022-02-15 Model training and named entity recognition method and device, electronic equipment and storage medium Pending CN114580413A (en)

Priority Applications (1)

Application Number: CN202210137920.2A
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Title: Model training and named entity recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202210137920.2A
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Title: Model training and named entity recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN114580413A
Publication Date: 2022-06-03

Family

ID=81773255

Family Applications (1)

Application Number: CN202210137920.2A
Title: Model training and named entity recognition method and device, electronic equipment and storage medium
Priority Date: 2022-02-15
Filing Date: 2022-02-15
Status: Pending

Country Status (1)

Country Link
CN (1) CN114580413A


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341555A * 2023-05-26 2023-06-27 华东交通大学 Named entity recognition method and system
CN116341555B * 2023-05-26 2023-08-04 华东交通大学 Named entity recognition method and system

Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
CN107679039B (en) Method and device for determining statement intention
US11914959B2 (en) Entity linking method and apparatus
CN113283551B (en) Training method and training device of multi-mode pre-training model and electronic equipment
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN111985229A (en) Sequence labeling method and device and computer equipment
CN111046656A (en) Text processing method and device, electronic equipment and readable storage medium
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
CN112287095A (en) Method and device for determining answers to questions, computer equipment and storage medium
CN111737990B (en) Word slot filling method, device, equipment and storage medium
CN113051380B (en) Information generation method, device, electronic equipment and storage medium
CN111859093A (en) Sensitive word processing method and device and readable storage medium
CN112417878A (en) Entity relationship extraction method, system, electronic equipment and storage medium
CN114580413A (en) Model training and named entity recognition method and device, electronic equipment and storage medium
CN112633007A (en) Semantic understanding model construction method and device and semantic understanding method and device
CN113761923A (en) Named entity recognition method and device, electronic equipment and storage medium
CN115115432B (en) Product information recommendation method and device based on artificial intelligence
CN113779202B (en) Named entity recognition method and device, computer equipment and storage medium
CN114528851B (en) Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN115759293A (en) Model training method, image retrieval device and electronic equipment
CN114490993A (en) Small sample intention recognition method, system, equipment and storage medium
WO2022262080A1 (en) Dialogue relationship processing method, computer and readable storage medium
Joshi et al. Optical Text Translator from Images using Machine Learning
CN114722823B (en) Method and device for constructing aviation knowledge graph and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination