CN115129862A - Statement entity processing method and device, computer equipment and storage medium - Google Patents

Statement entity processing method and device, computer equipment and storage medium

Info

Publication number
CN115129862A
CN115129862A (application CN202210374003.6A)
Authority
CN
China
Prior art keywords
training sample
entity
training
samples
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210374003.6A
Other languages
Chinese (zh)
Inventor
刘知远
郑孙聪
周博通
孙茂松
韩旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd filed Critical Tsinghua University
Priority to CN202210374003.6A priority Critical patent/CN115129862A/en
Publication of CN115129862A publication Critical patent/CN115129862A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application discloses a sentence entity processing method, a sentence entity processing apparatus, a computer device, and a storage medium, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method comprises the following steps: identifying an entity included in a target statement; determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position marker information of the entity; and calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples. With the method and the apparatus, entity classification can be realized in a multilingual low-resource scenario, and the accuracy of entity classification is improved.

Description

Statement entity processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a sentence entity processing method, a sentence entity processing apparatus, a computer device, and a computer-readable storage medium.
Background
Entity classification (Entity Typing) is an important natural language processing task that aims to assign corresponding type information to entity mentions (Entity Mention) in large-scale raw text, for use by question answering systems, dialogue systems, recommendation systems, search engines, and the like. At present, the core of entity classification is to discriminate entity types by using rich context, and existing neural-network entity classification models need to learn semantic understanding and entity type characteristics based on abundant, high-quality data annotated with entity type labels.
However, in practical scenarios, entity type annotation data for sentences is very scarce, and the entity classification task often cannot achieve ideal results in low-resource scenarios. In addition, multilingual entity type annotation data is also lacking at present, which makes the data-shortage problem even more prominent when classifying entities in multilingual texts. Therefore, how to implement entity classification in a multilingual low-resource scenario is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a statement entity processing method and device, a computer device and a storage medium, which can realize entity classification under a multi-language low-resource scene and improve the accuracy of entity classification.
In one aspect, an embodiment of the present application provides a statement entity processing method, where the method includes:
identifying an entity included in the target statement;
determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position mark information of the entity;
and calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
In one aspect, an embodiment of the present application provides a statement entity processing apparatus, where the apparatus includes:
an acquisition unit configured to identify an entity included in a target sentence;
the determining unit is used for determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position mark information of the entity;
and the processing unit is used for calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
In one aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, and the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the statement entity processing method described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is read and executed by a processor of a computer device, the computer device is caused to execute the above statement entity processing method.
In one aspect, embodiments of the present application provide a computer program product, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the statement entity processing method.
In the embodiment of the application, an entity included in a target statement is first identified; then first to-be-processed data corresponding to the target statement is determined; and finally an entity classification model is called to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples. Oriented to a multilingual low-resource scenario, the method makes full use of the original training samples in the training sample set and their corresponding translation training samples; through contrastive learning over multilingual samples, the entity classification capability learned from the original training samples is strengthened and simultaneously transferred to other languages, so that the model can classify entities in multilingual samples and the accuracy of entity classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a statement entity processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a statement entity processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of another sentence entity processing method provided in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of an entity classification model provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a sentence entity processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions of "first", "second", etc. referred to in the embodiments of the present application are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
In the embodiments of the present application, Artificial Intelligence (AI) techniques are involved. AI comprises the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making. Specifically, AI technology covers a wide range of fields and involves both hardware-level and software-level technologies. At the hardware level, AI technology generally includes sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing technology, operation/interaction systems, mechatronics, and the like; at the software level, AI technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent transportation, and so on. With the research and progress of AI technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, and the like.
Among these, Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning/deep learning typically includes techniques such as artificial neural networks, self-supervised learning, and contrastive learning. Self-supervised learning belongs to the unsupervised learning paradigm: it requires no manually annotated class labels, but instead uses the data itself as supervision to learn feature representations of the samples, which are then used for downstream tasks. Contrastive learning is one way to carry out self-supervised learning; specifically, each data sample is compared with positive samples and negative samples in a feature space to learn its feature representation, and its core idea is to pull positive samples closer together and push negative samples farther apart in the feature space.
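The idea of pulling positives together and pushing negatives apart can be sketched with an InfoNCE-style objective. The following is a minimal illustrative NumPy sketch, not code from the patent; the function name, the toy vectors, and the temperature value are assumptions for demonstration only.

```python
import numpy as np

def info_nce_loss(anchor, positives, negatives, temperature=0.1):
    """Toy InfoNCE-style contrastive loss: low when the anchor is close
    to its positives and far from its negatives in feature space."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sims = np.array([cos(anchor, p) for p in positives]) / temperature
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / temperature
    pos_exp = np.exp(pos_sims).sum()
    neg_exp = np.exp(neg_sims).sum()
    # -log( similarity mass on positives / total similarity mass )
    return -np.log(pos_exp / (pos_exp + neg_exp))

anchor = np.array([1.0, 0.0])
close  = [np.array([0.9, 0.1])]    # a positive: nearly the same direction
far    = [np.array([-1.0, 0.2])]   # a negative: roughly opposite direction
loss_good = info_nce_loss(anchor, close, far)
loss_bad  = info_nce_loss(anchor, far, close)  # roles swapped -> higher loss
print(loss_good < loss_bad)
```

Minimizing this quantity over learned representations is exactly what "shortening the distance to positives and lengthening the distance to negatives" means in practice.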
Based on the above contrastive learning technique in machine learning/deep learning, the embodiment of the application provides a sentence entity processing method to realize entity classification in a multilingual low-resource scenario and improve the accuracy of entity classification. Specifically, the general principle of the sentence entity processing method is as follows: first, identify the entities included in a target sentence; then, determine the first to-be-processed data corresponding to the target sentence, where the first to-be-processed data includes the target sentence and the position marker information of the entity; and finally, call an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, where the entity classification model is obtained through contrastive learning based on positive examples, negative examples, and entity type labels in a training sample set, and the training sample set includes original training samples and translation training samples corresponding to the original training samples.
In a specific implementation, the above-mentioned sentence entity processing method may be executed by a computer device, and the computer device may be a terminal device or a server. The terminal device may be, for example, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart sound box, a smart watch, a vehicle-mounted terminal, an aircraft, and the like, but is not limited thereto; the server may be, for example, an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, driving assistance and the like.
Alternatively, the above-mentioned sentence entity processing method may be executed by the terminal device and the server together. For example, see FIG. 1 for an illustration: the terminal device 101 may first recognize an entity included in the target sentence, then determine first data to be processed corresponding to the target sentence, and send the first data to be processed to the server 102. Correspondingly, the server 102 calls the entity classification model to process the first to-be-processed data to obtain entity type information of the entity. Of course, the server 102 may also transmit entity type information of the entity to the terminal apparatus 101.
The embodiment of the application is oriented to a multilingual low-resource scenario and makes full use of the original training samples in the training sample set and their corresponding translation training samples; through contrastive learning over multilingual samples, the entity classification capability learned from the original training samples is strengthened and simultaneously transferred to other languages, so that the model can classify entities in multilingual samples and the accuracy of entity classification is improved.
It is to be understood that the system architecture diagram described in the embodiment of the present application is for more clearly illustrating the technical solution of the embodiment of the present application, and does not constitute a limitation to the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
Based on the above explanation, the statement entity processing method proposed in the embodiment of the present application is further explained below with reference to the flowchart shown in fig. 2. In the embodiment of the present application, the above mentioned computer device executing the statement entity processing method is mainly taken as an example for explanation. Referring to fig. 2, the statement entity processing method may specifically include steps S201 to S203:
s201, identifying entities included in the target statement.
In the embodiment of the present application, the target sentence may be text data, or text data contained in an image; this is not limited here. When the target sentence is text data contained in an image, the text data in the image needs to be extracted first, and then the entities contained in that text data are identified. An entity may also be referred to as an entity mention (Entity Mention); it is a substring of a sentence that points to an exact entity. For example, if sentence 1 is "city A is the center of country B", then "city A" is an entity in sentence 1. Of course, the target sentence may include one or more entities, which is not limited here. To identify the entities included in the target sentence, the computer device may adopt an entity extraction technique, such as an entity extraction method based on deep learning or one based on statistics, which is not limited here.
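As a toy illustration of this identification step (much simpler than the deep-learning or statistical extractors the embodiment actually contemplates), a dictionary-based matcher can find known entity mentions as substrings; the function name and the gazetteer contents are illustrative assumptions.

```python
def identify_entities(sentence, gazetteer):
    """Toy dictionary-based entity recognizer: return every known entity
    string that occurs as a substring of the sentence. A real system would
    use a statistical or deep-learning NER model instead."""
    return [entity for entity in gazetteer if entity in sentence]

gazetteer = ["city A", "country B", "company C"]
print(identify_entities("city A is the center of country B", gazetteer))
```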
It should be noted that, when the target statement belongs to the user, the target statement and the data related to the entity included in the target statement in the embodiment of the present application are all acquired after being authorized by the user. Moreover, when the embodiments of the present application are applied to specific products or technologies, the data involved in the usage need to be approved or approved by users, and the collection, usage and processing of the relevant data need to comply with relevant laws and regulations and standards of relevant countries and regions.
S202, determining first to-be-processed data corresponding to the target statement.
In this embodiment, the first to-be-processed data includes the target sentence and the position marker information of the entity. The computer device may add a special marker at the position of the entity to obtain the first to-be-processed data corresponding to the target sentence. In this manner, the exact position of the entity can be determined, which makes it convenient to determine the entity representation vector quickly and accurately later, thereby improving the accuracy of entity classification.
In a possible implementation manner, the computer device determines first to-be-processed data corresponding to the target sentence, and the specific implementation manner may be: determining the position of an entity included in the target statement; adding an entity position mark to the target statement at the position of the entity; and determining the target sentence added with the entity position mark as the first data to be processed.
For example, assume that the target sentence is sentence 1, sentence 1 is "city A is the center of country B", and the entity position marker is "<ent>". After the computer device obtains the entity "city A" included in sentence 1, the corresponding position is found in sentence 1, and the entity position marker is added to sentence 1 at the position of the entity, yielding "<ent>city A<ent> is the center of country B"; the first to-be-processed data corresponding to sentence 1 is therefore "<ent>city A<ent> is the center of country B".
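The marker-insertion step above is mechanical and can be sketched as a short helper; the function name is an illustrative assumption, while the "<ent>" marker and the example sentence come from the description.

```python
def add_entity_markers(sentence, entity, marker="<ent>"):
    """Wrap the first occurrence of `entity` in the sentence with the
    entity position marker, producing the first to-be-processed data."""
    start = sentence.find(entity)
    if start == -1:
        raise ValueError("entity not found in sentence")
    end = start + len(entity)
    return sentence[:start] + marker + entity + marker + sentence[end:]

print(add_entity_markers("city A is the center of country B", "city A"))
# -> <ent>city A<ent> is the center of country B
```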
S203, calling an entity classification model to process the first data to be processed to obtain entity type information of the entity.
In the embodiment of the present application, the entity classification model is obtained through contrastive learning based on positive examples, negative examples, and entity type labels in a training sample set, where the training sample set includes original training samples and translation training samples corresponding to the original training samples. The entity type information indicates the type to which the entity belongs; for example, the entity type information of the entity "city A" is city. Of course, an entity may belong to various types; for example, the entity type information of the entity "tomato" is fruit and vegetable. By processing the first to-be-processed data with an entity classification model trained according to this scheme, the computer device can realize entity classification in a multilingual low-resource scenario and improve the accuracy of entity classification. It should be noted that, to obtain the entity classification model, the computer device first needs to determine the positive examples and negative examples in the training sample set and perform contrastive learning on an initial neural network model using these positive and negative examples to obtain an initial classification model; the initial classification model is then fine-tuned using the training sample set and the entity type labels to obtain the entity classification model. The initial neural network model may be a Multilingual BERT model (M-BERT), or another model may be used, which is not limited here. The M-BERT model is a multilingual version of the BERT model; because it is pre-trained on multiple languages, M-BERT has a certain cross-lingual capability.
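One plausible inference path, sketched below with NumPy, is to encode the marked tokens, pool the span between the two "<ent>" markers into an entity representation, and classify it with a linear softmax head. This is a minimal sketch only: the random embeddings and untrained classifier stand in for M-BERT and its fine-tuned head, and the type inventory is an assumption.

```python
import numpy as np

TYPES = ["city", "country", "organization"]  # illustrative type inventory

def classify_entity(tokens, rng=np.random.default_rng(0), dim=8):
    """Sketch: embed tokens (random stand-in for an M-BERT encoder),
    mean-pool the span between the two <ent> markers as the entity
    representation, then apply a linear softmax classifier."""
    emb = {t: rng.normal(size=dim) for t in set(tokens)}     # toy embeddings
    H = np.stack([emb[t] for t in tokens])
    marks = [i for i, t in enumerate(tokens) if t == "<ent>"]
    entity_repr = H[marks[0] + 1 : marks[1]].mean(axis=0)    # pool entity span
    W = rng.normal(size=(len(TYPES), dim))                   # untrained head
    logits = W @ entity_repr
    probs = np.exp(logits) / np.exp(logits).sum()
    return TYPES[int(np.argmax(probs))], probs

tokens = ["<ent>", "city", "A", "<ent>", "is", "the", "center", "of", "country", "B"]
label, probs = classify_entity(tokens)
print(label, probs.round(3))
```

With trained weights in place of the random stand-ins, the argmax over the softmax output would yield the entity type information described above.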
To sum up, in the embodiment of the present application, an entity included in a target sentence is first identified; then first to-be-processed data corresponding to the target sentence is determined; and finally an entity classification model is called to process the first to-be-processed data to obtain entity type information of the entity, where the entity classification model is obtained through contrastive learning based on positive examples, negative examples, and entity type labels in a training sample set, and the training sample set includes original training samples and translation training samples corresponding to the original training samples. Oriented to a multilingual low-resource scenario, the method makes full use of the original training samples in the training sample set and their corresponding translation training samples; through contrastive learning over multilingual samples, the entity classification capability learned from the original training samples is strengthened and simultaneously transferred to other languages, so that the model can classify entities in multilingual samples and the accuracy of entity classification is improved.
Based on the above explanation, the statement entity processing method proposed in the embodiment of the present application is further explained below with reference to the flowchart shown in fig. 3. In the embodiment of the present application, the above mentioned computer device executing the statement entity processing method is mainly taken as an example for explanation. Referring to fig. 3, the statement entity processing method may specifically include steps S301 to S307:
s301, a training sample set and a corresponding entity type label are obtained.
In the embodiment of the present application, the training sample set includes original training samples and translation training samples corresponding to the original training samples. The computer device may obtain the training sample set and the corresponding entity type labels from its own database, or from other devices, which is not limited here. It should be noted that introducing both the original training samples and their corresponding translation training samples into the training sample set strengthens the model's entity classification capability while also giving it the ability to classify entities in multilingual samples.
The translation training sample corresponding to an original training sample can be marked with a translation tag, making it easy to find the machine-translated counterpart of each original training sample. Illustratively, the translation tag may be translate#i, where i is the number of the original training sample. For example, the translation training sample corresponding to the 1st original training sample may be tagged translate#1.
In a possible implementation manner, the computer device obtains a training sample set and a corresponding entity type label, and a specific implementation manner may be: obtaining an original training sample; performing translation processing on the original training sample to obtain a translation training sample corresponding to the original training sample; acquiring an entity type label of an original training sample; an entity type label for the translation training sample is determined based on the entity type label of the original training sample. It should be noted that, for the translation training sample corresponding to the original training sample, the entity type of the entity in the original training sample may be considered to be the same as the entity type of the entity corresponding to the translation training sample.
Illustratively, the original training sample M is an English sentence that includes entity 1, "City A". The translation training sample m corresponding to the original training sample M is the Chinese sentence produced by machine-translating that English sentence, and the entity type label of entity 1 in the original training sample M, namely "city", is also obtained. Then the translated position of entity 1 is located in the translation training sample m, and the entity type label of entity 1 in the translation training sample m is taken to be the same as its label in the original training sample M, i.e. also "city".
In a possible implementation manner, the computer device obtains the original training samples as follows: acquiring the entities included in a sample sentence; and constructing original training samples based on the sample sentence and the entities it includes, wherein each original training sample comprises the sample sentence and one entity in the sample sentence, and each original training sample corresponds to one or more entity labels. For example, the original training sample is "city A is the center of country B, city A", and the entity label corresponding to this original training sample is "city A & country B"; for another example, the original training sample is "city A is the center of country B, country B", and the corresponding entity label is "country B & city A"; for another example, the original training sample is "company C is located in city A, which is the center of country B, city A", and this original training sample corresponds to two entity labels, namely "city A & country B" and "city A & company C".
It should be noted that, assuming the sample sentence is a sentence S that includes n entities e_1, e_2, ..., e_n, then n original training samples can be constructed from the sentence, where the i-th original training sample contains the sentence S and the i-th entity e_i, and has (n-1) entity labels:

$$\big\{\, e_i \;\&\; e_j \;\big|\; 1 \le j \le n,\ j \neq i \,\big\}$$

where "&" is used to separate the two entities of an entity label.
For example, assuming the sample sentence is sentence 1, i.e., "City A is the center of Country B", the entities included in sentence 1 are extracted as: "City A" and "Country B". Thus, two original training samples can be constructed from the entities included in sentence 1: original training sample 1 is "City A is the center of Country B, City A"; original training sample 2 is "City A is the center of Country B, Country B". The entity label corresponding to original training sample 1 is "City A & Country B", and the entity label corresponding to original training sample 2 is "Country B & City A".
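The sample construction above can be sketched as follows; the dictionary layout and the " & " separator follow the examples in the text, while the function name is illustrative.

```python
# Sketch of constructing original training samples: one sample per entity in
# the sentence, each carrying an "e_i & e_j" label for every other entity.

def build_original_samples(sentence, entities):
    samples = []
    for i, e_i in enumerate(entities):
        labels = [f"{e_i} & {e_j}" for j, e_j in enumerate(entities) if j != i]
        samples.append({"sentence": sentence, "entity": e_i,
                        "entity_labels": labels})
    return samples

samples = build_original_samples("City A is the center of Country B.",
                                 ["City A", "Country B"])
# Two samples, each with (n-1) = 1 entity label.
```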
S302, determining positive examples and negative examples in the training sample set based on the entity labels of the training samples in the training sample set.
In the embodiment of the application, after the computer device obtains the training sample set and the corresponding entity type labels, positive examples and negative examples are determined from the training sample set, so that they can subsequently be used for contrastive learning of the initial neural network model.
In one possible implementation manner, the computer device determines positive and negative examples in the training sample set based on the entity label of each training sample in the training sample set, and the specific implementation may be: taking a second training sample as a positive example of a first training sample, where the first training sample is any training sample in the training sample set, and at least one identical entity label exists between the entity labels of the second training sample and the first training sample, or the second training sample is the translation training sample of the first training sample, or the second training sample is the original training sample of the first training sample; and taking the training samples other than the first and second training samples as negative examples of the first training sample. In this manner of determining positive and negative examples, original training samples can be contrasted with one another, improving the entity classification capability of the model; meanwhile, original training samples can be contrasted with their corresponding translation training samples, so that the model acquires entity classification capability in multiple languages.
For example, assume that the training sample set includes an original training sample a and a corresponding translation training sample a, an original training sample B and a corresponding translation training sample B, and an original training sample C and a corresponding translation training sample C. For the original training sample a, the entity label of the original training sample a is the same as the entity label of the original training sample B, so the original training sample B can be used as the positive example of the original training sample a, and the translation training sample a, the translation training sample B, the original training sample C and the corresponding translation training sample C can be used as the negative example of the original training sample a.
As another example, assume that the training sample set includes an original training sample a and a corresponding translation training sample a, an original training sample B and a corresponding translation training sample B, and an original training sample C and a corresponding translation training sample C. For the original training sample a, the translation training sample a corresponding to the original training sample a may be used as a positive example of the original training sample a, and the original training sample B and the corresponding translation training sample B, the original training sample C and the corresponding translation training sample C may be used as a negative example of the original training sample a.
As another example, assume that the training sample set includes an original training sample a and a corresponding translation training sample a, an original training sample B and a corresponding translation training sample B, and an original training sample C and a corresponding translation training sample C. For the translation training sample a, the original training sample a may be used as a positive example of the translation training sample a, and the original training sample B and the corresponding translation training sample B, the original training sample C and the corresponding translation training sample C may be used as a negative example of the translation training sample a.
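A minimal sketch of the positive/negative decision rule from the formal description above: a pair is positive if the two samples share at least one entity label or are each other's translation; every other pair is negative. The `translation_of` field is an illustrative assumption; note that the worked examples in the text sometimes apply only one of the two criteria in a given training iteration.

```python
# Returns the relation value g(i, j): 1 for a positive pair, 0 for a
# negative pair (and 0 for a sample paired with itself).

def relation_value(sample_i, sample_j):
    if sample_i is sample_j:
        return 0
    shares_label = bool(set(sample_i["entity_labels"])
                        & set(sample_j["entity_labels"]))
    is_translation = (sample_i.get("translation_of") is sample_j
                      or sample_j.get("translation_of") is sample_i)
    return 1 if (shares_label or is_translation) else 0

A = {"entity_labels": ["R & T"]}
B = {"entity_labels": ["R & T"]}            # shares a label with A
C = {"entity_labels": ["M"]}                # unrelated to A
a = {"entity_labels": ["R & T"], "translation_of": A}  # translation of A
```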
S303, performing contrastive learning on the initial neural network model based on the positive examples and negative examples to obtain an initial classification model.
In the embodiment of the application, the computer device uses the positive and negative examples to perform contrastive learning on the initial neural network model, increasing the similarity between positive pairs and suppressing the similarity between negative pairs, thereby pre-training the initial neural network model and obtaining the initial classification model.
In a possible implementation manner, the computer device performs contrastive learning on the initial neural network model based on the positive and negative examples to obtain the initial classification model, and the specific implementation may be: determining the relationship values between the training samples in the training sample set, where a relationship value indicates whether two training samples form a positive or a negative pair; determining the similarity between the training samples in the training sample set; determining a first loss value of the initial neural network model based on the relationship values and similarities between the training samples; and updating the model parameters of the initial neural network model based on the first loss value to obtain the initial classification model.
It should be noted that the relationship value between two training samples differs according to the relationship (positive or negative) between them. The first loss value may be determined by a contrastive-learning loss function, and may specifically be calculated by formula (1), which is as follows:

$$L_{\text{contrastive}} = -\sum_{i=1}^{m}\sum_{\substack{j=1 \\ j \neq i}}^{m} g(i,j)\,\log\frac{\exp\!\big(f(x_i,x_j)\big)}{\sum_{k=1,\,k\neq i}^{m}\exp\!\big(f(x_i,x_k)\big)} \tag{1}$$

where m indicates that m training samples are selected in total, i denotes the i-th training sample, j denotes the j-th training sample, x_i denotes the first entity representation vector of the i-th training sample, x_j denotes the first entity representation vector of the j-th training sample, f(x_i, x_j) denotes the similarity between the i-th and j-th training samples, g(i, j) denotes the relationship value between the i-th and j-th training samples, and L_contrastive denotes the first loss value.
In a possible implementation manner, the computer device determines the relationship values between the training samples in the training sample set, and the specific implementation may be: if the first and second training samples form a positive pair, the relationship value between them is a first numerical value, where the first and second training samples are any two training samples in the training sample set; if the first and second training samples form a negative pair, the relationship value between them is a second numerical value. It should be noted that, in order to increase the similarity between positive pairs and suppress the similarity between negative pairs, the first value needs to be larger than the second value. The first and second values may be preset fixed values or dynamically changing values, which is not limited here.
Illustratively, according to formula (1) and the above explanation, if the i-th and j-th training samples form a positive pair, g(i, j) is 1; if they form a negative pair, g(i, j) is 0.
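Under these conventions (g(i, j) = 1 for positive pairs, 0 for negative pairs), a contrastive loss of the general shape of formula (1) can be sketched as below. The exact normalization of the original formula is an assumption (here the loss is averaged over positive pairs), and the matrix `sim` stands for precomputed values of the similarity f(x_i, x_j).

```python
import math

# For each positive pair (i, j), penalize -log of the softmax of sim[i][j]
# over sample i's similarities to all other samples, so positives are
# pulled together and negatives pushed apart.

def contrastive_loss(sim, g):
    m = len(sim)
    loss, n_pos = 0.0, 0
    for i in range(m):
        denom = sum(math.exp(sim[i][k]) for k in range(m) if k != i)
        for j in range(m):
            if i != j and g[i][j] == 1:
                loss -= math.log(math.exp(sim[i][j]) / denom)
                n_pos += 1
    return loss / max(n_pos, 1)

sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
g = [[0, 1, 0],      # samples 0 and 1 form a positive pair
     [1, 0, 0],
     [0, 0, 0]]
loss = contrastive_loss(sim, g)
```

Raising the similarity of the positive pair lowers the loss, which is the behavior contrastive learning optimizes for.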
In a possible implementation manner, the computer device determines similarity between training samples in the training sample set, and a specific implementation manner may be: determining second data to be processed corresponding to each training sample in the training sample set, wherein the second data to be processed comprises each training sample and position mark information of an entity in each training sample; calling an encoder of the initial neural network model to encode second data to be processed corresponding to each training sample to obtain a first entity representation vector corresponding to each training sample; and determining the similarity between the training samples in the training sample set based on the first entity representation vector corresponding to each training sample.
It should be noted that determining the similarity between training samples in the training sample set requires the first entity representation vector of each training sample, which is obtained by calling the encoder of the initial neural network model to encode the second to-be-processed data corresponding to each training sample. Here, the first entity representation vector may be the word vector corresponding to the first "<ent>" marker. The classifier used for subsequent entity classification is a linear layer, and the cosine distance between two vectors may be chosen as the similarity between training samples. A larger cosine distance means the first entity representation vectors of the two selected training samples are very similar; conversely, a smaller cosine distance means the first entity representation vectors of the two training samples differ. Through contrastive learning, positive examples are drawn closer together and negative examples are pushed further apart. In addition, because the cosine value lies in the narrow range [-1, 1], it needs to be multiplied by a coefficient to amplify differences so the model can distinguish them.
Specifically, the similarity between the training samples can be calculated by formula (2), which is as follows:

$$f(x_i,x_j) = \frac{\cos(x_i,x_j)}{\tau} \tag{2}$$

where i denotes the i-th training sample, j denotes the j-th training sample, x_i denotes the first entity representation vector of the i-th training sample, x_j denotes the first entity representation vector of the j-th training sample, f(x_i, x_j) denotes the similarity between the i-th and j-th training samples, and τ denotes the temperature: the lower the temperature, the more easily the model distinguishes different entity types. τ may typically be 0.5.
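Formula (2) can be sketched directly: the cosine similarity between the two first entity representation vectors, divided by the temperature τ (0.5 by default, so cosine values in [-1, 1] are amplified to [-2, 2]).

```python
import math

# f(x_i, x_j) = cos(x_i, x_j) / tau, per formula (2).

def similarity(x_i, x_j, tau=0.5):
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    return (dot / (norm_i * norm_j)) / tau

# Parallel vectors: cosine 1.0, amplified to 1/tau = 2.0.
s = similarity([1.0, 0.0], [2.0, 0.0])
```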
In a possible implementation manner, before the computer device invokes the encoder of the initial neural network model to encode the second to-be-processed data corresponding to each training sample to obtain the first entity representation vector corresponding to each training sample, entity masking needs to be performed on the second to-be-processed data corresponding to each training sample. It should be noted that, in order to improve the ability of the initial neural network model to utilize context information, an entity masking method may be applied while training the model. For example, the sentence "City A is the center of Country B" includes the entity "City A". During model training, the entity "City A" is replaced, with a preset probability, by a symbol representing occlusion, e.g., [MASK]. This allows the initial neural network model to learn how to exploit context information during training rather than focusing only on the entity name. The preset probability may be set randomly or determined from a large amount of experimental data, and is not limited here.
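The entity masking step can be sketched as follows; the mask symbol "[MASK]", the default probability, and the simple string replacement are illustrative assumptions (a real implementation would operate on token positions).

```python
import random

# With probability mask_prob, replace the entity mention with a mask symbol
# so the encoder must rely on the surrounding context.

def mask_entity(sentence, entity, mask_prob=0.15, rng=random):
    if rng.random() < mask_prob:
        return sentence.replace(entity, "[MASK]")
    return sentence

rng = random.Random(0)
always = [mask_entity("City A is the center of Country B.", "City A",
                      mask_prob=1.0, rng=rng) for _ in range(3)]
never = mask_entity("City A is the center of Country B.", "City A",
                    mask_prob=0.0, rng=rng)
```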
S304, fine-tuning the initial classification model based on the training sample set and the entity type labels to obtain an entity classification model.
In the embodiment of the present application, after obtaining the initial classification model, the computer device further performs Fine-tuning (Fine-tuning) on the initial classification model to improve the entity classification capability of the model.
In a possible implementation manner, the computer device fine-tunes the initial classification model based on the training sample set and the entity type labels to obtain the entity classification model, and the specific implementation may be: calling the encoder of the initial classification model to encode the second to-be-processed data corresponding to each training sample in the training sample set to obtain a second entity representation vector corresponding to each training sample, where the second to-be-processed data comprises each training sample and the position marking information of the entity in each training sample; calling a linear layer of the initial classification model to classify the second entity representation vector corresponding to each training sample to obtain the entity type information corresponding to each training sample; determining a second loss value of the initial classification model based on the entity type information corresponding to each training sample and the entity type label of each training sample; and updating the model parameters of the initial classification model based on the second loss value to obtain the entity classification model. The linear layer may serve as a classifier, such as a softmax classifier, which maps the second entity representation vector to a vector whose dimension equals the number of entity type labels, so as to perform entity classification.
It should be noted that, after the computer device obtains the initial classification model through training, the initial classification model is further called to obtain the entity type information corresponding to each training sample, and a second loss value of the initial classification model is determined from the entity type information corresponding to each training sample and the entity type label of each training sample, so as to fine-tune the initial classification model and thereby obtain the entity classification model. Specifically, the second loss value of the initial classification model may be calculated by formula (3), which is as follows:

$$L_{\text{fine-tuning}} = -\sum_{i=1}^{m}\sum_{j=1}^{l}\Big[\,y_i(j)\log\sigma\big(z_i(j)\big) + \big(1-y_i(j)\big)\log\!\Big(1-\sigma\big(z_i(j)\big)\Big)\Big] \tag{3}$$

where i denotes the i-th training sample, j denotes the j-th entity type, l denotes the total number of entity types, z_i(j) denotes the entity type information of the i-th training sample for the j-th entity type, y_i(j) denotes the entity type label of the i-th training sample for the j-th entity type (a component of the one-hot vector of the sample's entity type or, if the sample has multiple entity types, of the sum of their one-hot vectors), σ denotes the sigmoid activation function, and L_fine-tuning denotes the second loss value of the initial classification model.
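Formula (3) is a per-type sigmoid cross-entropy between the linear layer's logits z_i(j) and the (possibly multi-hot) label components y_i(j); it can be sketched as below. Averaging over training samples is an assumption, and the logit/label values are toy data.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Per-sample, per-type binary cross-entropy with sigmoid, averaged over
# samples; labels may be multi-hot for samples with multiple entity types.

def fine_tuning_loss(logits, labels):
    m = len(logits)
    total = 0.0
    for z_i, y_i in zip(logits, labels):
        for z, y in zip(z_i, y_i):
            p = sigmoid(z)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / m

# One sample, two entity types; the model is confident and correct.
good = fine_tuning_loss([[2.0, -2.0]], [[1, 0]])
bad = fine_tuning_loss([[-2.0, 2.0]], [[1, 0]])
```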
The following explains the statement entity processing method proposed in the present application, taking the initial neural network model as an M-BERT model as an example:
referring to fig. 4, fig. 4 is a schematic structural diagram of an entity classification model according to an embodiment of the present disclosure. As shown in fig. 4, before training the initial neural network model, the computer device needs to obtain a training sample set and corresponding entity type labels, where the training sample set includes an original training sample and a translation training sample corresponding to the original training sample. Each original training sample comprises a sample statement and an entity in the sample statement, and each original training sample corresponds to one or more entity labels.
Assume that the training sample set includes an original training sample a and a corresponding translation training sample a, an original training sample B and a corresponding translation training sample B, an original training sample C and a corresponding translation training sample C. Wherein, the entity corresponding to the original training sample A is x, the entity label is R & T, and the entity type label is H; correspondingly, the entity corresponding to the translation training sample a is x, the entity label is R & T, and the entity type label is H. The entity corresponding to the original training sample B is y, the entity label is R & T, and the entity type label is K; correspondingly, the entity corresponding to the translation training sample b is y, the entity label is R & T, and the entity type label is K. An entity corresponding to the original training sample C is z, an entity label is M, and an entity type label is N; correspondingly, the entity corresponding to the translation training sample c is z, the entity label is M, and the entity type label is N.
For each training iteration, positive and negative examples in the training sample set are determined according to the entity labels of the training samples. The entity position corresponding to each training sample is marked, where the entity position marker may be "<ent>", so as to obtain the second to-be-processed data corresponding to each training sample. For example, in one training iteration, for the original training sample A, the original training sample B may be taken as a positive example of the original training sample A, while the translation training sample a, the translation training sample b, the original training sample C, and the corresponding translation training sample c may be taken as negative examples of the original training sample A.
Further, a word vector of the second data to be processed corresponding to each training sample can be obtained by querying the word vector table; determining a segment vector of second data to be processed corresponding to each training sample according to the context; and determining the position vector of the second data to be processed corresponding to each training sample according to the position relation of each word in the second data to be processed corresponding to each training sample. And inputting the vectors into an M-BERT model together, and calling an encoder of the M-BERT model to perform encoding processing to obtain a first entity representation vector corresponding to each training sample, wherein the first entity representation vector can be a word vector corresponding to the first < ent >. Entity masking methods can be applied during the training of the model to improve the model's ability to utilize context information.
Further, contrastive learning is performed among the training samples, and the first loss value of the contrastive learning is calculated using formula (1) from the relationship values and similarities between the training samples. The model parameters of the M-BERT model are then updated in the direction of reducing the first loss value, thereby obtaining the initial classification model.
Further, the encoder of the initial classification model is used to encode the second to-be-processed data corresponding to each training sample to obtain the second entity representation vector corresponding to each training sample; then the linear layer of the initial classification model classifies the second entity representation vector corresponding to each training sample to obtain the entity type information corresponding to each training sample. A second loss value is then calculated using formula (3), the model parameters of the initial classification model are updated in the direction of reducing the second loss value, and training continues to obtain the entity classification model. In the process of applying the entity classification model, an entity included in a target statement is first identified and the first to-be-processed data corresponding to the target statement is determined; the first to-be-processed data is then input into the entity classification model for processing, yielding the entity type information of the entity included in the target statement.
Please refer to Table 1, which is a statistical table of application test indexes of the entity classification model provided in the embodiment of the present application. The Precision (P) index represents the proportion of correctly predicted test samples among all test samples; the Recall (R) index represents the proportion of test samples whose entity type information can be predicted among all test samples; the F1 value (harmonic mean) index represents the harmonic mean of precision and recall, and serves as a comprehensive evaluation index of the two. The entity classification model trained by this scheme is applied to open evaluation data (Open Entity and Few-NERD) to test its effect: Fine-tuning denotes the fine-tuned basic model, Mono(E) denotes the model obtained by contrastive learning using only the monolingual text of the English data, Mono(E+Z) denotes the model obtained by contrastive learning using monolingual Chinese and English texts, and Cross(E+Z) denotes the model obtained by contrastive learning using Chinese and English texts together with their translations, i.e., the entity classification model provided by this scheme. As can be seen from Table 1, across different numbers of samples per type, the F1 value of the entity classification model proposed by this scheme exceeds the F1 values of the other models, so entity classification can be realized in a multilingual low-resource scenario and the accuracy of entity classification is improved.
TABLE 1
[Table 1: Precision, Recall, and F1 results of Fine-tuning, Mono(E), Mono(E+Z), and Cross(E+Z) on the Open Entity and Few-NERD evaluation data; rendered as an image in the original publication.]
Please refer to Table 2, another application test index statistical table of the entity classification model provided in the embodiment of the present application. In order to test the cross-language transfer learning capability of the entity classification model provided by this scheme, the test sets of the evaluation data are translated into Chinese by machine translation and then tested. The Precision (P), Recall (R), and F1 indexes, and the models Fine-tuning, Mono(E), Mono(E+Z), and Cross(E+Z), are as defined for Table 1.
As can be seen from table 2, when the number of samples of each type is different, the F1 value corresponding to the entity classification model proposed in the present scheme has a better entity classification effect than the F1 values corresponding to other models, which indicates that the entity classification model has the capability of cross-language migration learning, and therefore, entity classification can be implemented in a multi-language low-resource scenario, and the accuracy of entity classification is improved.
TABLE 2
[Table 2: Precision, Recall, and F1 results of Fine-tuning, Mono(E), Mono(E+Z), and Cross(E+Z) on the machine-translated Chinese test sets; rendered as an image in the original publication.]
S305, identifying entities included in the target statement.
S306, determining first to-be-processed data corresponding to the target statement.
S307, calling an entity classification model to process the first data to be processed to obtain entity type information of the entity.
The specific implementation manners of steps S305 to S307 may refer to the specific implementation manners of steps S201 to S203, which are not described herein again.
In summary, in the embodiment of the present application, a training sample set and the corresponding entity type labels are first obtained, and positive and negative examples in the training sample set are determined based on the entity labels of the training samples; contrastive learning is then performed on the initial neural network model based on the positive and negative examples to obtain an initial classification model; and the initial classification model is fine-tuned based on the training sample set and the entity type labels to obtain the entity classification model. When the entity classification model is applied, an entity included in a target statement is first identified; then the first to-be-processed data corresponding to the target statement is determined; and finally the entity classification model is called to process the first to-be-processed data to obtain the entity type information of the entity. Oriented to multilingual low-resource scenarios, the method makes full use of the original training samples in the training sample set and their corresponding translation training samples; through contrastive learning on multilingual samples, the entity classification capability of the model is strengthened and simultaneously transferred to other languages, so that the model acquires multilingual entity classification capability and the accuracy of entity classification is improved.
Based on the statement entity processing method, the embodiment of the application provides a statement entity processing device. Referring to fig. 5, it is a schematic structural diagram of a sentence entity processing apparatus provided in the embodiment of the present application, where the sentence entity processing apparatus 500 may operate the following units:
an obtaining unit 501, configured to identify an entity included in a target statement;
a determining unit 502, configured to determine first to-be-processed data corresponding to a target statement, where the first to-be-processed data includes the target statement and location marking information of the entity;
the processing unit 503 is configured to invoke an entity classification model to process the first to-be-processed data to obtain the entity type information of the entity, where the entity classification model is obtained by contrastive learning based on positive examples, negative examples, and entity type labels in a training sample set, and the training sample set includes original training samples and translation training samples corresponding to the original training samples.
In one embodiment, the processing unit 503 is further configured to: obtain a training sample set and corresponding entity type labels, where the training sample set includes original training samples and translation training samples corresponding to the original training samples; determine positive and negative examples in the training sample set based on the entity labels of the training samples in the training sample set; perform contrastive learning on the initial neural network model based on the positive and negative examples to obtain an initial classification model; and fine-tune the initial classification model based on the training sample set and the entity type labels to obtain the entity classification model.
In another embodiment, when contrastive learning is performed on the initial neural network model based on the positive and negative examples to obtain the initial classification model, the processing unit 503 may be specifically configured to: determine the relationship values between the training samples in the training sample set, where a relationship value indicates whether two training samples form a positive or a negative pair; determine the similarity between the training samples in the training sample set; determine a first loss value of the initial neural network model based on the relationship values and similarities between the training samples; and update the model parameters of the initial neural network model based on the first loss value to obtain the initial classification model.
In another embodiment, when determining the relationship value between the training samples in the training sample set, the processing unit 503 may specifically be configured to: if the first training sample and the second training sample are positive examples, the relation value between the first training sample and the second training sample is a first numerical value, and the first training sample and the second training sample are any two training samples in a training sample set; if the first training sample and the second training sample are negative examples, the relation value between the first training sample and the second training sample is a second numerical value.
In another embodiment, when determining the similarity between the training samples in the training sample set, the processing unit 503 may specifically be configured to: determining second data to be processed corresponding to each training sample in the training sample set, wherein the second data to be processed comprises each training sample and position mark information of an entity in each training sample; calling an encoder of the initial neural network model to encode second data to be processed corresponding to each training sample to obtain a first entity expression vector corresponding to each training sample; and determining the similarity between the training samples in the training sample set based on the first entity representation vector corresponding to each training sample.
In another embodiment, when the initial classification model is fine-tuned based on the training sample set and the entity type label to obtain the entity classification model, the processing unit 503 may be specifically configured to: calling an encoder of the initial classification model to encode second to-be-processed data corresponding to each training sample in the training sample set to obtain a second entity expression vector corresponding to each training sample, wherein the second to-be-processed data comprises each training sample and position mark information of an entity in each training sample; calling a linear layer of the initial classification model to classify the second entity representation vector corresponding to each training sample to obtain entity type information corresponding to each training sample; determining a second loss value of the initial classification model based on the entity type information corresponding to each training sample and the entity type label of each training sample; and updating the model parameters of the initial classification model based on the second loss value to obtain the entity classification model.
In another embodiment, when obtaining the training sample set and the corresponding entity type labels, the processing unit 503 may specifically be configured to: obtain an original training sample; perform translation processing on the original training sample to obtain a translation training sample corresponding to the original training sample; obtain an entity type label of the original training sample; and determine an entity type label for the translation training sample based on the entity type label of the original training sample.
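The translation-augmentation step can be sketched as below. The `translate` callable is a hypothetical stand-in for any machine-translation service (the text does not name one); the essential point is that each translation training sample inherits the entity type label of its original training sample.

```python
def augment_with_translations(original_samples, translate, target_langs):
    """original_samples: list of (sentence, entity, entity_type_label).
    Returns the originals plus one translated copy per target language,
    with the entity type label carried over unchanged."""
    augmented = list(original_samples)
    for sentence, entity, label in original_samples:
        for lang in target_langs:
            t_sentence = translate(sentence, lang)
            t_entity = translate(entity, lang)
            # the translation training sample keeps the original's label
            augmented.append((t_sentence, t_entity, label))
    return augmented
```

With one original sample and one target language, the result contains two samples sharing the same entity type label.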
In another embodiment, when obtaining the original training samples, the processing unit 503 may specifically be configured to: acquire the entities included in a sample statement; and construct original training samples based on the sample statement and the entities included in the sample statement, wherein each original training sample comprises the sample statement and one entity in the sample statement, and each original training sample corresponds to one or more entity labels.
In another embodiment, when determining the positive examples and the negative examples in the training sample set based on the entity labels of the training samples in the training sample set, the processing unit 503 may specifically be configured to: take a second training sample as a positive example of a first training sample, wherein the first training sample is any one training sample in the training sample set, and the entity labels of the second training sample share at least one entity label with the entity labels of the first training sample, or the second training sample is a translation training sample of the first training sample, or the second training sample is an original training sample of the first training sample; and take the training samples other than the first training sample and the second training sample as negative examples of the first training sample.
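The positive/negative split for one anchor sample follows the three rules just stated (shared entity label, translation of the anchor, or original of the anchor) and can be sketched as follows; the data representation here (label sets plus a translation map) is an assumption for illustration.

```python
def split_examples(anchor_idx, samples, translation_of):
    """samples: list of sets of entity labels, one set per training sample.
    translation_of: maps a translated sample's index to the index of its
    original sample. Returns (positives, negatives) for the anchor."""
    positives, negatives = [], []
    for i, labels in enumerate(samples):
        if i == anchor_idx:
            continue
        shares_label = bool(labels & samples[anchor_idx])
        # positive if one sample is the translation of the other
        is_translation_pair = (translation_of.get(i) == anchor_idx
                               or translation_of.get(anchor_idx) == i)
        if shares_label or is_translation_pair:
            positives.append(i)
        else:
            negatives.append(i)
    return positives, negatives
```

For example, a sample that shares no label with the anchor but is the anchor's translation still counts as a positive example, per the second rule.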
According to another embodiment of the present application, the units in the sentence entity processing apparatus shown in fig. 5 may be combined, individually or entirely, into one or several other units, or some unit(s) thereof may be further split into multiple functionally smaller units, either of which can achieve the same operation without affecting the technical effect of the embodiment of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the sentence entity processing apparatus may also include other units, and in practical applications these functions may be realized with the assistance of other units or through the cooperation of multiple units.
According to another embodiment of the present application, the sentence entity processing apparatus shown in fig. 5 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 3 on a general-purpose computing device, such as a computer comprising a processing element such as a central processing unit (CPU), a random access memory (RAM), a read-only memory (ROM), and a storage element, thereby implementing the sentence entity processing method of the embodiment of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed in the above computing device via the computer-readable recording medium.
In the embodiment of the application, an entity included in a target statement is first identified; first to-be-processed data corresponding to the target statement is then determined; and finally an entity classification model is called to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and the translation training samples corresponding to the original training samples. The method targets multilingual low-resource scenarios: it makes full use of the original training samples in the training sample set and their corresponding translation training samples, and through contrastive learning over the multilingual samples it strengthens the entity classification capability of the model while transferring that capability from the original language to other languages, so that the model can classify entities in multilingual samples and the accuracy of entity classification is improved.
Based on the description of the method embodiment and the apparatus embodiment, an embodiment of the application further provides a computer device. Referring to fig. 6, the computer device 600 includes at least a processor 601, a communication interface 602, and a computer storage medium 603. The processor 601, the communication interface 602, and the computer storage medium 603 may be connected by a bus or by other means. The computer storage medium 603 may reside in the memory 604 of the computer device 600 and is used to store a computer program comprising program instructions; the processor 601 is used to execute the program instructions stored by the computer storage medium 603. The processor 601 (or CPU, central processing unit) is the computing core and control core of the computer device, adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement the corresponding method flow or function.
In an embodiment, the processor 601 of the embodiment of the present application may be configured to perform a series of processing, specifically including: identifying an entity included in the target statement; determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position mark information of the entity; and calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
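The position mark information mentioned above can be illustrated with a small sketch: marker tokens are inserted around the entity's span before the sentence is fed to the model. The `[E1]`/`[/E1]` marker names are an assumption for illustration, not taken from the text, and the sketch assumes the entity string occurs in the sentence.

```python
def mark_entity(sentence, entity):
    """Insert marker tokens around the first occurrence of the entity,
    producing the sentence-plus-position-marks input ("first
    to-be-processed data") for the classification model."""
    start = sentence.index(entity)  # assumes the entity is present
    end = start + len(entity)
    return sentence[:start] + "[E1] " + entity + " [/E1]" + sentence[end:]
```

An encoder fine-tuned with such markers can read off the entity representation vector at the marker positions.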
An embodiment of the present application further provides a computer storage medium (memory), which is a memory device in a computer device and is used to store programs and data. It is understood that the computer storage medium here may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer storage medium provides a storage space that stores the operating system of the computer device. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for being loaded and executed by the processor 601. It should be noted that the computer storage medium here may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by a processor to implement the corresponding steps of the method described above with respect to the statement entity processing method embodiment shown in FIG. 2 or FIG. 3; in particular implementations, one or more instructions in the computer storage medium are loaded and executed by processor 601 to perform the steps of:
identifying an entity included in the target statement;
determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position mark information of the entity;
and calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
In one embodiment, the one or more instructions may be loaded by the processor to further perform: acquiring a training sample set and corresponding entity type labels, wherein the training sample set comprises original training samples and translation training samples corresponding to the original training samples; determining positive examples and negative examples in the training sample set based on the entity labels of all the training samples in the training sample set; performing contrastive learning on the initial neural network model based on the positive examples and the negative examples to obtain an initial classification model; and fine-tuning the initial classification model based on the training sample set and the entity type labels to obtain the entity classification model.
In another embodiment, when contrastive learning is performed on the initial neural network model based on the positive and negative examples to obtain the initial classification model, the one or more instructions may be loaded and executed by the processor to: determine the relationship value between the training samples in the training sample set, wherein the relationship value is used to indicate whether the training samples are positive examples or negative examples of each other; determine the similarity between the training samples in the training sample set; determine a first loss value of the initial neural network model based on the relationship values and similarities between the training samples; and update the model parameters of the initial neural network model based on the first loss value to obtain the initial classification model.
In another embodiment, in determining the relationship value between the training samples in the training sample set, the one or more instructions may be loaded and executed by the processor to: if the first training sample and the second training sample are positive examples of each other, set the relationship value between the first training sample and the second training sample to a first numerical value, where the first training sample and the second training sample are any two training samples in the training sample set; if the first training sample and the second training sample are negative examples of each other, set the relationship value between the first training sample and the second training sample to a second numerical value.
In another embodiment, in determining the similarity between the training samples in the training sample set, the one or more instructions may be loaded and executed by the processor to: determine second to-be-processed data corresponding to each training sample in the training sample set, wherein the second to-be-processed data comprises each training sample and the position mark information of the entity in each training sample; call an encoder of the initial neural network model to encode the second to-be-processed data corresponding to each training sample to obtain a first entity representation vector corresponding to each training sample; and determine the similarity between the training samples in the training sample set based on the first entity representation vector corresponding to each training sample.
In another embodiment, when the initial classification model is fine-tuned based on the training sample set and the entity type labels to obtain the entity classification model, the one or more instructions may be loaded and executed by the processor to: call an encoder of the initial classification model to encode the second to-be-processed data corresponding to each training sample in the training sample set to obtain a second entity representation vector corresponding to each training sample, wherein the second to-be-processed data comprises each training sample and the position mark information of the entity in each training sample; call a linear layer of the initial classification model to classify the second entity representation vector corresponding to each training sample to obtain entity type information corresponding to each training sample; determine a second loss value of the initial classification model based on the entity type information corresponding to each training sample and the entity type label of each training sample; and update the model parameters of the initial classification model based on the second loss value to obtain the entity classification model.
In another embodiment, when obtaining the training sample set and the corresponding entity type labels, the one or more instructions may be loaded and executed by the processor to: obtain an original training sample; perform translation processing on the original training sample to obtain a translation training sample corresponding to the original training sample; obtain an entity type label of the original training sample; and determine an entity type label for the translation training sample based on the entity type label of the original training sample.
In another embodiment, the one or more instructions may be loaded and executed by the processor when obtaining the original training samples: acquiring an entity included in a sample statement; and constructing original training samples based on the sample statement and the entity included in the sample statement, wherein each original training sample comprises the sample statement and an entity in the sample statement, and each original training sample corresponds to one or more entity labels.
In another embodiment, in determining the positive and negative examples in the training sample set based on the entity labels of the training samples in the training sample set, the one or more instructions may be loaded and executed by the processor to: take a second training sample as a positive example of a first training sample, wherein the first training sample is any one training sample in the training sample set, and the entity labels of the second training sample share at least one entity label with the entity labels of the first training sample, or the second training sample is a translation training sample of the first training sample, or the second training sample is an original training sample of the first training sample; and take the training samples other than the first training sample and the second training sample as negative examples of the first training sample.
In the embodiment of the application, an entity included in a target statement is first identified; first to-be-processed data corresponding to the target statement is then determined; and finally an entity classification model is called to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and the translation training samples corresponding to the original training samples. The method targets multilingual low-resource scenarios: it makes full use of the original training samples in the training sample set and their corresponding translation training samples, and through contrastive learning over the multilingual samples it strengthens the entity classification capability of the model while transferring that capability from the original language to other languages, so that the model can classify entities in multilingual samples and the accuracy of entity classification is improved.
It should be noted that, according to an aspect of the present application, a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various alternatives of the sentence entity processing method embodiments shown in fig. 2 or fig. 3. It should be understood that the above-described embodiments are merely preferred embodiments of the present invention and should not be taken as limiting; the scope of the invention is defined by the appended claims.

Claims (13)

1. A sentence entity processing method is characterized by comprising the following steps:
identifying an entity included in the target statement;
determining first to-be-processed data corresponding to the target statement, wherein the first to-be-processed data comprises the target statement and the position mark information of the entity;
and calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
2. The method of claim 1, further comprising:
acquiring a training sample set and a corresponding entity type label, wherein the training sample set comprises an original training sample and a translation training sample corresponding to the original training sample;
determining positive examples and negative examples in the training sample set based on the entity labels of the training samples in the training sample set;
performing contrastive learning on the initial neural network model based on the positive examples and the negative examples to obtain an initial classification model;
and fine-tuning the initial classification model based on the training sample set and the entity type label to obtain the entity classification model.
3. The method of claim 2, wherein the performing contrastive learning on the initial neural network model based on the positive examples and the negative examples to obtain an initial classification model comprises:
determining a relation value between training samples in the training sample set, wherein the relation value is used for indicating that the training samples are positive examples or negative examples;
determining the similarity between the training samples in the training sample set;
determining a first loss value of the initial neural network model based on the relation value and the similarity between the training samples;
and updating the model parameters of the initial neural network model based on the first loss value to obtain an initial classification model.
4. The method of claim 3, wherein determining the relationship value between the training samples in the set of training samples comprises:
if a first training sample and a second training sample are positive examples of each other, the relation value between the first training sample and the second training sample is a first numerical value, wherein the first training sample and the second training sample are any two training samples in the training sample set;
and if the first training sample and the second training sample are negative examples of each other, the relation value between the first training sample and the second training sample is a second numerical value.
5. The method of claim 3 or 4, wherein determining the similarity between the training samples in the set of training samples comprises:
determining second data to be processed corresponding to each training sample in the training sample set, wherein the second data to be processed comprises each training sample and position mark information of an entity in each training sample;
calling an encoder of an initial neural network model to encode the second data to be processed corresponding to each training sample to obtain a first entity representation vector corresponding to each training sample;
and determining the similarity between the training samples in the training sample set based on the first entity representation vector corresponding to each training sample.
6. The method according to any one of claims 2 to 4, wherein the fine-tuning the initial classification model based on the training sample set and the entity type labels to obtain the entity classification model comprises:
calling an encoder of the initial classification model to encode second to-be-processed data corresponding to each training sample in the training sample set to obtain a second entity representation vector corresponding to each training sample, wherein the second to-be-processed data comprises each training sample and position mark information of an entity in each training sample;
calling a linear layer of the initial classification model to classify the second entity representation vector corresponding to each training sample to obtain entity type information corresponding to each training sample;
determining a second loss value of the initial classification model based on the entity type information corresponding to each training sample and the entity type label of each training sample;
and updating the model parameters of the initial classification model based on the second loss value to obtain the entity classification model.
7. The method according to any one of claims 2 to 4, wherein obtaining a training sample set and a corresponding entity type label comprises:
obtaining an original training sample;
performing translation processing on the original training sample to obtain a translation training sample corresponding to the original training sample;
obtaining an entity type label of the original training sample;
determining an entity type label for the translation training sample based on the entity type label for the original training sample.
8. The method of claim 7, wherein obtaining the raw training samples comprises:
acquiring an entity included in a sample statement;
constructing original training samples based on the sample sentence and entities included in the sample sentence, wherein each original training sample includes the sample sentence and one entity in the sample sentence, and each original training sample corresponds to one or more entity labels.
9. The method according to any one of claims 2 to 4, wherein the determining positive and negative examples in the set of training samples based on the entity labels of the respective training samples in the set of training samples comprises:
taking a second training sample as a positive example of a first training sample, wherein the first training sample is any one training sample in the training sample set; at least one identical entity label exists in the entity labels of the second training sample and the entity labels of the first training sample, or the second training sample is a translation training sample of the first training sample, or the second training sample is an original training sample of the first training sample;
taking other training samples except the first training sample and the second training sample as negative examples of the first training sample.
10. A sentence entity processing apparatus, the apparatus comprising:
an acquisition unit configured to identify an entity included in a target sentence;
a determining unit, configured to determine first to-be-processed data corresponding to the target statement, where the first to-be-processed data includes the target statement and location marking information of the entity;
and the processing unit is used for calling an entity classification model to process the first to-be-processed data to obtain entity type information of the entity, wherein the entity classification model is obtained through contrastive learning based on positive examples, negative examples and entity type labels in a training sample set, and the training sample set comprises original training samples and translation training samples corresponding to the original training samples.
11. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the sentence entity processing method according to any of claims 1-9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more computer programs adapted to be loaded by a processor and to execute the sentence entity processing method of any one of claims 1 to 9.
13. A computer program product, characterized in that the computer program product comprises a computer program adapted to be loaded by a processor and to perform the method of sentence entity processing of any of claims 1-9.
CN202210374003.6A 2022-04-11 2022-04-11 Statement entity processing method and device, computer equipment and storage medium Pending CN115129862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210374003.6A CN115129862A (en) 2022-04-11 2022-04-11 Statement entity processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210374003.6A CN115129862A (en) 2022-04-11 2022-04-11 Statement entity processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115129862A true CN115129862A (en) 2022-09-30

Family

ID=83376444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210374003.6A Pending CN115129862A (en) 2022-04-11 2022-04-11 Statement entity processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115129862A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618891A (en) * 2022-12-19 2023-01-17 湖南大学 Multimodal machine translation method and system based on contrast learning
CN117273003A (en) * 2023-11-14 2023-12-22 腾讯科技(深圳)有限公司 Text data processing method, model training method and named entity recognition method
CN117273003B (en) * 2023-11-14 2024-03-12 腾讯科技(深圳)有限公司 Text data processing method, model training method and named entity recognition method

Similar Documents

Publication Publication Date Title
US20230016365A1 (en) Method and apparatus for training text classification model
CN112988979B (en) Entity identification method, entity identification device, computer readable medium and electronic equipment
JP2023535709A (en) Language expression model system, pre-training method, device, device and medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN115129862A (en) Statement entity processing method and device, computer equipment and storage medium
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN114398899A (en) Training method and device for pre-training language model, computer equipment and medium
CN114444476B (en) Information processing method, apparatus, and computer-readable storage medium
CN112667803A (en) Text emotion classification method and device
CN112860871B (en) Natural language understanding model training method, natural language understanding method and device
CN116662522B (en) Question answer recommendation method, storage medium and electronic equipment
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN116127060A (en) Text classification method and system based on prompt words
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
CN113657092A (en) Method, apparatus, device and medium for identifying label
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
Yoo et al. Improving visually grounded sentence representations with self-attention
CN113609873A (en) Translation model training method, device and medium
CN112818688A (en) Text processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination