CN115130542A - Model training method, text processing device and electronic equipment


Info

Publication number
CN115130542A
CN115130542A
Authority
CN
China
Prior art keywords
sample
text
sample text
texts
recognition model
Prior art date
Legal status
Pending
Application number
CN202210456716.7A
Other languages
Chinese (zh)
Inventor
陈玉博
刘康
赵军
曹鹏飞
闭玮
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210456716.7A
Publication of CN115130542A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application provides a model training method, a text processing method and apparatus, and an electronic device, and relates to the fields of machine learning and natural language processing. The method comprises the following steps: acquiring a first sample set and a second sample set; training an initial event relation recognition model according to the first sample set to obtain a first event relation recognition model, and determining the degree of uncertainty of the first event relation recognition model's prediction for each second sample text; screening a plurality of third sample texts according to the uncertainty degrees corresponding to the second sample texts; and performing iterative training on the first event relation recognition model based on the target sample texts until a training stop condition is met, obtaining a trained second event relation recognition model. The embodiment of the application alleviates the problem of insufficient labeled data, and the trained second event relation recognition model has higher accuracy and robustness.

Description

Model training method, text processing device and electronic equipment
Technical Field
The application relates to the technical fields of machine learning and natural language processing, and in particular to a model training method, a text processing method and apparatus, and an electronic device.
Background
In today's society, all kinds of information are pushed in real time on the internet. Faced with ever-increasing information, quickly sorting out the logical relationships between the events in that information becomes crucial.
Event relation extraction takes the event as the basic semantic unit to detect and extract the logical relationships between events in depth. It plays an important role in text understanding, and under the influence of the internet and social factors, research interest in this technology has grown year by year both domestically and abroad.
Because daily texts are expressed in complex ways and are semantically hard to understand, in the prior art, when an event relation recognition model is trained by machine learning techniques, the event relations of texts need to be labeled manually, which is inefficient and leads to insufficient training data.
Disclosure of Invention
Embodiments of the present application provide a model training method, a text processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can solve the above problems in the prior art. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a model training method, including:
acquiring a first sample set and a second sample set, wherein the first sample set comprises a plurality of first sample texts marked with labels, the second sample set comprises a plurality of second sample texts not marked with labels, and the label of a first sample text represents the event relation among the event information contained in that first sample text;
training an initial event relation recognition model according to the first sample set to obtain a first event relation recognition model, and determining the degree of uncertainty of the first event relation recognition model's prediction for each second sample text;
screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text;
taking each labeled first sample text and each labeled third sample text as target sample texts, and performing iterative training on the first event relation recognition model based on the plurality of target sample texts until a training stop condition is met, obtaining a trained second event relation recognition model.
According to another aspect of an embodiment of the present application, there is provided a text processing method, including:
acquiring a text to be recognized;
inputting the text to be recognized into a trained event relation recognition model to obtain the event relations among the event information contained in the text to be recognized;
the trained event relation recognition model is obtained by training with the above method.
According to another aspect of an embodiment of the present application, there is provided a model training apparatus including:
the sample set acquisition module is configured to acquire a first sample set and a second sample set, wherein the first sample set comprises a plurality of first sample texts marked with labels, the second sample set comprises a plurality of second sample texts not marked with labels, and the label of a first sample text represents the event relation among the event information contained in that first sample text;
the uncertainty calculation module is configured to train the initial event relation recognition model according to the first sample set to obtain a first event relation recognition model, and to determine the degree of uncertainty of the first event relation recognition model's prediction for each second sample text;
the sample screening module is configured to screen a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text;
and the retraining module is configured to take each labeled first sample text and each labeled third sample text as target sample texts, and to perform iterative training on the first event relation recognition model based on the plurality of target sample texts until a training stop condition is met, obtaining a trained second event relation recognition model.
As an alternative embodiment, when determining the degree of uncertainty of the first event relation recognition model's prediction for each second sample text, the uncertainty calculation module is configured to:
perform event relation recognition on each second sample text through the first event relation recognition model to obtain a corresponding first recognition result, and take the first recognition result as the label of the corresponding second sample text;
with dropout activated in the first event relation recognition model, perform event relation recognition multiple times on each second sample text through the first event relation recognition model to obtain a plurality of second recognition results for each second sample text;
and determine the degree of uncertainty of the first event relation recognition model's prediction for each second sample text according to the plurality of second recognition results of each second sample text.
As an alternative embodiment, the model training apparatus is further configured to: for each third sample text among the plurality of target samples, determine the degree of dispersion of the plurality of second recognition results of that third sample text;
when performing iterative training on the first event relation recognition model based on the plurality of target sample texts, the retraining module is configured to:
determine the weight corresponding to each third sample text according to the degree of dispersion corresponding to each third sample text among the target samples; wherein the weight is inversely proportional to the magnitude of the degree of dispersion;
inputting a plurality of target samples into a first event relation recognition model, and respectively obtaining the prediction recognition results of the plurality of target samples;
determining a first target value according to the predicted identification result of each first sample text in the plurality of target samples and the corresponding label;
for each third sample text in the multiple target samples, determining an initial second target value according to the predicted recognition result of the third sample text and the corresponding label, and weighting the initial second target value according to the weight corresponding to the third sample text to obtain a second target value;
and obtaining a training target value of the first event relation recognition model according to the first target value and the second target value, adjusting the parameters if the training target value does not meet the conditions, and continuing training based on the target sample text and the adjusted parameters.
As an optional embodiment, when determining the weight corresponding to each third sample text among the plurality of target samples according to the degree of dispersion corresponding to each third sample text, the retraining module is configured to: for each third sample text among the plurality of target samples, determine the weight of that third sample text according to the degree of dispersion corresponding to it.
As an optional embodiment, when screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text, the sample screening module is configured to:
sort the second sample texts by the magnitudes of their corresponding uncertainty degrees, and screen a preset number of second sample texts as third sample texts according to the sorting result.
As an optional embodiment, when screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text, the sample screening module is configured to:
determine the probability of each second sample text being taken as a third sample text according to the magnitudes of the uncertainty degrees corresponding to all the second sample texts;
and screen a preset number of second sample texts as third sample texts according to the probability corresponding to each second sample text.
As an alternative embodiment, when determining the probability of each second sample text being taken as a third sample text, the sample screening module is configured to:
take the ratio of the uncertainty degree corresponding to a second sample text to the sum of the uncertainty degrees corresponding to all the second sample texts as the probability of that second sample text being taken as a third sample text.
According to another aspect of the embodiments of the present application, there is provided a text processing apparatus including:
the text acquisition module is configured to acquire a text to be recognized;
the event relation analysis module is configured to input the text to be recognized into a trained event relation recognition model to obtain the event relations among the event information contained in the text to be recognized;
the trained event relation recognition model is obtained by training with the above model training apparatus.
According to another aspect of embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory, the processor executing the computer program to implement the steps of the model training method and/or the text processing method described above.
According to a further aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described model training method and/or text processing method.
According to an aspect of embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the steps of the above-described model training method and/or text processing method.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
a first sample set and a second sample set are obtained, where the first sample set comprises a plurality of first sample texts marked with labels and the second sample set comprises a plurality of second sample texts not marked with labels; a first event relation recognition model is obtained by training on the first sample set, and the trained first event relation recognition model is used to determine the degree of uncertainty of its prediction for each second sample text. Third sample texts with low uncertainty can be screened from the second sample texts using the uncertainty degrees, so the number of samples can be rapidly expanded, the problem of insufficient labeled data is alleviated, and the time spent on manual labeling is greatly reduced. Alternatively, some second sample texts with high uncertainty can be screened out and used for training; compared with a second event relation recognition model obtained by retraining the first event relation recognition model on the first sample texts and low-uncertainty third sample texts, the recognition model obtained in this way has higher accuracy and robustness, making it better suited to recognizing grammatically complex text.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a model training method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating the execution of T feed-forward operations with dropout activated in the first event relation recognition model provided by the embodiment of the present application;
fig. 4 is a schematic flowchart of iteratively training the first event relation recognition model based on a plurality of target sample texts according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating probability-based screening of a third sample text according to an embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating a model training method according to another embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a text processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a text processing system according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term; e.g., "A and/or B" can be implemented as "A", as "B", or as "A and B".
Optionally, the model training method and the text processing method provided in the embodiment of the present application may be implemented based on an Artificial Intelligence (AI) technology. For example, feature extraction of the text to be processed, determination of prediction uncertainty degree of the training sample, screening of the training sample, and recognition of the case relation of the text to be processed can be realized by a trained neural network model. AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. As artificial intelligence technology has been researched and developed in a wide variety of fields, it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and will play an increasingly important role.
Optionally, the model training and the text processing according to the embodiment of the present application may be implemented based on Cloud technology (Cloud technology), for example, the uncertainty calculation involved in the training of the neural network model and the data calculation involved in processing the data to be processed may be implemented by Cloud calculation. The cloud technology is a hosting technology for unifying series resources such as hardware, software, network and the like in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data. Cloud computing refers to the mode of delivery and use of IT infrastructure, which refers to the acquisition of needed resources in an on-demand, easily scalable manner over a network. With the development of diversification of internet, real-time data stream and connecting equipment and the promotion of demands of search service, social network, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Different from the prior parallel distributed computing, the generation of cloud computing can promote the revolutionary change of the whole internet mode and the enterprise management mode in concept.
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
1) dropout, a method of preventing overfitting. In each training pass, the model "drops" some of its parameters with probability p, and the parameters dropped each time are not identical, so that each training pass effectively trains a different sub-model.
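As a minimal illustration (an assumption for exposition, not code from the patent), the following PyTorch sketch shows dropout kept active so that repeated forward passes over the same input differ, which is the property the uncertainty estimation below relies on:

```python
# Minimal sketch, assuming PyTorch; the model below is a stand-in,
# not the patent's event relation recognition model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(768, 256), nn.ReLU(),
    nn.Dropout(p=0.1),        # randomly zeroes activations with probability p
    nn.Linear(256, 4),        # e.g. 4 event relation classes (assumed)
)

x = torch.randn(1, 768)
model.train()                 # keeps dropout active ("activated dropout")
y1, y2 = model(x), model(x)   # two passes drop different units, so the
                              # outputs differ (almost surely)
```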
2) The uncertainty degree refers to how unsure the neural network model is about its prediction result; in the related art, relatively unreliable prediction results are filtered out according to the uncertainty degree.
3) Existing event relation recognition models:
The Structured Point Model jointly optimizes event extraction and event temporal relation extraction; it extracts features using the pre-trained language model BERT and a bidirectional long short-term memory network (BiLSTM), and then constrains the prediction results using a structured support vector machine (Structured SVM). However, this model uses only the text information and ignores the statistical information of the dataset, such as the distribution of each type of label.
An End-to-End model uses the pre-trained language model BERT and a long short-term memory network (LSTM) to obtain context representations of event pairs. To alleviate the class-imbalance problem of the dataset, the model incorporates probabilistic domain knowledge using the Lagrangian relaxation method. This model only addresses the class-imbalance problem and ignores the distribution among the labels.
The HGRU model uses a pre-trained language model (BERT) and a long short-term memory network (LSTM) to obtain context representations of event pairs, models the asymmetric relationships between labels in hyperbolic space, and uses some temporal knowledge to assist prediction. This model cannot cope with insufficient training data and easily overfits.
The above methods overlook the problem of insufficient training data, cannot exploit large amounts of unlabeled data, and easily cause model overfitting.
The present application provides a model training method, a text processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which are intended to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps and the like in different embodiments is not repeated.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment according to an embodiment of the present application, and as shown in the drawing, the application environment at least includes a server 01 and a terminal 02.
In the embodiment of the application, the server 01 may be configured to provide background services for the terminal 02; specifically, it may train the event relation recognition model and send the trained event relation recognition model to the terminal 02. The server 01 may include an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In the embodiment of the present application, the terminal 02 may be configured to provide user-oriented text processing services; specifically, it may implement event relation recognition based on the event relation recognition model trained by the server 01. Terminal 02 may include a smartphone, desktop computer, tablet computer, laptop computer, smart speaker, digital assistant, Augmented Reality (AR)/Virtual Reality (VR) device, smart wearable device, or other type of physical device. The physical device may also include software running on it, such as an application program. The operating system running on the terminal 02 in the embodiment of the present application may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In the embodiment of the present specification, the terminal 02 and the server 01 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In addition, in practical applications, the training process of the event relation recognition model may also be implemented in the terminal 02; in the embodiment of the present specification, it is preferable to implement the training process of the event relation recognition model in the server 01, so as to reduce the data processing pressure on the terminal and improve the performance of the terminal for the user.
In a specific embodiment, when the server 01 is a distributed system, the distributed system may be a blockchain system. When the distributed system is a blockchain system, it may be formed by a plurality of nodes (computing devices of any form in the access network, such as servers and user terminals); a Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, an intermediate layer, an operating system layer, and an application layer. Specifically, the functions of each node in the blockchain system may include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node can also have the following functions:
2) the application, which is deployed in the blockchain and implements specific services according to actual service requirements; it records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
An embodiment of the present application provides a model training method, as shown in fig. 2, the method includes:
s101, obtaining a first sample set and a second sample set.
The first sample set of the embodiment of the application comprises a plurality of first sample texts marked with labels, and the second sample set comprises a plurality of second sample texts not marked with labels. The domains of the sample texts in the first sample set and the second sample set are not limited to a single domain and may cover as many domains as possible, such as news, novels, academic papers, conversation records, and comment information; the proportions of text from different domains are not specifically limited.
The label of a first sample text characterizes the event relation between the event information contained in that first sample text. The number of pieces of event information contained in each first sample text may be two; the event relation represents the logical relationship between the two pieces of event information. The specific types of event relation are not specifically limited in this application and may include:
Causality: one piece of event information causes another to occur. Causal relations can be applied to scenarios such as cause tracing and root-cause finding. For example, a first sample text "an outbreak of bird flu in some region caused poultry prices to rise" contains two pieces of event information, "bird flu broke out in some region" and "poultry prices rose"; since "bird flu broke out in some region" is the cause of "poultry prices rose", the label of this first sample text is "causality".
Conditional relation: one piece of event information occurs under the condition of another. Conditional relations can be applied to timing-determination scenarios. For example, a first sample text "if a foreign currency appreciates, gold should fall" contains two pieces of event information, "a foreign currency appreciates" and "gold falls"; "gold falls" occurs on condition that "a foreign currency appreciates" is triggered, so the label of this first sample text is "conditional relation".
Adversative relation: one piece of event information runs counter to the expectation raised by another. Adversative relations can be applied to scenarios such as compiling cautionary examples and preventing misjudgment. For example, a first sample text "the stock price rose slightly, but the trading volume was not large" contains two pieces of event information, "the stock price rose slightly" and "the trading volume was not large", so the label of this first sample text is "adversative relation".
Sequential relation: one event occurs immediately after another piece of event information. Sequential relations can be applied to scenarios such as event evolution and future-intention recognition. For example, a first sample text "the fall of Zheng benefits the lord (of Qin)" contains two pieces of event information, "Zheng falls" and "the lord benefits"; when the event "Zheng falls" occurs, the event "the lord benefits" follows, so the label of this first sample text is "sequential relation".
The specific form of the labels is not particularly limited in the embodiments of the present application and may be, for example, numbers, letters, or codes.
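Purely as an illustration of the data involved (the field names are assumptions, not from the patent), a labeled first sample text with its two pieces of event information might be represented as:

```python
# Hypothetical representation of one labeled first sample text.
first_sample = {
    "text": "An outbreak of bird flu in some region caused poultry prices to rise",
    "events": ["bird flu broke out in some region", "poultry prices rose"],
    "label": "causality",   # could equally be a number or code, as noted above
}
```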
S102, training the initial event relation recognition model according to the first sample set to obtain a first event relation recognition model, and determining the degree of uncertainty of the first event relation recognition model's prediction for each second sample text.
Specifically, in the embodiment of the application, the initial event relation recognition model can be iteratively trained on the plurality of labeled first sample texts in the first sample set until a training end condition is met, and the neural network model meeting the training end condition is used as the first event relation recognition model. In the process of training the model, the input of the model is each first sample text and the output is the predicted event relation between the event information contained in that first sample text, i.e., the model's prediction; the training loss is calculated from the label (ground truth) corresponding to each first sample text and the prediction, and the model parameters are adjusted based on the training loss.
The architecture of the initial event relation recognition model is not limited in the embodiment of the application and can be selected according to actual requirements. For example, a combination of the pre-trained language model RoBERTa and the bidirectional long short-term memory network BiLSTM can be used, and the pre-trained language model can be replaced with other models, such as ALBERT.
After the first event relation recognition model is obtained, the degree of uncertainty of its prediction for each second sample text can be determined; this uncertainty arises because too few training samples leave the training incomplete. The higher the uncertainty of the model's prediction for a sample, the lower the accuracy of the prediction result and the less confident the model is about that sample; conversely, the lower the uncertainty, the higher the accuracy of the prediction result and the more confident the model is about that sample.
S103, screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text.
Because the uncertainty degree reflects the accuracy of the model's prediction for a sample, high-quality sample texts (i.e., third sample texts) meeting the retraining requirements can be screened based on it. Specifically, some second sample texts with lower uncertainty can be screened out: their prediction results are more accurate, so using them as retraining samples rapidly expands the number of samples and greatly reduces the time spent on manual labeling. Alternatively, some second sample texts with higher uncertainty can be screened out; training the model on them makes it more robust than the first event relation recognition model and than a second event relation recognition model trained on low-uncertainty second sample texts, making it better suited to recognizing grammatically complex text.
S104, taking each labeled first sample text and each labeled third sample text as target sample texts, and performing iterative training on the first event relation recognition model based on the plurality of target sample texts until a training stop condition is met, obtaining a trained second event relation recognition model.
In the process of training the first event relation recognition model, the input of the model is a target sample text, i.e., a first sample text or a third sample text, and the output is the predicted event relation between the event information contained in the target sample text, i.e., the model's prediction; the training loss is calculated from the labels (ground truth) corresponding to the target sample texts and the predictions, and the model parameters are adjusted based on the training loss.
It should be understood that, before step S104 is executed, the method may further include inputting each third sample text into the first event relation recognition model, obtaining the event relation information between the event information contained in the third sample text output by the first event relation recognition model, and using the obtained event relation information as the label of the third sample text.
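Taken together, steps S101 to S104 amount to an uncertainty-driven self-training loop. The sketch below is illustrative only; the function names and signatures are assumptions, not the patent's implementation:

```python
from typing import Callable, List, Sequence, Tuple

Text, Label = str, int   # assumed stand-ins for sample texts and relation labels

def self_training(
    first_set: List[Tuple[Text, Label]],        # labeled first sample texts (S101)
    second_set: Sequence[Text],                 # unlabeled second sample texts (S101)
    fit: Callable[[List[Tuple[Text, Label]]], None],
    predict: Callable[[Text], Label],
    uncertainty: Callable[[Text], float],       # e.g. MC-dropout variance or BALD
    should_stop: Callable[[], bool],
    k: int,
) -> None:
    fit(first_set)                              # S102: first event relation model
    ranked = sorted(second_set, key=uncertainty)
    third = ranked[:k]                          # S103: screen k third sample texts
    pseudo = [(t, predict(t)) for t in third]   # pseudo-label the third samples
    target = first_set + pseudo                 # S104: target sample texts
    while not should_stop():                    # S104: retrain until stop condition
        fit(target)
```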
With the model training method of the embodiment of the application, a first sample set and a second sample set are obtained, where the first sample set comprises a plurality of labeled first sample texts and the second sample set comprises a plurality of unlabeled second sample texts; a first event relation recognition model is trained on the first sample set, and the trained first event relation recognition model is used to determine the degree of uncertainty of its prediction for each second sample text. Third sample texts with low uncertainty can be screened from the second sample texts using the uncertainty degrees, which rapidly expands the number of samples, alleviates the problem of insufficient labeled data, and greatly reduces the time spent on manual labeling. Alternatively, some second sample texts with high uncertainty can be screened out and used for training; compared with a second event relation recognition model obtained by retraining the first event relation recognition model on the first sample texts and low-uncertainty third sample texts, the resulting model has higher accuracy and robustness and is better suited to recognizing grammatically complex text.
On the basis of the foregoing embodiments, as an alternative embodiment, determining the degree of uncertainty of the first event relation recognition model's prediction for each second sample text includes:
S201, performing event relation recognition on each second sample text through the first event relation recognition model to obtain a corresponding first recognition result, and taking the first recognition result as the label of the corresponding second sample text.
It should be understood that the first recognition result is the event relation between the event information contained in each second sample text, as output by the first event relation recognition model.
S202, with dropout activated in the first event relation recognition model, performing event relation recognition multiple times on each second sample text through the first event relation recognition model to obtain a plurality of second recognition results for each second sample text.
Referring to fig. 3, which exemplarily shows a schematic diagram of performing the feed-forward operation T times with dropout activated in the first event relation recognition model according to the embodiment of the present application. As shown in the figure, for a second sample text $x^u$, the parameters activated by the first event relation recognition model differ in each feed-forward operation; the parameters of the first event relation recognition model in the t-th feed-forward operation are denoted $\hat{W}_t$, and the result of the t-th feed-forward operation is recorded as $p(y^u \mid x^u, \hat{W}_t)$. The method fuses the results of the T operations to obtain the final recognition result:

$$c = \arg\max_{c'} \frac{1}{T} \sum_{t=1}^{T} p\left(y^u = c' \mid x^u, \hat{W}_t\right)$$

where $c'$ ranges over the recognition results output each time, and $c$ represents the first recognition result output by the first event relation recognition model for $x^u$ without dropout activated. argmax is a function over the parameter(s) of a function: given a function $y = f(x)$, $x_0 = \arg\max f(x)$ means that $f(x)$ attains the maximum of its range when $x = x_0$.
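As a hedged sketch of the T feed-forward operations (assuming PyTorch; the shapes and names are illustrative, not from the patent):

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, T: int = 20):
    """T stochastic passes with dropout active, fused by averaging."""
    model.train()                              # keep dropout active
    probs = torch.stack([torch.softmax(model(x), dim=-1)   # p(y | x, W_t)
                         for _ in range(T)])  # shape (T, batch, num_classes)
    mean = probs.mean(dim=0)                  # (1/T) * sum_t p(y = c' | x, W_t)
    return mean.argmax(dim=-1), probs         # fused result c, per-pass probs
```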
S203, determining the uncertainty degree of the first event relation recognition model for the prediction of each second sample text according to the plurality of second recognition results of each second sample text.
When the model back-labels the unlabeled data, the more "confident" the model is, the higher the degree to which its prediction for the unlabeled data matches the label. Therefore, for each second sample text, the embodiment of the present application may determine the uncertainty degree of the first event relation recognition model's prediction for that text by counting the degree to which each second recognition result matches the label corresponding to the second sample text and then aggregating all the matching degrees corresponding to that text.
Specifically, the matching degree in the embodiment of the present application may be represented by the probability that a second recognition result output with dropout activated in the first event relation recognition model is the label. In some embodiments, considering that the more dispersed the distribution of the second recognition results, the less confident, i.e., the more uncertain, the first event relation recognition model is in its prediction for the second sample text, the uncertainty degree may be expressed as the degree of dispersion of the probability distribution. How the degree of dispersion is characterized is not specifically limited in the embodiment of the present application; for example, it may be characterized by at least one of the expectation, variance, standard deviation, and mean square deviation.
When the degree of dispersion of the second recognition results of a second sample text (which is also the uncertainty degree of the first event relation recognition model's prediction for that text) is characterized by the expectation and the variance, it can be expressed by the following formulas:

$$E(y) = \frac{1}{T} \sum_{t=1}^{T} p_t$$

$$\mathrm{Var}(y) = \frac{1}{T} \sum_{t=1}^{T} \bigl(p_t - E(y)\bigr)^2$$

where $p_t$ denotes the degree to which the t-th second recognition result of the second sample text matches the label (which may also be understood as the probability that the second recognition result is the label); $\mathrm{Var}(y)$ is the variance of these matching degrees and $E(y)$ their expectation.
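Continuing the sketch above (again an assumption, not the patent's code), the expectation and variance over the T passes of the probability assigned to the pseudo-label can be computed as:

```python
import torch

def matching_statistics(probs: torch.Tensor, label: torch.Tensor):
    # probs: (T, batch, num_classes) from the T dropout passes
    # label: (batch,) pseudo-labels (the first recognition results)
    p_t = probs[:, torch.arange(probs.size(1)), label]   # p_t, shape (T, batch)
    e_y = p_t.mean(dim=0)                                # E(y)
    var_y = p_t.var(dim=0, unbiased=False)               # Var(y): the uncertainty
    return e_y, var_y
```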
In the embodiment of the present application, with dropout activated in the first event relation recognition model, event relation recognition is performed multiple times on each second sample text through the first event relation recognition model to obtain a plurality of second recognition results for each second sample text. Because the parameters of the first event relation recognition model differ each time a second recognition result is obtained, the second recognition results obtained also differ, so the uncertainty degree of the first event relation recognition model's prediction for each second sample text can be determined based on the distribution of the plurality of second recognition results.
In order to better handle the errors caused by the labels of the third sample texts being obtained by prediction rather than annotated according to the actual situation, the embodiment of the present application may, when retraining the first event relation recognition model, take into account the influence of sample-label accuracy on the training target, and set a higher weight on the training target of samples whose recognition results are more concentrated, so that the model pays more attention to the more accurately recognized samples and the prediction accuracy of the model is improved.
In some embodiments, the present application further comprises: for each third sample text among the plurality of target samples, determining the degree of dispersion of the plurality of second recognition results of that third sample text.
Specifically, the embodiment of the application can characterize the degree of dispersion by quantities such as the variance, covariance, and standard deviation.
Referring to fig. 4, which exemplarily shows a flowchart of iteratively training the first event relation recognition model based on a plurality of target sample texts according to an embodiment of the present application. As shown in the figure, each first sample text and each third sample text among the plurality of target samples is input to the first event relation recognition model, and the predicted recognition result of each first sample text and of each third sample text is obtained;
a first target value is determined according to the predicted recognition result of each first sample text among the plurality of target samples and the corresponding label;
the weight corresponding to each third sample text among the plurality of target samples is determined according to the degree of dispersion corresponding to each third sample text; for each third sample text among the plurality of target samples, an initial second target value is determined according to the predicted recognition result of the third sample text and the corresponding label, and the initial second target value is weighted according to the weight corresponding to the third sample text to obtain a second target value;
and a training target value of the first event relation recognition model is obtained according to the first target value and the second target value; if the training target value does not meet the conditions, the parameters are adjusted and training continues based on the target sample texts and the adjusted parameters.
It should be understood that the embodiment of the present application is equivalent to setting a weight of 1 (weight 1) on the first target value corresponding to each first sample text, while a weight determined from the degree of dispersion (weight 2) is set on the initial second target value of each third sample text; this weight is inversely proportional to the degree of dispersion, i.e., the larger the degree of dispersion of a third sample text, the smaller the weight obtained. If weight 2 is less than 1, the first target value plays the larger role in the training target value, i.e., the model pays more attention to the first sample texts during training; conversely, if weight 2 is greater than 1, the model pays more attention to the third sample texts during training. Setting a weight on the second target value corresponding to each third sample text effectively mitigates the noise problem in the pseudo-labeled data.
The weights set in the embodiment of the present application may be a uniform weight determined for all third sample texts based on their degrees of dispersion, or a separate weight set for each third sample text based on its own degree of dispersion. Further, if a uniform weight is determined for all third sample texts, the degrees of dispersion corresponding to a plurality of (or all) third sample texts may be obtained first, the average of these degrees of dispersion taken, and the weight then obtained based on this average; the weight thus obtained is the uniform weight applicable to all third sample texts.
In some embodiments, the first target value is used to describe the degree of matching between the predicted recognition result of each first sample text and the corresponding label, and may specifically be characterized by the probability that the predicted recognition result of each first sample text is the corresponding label; the initial second target value is used to describe the degree of matching between the predicted recognition result of each third sample text and the corresponding label, and may specifically be characterized by the probability that the predicted recognition result of each third sample text is the corresponding label.
In the embodiment of the application, the first target value is determined according to the predicted recognition result and the corresponding label of each first sample text, the initial second target value is determined according to the predicted recognition result and the corresponding label of each third sample text, and each initial second target value is weighted based on the degree of dispersion of the second recognition results of the corresponding third sample text, so that during training the model pays more attention to samples with lower prediction uncertainty, further improving the accuracy of the model.
On the basis of the foregoing embodiments, as an optional embodiment, determining the weight corresponding to each third sample text among the plurality of target samples according to the degree of dispersion corresponding to each third sample text includes:
for each third sample text among the plurality of target samples, determining the weight of that third sample text according to the degree of dispersion corresponding to it.
It should be noted that, in order to enable the model to attend to each third sample text in a finer and more targeted way, in the embodiment of the present application the weight of each third sample text is determined based on the degree of dispersion of its second recognition results.
In some embodiments, the degree of dispersion may be characterized by the variance, that is, the weight of a third sample text is determined by calculating the variance of the degrees to which its second recognition results match the label.
In some embodiments, the calculation formula of the training target value may be expressed as:

$$\mathcal{L}_{l} = -\sum_{(x_l,\, y_l) \in D_l} \log p(y_l \mid x_l, W)$$

$$\mathcal{L} = \mathcal{L}_{l} - \sum_{x_u \in S_u} \frac{1}{\mathrm{Var}(y)} \log p(y \mid x_u, W)$$

where $x_l$ denotes a first sample text in the first sample set $D_l$; $y_l$ denotes the label of $x_l$; $p(y_l \mid x_l, W)$ denotes the probability that, with model parameters $W$ and input $x_l$, the model outputs the recognition result $y_l$; $x_u$ denotes a third sample text and $S_u$ the set of third sample texts selected from the second sample set; $\mathrm{Var}(y)$ denotes the variance of the probability that each second recognition result, output over the T feed-forward operations with dropout activated in the first event relation recognition model and input $x_u$, is the label $y$ (where the model parameters in the t-th feed-forward operation are $\hat{W}_t$, and the model parameters over the T feed-forward calculations follow the distribution $q_\theta(W)$); and $p(y \mid x_u, W)$ denotes the probability that, with model parameters $W$ and input $x_u$, the model outputs the recognition result $y$. The iterative training stops when the training target value reaches its minimum.
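An illustrative PyTorch rendering of this objective (an assumption; the epsilon term and the reductions are implementation choices, not from the patent):

```python
import torch
import torch.nn.functional as F

def training_target(logits_l, y_l, logits_u, y_u, var_u, eps=1e-8):
    # logits_l, y_l: model outputs and labels for first sample texts (D_l)
    # logits_u, y_u: model outputs and pseudo-labels for third sample texts (S_u)
    # var_u: Var(y) per third sample; the weight is inversely proportional to it
    first_target = F.cross_entropy(logits_l, y_l, reduction="sum")
    per_sample = F.cross_entropy(logits_u, y_u, reduction="none")
    second_target = ((1.0 / (var_u + eps)) * per_sample).sum()
    return first_target + second_target      # minimized during retraining
```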
In some embodiments, the present application sets the weights of the initial second target values in different manners, trains the second event relation recognition model based on the weighted second target values, and separately measures the performance of the second event relation recognition model trained in each manner; see Table 1:
TABLE 1 Comparison of the effects of different weighting methods on the F1 score
Mean in Table 1 means that all third sample texts are weighted by 1; Probability refers to using the degree of match between the prediction result and the label (i.e., the probability) as the training weight of the sample; Uncertainty refers to using the inverse of the degree of dispersion of the matching degrees as the training weight of the sample. The performance of the sample weighting method proposed in the present application is greatly superior to that of setting the weight to 1, which also illustrates the effectiveness of the method proposed in the present application.
As an optional embodiment, screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text includes:
sorting the second sample texts by the magnitudes of their corresponding uncertainty degrees. There are two sorting orders, ascending by uncertainty degree and descending by uncertainty degree, which lead to two sorting results. Therefore, if the aim is to expand the number of training samples as quickly as possible, some second sample texts with lower uncertainty can be selected as third sample texts; if the aim is a second event relation recognition model with more outstanding robustness, some second sample texts with higher uncertainty can be selected as third sample texts.
If the model always focuses on samples with low uncertainty, its prediction performance improves but the room for improvement is limited; if it always focuses on difficult samples with high uncertainty, its prediction performance suffers. In order to balance the gain and the prediction performance of the model, on the basis of the above embodiments, as an optional embodiment, screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text includes:
determining the probability of each second sample text being taken as a third sample text according to the magnitudes of the uncertainty degrees corresponding to all the second sample texts;
and screening a preset number of second sample texts as third sample texts according to the probability corresponding to each second sample text.
In the embodiment of the present application, by analyzing the magnitudes of the uncertainty degrees corresponding to all the second sample texts, a higher probability can be assigned, as needed, either to second sample texts with higher uncertainty or to those with lower uncertainty; a second sample text with a higher probability is more likely to be taken as a third sample text than one with a lower probability, which introduces more randomness than the sorting approach. After the probability corresponding to each second sample text is set, samples are drawn based on probability theory and the principle of randomness, each second sample text being drawn as a third sample text with its corresponding probability.
The third sample texts obtained in the present application are not repeated. Therefore, to avoid the problem that an already selected third sample text would be drawn again and have to be discarded, the present application samples the third sample texts without replacement: after each new third sample text is screened from the second sample set, the probability of each remaining second sample text being taken as a third sample text is recomputed based on the uncertainty degrees corresponding to the remaining second sample texts in the second sample set, until the number of screened third sample texts meets the preset condition.
Referring to fig. 5, which exemplarily shows a schematic diagram of screening third sample texts based on probability in this embodiment of the present application. As shown in the figure, the second sample set includes 5 second sample texts, and two third sample texts need to be determined from them.
After the uncertainty degrees corresponding to the 5 second sample texts are determined, the probabilities corresponding to the 5 second sample texts are determined based on the magnitudes of their uncertainty degrees; in the embodiment of the application, the higher the uncertainty degree, the lower the corresponding probability. When the first third sample text is screened, sampling is performed based on the probabilities corresponding to the 5 second sample texts to obtain the first third sample text; then the probabilities corresponding to the remaining 4 second sample texts are updated based on the magnitudes of their uncertainty degrees, and sampling is performed with the updated probabilities to obtain the second third sample text.
The third sample texts screened based on probability in the embodiment of the application are not, as in the sorting scheme, exclusively those with the lowest or the highest uncertainty degrees, so the benefit and the prediction performance of the model can be balanced.
In some embodiments, the present application selects third sample texts from the second sample set in different manners, trains the second case relation recognition model based on the selected third sample texts, and compares the performance of the second case relation recognition models trained in each manner; please refer to table 2:
TABLE 2 comparison of the impact of different sample selection methods on the F1 index
Random in table 2 refers to randomly selecting third sample texts; Probability refers to selecting third sample texts according to the predicted probability; Hard refers to selecting difficult samples, as proposed in this application; Easy refers to selecting simple samples, as proposed in this application. MATRES and TB-Dense are two remote-supervised relation extraction datasets, and 30% and 40% in the table mean that 30% or 40% of the dataset consists of first sample texts. It can be seen that all three sample selection methods proposed in the embodiments of the present application outperform random selection, which illustrates the effectiveness of the proposed methods.
On the basis of the foregoing embodiments, as an optional embodiment, the determining the probability of using each second sample text as the third sample text includes:
and taking the ratio of the uncertainty degree corresponding to the second sample text to the sum of the uncertainty degrees corresponding to all the second sample texts as the probability of taking the second sample text as the third sample text.
By this method, the sum of the probabilities of all the second sample texts being taken as third sample texts is 100%, that is, exactly one second sample text is drawn as a third sample text in each sampling pass. In some embodiments, the uncertainty degree may be determined from the Bayesian Active Learning by Disagreement (BALD) indicator.
The calculation formula of the BALD index can be expressed as:

$$\mathrm{BALD}(x_u) = -\sum_{c} \bar{p}_c \log \bar{p}_c + \frac{1}{T} \sum_{t=1}^{T} \sum_{c} p_c^{(t)} \log p_c^{(t)}, \qquad \bar{p}_c = \frac{1}{T} \sum_{t=1}^{T} p_c^{(t)}$$

wherein $p_c^{(t)} = p\bigl(y_u = c \mid x_u, \hat{\theta}_t\bigr)$ is the probability that, in the $t$-th of the $T$ feed-forward operations (the model parameters at that operation being $\hat{\theta}_t$, sampled by activated dropout), inputting the second sample text $x_u$ yields the second recognition result $y_u$ equal to label $c$, and $D_u$ represents the second sample set. It should be understood that label $c$ is the recognition result output by the first case relation recognition model for $x_u$ under the condition that dropout is not activated.
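As a concrete illustration, the BALD value above can be computed from the stacked outputs of the T dropout-activated passes. This is a minimal PyTorch sketch following the standard Monte-Carlo BALD estimator; the function and argument names are assumptions, not code from the patent:

```python
import torch

def bald_index(probs):
    """BALD from T stochastic passes: H[mean_t p_t] - mean_t H[p_t].

    `probs` has shape (T, C): the softmax outputs of the T dropout-activated
    feed-forward operations over C case relation classes."""
    eps = 1e-12
    mean_p = probs.mean(dim=0)                                  # fused predictive distribution
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum()
    mean_entropy = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return entropy_of_mean - mean_entropy
```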
It should be understood that the present application can characterize the degree of uncertainty either directly in terms of the BALD index or in terms of the value of (1-BALD index). When the uncertainty level is characterized by a BALD index, the greater the value of BALD, the greater the corresponding uncertainty level, and when the uncertainty level is characterized by a (1-BALD index), the greater the value of BALD, the smaller the corresponding uncertainty level.
Specifically, when the uncertainty degree is characterized by the BALD index, the probability that the second sample text $u$ is taken as a third sample text can be expressed as:

$$P(u) = \frac{\mathrm{BALD}(x_u)}{\sum_{v \in D_u} \mathrm{BALD}(x_v)}$$

and when the uncertainty degree is characterized by (1 - BALD index), the probability can be expressed as:

$$P(u) = \frac{1 - \mathrm{BALD}(x_u)}{\sum_{v \in D_u} \bigl(1 - \mathrm{BALD}(x_v)\bigr)}$$
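Both probability formulas can be sketched as one helper. The scaling of BALD values into [0, 1], needed so that (1 - BALD) stays non-negative, is an assumption; the patent does not state how the values are normalized:

```python
import torch

def selection_probabilities(bald_values, prefer_difficult=True):
    """Turn BALD values into selection probabilities per the two formulas above.

    prefer_difficult=True uses BALD directly, so difficult (high-uncertainty)
    second sample texts get higher probability; False uses (1 - BALD),
    favoring simple samples. Assumes `bald_values` is a 1-D tensor in [0, 1]."""
    scores = bald_values if prefer_difficult else 1.0 - bald_values
    return scores / scores.sum()
```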
The embodiment of the application can use either of the two probabilities above to select the third sample texts for retraining the first case relation recognition model.
Referring to fig. 6, a flow chart of a model training method according to another embodiment of the present application is exemplarily shown, and as shown, the method includes the following steps:
acquiring a first sample set and a second sample set, wherein the first sample set comprises a plurality of first sample texts (labeled data) labeled with labels, and the second sample set comprises a plurality of second sample texts (unlabeled data) not labeled with labels;
training an initial case relation recognition model through the first sample set to obtain a first case relation recognition model;
performing case relation recognition on each second sample text through the first case relation recognition model to obtain a corresponding first recognition result, and taking the first recognition result as a label of the corresponding second sample text;
determining the uncertainty degree of the first case relation recognition model for each second sample text prediction; screening a plurality of third sample texts from the second sample texts according to the uncertainty degree corresponding to each second sample text, so as to enlarge the scale of the training samples; taking each labeled first sample text and each third sample text as target sample texts; and iteratively training the first case relation recognition model based on the plurality of target sample texts until the training stop condition is met, obtaining the trained second case relation recognition model.
The process of screening the third sample texts and iteratively training the first case relation recognition model in this embodiment mainly includes three key processes:
(1) Uncertainty Estimation: estimating the uncertainty degree of the model using the method of activating dropout (Monte-Carlo dropout).
For the second sample text $x_u$, the first case relation recognition model executes the feed-forward operation $T$ times under the condition of activating dropout; at the $t$-th feed-forward operation the model parameters are $\hat{\theta}_t$ and the operation result is:

$$y^{(t)} = p\bigl(y_u \mid x_u, \hat{\theta}_t\bigr)$$

The $T$ operation results are fused (averaged) to obtain the final prediction result $y$:

$$y = \frac{1}{T} \sum_{t=1}^{T} p\bigl(y_u \mid x_u, \hat{\theta}_t\bigr)$$
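A sketch of the T dropout-activated feed-forward operations and their fusion. Keeping the whole model in train() mode is the simplest way to leave dropout active at inference, though a production system would enable only the dropout layers; model, x, and T are illustrative assumptions:

```python
import torch

def mc_dropout_predict(model, x, T=10):
    """Run T feed-forward operations with dropout activated and fuse them.

    Returns the fused prediction y (the mean of the T softmax outputs) and
    the stacked per-pass outputs of shape (T, C)."""
    model.train()  # activates dropout; note this also affects e.g. batch norm
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    return probs.mean(dim=0), probs
```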
For each second sample text, the matching degree between each of its multiple second recognition results and its label is determined; specifically, the matching degree is represented by the probability p that the second recognition result equals the label. After the matching degree corresponding to each second recognition result is obtained, the uncertainty degree of the first case relation recognition model for the prediction of that second sample text is further determined using the BALD index. As can be seen from the BALD formula in the foregoing embodiment, the BALD index is calculated from the expectation over the second recognition results.
(2) Sample Selection: two sample selection modes are designed, selecting simple samples and difficult samples for use according to a sorting mode and a probability sampling mode, respectively;
Because a lower uncertainty degree represents stronger confidence in the model's prediction, the uncertainty degree can be used as an index to screen from the second sample set either simple samples (those with lower uncertainty) or difficult samples (those with higher uncertainty) as the third sample texts.
The third sample texts can be screened by adopting a sorting mode and a probability sampling mode, specifically, when the sorting mode is adopted, the uncertainty degrees of the second sample texts can be sorted, and a preset number of second sample texts are sequentially selected from a sorting result as the third sample texts;
when a probability sampling mode is adopted, the ratio of the uncertainty degree corresponding to the second sample text to the sum of the uncertainty degrees corresponding to all the second sample texts can be used as the probability that the second sample text is used as the third sample text, and after the probability corresponding to each second sample text is determined, sampling can be performed based on the probability.
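Restated in code, the two screening modes might look as follows; the easy/difficult switch and all names are illustrative assumptions, and the probability mode defers to the without-replacement sampler sketched earlier:

```python
import torch

def screen_third_samples(uncertainties, k, mode="sort", easy=False):
    """Screen k third sample texts by sorting or by probability sampling."""
    u = torch.as_tensor(uncertainties, dtype=torch.float)
    if mode == "sort":
        order = torch.argsort(u, descending=not easy)  # most uncertain first
        return order[:k].tolist()
    return sample_without_replacement(u, k)            # probability mode
```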
(3) Uncertainty-Aware Learning: the uncertainty of model prediction is incorporated into the training objective function, so that the model focuses more on samples with higher confidence and fits noisy data less.
Determining the weight corresponding to each third sample text according to the discrete degree corresponding to that third sample text; inputting a plurality of target samples (including first sample texts and third sample texts) into the first case relation recognition model to obtain the prediction recognition result of each target sample; determining a first target value according to the prediction recognition result and corresponding label of each first sample text; for each third sample text, determining an initial second target value according to its prediction recognition result and corresponding label, and weighting the initial second target value by the weight corresponding to that third sample text to obtain a second target value; and obtaining the training target value of the first case relation recognition model from the first target value and the second target value; if the training target value does not meet the condition, the parameters are adjusted and training continues based on the target sample texts and the adjusted parameters, as sketched below.
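A sketch of the training target value just described: cross-entropy over the labeled first sample texts plus a per-sample weighted cross-entropy over the pseudo-labeled third sample texts. Combining the two target values by a plain sum is an assumption, since the patent does not give the combination rule:

```python
import torch
import torch.nn.functional as F

def training_target(logits_first, labels_first,
                    logits_third, labels_third, weights_third):
    """First target value + weighted second target value."""
    first_target = F.cross_entropy(logits_first, labels_first)
    per_sample = F.cross_entropy(logits_third, labels_third, reduction="none")
    second_target = (weights_third * per_sample).mean()
    return first_target + second_target
```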
Specifically, for each third sample text among the plurality of target samples, the discrete degree of its multiple second recognition results is determined; the discrete degree can be represented by the variance;
In the embodiment of the application, the weight of a third sample text among the target samples is inversely proportional to its discrete degree, and the weight can be calculated in two ways:
(1) A uniform weight is set for all third sample texts: specifically, the discrete degrees corresponding to all the third sample texts may be averaged, and a single weight obtained based on the average discrete degree.
(2) A personalized weight is set for each third sample text: instead of averaging the discrete degrees of all the third sample texts, the weight is obtained directly from the discrete degree corresponding to that third sample text. Specifically, the weight may be calculated as:

$$w = \frac{1}{\mathrm{Var}(y)}$$

where $\mathrm{Var}(y)$ represents the variance of the $T$ second recognition results.
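Both weighting modes can be sketched together; the 1/Var form matches the inverse proportionality stated above, but since the original formula is not reproducible the exact functional form here is an assumption:

```python
import torch

def third_sample_weights(variances, uniform=False):
    """Weights inversely proportional to the discrete degree (variance).

    uniform=True returns one shared weight from the average discrete degree;
    otherwise each third sample text gets a personalized weight."""
    eps = 1e-8
    if uniform:
        return torch.full_like(variances, 1.0 / (variances.mean() + eps))
    return 1.0 / (variances + eps)
```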
Referring to fig. 7, a flowchart of a text processing method according to an embodiment of the present application is exemplarily shown, and as shown in the drawing, the text processing method includes:
and acquiring a text to be recognized.
And inputting the text to be recognized into the trained case relationship recognition model to obtain the case relationship among the event information contained in the text to be recognized, wherein the case relationship recognition model is trained according to the embodiments of the model training methods.
It should be noted that the embodiment of the present application does not particularly limit the text to be recognized; it may be news, a novel, an academic paper, a conversation record, comment information, and so on. According to the embodiment of the application, triples can further be constructed from the output case relations, each triple comprising two pieces of event information and the type of their case relation. It should be understood that when the text to be recognized contains more than two pieces of event information, the case relation between each pair of event information may be recognized by the case relation recognition model.
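A hypothetical sketch of assembling such triples by recognizing the case relation of every pair of event mentions; the relation_model interface is an assumption:

```python
def build_triples(events, relation_model):
    """Collect (event_1, event_2, relation_type) triples from pairwise
    recognition over all event mentions in a text."""
    triples = []
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            rel = relation_model.predict(events[i], events[j])
            if rel is not None:  # keep only recognized case relations
                triples.append((events[i], events[j], rel))
    return triples
```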
The method provided by the embodiment of the application can be applied to any scenario in which case relations in text need to be recognized, for example an automatic question-answering scenario or a hot-news identification scenario. To better understand and explain the method provided by the embodiments of the present application and its effects, an optional implementation of the solution is described below with reference to a specific application scenario; the application scenario here is a game scenario.
Fig. 8 is a schematic structural diagram of a text processing system applicable in this scenario embodiment of the present application, and as shown in fig. 8, the text processing system includes a user terminal, a sample text database, a model training server, a verification platform, a case relation recognition server, and an intelligent chat server.
The user terminal can be communicatively connected with the intelligent chat server through a network, and can be the user terminal of any consumer. The embodiment of the application does not limit the type of chat application running on the user terminal: it may be a chat application that the user downloads and installs, a cloud chat application, or a chat application inside a mini program. When the chat application runs, the user terminal sends chat information to the server through the network. The intelligent chat server receives the chat information, generates a text to be recognized from the desensitized chat information, and sends the text to be recognized to the case relation recognition server. The case relation recognition server obtains the case relations among the event information contained in the text to be recognized based on the trained case relation recognition model and returns the recognized case relations to the intelligent chat server, and the intelligent chat server returns response information according to the case relations. For example, if the chat information is "Whenever I think that I came first in the exam, I feel happy", the contained event information comprises "feeling happy" and "came first in the exam", and the case relation between the two pieces of event information is a causal relation. The intelligent chat server determines from the two pieces of event information and the case relation that the user wants to emphasize why the user is happy, and therefore generates response information related to "came first in the exam" rather than generic response information related to "feeling happy", such as "What makes you happy?", which would make the user feel the reply misses the point and dampen the user's enthusiasm for chatting. The intelligent chat server can also prompt the user to send feedback information when returning the response information, the feedback information indicating the user's satisfaction with the accuracy of the response information. If the satisfaction degree is higher than a preset threshold, the intelligent chat server sends the desensitized chat information and the case relation information it contains to the verification platform as a sample text to be verified; a sample auditor logs in to the verification platform and manually judges whether the case relation is accurate; if so, a first sample text is constructed from the chat information and the corresponding case relation and sent to the sample text database.
The sample text database stores a plurality of labeled samples (first sample texts) and a plurality of unlabeled samples (second sample texts). When the number of updated samples in the sample text database meets a preset condition, the updated samples (comprising a plurality of first sample texts and a plurality of second sample texts) are sent to the model training server.
The model training server trains an initial case relation recognition model according to the plurality of first sample texts to obtain a first case relation recognition model, performs case relation recognition on each second sample text through the first case relation recognition model to obtain a corresponding first recognition result, and takes the first recognition result as the label of the corresponding second sample text. With dropout activated, the first case relation recognition model then performs case relation recognition on each second sample text multiple times to obtain multiple second recognition results per second sample text. The server determines the uncertainty degree of the first case relation recognition model for each second sample text prediction from its multiple second recognition results; determines the probability of each second sample text being taken as a third sample text from the relative magnitudes of the uncertainty degrees of all the second sample texts; and screens a preset number of second sample texts as third sample texts according to those probabilities. For each third sample text among the target samples, the discrete degree of its multiple second recognition results is determined, and a weight inversely proportional to the discrete degree is assigned. The plurality of target samples are input into the first case relation recognition model to obtain their prediction recognition results; a first target value is determined from the prediction recognition result and corresponding label of each first sample text; for each third sample text, an initial second target value is determined from its prediction recognition result and corresponding label and weighted by its weight to obtain a second target value; a training target value of the first case relation recognition model is obtained from the first target value and the second target value; and if the training target value does not meet the condition, the parameters are adjusted and training continues based on the target sample texts and the adjusted parameters until the trained second case relation recognition model is obtained, so that the case relation recognition server can determine the case relations in texts to be recognized by calling the second case relation recognition model.
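Tying the steps together, one round of the model training server's work might look like the following sketch, reusing the helpers sketched earlier (mc_dropout_predict, bald_index, sample_without_replacement, third_sample_weights, training_target); batching, the optimizer loop, and every name are illustrative assumptions:

```python
import torch

def self_training_round(model, unlabeled_texts, k, T=10):
    """One screening-and-reweighting round over the second sample set."""
    # (1) Pseudo-label each second sample text with dropout deactivated.
    model.eval()
    with torch.no_grad():
        pseudo_labels = [model(x).argmax(dim=-1) for x in unlabeled_texts]
    # (2) T dropout-activated passes per text: BALD uncertainty and variance.
    all_probs = [mc_dropout_predict(model, x, T)[1] for x in unlabeled_texts]
    uncertainty = torch.stack([bald_index(p) for p in all_probs])
    variance = torch.stack([p.var(dim=0).mean() for p in all_probs])
    # (3) Probabilistically screen k third sample texts without replacement.
    chosen = sample_without_replacement(uncertainty, k)
    weights = third_sample_weights(variance[chosen])
    # (4) The chosen texts, pseudo labels, and weights then feed the weighted
    #     training target (training_target above) until the stop condition.
    return chosen, [pseudo_labels[i] for i in chosen], weights
```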
In order to demonstrate the effectiveness of the embodiment of the present application, the model training method provided by the present application (Uncertainty-Aware Self-Training, UAST) is further compared with several existing model training methods: case relation recognition models are trained on the public remote-supervised relation extraction datasets (MATRES and TB-Dense), and the effects of the resulting case relation recognition models are compared. The comparison results are shown in table 3:
TABLE 3 Overall Effect comparison Table
PLM + BiLSTM in table 3 refers to extracting the context representation of event pairs with a pre-trained language model and a bidirectional long short-term memory network, then classifying with a softmax classifier; Han et al. likewise extract the context representation of event pairs with a pre-trained language model and a bidirectional LSTM, then use a structured prediction network to ensure the consistency of model predictions; Mean-Teacher is a common semi-supervised learning method that encourages different perturbed variants of the model to make predictions as consistent as possible for the same input; Self-Training is another common semi-supervised learning method that first trains a model on a small amount of labeled data, uses the trained model to label the unlabeled data, then mixes the newly labeled data with the original labeled data to train the model, repeating until the model converges. Table 3 shows that the method provided in the embodiment of the present application outperforms the previous methods on all indexes (Precision, Recall, and the comprehensive evaluation index F1) of the internationally published datasets.
An embodiment of the present application provides a model training apparatus, as shown in fig. 9, the model training apparatus may include: a sample set acquisition module 101, an uncertainty calculation module 102, a sample screening module 103, and a retraining module 104, wherein,
a sample set obtaining module 101, configured to obtain a first sample set and a second sample set, where the first sample set includes a plurality of first sample texts marked with labels, the second sample set includes a plurality of second sample texts not marked with labels, and the label of a first sample text represents the case relations among the event information contained in that first sample text;
the uncertainty calculation module 102 is configured to train an initial case relation recognition model according to the first sample set, obtain a first case relation recognition model, and determine an uncertainty degree of the first case relation recognition model for each second sample text prediction;
the sample screening module 103 is configured to screen a plurality of third sample texts from each second sample text according to the uncertainty degree corresponding to each second sample text;
and the retraining module 104 is configured to take each first sample text and each third sample text with labels as target sample texts, perform iterative training on the first case relation recognition model based on the multiple target sample texts until a training stop condition is met, and obtain a trained second case relation recognition model.
The apparatus of the embodiment of the present application can execute the model training method provided by the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus of the embodiments of the present application correspond to the steps in the model training method of the embodiments of the present application, and for the detailed functional description of the modules of the apparatus, reference may be specifically made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
Compared with the prior art, the embodiment of the application acquires a first sample set and a second sample set, the first sample set comprising a plurality of labeled first sample texts and the second sample set comprising a plurality of unlabeled second sample texts; trains on the first sample set to obtain a first case relation recognition model; determines with the trained first case relation recognition model the uncertainty degree of the prediction for each second sample text; and uses the uncertainty degree to screen third sample texts with low uncertainty from the second sample texts, so that the number of samples can be rapidly expanded, the problem of insufficient labeled data alleviated, and the time of manual labeling greatly saved. Moreover, some second sample texts with higher uncertainty can also be screened out and used to train the model; compared with a second case relation recognition model obtained by retraining the first case relation recognition model only on the first sample texts and low-uncertainty third sample texts, the resulting second case relation recognition model has higher accuracy and robustness and is more suitable for recognizing texts with complex grammar.
As an alternative embodiment, when determining the uncertainty degree of the first case relation recognition model for each second sample text prediction, the uncertainty calculation module is configured to:
performing case relation recognition on each second sample text through the first case relation recognition model to obtain a corresponding first recognition result, and taking the first recognition result as a label of the corresponding second sample text;
under the condition that the first case relation recognition model activates dropout, performing multiple rounds of case relation recognition on each second sample text through the first case relation recognition model to obtain multiple second recognition results of each second sample text;
and determining the uncertainty degree of the first event relation recognition model for each second sample text prediction according to a plurality of second recognition results of each second sample text.
As an alternative embodiment, the model training apparatus is further configured to: for a third sample text in the plurality of target samples, determining the discrete degrees of a plurality of second recognition results of the third sample text;
the retraining module is used for performing iterative training on the first event relation recognition model based on a plurality of target sample texts:
determining the weight corresponding to each third sample text according to the discrete degree corresponding to each third sample text in the target samples; wherein the weight is inversely proportional to the magnitude of the degree of dispersion;
inputting a plurality of target samples into a first event relation recognition model, and respectively obtaining the prediction recognition results of the plurality of target samples;
determining a first target value according to the predicted identification result of each first sample text in the plurality of target samples and the corresponding label;
for each third sample text in the multiple target samples, determining an initial second target value according to the predicted identification result of the third sample text and the corresponding label, and weighting the initial second target value according to the weight corresponding to the third sample text to obtain a second target value;
and obtaining a training target value of the first event relation recognition model according to the first target value and the second target value, adjusting the parameters if the training target value does not meet the conditions, and continuing training based on the target sample text and the adjusted parameters.
As an optional embodiment, when determining the weight corresponding to each third sample text in the multiple target samples according to the discrete degree corresponding to each third sample text, the retraining module is configured to: and for each third sample text in the plurality of target samples, determining the weight of the third sample text according to the discrete degree corresponding to the third sample text.
As an optional embodiment, when the sample screening module screens a plurality of third sample texts from each second sample text according to the uncertainty level corresponding to each second sample text, the sample screening module is configured to:
and sorting according to the size relation of the uncertainty degrees corresponding to all the second sample texts, and screening a preset number of second sample texts as third sample texts according to a sorting result.
As an optional embodiment, when the sample screening module screens a plurality of third sample texts from each second sample text according to the uncertainty level corresponding to each second sample text, the sample screening module is configured to:
determining the probability of taking each second sample text as a third sample text according to the size relation of the uncertainty degrees corresponding to all the second sample texts;
and screening a preset number of second sample texts as third sample texts according to the corresponding probability of each second sample text.
As an alternative embodiment, the sample screening module when determining the probability of each second sample text as a third sample text is configured to:
and taking the ratio of the uncertainty degree corresponding to the second sample text to the sum of the uncertainty degrees corresponding to all the second sample texts as the probability of taking the second sample text as the third sample text.
An embodiment of the present application provides a text processing apparatus. As shown in fig. 10, the text processing apparatus may include: a text acquisition module 201 and a case relation analysis module 202, wherein,
a text acquisition module 201, configured to acquire a text to be identified;
the case relation analysis module 202 is configured to input the text to be recognized into the trained case relation recognition model, so as to obtain a case relation between event information included in the text to be recognized;
the trained case relation recognition model is obtained by training by adopting the model training device.
An embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the model training method and/or the text processing method. Compared with the related art, the electronic device acquires a first sample set and a second sample set, the first sample set comprising a plurality of labeled first sample texts and the second sample set comprising a plurality of unlabeled second sample texts; trains on the first sample set to obtain a first case relation recognition model; determines with the trained first case relation recognition model the uncertainty degree of the prediction for each second sample text; and uses the uncertainty degree to screen third sample texts with low uncertainty from the second sample texts, so that the number of samples can be rapidly expanded, the problem of insufficient labeled data alleviated, and the time of manual labeling greatly saved. Some second sample texts with higher uncertainty can also be screened out and used to train the model; compared with a second case relation recognition model obtained by retraining the first case relation recognition model only on the first sample texts and low-uncertainty third sample texts, the resulting second case relation recognition model has higher accuracy and robustness, making it more suitable for recognizing texts with complex grammar.
In an alternative embodiment, an electronic device is provided, as shown in fig. 11, an electronic device 4000 shown in fig. 11 includes: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs a computing function, including, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer, without limitation.
The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 4001 to execute. The processor 4001 is used to execute computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
Compared with the prior art, the method acquires a first sample set and a second sample set, the first sample set comprising a plurality of labeled first sample texts and the second sample set comprising a plurality of unlabeled second sample texts; trains on the first sample set to obtain a first case relation recognition model; determines with the trained first case relation recognition model the uncertainty degree of the prediction for each second sample text; and uses the uncertainty degree to screen third sample texts with low uncertainty from the second sample texts, so that the number of samples can be rapidly expanded, the problem of insufficient labeled data alleviated, and the time of manual labeling greatly saved. Some second sample texts with higher uncertainty can also be screened out and used to train the model; compared with a second case relation recognition model obtained by retraining the first case relation recognition model only on the first sample texts and low-uncertainty third sample texts, the resulting second case relation recognition model has higher accuracy and robustness and is more suitable for recognizing texts with complex grammar.
Embodiments of the present application further provide a computer program product, which includes a computer program; when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented. Compared with the prior art, the computer program product acquires a first sample set and a second sample set, the first sample set comprising a plurality of labeled first sample texts and the second sample set comprising a plurality of unlabeled second sample texts; trains on the first sample set to obtain a first case relation recognition model; determines with the trained first case relation recognition model the uncertainty degree of the prediction for each second sample text; and uses the uncertainty degree to screen third sample texts with low uncertainty from the second sample texts, so that the number of samples can be rapidly expanded, the problem of insufficient labeled data alleviated, and the time of manual labeling greatly saved. Some second sample texts with higher uncertainty can also be screened out and used to train the model; compared with a second case relation recognition model obtained by retraining the first case relation recognition model only on the first sample texts and low-uncertainty third sample texts, the resulting second case relation recognition model has higher accuracy and robustness and is more suitable for recognizing texts with complex grammar.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. Under the scenario that the execution time is different, the execution sequence of the sub-steps or phases may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The above are only optional embodiments of partial implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the scope of protection of the embodiments of the present application without departing from the technical idea of the present application.

Claims (13)

1. A method of model training, comprising:
acquiring a first sample set and a second sample set, wherein the first sample set comprises a plurality of first sample texts marked with labels, the second sample set comprises a plurality of second sample texts not marked with labels, and the label of a first sample text represents the case relation among the event information contained in that first sample text;
training an initial case relation recognition model according to the first sample set to obtain a first case relation recognition model, and determining the uncertainty degree of the first case relation recognition model for each second sample text prediction;
screening a plurality of third sample texts from each second sample text according to the corresponding uncertainty degree of each second sample text;
and taking each labeled first sample text and each third sample text as target sample texts, and performing iterative training on the first case relation recognition model based on a plurality of target sample texts until a training stopping condition is met, to obtain a trained second case relation recognition model.
2. The method of claim 1, wherein determining the uncertainty degree of the first case relation recognition model for each second sample text prediction comprises:
performing event relation recognition on each second sample text through the first event relation recognition model to obtain a corresponding first recognition result, and taking the first recognition result as a label of the corresponding second sample text;
under the condition that the first case relation recognition model activates dropout, performing multiple rounds of case relation recognition on each second sample text through the first case relation recognition model to obtain multiple second recognition results of each second sample text;
and determining the uncertainty degree of the first event relation recognition model for each second sample text prediction according to a plurality of second recognition results of each second sample text.
3. The method of claim 2, further comprising: for a third sample text in a plurality of target samples, determining the discrete degrees of a plurality of second recognition results of the third sample text;
the iterative training of the first case relation recognition model based on the plurality of target sample texts comprises:
determining the weight corresponding to each third sample text in the target samples according to the discrete degree corresponding to each third sample text; wherein the weight is inversely proportional to the magnitude of the degree of dispersion;
inputting a plurality of target samples into a first event relation recognition model, and respectively obtaining prediction recognition results of the target samples;
determining a first target value according to a predicted identification result and a corresponding label of each first sample text in a plurality of target samples;
for each third sample text in the multiple target samples, determining an initial second target value according to the predicted recognition result and the corresponding label of the third sample text, and weighting the initial second target value according to the weight corresponding to the third sample text to obtain a second target value;
and obtaining a training target value of the first event relation recognition model according to the first target value and the second target value, adjusting parameters if the training target value does not meet conditions, and continuing training based on the target sample text and the adjusted parameters.
4. The method according to claim 3, wherein the determining a weight corresponding to each third sample text in the plurality of target samples according to the discrete degree corresponding to each third sample text comprises:
for each third sample text in the plurality of target samples, determining a weight of the third sample text with the degree of dispersion corresponding to the third sample text.
5. The method according to claim 1, wherein the screening a plurality of third sample texts from each second sample text according to the degree of uncertainty corresponding to each second sample text comprises:
and sorting according to the size relation of the uncertainty degrees corresponding to all the second sample texts, and screening a preset number of second sample texts as the third sample texts according to a sorting result.
6. The method according to claim 1, wherein the screening a plurality of third sample texts from each second sample text according to the degree of uncertainty corresponding to each second sample text comprises:
determining the probability of taking each second sample text as a third sample text according to the size relation of the uncertainty degrees corresponding to all the second sample texts;
and screening a preset number of second sample texts as the third sample texts according to the corresponding probability of each second sample text.
7. The method of claim 6, wherein determining the probability of each of the second sample texts as a third sample text comprises:
and taking the ratio of the uncertainty degree corresponding to the second sample text to the sum of the uncertainty degrees corresponding to all the second sample texts as the probability of taking the second sample text as the third sample text.
8. A method of text processing, comprising:
acquiring a text to be identified;
inputting the text to be recognized into a trained case relation recognition model to obtain case relations among event information contained in the text to be recognized;
wherein the trained case relationship recognition model is obtained by training by the method of any one of claims 1 to 7.
9. A model training apparatus, comprising:
a sample set obtaining module, configured to obtain a first sample set and a second sample set, where the first sample set includes multiple first sample texts marked with labels, the second sample set includes multiple second sample texts not marked with labels, and the label of a first sample text represents the case relation between event information contained in that first sample text;
the uncertainty calculation module is used for training an initial case relation recognition model according to the first sample set to obtain a first case relation recognition model, and determining the uncertainty degree of the first case relation recognition model for each second sample text prediction;
the sample screening module is used for screening a plurality of third sample texts from the second sample texts according to the uncertainty degrees corresponding to the second sample texts;
and the retraining module is used for taking each labeled first sample text and each third sample text as target sample texts, and performing iterative training on the first case relation recognition model based on a plurality of target sample texts until a training stopping condition is met, to obtain a trained second case relation recognition model.
10. A text processing apparatus, comprising:
the text acquisition module is used for acquiring a text to be identified;
the case relation analysis module is used for inputting the text to be recognized into a trained case relation recognition model to obtain the case relations among event information contained in the text to be recognized;
wherein the trained case relationship recognition model is obtained by training using the model training apparatus of claim 9.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any of claims 1-8.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1-8 when executed by a processor.
CN202210456716.7A 2022-04-27 2022-04-27 Model training method, text processing device and electronic equipment Pending CN115130542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210456716.7A CN115130542A (en) 2022-04-27 2022-04-27 Model training method, text processing device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210456716.7A CN115130542A (en) 2022-04-27 2022-04-27 Model training method, text processing device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115130542A true CN115130542A (en) 2022-09-30

Family

ID=83376757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210456716.7A Pending CN115130542A (en) 2022-04-27 2022-04-27 Model training method, text processing device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115130542A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502093A (en) * 2023-06-28 2023-07-28 江苏瑞中数据股份有限公司 Target detection data selection method and device based on active learning
CN116502093B (en) * 2023-06-28 2023-10-13 江苏瑞中数据股份有限公司 Target detection data selection method and device based on active learning
CN116911288A (en) * 2023-09-11 2023-10-20 戎行技术有限公司 Discrete text recognition method based on natural language processing technology
CN116911288B (en) * 2023-09-11 2023-12-12 戎行技术有限公司 Discrete text recognition method based on natural language processing technology

Similar Documents

Publication Publication Date Title
CN111932144B (en) Customer service agent distribution method and device, server and storage medium
CN110598037B (en) Image searching method, device and storage medium
WO2019037202A1 (en) Method and apparatus for recognising target customer, electronic device and medium
CN115130542A (en) Model training method, text processing device and electronic equipment
CN109933782B (en) User emotion prediction method and device
CN110727761B (en) Object information acquisition method and device and electronic equipment
US20210383205A1 (en) Taxonomy Construction via Graph-Based Cross-domain Knowledge Transfer
CN112989761B (en) Text classification method and device
US10678821B2 (en) Evaluating theses using tree structures
CN111371767A (en) Malicious account identification method, malicious account identification device, medium and electronic device
WO2023287910A1 (en) Intelligent task completion detection at a computing device
CN110598070A (en) Application type identification method and device, server and storage medium
WO2023024408A1 (en) Method for determining feature vector of user, and related device and medium
CN114138968A (en) Network hotspot mining method, device, equipment and storage medium
CN107644042B (en) Software program click rate pre-estimation sorting method and server
CN115131052A (en) Data processing method, computer equipment and storage medium
Kavikondala et al. Automated retraining of machine learning models
CN113362852A (en) User attribute identification method and device
CN112507185B (en) User portrait determination method and device
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN115293872A (en) Method for establishing risk identification model and corresponding device
CN113822684A (en) Heikou user recognition model training method and device, electronic equipment and storage medium
CN116091133A (en) Target object attribute identification method, device and storage medium
CN115204436A (en) Method, device, equipment and medium for detecting abnormal reasons of business indexes
Windiatmoko et al. Mi-Botway: A deep learning-based intelligent university enquiries chatbot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination