CN117591622A - Model training and service executing method, device, storage medium and equipment - Google Patents

Model training and service executing method, device, storage medium and equipment

Info

Publication number
CN117591622A
CN117591622A (application CN202311561050.2A)
Authority
CN
China
Prior art keywords
entity
sample data
target
model
training
Prior art date
Legal status
Pending
Application number
CN202311561050.2A
Other languages
Chinese (zh)
Inventor
张蝶
周书恒
祝慧佳
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority: CN202311561050.2A
Publication: CN117591622A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/31: Indexing; Data structures therefor; Storage structures
    • G06F 16/313: Selection or weighting of terms for indexing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

This specification discloses a model training and service execution method, apparatus, storage medium, and device. The model training method comprises: acquiring service data in a target service field as first sample data, and acquiring first entity description information corresponding to each entity type in the target service field; inputting the first sample data and the first entity description information into a target reading understanding model, which determines each entity contained in the first sample data according to the first entity description information, and taking each determined entity as a pseudo tag corresponding to the first sample data; inputting the first sample data into an entity extraction model to be trained, which determines each entity contained in the first sample data as a predicted entity corresponding to the first sample data; and training the entity extraction model with minimizing the deviation between the predicted entities and the pseudo tags as an optimization target, to obtain a target entity extraction model.

Description

Model training and service executing method, device, storage medium and equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a storage medium, and a device for model training and service execution.
Background
With the rapid development of artificial intelligence, entity extraction models have been widely applied in fields such as information recommendation, risk control, privacy protection, and intelligent customer service. Entity extraction is a common natural language processing (Natural Language Processing, NLP) task that provides the information required by services in different scenarios by extracting the entities in a target text.
To give an entity extraction model high performance, a large amount of sample data is usually required to train it, and this sample data must be accurately labeled so that the model can learn discrimination capability under the guidance of the supervision signals (the labels).
However, training samples for entity extraction models are currently labeled manually. For some professional fields, experts with domain knowledge are generally required to label the data accurately, so the labeling threshold is high and labeling is time-consuming; as a result, the cost of training an entity extraction model is high, and rapidly changing service requirements cannot be met in time.
Therefore, how to reduce the cost of training an entity extraction model is an urgent problem to be solved.
Disclosure of Invention
This specification provides a model training method, apparatus, storage medium, and device, in which sample data is labeled by a pre-trained target reading understanding model and an entity extraction model is then trained on that data.
The technical scheme adopted in the specification is as follows:
the specification provides a model training method, comprising:
acquiring service data in a target service field as first sample data, and acquiring first entity description information corresponding to each entity type in the target service field;
inputting the first sample data and the first entity description information into a target reading understanding model obtained by training in advance, determining each entity contained in the first sample data according to the first entity description information through the target reading understanding model, and taking each determined entity as a pseudo tag corresponding to the first sample data;
inputting the first sample data into an entity extraction model to be trained, so as to determine each entity contained in the first sample data through the entity extraction model, and using the entity as a prediction entity corresponding to the first sample data;
and training the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
Optionally, training a reading understanding model to obtain the target reading understanding model, specifically including:
acquiring service data in a plurality of service fields as second sample data, and acquiring second entity description information corresponding to each entity type in the plurality of service fields;
matching each entity contained in the second sample data in a preset entity dictionary, and taking the matched entity as a weak tag corresponding to the second sample data;
and training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain the target reading understanding model.
Optionally, training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain a target reading understanding model, which specifically includes:
obtaining third sample data carrying strong labels in the target service field, wherein the number of the third sample data is smaller than that of the first sample data, and the strong labels of the third sample data are marked in advance;
training the reading understanding model according to second sample data carrying the weak tag and the second entity description information to obtain a trained reading understanding model;
and adjusting the trained reading understanding model based on the third sample data carrying the strong tag to obtain a target reading understanding model.
Optionally, training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information specifically includes:
inputting the second sample data and the second entity description information into the reading understanding model so as to determine each entity contained in the second sample data according to the second entity description information through the reading understanding model as a prediction entity corresponding to the second sample data;
and training the reading understanding model with minimizing the deviation between the predicted entity corresponding to the second sample data and the weak tag as an optimization target.
Optionally, training the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target to obtain a target entity extraction model specifically includes:
obtaining third sample data carrying strong labels in the target service field, wherein the number of the third sample data is smaller than that of the first sample data, and the strong labels of the third sample data are labeled in advance;
training the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a trained entity extraction model;
and adjusting the entity extraction model after training based on the third sample data carrying the strong label to obtain the target entity extraction model.
The present specification provides a service execution method, including:
receiving a service request carrying target service data;
inputting the target business data into a pre-trained target entity extraction model to determine each target entity contained in the target business data through the target entity extraction model, wherein the target entity extraction model is obtained by training through the model training method;
and executing the service corresponding to the service request according to each target entity.
The present specification provides a model training apparatus comprising:
the system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module acquires service data in the target service field as first sample data and acquires first entity description information corresponding to each entity type in the target service field;
the determining module inputs the first sample data and the first entity description information into a target reading understanding model obtained by training in advance so as to determine each entity contained in the first sample data according to the first entity description information through the target reading understanding model, and takes each determined entity as a pseudo tag corresponding to the first sample data;
The input module inputs the first sample data into an entity extraction model to be trained, so that each entity contained in the first sample data is determined through the entity extraction model and used as a prediction entity corresponding to the first sample data;
and the training module trains the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
Optionally, the training module is specifically configured to obtain service data in a plurality of service domains, as second sample data, and obtain second entity description information corresponding to each entity type in the plurality of service domains; matching each entity contained in the second sample data in a preset entity dictionary, and taking the matched entity as a weak tag corresponding to the second sample data; and training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain the target reading understanding model.
Optionally, the training module is specifically configured to obtain third sample data carrying strong labels in the target service field, where the number of the third sample data is smaller than the number of the first sample data, and the strong labels of the third sample data are labeled in advance; training the reading understanding model according to second sample data carrying the weak tag and the second entity description information to obtain a trained reading understanding model; and adjusting the trained reading understanding model based on the third sample data carrying the strong tag to obtain a target reading understanding model.
Optionally, the training module is specifically configured to input the second sample data and the second entity description information into the reading understanding model, so that each entity included in the second sample data is determined according to the second entity description information through the reading understanding model, and is used as a prediction entity corresponding to the second sample data; and training the reading and understanding model by taking the deviation between the predicted entity corresponding to the second sample data and the weak tag as an optimization target.
Optionally, the training module is specifically configured to obtain third sample data carrying strong labels in the target service field, where the number of the third sample data is smaller than the number of the first sample data, and the strong labels of the third sample data are labeled in advance; train the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a trained entity extraction model; and adjust the trained entity extraction model based on the third sample data carrying the strong label to obtain the target entity extraction model.
The present specification provides a service execution apparatus, including:
the receiving module receives a service request carrying target service data;
the extraction module inputs the target business data into a pre-trained target entity extraction model so as to determine each target entity contained in the target business data through the target entity extraction model, wherein the target entity extraction model is obtained by training through the model training method;
and the execution module is used for executing the service corresponding to the service request according to each target entity.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the model training and business execution method described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above described method of model training and business execution when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the model training method provided in this specification, service data in the target service field is first acquired as first sample data, and first entity description information corresponding to each entity type in the target service field is acquired; the first sample data and the first entity description information are input into a target reading understanding model, which determines each entity contained in the first sample data according to the first entity description information, and each determined entity is taken as a pseudo tag corresponding to the first sample data; the first sample data is input into an entity extraction model to be trained, which determines each entity contained in the first sample data as a predicted entity corresponding to the first sample data; and the entity extraction model is trained with minimizing the deviation between the predicted entities and the pseudo tags as an optimization target, to obtain a target entity extraction model.
With this method, pseudo tags can be generated for a large amount of unlabeled data in the target field based on the pre-trained reading understanding model, and the entity extraction model can then be trained on the data carrying the pseudo tags.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate the exemplary embodiments of the present specification and, together with their description, serve to explain the specification; they are not intended to limit it unduly. In the drawings:
FIG. 1 is a schematic flow chart of a model training method provided in the present specification;
FIG. 2 is a schematic diagram of a training process of an entity extraction model provided in the present specification;
Fig. 3 is a schematic flow chart of a service execution method provided in the present specification;
FIG. 4 is a schematic diagram of a model training apparatus provided in the present specification;
fig. 5 is a schematic diagram of a service execution device provided in the present specification;
fig. 6 is a schematic view of an electronic device corresponding to fig. 1 or fig. 4 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Existing self-training-based entity extraction schemes generally first perform preliminary training of the entity extraction model on a small amount of manually annotated strong-label data, then use the model to predict a large amount of unlabeled data and assign pseudo tags. Samples with higher prediction confidence are then selected as reliably labeled samples and mixed with the manually annotated strong-label samples to enlarge the labeled data set. Finally, the entity extraction model is trained again on the enlarged data set, iterating until a stopping condition is reached (for example, all the unlabeled data has been labeled).
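The conventional self-training loop described above can be sketched as follows; `ToyModel`, the confidence threshold, and the round limit are illustrative stand-ins rather than details from this specification:

```python
class ToyModel:
    """Trivial stand-in for an entity extraction model, used only to make
    the self-training loop runnable; not part of the patent."""
    def __init__(self):
        self.train_size = 0
    def fit(self, data):
        self.train_size = len(data)
    def predict(self, text):
        return text.upper()  # placeholder "pseudo label"
    def confidence(self, text):
        return 0.95 if "easy" in text else 0.5

def self_train(model, labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Conventional self-training: train on the labeled pool, pseudo-label
    high-confidence unlabeled samples, fold them in, and repeat until no
    sample clears the threshold (or a round limit is hit)."""
    data = list(labeled)
    for _ in range(max_rounds):
        model.fit(data)
        confident = [x for x in unlabeled if model.confidence(x) >= threshold]
        if not confident:
            break
        data.extend((x, model.predict(x)) for x in confident)
        unlabeled = [x for x in unlabeled if model.confidence(x) < threshold]
    return model

model = self_train(ToyModel(), [("a", "A")], ["easy sample", "hard sample"])
```

As the next paragraph notes, this loop only helps when the early pseudo labels are mostly correct; errors made in the first rounds are folded back into the training pool and compound.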
However, self-training only effectively augments the small amount of manually annotated strong-label data if the model predicts most samples correctly in the initial and subsequent iterations, so that the resulting pseudo tags are reliable. If the model initially learns in the wrong direction, it predicts a large number of wrong labels; even with confidence-based screening during iteration, the selected pseudo-label samples then contain a large amount of noise, which misleads the model and degrades its performance.
Fig. 1 is a schematic flow chart of a model training method provided in the present specification, including the following steps:
s100: acquiring service data in a target service field as first sample data, and acquiring first entity description information corresponding to each entity type in the target service field.
In practical entity extraction scenarios, the cost of data annotation in some professional fields (such as finance, law, and biology) is extremely high. On the one hand, experts with domain knowledge are required to label the data accurately, so the labeling threshold and labor cost are high; on the other hand, labeling for the entity extraction task is word-level: every word in a sentence needs to be labeled, which is more complex and time-consuming than sentence-level labeling. Thus, on relatively complex entity extraction tasks, it is common that only a small amount of labeled data and a large amount of unlabeled data are available.
On this basis, this specification provides a model training method that trains and fine-tunes a reading understanding model on a large amount of weak-label data from the open domain, uses the resulting target reading understanding model to assign pseudo tags to sample data in the closed domain, and then trains an entity extraction model on the data carrying the pseudo tags, thereby saving training cost.
In this specification, the execution body of the model training and service execution methods may be a designated device such as a server. For convenience of description, the model training method provided in this specification is described below with the server as the execution body.
The server needs to acquire service data corresponding to a plurality of service fields (i.e., open-domain data) as second sample data; these fields may include finance, law, news, medicine, etc. In addition, the server may acquire the entity description information corresponding to each entity type in these service fields as second entity description information, where entity description information can be understood as text describing an entity type, such as time, place, name, animal, or plant.
In practical applications, only a small portion of open-domain data carries entity tags, while the majority carries none, so the server may generate weak tags for the unlabeled second sample data.
That is, in this specification, part of the second sample data carries strong tags, and corresponding weak tags are generated for the remaining unlabeled second sample data.
It should be noted that, the entity extraction task mainly extracts entity objects in text data, so the sample data (including the first sample data, the second sample data, and the third sample data) and the service data mentioned in this specification may be text data.
A weak tag is a label of lower quality, for example one that is wrongly, incompletely, or inconsistently annotated. A preset entity dictionary stores different entities under each entity type in advance; the server may match each entity contained in the unlabeled second sample data against this dictionary and use the matched entities as the weak tags corresponding to the second sample data.
For example, the entity under the entity type "animal" in the entity dictionary may include tiger, lion, panda, etc., and for any one of the second sample data, when any one of the above animals (such as panda) appears in the text, the second sample data may be matched to include the entity "panda" under the entity type "animal".
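The dictionary-matching step can be sketched as follows; the function name and the span-based output format are assumptions for illustration:

```python
from typing import Dict, List, Tuple

def match_weak_labels(text: str,
                      entity_dict: Dict[str, List[str]]) -> List[Tuple[str, str, int, int]]:
    """Scan the text for every dictionary entry and return
    (entity_type, entity, start, end) spans as weak labels."""
    labels = []
    for entity_type, entities in entity_dict.items():
        for entity in entities:
            # find every (possibly repeated) occurrence of the entity string
            start = text.find(entity)
            while start != -1:
                labels.append((entity_type, entity, start, start + len(entity)))
                start = text.find(entity, start + 1)
    return labels

entity_dict = {"animal": ["tiger", "lion", "panda"]}
weak = match_weak_labels("the panda sat beside another panda", entity_dict)
```

Because this is pure string matching, it labels exactly what the dictionary contains and nothing more, which is why the resulting tags are "weak": out-of-dictionary entities are missed and ambiguous surface forms are mislabeled.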
Machine reading comprehension (Machine Reading Comprehension, MRC) can be understood as follows: given a passage to be understood and a question, the machine answers the question after reading the passage. In this specification, the server may input the second sample data as the passage to be understood and the second entity description information as the question into a reading understanding model to be trained (e.g., BERT-MRC).
The reading understanding model then predicts the entities contained in the second sample data according to the second description information, determining the entities under each entity type contained in the second sample data as the predicted entities corresponding to the second sample data. Note that, for any piece of description information, if the second sample data contains no entity of the corresponding entity type, the predicted entity for that description information is empty.
The server may train the reading understanding model with minimizing the deviation between the predicted entity corresponding to the second sample data and the weak tag as an optimization target, until the training target is met (e.g., the model converges within a preset range or a preset number of training iterations is reached), obtaining the trained reading understanding model.
For example, the server may input the "place" as the second description information, together with the second sample data, into the reading understanding model to be trained, so that the model learns how to extract the entities (e.g., place a, place B, etc.) under the entity type "place" specified by the second description information from the input text.
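Framing entity extraction as MRC means issuing one question per entity type against the same passage. A minimal sketch of building such inputs follows; the question template is an assumption, and a real BERT-MRC pipeline would further tokenize and encode each pair:

```python
def build_mrc_inputs(sample: str, descriptions: dict) -> list:
    """Turn one text sample into one (question, context) pair per entity
    type, using the entity description text as the MRC question."""
    return [
        {"entity_type": name,
         "question": f"Which spans are {name} entities ({desc})?",
         "context": sample}
        for name, desc in descriptions.items()
    ]

pairs = build_mrc_inputs(
    "The meeting was held in Hangzhou on Monday",
    {"place": "a geographic location such as a city or country",
     "time": "a date, day or time expression"},
)
```

One sample thus yields as many model inputs as there are entity types, and an entity type absent from the text simply gets an empty answer, matching the behavior described above.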
In addition, the server may train the reading understanding model based on the confidence of the weak tag. Specifically, when the confidence of a weak tag is higher, its accuracy is relatively higher and a larger deviation range may exist between the predicted entity and the weak tag; when the confidence of a weak tag is lower, its accuracy is relatively lower, and the predicted entity and the weak tag then need to be kept within a smaller deviation range.
In other words, during training, when the confidence of the weak tag is high, the deviation between the predicted entity and the weak tag is driven toward a first, larger deviation value; when the confidence of the weak tag is low, the deviation is driven toward a second, smaller deviation value.
The confidence may be determined according to the frequency (word frequency) with which the entity corresponding to the weak tag occurs in the target field, or according to the semantic correlation between the weak tag and the second sample data; this specification does not specifically limit it.
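One plausible realization of this confidence-dependent deviation scheme is a hinge-style loss whose tolerance margin depends on the weak tag's confidence; the margin values, threshold, and function name below are illustrative assumptions, not taken from this specification:

```python
def weak_label_loss(deviation: float, confidence: float,
                    large_margin: float = 0.3, small_margin: float = 0.1,
                    conf_threshold: float = 0.5) -> float:
    """Hinge loss with a confidence-dependent margin: deviation from a
    high-confidence weak tag is tolerated up to the larger margin, while
    a low-confidence weak tag must be matched within the smaller one
    (following the deviation-range scheme stated in the text above)."""
    margin = large_margin if confidence >= conf_threshold else small_margin
    return max(0.0, deviation - margin)
```

With this shape, the gradient only pushes the model when the prediction drifts outside the margin chosen for that tag's confidence level.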
Further, to ensure the accuracy of the reading understanding model's recognition results in the target service field, the server may acquire service data in the target service field (i.e., closed-domain data) as third sample data, where the third sample data is annotated with strong tags in advance, and then fine-tune the trained reading understanding model on the third sample data, so that the reading understanding model is migrated from the open domain to the closed domain (the target service field) to obtain the target reading understanding model.
The fine tuning process of the target reading and understanding model is similar to the training process, and the server can input the third sample data into the trained reading and understanding model so as to output a prediction result, and further adjust the reading and understanding model by taking the deviation between the minimized prediction result and the strong label of the third sample data as an optimization target, so that the target reading and understanding model is obtained.
After the target reading understanding model is obtained, the server can further obtain first sample data containing service data corresponding to the target service field and first entity description information corresponding to each entity type in the target service field.
It should be noted that the first sample data is unlabeled, and the amount of first sample data is far greater than the amount of third sample data; the entity extraction model is thus preliminarily trained on a large amount of pseudo-labeled data and fine-tuned on a small amount of labeled data. The strong tags of the third sample data may be annotated manually.
In addition, the first entity description information and the second entity description information may be the same, and of course, the second entity description information and the first entity description information may be different.
It should be noted that in this specification the labeling quality (accuracy) of weak tags and pseudo tags is lower than that of strong tags; from high to low, labeling quality is: strong tag > pseudo tag > weak tag.
S102: and inputting the first sample data and the first entity description information into a target reading understanding model obtained by training in advance, determining each entity contained in the first sample data according to the first entity description information through the target reading understanding model, and taking each determined entity as a pseudo tag corresponding to the first sample data.
S104: and inputting the first sample data into an entity extraction model to be trained, so as to determine each entity contained in the first sample data through the entity extraction model as a prediction entity corresponding to the first sample data.
S106: training the entity extraction model with minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
The server may input the first sample data and the first entity description information, which do not carry the tag, into the target reading understanding model, determine, according to the first entity description information, each entity included in the first sample data through the target reading understanding model, and use the entities as pseudo tags corresponding to the first sample data.
The server can input the first sample data carrying the pseudo tag into an entity extraction model to be trained, each entity contained in the first sample data is determined through the entity extraction model to serve as a prediction entity corresponding to the first sample data, and the server can train the entity extraction model by taking the minimized deviation between the pseudo tag corresponding to the first sample data and the prediction entity as an optimization target, so that the trained entity extraction model is obtained.
In order to ensure the accuracy of the entity extraction model after training, the server can fine-tune the trained entity extraction model through a small amount of third sample data carrying strong labels. The fine-tuning process is similar to the training process: the server can input the third sample data carrying strong labels into the entity extraction model, determine a prediction entity corresponding to the third sample data, and then adjust the entity extraction model by taking minimizing the deviation between the strong label corresponding to the third sample data and the prediction entity as an optimization target, to obtain the target entity extraction model. For ease of understanding, the present specification provides a schematic diagram of the training process of the entity extraction model, as shown in fig. 2.
Fig. 2 is a schematic diagram of a training process of an entity extraction model provided in the present specification.
The server can determine weak labels of a large amount of open domain data (second sample data), wherein a small amount of the second sample data carries strong labels, then trains a reading understanding model through the second sample data carrying the weak labels and the strong labels, and fine-tunes the reading understanding model through a small amount of closed domain data (third sample data) carrying the strong labels to obtain the target reading understanding model.
And then the server determines a large number of pseudo tags of the closed domain data (first sample data) through the target reading understanding model, trains the entity extraction model through the first sample data carrying the pseudo tags, and further carries out fine tuning on the entity extraction model through third sample data after training is completed to obtain the target entity extraction model.
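The overall two-stage flow of fig. 2 can be sketched as follows (all class, function, and argument names are illustrative assumptions, not from the patent; the dummy model merely records what a real model would train on):

```python
class DummyModel:
    """Minimal stand-in exposing fit/predict so the flow is runnable;
    a real implementation would wrap a neural reading-understanding
    or entity-extraction model."""

    def __init__(self):
        self.seen = []  # (sample, labels) pairs this model "trained" on

    def fit(self, labeled_data):
        self.seen.extend(labeled_data)

    def predict(self, sample):
        return []  # a real reading model returns extracted entity spans


def train_pipeline(open_domain, closed_unlabeled, closed_strong,
                   rc_model, extractor, dictionary_match=lambda x: []):
    """Two-stage pipeline of fig. 2 (names are illustrative):
      1. weak labels via dictionary matching -> train reading model,
      2. fine-tune the reading model on strong labels,
      3. pseudo labels from the reading model -> train extractor,
      4. fine-tune the extractor on strong labels."""
    weak = [(x, dictionary_match(x)) for x in open_domain]
    rc_model.fit(weak)            # train on weak-labeled open domain data
    rc_model.fit(closed_strong)   # fine-tune on strong-labeled closed domain data
    pseudo = [(x, rc_model.predict(x)) for x in closed_unlabeled]
    extractor.fit(pseudo)         # train on pseudo-labeled closed domain data
    extractor.fit(closed_strong)  # fine-tune on strong labels
    return extractor
```

The sketch makes the data-flow dependency explicit: the extraction model never sees the open domain data directly; it only benefits from it through the pseudo labels produced by the reading understanding model.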
The server may deploy the target entity extraction model to execute subsequent services through it. In the following, a service execution method provided in the present specification will be described from a practical point of view, as shown in fig. 3.
Fig. 3 is a flow chart of a service execution method provided in the present specification, which includes the following steps:
S300: and receiving a service request carrying target service data.
S302: and inputting the target business data into a pre-trained target entity extraction model to determine each target entity contained in the target business data through the target entity extraction model, wherein the target entity extraction model is obtained through training by the model training method.
S304: and executing the service corresponding to the service request according to each target entity.
The server may receive a service request carrying target service data, where the target service data may be text data.
The server can input the target service data into the deployed target entity extraction model, perform entity extraction on the target service data through the model, determine each target entity contained in the target service data, and execute the service corresponding to the service request based on each extracted target entity.
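Steps S300 to S304 can be sketched as a small dispatch function (the request shape, the handler registry, and all names are assumptions for illustration):

```python
def execute_service(request, extract_entities, handlers):
    """Sketch of S300-S304: pull the target entities out of the
    request's business data (S302), then run the service handler
    registered for the request (S304). Names are illustrative."""
    entities = extract_entities(request["data"])   # target entity extraction
    handler = handlers[request["service"]]
    return handler(entities)                       # execute the service
```

A trivial extractor and handler suffice to exercise the flow; in production, `extract_entities` would wrap the deployed target entity extraction model.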
For example, in a risk-control scenario, the server may input a business document, such as a dialogue text or a search text, into the entity extraction model, determine each entity contained in the text, then perform anomaly detection on these entities to determine whether any content violating relevant rules is involved, and execute a risk-control policy according to the anomaly detection result. The entities detected in the risk-control scenario may include names: when a name matches known abnormal personnel, the current service is at risk and the server may execute the risk-control policy. Of course, the entities may also include specific services, and when an abnormal service exists, the server may likewise execute the risk-control policy.
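The name-based anomaly check described for the risk-control (wind control) scenario might look like the following toy policy, where the blocklist of abnormal personnel is an assumed input:

```python
def risk_check(entities, blocklist):
    """Toy risk-control policy: flag the request when any extracted
    entity (e.g. a name) appears on an anomaly blocklist; otherwise
    let it pass. The blocklist and return format are assumptions."""
    hits = [e for e in entities if e in blocklist]
    return ("blocked", hits) if hits else ("pass", [])
```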
For another example, in the information recommendation scenario, the server may perform entity extraction on the content input by the user through the entity extraction model, and further perform information recommendation according to the extracted entity (such as trade name, store name, place, etc.).
As can be seen from the above method, since the entity types to be extracted are explicitly input to the reading understanding model as problem description text, the model automatically learns the internal semantic relation between the entity type text (the entity description information) and the input content text (the training sample). This differs from the conventional modeling mode of encoding each entity type into a discrete label and having the model predict the discrete label (a mode that requires setting the number of neurons in the last layer of the network according to the number of entity types).
Therefore, when the entity types to be extracted change or increase, the conventional modeling mode of encoding entity types into discrete labels requires changing the structure of the entity extraction model, which damages the knowledge the model has already learned to a certain extent and is not conducive to migration between different tasks. The reading understanding model, by contrast, only needs the problem description text to be changed at the input level; the model can naturally adapt to a new entity extraction task without modifying the model structure, which reduces damage to the knowledge learned by the model and improves the migration capability and generalization of the model.
In view of the fact that reading understanding models migrate easily between different tasks, the scheme first lets the model learn, on a large amount of open domain data, how to extract general entities (such as time, place, and name) and domain-specific entities (such as proprietary entities in the finance, law, news, and medical fields), and then lets the model learn on the closed domain. Because the model has seen enough data in the open domain, learned sufficiently rich knowledge, and acquired a strong learning ability, its learning difficulty on a new downstream closed-domain task is greatly reduced, so better performance is easy to achieve, and the accuracy and reliability of the pseudo labels the model assigns to unlabeled closed-domain data are improved.
For example, suppose the model has learned on open domain data how to extract the entity type "time". When a task in a closed domain (such as the legal domain) requires extracting the entity type "time of occurrence", "time of occurrence" is taken as the problem description and input into the model together with the corresponding text. Given the semantic similarity between "time of occurrence" and "time", the knowledge learned when extracting "time" greatly reduces the difficulty of learning to extract the new entity type "time of occurrence". Therefore, even when there is little strong-label data in the closed domain, the model can achieve better generalization performance and make reliable prediction results.
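The point that only the problem description changes at the input level can be made concrete with a small formatting helper (the BERT-style [CLS]/[SEP] markers are an assumption; the patent does not fix an input encoding):

```python
def build_mrc_input(entity_description, text):
    """Format one reading-comprehension query: the entity description
    acts as the question. Switching from "time" to "time of occurrence"
    only changes this input string, never the model structure."""
    return f"[CLS] {entity_description} [SEP] {text} [SEP]"
```

For instance, `build_mrc_input("time of occurrence", contract_text)` adapts the model to the legal-domain entity type without adding output neurons.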
Experiments prove that the scheme improves the credibility of the pseudo labels the model assigns to unlabeled data in the closed domain, effectively augments the labeled data in the domain, and improves the performance of the model on the closed-domain entity extraction task.
In the present specification, the execution body implementing the model training and service execution methods may be a designated device such as a server provided on a service platform. For convenience of description, the present specification takes only the server as the execution body as an example to describe the methods provided herein.
The above is the model training and service execution method provided in one or more embodiments of the present specification. Based on the same idea, the present specification further provides corresponding model training and service execution devices, as shown in fig. 4 and fig. 5.
Fig. 4 is a schematic diagram of a model training device provided in the present specification, including:
the acquiring module 400 is configured to acquire service data in a target service domain, as first sample data, and acquire first entity description information corresponding to each entity type in the target service domain;
a determining module 402, configured to input the first sample data and the first entity description information into a target reading understanding model obtained by training in advance, so as to determine, according to the first entity description information, each entity included in the first sample data through the target reading understanding model, and use each determined entity as a pseudo tag corresponding to the first sample data;
the input module 404 is configured to input the first sample data into an entity extraction model to be trained, so as to determine, according to the entity extraction model, each entity included in the first sample data, as a predicted entity corresponding to the first sample data;
and a training module 406, configured to train the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
Optionally, the training module 406 is specifically configured to obtain service data in a plurality of service domains, as second sample data, and obtain second entity description information corresponding to each entity type in the plurality of service domains; matching each entity contained in the second sample data in a preset entity dictionary, and taking the matched entity as a weak tag corresponding to the second sample data; and training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain the target reading understanding model.
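The dictionary-matching step that produces weak labels can be sketched as follows (the function name, span format, and sample dictionary are illustrative assumptions):

```python
def match_weak_labels(text, entity_dict):
    """Match entries of a preset entity dictionary against the sample
    text and return (start, end, entity_type) spans for every hit;
    these spans serve as the weak labels. Dictionary matching is cheap
    but noisy, which is why these labels rank below pseudo and strong
    labels in quality."""
    labels = []
    for entity, entity_type in entity_dict.items():
        start = text.find(entity)
        while start != -1:
            labels.append((start, start + len(entity), entity_type))
            start = text.find(entity, start + 1)
    return labels
```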
Optionally, the training module 406 is specifically configured to obtain third sample data carrying strong labels in the target service domain, where the number of the third sample data is smaller than the number of the first sample data, and the strong labels of the third sample data are labeled in advance; training the reading understanding model according to second sample data carrying the weak tag and the second entity description information to obtain a trained reading understanding model; and adjusting the trained reading understanding model based on the third sample data carrying the strong tag to obtain a target reading understanding model.
Optionally, the training module 406 is specifically configured to input the second sample data and the second entity description information into the reading understanding model, so as to determine, through the reading understanding model and according to the second entity description information, each entity contained in the second sample data as a predicted entity corresponding to the second sample data; and to train the reading understanding model by taking minimizing the deviation between the predicted entity corresponding to the second sample data and the weak tag as an optimization target.
Optionally, the training module 406 is specifically configured to obtain third sample data carrying strong labels in the target service domain, where the number of the third sample data is smaller than the number of the first sample data, and the strong labels of the third sample data are labeled in advance; train the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a trained entity extraction model; and adjust the trained entity extraction model based on the third sample data carrying the strong labels, to obtain the target entity extraction model.
Fig. 5 is a schematic diagram of a service execution device provided in the present specification, including:
a receiving module 500, configured to receive a service request carrying target service data;
the extraction module 502 is configured to input the target business data into a pre-trained target entity extraction model, so as to determine each target entity included in the target business data through the target entity extraction model, where the target entity extraction model is obtained by training the model training method;
and the executing module 504 is configured to execute the service corresponding to the service request according to the target entities.

The present specification also provides a computer readable storage medium storing a computer program operable to perform the model training and service execution methods provided in fig. 1 or fig. 3 above.
The present specification also provides a schematic structural diagram, shown in fig. 6, of an electronic device corresponding to fig. 1 or fig. 3. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, as illustrated in fig. 6, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the model training and service execution methods described above with respect to fig. 1 or fig. 3. Of course, the present specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to logic units, but may also be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Indeed, means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (14)

1. A model training method, comprising:
acquiring service data in a target service field as first sample data, and acquiring first entity description information corresponding to each entity type in the target service field;
inputting the first sample data and the first entity description information into a target reading understanding model obtained by training in advance, determining each entity contained in the first sample data according to the first entity description information through the target reading understanding model, and taking each determined entity as a pseudo tag corresponding to the first sample data;
inputting the first sample data into an entity extraction model to be trained, so as to determine, through the entity extraction model, each entity contained in the first sample data as a prediction entity corresponding to the first sample data;
and training the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
2. The method of claim 1, training a reading understanding model to obtain the target reading understanding model, comprising:
acquiring service data in a plurality of service fields as second sample data, and acquiring second entity description information corresponding to each entity type in the plurality of service fields;
matching each entity contained in the second sample data in a preset entity dictionary, and taking the matched entity as a weak tag corresponding to the second sample data;
and training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain the target reading understanding model.
3. The method of claim 2, training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain a target reading understanding model, and specifically comprising:
obtaining third sample data carrying strong labels in the target service field, wherein the number of the third sample data is smaller than that of the first sample data, and the strong labels of the third sample data are marked in advance;
training the reading understanding model according to second sample data carrying the weak tag and the second entity description information to obtain a trained reading understanding model;
and adjusting the trained reading understanding model based on the third sample data carrying the strong tag to obtain a target reading understanding model.
4. The method of claim 2, training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information, specifically comprising:
inputting the second sample data and the second entity description information into the reading understanding model so as to determine each entity contained in the second sample data according to the second entity description information through the reading understanding model as a prediction entity corresponding to the second sample data;
and training the reading understanding model by taking minimizing the deviation between the predicted entity corresponding to the second sample data and the weak tag as an optimization target.
5. The method of claim 1, wherein training the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target to obtain a target entity extraction model specifically comprises:
obtaining third sample data carrying strong labels in the target service field, wherein the number of the third sample data is smaller than that of the first sample data, and the strong labels of the third sample data are marked in advance;
training the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a trained entity extraction model;
and adjusting the entity extraction model after training based on the third sample data carrying the strong label to obtain the target entity extraction model.
6. A service execution method, comprising:
receiving a service request carrying target service data;
inputting the target business data into a pre-trained target entity extraction model to determine each target entity contained in the target business data through the target entity extraction model, wherein the target entity extraction model is obtained through training by the method of any one of claims 1-5;
And executing the service corresponding to the service request according to each target entity.
7. A model training apparatus comprising:
the system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module acquires service data in the target service field as first sample data and acquires first entity description information corresponding to each entity type in the target service field;
the determining module inputs the first sample data and the first entity description information into a target reading understanding model obtained by training in advance so as to determine each entity contained in the first sample data according to the first entity description information through the target reading understanding model, and takes each determined entity as a pseudo tag corresponding to the first sample data;
the input module inputs the first sample data into an entity extraction model to be trained, so that each entity contained in the first sample data is determined through the entity extraction model and used as a prediction entity corresponding to the first sample data;
and the training module is used for training the entity extraction model by taking minimizing the deviation between the predicted entity corresponding to the first sample data and the pseudo tag as an optimization target, to obtain a target entity extraction model.
8. The model training apparatus of claim 7, wherein the training module is specifically configured to obtain service data in a plurality of service domains as second sample data, and obtain second entity description information corresponding to each entity type in the plurality of service domains; matching each entity contained in the second sample data in a preset entity dictionary, and taking the matched entity as a weak tag corresponding to the second sample data; and training the reading understanding model according to the second sample data carrying the weak tag and the second entity description information to obtain the target reading understanding model.
9. The model training apparatus of claim 8, wherein the training module is specifically configured to obtain third sample data carrying strong labels in the target service domain, the number of the third sample data is smaller than the number of the first sample data, and the strong labels of the third sample data are labeled in advance; training the reading understanding model according to second sample data carrying the weak tag and the second entity description information to obtain a trained reading understanding model; and adjusting the trained reading understanding model based on the third sample data carrying the strong tag to obtain a target reading understanding model.
10. The model training apparatus of claim 8, wherein the training module is specifically configured to: input the second sample data and the second entity description information into the reading comprehension model, so that the reading comprehension model determines, according to the second entity description information, each entity contained in the second sample data as a predicted entity corresponding to the second sample data; and train the reading comprehension model with the optimization target of minimizing the deviation between the predicted entities corresponding to the second sample data and the weak labels.
11. The model training apparatus of claim 7, wherein the training module is specifically configured to: obtain third sample data carrying strong labels in the target service domain, wherein the amount of the third sample data is smaller than the amount of the first sample data, and the strong labels of the third sample data are annotated in advance; train the entity extraction model with the optimization target of minimizing the deviation between the predicted entities corresponding to the first sample data and the pseudo labels, to obtain a trained entity extraction model; and fine-tune the trained entity extraction model based on the third sample data carrying the strong labels, to obtain the target entity extraction model.
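Claims 9 and 11 share the same two-stage schedule: train first on the plentiful weakly- or pseudo-labeled data, then fine-tune on the much smaller strongly-labeled set. The toy sketch below (a single scalar parameter nudged toward a dataset mean stands in for gradient steps on a real model; all numbers are made up) shows the shape of that schedule; the lower learning rate in stage two reflects the common practice of letting the strong labels adjust rather than overwrite the model.

```python
def train(param, dataset, lr):
    """Toy 'training step': move the scalar parameter toward the dataset
    mean, standing in for an epoch of gradient descent on a real model."""
    target = sum(dataset) / len(dataset)
    return param + lr * (target - param)

weak_data = [0.9, 1.1, 1.0, 0.8, 1.2]   # large, noisy weak/pseudo-label set
strong_data = [1.2, 1.0]                 # small, clean strongly-labeled set

param = 0.0
param = train(param, weak_data, lr=1.0)    # stage 1: weak-label training
param = train(param, strong_data, lr=0.3)  # stage 2: strong-label fine-tune
print(round(param, 3))  # → 1.03
```

After stage one the parameter sits at the weak-data consensus; stage two shifts it only partway toward the strong-data target, mirroring a cautious fine-tune.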
12. A service execution apparatus, comprising:
a receiving module configured to receive a service request carrying target service data;
an extraction module configured to input the target service data into a pre-trained target entity extraction model, so that the target entity extraction model determines each target entity contained in the target service data, wherein the target entity extraction model is trained by the method of any one of claims 1 to 5;
and an execution module configured to execute the service corresponding to the service request according to each target entity.
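The receive → extract → execute flow of claim 12 can be sketched end to end. Everything here is hypothetical: the model stub, the routing table, and the request shape are invented for illustration, not taken from the patent.

```python
def target_entity_extraction_model(service_data):
    """Stub for the trained target entity extraction model:
    naive keyword-based extraction over hypothetical entity types."""
    entities = []
    for word, ent_type in [("transfer", "intent"), ("100", "amount")]:
        if word in service_data:
            entities.append((ent_type, word))
    return entities

# Hypothetical routing from an extracted intent entity to a service handler.
SERVICE_ROUTES = {
    "transfer": lambda ents: f"executed transfer with {dict(ents)}",
}

def handle_request(request):
    """Receive a service request, extract target entities, then execute
    the service corresponding to those entities."""
    entities = target_entity_extraction_model(request["service_data"])
    intent = dict(entities).get("intent")
    handler = SERVICE_ROUTES.get(intent, lambda ents: "no matching service")
    return handler(entities)

print(handle_request({"service_data": "transfer 100 to Bob"}))
```

In a deployed system the stub would be replaced by the trained model of claims 7–11, and the routing table by whatever service dispatch the business logic requires.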
13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 6.
14. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the program.
CN202311561050.2A 2023-11-21 2023-11-21 Model training and service executing method, device, storage medium and equipment Pending CN117591622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311561050.2A CN117591622A (en) 2023-11-21 2023-11-21 Model training and service executing method, device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311561050.2A CN117591622A (en) 2023-11-21 2023-11-21 Model training and service executing method, device, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN117591622A 2024-02-23

Family

ID=89909419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311561050.2A Pending CN117591622A (en) 2023-11-21 2023-11-21 Model training and service executing method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN117591622A (en)

Similar Documents

Publication Publication Date Title
CN113221555B (en) Keyword recognition method, device and equipment based on multitasking model
CN115952272B (en) Method, device and equipment for generating dialogue information and readable storage medium
CN112417093B (en) Model training method and device
CN113887227B (en) Model training and entity identification method and device
CN111144126A (en) Training method of semantic analysis model, semantic analysis method and device
CN115618964B (en) Model training method and device, storage medium and electronic equipment
CN116502176A (en) Pre-training method and device of language model, medium and electronic equipment
CN116188971A (en) Robot character recognition method, device and storage medium
CN116127305A (en) Model training method and device, storage medium and electronic equipment
CN111222315B (en) Movie scenario prediction method
CN113887206B (en) Model training and keyword extraction method and device
CN117591661B (en) Question-answer data construction method and device based on large language model
CN116186330B (en) Video deduplication method and device based on multi-mode learning
CN112948449A (en) Information recommendation method and device
CN116863484A (en) Character recognition method, device, storage medium and electronic equipment
CN113641766B (en) Relationship identification method and device, storage medium and electronic equipment
CN117591622A (en) Model training and service executing method, device, storage medium and equipment
CN117079646B (en) Training method, device, equipment and storage medium of voice recognition model
CN117573849B (en) Knowledge graph multi-hop question-answering method, device, equipment and storage medium
CN115658891B (en) Method and device for identifying intention, storage medium and electronic equipment
CN117807961B (en) Training method and device of text generation model, medium and electronic equipment
CN116501852B (en) Controllable dialogue model training method and device, storage medium and electronic equipment
CN115017915B (en) Model training and task execution method and device
CN114611517B (en) Named entity recognition method, device, equipment and medium based on deep learning
CN117195871A (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination