WO2023151284A1 - Classification result correction method and system, device, and medium - Google Patents


Info

Publication number
WO2023151284A1
Authority
WO
WIPO (PCT)
Prior art keywords
category
probability
data
synonyms
labels
Prior art date
Application number
PCT/CN2022/122302
Other languages
French (fr)
Chinese (zh)
Inventor
刘红丽
李峰
于彤
周镇镇
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023151284A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Abstract

The present application discloses a classification result correction method and system, a device, and a medium. The method comprises the steps: constructing a data set, and labeling each piece of data in the data set with a classification label of a corresponding category; inputting each piece of data in the data set into a trained model to obtain a probability of the corresponding classification label, and calculating a correction matrix by means of the classification label probability corresponding to each piece of data; expanding the classification label of each category into a plurality of sub-labels; adjusting the output of the trained model to be probabilities of the plurality of sub-labels corresponding to each category; inputting data to be classified into the trained model to obtain probabilities of the plurality of sub-labels corresponding to each category; and determining, by means of the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix, the final category of the data to be classified. According to the solution provided in the present application, labels are expanded, and thus, bias caused by different frequencies of occurrence of the labels is eliminated.

Description

Classification result correction method, system, device, and medium
This application claims priority to Chinese patent application No. 202210133548.8, entitled "Classification Result Correction Method, System, Device, and Medium", filed with the China Patent Office on February 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of classification, and in particular to a classification result correction method, system, device, and storage medium.
Background
The core capabilities of very large-scale models are zero-shot and few-shot learning; that is, the model does not need to be retrained when facing different application tasks. However, such models pick up bias from the corpus during pre-training, resulting in low accuracy or unstable performance on downstream tasks. An existing solution compensates the biased label words with a content-free input, calibrating them to an unbiased state and reducing the variance across different prompt choices. However, because labels appear with different frequencies in the pre-training corpus, the model develops a preference among prediction results, i.e., the output accuracy is low. Existing correction methods can therefore only correct the model's bias toward the labels and cannot correct the bias introduced by the input samples.
Summary
In view of this, to overcome at least one aspect of the above problems, an embodiment of the present application proposes a classification result correction method, comprising the following steps:
constructing a data set and labeling each piece of data in the data set with a classification label of a corresponding category;
inputting each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and calculating a correction matrix from the classification label probability corresponding to each piece of data;
expanding the classification label of each category into a plurality of sub-labels;
adjusting the output of the trained model to be the probabilities of the plurality of sub-labels corresponding to each category;
inputting the data to be classified into the trained model to obtain the probabilities of the plurality of sub-labels corresponding to each category;
determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix.
In some embodiments, inputting each piece of data in the data set into the trained model to obtain the probability of the corresponding classification label and calculating the correction matrix from the classification label probability corresponding to each piece of data further comprises:
summing and averaging, per category, the probabilities of the classification labels corresponding to each piece of data to obtain the probability corresponding to each category;
normalizing the probability corresponding to each category and constructing a diagonal matrix;
inverting the diagonal matrix to obtain the correction matrix.
In some embodiments, expanding the classification label of each category into a plurality of sub-labels further comprises:
obtaining, with a preset model, a plurality of synonyms corresponding to the classification label of each category;
selecting a preset number of words from the plurality of synonyms corresponding to each category as the plurality of sub-labels corresponding to that category.
In some embodiments, before obtaining, with the preset model, the plurality of synonyms corresponding to the classification label of each category, the method further comprises:
training the preset model on an embedding corpus of a preset number of Chinese words and phrases.
In some embodiments, selecting a preset number of words from the plurality of synonyms corresponding to each category as the plurality of sub-labels corresponding to that category further comprises:
deleting, from the plurality of synonyms, the words that do not exist in the vocabulary of the trained model;
adjusting the output of the trained model to be the probabilities of the remaining synonyms;
inputting each piece of data in the data set into the trained model to obtain the probabilities of the remaining synonyms;
deleting, from the remaining synonyms and according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than a first threshold;
deleting, from the synonyms that then remain, the words whose probability difference is smaller than a second threshold, and selecting the preset number of words with the highest probabilities as the plurality of sub-labels corresponding to each category.
In some embodiments, deleting, from the remaining synonyms and according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than the first threshold further comprises:
classifying, according to the probabilities of the remaining synonyms output by the trained model, the remaining synonyms whose probability is lower than the average as rare words, and deleting the rare words.
In some embodiments, deleting, from the plurality of synonyms, the words that do not exist in the vocabulary of the trained model further comprises:
checking, by traversal, whether each of the plurality of synonyms is in the vocabulary space of the trained model, and deleting the synonyms that are not in the vocabulary space.
In some embodiments, deleting, from the synonyms that then remain, the words whose probability difference is smaller than the second threshold further comprises:
identifying the mutually synonymous words among the synonyms that then remain, and deleting all of them except the one with the highest probability.
In some embodiments, selecting the preset number of words with the highest probabilities as the plurality of sub-labels corresponding to each category further comprises:
sorting, in descending order of probability, the words among the remaining synonyms whose probability difference is smaller than the second threshold, and selecting the top preset number of words as the plurality of sub-labels corresponding to each category.
In some embodiments, the preset model is a word2vec model.
In some embodiments, determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix further comprises:
calculating, per category, the average of the probabilities of the plurality of sub-labels corresponding to that category, multiplying the average corresponding to each category by the correction matrix to obtain a corrected first probability, and taking the category with the largest first probability as the classification category of the data.
In some embodiments, determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix further comprises:
multiplying the maximum of the probabilities of the plurality of sub-labels corresponding to each category by the correction matrix to obtain a corrected second probability, and taking the category corresponding to the sub-label with the highest probability as the second classification category of the data.
In some embodiments, determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix further comprises:
multiplying the probabilities of the plurality of sub-labels corresponding to each category by the correction matrix respectively, then averaging per category to obtain a corrected third probability, and taking the category with the largest third probability as the third classification category of the data.
In some embodiments, the pre-trained model is a PLM (pre-trained language model).
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a classification result correction system, comprising:
a construction module configured to construct a data set and label each piece of data in the data set with a classification label of a corresponding category;
a calculation module configured to input each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and to calculate a correction matrix from the classification label probability corresponding to each piece of data;
an expansion module configured to expand the classification label of each category into a plurality of sub-labels;
an adjustment module configured to adjust the output of the trained model to be the probabilities of the plurality of sub-labels corresponding to each category;
an input module configured to input the data to be classified into the trained model to obtain the probabilities of the plurality of sub-labels corresponding to each category;
a correction module configured to determine the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix.
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a computer device, comprising:
at least one processor; and
a memory storing a computer program runnable on the processor, wherein the processor, when executing the program, performs the steps of any of the classification result correction methods described above.
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a non-volatile readable storage medium storing a computer program which, when executed by a processor, performs the steps of any of the classification result correction methods described above.
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a computing processing device, comprising:
a memory having computer-readable code stored therein; and
one or more processors, wherein when the computer-readable code is executed by the one or more processors, the computing processing device performs the steps of any of the classification result correction methods described above.
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a computer program product, comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the steps of the classification result correction method according to any of the above.
The present application has at least one of the following beneficial technical effects: the solution proposed in the present application expands the labels, thereby eliminating the bias caused by the different frequencies with which labels appear in the pre-training corpus, and replaces the empty text with training set samples, correcting the bias introduced by the label words and by the input samples at the same time.
Brief Description of the Drawings
To describe the technical solutions in some embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other embodiments from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a classification result correction method provided by an embodiment of the present application;
Fig. 2 is a flowchart of label expansion provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a classification result correction system provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a non-volatile readable storage medium provided by an embodiment of the present application;
Fig. 6 schematically shows a block diagram of a computing processing device for performing the method according to the present application; and
Fig. 7 schematically shows a storage unit for holding or carrying program code implementing the method according to the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present application serve to distinguish two non-identical entities or parameters that share the same name; "first" and "second" are used merely for convenience of expression and should not be understood as limiting the embodiments of the present application, and this will not be explained again for each subsequent embodiment.
According to one aspect of the present application, an embodiment of the present application proposes a classification result correction method which, as shown in Fig. 1, may include the following steps:
S1: constructing a data set and labeling each piece of data in the data set with a classification label of a corresponding category;
S2: inputting each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and calculating a correction matrix from the classification label probability corresponding to each piece of data;
S3: expanding the classification label of each category into a plurality of sub-labels;
S4: adjusting the output of the trained model to be the probabilities of the plurality of sub-labels corresponding to each category;
S5: inputting the data to be classified into the trained model to obtain the probabilities of the plurality of sub-labels corresponding to each category;
S6: determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix.
By expanding the labels, the solution proposed in the present application eliminates the bias caused by the different frequencies with which labels appear in the pre-training corpus, and by replacing the empty text with training set samples it corrects the bias introduced by the label words and by the input samples at the same time.
In some embodiments, step S2, inputting each piece of data in the data set into the trained model to obtain the probability of the corresponding classification label and calculating the correction matrix from the classification label probability corresponding to each piece of data, further comprises:
summing and averaging, per category, the probabilities of the classification labels corresponding to each piece of data to obtain the probability corresponding to each category;
normalizing the probability corresponding to each category and constructing a diagonal matrix;
inverting the diagonal matrix to obtain the correction matrix.
Specifically, after each piece of data in the training set is input into the model (a PLM, i.e., a pre-trained language model), the label probability of the corresponding category is obtained; the label probabilities of each category are then summed and averaged, the results are normalized, a diagonal matrix is constructed from them, and the inverse of that diagonal matrix is the final correction matrix.
For example, after data A is input into the model, the label probability of data A is obtained; this label corresponds to the first category, so summing and averaging the label probabilities of all data of the first category yields the probability corresponding to the first category. The probabilities corresponding to all categories are then normalized, a diagonal matrix is constructed, and the inverse of the diagonal matrix is computed to obtain the final correction matrix.
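The correction-matrix construction described above can be sketched in a few lines of Python. The function name, the pure-Python list-of-lists matrix representation, and the toy probabilities in the usage below are illustrative assumptions, not part of the application; the inverse of a diagonal matrix is simply the diagonal of reciprocals, so no linear-algebra library is needed:

```python
def correction_matrix(label_probs, labels, n_classes):
    """Per-class mean gold-label probability, normalized, placed on a
    diagonal, then inverted -- the correction matrix of step S2."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for p, c in zip(label_probs, labels):
        sums[c] += p
        counts[c] += 1
    # Sum the gold-label probabilities per category and take the mean.
    means = [s / n for s, n in zip(sums, counts)]
    # Normalize so the per-class means sum to 1.
    total = sum(means)
    means = [m / total for m in means]
    # Inverting a diagonal matrix just takes reciprocals of the diagonal.
    return [[1.0 / means[i] if i == j else 0.0 for j in range(n_classes)]
            for i in range(n_classes)]
```

For instance, with gold-label probabilities [0.8, 0.6] for class 0 and [0.2, 0.4] for class 1, the normalized per-class means are 0.7 and 0.3, and the correction matrix is diag(1/0.7, 1/0.3).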
In some embodiments, step S3, expanding the classification label of each category into a plurality of sub-labels, further comprises:
obtaining, with a preset model, a plurality of synonyms corresponding to the classification label of each category;
selecting a preset number of words from the plurality of synonyms corresponding to each category as the plurality of sub-labels corresponding to that category.
In some embodiments, selecting a preset number of words from the plurality of synonyms corresponding to each category as the plurality of sub-labels corresponding to that category further comprises:
deleting, from the plurality of synonyms, the words that do not exist in the vocabulary of the trained model;
adjusting the output of the trained model to be the probabilities of the remaining synonyms;
inputting each piece of data in the data set into the trained model to obtain the probabilities of the remaining synonyms;
deleting, from the remaining synonyms and according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than the first threshold;
deleting, from the synonyms that then remain, the words whose probability difference is smaller than the second threshold, and selecting the preset number of words with the highest probabilities as the plurality of sub-labels corresponding to each category.
Specifically, an extended label-mapping vocabulary can be built from the synonyms output by a word2vec model. For synonym selection, an embedding corpus covering more than 8 million Chinese words and phrases can be used to train the word2vec model so that it captures the correlations between words; the plurality of synonyms output by the word2vec model are then screened to obtain the final plurality of sub-labels.
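Synonym retrieval from an embedding model amounts to a nearest-neighbor search by cosine similarity. The sketch below uses a tiny hand-written embedding table standing in for a trained word2vec model, so it stays self-contained; the words, vectors, and function names are illustrative only:

```python
import math

# Toy embedding table standing in for a word2vec model trained on a large
# word/phrase corpus; the words and vectors are illustrative only.
EMBEDDINGS = {
    "good":      [0.90, 0.10, 0.00],
    "excellent": [0.85, 0.15, 0.05],
    "positive":  [0.80, 0.20, 0.10],
    "bad":       [-0.90, 0.10, 0.00],
    "negative":  [-0.85, 0.20, 0.05],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_synonyms(label, k):
    """Return the k words closest to the label word in embedding space."""
    target = EMBEDDINGS[label]
    scored = sorted(((w, cosine(target, v)) for w, v in EMBEDDINGS.items()
                     if w != label), key=lambda x: -x[1])
    return [w for w, _ in scored[:k]]
```

With a real word2vec model the same lookup is typically done with the model's built-in most-similar query rather than a hand-rolled scan.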
As shown in Fig. 2, each synonym can first be checked, by traversal, against the model's vocabulary space, and the label-mapping words not in that space are deleted.
It should be noted that every word in the vocabulary space can serve as a label. When a piece of data is input into the model, the model can output the probability of every word in the vocabulary space; since the vocabulary space contains many words, the labels output by the model can be adjusted according to actual needs.
Each piece of data in the training set can then be input into the model to obtain the probabilities of the remaining synonyms, and the synonyms whose probability is below the average are classified as rare words; these rare words make the predicted probabilities inaccurate and therefore need to be deleted.
Next, because synonymous words whose predicted probabilities are very close to one another make the label expansion meaningless, synonyms with nearly equal probability values can be deleted, retaining only the one with the highest predicted probability.
Finally, the top N words of the remaining vocabulary are selected as the final extended label-mapping vocabulary, for example N = min(5, number of label-mapping words after screening).
Through the above process, each label is expanded into N synonyms, eliminating the bias introduced by a single label.
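The screening pipeline above can be sketched as a single function. The function name, the precomputed per-word probabilities, and the tie threshold value are illustrative assumptions; in the application the probabilities come from running the trained model over the labeled data set:

```python
def select_sublabels(candidates, vocab, probs, n_max=5, tie_eps=1e-3):
    """Screen synonym candidates into sub-labels for one category."""
    # 1. Delete candidates absent from the trained model's vocabulary.
    kept = [w for w in candidates if w in vocab]
    # 2. Delete "rare" words: those whose (precomputed) average probability
    #    over the data set falls below the mean of the surviving words.
    avg = sum(probs[w] for w in kept) / len(kept)
    kept = [w for w in kept if probs[w] >= avg]
    # 3. Among words whose probabilities differ by less than tie_eps, keep
    #    only the highest-probability one.
    kept.sort(key=lambda w: -probs[w])
    dedup = []
    for w in kept:
        if not dedup or probs[dedup[-1]] - probs[w] >= tie_eps:
            dedup.append(w)
    # 4. Keep the top N = min(n_max, number of surviving words).
    return dedup[:n_max]
```

A candidate that is out of vocabulary is dropped in step 1; a candidate whose probability nearly ties a stronger one is dropped in step 3, so only distinguishable, frequent words survive as sub-labels.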
For example, after expansion, the label "好评" (positive review) of the first category (e.g., positive) expands to ["好", "正面", "满意", "出色", "棒"] ("good", "positive", "satisfied", "excellent", "great"), and the label "差评" (negative review) of the second category (e.g., negative) expands to ["差", "负面", "失望", "不佳", "糟糕"] ("bad", "negative", "disappointed", "poor", "terrible").
Ideally, all labels would appear with roughly the same frequency in the pre-training corpus. In experiments, however, the frequencies with which labels appear in the corpus were found to differ, giving the model a preference among prediction results. In practice, manually selecting suitable label-mapping words from a vocabulary space of nearly 60,000 entries is very difficult and usually introduces subjective factors; the screening method above is therefore used to expand the label-mapping words.
In some embodiments, determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix further comprises:
calculating, per category, the average of the probabilities of the plurality of sub-labels corresponding to that category, multiplying the average corresponding to each category by the correction matrix to obtain a corrected first probability, and taking the category with the largest first probability as the classification category of the data.
Specifically, once the correction matrix has been obtained, the data to be classified can be input into the model; the model then outputs the probabilities of the plurality of sub-labels corresponding to each category. The average of these probabilities is computed per category, each category's average is multiplied by the correction matrix to obtain the corrected first probability, and the category with the largest first probability is taken as the classification category of the data.
For example, after data B to be classified is input into the model, the model can output the probabilities of 10 sub-labels, i.e., the probabilities of ["好", "正面", "满意", "出色", "棒"] and of ["差", "负面", "失望", "不佳", "糟糕"]. The probabilities within each group are averaged, each average is multiplied by the correction matrix, and the category whose corrected average is largest is taken as the final classification category of data B.
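This first decision rule can be sketched as follows; the function name is illustrative, `sub_probs` holds one list of sub-label probabilities per category, and `W` is the correction matrix (assumed here as a list of lists):

```python
def classify_mean(sub_probs, W):
    """Average sub-label probabilities per category, multiply the resulting
    vector by the correction matrix W, and return the argmax category."""
    means = [sum(p) / len(p) for p in sub_probs]
    corrected = [sum(W[i][j] * means[j] for j in range(len(means)))
                 for i in range(len(means))]
    return max(range(len(corrected)), key=corrected.__getitem__)
```

With sub-label probabilities [0.6, 0.4] for the first category and [0.3, 0.5] for the second, and a correction matrix diag(1.0, 2.0), the corrected averages are 0.5 and 0.8, so the second category is selected.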
In some embodiments, determining the final category of the data to be classified from the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix further comprises:
multiplying the maximum of the probabilities of the plurality of sub-labels corresponding to each category by the correction matrix to obtain a corrected second probability, and taking the category corresponding to the sub-label with the highest probability as the second classification category of the data.
Specifically, once the correction matrix has been obtained, the data to be classified can be input into the model; the model then outputs the probabilities of the plurality of sub-labels corresponding to each category. The maximum of the sub-label probabilities of each category is multiplied by the correction matrix to obtain the corrected second probability, and the category corresponding to the sub-label with the highest corrected probability is taken as the second classification category of the data.
For example, after data B to be classified is input into the model, the model can output the probabilities of 10 sub-labels, i.e., the probabilities of ["好", "正面", "满意", "出色", "棒"] and of ["差", "负面", "失望", "不佳", "糟糕"]. The maximum probability within each group is multiplied by the correction matrix, and the category with the larger corrected value is taken as the final classification category of data B.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
multiplying the probabilities of the multiple sub-labels corresponding to each category by the correction matrix, averaging the results by category to obtain a corrected third probability, and taking the category with the maximum third probability as the third classification category of the data.
Specifically, after the correction matrix is obtained, the data to be classified can be input into the model; at this time, the output of the model is the probabilities of the multiple sub-labels corresponding to each category. The probabilities of the multiple sub-labels corresponding to each category are then each multiplied by the correction matrix and averaged by category to obtain the corrected third probability, and the category with the maximum third probability is taken as the third classification category of the data.
For example, after the data B to be classified is input into the model, the model can output the probabilities of 10 sub-labels, namely the probabilities of ["good", "positive", "satisfied", "excellent", "great"] and the probabilities of ["poor", "negative", "disappointed", "unsatisfactory", "terrible"]. The probabilities of ["good", "positive", "satisfied", "excellent", "great"] are each multiplied by the correction matrix and then averaged, the probabilities of ["poor", "negative", "disappointed", "unsatisfactory", "terrible"] are each multiplied by the correction matrix and then averaged, and finally the category corresponding to the larger average is taken as the final classification category of the data B.
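The three correction variants described above can be sketched as follows. This is a minimal illustration, not the application's actual implementation: it assumes a NumPy array `sub_probs` of shape (num_categories, num_sublabels) holding the model's sub-label probabilities for one sample, and a square per-category correction matrix `W`; all function names are illustrative.

```python
import numpy as np

def correct_mean_then_scale(sub_probs, W):
    # Variant 1: average the sub-label probabilities per category,
    # then multiply the per-category averages by the correction matrix.
    avg = sub_probs.mean(axis=1)          # shape: (num_categories,)
    first_prob = W @ avg
    return int(np.argmax(first_prob))

def correct_max_then_scale(sub_probs, W):
    # Variant 2: take the maximum sub-label probability per category,
    # then multiply the per-category maxima by the correction matrix.
    mx = sub_probs.max(axis=1)
    second_prob = W @ mx
    return int(np.argmax(second_prob))

def correct_scale_then_mean(sub_probs, W):
    # Variant 3: apply the correction to every sub-label probability
    # first, then average the corrected probabilities per category.
    scaled = W @ sub_probs                # correct each sub-label column
    third_prob = scaled.mean(axis=1)
    return int(np.argmax(third_prob))
```

Note that with the per-category correction written this way, variants 1 and 3 mainly illustrate the order of operations (correct-then-average versus average-then-correct); variant 2 genuinely differs because taking the maximum is not a linear operation.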
In some embodiments, the data in the training set can also be corrected by each of the above three correction methods; the result obtained by each correction method is compared with the training-set labels, the method with the highest accuracy is taken as the optimal correction scheme, and correction is then performed using the optimal scheme.
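The scheme-selection step above can be sketched as follows; this is an illustrative outline rather than the application's actual code. `schemes` maps a variant name to a function that returns a predicted category for one sample's sub-label probabilities, and the variant that best matches the training labels is kept.

```python
def pick_best_scheme(schemes, train_sub_probs, train_labels, W):
    # `schemes`: dict mapping name -> function (sub_probs, W) -> category.
    # Evaluate each correction variant on the training set and return the
    # name of the variant with the highest accuracy.
    best_name, best_acc = None, -1.0
    for name, fn in schemes.items():
        preds = [fn(p, W) for p in train_sub_probs]
        acc = sum(p == y for p, y in zip(preds, train_labels)) / len(train_labels)
        if acc > best_acc:
            best_name, best_acc = name, acc
    return best_name
```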
The present application adopts a correction optimization scheme that combines label expansion with correction, which not only eliminates the bias caused by differing label frequencies in the pre-training corpus, but also corrects the bias introduced by the label words and the input samples by replacing the empty text with training-set samples. This approach allows a very large model to avoid retraining, and greatly improves the accuracy and stability of downstream tasks. The proposed optimization scheme was applied to the CLUE Chinese classification datasets, with the pre-trained 100-billion-parameter model "源1.0" (Yuan 1.0) loaded. In experiments, news classification accuracy improved by about 5 percentage points (52.09% before correction, 57.47% after correction), scientific-literature subject classification accuracy by about 7 percentage points (39.02% before, 46.57% after), long-text application-description classification accuracy by about 4 percentage points (34.89% before, 38.82% after), and e-commerce product sentiment classification accuracy by about 35 percentage points (51.25% before, 86.88% after).
Existing correction methods can only correct the model's bias toward the labels by feeding it empty text; they cannot correct the bias introduced by the input samples. That is, existing methods correct all categories to an unbiased state, even though the category distribution in the data set is not uniform. In the present application, training-set samples are used instead of empty text in the correction algorithm, so that the model can be corrected according to the actual data distribution. Meanwhile, by expanding the labels, the bias caused by differing label frequencies in the pre-training corpus is eliminated.
Based on the same inventive concept, according to another aspect of the present application, an embodiment of the present application further provides a classification result correction system 400, as shown in FIG. 3, including:
a construction module 401, configured to construct a data set and annotate each piece of data in the data set with a classification label of a corresponding category;
a calculation module 402, configured to input each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and to calculate a correction matrix by using the classification label probability corresponding to each piece of data;
an expansion module 403, configured to expand the classification label of each category into multiple sub-labels;
an adjustment module 404, configured to adjust the output of the trained model to the probabilities of the multiple sub-labels corresponding to each category;
an input module 405, configured to input the data to be classified into the trained model to obtain the probabilities of the multiple sub-labels corresponding to each category; and
a correction module 406, configured to determine the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix.
In some embodiments, the calculation module 402 is further configured to:
sum and average the probabilities of the classification labels corresponding to each piece of data by category to obtain the probability corresponding to each category;
normalize the probability corresponding to each category and construct a diagonal matrix; and
invert the diagonal matrix to obtain the correction matrix.
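The correction-matrix computation just described can be sketched as follows; this is a minimal illustration assuming `label_probs` is a NumPy array in which each row holds, for one training sample, the model's probability of the classification label of each category (columns).

```python
import numpy as np

def build_correction_matrix(label_probs):
    # Sum and average the label probabilities per category over all samples.
    p_mean = label_probs.mean(axis=0)      # shape: (num_categories,)
    # Normalize so the per-category probabilities sum to 1.
    p_norm = p_mean / p_mean.sum()
    # Build the diagonal matrix and invert it to obtain the correction
    # matrix: categories the model favors a priori are scaled down.
    return np.linalg.inv(np.diag(p_norm))
```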
In some embodiments, the expansion module 403 is further configured to:
obtain, by using a preset model, multiple synonyms corresponding to the classification label of each category; and
select a preset number of words from the multiple synonyms corresponding to each category as the multiple sub-labels corresponding to that category.
In some embodiments, the expansion module 403 is further configured to:
delete, from the multiple synonyms, the words that do not exist in the vocabulary of the trained model;
adjust the output of the trained model to the probabilities of the remaining synonyms;
input each piece of data in the data set into the trained model to obtain the probabilities of the remaining synonyms;
delete, according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than a first threshold from the remaining synonyms; and
delete, from the synonyms remaining after that, the words whose probability difference is smaller than a second threshold, and select the preset number of words with the largest probabilities as the multiple sub-labels corresponding to each category.
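The sub-label screening steps above can be sketched as follows. This is an interpretive illustration: the source does not specify what the "probability difference" in the final step is measured against, so here it is read as the gap to an already-selected synonym (words too close in probability to a kept word are assumed redundant); `synonyms` maps each candidate word to its average probability on the training set, and all names are hypothetical.

```python
def screen_sublabels(synonyms, vocab, first_threshold, second_threshold, k):
    # Step 1: delete synonyms not present in the trained model's vocabulary.
    kept = {w: p for w, p in synonyms.items() if w in vocab}
    # Step 2: delete synonyms whose probability is below the first threshold.
    kept = {w: p for w, p in kept.items() if p >= first_threshold}
    # Step 3 (interpretation): walk the remainder in descending probability,
    # dropping words whose probability differs from every already-selected
    # word by less than the second threshold, then keep the top k.
    ordered = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for word, prob in ordered:
        if all(abs(prob - p) >= second_threshold for _, p in selected):
            selected.append((word, prob))
    return [w for w, _ in selected[:k]]
```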
In some embodiments, the correction module 406 is further configured to:
calculate, by category, the average of the probabilities of the multiple sub-labels corresponding to each category, multiply the average corresponding to each category by the correction matrix to obtain a corrected first probability, and take the category with the maximum first probability as the classification category of the data.
In some embodiments, the correction module 406 is further configured to:
multiply the maximum of the probabilities of the multiple sub-labels corresponding to each category by the correction matrix to obtain a corrected second probability, and take the category corresponding to the sub-label with the largest probability as the second classification category of the data.
In some embodiments, the correction module 406 is further configured to:
multiply the probabilities of the multiple sub-labels corresponding to each category by the correction matrix, average the results by category to obtain a corrected third probability, and take the category with the maximum third probability as the third classification category of the data.
Based on the same inventive concept, according to another aspect of the present application, as shown in FIG. 4, an embodiment of the present application further provides a computer device 501, including:
at least one processor 520; and
a memory 510, where the memory 510 stores a computer program 511 executable on the processor, and the processor 520 performs the following steps when executing the program:
S1, constructing a data set and annotating each piece of data in the data set with a classification label of a corresponding category;
S2, inputting each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and calculating a correction matrix by using the classification label probability corresponding to each piece of data;
S3, expanding the classification label of each category into multiple sub-labels;
S4, adjusting the output of the trained model to the probabilities of the multiple sub-labels corresponding to each category;
S5, inputting the data to be classified into the trained model to obtain the probabilities of the multiple sub-labels corresponding to each category;
S6, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix.
In some embodiments, inputting each piece of data in the data set into the trained model to obtain the probability of the corresponding classification label and calculating the correction matrix by using the classification label probability corresponding to each piece of data further includes:
summing and averaging the probabilities of the classification labels corresponding to each piece of data by category to obtain the probability corresponding to each category;
normalizing the probability corresponding to each category and constructing a diagonal matrix; and
inverting the diagonal matrix to obtain the correction matrix.
In some embodiments, expanding the classification label of each category into multiple sub-labels further includes:
obtaining, by using a preset model, multiple synonyms corresponding to the classification label of each category; and
selecting a preset number of words from the multiple synonyms corresponding to each category as the multiple sub-labels corresponding to that category.
In some embodiments, selecting a preset number of words from the multiple synonyms corresponding to each category as the multiple sub-labels corresponding to that category further includes:
deleting, from the multiple synonyms, the words that do not exist in the vocabulary of the trained model;
adjusting the output of the trained model to the probabilities of the remaining synonyms;
inputting each piece of data in the data set into the trained model to obtain the probabilities of the remaining synonyms;
deleting, according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than a first threshold from the remaining synonyms; and
deleting, from the synonyms remaining after that, the words whose probability difference is smaller than a second threshold, and selecting the preset number of words with the largest probabilities as the multiple sub-labels corresponding to each category.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
calculating, by category, the average of the probabilities of the multiple sub-labels corresponding to each category, multiplying the average corresponding to each category by the correction matrix to obtain a corrected first probability, and taking the category with the maximum first probability as the classification category of the data.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
multiplying the maximum of the probabilities of the multiple sub-labels corresponding to each category by the correction matrix to obtain a corrected second probability, and taking the category corresponding to the sub-label with the largest probability as the second classification category of the data.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
multiplying the probabilities of the multiple sub-labels corresponding to each category by the correction matrix, averaging the results by category to obtain a corrected third probability, and taking the category with the maximum third probability as the third classification category of the data.
Based on the same inventive concept, according to another aspect of the present application, as shown in FIG. 5, an embodiment of the present application further provides a non-volatile readable storage medium 601, where the non-volatile readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the following steps:
S1, constructing a data set and annotating each piece of data in the data set with a classification label of a corresponding category;
S2, inputting each piece of data in the data set into a trained model to obtain the probability of the corresponding classification label, and calculating a correction matrix by using the classification label probability corresponding to each piece of data;
S3, expanding the classification label of each category into multiple sub-labels;
S4, adjusting the output of the trained model to the probabilities of the multiple sub-labels corresponding to each category;
S5, inputting the data to be classified into the trained model to obtain the probabilities of the multiple sub-labels corresponding to each category;
S6, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix.
In some embodiments, inputting each piece of data in the data set into the trained model to obtain the probability of the corresponding classification label and calculating the correction matrix by using the classification label probability corresponding to each piece of data further includes:
summing and averaging the probabilities of the classification labels corresponding to each piece of data by category to obtain the probability corresponding to each category;
normalizing the probability corresponding to each category and constructing a diagonal matrix; and
inverting the diagonal matrix to obtain the correction matrix.
In some embodiments, expanding the classification label of each category into multiple sub-labels further includes:
obtaining, by using a preset model, multiple synonyms corresponding to the classification label of each category; and
selecting a preset number of words from the multiple synonyms corresponding to each category as the multiple sub-labels corresponding to that category.
In some embodiments, selecting a preset number of words from the multiple synonyms corresponding to each category as the multiple sub-labels corresponding to that category further includes:
deleting, from the multiple synonyms, the words that do not exist in the vocabulary of the trained model;
adjusting the output of the trained model to the probabilities of the remaining synonyms;
inputting each piece of data in the data set into the trained model to obtain the probabilities of the remaining synonyms;
deleting, according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than a first threshold from the remaining synonyms; and
deleting, from the synonyms remaining after that, the words whose probability difference is smaller than a second threshold, and selecting the preset number of words with the largest probabilities as the multiple sub-labels corresponding to each category.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
calculating, by category, the average of the probabilities of the multiple sub-labels corresponding to each category, multiplying the average corresponding to each category by the correction matrix to obtain a corrected first probability, and taking the category with the maximum first probability as the classification category of the data.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
multiplying the maximum of the probabilities of the multiple sub-labels corresponding to each category by the correction matrix to obtain a corrected second probability, and taking the category corresponding to the sub-label with the largest probability as the second classification category of the data.
In some embodiments, determining the final category of the data to be classified by using the probabilities of the multiple sub-labels corresponding to each category and the correction matrix further includes:
multiplying the probabilities of the multiple sub-labels corresponding to each category by the correction matrix, averaging the results by category to obtain a corrected third probability, and taking the category with the maximum third probability as the third classification category of the data.
Finally, it should be noted that those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program; the program can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the above methods.
In addition, it should be understood that the non-volatile readable storage medium (for example, a memory) herein can be a volatile memory or a non-volatile memory, or can include both volatile memory and non-volatile memory.
The various component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to the embodiments of the present application. The present application may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 6 shows a computing processing device that can implement the method according to the present application. The computing processing device includes a processor 710 and a computer program product or non-volatile readable storage medium in the form of a memory 720. The memory 720 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 720 has a storage space 730 for program code 731 for performing any of the method steps in the methods described above. For example, the storage space 730 for program code may include respective program codes 731 for implementing the various steps in the above methods. These program codes can be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7. The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 720 in the computing processing device of FIG. 6. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 731', that is, code that can be read by a processor such as the processor 710, which, when run by a computing processing device, causes the computing processing device to perform the steps of the methods described above.
Those skilled in the art will also understand that the various exemplary logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the specific application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each specific application, but such implementation decisions should not be interpreted as causing a departure from the scope disclosed in the embodiments of the present application.
The above are exemplary embodiments disclosed in the present application, but it should be noted that various changes and modifications can be made without departing from the scope of the embodiments disclosed in the present application as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although the elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed above in the present application are merely for description and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by hardware, or by instructing relevant hardware through a program; the program can be stored in a non-volatile readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope (including the claims) disclosed by the embodiments of the present application is limited to these examples. Under the idea of the embodiments of the present application, the technical features in the above embodiments or in different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the present application exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the embodiments of the present application shall be included within the protection scope of the embodiments of the present application.

Claims (20)

  1. 一种分类结果校正方法,其中,包括以下步骤:A classification result correction method, which includes the following steps:
    构建数据集并对所述数据集中的每一个数据标注一个对应类别的分类标签;Constructing a data set and marking each data in the data set with a classification label of a corresponding category;
    将所述数据集中的每一个数据输入到已训练模型中以得到对应的所述分类标签的概率并利用每一个数据对应的所述分类标签概率计算校正矩阵;Input each data in the data set into the trained model to obtain the probability of the corresponding classification label and calculate a correction matrix using the classification label probability corresponding to each data;
    将每一个类别的所述分类标签扩展为多个子标签;expanding said classification label for each category into a plurality of sub-labels;
    将所述已训练模型的输出调整为每一个类别对应的多个子标签的概率;adjusting the output of the trained model to the probability of a plurality of sub-labels corresponding to each category;
    将待分类的数据输入到已训练模型中以得到每一个类别对应的多个子标签的概率;Input the data to be classified into the trained model to obtain the probability of multiple sub-labels corresponding to each category;
    利用每一个类别对应的所述多个子标签的概率和所述校正矩阵确定所述待分类数据最终的类别。The final category of the data to be classified is determined by using the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix.
  2. 如权利要求1所述的方法,将所述数据集中的每一个数据输入到已训练模型中以得到对应的所述分类标签的概率并利用每一个数据对应的所述分类标签概率计算校正矩阵,进一步包括:The method according to claim 1, inputting each data in the data set into the trained model to obtain the probability of the corresponding classification label and using the classification label probability corresponding to each data to calculate a correction matrix, Further includes:
    将每一个数据对应的分类标签的概率按类别求和取均值以得到每一个类别对应的概率;The probability of the classification label corresponding to each data is summed and averaged by category to obtain the probability corresponding to each category;
    对每一个类别对应的概率进行归一化处理后构建对角矩阵;After normalizing the probability corresponding to each category, a diagonal matrix is constructed;
    将所述对角矩阵求逆后得到所述校正矩阵。The correction matrix is obtained after inverting the diagonal matrix.
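The correction-matrix computation recited in claims 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: NumPy arrays hold the per-sample classification-label probabilities and integer category labels, and all function and variable names are illustrative, not taken from the application.

```python
import numpy as np

def build_correction_matrix(label_probs, labels, num_classes):
    # Claim-2 steps: per-category mean of the labelled data's
    # classification-label probabilities ...
    avg = np.array([label_probs[labels == c].mean() for c in range(num_classes)])
    avg = avg / avg.sum()               # ... normalize the per-category means ...
    return np.linalg.inv(np.diag(avg))  # ... and invert the diagonal matrix

# toy run: two categories, two samples each (values are made up)
probs = np.array([0.8, 0.6, 0.2, 0.4])
cats  = np.array([0, 0, 1, 1])
M = build_correction_matrix(probs, cats, 2)
```

Because the matrix is diagonal, multiplying a probability vector by it simply rescales each category's score by the inverse of that category's average confidence, counteracting a model's bias toward categories it tends to over-predict.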
  3. 如权利要求1所述的方法,将每一个类别的所述分类标签扩展为多个子标签,进一步包括:The method according to claim 1, expanding the classification label of each category into a plurality of sub-labels, further comprising:
    利用预设模型获取每一个类别的所述分类标签对应的多个近义词;Using a preset model to obtain a plurality of synonyms corresponding to the classification labels of each category;
    分别从每一个类别对应的所述多个近义词中筛选预设数量的词作为每一个类别对应的所述多个子标签。Respectively select a preset number of words from the plurality of synonyms corresponding to each category as the plurality of subtags corresponding to each category.
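Claim 3's label expansion might look like the following sketch. It is hedged: `most_similar` mirrors the gensim `KeyedVectors` interface, `ToyVectors` is a self-contained stand-in for a real word2vec model (such as one trained on a word/phrase corpus per claim 4), and all names and similarity scores are illustrative.

```python
def expand_label(wv, label, top_n=3):
    # wv is assumed to follow the gensim KeyedVectors interface:
    # most_similar returns [(word, similarity), ...] sorted by similarity
    return [word for word, _ in wv.most_similar(label, topn=top_n)]

# tiny stand-in model so the sketch runs without external vectors
class ToyVectors:
    def most_similar(self, word, topn):
        table = {"sport": [("athletics", 0.91), ("game", 0.85), ("fitness", 0.72)]}
        return table[word][:topn]

candidates = expand_label(ToyVectors(), "sport", top_n=2)  # candidate sub-labels
```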
  4. 根据权利要求3所述的方法,在所述利用预设模型获取每一个类别的所述分类标签对应的多个近义词之前,还包括:The method according to claim 3, before said using a preset model to obtain a plurality of synonyms corresponding to the classification labels of each category, further comprising:
    利用预设数量的中文单词短语的嵌入语料库,训练得到所述预设模型。The preset model is obtained through training with a preset number of embedded corpora of Chinese word phrases.
  5. 如权利要求3所述的方法,分别从每一个类别对应的所述多个近义词中筛选预设数量的词作为每一个类别对应的所述多个子标签,进一步包括:The method according to claim 3, respectively screening a preset number of words from the plurality of synonyms corresponding to each category as the plurality of subtags corresponding to each category, further comprising:
    将所述多个近义词中不存在于所述已训练模型的词表中的词删除；Deleting, from the plurality of synonyms, the words that do not exist in the vocabulary of the trained model;
    将所述已训练模型的输出调整为剩余的近义词的概率;adjusting the output of the trained model to the probabilities of the remaining synonyms;
    将所述数据集中的每一个数据输入到已训练模型中以得到所述剩余的近义词的概率;Input each data in the data set into the trained model to obtain the probability of the remaining synonyms;
    根据所述已训练模型输出的所述剩余的近义词的概率将所述剩余的近义词中概率低于第一阈值的词删除;deleting the words whose probability is lower than the first threshold among the remaining synonyms according to the probability of the remaining synonyms output by the trained model;
    将再次剩余的近义词中概率差值小于第二阈值的词删除并选择概率最大的预设数量的词作为每一个类别对应的所述多个子标签。Among the remaining synonyms, the words whose probability difference is smaller than the second threshold are deleted, and a preset number of words with the highest probability are selected as the plurality of sub-labels corresponding to each category.
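The four-stage filtering recited in claim 5 can be sketched as below. The data structures and toy numbers are assumptions for illustration; the claim itself does not fix them.

```python
def filter_sublabels(synonyms, vocab, probs, min_prob, eps, k):
    kept = [w for w in synonyms if w in vocab]        # step 1: drop words not in the
                                                      #   trained model's vocabulary
    kept = [w for w in kept if probs[w] >= min_prob]  # step 2: drop words below the
                                                      #   first threshold ("rare" words)
    kept.sort(key=lambda w: probs[w], reverse=True)
    out = []
    for w in kept:                                    # step 3: if two words' probabilities
        if all(abs(probs[w] - probs[v]) >= eps        #   differ by less than the second
               for v in out):                         #   threshold, keep only the more
            out.append(w)                             #   probable one
    return out[:k]                                    # step 4: top-k become the sub-labels

subs = filter_sublabels(
    ["good", "great", "nice", "bogus", "rare"],
    vocab={"good", "great", "nice", "rare"},
    probs={"good": 0.50, "great": 0.49, "nice": 0.30, "rare": 0.05},
    min_prob=0.10, eps=0.05, k=2)
```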
  6. 根据权利要求5所述的方法，所述根据所述已训练模型输出的所述剩余的近义词的概率将所述剩余的近义词中概率低于第一阈值的词删除，进一步包括：The method according to claim 5, wherein deleting, according to the probabilities of the remaining synonyms output by the trained model, the words whose probability is lower than the first threshold among the remaining synonyms further comprises:
    根据所述已训练模型输出的所述剩余的近义词的概率，将所述剩余的近义词中概率低于平均值的近义词划分为稀有词，并删除所述稀有词。According to the probabilities of the remaining synonyms output by the trained model, classifying the synonyms whose probability is lower than the average value among the remaining synonyms as rare words, and deleting the rare words.
  7. 根据权利要求5所述的方法，所述将所述多个近义词中不存在于所述已训练模型的词表中的词删除，进一步包括：The method according to claim 5, wherein deleting, from the plurality of synonyms, the words that do not exist in the vocabulary of the trained model further comprises:
    采用遍历方式查看所述多个近义词中每个近义词是否在所述已训练模型的词表空间,并删除不在所述词表空间内的近义词。Checking whether each of the plurality of synonyms is in the vocabulary space of the trained model in a traversal manner, and deleting synonyms that are not in the vocabulary space.
  8. 根据权利要求5所述的方法，所述将再次剩余的近义词中概率差值小于第二阈值的词删除，进一步包括：The method according to claim 5, wherein deleting the words whose probability difference is smaller than the second threshold among the again-remaining synonyms further comprises:
    获取再次剩余的近义词中的同义词,删除同义词中除概率最大的同义词外的其它同义词。Obtain the synonyms among the remaining synonyms, and delete the synonyms except the one with the highest probability.
  9. 根据权利要求5所述的方法,所述选择概率最大的预设数量的词作为每一个类别对应的所述多个子标签,进一步包括:The method according to claim 5, said selecting a preset number of words with the highest probability as the plurality of sub-tags corresponding to each category, further comprising:
    按照概率从大到小的顺序对再次剩余的近义词中概率差值小于第二阈值的词进行排序,并选择排序在前预设数量的词作为每一个类别对应的所述多个子标签。Rank the words whose probability difference is smaller than the second threshold among the remaining synonyms in descending order of probability, and select a preset number of words ranked first as the plurality of sub-labels corresponding to each category.
  10. 根据权利要求3至9任一项所述的方法，所述预设模型为word2vec模型。The method according to any one of claims 3 to 9, wherein the preset model is a word2vec model.
  11. 如权利要求1所述的方法,利用每一个类别对应的所述多个子标签的概率和所述校正矩阵确定所述待分类数据最终的类别,进一步包括:The method according to claim 1, using the probability of the plurality of sub-labels corresponding to each category and the correction matrix to determine the final category of the data to be classified, further comprising:
    按类别计算每一个类别对应的多个子标签的概率的平均值并将每一个类别对应的平均值乘以所述校正矩阵后作为校正后的第一概率，并将每一个类别的第一概率中的最大值作为数据的分类类别。For each category, calculating the average of the probabilities of the multiple sub-labels corresponding to that category, multiplying the average corresponding to each category by the correction matrix to obtain the corrected first probability, and taking the category whose first probability is the largest as the classification category of the data.
  12. 如权利要求1所述的方法,利用每一个类别对应的所述多个子标签的概率和所述校正矩阵确定所述待分类数据最终的类别,进一步包括:The method according to claim 1, using the probability of the plurality of sub-labels corresponding to each category and the correction matrix to determine the final category of the data to be classified, further comprising:
    将每一个类别对应的多个子标签的概率中的最大值乘以所述校正矩阵后作为校正后的第二概率,并将概率最大的子标签对应的类别作为数据的第二分类类别。The maximum value among the probabilities of multiple sub-labels corresponding to each category is multiplied by the correction matrix as the corrected second probability, and the category corresponding to the sub-label with the highest probability is used as the second classification category of the data.
  13. 如权利要求1所述的方法,利用每一个类别对应的所述多个子标签的概率和所述校正矩阵确定所述待分类数据最终的类别,进一步包括:The method according to claim 1, using the probability of the plurality of sub-labels corresponding to each category and the correction matrix to determine the final category of the data to be classified, further comprising:
    将每一个类别对应的多个子标签的概率分别乘以所述校正矩阵后按类别取平均值以作为校正后的第三概率，并将每一个类别的第三概率中的最大值作为数据的第三分类类别。Multiplying the probabilities of the multiple sub-labels corresponding to each category by the correction matrix respectively, then averaging them by category to obtain the corrected third probability, and taking the category whose third probability is the largest as the third classification category of the data.
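Claims 11 to 13 describe three alternative ways of combining the per-category sub-label probabilities with the correction matrix; the sketch below (illustrative names, toy values) makes the differences concrete.

```python
import numpy as np

def corrected_category(sub_probs, M, strategy):
    # sub_probs: rows = categories, columns = that category's sub-labels
    if strategy == "avg_then_correct":      # claim 11: average, then correct
        scores = M @ sub_probs.mean(axis=1)
    elif strategy == "max_then_correct":    # claim 12: take the max, then correct
        scores = M @ sub_probs.max(axis=1)
    else:                                   # claim 13: correct each sub-label
        scores = (M @ sub_probs).mean(axis=1)  # probability, then average
    return int(np.argmax(scores))

M = np.diag([2.0, 1.0])    # toy correction matrix
P = np.array([[0.2, 0.2],  # category 0's two sub-label probabilities
              [0.5, 0.1]]) # category 1's two sub-label probabilities
```

By linearity, averaging over sub-labels commutes with the matrix multiplication, so the claim-11 and claim-13 strategies always yield the same scores, while the claim-12 max-based strategy can select a different category, as the toy values show.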
  14. 根据权利要求1至13任一项所述的方法，所述预训练模型为PLM模型。The method according to any one of claims 1 to 13, wherein the pre-trained model is a PLM model.
  15. 一种分类结果校正系统,其中,包括:A classification result correction system, including:
    构建模块,配置为构建数据集并对所述数据集中的每一个数据标注一个对应类别的分类标签;A building module configured to construct a data set and mark each data in the data set with a classification label of a corresponding category;
    计算模块,配置为将所述数据集中的每一个数据输入到已训练模型中以得到对应的所述分类标签的概率并利用每一个数据对应的所述分类标签概率计算校正矩阵;A calculation module configured to input each data in the data set into the trained model to obtain the probability of the corresponding classification label and calculate a correction matrix using the classification label probability corresponding to each data;
    扩展模块,配置为将每一个类别的所述分类标签扩展为多个子标签;An expansion module configured to expand the classification label of each category into a plurality of sub-labels;
    调整模块,配置为将所述已训练模型的输出调整为每一个类别对应的多个子标签的概率;An adjustment module configured to adjust the output of the trained model to the probability of multiple sub-labels corresponding to each category;
    输入模块,配置为将待分类的数据输入到已训练模型中以得到每一个类别对应的多个子标签的概率;The input module is configured to input the data to be classified into the trained model to obtain the probability of a plurality of sub-labels corresponding to each category;
    校正模块,配置为利用每一个类别对应的所述多个子标签的概率和所述校正矩阵确定所述待分类数据最终的类别。The correction module is configured to determine the final category of the data to be classified by using the probabilities of the plurality of sub-labels corresponding to each category and the correction matrix.
  16. 根据权利要求15所述的系统，所述计算模块还配置为：The system according to claim 15, wherein the calculation module is further configured to:
    将每一个数据对应的分类标签的概率按类别求和取均值以得到每一个类别对应的概率;The probability of the classification label corresponding to each data is summed and averaged by category to obtain the probability corresponding to each category;
    对每一个类别对应的概率进行归一化处理后构建对角矩阵;After normalizing the probability corresponding to each category, a diagonal matrix is constructed;
    将所述对角矩阵求逆后得到所述校正矩阵。The correction matrix is obtained after inverting the diagonal matrix.
  17. 一种计算机设备,包括:A computer device comprising:
    至少一个处理器;以及at least one processor; and
    存储器,所述存储器存储有可在所述处理器上运行的计算机程序,其中,所述处理器执行所述程序时执行如权利要求1-14任意一项所述的分类结果校正方法的步骤。A memory, the memory stores a computer program that can run on the processor, wherein, when the processor executes the program, the steps of the classification result correction method according to any one of claims 1-14 are executed.
  18. 一种非易失性可读存储介质，所述非易失性可读存储介质存储有计算机程序，其中，所述计算机程序被处理器执行时执行如权利要求1-14任意一项所述的分类结果校正方法的步骤。A non-volatile readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the classification result correction method according to any one of claims 1-14 are executed.
  19. 一种计算处理设备,包括:A computing processing device comprising:
    存储器,其中存储有计算机可读代码;a memory having computer readable code stored therein;
    一个或多个处理器,当所述计算机可读代码被所述一个或多个处理器执行时,所述计算处理设备执行权利要求1-14任意一项所述的分类结果校正方法的步骤。One or more processors, when the computer readable code is executed by the one or more processors, the computing processing device executes the steps of the method for correcting classification results according to any one of claims 1-14.
  20. 一种计算机程序产品，包括计算机可读代码，当所述计算机可读代码在计算处理设备上运行时，导致所述计算处理设备执行根据权利要求1-14任意一项所述的分类结果校正方法的步骤。A computer program product comprising computer readable code which, when run on a computing processing device, causes the computing processing device to execute the steps of the classification result correction method according to any one of claims 1-14.
PCT/CN2022/122302 2022-02-14 2022-09-28 Classification result correction method and system, device, and medium WO2023151284A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210133548.8 2022-02-14
CN202210133548.8A CN114186065B (en) 2022-02-14 2022-02-14 Classification result correction method, system, device and medium

Publications (1)

Publication Number Publication Date
WO2023151284A1 true WO2023151284A1 (en) 2023-08-17

Family

ID=80545885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122302 WO2023151284A1 (en) 2022-02-14 2022-09-28 Classification result correction method and system, device, and medium

Country Status (2)

Country Link
CN (1) CN114186065B (en)
WO (1) WO2023151284A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186065B (en) * 2022-02-14 2022-05-17 苏州浪潮智能科技有限公司 Classification result correction method, system, device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190317986A1 (en) * 2018-04-13 2019-10-17 Preferred Networks, Inc. Annotated text data expanding method, annotated text data expanding computer-readable storage medium, annotated text data expanding device, and text classification model training method
CN111326148A (en) * 2020-01-19 2020-06-23 北京世纪好未来教育科技有限公司 Confidence correction and model training method, device, equipment and storage medium thereof
CN111460150A (en) * 2020-03-27 2020-07-28 北京松果电子有限公司 Training method, classification method and device of classification model and storage medium
CN113723438A (en) * 2020-05-20 2021-11-30 罗伯特·博世有限公司 Classification model calibration
CN113987136A (en) * 2021-11-29 2022-01-28 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for correcting text classification label and storage medium
CN114186065A (en) * 2022-02-14 2022-03-15 苏州浪潮智能科技有限公司 Classification result correction method, system, device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382248B (en) * 2018-12-29 2023-05-23 深圳市优必选科技有限公司 Question replying method and device, storage medium and terminal equipment
CN110232397A (en) * 2019-04-22 2019-09-13 广东工业大学 A kind of multi-tag classification method of combination supporting vector machine and projection matrix
CN110490849A (en) * 2019-08-06 2019-11-22 桂林电子科技大学 Surface Defects in Steel Plate classification method and device based on depth convolutional neural networks

Also Published As

Publication number Publication date
CN114186065A (en) 2022-03-15
CN114186065B (en) 2022-05-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22925642

Country of ref document: EP

Kind code of ref document: A1