CN115062709A - Model optimization method, device, equipment, storage medium and program product - Google Patents

Model optimization method, device, equipment, storage medium and program product

Info

Publication number
CN115062709A
Authority
CN
China
Prior art keywords
model
prediction
label
sample
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210709588.2A
Other languages
Chinese (zh)
Inventor
牛帅程
吴家祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210709588.2A priority Critical patent/CN115062709A/en
Publication of CN115062709A publication Critical patent/CN115062709A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06N 3/084 — Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the application discloses a model optimization method, apparatus, device, storage medium, and program product, suitable for scenarios such as cloud technology, artificial intelligence, and intelligent transportation. The method comprises the following steps: obtaining a test sample; extracting a plurality of predicted sample features of the test sample with a label prediction model, where each predicted sample feature is obtained by discarding part of the feature information of the test sample; performing label prediction on the test sample based on each predicted sample feature to obtain a prediction result corresponding to each predicted sample feature, where each prediction result comprises a prediction probability for each of at least one candidate label; determining the model confidence of the label prediction model from the prediction probabilities of the candidate labels across the prediction results; and performing model optimization on the label prediction model in the direction of increasing the model confidence, yielding an optimized label prediction model with higher robustness.

Description

Model optimization method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model optimization method, apparatus, device, storage medium, and program product.
Background
In recent years, artificial intelligence technology has developed rapidly and is now widely applied in many practical scenarios closely related to human life, greatly enriching and facilitating daily life. For example, deep learning techniques are commonly applied in scenarios such as image classification, speech recognition, machine translation, autonomous driving, and smart healthcare. Specifically, deep learning can be used to optimize the label prediction model in each application scenario, so that the optimized label prediction model has a stronger and more stable label prediction capability and the relevant device can use it to obtain accurate prediction results.
However, many model optimization methods based on deep learning cannot produce a label prediction model with strong robustness, which limits the prediction capability of the optimized model in practical applications and leads to prediction results of low accuracy. How to improve the robustness of the label prediction model has therefore become a current research hotspot.
Disclosure of Invention
The embodiment of the application provides a model optimization method, a model optimization device, equipment, a storage medium and a program product, which can improve the robustness of a label prediction model for label prediction.
In one aspect, an embodiment of the present application provides a model optimization method, including:
obtaining a test sample;
extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different;
performing label prediction on the test sample based on each prediction sample characteristic to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
and determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence to obtain an optimized label prediction model, wherein the optimized label prediction model is used for predicting the target label of the data to be predicted.
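The four claimed steps can be sketched end to end. This is a hedged illustration only: the linear "model", the feature dimension, and the variance-based confidence measure below are assumptions for demonstration, not the patent's concrete implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-in for the label prediction model: a linear map from
# 8 sample features to 3 candidate labels (illustrative only).
W = rng.normal(size=(8, 3))

def extract_views(x, n_views, drop_prob=0.3):
    """Step 2: each view discards a random part of the feature information."""
    return [x * (rng.random(x.shape) > drop_prob) for _ in range(n_views)]

def predict(feat):
    """Step 3: prediction probabilities over the candidate labels."""
    return softmax(feat @ W)

x = rng.normal(size=8)                         # step 1: a test sample's features
views = extract_views(x, n_views=3)
probs = np.stack([predict(v) for v in views])  # shape (3 views, 3 labels)

# Step 4: one possible confidence measure -- agreement of the per-view
# probabilities (lower variance across views = higher confidence).
confidence = -float(probs.var(axis=0).mean())
```

In a real optimization loop, `-confidence` would serve as (part of) the loss minimized by gradient descent over the model parameters `W`.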
In another aspect, an embodiment of the present application provides a model optimization apparatus, including:
an acquisition unit for acquiring a test sample;
the characteristic extraction unit is used for extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different;
the prediction unit is used for performing label prediction on the test sample based on each prediction sample characteristic to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
and the model optimization unit is used for determining the model confidence coefficient of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence coefficient to obtain an optimized label prediction model, wherein the optimized label prediction model is used for predicting the target label of the data to be predicted.
In another aspect, an embodiment of the present application provides a computer device, including:
a processor adapted to load and run one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the method of model optimization according to the first aspect.
In yet another aspect, embodiments of the present application further provide a computer storage medium storing one or more computer programs adapted to be loaded by a processor and to perform the model optimization method according to the first aspect.
In a further aspect, embodiments of the present application provide a computer program product comprising a computer program adapted to be loaded by a processor and to perform the model optimization method according to the first aspect.
In the embodiment of the application, each predicted sample feature obtained by the computer device contains only part of the feature information of the test sample. The computer device performs label prediction on the test sample with each of the plurality of predicted sample features, determines the model confidence based on the prediction probabilities of the candidate labels in each prediction result, and then performs model optimization on the label prediction model in the direction of increasing the model confidence. Increasing the model confidence amounts to increasing the similarity between the prediction probabilities of the same candidate label across the prediction results; in other words, it drives the model to produce consistent predictions from different predicted sample features of the same test sample. The optimized label prediction model can therefore predict the corresponding target label from only partial features of the data to be predicted, which enhances the robustness of the label prediction model and, to a certain extent, improves the stability of deploying it on the computer device.
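One way to make "similarity of the prediction probabilities" concrete, purely as an illustrative measure rather than the patent's stated formula, is the mean pairwise cosine similarity between the per-feature prediction vectors:

```python
import numpy as np

def pairwise_confidence(probs):
    """Mean cosine similarity over all pairs of prediction vectors:
    higher agreement between per-feature predictions -> higher value."""
    n = len(probs)
    sims = [
        float(np.dot(probs[i], probs[j])
              / (np.linalg.norm(probs[i]) * np.linalg.norm(probs[j])))
        for i in range(n) for j in range(i + 1, n)
    ]
    return float(np.mean(sims))

# Three prediction results that largely agree vs. three that do not.
agreeing = np.array([[0.80, 0.20], [0.78, 0.22], [0.81, 0.19]])
disagreeing = np.array([[0.80, 0.20], [0.30, 0.70], [0.55, 0.45]])
```

Under this measure, optimizing the model toward higher confidence pushes the per-feature prediction vectors toward one another.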
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of model optimization to model application provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a model optimization method provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of another model optimization method provided in the embodiments of the present application;
fig. 4a is a schematic diagram of an obtaining manner of a sub-model provided in an embodiment of the present application;
FIG. 4b is a schematic diagram of a model loss value determination method provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a model optimization apparatus provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the method provided by the embodiments of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings. It should be noted that the specific embodiments described here are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the various embodiments of the present application without creative effort shall fall within the protection scope of the present application.
With the vigorous development of internet technology, the label prediction task has made great progress. The label prediction task mentioned here can be understood as: the task of predicting the label information contained in the corresponding data. The tag information may be, for example: text information, semantic information, image information, and the like. For example, when the tag information is text information, the tag prediction task may be specified as follows: text translation tasks, text label prediction tasks, speech recognition tasks (i.e., predicting the text content implied by the speech signal), and the like. When the tag information is semantic information, the tag prediction task may be a semantic understanding task, specifically: a task for predicting semantic information of a picture, a task for predicting semantic information of a text, or a task for predicting semantic information of a voice, and the like. When the tag information is image information, the tag prediction task may be an image generation task, an image retrieval task, or the like. The image generation task may be specifically configured to generate an image that matches the feature information (e.g., color information, semantic information, shape information, etc.) indicated by the corresponding data, and the image retrieval task may be configured to retrieve an image that matches the feature information indicated by the corresponding data.
Based on the above description, the tag prediction task can clearly be applied in a variety of internet scenarios. For example, tag prediction may be applied to data retrieval. Taking video retrieval as an example, the tag prediction task can assign a video tag (such as a video category or a video title) to each of a massive number of videos, so that the corresponding video retrieval device can retrieve videos quickly and efficiently based on their tags. As another example, label prediction may be applied to data classification (e.g., speech classification, video classification, text classification, image classification, etc.). Taking image classification as an example, given one or more images to be classified, the image classification device may perform label prediction on each image to obtain its label information and then group images with the same or similar label information into the same category, facilitating subsequent browsing or use. Since the label prediction task is applicable in so many internet scenarios, improving its robustness can, to a certain extent, promote the development of the internet. Robustness here mainly means that the label prediction model maintains its stability and effectiveness when some parameters (such as the model parameters of the label prediction model) change slightly or a control quantity deviates slightly from its optimal value.
In order to improve the robustness of the label prediction task, the embodiment of the application combines artificial intelligence technology to provide a model optimization scheme that can be used to optimize various label prediction models. The scheme is as follows: when the label prediction model is optimized, an optimization target can be constructed based on a plurality of prediction results for the same test sample, where each prediction result is predicted by the label prediction model from one incomplete sample feature of the test sample, and different prediction results correspond to different incomplete sample features. The optimization target is related to the model confidence: the higher the similarity between the prediction results, the higher the model confidence. Here, model confidence refers to the reliability of the prediction results obtained by the label prediction model; for example, a model confidence of 80% indicates an 80% probability that the model's prediction is correct. The embodiment of the application then performs model optimization on the label prediction model in the direction of increasing the model confidence. In practical applications, the optimized label prediction model can perform label prediction on data to be predicted to obtain the corresponding prediction result. To make the implementation principle of the embodiment of the present application and the significance of the optimized label prediction model clear, the relevant steps and principles are explained in detail below with reference to the flow shown in Fig. 1.
Namely: in the subsequent embodiment, the process of performing model optimization on the label prediction model by using the test sample by the computer device is elaborated, and then the mode of performing label prediction on the data to be tested by using the optimized label test model by the computer device is elaborated.
In the model optimization scheme, each prediction result is obtained by label prediction based on an incomplete sample feature, and the model confidence reflects both the similarity between the prediction results and their reliability. It is therefore easy to see that when the model confidence meets the optimization target, the similarity between the prediction results is high and each prediction result is reliable. In other words, the optimized label prediction model can produce a reliable prediction result from one or more incomplete sample features. Consequently, even if the optimized label prediction model cannot extract relatively complete data features from the data to be predicted, it can still predict the target label of that data. The robustness of the optimized label prediction model is thus effectively improved by adopting the embodiment of the present application.
In a specific application, the model optimization scheme may be implemented by using one or more computer devices (for convenience of description, the embodiment of the present application is described by taking one computer device to execute the related method as an example). The computer device may be a terminal device, a server, or a computing system composed of a terminal device and a server, which is not limited in this embodiment of the present application. And specifically, in the embodiment of the present application, the terminal device may include but is not limited to: the system comprises a smart phone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted terminal, intelligent voice interaction equipment, intelligent household appliances, an aircraft and the like. In an embodiment, various Applications (APPs) and/or clients may also be run in the terminal device, such as: a multimedia playing client, a social client, a browser client, an information flow client, an education client, and an image processing client, among others. Further, the above-mentioned server may include, but is not limited to: the system comprises independent physical servers, a server cluster or distributed system formed by a plurality of physical servers, cloud servers and the like, wherein the cloud servers provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, Network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data platforms, artificial intelligence platforms and the like.
In addition, it should be noted that the Artificial Intelligence (AI) technology employed in the embodiments of the present application is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions. In particular, artificial intelligence can use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, so that the computer or machine can perceive its environment and acquire knowledge; on this basis, theories, methods, techniques, and applications that use the acquired knowledge to achieve optimal results can be implemented. In practical applications, artificial intelligence spans a wide range of fields, including both hardware-level and software-level technologies. Hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Software technologies typically include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The model optimization method provided here mainly utilizes the machine learning/deep learning techniques within artificial intelligence technology. To facilitate a clear understanding of implementations of the embodiments of the present application, machine learning/deep learning techniques are briefly described below.
Machine Learning (ML) and deep learning are multi-disciplinary fields involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. Machine learning studies how computers can simulate or realize human learning behavior: based on machine learning, a computer can continuously acquire new knowledge or skills and reorganize its existing knowledge structure to keep improving its performance, thereby achieving better intelligent processing effects (such as image recognition, text translation, and speech generation). Machine learning is therefore the core of artificial intelligence technology and the fundamental way to endow computers with intelligence, and its applications spread across all areas of artificial intelligence. In practical applications, machine learning and deep learning typically include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
To verify the practicability of the scheme, the developers experimentally compared the performance of a MobileNet-V3 pre-trained model with that of the same model optimized by this scheme on the ImageNet-C dataset (a test dataset in which the test data contain various types of perturbation). The experiments show that, compared with the unoptimized MobileNet-V3 pre-trained model, the model optimized with the scheme provided by the embodiment of the present application improves markedly on multiple performance indicators (see Table 1 for the experimental data).
In Table 1, "Base" denotes the MobileNet-V3 pre-trained model before optimization, and "Ours" denotes the MobileNet-V3 pre-trained model after optimization by the embodiment of the present application.
TABLE 1
(Table 1 appears only as an image in the original publication; its figures are not reproduced here.)
Based on the principle of the above model optimization scheme, the embodiment of the present application provides a specific model optimization method, which can likewise be executed by a computer device. Referring to fig. 2, fig. 2 is a schematic flow chart of the model optimization method according to the embodiment of the present application. As shown in FIG. 2, the model optimization method may include steps S201-S204:
s201, obtaining a test sample.
In the embodiment of the application, the computer device may obtain one or more test samples. A test sample is a sample whose label information needs to be predicted. The type of the label information may be text, image, speech, or video, and likewise a test sample may be a piece of text, an image, a segment of speech, or a video clip. In the embodiment of the present application, a single test sample is taken as an example to describe the related implementation in detail.
In one embodiment, the test sample may carry a reference label that indicates the correct label information to which the test sample belongs. When the test sample carries the reference label, the process of model optimization of the label prediction model by the computer device can be understood as a supervised training process. In this case, the computer device may construct an optimization target based on a difference between the label information included in the prediction result and the reference label (e.g., the difference between the label information included in the prediction result and the reference label is smaller than a preset difference), so that the computer device may perform model optimization on the label prediction model towards the optimization target to obtain an optimized label prediction model.
In yet another embodiment, the test sample may also not carry a reference label. When the test sample does not carry the reference label, the process of model optimization of the label prediction model by the computer device can be understood as an unsupervised training process. In this case, the computer device may construct an optimization goal based on the plurality of predicted results (e.g., the similarity between the predicted results is greater than the similarity threshold), so that the computer device may perform model optimization on the label prediction model towards the optimization goal to obtain an optimized label prediction model. In the embodiment of the present application, the detailed description of the related implementation steps is mainly given by taking an example that the test sample does not carry a reference label.
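The two training regimes above differ only in their optimization targets. As a hedged sketch (cross-entropy and a variance-based consistency term are common choices for such objectives, not necessarily the exact formulas intended by the patent):

```python
import numpy as np

def supervised_loss(pred, ref_idx):
    """With a reference label: cross-entropy between the prediction
    and the index of the correct (reference) label."""
    return float(-np.log(pred[ref_idx]))

def unsupervised_loss(preds):
    """Without a reference label: disagreement between the prediction
    results (mean per-label variance across results)."""
    return float(np.stack(preds).var(axis=0).mean())

preds = [np.array([0.7, 0.3]), np.array([0.6, 0.4])]
with_label = supervised_loss(preds[0], ref_idx=0)   # needs the reference label
without_label = unsupervised_loss(preds)            # needs only the predictions
```

The unsupervised loss reaches zero exactly when all prediction results agree, matching the similarity-based optimization goal described above.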
S202, extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different.
In particular embodiments, the label prediction model may be any type of neural network model, such as ResNet (Residual Neural Network), MobileNet (a lightweight neural network), NASNet (Neural Architecture Search Network), or LSTM (Long Short-Term Memory network). The label prediction model may be an initial neural network model that has not been pre-trained, i.e., a model the computer device constructs with the required functions when it needs a neural network for a certain label prediction task, with one or more of its model parameters obtained by random initialization. Because the parameters of such an initial model are randomly initialized, its predictions generally cannot achieve the expected effect, so optimizing it from scratch usually requires a long optimization time and a large workload. Therefore, to speed up development of the label prediction task, the label prediction model in practical applications may instead be a neural network model that the computer device has pre-trained on training samples. Pre-training means: in order to obtain a neural network model whose predictions meet expectations, the computer device trains (i.e., adjusts the model parameters of) an initial neural network model with training samples; during training, the initially random parameters change continuously so that the model's predictions gradually approach the expected effect.
In the embodiment of the application, the computer device uses the label prediction model to extract features from the test sample, and the resulting predicted sample feature contains only part of the feature information of the test sample. The feature information retained in a predicted sample feature may be randomly selected from all the feature information extracted while the label prediction model processes the test sample. In other words, while acquiring a predicted sample feature, the computer device may randomly discard part of the extracted feature information of the test sample and use the sample feature generated from the remaining feature information as the predicted sample feature. The discarded feature information may include: some or all of the color information, semantic information, shape information, or position information of the test sample.
In practical applications, the computer device may obtain a plurality of predicted sample features, where different predicted sample features are obtained by discarding different feature information, so their feature distributions differ. In the embodiment of the present application, "a plurality of predicted sample features" means at least two predicted sample features. Each predicted sample feature is obtained by discarding part of the feature information of the test sample, and the discarded feature information may differ between predicted sample features; that is, different predicted sample features contain different feature information. For example, while extracting features from test sample A with the label prediction model, the computer device may discard part of the color features of the test sample and generate one predicted sample feature from all the remaining extracted feature information. Similarly, it may discard part of the semantic features to generate a further predicted sample feature, and so on, until a plurality of predicted sample features are obtained.
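The random discarding described above behaves like dropout applied inside the feature extractor. A minimal sketch (the mask-based mechanism is an assumption for illustration; a real model would drop activations in intermediate layers):

```python
import numpy as np

rng = np.random.default_rng(42)

def extract_view(features, drop_prob=0.25):
    """Zero out (discard) a random subset of the extracted feature
    information; return the masked features and the keep-mask."""
    keep = rng.random(features.shape) > drop_prob
    return features * keep, keep

full = rng.normal(size=16)          # the full feature map of the test sample
view1, kept1 = extract_view(full)   # one predicted sample feature
view2, kept2 = extract_view(full)   # another, with a different random mask
```

Each call draws a fresh mask, so different predicted sample features generally retain different subsets of the feature information.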
S203, label prediction is carried out on the test samples based on the characteristics of each prediction sample to obtain a prediction result corresponding to the characteristics of each prediction sample; the prediction result includes a prediction probability for each of the at least one candidate tag.
In a specific embodiment, after the computer device obtains a plurality of predicted sample features, the computer device may obtain one prediction result based on each predicted sample feature. That is, if the number of predicted sample features acquired by the computer device is N (N is a positive integer greater than 1, for example, N is 3), the number of prediction results obtained by the computer device is also N, and one predicted sample feature corresponds to one prediction result. Each prediction result includes a prediction probability of each candidate label in at least one candidate label. A candidate label may be understood as a label that can be predicted by the label prediction model; in practical applications, the candidate labels may be preset by the computer device. The prediction probability may be used to indicate the probability with which the computer device predicts the test sample as the respective candidate label. For example, the prediction probability of candidate label A can be understood as: the probability with which the computer device considers the test sample to possess the trait indicated by candidate label A; or: the probability with which the computer device considers the test sample to be the object indicated by candidate label A. Objects referred to herein may be physical objects (e.g., people, flowers, grass), text (e.g., a video title, an article, etc.), video (e.g., a movie clip), audio (e.g., a piece of speech, a song, etc.), and so forth.
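As an illustration of what a prediction result can look like, one common way (not fixed by this application; the scoring head and all names below are hypothetical) to map a predicted sample feature to a prediction probability per candidate label is a linear score followed by a softmax:

```python
import math

def predict(feature, label_weights):
    """Map one predicted sample feature to a prediction result: a
    probability for each candidate label, via dot-product scores + softmax."""
    scores = {label: sum(w * f for w, f in zip(ws, feature))
              for label, ws in label_weights.items()}
    m = max(scores.values())  # subtract max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Illustrative weights for two candidate labels.
weights = {"label_A": [1.0, -0.5, 0.2], "label_B": [-0.3, 0.8, 0.1]}
result = predict([0.9, 0.1, 0.4], weights)
```

The returned dictionary is one prediction result: its values are the prediction probabilities of the candidate labels and sum to 1.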
In one implementation, each prediction result contains prediction probabilities for the same set of candidate labels (in the following embodiments, the model optimization method proposed in the embodiment of the present application is explained by taking this implementation as an example). For example, assume that the number of predicted sample features acquired by the computer device is 3, namely predicted sample feature 1, predicted sample feature 2, and predicted sample feature 3, and assume that the total number of candidate labels is 2, namely label A and label B. Then, the computer device may perform label prediction on the test sample according to predicted sample feature 1 to obtain a corresponding prediction result 1, perform label prediction on the test sample according to predicted sample feature 2 to obtain a corresponding prediction result 2, and perform label prediction on the test sample according to predicted sample feature 3 to obtain a corresponding prediction result 3. Each of prediction results 1, 2, and 3 includes a prediction probability of label A and a prediction probability of label B.
In yet another implementation, the candidate labels for which prediction probabilities exist in each prediction result may also differ. Specifically, each prediction result may retain only the prediction probability of the candidate label with the highest prediction probability (or of the candidate labels meeting a probability condition, such as the probability meeting a threshold, or ranking in the top N positions when the probability values are arranged in descending order, etc.). For example, taking the case where each prediction result retains only the prediction probability of the candidate label with the largest prediction probability, assume that the number of predicted sample features acquired by the computer device is 2, namely predicted sample feature 1 and predicted sample feature 2, and assume that the total number of candidate labels is 2, namely label A and label B. Then, if the computer device predicts from predicted sample feature 1 that the prediction probability of candidate label A is 70% and that of candidate label B is 30%, only the prediction probability of candidate label A may exist in the prediction result corresponding to predicted sample feature 1; similarly, if the computer device predicts from predicted sample feature 2 that the prediction probability of candidate label A is 20% and that of candidate label B is 80%, only the prediction probability of candidate label B may exist in the prediction result corresponding to predicted sample feature 2.
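The filtering just described can be sketched as a small helper (hypothetical names; the "max" and "threshold" branches correspond to the two probability conditions mentioned above):

```python
def filter_result(result, mode="max", threshold=0.5):
    """Keep only the candidate labels whose prediction probability
    satisfies the chosen probability condition; other labels are dropped."""
    if mode == "max":
        best = max(result, key=result.get)
        return {best: result[best]}
    if mode == "threshold":
        return {label: p for label, p in result.items() if p >= threshold}
    raise ValueError(f"unknown mode: {mode}")

r1 = {"label_A": 0.7, "label_B": 0.3}
r2 = {"label_A": 0.2, "label_B": 0.8}
kept1 = filter_result(r1)  # only label_A remains
kept2 = filter_result(r2)  # only label_B remains
```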
S204, determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence to obtain the optimized label prediction model.
In an embodiment of the present application, the model confidence is used to indicate the reliability of the prediction results obtained by the label prediction model. Specifically, the higher the model confidence, the more reliable the prediction results obtained by the label prediction model. Therefore, the computer device can perform model optimization processing on the label prediction model in a direction of increasing the model confidence.
The computer device may determine the model confidence as follows: the computer device obtains the prediction probability of each candidate label from the prediction result corresponding to each predicted sample feature, so as to obtain a plurality of prediction probabilities for each candidate label. The computer device may then determine an average prediction probability for each candidate label based on that label's plurality of prediction probabilities. Further, the computer device may calculate a model loss value as shown in Equation 1, and then determine the model confidence from the model loss value by using a target algorithm, where the model confidence determined by the target algorithm is negatively correlated with the model loss value; the target algorithm is not specifically limited in the present application.
$$L(x) = -\sum_{y \in Y} \bar{M}(y \mid x)\,\log \bar{M}(y \mid x) \tag{1}$$

where $x$ represents a test sample; $Y$ denotes a candidate label set including a plurality of candidate labels, and any one candidate label in $Y$ may be denoted by $y$; and $\bar{M}(y \mid x)$ represents the average probability that the label prediction model predicts the test sample $x$ as the candidate label $y$ (i.e., the average prediction probability of the candidate label $y$). Illustratively, $\bar{M}(y \mid x)$ may be calculated as shown in Equation 2:

$$\bar{M}(y \mid x) = \frac{1}{N}\sum_{n=1}^{N} M_n(y \mid x) \tag{2}$$

where $N$ represents the number of predicted sample features, and may equally represent the number of prediction results; correspondingly, $n$ indexes the $n$th predicted sample feature, and $M_n(y \mid x)$ represents the prediction probability of the candidate label $y$ in the prediction result corresponding to the $n$th predicted sample feature. It should be noted that $L(x)$ in Equation 1 represents the model loss value, which is essentially an information entropy. The larger the information entropy, the more uniform the prediction probability distribution over the candidate labels, i.e., the higher the uncertainty (and thus the lower the confidence) of each prediction probability predicted by the label prediction model. Thus, the information entropy $L(x)$ may be used to indicate the model confidence. Specifically, the information entropy is negatively correlated with the model confidence, i.e.: the larger the information entropy, the smaller the model confidence. On this basis, it can be understood that the computer device may perform model optimization on the label prediction model in a direction of decreasing the information entropy.
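The averaging and entropy computation of Equations 1 and 2 can be sketched as follows (variable names are hypothetical; each prediction result is represented as a dictionary mapping a candidate label to its prediction probability):

```python
import math

def model_loss(results):
    """Equation 2: average the N prediction probabilities of each candidate
    label; Equation 1: information entropy of those average probabilities."""
    n = len(results)
    labels = results[0].keys()
    avg = {y: sum(r[y] for r in results) / n for y in labels}   # Eq. 2
    return -sum(p * math.log(p) for p in avg.values() if p > 0)  # Eq. 1

# Three prediction results for the same test sample.
results = [{"A": 0.7, "B": 0.3}, {"A": 0.6, "B": 0.4}, {"A": 0.8, "B": 0.2}]
loss = model_loss(results)                         # peaked averages: low entropy
uniform = model_loss([{"A": 0.5, "B": 0.5}] * 3)   # uniform averages: maximal entropy
```

Consistent, confident prediction results yield a small loss value (and thus a high model confidence), while uniform prediction probabilities yield the maximal entropy.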
In addition, based on the foregoing description of the predicted sample features, different prediction results are determined based on different feature distributions. Since the model loss value is calculated from the prediction probability of each candidate label in each prediction result, the model loss value can also be used to measure the consistency between the prediction results; that is, the computer device can use the model loss value to measure the generalization ability of the label prediction model when performing label prediction on test samples with inconsistent feature distributions. The larger the model loss value, the lower the consistency between the prediction results, and the lower the corresponding generalization ability. It is therefore easy to see that, by performing model optimization on the label prediction model in a direction of decreasing the model loss value, the computer device can also enhance the generalization ability of the label prediction model. The consistency between prediction results may specifically refer to the consistency of the prediction probabilities of the same candidate label across different prediction results. It will be appreciated that the closer the prediction probabilities of the same candidate label are, the higher the consistency between the corresponding two prediction results.
In the embodiment of the application, the predicted sample features obtained by the computer device each include part of the feature information of the test sample. The computer device performs label prediction on the test sample according to each of the plurality of predicted sample features, and then determines the model confidence based on the prediction probabilities of the candidate labels in each prediction result, so that the computer device can perform model optimization processing on the label prediction model in a direction of increasing the model confidence. Increasing the model confidence means increasing the similarity of the prediction probabilities of the same candidate label across the prediction results; it is thus easy to understand that increasing the model confidence enables the computer device to obtain more similar prediction results based on different predicted sample features of the test sample. Therefore, the optimized label prediction model can predict a corresponding target label from only part of the data features of the data to be predicted, the robustness of the label prediction model is enhanced, and the stability of the computer device in deploying the label prediction model can further be improved to a certain extent.
Based on the principle of the model optimization scheme and the implementation manner of the model optimization method, the embodiment of the application also provides another model optimization method, and the model optimization method can still be executed by adopting computer equipment. Referring to fig. 3, fig. 3 is a flow chart of the model optimization method. As shown in fig. 3, the model optimization method may include steps S301 to S307:
S301, obtaining a test sample.
In an embodiment, a manner of obtaining the test sample by the computer device may refer to the related description of step S201, and details of the embodiment of the present application are not repeated herein.
S302, extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different.
In an embodiment of the present application, the label prediction model may include at least one feature extraction layer, and each feature extraction layer may include a plurality of feature extraction modules. In practical applications, different feature extraction modules may be used to extract different feature information of the test sample. Since a predicted sample feature in the embodiment of the present application includes part of the feature information of the test sample, it can be understood that the computer device may use only some of the feature extraction modules in the label prediction model to perform feature extraction on the test sample, so as to obtain a predicted sample feature of the test sample. Optionally, the computer device may instead use all of the feature extraction modules in the label prediction model to extract a reference sample feature that includes the complete feature information of the test sample. Further, the computer device may discard part of the feature information in the reference sample feature to obtain a predicted sample feature that includes the remaining part of the feature information.
The two ways of extracting the features of the prediction sample are described in detail below with reference to specific examples.
In one implementation, the manner in which the computer device uses some of the feature extraction modules to obtain a predicted sample feature may be as follows: the computer device selects at least one feature extraction module from each feature extraction layer to obtain a plurality of target feature extraction modules. Then, the computer device may perform feature extraction on the test sample with the plurality of target feature extraction modules, so as to obtain a predicted sample feature of the test sample. The number of target feature extraction modules used to extract a predicted sample feature is smaller than the total number of feature extraction modules in the label prediction model. In addition, the target feature extraction modules may be selected randomly by the computer device, or selected by the computer device according to a specific rule (for example, the computer device selects the feature extraction modules containing the model parameters with a higher optimization level). In order to facilitate a clear understanding of this implementation of the embodiments of the present application, the following description is made with reference to fig. 4 a.
As shown in fig. 4a, for a complete label prediction model (as shown by the structure labeled 40 in fig. 4 a), it may include multiple feature extraction layers (as represented by a row of circles labeled 401 in fig. 4 a). Each feature extraction layer may include at least one feature extraction module (e.g., represented by a circle labeled 402 in fig. 4 a). The computer device may then randomly discard a portion of the feature extraction modules in each feature extraction layer to generate a sub-model of the label prediction model (or understood to be a sub-network of feature extractions of the label prediction model). Further, the computer device may perform feature extraction on the test sample by using the generated sub-model to obtain a predicted sample feature of the test sample. The discarded feature extraction modules may be as indicated by the circles marked 411 in fig. 4a, and the submodels (or: sub-networks of feature extraction) may be as indicated by the structures marked 41 in fig. 4 a.
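Under the assumption that each feature extraction module can be modeled as a callable and each feature extraction layer as a list of such callables (all names and the summing combination below are hypothetical simplifications, not the patented architecture), generating a feature-extraction sub-network as in fig. 4a can be sketched as:

```python
import random

def make_submodel(layers, keep_rate, seed=None):
    """Randomly discard part of the feature extraction modules in each
    layer; the kept modules form the sub-network (structure 41 in fig. 4a).
    At least one module is always kept per layer."""
    rng = random.Random(seed)
    sub = []
    for modules in layers:
        kept = [m for m in modules if rng.random() < keep_rate]
        if not kept:                      # guarantee a non-empty layer
            kept = [rng.choice(modules)]
        sub.append(kept)
    return sub

def run_submodel(sub, x):
    """Propagate an input through the sub-network; each layer's output
    is taken here as the sum of its kept modules' outputs."""
    for modules in sub:
        x = sum(m(x) for m in modules)
    return x

# Two layers with two toy "feature extraction modules" each.
layers = [[lambda x: 0.5 * x, lambda x: 0.3 * x],
          [lambda x: x + 1.0, lambda x: 2.0 * x]]
sub = make_submodel(layers, keep_rate=0.5, seed=1)
y = run_submodel(sub, 1.0)
```

Calling `make_submodel` with different seeds yields different sub-networks, each producing a different predicted sample feature for the same input.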
It is worth mentioning that, in practical applications, the computer device may generate a plurality of feature extraction sub-networks, so that the computer device may use them to obtain a plurality of predicted sample features in parallel. When obtaining predicted sample features in parallel, the computer device may first obtain a plurality of identical test samples and then input each test sample into a different feature extraction sub-network, so as to obtain a corresponding plurality of predicted sample features. Of course, in other implementations, the computer device may obtain one predicted sample feature at a time; in this case, the computer device may obtain a plurality of predicted sample features by repeatedly executing the step of obtaining one predicted sample feature. It should be noted, however, that at least two different predicted sample features must be obtained in the process of repeatedly executing this step.
In yet another implementation, the manner in which the computer device discards part of the feature information in the reference sample feature to obtain the predicted sample feature may be as follows: and the computer equipment adopts each feature extraction module in each feature extraction layer to extract the features of the test sample to obtain the reference sample features of the test sample. Further, the computer device may discard a part of feature information in the reference sample feature, and use the reference sample feature after discarding the part of feature information as the predicted sample feature. Then, in this case, in order to obtain a plurality of different predicted sample features, the computer device may discard different feature information in the reference sample feature a plurality of times, one predicted sample feature at a time.
S303, label prediction is carried out on the test samples based on the characteristics of each prediction sample to obtain a prediction result corresponding to the characteristics of each prediction sample; the prediction result includes a prediction probability for each of the at least one candidate tag.
In an embodiment of the present application, each prediction result may include a prediction probability of at least one candidate tag. As one prediction result may be obtained by performing label prediction on one prediction sample feature, and one prediction sample feature may be obtained by performing feature extraction on the test sample by using a feature extraction sub-network by the computer device, it can be understood that, in the embodiment of the present application, one prediction result may correspond to one feature extraction sub-network. Illustratively, the manner in which the computer device obtains multiple predictors may be as shown in FIG. 4 b. In fig. 4b, Sub-model X represents the xth feature extraction Sub-network (or Sub-model), and X is 1, 2, …, N. The predicted probability of each candidate tag may exist in the prediction result corresponding to each feature extraction sub-network.
S304, determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic.
In a specific implementation, the model confidence may be determined by the computer device based on the model loss value, and the model confidence is negatively correlated with the model loss value. Illustratively, the model confidence may be indicated by the difference between a preset parameter (e.g., 1) and the model loss value, or by the reciprocal of the model loss value. In addition, in this embodiment of the application, the model loss value may be calculated by the computer device from the prediction probability of each candidate label by using a target loss function, where the target loss function may be as shown in Equation 1 above, in which $L(x)$ represents the model loss value; further details of the other loss parameters are omitted here. In addition, for other specific implementations of determining the model confidence of the label prediction model, reference may be made to the related description of step S204, which is not repeated here.
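The two example mappings mentioned above (difference from a preset parameter, and reciprocal) can be written out as follows; these are only illustrations, since any mapping negatively correlated with the model loss value satisfies the description:

```python
def confidence_by_difference(loss, preset=1.0):
    """Model confidence as a preset parameter minus the model loss value."""
    return preset - loss

def confidence_by_reciprocal(loss, eps=1e-8):
    """Model confidence as the reciprocal of the model loss value
    (eps avoids division by zero for a perfect loss)."""
    return 1.0 / (loss + eps)

c1 = confidence_by_difference(0.3)    # 0.7
c2 = confidence_by_reciprocal(0.5)    # ~2.0
```

Both mappings decrease as the model loss value increases, as required.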
S305, obtaining a plurality of candidate model parameters in the label prediction model.
In a specific embodiment, the candidate model parameters refer to the model parameters used by the computer device in extracting the predicted sample features. The feature extraction modules used by the computer device to obtain a predicted sample feature are referred to as target feature extraction modules, and the computer device employs a plurality of target feature extraction modules for each of the plurality of predicted sample features. The computer device may therefore obtain the candidate model parameters as follows: for each predicted sample feature, the computer device may obtain the model parameters of each of the plurality of target feature extraction modules corresponding to that predicted sample feature as candidate model parameters; further, the computer device may take all candidate model parameters obtained across the predicted sample features as the plurality of acquired candidate model parameters.
In other embodiments, the predicted sample feature may be obtained by discarding feature information of a reference sample feature, and the reference sample feature is obtained by extracting, by the computer device, all feature extraction modules in the label prediction model, so in this embodiment, the computer device may also use each model parameter in the label prediction model as a candidate model parameter. That is, in this case, the plurality of candidate model parameters acquired by the computer device are all model parameters in the label prediction model.
S306, determining at least one model parameter to be optimized from the candidate model parameters, wherein the generalization ability strength value of each model parameter to be optimized meets the strength value condition.
The computer device may determine the at least one model parameter to be optimized as follows: the computer device back-propagates the model loss value and, during back-propagation, calculates the second derivative of the target loss function with respect to each candidate model parameter. The absolute value of the second derivative may be used to indicate the generalization ability strength value of the corresponding candidate model parameter: the higher the absolute value of the second derivative, the lower the generalization ability strength value of the corresponding candidate model parameter. In practical applications, the higher the generalization ability strength value of a candidate model parameter, the smaller the influence on the prediction ability of the label prediction model after the computer device adjusts that parameter; this avoids large fluctuations in the accuracy of the prediction results produced by the label prediction model during model optimization, thereby ensuring the stability of the model optimization. After obtaining the absolute value of the second derivative of each of the plurality of candidate model parameters, the computer device may determine the at least one model parameter to be optimized based on those absolute values. Specifically, the computer device may take the candidate model parameters whose generalization ability strength values satisfy the strength value condition as the model parameters to be optimized.
The strength value condition may include, but is not limited to, any one of the following: (1) the generalization ability strength value is the minimum among the generalization ability strength values corresponding to all candidate model parameters; (2) after the generalization ability strength values of the candidate model parameters are sorted in ascending order, the corresponding generalization ability strength value is located in the front M positions of the sorted sequence, where M is any positive integer; (3) the generalization ability strength value is smaller than a preset strength value.
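A sketch of this selection step follows. It makes two stated assumptions beyond the text: the second derivative is estimated here by central finite differences rather than during back-propagation, and the strength value is taken as the reciprocal of the absolute second derivative (one mapping consistent with "higher |second derivative| means lower strength value"); strength condition (2) is implemented, keeping the M smallest strength values.

```python
def second_derivative(loss_fn, params, i, h=1e-4):
    """Central finite-difference estimate of d^2 loss / d params[i]^2."""
    p = list(params)
    f0 = loss_fn(p)
    p[i] += h
    f_plus = loss_fn(p)
    p[i] -= 2 * h
    f_minus = loss_fn(p)
    return (f_plus - 2 * f0 + f_minus) / (h * h)

def select_params_to_optimize(loss_fn, params, m):
    """Strength value = 1 / (|second derivative| + eps); per strength
    condition (2), keep the m parameters with the smallest strengths."""
    eps = 1e-12
    strengths = [(1.0 / (abs(second_derivative(loss_fn, params, i)) + eps), i)
                 for i in range(len(params))]
    strengths.sort()                     # ascending by strength value
    return [i for _, i in strengths[:m]]

# Toy loss: parameter 0 has large curvature (20), parameter 1 small (0.2).
loss = lambda p: 10.0 * p[0] ** 2 + 0.1 * p[1] ** 2
chosen = select_params_to_optimize(loss, [0.5, 0.5], m=1)
```

On this toy loss, parameter 0 has the larger |second derivative|, hence the smaller strength value, and is the one selected.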
S307, performing parameter adjustment on each model parameter to be optimized towards the direction of increasing the confidence coefficient of the model so as to perform model optimization processing on the label prediction model to obtain the optimized label prediction model.
After the computer device determines the parameters of the model to be optimized, the computer device may perform corresponding parameter adjustment on each parameter of the model to be optimized by using the model optimizer. For example, the computer device may adjust the corresponding model parameter to be optimized toward a direction of increasing the similarity between the prediction results of the test samples. The similarity between different prediction results can be measured by KL divergence (or called relative entropy), JS divergence (i.e. Jensen-Shannon divergence), cosine similarity, and the like, which is not limited in the embodiment of the present application. It can be understood that, in the embodiment of the present application, the computer device may adjust part of the model parameters in the tag prediction model, which may effectively reduce the computational resource overhead required by the computer device in the model optimization process, so that the corresponding model optimization method may also be deployed on the related device with low computational capability, and the application range of the model optimization method is extended to a certain extent, thereby extending the application range of the related tag prediction method.
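The KL and JS divergences mentioned above, as measures of similarity between two prediction results over the same candidate labels, can be sketched as (hypothetical function names; standard definitions):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two prediction results over the same labels
    (relative entropy); eps guards against log of zero."""
    return sum(p[y] * math.log((p[y] + eps) / (q[y] + eps)) for y in p)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence: average KL to the midpoint."""
    mid = {y: 0.5 * (p[y] + q[y]) for y in p}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

close = js_divergence({"A": 0.7, "B": 0.3}, {"A": 0.65, "B": 0.35})
far = js_divergence({"A": 0.7, "B": 0.3}, {"A": 0.1, "B": 0.9})
```

A smaller divergence indicates more similar prediction results, so adjusting parameters in the direction of decreasing divergence increases the similarity between the prediction results of the test sample.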
In a specific embodiment, after the computer device obtains the optimized label prediction model, the computer device may perform label prediction processing on data to be predicted by using the optimized label prediction model. Specifically, after receiving the tag prediction request, the computer device may obtain the data to be predicted carried in the tag prediction request. Then, the computer device may extract the data features of the data to be predicted by using the optimized label prediction model, so that the computer device may perform label prediction on the data to be predicted based on the data features to obtain a target prediction probability of each candidate label in at least one candidate label, where the target prediction probability is used to indicate a probability that the data to be predicted is predicted as a corresponding candidate label. The computer equipment can adopt all the feature extraction modules in the label prediction model to extract features of the data to be predicted, and data features of the data to be predicted are obtained. That is to say, when the optimized label prediction model is actually applied, the computer device does not need to extract a plurality of data features of the data to be predicted, so that the workload of the computer device is effectively reduced, and the label prediction rate is improved to a certain extent.
Further, the computer device may select a target label of the data to be predicted from the candidate labels based on the target prediction probability of each candidate label. The target label of the data to be predicted can be one or more. Specifically, when the number of the target tags is one, the computer device may use the candidate tag with the maximum target prediction probability as the target tag of the data to be predicted, or the computer device may randomly select one of the candidate tags with the target prediction probability satisfying the probability threshold as the target tag. When the target labels are multiple, the computer device may use all candidate labels whose target prediction probabilities satisfy the probability threshold as the target labels of the data to be predicted, or after the target prediction probabilities of the candidate labels are arranged in descending order, the computer device may use the candidate label corresponding to the target prediction probability whose arrangement order is located at the top Q-bit as the target label, where Q is a positive integer.
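The three target-label selection strategies described above (maximum probability, probability threshold, and top-Q by descending probability) can be sketched as one hypothetical helper:

```python
def select_target_labels(probs, mode="max", threshold=0.5, q=2):
    """Choose the target label(s) from the candidate labels based on
    their target prediction probabilities."""
    if mode == "max":                     # single label: largest probability
        return [max(probs, key=probs.get)]
    if mode == "threshold":               # all labels meeting the threshold
        return [y for y, p in probs.items() if p >= threshold]
    if mode == "top_q":                   # front Q labels in descending order
        ranked = sorted(probs, key=probs.get, reverse=True)
        return ranked[:q]
    raise ValueError(f"unknown mode: {mode}")

probs = {"cat": 0.6, "dog": 0.3, "bird": 0.1}
single = select_target_labels(probs)                     # ["cat"]
multi = select_target_labels(probs, mode="top_q", q=2)   # ["cat", "dog"]
```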
In the embodiment of the application, when the computer device performs model optimization on the label prediction model, the computer device extracts different prediction sample characteristics of the same test sample, and then the computer device determines a model loss value based on the prediction probability of the candidate label in the prediction result corresponding to each prediction sample characteristic, so that the model loss value can be used for measuring the consistency between different prediction results and the confidence of each prediction result. The larger the model loss value is, the lower the consistency between the prediction results is, and the lower the confidence of each prediction result is. Therefore, the computer device optimizes the label prediction model towards the direction of increasing the confidence of the model, so that the optimized label prediction model has higher confidence. Moreover, because the optimized label prediction model can predict and obtain a consistent prediction result according to different prediction sample characteristics, the computer equipment can adopt the optimized label prediction model to predict and obtain a target label based on the incomplete data characteristics of the data to be predicted, so that the computer equipment does not need to pay more attention to whether the characteristic extraction of the corresponding data is complete or not in the actual label prediction process, the label prediction efficiency can be improved to a certain extent, and the robustness of the optimized label prediction model is ensured.
Based on the relevant description of the model optimization method, the embodiment of the application also discloses a model optimization device. The model optimization means may be one or more computer programs (including program code) running on the computer apparatus mentioned above. In a particular embodiment, the model optimization apparatus may be used to perform a model optimization method as shown in fig. 2 or fig. 3. Referring to fig. 5, the model optimization apparatus may include: an acquisition unit 501, a feature extraction unit 502, a prediction unit 503, a model optimization unit 504, and a model application unit 505. Wherein:
an obtaining unit 501, configured to obtain a test sample;
a feature extraction unit 502, configured to extract, by using a label prediction model, a plurality of prediction sample features of the test sample, where each prediction sample feature is obtained by discarding part of feature information of the test sample, and discarded feature information corresponding to different prediction sample features is different;
the prediction unit 503 is configured to perform label prediction on the test sample based on each prediction sample characteristic, so as to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
the model optimization unit 504 is configured to determine a model confidence of the tag prediction model according to a prediction probability of each candidate tag in a prediction result corresponding to each prediction sample feature, and perform model optimization on the tag prediction model in a direction of increasing the model confidence to obtain an optimized tag prediction model, where the optimized tag prediction model is used to predict a target tag of data to be predicted.
In an embodiment, the model optimization unit 504 may be specifically configured to perform:
obtaining a plurality of candidate model parameters in the label prediction model;
determining at least one model parameter to be optimized from the candidate model parameters, wherein the generalization ability strength value of each model parameter to be optimized in the at least one model parameter to be optimized meets the strength value condition;
and adjusting each model parameter to be optimized in the direction of increasing the model confidence, so as to perform model optimization processing on the label prediction model.
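The three steps above can be sketched as a masked gradient update. This is an illustrative sketch, not the patent's implementation: the helper name, learning rate, and toy values are assumptions, and the gradient of a confidence loss (whose decrease corresponds to increasing model confidence) is taken as given.

```python
import numpy as np

def update_selected(params, grads, selected_mask, lr=0.01):
    """One gradient step on a confidence loss, applied only to the model
    parameters to be optimized; all other parameters stay frozen."""
    params = np.asarray(params, dtype=float)
    grads = np.asarray(grads, dtype=float)   # d(confidence loss)/d(param), assumed given
    mask = np.asarray(selected_mask, dtype=float)
    return params - lr * grads * mask

params = np.array([0.5, -1.2, 2.0])
grads = np.array([0.3, -0.4, 0.1])
mask = np.array([1.0, 0.0, 1.0])  # only the 1st and 3rd parameters meet the condition
new_params = update_selected(params, grads, mask)
```

Only the masked-in parameters move; the second parameter is returned unchanged, matching the idea that parameters failing the strength value condition are left as-is.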
In yet another embodiment, the model confidence is determined based on a model loss value, and the model loss value is calculated from the prediction probability of each candidate label using a target loss function; the model optimization unit 504 may be further configured to perform:
back-propagating the model loss value, and calculating the second derivative of the target loss function with respect to each candidate model parameter during back propagation, wherein the absolute value of the second derivative indicates the generalization ability strength value of the corresponding candidate model parameter;
determining the at least one model parameter to be optimized based on the absolute value of the second derivative of each of the plurality of candidate model parameters.
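This selection rule can be sketched numerically. As assumptions for illustration, central finite differences stand in for the second derivatives that would be computed during back propagation, the loss is a toy quadratic, and the thresholded condition is only one possible "strength value condition", since the text leaves the exact condition open.

```python
import numpy as np

def second_derivatives(loss_fn, params, eps=1e-4):
    """Estimate d^2 L / d theta^2 for each parameter with central
    differences; a numeric stand-in for the second derivatives the
    patent computes during back propagation."""
    params = np.asarray(params, dtype=float)
    base = loss_fn(params)
    out = np.empty_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        out[i] = (loss_fn(params + step) - 2 * base + loss_fn(params - step)) / eps ** 2
    return out

# Toy quadratic loss L = 3*a^2 + 0.5*b^2, so |d^2 L / d theta^2| = [6, 1].
loss = lambda p: 3 * p[0] ** 2 + 0.5 * p[1] ** 2
strength = np.abs(second_derivatives(loss, np.array([1.0, 1.0])))
# Example condition (an assumption): keep parameters whose strength value is below 3.
to_optimize = np.where(strength < 3.0)[0]
```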
In another embodiment, the label prediction model includes at least one feature extraction layer, each feature extraction layer includes a plurality of feature extraction modules, and the feature extraction unit 502 may be specifically configured to perform:
selecting at least one feature extraction module from each feature extraction layer to obtain a plurality of target feature extraction modules; wherein the number of the target feature extraction modules is less than the total number of feature extraction modules in the label prediction model, and all of the feature extraction modules together are used to extract sample features that include the complete feature information of the test sample;
and performing feature extraction on the test sample by adopting the plurality of target feature extraction modules to obtain the predicted sample feature of the test sample.
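One way to realize the module-selection step might look like the following sketch. The layer and module identifiers are hypothetical; at least one module is kept per layer, and each layer is capped below its full size so that fewer than all modules are kept overall, which is what makes the extracted feature drop part of the test sample's feature information.

```python
import random

def select_target_modules(layers, rng):
    """layers: list of lists of module ids, one inner list per feature
    extraction layer. Keeps at least one module per layer and (by capping
    each layer below its full size) fewer modules than the model's total."""
    selected = []
    for modules in layers:
        k = rng.randint(1, max(1, len(modules) - 1))
        selected.extend(rng.sample(modules, k))
    return selected

rng = random.Random(0)
layers = [["l1m1", "l1m2", "l1m3"], ["l2m1", "l2m2"]]
targets = select_target_modules(layers, rng)
```

Calling the function again with the same layers generally yields a different subset, which is how different predicted sample features come to discard different feature information.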
In another embodiment, the label prediction model includes at least one feature extraction layer, each feature extraction layer includes a plurality of feature extraction modules, and the feature extraction unit 502 is further specifically configured to perform:
performing feature extraction on the test sample by adopting each feature extraction module in each feature extraction layer to obtain reference sample features of the test sample, wherein the reference sample features comprise complete feature information of the test sample;
and discarding part of feature information in the reference sample features, and taking the reference sample features after discarding part of feature information as the prediction sample features.
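The discarding step in this second variant resembles applying a dropout-style mask to the reference sample feature. A minimal sketch, assuming a random zeroing mask (the patent does not fix how the partial feature information is chosen):

```python
import numpy as np

def drop_feature_info(reference_feature, drop_rate, rng):
    """Zero out a random portion of the reference sample feature; the
    surviving entries form one predicted sample feature."""
    mask = rng.random(reference_feature.shape) >= drop_rate
    return reference_feature * mask

rng = np.random.default_rng(42)
reference = np.ones(8)                      # stand-in for the full reference feature
pred_feature_1 = drop_feature_info(reference, 0.25, rng)
pred_feature_2 = drop_feature_info(reference, 0.25, rng)  # a different mask
```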
In yet another embodiment, the model optimization unit 504 can be further configured to perform:
for each prediction sample feature in the plurality of prediction sample features, obtaining the model parameters included in each of the target feature extraction modules used to extract that prediction sample feature;
and taking all of the obtained model parameters as the plurality of candidate model parameters.
In yet another embodiment, the model application unit 505 may be configured to perform:
receiving a label prediction request, wherein the label prediction request carries data to be predicted;
extracting the data characteristics of the data to be predicted by adopting the optimized label prediction model;
performing label prediction on the data to be predicted based on the data characteristics to obtain a target prediction probability of each candidate label in the at least one candidate label;
and selecting the target label of the data to be predicted from the at least one candidate label based on the target prediction probability of each candidate label.
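The four steps of this application flow can be sketched as follows; the softmax head and the candidate label names are assumptions for illustration, standing in for the optimized label prediction model's output layer.

```python
import numpy as np

def predict_label(scores, candidate_labels):
    """Turn per-label scores into target prediction probabilities with a
    stable softmax and return the highest-probability candidate as the
    target label of the data to be predicted."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    return candidate_labels[int(np.argmax(probs))], probs

# Hypothetical scores produced from the extracted data features.
label, probs = predict_label([2.0, 0.5, 1.0], ["cat", "dog", "car"])
```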
According to an embodiment of the present application, the steps involved in the model optimization methods shown in fig. 2 and 3 may be performed by the units in the model optimization apparatus shown in fig. 5. For example, step S201 shown in fig. 2 may be performed by the acquisition unit 501 in the model optimization apparatus shown in fig. 5; step S202 may be performed by the feature extraction unit 502; step S203 may be performed by the prediction unit 503; and step S204 may be performed by the model optimization unit 504. As another example, step S301 in the model optimization method shown in fig. 3 may be performed by the acquisition unit 501 in the model optimization apparatus shown in fig. 5; step S302 may be performed by the feature extraction unit 502; step S303 may be performed by the prediction unit 503; and steps S304 to S307 may each be performed by the model optimization unit 504.
According to another embodiment of the present application, the units in the model optimization apparatus shown in fig. 5 are divided based on logical functions. The above units may be combined, in part or in whole, into one or several other units, or one of the units may be further split into multiple functionally smaller units, which can achieve the same operations without affecting the technical effects of the embodiments of the present application. In other embodiments of the present application, the model optimization apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, or implemented cooperatively by multiple units.
According to another embodiment of the present application, the model optimization apparatus shown in fig. 5 may be constructed, and the model optimization method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 2 and 3 on a general-purpose computing device that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and executed by the above computing device via the computer storage medium.
In the embodiment of the present application, each predicted sample feature obtained by the model optimization apparatus includes partial feature information of the test sample. The model optimization apparatus performs label prediction on the test sample using each of the plurality of predicted sample features, and then determines the model confidence based on the prediction probabilities of the candidate labels in each prediction result, so that it can perform model optimization processing on the label prediction model in the direction of increasing the model confidence. Increasing the model confidence means increasing the similarity of the prediction probabilities of the same candidate label across the prediction results; in other words, increasing the model confidence enables the model optimization apparatus to obtain more similar prediction results based on different predicted sample features of the test sample. Therefore, the optimized label prediction model can predict the corresponding target label even from partial data features of the data to be predicted, which enhances the robustness of the label prediction model and further improves, to a certain extent, the stability of deploying the label prediction model.
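The reasoning above can be made concrete with a toy optimization loop. Everything below is an illustrative assumption: the model is a single linear layer, model confidence is scored as the negative mean per-label variance of the prediction probabilities across passes (one of many possible similarity measures, since the patent does not fix the formula), and a finite-difference greedy ascent stands in for back propagation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence(w, feature, masks):
    """Higher when the prediction results obtained from different predicted
    sample features (feature * mask) are more similar to one another."""
    preds = np.stack([softmax(w @ (feature * m)) for m in masks])
    return -float(preds.var(axis=0).mean())

rng = np.random.default_rng(0)
feature = rng.random(4)
masks = [rng.random(4) >= 0.3 for _ in range(3)]  # each pass discards different info
w = rng.random((2, 4))                            # 2 candidate labels

before = confidence(w, feature, masks)
eps, lr = 1e-5, 0.5
for _ in range(30):                               # greedy ascent on model confidence
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        d = np.zeros_like(w)
        d[idx] = eps
        grad[idx] = (confidence(w + d, feature, masks)
                     - confidence(w - d, feature, masks)) / (2 * eps)
    if confidence(w + lr * grad, feature, masks) >= confidence(w, feature, masks):
        w = w + lr * grad                         # accept only improving steps
    else:
        lr *= 0.5                                 # otherwise shrink the step
after = confidence(w, feature, masks)
```

After the loop, the model's prediction results under different feature masks agree at least as well as before, which is the sense in which optimization toward higher model confidence improves robustness to missing feature information.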
Based on the above description of the method embodiment and the apparatus embodiment, an embodiment of the present application further provides a computer device, please refer to fig. 6. The computer device comprises at least a processor 601 and a computer storage medium 602, and the processor 601 and the computer storage medium 602 may be connected by a bus or other means.
The computer storage medium 602 is a memory device in the computer device for storing programs and data. It is understood that the computer storage medium 602 herein may include both a built-in storage medium of the computer device and, of course, an extended storage medium supported by the computer device. The computer storage medium 602 provides storage space that stores the operating system of the computer device. Also stored in this storage space are one or more computer programs (which may be one or more pieces of program code) adapted to be loaded and executed by the processor 601. The computer storage medium may be a high-speed RAM memory, or a non-volatile memory such as at least one disk memory; optionally, it may be at least one computer storage medium located remotely from the aforementioned processor. The processor 601 (or CPU, Central Processing Unit) is the computing core and control core of the computer device, and is adapted to implement one or more computer programs, in particular to load and execute the one or more computer programs so as to implement the corresponding method flows or functions.
In one embodiment, one or more computer programs stored in the computer storage medium 602 may be loaded and executed by the processor 601 to implement the corresponding method steps in the method embodiments described above with respect to fig. 2 and 3. In particular implementations, one or more computer programs in the computer storage medium 602 may be loaded and executed by the processor 601 to perform the steps of:
obtaining a test sample;
extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different;
performing label prediction on the test sample based on each prediction sample characteristic to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
and determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence to obtain an optimized label prediction model, wherein the optimized label prediction model is used for predicting the target label of the data to be predicted.
In one embodiment, the processor 601 may be specifically configured to load and execute:
obtaining a plurality of candidate model parameters in the label prediction model;
determining at least one model parameter to be optimized from the candidate model parameters, wherein the generalization ability strength value of each model parameter to be optimized in the at least one model parameter to be optimized meets the strength value condition;
and adjusting each model parameter to be optimized in the direction of increasing the model confidence, so as to perform model optimization processing on the label prediction model.
In yet another embodiment, the model confidence is determined based on a model loss value, and the model loss value is calculated from the prediction probability of each candidate label using a target loss function; the processor 601 may be specifically configured to load and execute:
back-propagating the model loss value, and calculating the second derivative of the target loss function with respect to each candidate model parameter during back propagation, wherein the absolute value of the second derivative indicates the generalization ability strength value of the corresponding candidate model parameter;
determining the at least one model parameter to be optimized based on the absolute value of the second derivative of each of the plurality of candidate model parameters.
In another embodiment, the label prediction model includes at least one feature extraction layer, each feature extraction layer includes a plurality of feature extraction modules, and the processor 601 is specifically configured to load and execute:
selecting at least one feature extraction module from each feature extraction layer to obtain a plurality of target feature extraction modules; wherein the number of the target feature extraction modules is less than the total number of feature extraction modules in the label prediction model, and all of the feature extraction modules together are used to extract sample features that include the complete feature information of the test sample;
and performing feature extraction on the test sample by adopting the plurality of target feature extraction modules to obtain the predicted sample feature of the test sample.
In another embodiment, the label prediction model includes at least one feature extraction layer, each feature extraction layer includes a plurality of feature extraction modules, and the processor 601 is specifically configured to load and execute:
performing feature extraction on the test sample by adopting each feature extraction module in each feature extraction layer to obtain reference sample features of the test sample, wherein the reference sample features comprise complete feature information of the test sample;
and discarding part of feature information in the reference sample features, and taking the reference sample features after discarding the part of feature information as the predicted sample features.
In yet another embodiment, the processor 601 may be specifically configured to load and execute:
for each prediction sample feature in the plurality of prediction sample features, obtaining the model parameters included in each of the target feature extraction modules used to extract that prediction sample feature;
and taking all of the obtained model parameters as the plurality of candidate model parameters.
In yet another embodiment, the processor 601 may be specifically configured to load and execute:
receiving a label prediction request, wherein the label prediction request carries data to be predicted;
extracting the data characteristics of the data to be predicted by adopting the optimized label prediction model;
performing label prediction on the data to be predicted based on the data characteristics to obtain a target prediction probability of each candidate label in the at least one candidate label;
and selecting the target label of the data to be predicted from the at least one candidate label based on the target prediction probability of each candidate label.
In the embodiment of the present application, each predicted sample feature obtained by the computer device includes partial feature information of the test sample. The computer device performs label prediction on the test sample using each of the plurality of predicted sample features, and then determines the model confidence based on the prediction probabilities of the candidate labels in each prediction result, so that it can perform model optimization processing on the label prediction model in the direction of increasing the model confidence. Increasing the model confidence means increasing the similarity of the prediction probabilities of the same candidate label across the prediction results; in other words, increasing the model confidence enables the computer device to obtain more similar prediction results based on different predicted sample features of the test sample. Therefore, the optimized label prediction model can predict the corresponding target label even from partial data features of the data to be predicted, which enhances the robustness of the label prediction model and further improves, to a certain extent, the stability of the computer device in deploying the label prediction model.
An embodiment of the present application further provides a computer storage medium, in which one or more computer programs corresponding to the model optimization method are stored. When one or more processors load and execute the one or more computer programs, the description of the model optimization method in the foregoing embodiments can be implemented, and details are not repeated here; the description of the beneficial effects of the same method is likewise not repeated. It will be appreciated that the computer program may be deployed to be executed on one device, or on multiple devices capable of communicating with each other.
It should be noted that, according to an aspect of the present application, a computer product or a computer program is also provided, the computer product including a computer program stored in a computer storage medium. A processor of a computer device reads the computer program from the computer storage medium and executes it, so that the computer device performs the methods provided in the various optional implementations of the model optimization method embodiments shown in fig. 2 and 3.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer storage medium and may include the processes of the above embodiments of the model optimization method when executed. The computer storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be understood that the above-described embodiments are only some of the embodiments of the present disclosure and should not be construed as limiting its scope. Those skilled in the art can implement all or part of the above-described embodiments, and equivalent variations made according to the claims of the present application still fall within the scope of the present disclosure.

Claims (11)

1. A method of model optimization, comprising:
obtaining a test sample;
extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different;
performing label prediction on the test sample based on each prediction sample characteristic to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of the at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
and determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence to obtain an optimized label prediction model, wherein the optimized label prediction model is used for predicting the target label of the data to be predicted.
2. The method of claim 1, wherein the model optimizing the tag prediction model in the direction of increasing the model confidence level comprises:
obtaining a plurality of candidate model parameters in the label prediction model;
determining at least one model parameter to be optimized from the candidate model parameters, wherein the generalization ability strength value of each model parameter to be optimized in the at least one model parameter to be optimized meets the strength value condition;
and adjusting each model parameter to be optimized in the direction of increasing the model confidence, so as to perform model optimization processing on the label prediction model.
3. The method of claim 2, wherein the model confidence is determined based on a model loss value, and the model loss value is calculated from the prediction probability of each candidate label using a target loss function; and the determining at least one model parameter to be optimized from the plurality of candidate model parameters comprises:
back-propagating the model loss value, and calculating the second derivative of the target loss function with respect to each candidate model parameter during back propagation, wherein the absolute value of the second derivative indicates the generalization ability strength value of the corresponding candidate model parameter;
determining the at least one model parameter to be optimized based on the absolute value of the second derivative of each of the plurality of candidate model parameters.
4. The method of claim 2 or 3, wherein the label prediction model comprises at least one feature extraction layer, each feature extraction layer comprising a plurality of feature extraction modules; the method for extracting the predicted sample characteristics of the test sample by adopting the label prediction model comprises the following steps:
selecting at least one feature extraction module from each feature extraction layer to obtain a plurality of target feature extraction modules; wherein the number of the target feature extraction modules is less than the total number of feature extraction modules in the label prediction model, and all of the feature extraction modules together are used to extract sample features that include the complete feature information of the test sample;
and performing feature extraction on the test sample by adopting the plurality of target feature extraction modules to obtain the predicted sample feature of the test sample.
5. The method of claim 2 or 3, wherein the label prediction model comprises at least one feature extraction layer, each feature extraction layer comprising a plurality of feature extraction modules; the method for extracting the predicted sample characteristics of the test sample by adopting the label prediction model comprises the following steps:
performing feature extraction on the test sample by adopting each feature extraction module in each feature extraction layer to obtain reference sample features of the test sample, wherein the reference sample features comprise complete feature information of the test sample;
and discarding part of feature information in the reference sample features, and taking the reference sample features after discarding part of feature information as the prediction sample features.
6. The method of claim 4, wherein obtaining the plurality of candidate model parameters in the label prediction model comprises:
for each prediction sample feature in the plurality of prediction sample features, obtaining the model parameters included in each of the target feature extraction modules used to extract that prediction sample feature;
and taking all of the obtained model parameters as the plurality of candidate model parameters.
7. The method of claim 1, further comprising:
receiving a label prediction request, wherein the label prediction request carries data to be predicted;
extracting the data characteristics of the data to be predicted by adopting the optimized label prediction model;
performing label prediction on the data to be predicted based on the data characteristics to obtain a target prediction probability of each candidate label in the at least one candidate label;
and selecting the target label of the data to be predicted from the at least one candidate label based on the target prediction probability of each candidate label.
8. A model optimization apparatus, comprising:
an acquisition unit for acquiring a test sample;
the characteristic extraction unit is used for extracting a plurality of predicted sample characteristics of the test sample by adopting a label prediction model, wherein each predicted sample characteristic is obtained by discarding part of characteristic information of the test sample, and the discarded characteristic information corresponding to different predicted sample characteristics is different;
the prediction unit is used for performing label prediction on the test sample based on each prediction sample characteristic to obtain a prediction result corresponding to each prediction sample characteristic; wherein the prediction result comprises a prediction probability of each of at least one candidate label, the prediction probability being indicative of a probability that the test sample is predicted as the respective candidate label;
and the model optimization unit is used for determining the model confidence of the label prediction model according to the prediction probability of each candidate label in the prediction result corresponding to each prediction sample characteristic, and performing model optimization processing on the label prediction model towards the direction of increasing the model confidence to obtain an optimized label prediction model, wherein the optimized label prediction model is used for predicting the target label of the data to be predicted.
9. A computer device, comprising:
a processor for implementing one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the model optimization method according to any one of claims 1-7.
10. A computer storage medium, characterized in that it stores one or more computer programs adapted to be loaded by a processor and to perform the model optimization method of any one of claims 1 to 7.
11. A computer product, characterized in that the computer product comprises a computer program adapted to be loaded by a processor and to perform the model optimization method according to any of the claims 1-7.
CN202210709588.2A 2022-06-21 2022-06-21 Model optimization method, device, equipment, storage medium and program product Pending CN115062709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709588.2A CN115062709A (en) 2022-06-21 2022-06-21 Model optimization method, device, equipment, storage medium and program product


Publications (1)

Publication Number Publication Date
CN115062709A true CN115062709A (en) 2022-09-16

Family

ID=83201414


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115543855A (en) * 2022-12-01 2022-12-30 江苏邑文微电子科技有限公司 Semiconductor device parameter testing method, device, electronic device and storage medium
CN116737129A (en) * 2023-08-08 2023-09-12 杭州比智科技有限公司 Supply chain control tower generation type large language model and construction method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020000876A1 (en) * 2018-06-27 2020-01-02 北京字节跳动网络技术有限公司 Model generating method and device
CN112115995A (en) * 2020-09-11 2020-12-22 北京邮电大学 Image multi-label classification method based on semi-supervised learning
CN113158554A (en) * 2021-03-25 2021-07-23 腾讯科技(深圳)有限公司 Model optimization method and device, computer equipment and storage medium
CN114281932A (en) * 2021-09-13 2022-04-05 腾讯科技(深圳)有限公司 Method, device and equipment for training work order quality inspection model and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination