CN117788979A - Model pre-training method, model pre-training device, computer device, and storage medium


Info

Publication number
CN117788979A
CN117788979A
Authority
CN
China
Prior art keywords
training
augmentation
model
sample
parameters
Prior art date
Legal status
Pending
Application number
CN202311871388.8A
Other languages
Chinese (zh)
Inventor
杨腾
高鹏程
唐永亮
Current Assignee
Shenzhen Lingyun Shixun Technology Co ltd
Original Assignee
Shenzhen Lingyun Shixun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Lingyun Shixun Technology Co ltd filed Critical Shenzhen Lingyun Shixun Technology Co ltd
Priority to CN202311871388.8A priority Critical patent/CN117788979A/en
Publication of CN117788979A publication Critical patent/CN117788979A/en
Pending legal-status Critical Current

Abstract

The application discloses a model pre-training method, a model pre-training device, computer equipment, and a non-volatile computer-readable storage medium. The method includes performing data augmentation on an input sample using a first augmentation parameter to generate a first augmentation sample, and performing data augmentation on the input sample using a second augmentation parameter to generate a second augmentation sample, the first and second augmentation parameters being different; inputting the first augmented sample to a first training branch of the pre-training model to output first feature information, and inputting the second augmented sample to a second training branch of the pre-training model to output second feature information; determining feature loss information according to the first feature information and the second feature information; and updating model parameters of the first training branch and the second training branch according to the feature loss information to generate model pre-training parameters of the pre-training model. Data augmentation makes full use of the limited samples available in industrial scenes, and self-supervised contrastive learning allows the samples to remain unlabeled.

Description

Model pre-training method, model pre-training device, computer device, and storage medium
Technical Field
The present application relates to the field of model training technology, and more particularly, to a model pre-training method, a model pre-training apparatus, a computer device, and a non-volatile computer-readable storage medium.
Background
With the rapid development of artificial intelligence technology, artificial intelligence is widely applied in the industrial field, for example in industrial defect identification, classification, and segmentation. However, industrial defects tend to be shallow, weak in contrast, and small in size, which makes labeling costly and requires a degree of expertise from the data labeling personnel.
Existing approaches often initialize the model with weights obtained from ImageNet pre-training, then perform transfer learning for the specific industrial task scene, fine-tuning the model with a small amount of labeled data to obtain a specialized model that is finally applied to each industrial scene.
The prior art thus pre-trains the model on the ImageNet dataset, but ImageNet consists of natural scene images, which have a huge inter-domain difference from industrial scene data. A model pre-trained on ImageNet therefore does not learn a general visual representation of industrial scene images, so more data and more iteration rounds are needed during fine-tuning to shift the model weights from the natural scene to the industrial scene; this degrades model performance under limited-sample conditions, making the model unsuitable for actual scenes.
Disclosure of Invention
The embodiment of the application provides a model pre-training method, a model pre-training device, computer equipment and a non-volatile computer readable storage medium.
The model pre-training method comprises the steps of performing data augmentation on an input sample by using a first augmentation parameter to generate a first augmentation sample different from the input sample, and performing data augmentation on the input sample by using a second augmentation parameter to generate a second augmentation sample different from the input sample, wherein the first augmentation parameter and the second augmentation parameter are different; inputting the first augmented sample to a first training branch of a pre-training model to output first characteristic information, and inputting the second augmented sample to a second training branch of the pre-training model to output second characteristic information; determining feature loss information according to the first feature information and the second feature information; and updating model parameters of the first training branch and the second training branch according to the characteristic loss information so as to generate model pre-training parameters of the pre-training model.
In certain embodiments, the first and second augmentation parameters comprise at least one of color dithering, random graying, Gaussian blur, and exposure.
In some embodiments, the input samples are unlabeled samples, the first augmentation parameters corresponding to each of the input samples are the same, and the second augmentation parameters corresponding to each of the input samples are the same; or, the first augmentation parameters corresponding to the input samples are at least partially different, and the second augmentation parameters corresponding to the input samples are at least partially different.
In some embodiments, the model pre-training method further comprises pre-processing each original sample to generate the input samples of the same size, respectively, the pre-processing comprising at least one of clipping and downsampling.
In some embodiments, the determining feature loss information according to the first feature information and the second feature information includes: calculating cosine similarity according to the first characteristic information and the second characteristic information; calculating cosine similarity loss according to the cosine similarity, and taking the cosine similarity loss as the characteristic loss information; or calculating cross entropy loss according to the cosine similarity as the characteristic loss information.
In some implementations, the updating of the model parameters of the first training branch and the second training branch according to the feature loss information includes: updating the model parameters of the first training branch according to the feature loss information; and updating the model parameters of the second training branch according to the moving average of the model parameters of the first training branch over consecutive rounds of training.
In some embodiments, the number of perceptron layers of the first training branch is greater than the number of perceptron layers of the second training branch.
The model pre-training device comprises a data augmentation module, a feature extraction module, a loss calculation module and a parameter updating module. The data augmentation module is used for performing data augmentation on an input sample by using a first augmentation parameter to generate a first augmentation sample different from the input sample, performing data augmentation on the input sample by using a second augmentation parameter to generate a second augmentation sample different from the input sample, wherein the first augmentation sample and the second augmentation sample are different; the feature extraction module is used for inputting the first augmentation sample to a first training branch of the pre-training model to output first feature information, and inputting the second augmentation sample to a second training branch of the pre-training model to output second feature information; the loss calculation module is used for determining characteristic loss information according to the first characteristic information and the second characteristic information; the parameter updating module is used for updating model parameters of the first training branch and the second training branch according to the characteristic loss information so as to generate model pre-training parameters of the pre-training model.
The computer device of an embodiment of the present application includes a processor, a memory, and a computer program, where the computer program is stored in the memory and executed by the processor, and the computer program includes instructions for executing the model pre-training method of any of the embodiments above.
The non-volatile computer-readable storage medium of an embodiment of the present application includes a computer program that, when executed by a processor, causes the processor to perform the model pre-training method of any of the embodiments described above.
According to the model pre-training method, the model pre-training device, the computer equipment, and the computer-readable storage medium, the input samples are subjected to data augmentation using different augmentation parameters, so that a first augmentation sample and a second augmentation sample, each different from the input samples, are generated, yielding more samples in the current industrial scene for subsequent model pre-training. The information contained in the input samples is thus fully mined through data augmentation, so that the general visual representation of the industrial scene is better learned and the performance of the model in downstream tasks with limited annotation data is improved.
Moreover, the first augmentation sample and the second augmentation sample corresponding to the same input sample form a positive sample pair, while samples from different input samples form negative pairs; the first and second augmentation samples forming a positive pair are input to different training branches, and self-supervised pre-training is realized by comparing the feature losses of the different branches.
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic illustration of an application scenario of a model pre-training method of certain embodiments of the present application;
FIG. 2 is a flow diagram of a model pre-training method of certain embodiments of the present application;
FIG. 3 is a flow diagram of a model pre-training method of certain embodiments of the present application;
FIG. 4 is a flow diagram of a model pre-training method of certain embodiments of the present application;
FIG. 5 is a flow diagram of a model pre-training method of certain embodiments of the present application;
FIG. 6 is a block diagram of a model pre-training apparatus according to certain embodiments of the present application;
FIG. 7 is a schematic diagram of a connection state of a non-volatile computer readable storage medium and a processor according to some embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application and are not to be construed as limiting the embodiments of the present application.
To facilitate an understanding of the present application, the following description of terms appearing in the present application will be provided:
1. Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. The technical solution provided by the embodiments of the present application mainly relates to natural language processing technology, machine learning/deep learning, and the like in artificial intelligence.
2. Self-supervised learning: self-supervised learning is a training method that requires no manual labeling; it trains models with supervisory signals generated automatically from the internal structure or implicit information of the data. Compared with traditional supervised learning, self-supervised learning does not need a large number of manually labeled training samples, so it has great potential for reducing labeling costs and improving data utilization.
An application scenario of the model pre-training method of the present application is described first. Fig. 1 is a schematic diagram of an application scenario of the model pre-training method provided in an embodiment of the present application; the scenario involves a terminal device 110 and a server 120, and the terminal device 110 may communicate with the server 120.
Fig. 1 illustrates one terminal device 110 and one server 120 as an example, and may include other numbers of terminal devices and servers in practice, which are not limited in this embodiment of the present application.
The computer device of the present application may be the terminal device 110; alternatively, the computer device may be the server 120; alternatively, the computer device may be a system composed of the terminal device 110 and the server 120.
In some implementations, the server 120 in fig. 1 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. The embodiments of the present application are not limited in this regard.
Optionally, in the embodiments of the present application, the terminal device 110 may be a device with rich human-machine interaction modes, Internet access capability, various operating systems, and relatively strong processing capability. The terminal device 110 may be a smartphone, smart glasses, a handheld terminal, a smart television, a tablet computer, a vehicle-mounted terminal, or the like, but is not limited thereto.
In an implementation manner, the server 120 and the terminal device 110 may perform the model pre-training method provided in the embodiments of the present application in an interactive manner, or the terminal device 110 or the server 120 may perform the model pre-training method provided in the embodiments of the present application.
The model pre-training method of the present application will be described in detail below:
referring to fig. 2, an embodiment of the present application provides a model pre-training method, which includes:
step 011: the data is amplified using a first amplification parameter for the input sample to generate a first amplification sample different from the input sample, and the data is amplified using a second amplification parameter for the input sample to generate a second amplification sample different from the input sample, the first and second amplification parameters being different.
Specifically, in an industrial scene the collected sample size is generally small, so samples from shared sample databases on the network have to be used; however, such network samples differ greatly from the industrial scene, and after the model is pre-trained with them, subsequent training adjustment still requires many iteration rounds, so the pre-training effect is poor.
The input samples of the present application may come from a large-scale industrial dataset built for self-supervised pre-training from public as well as private unlabeled industrial data. To ensure data diversity, redundant data with high similarity can be removed by comparing similarity in the feature space.
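For illustration only, this redundancy removal could be sketched as follows; the sketch assumes L2-normalized feature vectors produced by some existing encoder, and the 0.95 threshold is a hypothetical choice rather than a value given by the present application.

```python
# Minimal sketch of feature-space deduplication; assumes L2-normalized
# embeddings (one row per sample) from an existing encoder. The 0.95
# threshold is an illustrative assumption, not a value from the application.
import numpy as np

def deduplicate(embeddings: np.ndarray, threshold: float = 0.95) -> list:
    """Greedily keep a sample only if its cosine similarity to every
    already-kept sample stays below the threshold."""
    kept = []
    for i in range(len(embeddings)):
        if all(float(embeddings[i] @ embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```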
Therefore, the sample size is increased by performing data augmentation on the input samples from the industrial scene, so that pre-training can be achieved without using network samples that differ greatly from the industrial scene; the pre-training effect is better, and fewer iteration rounds are needed in subsequent training adjustment.
When data augmentation is performed, data augmentation is performed on an input sample using a first augmentation parameter to generate a first augmentation sample different from the input sample, data augmentation is performed on the input sample using a second augmentation parameter to generate a second augmentation sample different from the input sample, and the first and second augmentation samples are different due to the different first and second augmentation parameters.
Different augmented samples of the same input sample form a positive sample pair, while negative sample pairs are formed between different input samples, which facilitates the subsequent self-supervised pre-training.
Optionally, the first and second augmentation parameters comprise at least one of color dithering, random graying, gaussian blurring, and exposure, the first and second augmentation parameters being different.
Color dithering is where the saturation, brightness, contrast, etc. of the original sample are randomly adjusted to produce a new image.
Random graying converts the original sample to grayscale at random, for example by randomly using different grayscale parameters, randomly adjusting the grayscale of the original sample, or randomly choosing whether to apply graying at all.
Gaussian blur, also known as Gaussian smoothing, is commonly used to reduce image noise and the level of detail. The blurring technique produces an image that looks as if viewed through frosted glass. Gaussian smoothing is also used in the preprocessing stage of computer vision algorithms to enhance image structures at different scales.
Exposure refers to performing exposure treatment on an original sample, for example, adjusting the exposure degree of the original sample to obtain samples with different exposure degrees.
Optionally, the first and second augmentation parameters differ in the probability of applying at least one of Gaussian blur and exposure: the probability of applying Gaussian blur may differ between the first and second augmentation parameters; the probability of applying exposure may differ; or the probabilities of applying both Gaussian blur and exposure may differ, as in the sketch below.
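By way of illustration, two augmentation pipelines with different parameters could be organized as follows. This is a minimal sketch assuming torchvision; the specific probabilities and jitter strengths are illustrative assumptions, and "exposure" is approximated here by solarization.

```python
# Sketch of two augmentation pipelines with different parameters; assumes
# torchvision, and "exposure" is approximated by solarization. All
# probabilities and jitter strengths are illustrative assumptions.
from torchvision import transforms

def make_pipeline(blur_p: float, solarize_p: float) -> transforms.Compose:
    return transforms.Compose([
        transforms.ColorJitter(brightness=0.4, contrast=0.4,
                               saturation=0.4, hue=0.1),      # color dithering
        transforms.RandomGrayscale(p=0.2),                    # random graying
        transforms.RandomApply(
            [transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))],
            p=blur_p),                                        # Gaussian blur
        transforms.RandomSolarize(threshold=128, p=solarize_p),  # "exposure"
        transforms.ToTensor(),
    ])

# The two pipelines differ in the probability of Gaussian blur and exposure.
first_augment = make_pipeline(blur_p=1.0, solarize_p=0.0)
second_augment = make_pipeline(blur_p=0.1, solarize_p=0.2)
```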
Optionally, since the present application performs self-supervised pre-training, the input samples may be unlabeled samples, and no manual labeling of the samples is required.
Specifically, the first augmentation parameters corresponding to the respective input samples are the same. Each input sample undergoes data augmentation based on the same first augmentation parameters; for example, if those parameters specify color dithering, random graying, and Gaussian blur, then color dithering, random graying, and Gaussian blur are applied to each input sample, thereby generating the first augmentation sample.
Likewise, the second augmentation parameters corresponding to the respective input samples are the same. Each input sample undergoes data augmentation based on the same second augmentation parameters; for example, if those parameters specify color dithering, random graying, and exposure, then color dithering, random graying, and exposure are applied to each input sample, thereby generating the second augmentation sample.
It can be understood that when data augmentation is performed on each input sample using the augmentation parameters, taking the first augmentation parameter as an example, the specific parameters of color dithering, random graying, and Gaussian blur can still vary randomly (for example, the blur parameters of Gaussian blur change randomly), so the diversity of data augmentation is still ensured, the diversity of samples is improved, and the generalization capability of the model is further improved.
Optionally, the first augmentation parameters corresponding to the respective input samples are at least partially different, and the second augmentation parameters corresponding to the respective input samples are at least partially different.
Specifically, the first augmentation parameters corresponding to the respective input samples are at least partially different. For each input sample, data augmentation may be performed based on a different first augmentation parameter. For each input sample, data augmentation may be performed based on a different second augmentation parameter. And the specific parameters in the first augmentation parameters and the second augmentation parameters can also be changed randomly (such as the fuzzy parameters of Gaussian blur are changed randomly), so that the diversity of data augmentation is further improved, the diversity of samples is improved, and the generalization capability of a model is further improved.
Step 012: the first augmented sample is input to a first training branch of the pre-training model to output first characteristic information, and the second augmented sample is input to a second training branch of the pre-training model to output second characteristic information.
Specifically, after the first and second augmentation samples of the input samples are generated, the first augmentation sample may be input to the first training branch of the pre-training model to output the first feature information, and the second augmentation sample may be input to the second training branch of the pre-training model to output the second feature information.
Wherein the first training branch comprises a feature encoder and a multi-layer perceptron. The first augmented sample is encoded by a feature encoder and then passes through a multi-layer perceptron to obtain a final abstract representation (i.e., first feature information), often represented as a one-dimensional vector.
The second training branch also includes a feature encoder and a multi-layer perceptron. The second augmented sample is encoded by the feature encoder and then passed through the multi-layer perceptron to obtain a final abstract representation (i.e., second feature information), often represented as a one-dimensional vector.
The perceptron is a linear binary classification model: its input is the feature vector of an instance and its output is the class of the instance, with the positive class taking 1 and the negative class taking -1.
The differences between the first training branch and the second training branch include the following two points:
(1) The model parameters of the first training branch and the second training branch are different, and the model parameters of the second training branch are determined according to the model parameters of the first training branch, that is, the model parameters of the first training branch are updated first when model training is performed, and then the model parameters of the second training branch are updated according to the model parameters of the first training branch.
(2) The number of perceptron layers of the first training branch is set to be greater than that of the second training branch. This yields an asymmetric structure, which facilitates contrastive learning and helps avoid model collapse; a minimal sketch of such a structure is given below.
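The following sketch assumes PyTorch and a ResNet-18 feature encoder; all layer sizes are illustrative assumptions rather than values fixed by the present application. The first branch carries an extra predictor MLP, so it has more perceptron layers than the second branch.

```python
# Minimal sketch of the asymmetric two-branch structure; PyTorch and a
# ResNet-18 encoder are assumed, with illustrative layer sizes.
import copy
import torch.nn as nn
from torchvision.models import resnet18

def mlp(in_dim: int, hidden: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, hidden),
                         nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
                         nn.Linear(hidden, out_dim))

encoder = resnet18()
encoder.fc = nn.Identity()                      # backbone as feature encoder

first_branch = nn.Sequential(encoder,
                             mlp(512, 1024, 256),    # projector
                             mlp(256, 1024, 256))    # extra predictor
second_branch = nn.Sequential(copy.deepcopy(encoder),
                              mlp(512, 1024, 256))   # projector only

for p in second_branch.parameters():   # the second branch receives no
    p.requires_grad = False            # gradients; it is updated by a
                                       # moving average (sketched later)
```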
Moreover, in industrial settings, defects in defect samples tend to be shallow, weak in contrast, and small, and different defect samples bear some similarity to one another. The contrastive learning approach helps distinguish good samples from defective samples in the data, as well as different defect forms, so that a high-quality industrial visual representation is learned.
Step 013: determining feature loss information according to the first feature information and the second feature information.
Specifically, after the first training branch extracts the first feature information and the second training branch extracts the second feature information, since the first augmentation sample and the second augmentation sample form a positive pair, the first and second feature information are features of the same input sample, so the feature loss information can be determined by comparing them.
The feature loss information can be used to characterize the accuracy of the model's feature extraction. It can be understood that the larger the feature loss (e.g., the larger the difference between the first and second feature information), the larger the difference between the features extracted by the different training branches, indicating worse feature-extraction accuracy; the smaller the feature loss (e.g., the smaller the difference between the first and second feature information), the smaller the difference between the features extracted by the different training branches, indicating higher feature-extraction accuracy.
Step 014: updating model parameters of the first training branch and the second training branch according to the feature loss information to generate model pre-training parameters of the pre-training model.
Specifically, after the feature loss information is obtained, the model parameters of the first training branch and the second training branch can be updated according to it. The lower the feature-extraction accuracy characterized by the feature loss information (e.g., the greater the difference between the first and second feature information), the larger the adjustment to the model parameters of the first and second training branches, and vice versa; the model parameters are thus updated through back-propagation.
After the model parameters of the first training branch and the second training branch are updated, the model pre-training parameters of the pre-training model can be generated, for example by taking the model parameters of the first training branch as the model pre-training parameters. The model can thus be pre-trained through self-supervised contrastive learning.
Initializing the downstream task model with the pre-training parameters effectively improves the performance of the downstream task; the improvement is especially notable in small-sample application scenarios.
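For illustration, exporting the first branch's encoder weights as the model pre-training parameters and loading them into a downstream model might look as follows; `encoder` refers to the sketch above, and the file name and the number of defect classes are hypothetical.

```python
# Illustrative export of the pre-training parameters and downstream
# initialization; file name and class count are assumptions.
import torch
from torchvision.models import resnet18

torch.save(encoder.state_dict(), "pretrain_params.pt")

downstream = resnet18(num_classes=5)            # e.g. 5 defect categories
state = torch.load("pretrain_params.pt")
downstream.load_state_dict(state, strict=False) # classifier head stays
                                                # randomly initialized for
                                                # fine-tuning
```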
The pre-training model can be determined according to the requirements of the model, and if the detection efficiency of the model is required to be high, a lightweight model with high detection efficiency can be selected as the pre-training model; alternatively, a deep learning model with higher detection accuracy may be selected as the pre-training model if higher detection accuracy of the model is required.
According to the model pre-training method, the input samples are subjected to data augmentation using different augmentation parameters, so that a first augmentation sample and a second augmentation sample, each different from the input samples, are generated, yielding more samples in the current industrial scene for subsequent model pre-training. The information contained in the input samples is thus fully mined through data augmentation, so that the general visual representation of the industrial scene is better learned and the performance of the model in downstream tasks with limited annotation data is improved.
Moreover, the first augmentation sample and the second augmentation sample corresponding to the same input sample form a positive sample pair, while samples from different input samples form negative pairs; the first and second augmentation samples forming a positive pair are input to different training branches, and self-supervised pre-training is realized by comparing the feature losses of the different branches.
Referring to fig. 3, in some embodiments, the model pre-training method further comprises:
step 015: each original sample is preprocessed to generate input samples of the same size, respectively, the preprocessing including at least one of cropping and downsampling.
Specifically, original samples in industrial scenes are diverse and may differ in size, so to improve the pre-training effect, the original samples can first be preprocessed to generate input samples of the same size. All training samples then share the same size: if features extracted from samples of different sizes were compared for contrastive learning, the feature loss information would be affected by the size difference, its accuracy could be lower, and the pre-training efficiency and effect would suffer; with same-size samples, the feature loss information is unaffected by size, its accuracy is higher, and the pre-training efficiency and effect are improved.
The preprocessing may include at least one of cropping and downsampling. For example, the different original samples may be cropped (e.g., around the center of the original sample) to obtain input samples of a preset size (e.g., 224×224), while ensuring the main features of the original sample are not cropped away; or the different original samples may be downsampled to obtain input samples of a predetermined size (e.g., 224×224); or the different original samples may be both cropped and downsampled to obtain input samples of a predetermined size (e.g., 224×224).
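A minimal sketch of this preprocessing, assuming PIL, might look as follows; the 224×224 target size and the center-crop-then-downsample order are examples consistent with the text, not fixed requirements.

```python
# Minimal preprocessing sketch; assumes PIL. Size and crop order are
# illustrative, not fixed requirements.
from PIL import Image

def preprocess(path: str, size: int = 224) -> Image.Image:
    img = Image.open(path).convert("RGB")
    side = min(img.size)                      # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))  # cropping
    return img.resize((size, size), Image.BILINEAR)       # downsampling
```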
Referring to fig. 4, in certain embodiments, step 013: determining feature loss information according to the first feature information and the second feature information, including:
step 0131: according to the first characteristic information and the second characteristic information, calculating cosine similarity;
step 0132: calculating cosine similarity loss according to the cosine similarity, and taking the cosine similarity loss as characteristic loss information; alternatively, the cross entropy loss is calculated from the cosine similarity as the feature loss information.
Specifically, when calculating the feature loss information, the cosine similarity of the first feature information and the second feature information can be calculated first, and then cosine similarity loss (for example, cosine similarity loss=1-cosine similarity) is calculated through the cosine similarity, so as to be used as the feature loss information; alternatively, cosine similarity is used to calculate cross entropy loss (e.g., using cosine similarity and cross entropy loss function to calculate cross entropy loss) as the feature loss information.
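The two loss options can be sketched as follows, assuming PyTorch; the cross-entropy variant shown here treats each matching pair as the positive class over a batch of cosine similarities, which is one common reading rather than a form fixed by the application, and the temperature value is an illustrative assumption.

```python
# Sketch of the two loss options; assumes PyTorch. z1/z2 are batches of
# first/second feature vectors (batch x dim).
import torch
import torch.nn.functional as F

def cosine_similarity_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    cos = F.cosine_similarity(z1, z2, dim=-1)   # per-sample similarity
    return (1.0 - cos).mean()                   # loss = 1 - cosine similarity

def cross_entropy_loss(z1: torch.Tensor, z2: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # batch x batch similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # diagonal entries
    return F.cross_entropy(logits, targets)     # are the positive pairs
```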
Cross entropy (Cross Entropy) is an indicator for measuring the difference between two probability distributions. In machine learning, we often need to compare the probability distribution predicted by the model with the probability distribution of the real labels. Cross entropy measures the prediction accuracy of the model by computing the difference between the two distributions: it reaches its minimum value of 0 when the two distributions are identical, and grows toward its maximum as the two distributions become completely different. Therefore, we want to optimize the model so that the cross entropy is as close to 0 as possible, improving its performance.
Cosine similarity (Cosine Similarity) is an indicator for measuring the similarity between two vectors; in natural language processing, text is usually represented as vectors, with each dimension representing a feature or word. Cosine similarity measures how similar two vectors are by computing the angle between them: when the angle is 0 degrees, the cosine similarity takes its maximum value of 1, indicating the two vectors are identical; when the angle is 90 degrees, the cosine similarity takes its minimum value of 0, indicating the two vectors are completely different. Cosine similarity can thus help us judge the similarity or correlation between texts.
Note that when calculating the cross entropy, the probability distributions should be smoothed and normalized. Also, cosine similarity considers only the direction of the vectors and ignores their length, so it may be inaccurate when the vector lengths differ greatly; furthermore, cosine similarity assumes the relationship between vectors is linear and may not be applicable to nonlinearly related data. In this manner, the feature loss information can be calculated in different ways to accommodate different industrial scenarios.
Referring to fig. 5, in certain embodiments, step 014: updating model parameters of the first training branch and the second training branch according to the feature loss information, including:
step 0141: updating model parameters of the first training branch according to the characteristic loss information;
step 0142: and updating the model parameters of the second training branch according to the moving average of the model parameters of the first training branch over consecutive rounds of training.
Specifically, when the model parameters are updated, the model parameters of the first training branch can be updated by back-propagating the feature loss information, so the corresponding first-branch model parameters are obtained in each round of training; after multiple rounds of training, the model parameters of the second training branch can be updated according to the moving average of the model parameters of the first training branch over consecutive rounds. For example, after the current round of training is completed, the model parameters of the second training branch for the current round can be obtained as the moving average of the first branch's model parameters from the current round and the previous N rounds (e.g., N is 1, 2, 3, etc.).
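A sketch of this moving-average update, assuming the PyTorch branches from the earlier sketch, is given below; the momentum value 0.99 is an illustrative assumption. Because the two branches share the encoder-plus-projector layout, pairing parameters in order leaves the first branch's extra predictor layers out of the average.

```python
# Sketch of the moving-average update of the second branch; the 0.99
# momentum is an illustrative assumption.
import torch

@torch.no_grad()
def momentum_update(first_branch, second_branch, momentum: float = 0.99):
    """second <- momentum * second + (1 - momentum) * first. zip stops at
    the second branch's last parameter, so the first branch's extra
    predictor layers are excluded from the average."""
    for p_first, p_second in zip(first_branch.parameters(),
                                 second_branch.parameters()):
        p_second.data.mul_(momentum).add_(p_first.data, alpha=1.0 - momentum)

# One illustrative training step combining the earlier sketches:
#   z1, z2 = first_branch(first_views), second_branch(second_views)
#   loss = cosine_similarity_loss(z1, z2)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()  # first branch
#   momentum_update(first_branch, second_branch)              # second branch
```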
Referring to fig. 6, in order to better implement the model pre-training method according to the embodiment of the present application, the embodiment of the present application further provides a model pre-training apparatus 10. The model pre-training apparatus 10 may include a data augmentation module 11, a feature extraction module 12, a loss calculation module 13, and a parameter update module 14.
The data augmentation module 11 is configured to perform data augmentation on the input sample using a first augmentation parameter to generate a first augmentation sample different from the input sample, and perform data augmentation on the input sample using a second augmentation parameter to generate a second augmentation sample different from the input sample, where the first augmentation sample and the second augmentation sample are different; the feature extraction module 12 is configured to input a first augmentation sample to a first training branch of the pre-training model to output first feature information, and input a second augmentation sample to a second training branch of the pre-training model to output second feature information; the loss calculation module 13 is configured to determine feature loss information according to the first feature information and the second feature information; the parameter updating module 14 is configured to update model parameters of the first training branch and the second training branch according to the feature loss information, so as to generate model pre-training parameters of the pre-training model.
In certain embodiments, the model pre-training apparatus 10 further comprises a pre-processing module 15. The preprocessing module 15 is configured to perform preprocessing on each original sample to generate input samples with the same size, where the preprocessing includes at least one of clipping and downsampling.
In some embodiments, the loss calculation module 13 is specifically configured to calculate the cosine similarity according to the first feature information and the second feature information; calculating cosine similarity loss according to the cosine similarity, and taking the cosine similarity loss as characteristic loss information; alternatively, the cross entropy loss is calculated from the cosine similarity as the feature loss information.
In some embodiments, the parameter updating module 14 is configured to update the model parameters of the first training branch according to the feature loss information, and to update the model parameters of the second training branch according to the moving average of the model parameters of the first training branch over consecutive rounds of training.
The model pre-training device 10 is described above in connection with the accompanying drawings from the perspective of functional modules, which may be implemented in hardware, instructions in software, or a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware encoding processor or implemented by a combination of hardware and software modules in the encoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
The computer device of an embodiment of the present application comprises a processor, a memory, and a computer program, wherein the computer program is stored in the memory and executed by the processor, the computer program comprising instructions for performing the model pre-training method of any of the embodiments described above.
Alternatively, the computer device may be any device having image processing capabilities, such as a server or terminal device (e.g., a cell phone, tablet, display device, notebook, smart watch, head display device, gaming machine, etc.).
Referring to fig. 7, the embodiment of the present application further provides a computer readable storage medium 300, on which a computer program 310 is stored, where the computer program 310, when executed by the processor 320, implements the steps of the model pre-training method of any of the foregoing embodiments, which are not described herein for brevity.
In the description of the present specification, reference to the terms "certain embodiments," "in one example," "illustratively," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiments or examples is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method of model pre-training, comprising:
performing data augmentation on an input sample using a first augmentation parameter to generate a first augmentation sample different from the input sample, performing data augmentation on the input sample using a second augmentation parameter to generate a second augmentation sample different from the input sample, the first and second augmentation parameters being different;
inputting the first augmented sample to a first training branch of a pre-training model to output first characteristic information, and inputting the second augmented sample to a second training branch of the pre-training model to output second characteristic information;
determining feature loss information according to the first feature information and the second feature information;
and updating model parameters of the first training branch and the second training branch according to the characteristic loss information so as to generate model pre-training parameters of the pre-training model.
2. The model pre-training method of claim 1, wherein the first and second augmentation parameters comprise at least one of color dithering, random graying, Gaussian blur, and exposure.
3. The model pre-training method of claim 1, wherein the input samples are unlabeled samples, the first augmentation parameters corresponding to each of the input samples are the same, and the second augmentation parameters corresponding to each of the input samples are the same; or, the first augmentation parameters corresponding to the input samples are at least partially different, and the second augmentation parameters corresponding to the input samples are at least partially different.
4. The model pre-training method of claim 2, further comprising:
each original sample is preprocessed to generate the input samples of the same size, respectively, the preprocessing including at least one of clipping and downsampling.
5. The model pre-training method of claim 1, wherein the determining feature loss information from the first feature information and the second feature information comprises:
calculating cosine similarity according to the first characteristic information and the second characteristic information;
calculating cosine similarity loss according to the cosine similarity, and taking the cosine similarity loss as the characteristic loss information; or calculating cross entropy loss according to the cosine similarity as the characteristic loss information.
6. The model pre-training method according to claim 1 or 5, wherein the updating model parameters of the first training branch and the second training branch according to the feature loss information comprises:
updating model parameters of the first training branch according to the characteristic loss information;
and updating the model parameters of the second training branch according to the moving average of the model parameters of the first training branch over consecutive rounds of training.
7. The model pre-training method of claim 1, wherein the number of perceptron layers of the first training branch is greater than the number of perceptron layers of the second training branch.
8. A model pre-training device, characterized in that the model pre-training device comprises:
a data augmentation module for data augmentation of an input sample using a first augmentation parameter to generate a first augmentation sample different from the input sample, and data augmentation of the input sample using a second augmentation parameter to generate a second augmentation sample different from the input sample, the first and second augmentation samples being different;
the feature extraction module is used for inputting the first augmentation sample to a first training branch of the pre-training model to output first feature information, and inputting the second augmentation sample to a second training branch of the pre-training model to output second feature information;
the loss calculation module is used for determining feature loss information according to the first feature information and the second feature information;
and the parameter updating module is used for updating the model parameters of the first training branch and the second training branch according to the characteristic loss information so as to generate model pre-training parameters of the pre-training model.
9. A computer device, comprising:
a processor, a memory; and
A computer program, wherein the computer program is stored in the memory and executed by the processor, the computer program comprising instructions for performing the model pre-training method of any of claims 1 to 7.
10. A non-volatile computer-readable storage medium containing a computer program which, when executed by a processor, causes the processor to perform the model pre-training method of any of claims 1-7.
CN202311871388.8A 2023-12-29 2023-12-29 Model pre-training method, model pre-training device, computer device, and storage medium Pending CN117788979A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311871388.8A CN117788979A (en) 2023-12-29 2023-12-29 Model pre-training method, model pre-training device, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN117788979A true CN117788979A (en) 2024-03-29

Family

ID=90387160

Country Status (1)

Country Link
CN (1) CN117788979A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination