CN114693950B - Training method and device of image feature extraction network and electronic equipment


Info

Publication number
CN114693950B
Authority
CN
China
Prior art keywords
image
image data
mask
sample image
partitions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210431311.8A
Other languages
Chinese (zh)
Other versions
CN114693950A (en)
Inventor
杨喜鹏
李莹莹
谭啸
孙昊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210431311.8A
Publication of CN114693950A
Application granted
Publication of CN114693950B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The disclosure provides a training method and apparatus for an image feature extraction network, and an electronic device. It relates to the technical field of artificial intelligence, in particular to the fields of computer vision, image processing, deep learning, and the like, and can be applied to scenes such as smart cities. The method includes: acquiring sample image data and true value labels of the sample image data; performing region division on the sample image data to obtain sample image data comprising a plurality of partitions; inputting the sample image data comprising a plurality of partitions into an image feature extraction network to obtain image features comprising the plurality of partitions; performing mask processing on at least one partition of the image features to obtain mask-processed image features; predicting based on the mask-processed image features to obtain a current prediction result of the sample image data; and adjusting parameters of the image feature extraction network according to the current prediction result and the true value label of the sample image data. The present disclosure thereby enables training of an image feature extraction network.

Description

Training method and device of image feature extraction network and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, image processing, deep learning and the like, and can be applied to scenes such as smart cities and the like.
Background
Identifying and classifying the domain to which image data belongs according to image features is a common image processing task. For such tasks, an image feature extraction network is typically trained to extract features of an entire image, and the domain classification of the image is then discriminated based on the extracted image features.
Disclosure of Invention
The disclosure provides a training method and device for an image feature extraction network and electronic equipment.
According to an aspect of the present disclosure, there is provided a training method of an image feature extraction network, including:
acquiring sample image data and true value labeling of the sample image data;
performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
inputting the sample image data comprising a plurality of subareas into an image feature extraction network to obtain image features comprising a plurality of subareas;
performing mask processing on at least one partition in the image features comprising a plurality of partitions to obtain mask processed image features;
predicting based on the image characteristics after mask processing to obtain a current prediction result of the sample image data;
and adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
According to another aspect of the present disclosure, there is provided another training method of an image feature extraction network, including:
acquiring sample image data and true value labeling of the sample image data;
performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
performing mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain sample image data subjected to mask processing;
inputting the sample image data processed by the mask into an image feature extraction network to obtain image features comprising a plurality of partitions;
predicting based on the image characteristics comprising a plurality of partitions to obtain a current prediction result of the sample image data;
and adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
According to another aspect of the present disclosure, there is provided an image processing method including:
extracting image features of an image to be processed by utilizing a pre-trained image feature extraction network; and determining a prediction result of the image to be processed based on the image features of the image to be processed, wherein the image feature extraction network is obtained through training by any one of the above training methods of an image feature extraction network.
According to another aspect of the present disclosure, there is provided a training apparatus of an image feature extraction network, including:
the first sample image acquisition module is used for acquiring sample image data and true value labels of the sample image data;
the first region dividing module is used for dividing the region of the sample image data to obtain sample image data comprising a plurality of partitions;
the first image feature obtaining module is used for inputting the sample image data comprising a plurality of subareas into an image feature extraction network to obtain image features comprising a plurality of subareas;
the second image feature obtaining module is used for carrying out mask processing on at least one partition in the image features comprising a plurality of partitions to obtain image features after mask processing;
the first image prediction module is used for predicting based on the image characteristics after mask processing to obtain a current prediction result of the sample image data;
and the first parameter adjustment module is used for adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
According to another aspect of the present disclosure, there is provided a training apparatus of another image feature extraction network, including:
the second sample image acquisition module is used for acquiring sample image data and true value labels of the sample image data;
the second region dividing module is used for dividing the region of the sample image data to obtain sample image data comprising a plurality of partitions;
the image data obtaining module is used for carrying out mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain sample image data after mask processing;
the third image feature obtaining module is used for inputting the sample image data processed by the mask into an image feature extraction network to obtain image features comprising a plurality of partitions;
the second image prediction module is used for predicting based on the image characteristics comprising a plurality of partitions to obtain a current prediction result of the sample image data;
and the second parameter adjustment module is used for adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
According to another aspect of the present disclosure, there is provided an image processing apparatus including:
the image prediction result determining module is used for extracting image features of the image to be processed by utilizing a pre-trained image feature extraction network, and determining a prediction result of the image to be processed based on the image features of the image to be processed, wherein the image feature extraction network is obtained through training by any one of the above training apparatuses of an image feature extraction network.
According to the training method of the image feature extraction network, firstly, sample image data and true value labels of the sample image data are obtained, and the sample image data are subjected to regional division to obtain the sample image data comprising a plurality of subareas. Sample image data comprising a plurality of partitions is then input into an image feature extraction network to obtain image features comprising a plurality of partitions. And performing mask processing on at least one partition in the image features comprising a plurality of partitions to obtain mask processed image features, predicting based on the mask processed image features to obtain a current prediction result of the sample image data, and adjusting parameters of the image feature extraction network according to the current prediction result and true value labeling of the sample image data, so that training of the image feature extraction network is realized.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a training method of a first image feature extraction network provided in the present disclosure;
FIG. 2 is one possible embodiment of step S14 provided by the present disclosure;
FIG. 3a is one possible embodiment of step S15 provided by the present disclosure;
FIG. 3b is an exemplary diagram of a process for deriving image features to be analyzed provided by the present disclosure;
FIG. 4 is a flow chart of a training method of a second image feature extraction network provided by the present disclosure;
FIG. 5 is one possible embodiment of step S43 provided by the present disclosure;
FIG. 6 is one possible embodiment of step S45 provided by the present disclosure;
FIG. 7 is a schematic diagram of a training device of a first image feature extraction network provided by the present disclosure;
FIG. 8 is a schematic diagram of a training device of a second image feature extraction network provided by the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a training method of an image feature extraction network of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the prior art, an image feature extraction network is generally used to extract image features of the whole sample image, and domain classification and discrimination are performed on the image based on the extracted features. Although this training approach is simple, it has certain limitations. For an image with complex features, the feature information of the whole image is redundant and voluminous; extracting and recognizing features of the entire image makes recognition complex and difficult, which reduces recognition efficiency. Furthermore, when multiple complex images must be recognized simultaneously, or multiple image recognition tasks must be processed at once (for example, style conversion between images, or domain-adaptive object detection and segmentation), performing whole-image feature recognition one by one is inefficient and makes it difficult to ensure that multiple tasks are learned simultaneously.
To solve at least one of the above problems, the present disclosure provides a training method of an image feature extraction network, including:
acquiring sample image data and true value labeling of the sample image data;
performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
inputting the sample image data comprising a plurality of subareas into an image feature extraction network to obtain image features comprising a plurality of subareas;
performing mask processing on at least one partition in the image features comprising a plurality of partitions to obtain mask processed image features;
predicting based on the image characteristics after mask processing to obtain a current prediction result of the sample image data;
and adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
As can be seen from the above, the training method of the image feature extraction network provided by the present disclosure obtains the prediction result based on the mask-processed image features rather than on all image features of the sample image data; the result is compared with the true value label of the sample image data, and the parameters of the image feature extraction network are adjusted accordingly. Repeating this process trains the image feature extraction network, increases the model's domain discrimination capability and generalization, and reduces the over-fitting problem caused by information redundancy.
The training method of the image feature extraction network provided by the present disclosure is described in detail below through specific embodiments.
The method of the embodiments of the present disclosure is applied to and may be implemented by an intelligent terminal; in practice, the intelligent terminal may be a computer, a mobile phone, or the like.
Referring to fig. 1, fig. 1 is a flowchart of a training method of a first image feature extraction network provided in the present disclosure, including:
step S11: and acquiring sample image data and true value labeling of the sample image data.
The sample image data is known image data used for training the image feature extraction network; its image features may be complex or simple. Based on these image features, the sample image data has been given a true value label in advance, which represents the classification information of the sample image data and serves as reference data against which training results of the image feature extraction network are compared.
The content of the truth value annotation can be determined according to the actual application scene of the image feature extraction network, for example, in the scene of object detection, the truth value annotation of the sample image data can be the target frame of the object; in a scene of image scene recognition, the truth labels of the sample image data may be scene types.
Step S12: and carrying out region division on the sample image data to obtain sample image data comprising a plurality of partitions.
The sample image data is divided into regions, that is, the complete sample image data is divided into sample image data composed of a plurality of image regions, and the regions in the obtained sample image data are called partitions.
The region division of the sample image data is random: the number of regions may be chosen at random and the sample image data divided equally according to the chosen number; alternatively, the sample image data may be divided randomly according to gray scale, color, texture, shape, and the like.
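As an illustrative sketch only (this code is not part of the patent; the grid-based equal division and all names are assumptions, one possible reading of the random equal division described above), the partitioning might look like:

```python
import torch

def divide_into_partitions(image: torch.Tensor, grid: int) -> torch.Tensor:
    """Equally divide a (C, H, W) image into grid x grid partitions.

    Returns (grid * grid, C, H // grid, W // grid); H and W are assumed
    to be divisible by `grid` for simplicity.
    """
    c, h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = image.unfold(1, ph, ph).unfold(2, pw, pw)  # (C, grid, grid, ph, pw)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, ph, pw)

# Example: a 3 x 224 x 224 image divided into 4 x 4 = 16 partitions.
patches = divide_into_partitions(torch.rand(3, 224, 224), grid=4)
```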
Step S13: and inputting the sample image data comprising a plurality of subareas into an image feature extraction network to obtain image features comprising a plurality of subareas.
The sample image data after region division comprises a plurality of subareas, and the image characteristics comprising the plurality of subareas can be obtained by inputting the sample image data into an image characteristic extraction network, so that the characteristics of each subarea in the sample image data can be represented.
In one example, the image feature extraction network may be a ResNet (residual network) such as ResNet34, ResNet50, or ResNet101, or a DarkNet (a deep learning framework) network such as DarkNet19 or DarkNet53. Specifically, an image feature extraction network of appropriate size may be selected according to the application scenario: for example, ResNet19, ResNet34, or DarkNet19 for lightweight scenarios; ResNet50, ResNeXt50, or DarkNet53 for medium-sized scenarios; and ResNet101 or ResNeXt152 for heavyweight scenarios.
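For illustration only (the patent does not prescribe an implementation; removing the classification head and reading the 7 x 7 output cells as per-partition features are assumptions), a torchvision backbone might be adapted like this:

```python
import torch
import torchvision.models as models

# Medium-sized backbone with the classification head removed; the 7 x 7
# spatial cells of the output feature map can be treated as partitions.
backbone = models.resnet50(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

features = feature_extractor(torch.rand(1, 3, 224, 224))  # (1, 2048, 7, 7)
```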
Step S14: and carrying out mask processing on at least one partition in the image features comprising the partitions to obtain the image features after mask processing.
A mask covers original image features with predetermined mask image data, thereby blocking partial areas of the image features. The mask may be a predetermined two-dimensional matrix array, a predetermined multi-value image, or the like. This disclosure does not limit the specific manner of masking.
In the embodiment of the disclosure, at least one partition of the image features including the plurality of partitions is subjected to masking, that is, the image features of one or more partitions are selected from the extracted image features including the plurality of partitions and subjected to masking, so that the image features of one or more partitions in the image features after masking are covered, and the image features of one or more partitions remaining unmasked are uncovered.
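A minimal sketch of this step, assuming the partitioned features are stored as a (P, D) tensor and zeros serve as the predetermined mask value (both assumptions for illustration):

```python
import torch

def mask_feature_partitions(features: torch.Tensor, mask_idx: list[int]) -> torch.Tensor:
    """Cover the selected partitions of (P, D) image features; the
    unselected partitions remain uncovered."""
    masked = features.clone()
    masked[mask_idx] = 0.0  # the zero mask value is an illustrative choice
    return masked

masked_feats = mask_feature_partitions(torch.rand(16, 256), mask_idx=[1, 4, 6, 13])
```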
Step S15: and predicting based on the image characteristics after mask processing to obtain the current prediction result of the sample image data.
As mentioned above, mask processing covers one or more partitions of the image features comprising a plurality of partitions, so that the image features of those partitions are blocked. Predicting based on the mask-processed image features therefore means obtaining the prediction result for the sample image data from the image features of the one or more partitions that are not masked, i.e., not covered.
Step S16: and adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
The content of the prediction result is determined according to the training purpose of the image feature extraction network, and corresponds to the true value label of the sample image data, for example, if the training purpose of the image feature extraction network is to perform feature extraction on an image so as to determine the domain to which the image belongs, the true value label of the sample image data is the domain to which the sample image data belongs, and the prediction result is the prediction for the domain to which the sample image data belongs.
And comparing the current prediction result with the true value label of the sample image data, and adjusting parameters of the image feature extraction network according to the difference between the current prediction result and the true value label. In one example, the parameters of the image feature extraction network are adjusted according to the current prediction result and the true value label of the sample image data, which may be that the loss of the current image feature extraction network is calculated according to the current prediction result and the true value label of the sample image data, the parameters of the image feature extraction network are adjusted according to the obtained loss, and then the sample image data is continuously processed by the image feature extraction network after the parameter adjustment, so that training of the image feature extraction network is continued.
In one embodiment of the present disclosure, when a preset third ending condition is satisfied, training of the image feature extraction network ends, and no further parameter adjustment or processing of the sample image data is performed. The preset third ending condition may be that the number of iterations of the image feature extraction network reaches a preset iteration threshold, that the loss of the image feature extraction network converges, or the like, as determined by actual requirements.
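Putting steps S13-S16 together, one hypothetical training step might look like the sketch below; the discriminator interface, the mean pooling used to aggregate unmasked partition features (one simple stand-in for the splicing described later), and the cross-entropy loss are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def train_step(feature_net, discriminator, optimizer,
               patches: torch.Tensor, label_true: torch.Tensor,
               mask_idx: list[int]) -> float:
    feats = feature_net(patches)                    # (P, D) per-partition features
    keep = torch.ones(feats.size(0), dtype=torch.bool)
    keep[mask_idx] = False                          # mask at least one partition
    pooled = feats[keep].mean(dim=0, keepdim=True)  # aggregate unmasked partitions
    pred = discriminator(pooled)                    # current prediction result
    loss = F.cross_entropy(pred, label_true)        # compare with true value label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                # adjust network parameters
    return loss.item()
```

This step would repeat, with fresh mask parameters as described below, until the preset ending condition (such as loss convergence) is met.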
As can be seen from the above, the training method of the image feature extraction network provided by the present disclosure obtains the prediction result based on the mask-processed image features rather than on all image features of the sample image data; the result is compared with the true value label of the sample image data, and the parameters of the image feature extraction network are adjusted accordingly. Repeating this process trains the image feature extraction network, increases the model's domain discrimination capability and generalization, and reduces the over-fitting problem caused by information redundancy.
In a possible implementation manner, as shown in fig. 2, the step S14 performs masking processing on at least one partition of the image features including multiple partitions to obtain a masked image feature, where the masking processing includes:
Step S21: generating mask parameters according to preset mask rules.
The preset mask rule is a predetermined processing rule for performing mask processing on the image feature, and may be a selection rule for selecting one or more partitions from the image feature including a plurality of partitions to perform mask processing on the partition. The mask parameters then represent the selected regions requiring masking for image features comprising a plurality of partitions.
In one example, the preset mask rule may be to randomly select a certain proportion of the regions of the image feature including the plurality of regions to perform the masking process, and the mask parameter indicates the regions that need to perform the masking process. For example, if the preset mask rule is to randomly select 30% or 50% of the partitions of the image feature for mask processing, then according to the total number of the partitions included in the image feature, a corresponding number of partitions needing mask processing are randomly selected according to a proportion of 30% or 50%, that is, mask parameters are determined.
In one example, the total number of the partitions included in the image features is 16, the preset masking rule is to randomly select 25% of the partitions of the image features for masking, and the partitions 1, 4, 6 and 13 are randomly selected as the partitions needing masking, that is, the masking parameters represent the partitions 1, 4, 6 and 13.
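A sketch of mask-parameter generation under the preset rule above (the function name is illustrative, not from the patent):

```python
import random

def generate_mask_params(num_partitions: int, ratio: float) -> list[int]:
    """Randomly pick the given proportion of partitions to mask."""
    k = max(1, int(num_partitions * ratio))  # mask at least one partition
    return sorted(random.sample(range(num_partitions), k))

# With 16 partitions and a 25% rule this might return, e.g., [1, 4, 6, 13].
print(generate_mask_params(16, 0.25))
```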
Step S22: and performing mask processing on at least one partition in the image features comprising the partitions by using the currently generated mask parameters to obtain the image features after mask processing.
And performing mask processing on one or more subareas in the image characteristics comprising a plurality of subareas by using the currently generated mask parameters to obtain the image characteristics after mask processing.
For example, if the preset mask rule is that 50% of the partitions of the image feature are selected for masking, and the total number of the partitions included in the image feature is 100, after the partitions included in the image feature are ordered by 1-100, 50 partitions are randomly selected, and the mask parameter represents the 50 selected partitions. Masking the selected 50 partitions. The remaining 50 partitions are not subjected to masking processing, and the image features after masking processing are obtained.
In one example, after generating the mask parameters, identification information may be added to the partition to be masked based on the mask parameters, indicating that the partition needs to be masked. And then, carrying out mask processing on the corresponding subareas with the identification information added in the image features of the subareas according to the identification information to obtain the image features after mask processing. In one example, the identification information herein may be index information.
After training the feature extraction network with the current mask, the new mask parameters may be used to obtain new mask processed image features based on the current image features, so as to train the feature extraction network again, and in one possible implementation, the method further includes: generating new mask parameters according to preset mask rules, and training a feature extraction network by utilizing the currently generated mask parameters until a preset first ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
In one example, after adjusting the parameters of the image feature extraction network according to the current prediction result and the true value labeling of the sample image data in step S16, the method further includes: generating new mask parameters according to preset mask rules, wherein the currently generated mask parameters are different from the previously generated mask parameters. And returning to the execution step S22 to continue execution until the preset first ending condition is met.
After the parameters of the image feature extraction network are adjusted according to the current prediction result and the true value label of the sample image data, the sample image data is further processed by the parameter-adjusted network so that its training continues. It will be appreciated that repeatedly training the parameter-adjusted network on identical image features yields no additional training benefit. In the training process described above, however, only the image features of the unmasked partitions of the sample image data are used to predict the sample image data and train the image feature extraction network.
Therefore, after the parameters of the image feature extraction network are adjusted according to the current prediction result and the true value label of the sample image data, new mask parameters are generated according to preset mask rules, mask processing is performed on the image features comprising a plurality of subareas based on the new mask parameters, and specifically, different mask parameters represent different subareas of the image features to be masked. For example, the preset mask rule is to select 50% of the partitions of the image feature for mask processing, the total number of the partitions included in the image feature is 100, the previously generated mask parameters are to select the partitions with the sequence numbers of 1-50 for mask processing after the partitions included in the image feature are ordered by 1-100, and the new mask parameters may be to select the partitions with the sequence numbers of 11-60 for mask processing.
The newly generated mask parameters are then used to mask one or more partitions of the image features again: partitions that satisfy the rule and differ from those previously masked are selected at random, yielding new mask-processed image features. The unmasked image features now differ from those of the previous masking round, and prediction based on them yields a new current prediction result for the sample image data. This process repeats until the preset first ending condition is met, after which no new mask parameters are generated and no further masking or prediction is performed; that is, training of the image feature extraction network with the sample image data stops.
The preset first ending condition is a preset training ending condition, and in one example, the preset first ending condition may be that the iteration number reaches a preset iteration number threshold; in one example, the preset first end condition may be that a non-repeating mask parameter cannot be generated.
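One way to realize both the non-repetition requirement and the first ending condition, sketched under the assumption that an iteration cap and exhaustion of fresh parameter sets are the two stopping triggers (both readings appear in the text above):

```python
import random

def iterate_mask_params(num_partitions: int, ratio: float, max_iters: int):
    """Yield non-repeating mask parameters until a preset first ending
    condition is met: an iteration cap, or failure to draw a fresh set."""
    seen = set()
    k = max(1, int(num_partitions * ratio))
    for _ in range(max_iters):
        params = tuple(sorted(random.sample(range(num_partitions), k)))
        if params in seen:  # a repeated draw ends training here (a simplification)
            break
        seen.add(params)
        yield list(params)
```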
As can be seen from the above, the training method of the image feature extraction network provided by the present disclosure masks at least one partition of the image features comprising a plurality of partitions using continuously generated new mask parameters, so that each round of mask-processed image features exposes different partitions for result prediction. On this basis, the image feature extraction network can be trained repeatedly with limited sample image data, which reduces the training computation while ensuring thorough training of the network and improving training efficiency.
In a possible embodiment, as shown in fig. 3a, the step S15 predicts based on the image features after the mask processing to obtain a prediction result of the sample image data, including:
step S31: extracting image features of each partition which is not subjected to mask processing from the image features subjected to mask processing to obtain image features of each target partition;
Step S32: splicing the target partition image features into image features to be analyzed;
step S33: and outputting the image characteristics to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
And after the image features are subjected to mask processing according to the mask parameters, extracting the image features of each partition which is not subjected to mask processing, namely the image features of each partition which is not blocked by the mask image, from the image features after the mask processing, so as to obtain the image features of each target partition.
And splicing the obtained image features of each target partition to obtain the image features to be analyzed, and inputting the image features to be analyzed into a discrimination network for analysis to obtain a prediction result of sample image data. The above-mentioned stitching of the image features of each target partition may be free stitching, and the stitching mode is not limited.
In one embodiment of the present disclosure, instead of extracting the image features of the unmasked partitions, the mask-processed image features may be output directly to the discrimination network for analysis. The input then contains both the blocked partition features that were masked and the unblocked partition features that were not; the discrimination network still analyzes and discriminates only the unblocked, unmasked partition features, but in this embodiment the separate extraction step is unnecessary, and a prediction result of the sample image data is still obtained.
As shown in fig. 3b, fig. 3b shows an exemplary diagram of a process for obtaining features of an image to be analyzed. In the figure, gray shielding areas are image features after mask processing, and all target image features obtained by extracting the image features which are not subjected to mask processing are spliced to obtain the image features to be analyzed.
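A sketch of steps S31-S33, assuming (P, D) partitioned features and flat concatenation as the "free stitching" (the discrimination-network name and interface are assumptions):

```python
import torch

def stitch_unmasked(features: torch.Tensor, mask_idx: list[int]) -> torch.Tensor:
    """Extract each target (unmasked) partition feature and splice them
    into a single image-feature-to-be-analyzed vector."""
    keep = torch.ones(features.size(0), dtype=torch.bool)
    keep[mask_idx] = False
    return features[keep].reshape(1, -1)  # (1, num_kept * D)

# pred = discrimination_net(stitch_unmasked(masked_feats, mask_idx))
```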
From the above, according to the training method of the image feature extraction network provided by the present disclosure, the image features of each partition which is not subjected to mask processing are spliced into the image features to be analyzed, and the image features are output to the discrimination network for analysis, so as to obtain the prediction result of the sample image data, that is, the discrimination network only needs to predict the image features of each partition which is not subjected to mask processing, and does not need to predict all the image features of the sample image data, thereby effectively reducing the calculation amount of the discrimination network and further improving the efficiency of network training.
Referring to fig. 4, fig. 4 is a flowchart of a training method of a second image feature extraction network provided in the present disclosure, including:
step S41: and acquiring sample image data and true value labeling of the sample image data.
Step S42: performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
Step S43: and carrying out mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain sample image data after mask processing.
A mask covers original image data with predetermined mask image data, thereby blocking partial areas of the image data. The mask may be a predetermined two-dimensional matrix array, a predetermined multi-value image, or the like. This disclosure does not limit the specific manner of masking.
In the embodiment of the disclosure, at least one partition in sample image data including a plurality of partitions is subjected to masking, that is, in the extracted sample image data including a plurality of partitions, sample image data of one or more partitions is selected for masking, one or more partitions in the sample image data after masking are covered, and one or more partitions remaining unmasked are uncovered.
Step S44: and inputting the sample image data processed by the mask into an image feature extraction network to obtain image features comprising a plurality of partitions.
The mask-processed sample image data includes one or more masked partitions and one or more unmasked partitions. Inputting such sample image data into an image feature extraction network yields image features comprising a plurality of partitions, which can represent the features of each partition of the sample image data. Specifically, the image feature extraction network may extract features only from the unblocked partitions that were not masked, in which case the extracted features include only the features of those partitions; or it may perform whole-image feature extraction on the mask-processed sample image data, in which case the extracted features cover both the masked and the unmasked partitions.
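The only structural difference from the first method is that the mask is applied to the pixel partitions before feature extraction. A minimal sketch, with zeros as the assumed mask value and the helper names hypothetical:

```python
import torch

def mask_image_partitions(patches: torch.Tensor, mask_idx: list[int]) -> torch.Tensor:
    """Cover selected partitions of (P, C, ph, pw) sample image data
    before the data is fed to the feature extraction network."""
    masked = patches.clone()
    masked[mask_idx] = 0.0  # the zero mask value is an illustrative choice
    return masked

# feats = feature_extractor(mask_image_partitions(patches, [1, 4, 6, 13]))
```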
In one example, the image feature extraction network may be a ResNet (residual network) such as ResNet34, ResNet50, or ResNet101, or a DarkNet (a deep learning framework) network such as DarkNet19 or DarkNet53. Specifically, an image feature extraction network of appropriate size may be selected according to the application scenario: for example, ResNet19, ResNet34, or DarkNet19 for lightweight scenarios; ResNet50, ResNeXt50, or DarkNet53 for medium-sized scenarios; and ResNet101 or ResNeXt152 for heavyweight scenarios.
Step S45: and predicting based on the image characteristics comprising a plurality of partitions to obtain a current prediction result of the sample image data.
The prediction is performed based on the image characteristics after mask processing, and the prediction result aiming at the sample image data is obtained based on the image characteristics of one or more partitions which are not subjected to mask processing and are not covered.
Step S46: and adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data.
As can be seen from the above, the training method of the image feature extraction network provided by the present disclosure obtains the prediction result based on image features extracted from the mask-processed sample image data rather than on the whole sample image data; the result is compared with the true value label of the sample image data, and the parameters of the image feature extraction network are adjusted accordingly. Repeating this process trains the image feature extraction network, increases the model's domain discrimination capability and generalization, and reduces the over-fitting problem caused by information redundancy.
In one possible implementation manner, as shown in fig. 5, step S43 performs masking processing on at least one partition in the sample image data including multiple partitions, to obtain masked sample image data, where the masking processing includes:
step S51: generating mask parameters according to preset mask rules.
The preset mask rule is a predetermined processing rule for performing mask processing on the sample image data, and may be a selection rule for selecting one or more partitions from the sample image data including a plurality of partitions and performing mask processing on the selected partition. The mask parameters then represent the selected regions of the sample image data comprising the plurality of regions that require masking.
In one example, the preset mask rule may be to randomly select a certain proportion of the regions of the sample image data including the plurality of regions to perform the masking process, and the mask parameter indicates the regions to be subjected to the masking process. For example, if the preset mask rule randomly selects 30% or 50% of the partitions of the sample image data to perform mask processing, then according to the total number of the partitions included in the image features, a corresponding number of partitions needing to perform mask processing are randomly selected according to a proportion of 30% or 50%, that is, mask parameters are determined.
In one example, the total number of the partitions included in the sample image data is 16, the preset masking rule is to randomly select 25% of the partitions of the image features for masking, and the partitions 1, 4, 6 and 13 are randomly selected as the partitions needing masking, that is, the masking parameters represent the partitions 1, 4, 6 and 13.
Step S52: and performing mask processing on at least one partition in the sample image data comprising a plurality of partitions by using the currently generated mask parameters to obtain sample image data after mask processing.
And performing mask processing on one or more partitions in the sample image data comprising the partitions by using the currently generated mask parameters to obtain sample image data after mask processing.
For example, if the mask rule is preset to select 50% of the partitions of the sample image data for masking, and the total number of the partitions included in the sample image data is 100, after the partitions included in the sample image data are ordered by 1-100, 50 partitions are randomly selected, and the mask parameter represents the 50 selected partitions. Masking the selected 50 partitions. The rest 50 areas are not subjected to masking treatment, so that sample image data after masking treatment is obtained.
In one example, after generating the mask parameters, identification information may be added to the partition to be masked based on the mask parameters, indicating that the partition needs to be masked. And then, carrying out mask processing on the corresponding subareas with the identification information added in the sample image data comprising the plurality of subareas according to the identification information, so as to obtain sample image data after mask processing. In one example, the identification information herein may be index information.
After training the feature extraction network with the current mask, new mask parameters may also be used to obtain new mask processed sample image data based on the current sample image data, so as to train the feature extraction network again, and in one possible implementation, the method further includes: generating new mask parameters according to preset mask rules, and training a feature extraction network by utilizing the currently generated mask parameters until a preset second ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
In one example, after adjusting the parameters of the image feature extraction network according to the current prediction result and the true value labeling of the sample image data in step S46, the method further includes: generating new mask parameters according to preset mask rules, wherein the currently generated mask parameters are different from the previously generated mask parameters. The execution returns to the execution step S52 to continue execution until the preset second ending condition is satisfied.
After the parameters of the image feature extraction network are adjusted according to the current prediction result and the true value label of the sample image data, the sample image data is further processed by the parameter-adjusted network so that its training continues. It will be appreciated that repeatedly training the parameter-adjusted network on identical sample image data yields no additional training benefit. In the training process described above, however, only the unmasked partitions of the sample image data are used to predict the sample image data and train the image feature extraction network.
Therefore, after the parameters of the image feature extraction network are adjusted according to the current prediction result and the true value label of the sample image data, new mask parameters are generated according to the preset mask rule, and the sample image data comprising a plurality of partitions is masked again based on the new mask parameters; specifically, different mask parameters represent different partitions of the sample image data that need to be masked. For example, if the preset mask rule is to select 50% of the partitions of the sample image data for masking and the total number of partitions is 100, the previously generated mask parameters may have selected partitions 1-50 (after ordering the partitions 1-100) for masking, while the new mask parameters may select partitions 11-60.
And returning to the executing step S52, performing mask processing on at least one partition in the sample image data comprising the plurality of partitions by using the currently generated mask parameters to obtain sample image data after mask processing, and continuing to execute until a preset second ending condition is met.
One or more partitions of the sample image data are then masked with the currently generated new mask parameters: partitions that satisfy the rule and differ from those previously masked are selected at random, yielding new mask-processed sample image data. The unmasked partitions now differ from those of the previous masking round, and prediction based on them yields a new current prediction result for the sample image data. This process repeats until the preset second ending condition is met, after which no new mask parameters are generated and no further masking or prediction is performed; that is, training of the image feature extraction network with the sample image data stops.
The preset second ending condition is a preset training ending condition. In one example, the preset second ending condition may be that the number of iterations reaches a preset iteration threshold; in another example, it may be that a non-repeating mask parameter cannot be generated.
As can be seen from the above, the training method of the image feature extraction network provided by the present disclosure masks at least one partition of the sample image data comprising a plurality of partitions using continuously generated new mask parameters, so that each round of mask-processed sample image data exposes different partitions for result prediction. On this basis, the image feature extraction network can be trained repeatedly with limited sample image data, which reduces the training computation while improving the accuracy of network training, ensures thorough training of the network, and further improves training efficiency.
In a possible implementation manner, as shown in fig. 6, the step S45 predicts based on the image features including the plurality of partitions to obtain a current prediction result of the sample image data, where the predicting includes:
step S61: extracting image features of each partition which is not subjected to mask processing from the image features of the plurality of partitions to obtain image features of each target partition;
Step S62: splicing the target partition image features into image features to be analyzed;
step S63: and outputting the image characteristics to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
From the above, according to the training method of the image feature extraction network provided by the present disclosure, the image features of each partition which is not subjected to mask processing are spliced into the image features to be analyzed, and the image features are output to the discrimination network for analysis, so as to obtain the prediction result of the sample image data, that is, the discrimination network only needs to predict the image features of each partition which is not subjected to mask processing, and does not need to predict all the image features of the sample image data, thereby effectively reducing the calculation amount of the discrimination network and further improving the efficiency of network training.
In one embodiment of the present disclosure, there is also provided an image processing method including:
extracting image features of an image to be processed by utilizing a pre-trained image feature extraction network; and determining a prediction result of the image to be processed based on the image characteristics of the image to be processed.
The image feature extraction network is obtained through training by the training method of the image feature extraction network.
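A hedged sketch of inference with the trained networks (no masking is needed at this stage; the two-network interface and the pooling mirror the training sketches above and are assumptions):

```python
import torch

@torch.no_grad()
def predict(feature_net, discriminator, patches: torch.Tensor) -> torch.Tensor:
    """Extract features of the image to be processed and determine its
    prediction result."""
    feats = feature_net(patches)              # (P, D) per-partition features
    pooled = feats.mean(dim=0, keepdim=True)  # aggregate all partitions
    return discriminator(pooled).argmax(dim=1)
```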
From the above, according to the image processing method provided by the present disclosure, the image feature extraction network obtained by training the image feature extraction network in advance extracts the image features of the image to be processed, and then determines the prediction result of the image to be processed based on the image features of the image to be processed, so that the prediction result of the image to be processed can be obtained by using part of the image features of the image to be processed, without extracting and predicting all the image features included in the whole image of the image to be processed, thereby reducing the calculation amount of image processing and improving the efficiency of image processing.
Referring to fig. 7, the present disclosure further provides a schematic structural diagram of a training apparatus of the first image feature extraction network, including:
a first sample image obtaining module 701, configured to obtain sample image data and a true value label of the sample image data;
a second region dividing module 702, configured to perform region division on the sample image data to obtain sample image data including a plurality of partitions;
a first image feature obtaining module 703, configured to input the sample image data including the plurality of partitions into an image feature extraction network, to obtain image features including the plurality of partitions;
a second image feature obtaining module 704, configured to perform mask processing on at least one partition in the image features including multiple partitions, to obtain mask-processed image features;
the first image prediction module 705 is configured to predict based on the image features after mask processing, to obtain a current prediction result of the sample image data;
a first parameter adjustment module 706, configured to adjust parameters of the image feature extraction network according to the current prediction result and the true value label of the sample image data.
As can be seen from the above, the training apparatus of the image feature extraction network provided by the present disclosure obtains the prediction result based on the mask-processed image features rather than on all image features of the sample image data; the result is compared with the true value label of the sample image data, and the parameters of the image feature extraction network are adjusted accordingly. Repeating this process trains the image feature extraction network, increases the model's domain discrimination capability and generalization, and reduces the over-fitting problem caused by information redundancy.
In one embodiment of the disclosure, the second image feature obtaining module 704 is specifically configured to:
Generating mask parameters according to preset mask rules;
and performing mask processing on at least one partition in the image features comprising the partitions by using the currently generated mask parameters to obtain the image features after mask processing.
In one embodiment of the present disclosure, the apparatus further comprises:
the first feature extraction network training module is used for generating new mask parameters according to preset mask rules, and training the feature extraction network by utilizing the currently generated mask parameters until a preset first ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
From the above, the training device of the image feature extraction network provided by the present disclosure performs mask processing on at least one partition in the image features including multiple partitions by using continuously generated new mask parameters, and the image features after each mask processing have image features of different partitions for result prediction.
In one embodiment of the present disclosure, the first image prediction module 705 is specifically configured to:
extracting image features of each partition which is not subjected to mask processing from the image features subjected to mask processing to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image characteristics to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
From the above, the training device of the image feature extraction network provided by the present disclosure splices the image features of each partition which is not subjected to mask processing into the image features to be analyzed, and outputs the image features to the discrimination network for analysis, so as to obtain the prediction result of the sample image data, that is, the discrimination network only needs to predict the image features of each partition which is not subjected to mask processing, but does not need to predict all the image features of the sample image data, thereby effectively reducing the calculation amount of the discrimination network and further improving the efficiency of network training.
Referring to fig. 8, the present disclosure further provides a schematic structural diagram of a training apparatus of the second image feature extraction network, including:
a second sample image obtaining module 801, configured to obtain sample image data and a true value label of the sample image data;
a second region dividing module 802, configured to perform region division on the sample image data to obtain sample image data including a plurality of partitions;
an image data obtaining module 803, configured to perform mask processing on at least one partition in the sample image data including a plurality of partitions, to obtain sample image data after mask processing;
a third image feature obtaining module 804, configured to input the sample image data after the mask processing into an image feature extraction network, to obtain an image feature including a plurality of partitions;
a second image prediction module 805, configured to predict based on the image features including the plurality of partitions, to obtain a current prediction result of the sample image data;
a second parameter adjustment module 806, configured to adjust parameters of the image feature extraction network according to the current prediction result and the true value label of the sample image data.
From the above, it can be seen that this training device of the image feature extraction network obtains the prediction result based on image features extracted from the mask-processed sample image data rather than from the whole sample image data, compares it with the true value label of the sample image data, and then adjusts the parameters of the image feature extraction network. Training in this way continuously realizes training of the image feature extraction network, improves the model's ability to discriminate the domain, increases the model's generalization, and reduces the over-fitting problem caused by information redundancy.
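Unlike the first device, this second device masks in pixel space before feature extraction. A sketch under the same caveats as above, additionally assuming the region division is a regular grid whose cell size divides the image size; grid and divide_and_mask_image are hypothetical names.

```python
import torch

def divide_and_mask_image(images: torch.Tensor, grid: int, mask: torch.Tensor) -> torch.Tensor:
    """images: (batch, channels, H, W). Treat each image as a grid x grid layout of
    partitions and zero the pixels of every masked partition; the mask-processed
    images are then fed to the image feature extraction network."""
    b, c, h, w = images.shape
    ph, pw = h // grid, w // grid  # partition height/width (assumes exact divisibility)
    out = images.clone()
    for idx in range(grid * grid):
        if mask[idx]:
            row, col = divmod(idx, grid)
            out[:, :, row * ph:(row + 1) * ph, col * pw:(col + 1) * pw] = 0.0
    return out
```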
In one embodiment of the present disclosure, the image data obtaining module 803 is specifically configured to:
generating mask parameters according to preset mask rules;
and performing mask processing on at least one partition in the sample image data comprising a plurality of partitions by using the currently generated mask parameters to obtain sample image data after mask processing.
In one embodiment of the present disclosure, the apparatus further comprises:
and the second feature extraction network training module is used for generating new mask parameters according to a preset mask rule, and training the feature extraction network by utilizing the currently generated mask parameters until a preset second ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
From the above, it can be seen that this training device masks at least one partition of the multi-partition sample image data with newly generated mask parameters each time, so that each round of mask-processed sample image data retains a different set of partitions for result prediction. On this basis, the image feature extraction network can be trained repeatedly with limited sample image data, improving the accuracy of network training while reducing the training computation, ensuring thorough training of the image feature extraction network, and further improving training efficiency.
In one embodiment of the present disclosure, the second image prediction module 805 is specifically configured to:
extracting image features of each partition which is not subjected to mask processing from the image features of the plurality of partitions to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image features to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
From the above, it can be seen that this training device likewise splices the image features of the partitions that were not mask-processed into the image features to be analyzed and outputs them to the discrimination network for analysis, obtaining the prediction result of the sample image data. That is, the discrimination network only needs to predict from the image features of the unmasked partitions rather than from all image features of the sample image data, which effectively reduces the computation of the discrimination network and improves the efficiency of network training.
In one embodiment, the present disclosure further provides an image processing apparatus, including:
an image prediction result determining module, configured to extract image features of an image to be processed by using a pre-trained image feature extraction network, and to determine a prediction result of the image to be processed based on the image features of the image to be processed.
The image feature extraction network is obtained through training by the training method of the image feature extraction network described above.
From the above, it can be seen that the image processing apparatus provided by the present disclosure extracts the image features of the image to be processed with an image feature extraction network trained in advance by the training method provided by the present disclosure, and then determines the prediction result of the image to be processed based on those features. The prediction result can therefore be obtained from part of the image features of the image to be processed, without extracting and predicting over all image features of the whole image, which reduces the computation of image processing and improves its efficiency.
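At inference time no masking is applied; a minimal usage sketch under the same assumptions as above, where head is a hypothetical prediction head mapping the extracted features to the result:

```python
import torch

@torch.no_grad()
def process_image(image: torch.Tensor, extractor: torch.nn.Module,
                  head: torch.nn.Module) -> torch.Tensor:
    """image: (channels, H, W). Extract image features with the pre-trained
    extraction network and determine the prediction result from those features."""
    features = extractor(image.unsqueeze(0))      # add a batch dimension
    return head(features.flatten(start_dim=1))    # prediction for the image to be processed
```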
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a training method of the image feature extraction network. For example, in some embodiments, the training method of the image feature extraction network may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image feature extraction network described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the image feature extraction network in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A training method of an image feature extraction network, comprising:
acquiring sample image data and true value labeling of the sample image data;
performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
inputting the sample image data comprising a plurality of partitions into an image feature extraction network to obtain image features comprising a plurality of partitions;
performing mask processing on at least one partition in the image features comprising a plurality of partitions to obtain mask processed image features;
predicting based on the mask-processed image features to obtain a current prediction result of the sample image data;
adjusting parameters of the image feature extraction network according to the current prediction result and the true value label of the sample image data;
wherein the performing mask processing on at least one partition in the image features comprising a plurality of partitions to obtain mask-processed image features comprises:
generating mask parameters according to preset mask rules;
performing mask processing on at least one partition in the image features comprising a plurality of partitions by using the currently generated mask parameters to obtain mask processed image features;
generating new mask parameters according to preset mask rules, and training a feature extraction network by utilizing the currently generated mask parameters until a preset first ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
2. The method according to claim 1, wherein the predicting based on the image features after mask processing to obtain the prediction result of the sample image data includes:
extracting image features of each partition which is not subjected to mask processing from the image features subjected to mask processing to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image features to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
3. A training method of an image feature extraction network, comprising:
acquiring sample image data and true value labeling of the sample image data;
performing region division on the sample image data to obtain sample image data comprising a plurality of partitions;
performing mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain sample image data subjected to mask processing;
inputting the sample image data processed by the mask into an image feature extraction network to obtain image features comprising a plurality of partitions;
predicting based on the image features comprising a plurality of partitions to obtain a current prediction result of the sample image data;
adjusting parameters of the image feature extraction network according to the current prediction result and the true value label of the sample image data;
wherein the performing mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain mask-processed sample image data comprises:
generating mask parameters according to preset mask rules;
performing mask processing on at least one partition in the sample image data comprising a plurality of partitions by using the currently generated mask parameters to obtain sample image data after mask processing;
generating new mask parameters according to preset mask rules, and training a feature extraction network by utilizing the currently generated mask parameters until a preset second ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
4. The method according to claim 3, wherein the predicting based on the image features comprising a plurality of partitions to obtain the current prediction result of the sample image data comprises:
extracting image features of each partition which is not subjected to mask processing from the image features of the plurality of partitions to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image features to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
5. An image processing method, comprising:
extracting image features of an image to be processed by utilizing a pre-trained image feature extraction network; determining a prediction result of the image to be processed based on the image features of the image to be processed, wherein the image feature extraction network is trained by the method of any one of claims 1-4.
6. A training device of an image feature extraction network, comprising:
the first sample image acquisition module is used for acquiring sample image data and true value labels of the sample image data;
the second region dividing module is used for dividing the region of the sample image data to obtain sample image data comprising a plurality of partitions;
the first image feature obtaining module is used for inputting the sample image data comprising a plurality of partitions into an image feature extraction network to obtain image features comprising a plurality of partitions;
the second image feature obtaining module is used for carrying out mask processing on at least one partition in the image features comprising a plurality of partitions to obtain image features after mask processing;
the first image prediction module is used for predicting based on the mask-processed image features to obtain a current prediction result of the sample image data;
the first parameter adjustment module is used for adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data;
the second image feature obtaining module is specifically configured to:
generating mask parameters according to preset mask rules;
performing mask processing on at least one partition in the image features comprising a plurality of partitions by using the currently generated mask parameters to obtain mask-processed image features;
the apparatus further comprises:
the first feature extraction network training module is used for generating new mask parameters according to preset mask rules, and training the feature extraction network by utilizing the currently generated mask parameters until a preset first ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
7. The apparatus of claim 6, wherein the first image prediction module is specifically configured to:
extracting image features of each partition which is not subjected to mask processing from the image features subjected to mask processing to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image features to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
8. A training device of an image feature extraction network, comprising:
the second sample image acquisition module is used for acquiring sample image data and true value labels of the sample image data;
the second region dividing module is used for dividing the region of the sample image data to obtain sample image data comprising a plurality of partitions;
the image data obtaining module is used for carrying out mask processing on at least one partition in the sample image data comprising a plurality of partitions to obtain sample image data after mask processing;
the third image feature obtaining module is used for inputting the sample image data processed by the mask into an image feature extraction network to obtain image features comprising a plurality of partitions;
the second image prediction module is used for predicting based on the image features comprising a plurality of partitions to obtain a current prediction result of the sample image data;
the second parameter adjustment module is used for adjusting parameters of the image feature extraction network according to the current prediction result and the true value annotation of the sample image data;
the image data obtaining module is specifically configured to:
generating mask parameters according to preset mask rules;
performing mask processing on at least one partition in the sample image data comprising a plurality of partitions by using the currently generated mask parameters to obtain sample image data after mask processing;
The apparatus further comprises:
and the second feature extraction network training module is used for generating new mask parameters according to a preset mask rule, and training the feature extraction network by utilizing the currently generated mask parameters until a preset second ending condition is met, wherein the currently generated mask parameters are different from the previously generated mask parameters.
9. The apparatus of claim 8, wherein the second image prediction module is specifically configured to:
extracting image features of each partition which is not subjected to mask processing from the image features of the plurality of partitions to obtain image features of each target partition;
splicing the target partition image features into image features to be analyzed;
and outputting the image features to be analyzed to a discrimination network for analysis to obtain a prediction result of the sample image data.
10. An image processing apparatus comprising:
the image prediction result determining module is used for extracting image features of the image to be processed by utilizing a pre-trained image feature extraction network; determining a prediction result of the image to be processed based on the image features of the image to be processed, wherein the image feature extraction network is obtained through training by the device according to any one of claims 6-9.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method of any one of claims 1-5.
CN202210431311.8A 2022-04-22 2022-04-22 Training method and device of image feature extraction network and electronic equipment Active CN114693950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210431311.8A CN114693950B (en) 2022-04-22 2022-04-22 Training method and device of image feature extraction network and electronic equipment

Publications (2)

Publication Number Publication Date
CN114693950A CN114693950A (en) 2022-07-01
CN114693950B true CN114693950B (en) 2023-08-25

Family

ID=82144193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210431311.8A Active CN114693950B (en) 2022-04-22 2022-04-22 Training method and device of image feature extraction network and electronic equipment

Country Status (1)

Country Link
CN (1) CN114693950B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784255A (en) * 2019-01-07 2019-05-21 深圳市商汤科技有限公司 Neural network training method and device and recognition methods and device
CN111914628A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Training method and device of face recognition model
CN112215100A (en) * 2020-09-27 2021-01-12 浙江工业大学 Target detection method for degraded image under unbalanced training sample
CN112270686A (en) * 2020-12-24 2021-01-26 北京达佳互联信息技术有限公司 Image segmentation model training method, image segmentation device and electronic equipment
CN113012176A (en) * 2021-03-17 2021-06-22 北京百度网讯科技有限公司 Sample image processing method and device, electronic equipment and storage medium
CN113657518A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Training method, target image detection method, device, electronic device, and medium
CN114266946A (en) * 2021-12-31 2022-04-01 智慧眼科技股份有限公司 Feature identification method and device under shielding condition, computer equipment and medium
CN114359564A (en) * 2021-12-06 2022-04-15 腾讯科技(上海)有限公司 Image recognition method, image recognition device, computer equipment, storage medium and product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Full-process processing efficiency optimization and practical application of multi-algorithm fusion applications based on autonomous AI chips; Liu Yifan et al.; China Security & Protection Technology and Application; Sections 3.2-3.3 *

Also Published As

Publication number Publication date
CN114693950A (en) 2022-07-01

Similar Documents

Publication Publication Date Title
US20220406090A1 (en) Face parsing method and related devices
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN113642583B (en) Deep learning model training method for text detection and text detection method
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
US20230162477A1 (en) Method for training model based on knowledge distillation, and electronic device
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN112560993A (en) Data screening method and device, electronic equipment and storage medium
CN116152833B (en) Training method of form restoration model based on image and form restoration method
CN113902696A (en) Image processing method, image processing apparatus, electronic device, and medium
CN114120253A (en) Image processing method, image processing device, electronic equipment and storage medium
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN115631381A (en) Classification model training method, image classification device and electronic equipment
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN114495101A (en) Text detection method, and training method and device of text detection network
CN115359308A (en) Model training method, apparatus, device, storage medium, and program for identifying difficult cases
CN112949818A (en) Model distillation method, device, equipment and storage medium
CN116468112B (en) Training method and device of target detection model, electronic equipment and storage medium
CN115273148B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN116721460A (en) Gesture recognition method, gesture recognition device, electronic equipment and storage medium
CN114693950B (en) Training method and device of image feature extraction network and electronic equipment
CN115690443A (en) Feature extraction model training method, image classification method and related device
CN115272705A (en) Method, device and equipment for training salient object detection model
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant