CN113378982A - Training method and system of image processing model - Google Patents

Training method and system of image processing model

Info

Publication number
CN113378982A
Authority
CN
China
Prior art keywords
image
feature maps
processing
model
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110752725.6A
Other languages
Chinese (zh)
Inventor
王莹桂
王力
张本宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110752725.6A
Publication of CN113378982A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification disclose a training method and system for an image processing model. The method comprises: acquiring a first sample image and its label, wherein the first sample image is from a private data set; acquiring a plurality of feature maps of the first sample image; selecting a target feature map from the plurality of feature maps of the first sample image based on energy coefficients that correspond one-to-one to the feature maps, where the energy coefficients are obtained by training a first model using a second sample image and its label, the second sample image being from a public data set; performing desensitization processing on the target feature map to obtain desensitized image data representing the first sample image; inputting the desensitized image data into the image processing model as input features to obtain a processing result; and adjusting parameters of the image processing model to reduce the difference between the processing result and the label.

Description

Training method and system of image processing model
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a method and a system for training an image processing model.
Background
Currently, image recognition technology is widely applied in many fields. In the related art, image recognition has become one of the main means of authenticating a user's identity. For example, a face image may be collected, and the user identity corresponding to the face may be identified using image recognition technology.
However, images used for identity recognition often contain sensitive personal information about the user. How to protect the privacy of such sensitive image information is a problem that urgently needs to be solved.
Therefore, there is a need for a training method and system for an image processing model that can better protect the privacy of sensitive information in images.
Disclosure of Invention
One aspect of the embodiments of this specification provides a method of training an image processing model. The method comprises: acquiring a first sample image and its label, wherein the first sample image is from a private data set; acquiring a plurality of feature maps of the first sample image; selecting a target feature map from the plurality of feature maps of the first sample image based on energy coefficients that correspond one-to-one to the feature maps, where the energy coefficients are obtained by training a first model using a second sample image and its label, the second sample image being from a public data set; performing desensitization processing on the target feature map to obtain desensitized image data representing the first sample image; inputting the desensitized image data into the image processing model as input features to obtain a processing result; and adjusting parameters of the image processing model to reduce the difference between the processing result and the label.
Another aspect of the embodiments of this specification provides a training system for an image processing model. The system comprises: a first acquisition module configured to acquire a first sample image and its label, wherein the first sample image is from a private data set; a second acquisition module configured to acquire a plurality of feature maps of the first sample image; a feature map screening module configured to select a target feature map from the plurality of feature maps of the first sample image based on energy coefficients that correspond one-to-one to the feature maps, where the energy coefficients are obtained by training a first model using a second sample image and its label, the second sample image being from a public data set; a desensitization processing module configured to perform desensitization processing on the target feature map to obtain desensitized image data representing the first sample image; an input module configured to input the desensitized image data into the image processing model as input features to obtain a processing result; and a parameter adjustment module configured to adjust parameters of the image processing model to reduce the difference between the processing result and the label.
Another aspect of the embodiments of this specification provides an image processing method, comprising: acquiring a plurality of feature maps of an image to be processed; selecting a target feature map from the plurality of feature maps based on energy coefficients that correspond one-to-one to the feature maps, where the energy coefficients are obtained through model training; and performing desensitization processing on the target feature map to obtain desensitized image data representing the image to be processed.
Another aspect of an embodiment of the present specification provides an apparatus for training an image processing model, comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of training an image processing model as described above.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and like reference numerals refer to like structures throughout these embodiments.
FIG. 1 is a schematic diagram of an exemplary application scenario of an image processing system in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a method of training an image processing model according to some embodiments of the present description;
FIG. 3 is an exemplary flow diagram of a method of obtaining an energy coefficient, according to some embodiments described herein;
FIG. 4 is an exemplary block diagram of a first model shown in accordance with some embodiments of the present description;
FIG. 5 is an exemplary flow diagram of an image processing method according to some embodiments of the present description;
FIG. 6 is an exemplary block diagram of a training system for an image processing model according to some embodiments of the present description;
FIG. 7 is an exemplary diagram of a local discrete cosine transform in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of this description. It should be understood that the operations are not necessarily performed in the exact order shown. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.
With the development of computer technology, image recognition is more and more deep into various fields of people's lives. For example, face recognition has been widely used in many scenarios, such as intelligent unlocking for face recognition, terminal application login, face-brushing payment, and the like. However, the face image may contain some sensitive information (e.g. visual face portrait) or personal privacy information, and in order to avoid unnecessary loss due to leakage of the information, it is necessary to protect the privacy information in the image.
A face image is an important basis for identity recognition. A face recognition process typically extracts features from the face image, inputs the extracted face feature data into a face recognition model for processing, and performs identity verification based on the processing result of the face recognition model. However, the face feature data may be maliciously stolen. Lawbreakers may reconstruct the original face image from the stolen feature data by reverse inference and then illegally obtain the operation rights of the corresponding person, for example for unlocking or fraudulent payment, which poses a great threat to secure payment, unlocking, and the like. Therefore, an image feature extraction method that effectively increases the difficulty of such reverse inference is needed, so that the face image can be processed into desensitized face image data and the original face image remains difficult to recover even if the feature data is stolen.
In some embodiments, one type of image feature extraction method with a desensitization effect transforms an original image to obtain a plurality of feature maps and discards the feature maps that carry a large amount of visual information but contribute little to recognition accuracy. The remaining feature maps, which can be used by a subsequent face recognition model for recognition, carry little visual information, so it is difficult to reconstruct the original face image from them. However, the basis for screening the feature maps (such as the energy coefficients mentioned later) is obtained through model training.
Therefore, no matter in the feature extraction process or the face recognition process, the machine learning model is inevitably used. In some embodiments, the training samples used for model training may also be derived from a private data set, and how to protect the privacy of the training samples while ensuring the accuracy of model training becomes a significant issue.
Therefore, some embodiments of the present disclosure provide a training method and system for an image processing model, which perform model training based on different images from a public data set and a private data set, and can effectively protect privacy of the images during a training phase of the model.
Although the present specification mainly uses a face image as an example for description, it should be understood that the technical solutions disclosed in the present specification can be applied to any type of image data requiring privacy protection, for example, fingerprint image data, and the like. The technical solution disclosed in the present specification is explained by the description of the drawings below.
FIG. 1 is a schematic diagram of an exemplary application scenario of an image processing system in accordance with some embodiments of the present description.
As shown in fig. 1, a server 110, a network 120, a terminal device 130, and a storage device 140 may be included in an application scenario.
The image processing system can be widely applied to various image recognition scenes, such as face unlocking, face payment, face terminal application login and the like. In some embodiments, the method can also be applied to any other scenes needing image privacy protection, such as transmission, storage and the like of sensitive image data. The image processing system may enable acquisition of desensitized image data, image recognition based on desensitized image data, and training of related machine learning models. The desensitization processing is carried out on the image by implementing the method disclosed by the specification through the image processing system, so that an attacker can be effectively prevented from reversely deducing the original image data, and the privacy information in the image can be effectively protected from being leaked. By implementing the method disclosed by the specification through the image processing system for model training, the image processing model adaptive to desensitized image data can be obtained, and the privacy and the safety of a training sample are effectively protected.
In some embodiments, the terminal device 130 may acquire an image to be processed (e.g., a face image) through an image acquisition device (e.g., a camera), and the terminal device 130 may perform desensitization processing on the acquired image to be processed by implementing the image processing method provided in this specification to obtain desensitization image data, and then transmit the desensitization image data to the server 110 through the network 120. Server 110 may be used to process information and/or data related to data service requests and/or image processing, image recognition. For example, the server 110 may receive desensitization image data sent by the terminal device 130 in response to a data service request from the terminal device 130, and after completing desensitization image data identification (for example, when determining that the desensitization image data is from a legal face image), feed back an identification result to the terminal device 130 or provide a corresponding data service to the terminal device 130. In some embodiments, server 110 may process it through a pre-trained image processing model and derive the prediction vector. After obtaining the prediction vector, the server 110 may further perform subsequent operations, such as comparing with the feature vector of the to-be-processed image that has been successfully registered and stored in the system and feeding back the comparison result (e.g., the identification result) to the terminal device 130, thereby completing face payment, unlocking, and the like. In some embodiments, the server 110 is also used to train the image processing model.
In some embodiments, the server 110 may be local or remote. For example, the server 110 may connect the terminal device 130 locally at the terminal device 130 to obtain the information and/or data it transmits. As another example, server 110 may remotely receive information and/or data transmitted by terminal device 130 via network 120. In some embodiments, the server 110 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, and the like, or any combination thereof. In some embodiments, the server 110 includes a processing device 112.
Network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the application scenario 100 (e.g., the server 110, the terminal device 130, the storage device 140) may communicate information to other components in the application scenario 100 over the network 120. For example, the terminal device 130 may transmit desensitized image data of the image to be processed to the server 110 through the network 120. As another example, the server 110 may send the prediction vector obtained by processing the desensitized image data to the storage device for storage, and return the result of comparing the prediction vector with a stored feature vector to the terminal device 130. In some embodiments, the network 120 may be any form of wired or wireless network, or any combination thereof. By way of example only, the network 120 may be one or a combination of a wireline network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, and so forth.
Terminal device 130 may be used to process information and/or data associated with image processing and image recognition to perform one or more of the functions disclosed in this specification. In some embodiments, the terminal device 130 may be a public device that provides image acquisition and/or data processing services to the public, such as an Internet of Things device (IoT device) 130-1. Exemplary IoT devices 130-1 may include, but are not limited to, face-recognition vending machines, face payment devices, banking self-service devices, and the like, or any combination thereof. After a user completes face recognition on the terminal device 130, the data services provided by the device can be used. In some embodiments, the terminal device 130 may be configured to acquire the image data to be processed that is collected when the device's image acquisition means is triggered. In some embodiments, the terminal device 130 may obtain a plurality of feature maps corresponding to the image data to be processed; the terminal device 130 may screen one or more target feature maps from the plurality of feature maps based on the energy coefficients corresponding one-to-one to the feature maps, perform desensitization processing on the target feature maps, and obtain desensitized image data used to characterize the image to be processed. In some embodiments, the terminal device 130 may have a trusted execution environment deployed on it, and perform image acquisition and image processing in the trusted execution environment. In some embodiments, the terminal device 130 may include one or more processing engines (e.g., single-core processing engines or multi-core processors). By way of example only, a processing engine may include one or a combination of central processing units (CPUs), application-specific integrated circuits (ASICs), application-specific instruction-set processors (ASIPs), graphics processing units (GPUs), physics processing units (PPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, microcontroller units, reduced instruction set computers (RISCs), microprocessors, and the like.
In some embodiments, the terminal device 130 may also be a portable device with data acquisition, storage, and/or transmission capabilities, such as a tablet 130-2, a laptop 130-3, a smartphone 130-4, a camera, a smart payment terminal, and the like, or any combination thereof. In some embodiments, the terminal device 130 may perform data interaction with the server 110 through a network, for example, the terminal device 130 may transmit processed desensitized image data of the image data to be processed to the server 110. In some embodiments, the data acquired by the terminal device 130 may be face image data acquired by a camera of the device, and correspondingly, the server 110 may receive the face image data from the terminal device 130, perform desensitization processing and subsequent identification on the face image data. At this time, the server 110 may be integrated with the terminal device 130.
The storage device 140 may store data and/or instructions related to image processing, such as feature vectors, image data, and identity information of users who have successfully registered with the system. In some embodiments, the storage device 140 may store data obtained by the terminal device 130 and/or the server 110. In some embodiments, the storage device 140 may store data and/or instructions that the server 110 executes or uses to perform the exemplary methods described in this application. In some embodiments, the storage device 140 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), and the like, or any combination thereof. Exemplary mass storage may include magnetic disks, optical disks, solid state disks, and the like. Exemplary removable memory may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Exemplary volatile read-write memory may include Random Access Memory (RAM). Exemplary RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitance RAM (Z-RAM), and the like. Exemplary ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, and the like. In some embodiments, the storage device 140 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, and the like, or any combination thereof.
In some embodiments, a storage device 140 may be connected to the network 120 to communicate with one or more components (e.g., server 110, terminal device 130) in the application scenario 100. One or more components in the application scenario 100 may access data or instructions stored in the storage device 140 through the network 120. In some embodiments, the storage device 140 may be directly connected or in communication with one or more components (e.g., the server 110, the terminal device 130, etc.) in the application scenario 100. In some embodiments, the storage device 140 may be part of the server 110.
FIG. 2 is an exemplary flow diagram of a method of training an image processing model according to some embodiments shown in the present description. In some embodiments, the flow 200 may be performed by a processing device (e.g., the terminal device 130 or the server 110 or a device other than the application scenario 100). For example, the process 200 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 200. The flow 200 may include the following operations.
Step 202, a first sample image and its label are obtained. In some embodiments, step 202 may be performed by the first obtaining module 610.
The first sample image may refer to an image that can be used to train an image processing model.
In some embodiments, the first sample image may comprise a face image. The first sample image is from a private data set. The private data set is a data set which is not open to the public and can be viewed only by the owner or a part of persons having authority.
The label may be a vector representation characterizing the first sample image, or identity information corresponding to the face in the image, e.g., a person's identity ID.
In some embodiments, the label may be obtained by manual labeling or other labeling, which is not limited in this embodiment.
In some embodiments, the processing device may obtain the first sample image and the corresponding tag by reading from a database, calling a related data interface, and the like.
Step 204, a plurality of feature maps corresponding to the first sample image are acquired. In some embodiments, step 204 may be performed by the second obtaining module 620.
The feature maps are a plurality of sub-maps extracted from the first sample image by some image processing means, each carrying part of the features of the first sample image. A sub-map may be the same size as the first sample image, for example with pixel points in one-to-one correspondence, or may differ in size from the first sample image.
In some embodiments, the processing device may obtain a plurality of feature maps corresponding to the first sample image by, for example, using discrete cosine transform, wavelet transform, orthogonal basis transform, or the like, or may extract a feature map from an image file of the first sample image, for example, for image data in jpeg format, the feature map may be directly extracted from the image file.
For example, the processing device may obtain a plurality of feature maps corresponding to the first sample image through discrete cosine transform by using the method described in the following embodiments.
The processing device may perform a local discrete cosine transform on the first sample image to obtain a plurality of transform results. The discrete cosine transform converts the first sample image from the spatial domain to the frequency domain. The number of feature points may be the same before and after the conversion; for example, a point in the spatial domain represents a pixel position and a point in the frequency domain represents a frequency position. In some embodiments, the local discrete cosine transform uses an image block smaller than the first sample image. For example, if the size of the first sample image is 256 × 256 and the selected image block size is 8 × 8, the image block is moved over the first sample image with a certain step size, and the local data sampled by the image block each time (i.e., an 8 × 8 region) is transformed according to the discrete cosine transform formula, yielding a plurality of transform results, each of size 8 × 8. The smaller the step size of the image block, the more transform results are obtained, which can improve the accuracy of subsequent image recognition.
The processing device may combine the values at the same frequency position in each transform result into one feature map, thereby obtaining a plurality of feature maps corresponding to the different frequency positions of the transform results. In a transform result, the position of each feature point (element) corresponds to one frequency position. It follows that the number of feature maps equals the number of pixel points of the image block used for sampling during the transform. The combination may extract part of the values from the plurality of transform results according to a certain rule and recombine them; for example, values at the same frequency position in each transform result are grouped into one feature map.
Taking FIG. 7 as an example, the first sample image 710, of size 4 × 4, is sampled with image blocks of size 2 × 2 to implement a local discrete cosine transform. This yields 4 transform results, denoted s1, s2, s3, and s4, and the values at the frequency positions of each transform result are denoted fi1, fi2, fi3, and fi4, where i indicates the i-th transform result. For example, fi1 denotes the value at the first frequency position of the i-th transform result. Each transform result has 4 frequency positions, and the values at the same frequency position across the transform results are grouped together to obtain the feature maps; for example, f11 of s1, f21 of s2, f31 of s3, and f41 of s4 are grouped to form feature map z1. By analogy, feature maps z2, z3, and z4 are obtained. In some embodiments, when values at the same frequency position are read out to form the feature maps, the reads may be performed in a "zig-zag" order.
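As an illustration of the local discrete cosine transform and the frequency-position recombination described above, the following is a minimal Python (NumPy/SciPy) sketch. The function name, the block and stride values, and the row-major (rather than zig-zag) ordering of the resulting feature maps are illustrative assumptions, not the reference implementation of this specification.

import numpy as np
from scipy.fft import dctn

def local_dct_feature_maps(image, block=8, stride=8):
    """Slide a block x block window over the image with the given stride,
    apply a 2-D DCT to each patch, then gather the values at each frequency
    position across all patches into one feature map (as z1..z4 in FIG. 7)."""
    h, w = image.shape
    rows = (h - block) // stride + 1
    cols = (w - block) // stride + 1
    maps = np.empty((block * block, rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            patch = image[r * stride:r * stride + block,
                          c * stride:c * stride + block]
            coeffs = dctn(patch, norm="ortho")     # one transform result (block x block)
            maps[:, r, c] = coeffs.reshape(-1)     # k-th value goes to the k-th feature map
    return maps

# FIG. 7 toy case: a 4 x 4 image sampled with 2 x 2 blocks gives 4 feature maps
toy = np.arange(16, dtype=np.float32).reshape(4, 4)
print(local_dct_feature_maps(toy, block=2, stride=2).shape)  # (4, 2, 2)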
Through the above processing, different feature maps represent different frequency components of the image to be processed. In some embodiments, the processing device may use all feature maps obtained by the transform as the plurality of feature maps corresponding to the first sample image. In some embodiments, the processing device may filter the recombined feature maps and keep only some of them as the plurality of feature maps corresponding to the first sample image. For example, the feature maps may be screened according to how much feature information they contain: feature maps containing a large amount of feature information are retained, and feature maps containing little feature information are discarded. For example, the processing device may decide which frequency positions to discard based on an SEnet network or based on a preset selection rule.
In some embodiments, the processing device may input a plurality of transformation results into a trained SEnet network, and the SEnet network gives the importance (e.g., a score that is positively correlated with the importance) of each feature map. The SEnet network may be trained along with an image processing model (e.g., an image recognition model), for example, by adding the SEnet network to the image processing model, and adjusting parameters of the SEnet network during the training of the model to obtain the SEnet network for determining the importance of the feature map.
In some embodiments, the preset selection rule may be to retain a preset proportion of the feature maps that contain more feature information. For example, among the plurality of feature maps obtained by discrete cosine transform and recombination, a preset proportion of low-frequency feature maps may be retained and the remaining high-frequency feature maps discarded. For example, 50%, 60%, or 70% of the low-frequency feature maps may be retained and the rest of the high-frequency feature maps discarded.
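A minimal sketch of this preset-ratio selection rule, assuming the feature maps are already ordered from low to high frequency (for example, read out in zig-zag order); the 60% default ratio is one of the illustrative values mentioned above.

import numpy as np

def keep_low_frequency(feature_maps, ratio=0.6):
    """feature_maps: array of shape (n, H, W), assumed ordered from low to
    high frequency. Keep the lowest-frequency fraction and discard the rest."""
    k = max(1, int(round(feature_maps.shape[0] * ratio)))
    return feature_maps[:k]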
Step 206, a target feature map is screened from the plurality of feature maps of the first sample image based on the energy coefficients corresponding one-to-one to the feature maps. In some embodiments, step 206 may be performed by the feature map screening module 630.
An energy coefficient is a parameter that characterizes how much its corresponding feature map contributes to subsequent image processing or image recognition. The one-to-one correspondence between the energy coefficients and the feature maps means that each feature map has one energy coefficient corresponding to it. That is, the energy coefficients are associated with frequency components of the feature maps or of the first sample image. As an example, the smaller the energy coefficient, the less the corresponding feature map is needed in subsequent image processing or recognition; conversely, the larger the coefficient, the more the corresponding feature map is needed.
In some embodiments, the energy coefficients and the feature maps may correspond through frequency positions. For example, suppose there are four feature maps z1-z4, corresponding to the first, second, third, and fourth frequency positions, and four energy coefficients a1-a4, where a1 is the energy coefficient for the first frequency position, a2 for the second, and so on. Then z1 corresponds to a1, z2 to a2, z3 to a3, and z4 to a4.
In some embodiments, the energy coefficients may be obtained by training the first model using a second sample image and its label, the second sample image being from a public dataset. Public data sets may refer to data sets that are open to the public and that all personnel can view.
In some embodiments, the public data set may be processed or integrated from various open-source data sets on the internet. For example, data downloaded from different public platforms may be organized, including being filtered, categorized, and aggregated, to obtain the public data set.
In some embodiments, the private data set and the public data set are identically distributed. Identical distribution means that the images in the two data sets are of the same category and contain similar object information. The category refers to the type of image, for example, face images or fingerprint images. The object information includes the color, posture, angle, background, and so on of the object. For example, if two data sets are identically distributed, the images in both are face images, the skin colors of the faces in the images are similar, the angles of the faces are the same (such as all frontal photos), and the backgrounds are all solid colors.
For more details on obtaining the energy coefficient, refer to fig. 3 and the related description thereof, which are not repeated herein.
In some embodiments, the processing device may screen one or more target feature maps from the plurality of feature maps of the first sample image according to the magnitude of the energy coefficient corresponding to the feature map. The target characteristic diagram refers to a characteristic diagram which is screened from a plurality of characteristic diagrams of the first sample image and meets preset requirements. The preset requirement may include that the energy coefficient corresponding to the feature map is greater than a threshold.
In some embodiments, the processing device may determine whether an energy coefficient is less than a threshold, for example by comparing the energy coefficient with the threshold. The threshold may be preset, for example 0.3, 0.5, 1, 2, 10, or 30. The threshold may be set in relation to the overall magnitude of the energy coefficients. The energy coefficient itself may take any value, for example within (0, 1) or within a wider range. Illustratively, if the energy coefficients take values between 0 and 1, the threshold may be 0.3. The embodiments of this specification do not limit the values of the energy coefficients.
In some embodiments, in order to more intuitively reflect the size of the energy coefficient, the value of the energy coefficient may be set to be between 0 and 1. For example, the size of the energy coefficient may be constrained between 0-1 by a sigmoid function. For example, the sigmoid function for the constraint may be as shown in equation (1).
a_i = 1 / (1 + e^(-x))    (1)
where a_i denotes the energy coefficient after its magnitude is constrained, x denotes the unconstrained energy coefficient, which may take any value, and i denotes the i-th energy coefficient.
If an energy coefficient is smaller than the threshold, the processing device may discard the feature map corresponding to that energy coefficient. Discarding means that the feature map does not participate in subsequent processing, e.g., it does not take part in the subsequent desensitization processing. The feature map may be discarded directly, or its element values may all be set to zero.
The feature maps left after some have been discarded, or the zeroed feature maps together with the feature maps whose element values were not modified, may be used as the target feature maps.
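The screening described in this step can be sketched as follows, assuming the energy coefficients have already been learned (see FIG. 3). The sigmoid constraint corresponds to equation (1); the threshold value of 0.3 and the choice between zeroing and dropping are illustrative assumptions.

import numpy as np

def sigmoid(x):
    # equation (1): constrain an arbitrary-valued coefficient to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def screen_target_maps(feature_maps, raw_coeffs, threshold=0.3, zero_out=True):
    """feature_maps: (n, H, W); raw_coeffs: (n,) learned, unconstrained energy
    coefficients. Maps whose constrained coefficient falls below the threshold
    are either zeroed in place or dropped entirely."""
    a = sigmoid(np.asarray(raw_coeffs, dtype=np.float64))
    if zero_out:
        out = np.array(feature_maps, copy=True)
        out[a < threshold] = 0.0          # set all element values of the map to zero
        return out
    return feature_maps[a >= threshold]   # discard the low-coefficient maps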
Step 208, desensitization processing is performed on the target feature map to obtain desensitized image data representing the first sample image. In some embodiments, step 208 may be performed by the desensitization processing module 640.
Desensitization processing means processing the target feature map in such a way that the sensitive information in it is altered, for example by removing the sensitive information from the target feature map or changing it into other information.
Illustratively, the processing device may perform desensitization processing on the target feature map in a manner illustrated by the embodiments below.
In some embodiments, the processing device may perform lossy processing on the target feature maps. The lossy processing may include changing the values of elements in one or more target feature maps.
Lossy processing refers to discarding some of the screened target feature maps or changing element values in the target feature maps. This processing may lose some useful information, but it further strengthens the privacy protection of the data. A discarded target feature map does not participate in subsequent processing (e.g., the subsequent fusion processing or order randomization), and a target feature map whose element values have been changed differs from the original target feature map obtained after screening, making it even harder to reconstruct the original image data from the modified feature maps.
In some embodiments, the lossy processing may further discard some of the target feature maps and/or change the element values of one or more of the screened feature maps. For example, the processing device may screen the target feature maps again and select one or more of them for lossy processing. If only one is selected, it may be discarded directly, or all or some of its element values may be changed; if several (two or more) are selected, all of them may be discarded or modified, or some may be discarded while the element values of the others are changed. For example, when two target feature maps are selected, one may be discarded and the element values of the other changed. Changing element values means replacing them with other values, which may be arbitrary. In some embodiments, all element values in a target feature map may be replaced with the same number (e.g., 1, 2, or 3), or the element values may be scaled down (e.g., by a factor of 1.5 or 2). This avoids or reduces the impact on recognition performed later with the desensitized image data, so the privacy protection of the face image is improved without greatly affecting the accuracy of face recognition.
The processing device may perform fusion processing on the target feature maps after lossy processing to obtain one or more fused feature maps, whose number is smaller than the number of target feature maps after lossy processing.
Fusion processing refers to combining two or more of the target feature maps through a preset calculation. For example, the values of corresponding element points in two or more target feature maps may be combined, and the calculated values used as the values of the corresponding element points in a fused feature map, so that two or more target feature maps are fused into one. The preset calculation may be a mean, a sum, a difference, or the like.
In some embodiments, the processing device may group the target feature maps so that each group contains two or more of them, obtaining one or more groups, and for each group compute a fused feature map according to the preset calculation.
Grouping means dividing two or more target feature maps into one group. The grouping may combine two adjacent target feature maps, or combine the lowest-frequency target feature map with the highest-frequency one, or combine any two or more target feature maps. When grouping the target feature maps, the same grouping rule may be used for the feature maps of different first sample images. For example, the rule may be that, starting from the first target feature map, each target feature map is grouped with the next adjacent one. As another example, the first through third target feature maps may form one group, the fourth through sixth another group, and so on.
The values of the element points in a fused feature map differ from those of the target feature maps before fusion. The fusion processing destroys the relative relationships among the element values of the original target feature maps, further increasing the difficulty of reconstructing the original image data from the feature maps. In some embodiments, the desensitization processing may also include normalizing the target feature maps and randomizing their order to further increase the difficulty of reverse inference.
The feature maps obtained after desensitization processing of the target feature maps can be used as the desensitized image data representing the first sample image.
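The desensitization processing described above (lossy processing, fusion of adjacent target feature maps, normalization, and order randomization) can be sketched as follows. The particular choices, such as which map to discard, the scaling factor, pairwise grouping, and averaging as the fusion operation, are illustrative assumptions among the options listed above.

import numpy as np

def lossy_process(target_maps, drop_index=None, scale_index=None, factor=0.5):
    """Optionally discard one target feature map and rescale the element values
    of another; indices and the scaling factor are arbitrary illustrative choices."""
    maps = [m for i, m in enumerate(target_maps) if i != drop_index]
    if scale_index is not None and scale_index < len(maps):
        maps[scale_index] = maps[scale_index] * factor
    return np.stack(maps)

def fuse_adjacent(target_maps, group=2):
    """Average each group of adjacent maps into one fused map, so the fused
    result no longer preserves the relative element values of the originals."""
    fused = [target_maps[i:i + group].mean(axis=0)
             for i in range(0, target_maps.shape[0], group)]
    return np.stack(fused)

def desensitize(target_maps, rng):
    maps = lossy_process(target_maps, drop_index=0, scale_index=1)
    fused = fuse_adjacent(maps)
    fused = (fused - fused.mean()) / (fused.std() + 1e-8)   # normalization
    return fused[rng.permutation(fused.shape[0])]           # order randomization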
Step 210, the desensitized image data is input into the image processing model as input features to obtain a processing result. In some embodiments, step 210 may be performed by the input module 650.
The processing result is the output obtained by the image processing model from the desensitized image data, and it differs depending on the prediction task of the image processing model. For example, it may be a vector representation (or prediction vector) of the first sample image, a classification result of the first sample image, or a recognition result of the identity of the target object in the first sample image, such as identity information.
The image processing model may be a neural network model, an SVM, a tree model, or the like.
Step 212, adjusting parameters of the image processing model to reduce differences between processing results and the labels. In some embodiments, step 212 may be performed by parameter adjustment module 660.
In some embodiments, the processing device may adjust parameters of the image processing model by minimizing the loss function value, so as to reduce a difference between a processing result of the model and the tag, and improve accuracy of subsequent task processing of the model, for example, accuracy of recognition when identity information of a human face in an image is recognized.
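A minimal sketch of one training iteration of the image processing model (steps 210 and 212), assuming a PyTorch-style model and optimizer; cross-entropy is an assumed instantiation of the difference between the processing result and the label.

import torch
import torch.nn.functional as F

def train_step(image_model, optimizer, desens_data, labels):
    """One parameter update of the image processing model: reduce the
    difference between the processing result and the label."""
    optimizer.zero_grad()
    result = image_model(desens_data)          # step 210: forward pass on desensitized data
    loss = F.cross_entropy(result, labels)     # step 212: difference between result and label
    loss.backward()
    optimizer.step()
    return loss.item()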
In the embodiments of this specification, the first model is trained using the public data set, which makes it possible to identify the target feature maps in the original image that matter for the subsequent image processing task and to discard the feature maps that contain more visual information but have little influence on the model's processing accuracy. The target feature maps are then further desensitized through desensitization processing, which greatly increases the difficulty of reconstructing the original image data from the desensitized image data. When the desensitized image data is used to train the image processing model, only the desensitized image data of the first sample image needs to be provided to the model training party, effectively protecting data privacy during the training stage of the image recognition model.
FIG. 3 is an exemplary flow chart of a method of obtaining an energy coefficient according to some embodiments described herein. In some embodiments, flow 300 may be performed by a processing device. For example, the process 300 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 300. The flow 300 may include the following operations.
Step 302, a plurality of feature maps of the second sample image are obtained.
The second sample image refers to an image from the public dataset that can be used for model training. The second sample image may include a face image.
In some embodiments, the plurality of feature maps of the second sample image may be obtained in a variety of ways. For example, a local discrete cosine transform, a wavelet transform, an orthogonal basis transform, and direct extraction from an image file of a sample image, etc. are employed. For more details about obtaining the multiple feature maps of the second sample image, the obtaining manner may be the same as the obtaining manner of the multiple feature maps of the first sample image, and refer to the related description of step 204, which is not repeated herein.
In some embodiments, the processing device may obtain the plurality of feature maps of the second sample image by processing the second sample image, reading from a database, calling an associated data interface, and the like.
And step 304, inputting the multiple feature maps of the second sample image into the first model to obtain a sample prediction result.
The sample prediction result is the output obtained by the first model from the feature maps of the second sample image, and it differs depending on the prediction task of the first model. For example, it may be a classification result of the second sample image or a recognition result of the identity of the target object in the second sample image, such as identity information. In some embodiments, the prediction task of the first model may be the same as that of the image processing model.
The first model may be a neural network model, an SVM, a tree model, or the like. The parameters of the first model may include the energy coefficients corresponding one-to-one to the plurality of feature maps.
In some embodiments, the processing device may input the plurality of feature maps of the second sample image into the first model, and the first model processes them internally. For example, inside the first model, each input feature map may be multiplied by its corresponding energy coefficient, and the products are passed to the other parts of the first model for further processing to obtain the prediction result. Further details on how the first model processes the feature maps and produces the prediction result can be found in the description of FIG. 4.
At step 306, a first loss function value is determined.
In some embodiments, the processing device may construct a first loss function based on the prediction of the model and the label of the second sample image, and determine a first loss function value based on the constructed first loss function. The first loss function value may reflect a difference between the prediction result and a label of the second sample image.
The label may be a vector representation characterizing the second sample image, or identity information corresponding to the face in the second sample image, e.g., a person's name. In some embodiments, the label may be obtained by manual labeling or other labeling methods, which is not limited in this embodiment. In some embodiments, the label may be acquired at the same time the processing device acquires the second sample images.
Illustratively, the first loss function may be as shown in equation (2).
L = -(1/T) · Σ_{j=1}^{T} y_j · log(p_j)    (2)
where L denotes the first loss function, p_j denotes the model's prediction for the j-th training sample, y_j denotes the corresponding label, and T is the total number of training samples.
It should be noted that the first loss function shown above is only for illustrative purposes, and in the present specification, any loss function that can reflect the difference between the prediction result of the model and the label of the second sample image may be used, for example, an Arcface loss function, the distance between the prediction vector and the vector representation of the second sample image, and the like.
In some embodiments, the processing device may determine the first loss function value by substituting the prediction of the model and the label into the first loss function.
A second loss function value is determined, step 308.
In some embodiments, the processing device may construct the second loss function based on the one-to-one correspondence between the energy coefficients and the feature maps. For example, the second loss function may be constructed from information about the feature maps and the energy coefficients corresponding to them. In some embodiments, the information about a feature map may be its energy value, which reflects the overall magnitude of the pixel (or element) values in the feature map and may be characterized by a statistic computed from those element values. Accordingly, the second loss function value may be a weighted sum of the energy values of the feature maps, weighted by their corresponding energy coefficients.
Illustratively, the second loss function may be as shown in equation (3).
Loss_pri = a_1·f_1 + a_2·f_2 + … + a_n·f_n    (3)
where Loss_pri denotes the second loss function, a_i denotes the value of the i-th energy coefficient (a value in the range (0, 1), see equation (1)), f_i denotes the energy value of the i-th feature map computed from its element values, for example the sum of the absolute element values, the mean of the absolute element values, the variance of the absolute element values, or the maximum absolute element value, i = 1, 2, …, n, and n is the number of feature maps.
To some extent, the energy coefficients can be understood as weighting coefficients for the energy values of the corresponding feature maps.
After each feature map is substituted into its corresponding position in the second loss function, the second loss function value can be calculated.
In step 310, parameters of the first model are adjusted to minimize the first loss function value and the second loss function value.
Minimizing the first loss function value can reduce the difference between the prediction result of the model and the label, and improve the identification precision of the model when identifying the identity information of the face in the image.
Minimizing the second loss function value makes the model pay more attention, during training, to the feature maps with smaller energy values. Because the visual information of an image is concentrated mainly in the low-frequency and mid-low-frequency components, the feature maps of those components are important for visualizing the image but have little influence on its recognition. Minimizing the second loss function value drives the energy coefficients corresponding to those feature maps to be smaller, so the model focuses more on the feature maps with larger energy coefficients, which contribute little to visualizing the image but contribute more to recognizing it. Therefore, after the feature maps corresponding to the smaller energy coefficients are discarded, the visual information of the image is effectively destroyed and privacy protection is improved, while image recognition is not greatly affected.
In some embodiments, a total loss function may be determined, the total loss function including the first loss function and the second loss function. During training, the parameters of the first model may be adjusted to minimize the total loss function value. Illustratively, the total loss function may be the sum of the first loss function and the second loss function.
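As a minimal sketch only, one training iteration of the first model under the total loss might look as follows in Python (PyTorch). The assumption that the model exposes its energy coefficients as the attribute energy_coeffs, the use of cross-entropy as the first loss, and the unweighted sum of the two losses are illustrative choices, not part of the embodiments.

```python
import torch
import torch.nn.functional as F

def train_step(first_model, optimizer, feature_maps, labels):
    """One sketched training step: adjust the first model to minimize first loss + second loss.

    feature_maps: (batch, n, H, W) feature maps of second sample images
    labels:       (batch,) labels of the second sample images
    """
    optimizer.zero_grad()
    logits = first_model(feature_maps)                      # sample prediction result
    loss_cls = F.cross_entropy(logits, labels)              # first loss: prediction vs. label
    energy_values = feature_maps.abs().mean(dim=(0, 2, 3))  # batch-averaged f_1 ... f_n
    loss_pri = (first_model.energy_coeffs * energy_values).sum()  # second loss, equation (3)
    total_loss = loss_cls + loss_pri                        # total loss = first + second
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```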
Step 312, the energy coefficient is obtained from the trained first model.
It can be understood that the energy coefficients obtained through training reflect the contribution of the corresponding feature maps to model prediction; desensitization image data can then be obtained by screening the plurality of feature maps corresponding to the image to be processed based on these energy coefficients. In some embodiments, the processing device may extract the energy coefficients from the parameters of the trained first model.
In some embodiments, the first model may or may not be related to the image processing model. Being related may mean that the trained first model is retrained to serve as the subsequent image processing model (e.g., training is continued with the first sample image). Specifically, the structure of the first model may be modified (for example, its input channels may be modified to adapt to the input channels of the desensitized image data) to obtain the image processing model, and the image processing model is then iteratively updated using the first sample image. Being unrelated may mean that the first model and the subsequent image processing model are two completely independent models.
Because feature maps of the sample image must be used as input data for the model from which the energy coefficients are obtained, and it is relatively easy to reconstruct the original image from such feature maps, obtaining the energy coefficients by training on samples from the private data set would pose a significant privacy risk. Obtaining the energy coefficients by training on a public data set that follows the same distribution as the private data set ensures the accuracy of the energy coefficients to a certain extent while preventing the private data set from being exposed during the model training stage in which the energy coefficients are obtained.
FIG. 4 is an exemplary block diagram of a first model shown in accordance with some embodiments of the present description. Model 400 may include an input layer 410 and a processing layer 420.
The input layer 410 contains the energy coefficients and is configured to receive the plurality of feature maps corresponding to the image to be processed, multiply the feature maps by their corresponding energy coefficients, and output the multiplication results.
The input layer 410 may have a plurality of input channels, and the number of the plurality of input channels may be the same as the number of the plurality of feature maps of the image to be processed, one feature map for each channel. Naturally, each channel also corresponds to an energy coefficient or frequency location.
The energy coefficients of the input layer may be used to weight the feature maps input to the model, so that the energy values of the feature maps input to the model are redistributed or filtered. Specifically, in the input layer, each element value of each feature map is multiplied by the corresponding energy coefficient, thereby weighting the feature map.
When the initial first model is built, the number of input channels can be set so that it matches the number of feature maps of the second sample image, and the number of feature maps of the second sample image may be set in advance. For example, the number of input channels may be set to 8, 24, or 64. In some embodiments, the first model may be a neural network model whose input layer includes a plurality of channels, each channel corresponding to a number of neurons; for example, the number of neurons in each channel is the same as the number of elements in the corresponding feature map, and the weight of every neuron in the same channel is the energy coefficient corresponding to that channel.
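For illustration, the input layer 410 can be sketched as a module with one learnable coefficient per input channel. The class and attribute names below (EnergyWeightedInput, energy_coeffs) and the initialization of the coefficients to 1 are assumptions made for the example.

```python
import torch
from torch import nn

class EnergyWeightedInput(nn.Module):
    """Sketch of input layer 410: one learnable energy coefficient per input channel.

    Each of the n input channels receives one feature map; every element of that
    feature map is multiplied by the channel's energy coefficient before being
    passed on to the processing layer.
    """
    def __init__(self, num_feature_maps: int):
        super().__init__()
        # one energy coefficient per channel / frequency position
        self.energy_coeffs = nn.Parameter(torch.ones(num_feature_maps))

    def forward(self, feature_maps: torch.Tensor) -> torch.Tensor:
        # feature_maps: (batch, n, H, W); broadcast the coefficients over H and W
        return feature_maps * self.energy_coeffs.view(1, -1, 1, 1)
```

A first model could then be assembled, for example, by stacking such an input layer in front of an ordinary CNN classifier serving as the processing layer 420; after training, the energy coefficients can be read back from this parameter.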
The processing layer 420 may be configured to process the result of the multiplication to obtain a prediction result of the model.
In some embodiments, the first model may be a deep neural network, such as a CNN or an RNN. The processing layer may include convolution layers, pooling layers, and the like, which process (e.g., convolve and pool) each feature map to obtain a more abstract feature vector representation, such as a prediction vector.
The processing layer may further include an MLP, a fully connected layer, and the like, to convert the feature vector into a specific prediction result, such as an identification result, a classification result, and the like of a target object corresponding to desensitized image data. For example, the processing layer may transform the feature vector of the desensitized image data into a predicted value, which may indicate the identity information of a person in the image, i.e., the identification result of the target object.
FIG. 5 is an exemplary flow diagram of an image processing method according to some embodiments of the present description. In some embodiments, flow 500 may be performed by a processing device. For example, the process 500 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 500. Flow 500 may include the following operations.
Step 502, a plurality of feature maps of the image to be processed are obtained.
The image to be processed means image data on which desensitization processing has not been performed. The image to be processed may include face image data. In some embodiments, the image to be processed may be original face image data, or may be image data obtained by performing face detection or face alignment on the original face image data. The raw face image data may refer to image data directly acquired by an image acquisition apparatus (e.g., a camera, etc.) without any processing. The face detection refers to detecting the position of the face in the image, and the image to be processed may be an image cut based on the position of the face in the image, for example, cutting off an unnecessary portion of the image except the face. The face alignment refers to correcting the angle of a face in an image, the face in an original face image may be inclined at a certain angle, and the face can be aligned on the image so as to facilitate subsequent recognition processing and the like of the image.
In some embodiments, the processing device may acquire the image to be processed through a camera of the device, or may acquire the image to be processed in a manner of reading from a database or a storage device, or calling a data interface, or the like.
In some embodiments, the processing device may obtain a plurality of feature maps of the to-be-processed image according to the method described in step 204, and for further description, reference may be made to the related description of step 204, which is not described herein again.
Step 504, a target feature map is screened out from the plurality of feature maps based on the energy coefficients corresponding to the feature maps in a one-to-one manner.
In some embodiments, the energy coefficients are obtained through model training. For example, the first model is trained as described in connection with fig. 2 and 3. For more details about step 504, reference may be made to the description related to fig. 2 and fig. 3, which are not described herein again.
Step 506, desensitization processing is performed on the target feature map to obtain desensitization image data for representing the image to be processed.
In some embodiments, the processing device may obtain desensitized image data of the target feature map according to the desensitization processing method described in embodiments of this specification, e.g., the correlation description in step 208. For more details, reference may be made to the related description of step 208, which is not repeated here.
It should be noted that the program/code for acquiring the image to be processed or the desensitization image data of the image to be processed may run in a trusted execution environment deployed in the processing device. The security properties of the trusted execution environment ensure that the image data acquired by the processing device cannot be stolen and that the desensitization program for the image to be processed cannot be interfered with or altered from outside. Desensitization image data of the image to be processed obtained by performing the process 500 may be sent to the trained image processing model to obtain a final processing result.
It should be noted that the above description of the respective flows is only for illustration and description, and does not limit the applicable scope of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, changes to the flow steps described herein, such as the addition of pre-processing steps and storage steps, may be made.
FIG. 6 is an exemplary block diagram of a training system for an image processing model in accordance with some embodiments of the present description. As shown in fig. 6, the system 600 may include a first acquisition module 610, a second acquisition module 620, a profile screening module 630, a desensitization processing module 640, an input module 650, and a parameter adjustment module 660.
The first acquiring module 610 may be used to acquire a first sample image and its label.
The first sample image may refer to an image that can be used to train an image processing model. The first sample image is from a private data set.
In some embodiments, the first obtaining module 610 may obtain the first sample image and the corresponding tag thereof by reading from a database, calling a related data interface, and the like.
The second obtaining module 620 may be used to obtain a plurality of feature maps of the first sample image.
In some embodiments, the second obtaining module 620 may obtain a plurality of feature maps corresponding to the first sample image by using discrete cosine transform, wavelet transform, orthogonal basis transform, or the like, or may extract the feature maps from an image file of the image to be processed, for example, for image data in jpeg format, the feature maps may be directly extracted from the image file.
In some embodiments, the second obtaining module 620 may perform local discrete cosine transform on the sample image to obtain a plurality of transform results; combining values of the same frequency position in each transformation result to obtain a feature map, so as to obtain a plurality of feature maps corresponding to different frequency positions in the transformation result; and obtaining the plurality of feature maps based on the plurality of feature maps of different frequency positions.
In some embodiments, said deriving said plurality of feature maps based on a plurality of feature maps for said different frequency locations comprises: taking a plurality of feature maps of the different frequency positions as the plurality of feature maps; or based on the SEnet network or based on a preset selection rule, discarding parts of the feature maps of the different frequency positions to obtain the feature maps.
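As one possible concrete realization, the block-wise (local) discrete cosine transform and the regrouping by frequency position can be sketched as follows, using 8×8 blocks and SciPy; the block size and the assumption that the image dimensions are multiples of the block size are simplifications made for the example.

```python
import numpy as np
from scipy.fft import dctn

def frequency_feature_maps(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Sketch: local DCT on non-overlapping blocks, then regroup the values at the
    same frequency position across all blocks into one feature map per position.

    image:   2-D array (H, W), with H and W assumed to be multiples of `block`
    returns: array of shape (block*block, H//block, W//block), i.e. one feature
             map for each of the block*block frequency positions
    """
    h, w = image.shape
    bh, bw = h // block, w // block
    # split the image into (bh, bw, block, block) non-overlapping blocks
    blocks = image.reshape(bh, block, bw, block).transpose(0, 2, 1, 3)
    # 2-D DCT of every block
    coeffs = dctn(blocks, axes=(2, 3), norm="ortho")
    # gather the coefficient at each frequency position (u, v) across all blocks
    return coeffs.transpose(2, 3, 0, 1).reshape(block * block, bh, bw)
```

For a 112×112 face crop, for example, this sketch would yield 64 feature maps of size 14×14.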
The feature map filtering module 630 may be configured to filter out a target feature map from the plurality of feature maps of the first sample image based on energy coefficients corresponding to the plurality of feature maps in a one-to-one manner.
The energy coefficients are obtained by training the first model using a second sample image and its label, the second sample image being from a public data set. The private data set is co-distributed with the public data set.
In some embodiments, the energy coefficient may be obtained by: acquiring a plurality of feature maps of a second sample image; inputting a plurality of characteristic graphs of the second sample image into the first model to obtain a sample prediction result; the parameters of the first model comprise energy coefficients in one-to-one correspondence with the characteristic maps; determining a first loss function value; the first loss function value reflects a difference between the sample prediction result and a label of a second sample image; determining a second loss function value; the second loss function value is constructed based on the information of the plurality of feature maps and the energy coefficients corresponding to the plurality of feature maps; adjusting parameters of the first model to minimize a first loss function value and a second loss function value; the energy coefficients are obtained from the trained first model.
In some embodiments, in the first model, each energy coefficient is multiplied with the corresponding one of the plurality of input feature maps, and the multiplication results are output to other parts of the first model for further processing.
In some embodiments, the feature map filtering module 630 may discard feature maps corresponding to energy coefficients smaller than a threshold.
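For illustration only, such threshold-based screening might be sketched as below; the threshold value and the tensor layout are assumptions made for the example.

```python
import torch

def screen_target_feature_maps(feature_maps: torch.Tensor,
                               energy_coeffs: torch.Tensor,
                               threshold: float = 0.1) -> torch.Tensor:
    """Sketch: keep only the feature maps whose trained energy coefficient reaches
    the threshold; maps with smaller coefficients are discarded.

    feature_maps:  (n, H, W) feature maps of one image
    energy_coeffs: (n,) energy coefficients taken from the trained first model
    """
    keep = energy_coeffs >= threshold      # boolean mask over the n feature maps
    return feature_maps[keep]              # the target feature maps
```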
The desensitization processing module 640 may be configured to perform desensitization processing on the target feature map to obtain desensitization image data for characterizing the first sample image.
Desensitization processing refers to processing the sensitive information in the target feature map in a certain way so as to transform it. For example, sensitive information may be removed from the target feature map, changed into other information, and so on.
In some embodiments, the desensitization treatment may include one or more of significance screening, normalization, order randomization, loss treatment, and fusion treatment.
In some embodiments, the desensitization process comprises: carrying out loss processing on the target characteristic diagram; and performing fusion processing on the target feature maps subjected to the loss processing to obtain one or more fusion feature maps of which the number is less than that of the target feature maps subjected to the loss processing.
In some embodiments, the loss processing includes changing the values of elements in one or more target feature maps.
In some embodiments, the fusion process includes combining the plurality of target feature maps subjected to the loss process in a manner that two or more target feature maps are in a group to obtain one or more combined results; and for each combination result, calculating the target characteristic diagram according to a preset calculation mode to obtain a fusion characteristic diagram.
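For illustration only, one possible combination of loss processing and fusion processing is sketched below; randomly zeroing element values and averaging the maps in pairs are example choices made for the sketch, not the only ways contemplated by the embodiments.

```python
import torch

def desensitize(target_maps: torch.Tensor, drop_ratio: float = 0.3) -> torch.Tensor:
    """Sketch of desensitization: loss processing followed by fusion processing.

    target_maps: (m, H, W) target feature maps screened by the energy coefficients
    Loss processing here randomly zeroes a fraction of the element values; fusion
    here groups the maps two by two and averages each pair, halving their number.
    """
    # loss processing: change (here: zero out) some element values
    mask = (torch.rand_like(target_maps) >= drop_ratio).to(target_maps.dtype)
    lossy = target_maps * mask
    # fusion processing: combine the maps in groups of two and average each group
    if lossy.shape[0] % 2 == 1:            # drop one map if the count is odd
        lossy = lossy[:-1]
    pairs = lossy.view(-1, 2, *lossy.shape[1:])
    return pairs.mean(dim=1)               # (m//2, H, W) fused feature maps
```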
The input module 650 may be configured to input the desensitization image data as input features into an image processing model, resulting in a processing result.
The processing result may be the result obtained when the image processing model processes the input desensitization image data; it differs depending on the prediction task of the image processing model.
The parameter adjustment module 660 may be configured to adjust parameters of the image processing model to reduce differences between processing results and the labels.
In some embodiments, the parameter adjusting module 660 may adjust the parameters of the image processing model by minimizing the loss function value, so as to reduce the difference between the processing result of the model and the tag, and improve the accuracy of the model in subsequent task processing, for example, the accuracy of identification when identifying the identity information of a human face in an image.
For the above detailed description of each module of each system, reference may be made to the flowchart section of this specification, for example, the related description of fig. 2 to 5.
It should be understood that the system and its modules shown in FIG. 6 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the training system and the modules thereof for the image processing model is only for convenience of description, and the description is not limited to the scope of the embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. For example, in some embodiments, the first obtaining module 610, the second obtaining module 620, and the feature map filtering module 630 may be different modules in one system, or may be one module to implement the functions of two or more modules described above. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) the introduction of energy coefficients guarantees privacy protection of the original image while preserving the precision of the specific task, and can resist white-box and black-box attacks well; (2) the feature map screening process is simple, can be parameterized, and can be completed through end-to-end training of the model, so subsequent model recognition is fast; (3) model training is divided into two stages: in the first stage, data from the public training set is used to train the first model to obtain the energy coefficients, and in the second stage, desensitization image data obtained from the private data is used to train the image processing model, so the security of the private data can be ensured during the model training stage.
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as a software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, an embodiment may be characterized by fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (13)

1. A method of training an image processing model, the method comprising:
acquiring a first sample image and a label thereof; wherein the first sample image is from a private data set;
acquiring a plurality of feature maps of a first sample image;
screening a target characteristic map from a plurality of characteristic maps of the first sample image based on energy coefficients in one-to-one correspondence with the plurality of characteristic maps; the energy coefficients are obtained by training the first model using a second sample image and its label, the second sample image being from a public dataset;
desensitizing the target characteristic diagram to obtain desensitized image data for representing the first sample image;
inputting the desensitization image data as input features into an image processing model to obtain a processing result;
adjusting parameters of the image processing model to reduce differences between processing results and the labels.
2. The method of claim 1, wherein the energy coefficients corresponding to the plurality of feature maps one-to-one are obtained by:
acquiring a plurality of feature maps of a second sample image;
inputting a plurality of characteristic graphs of the second sample image into the first model to obtain a sample prediction result; the parameters of the first model comprise energy coefficients in one-to-one correspondence with the characteristic maps;
determining a first loss function value; the first loss function value reflects a difference between the sample prediction result and a label of a second sample image;
determining a second loss function value; the second loss function value is constructed based on the information of the plurality of feature maps and the energy coefficients corresponding to the plurality of feature maps;
adjusting parameters of the first model to minimize a first loss function value and a second loss function value;
the energy coefficients are obtained from the trained first model.
3. The method according to claim 2, wherein in the first model, each energy coefficient is used for corresponding multiplication with a plurality of input feature maps, and the multiplication result is used for outputting to other parts of the first model for further processing.
4. The method of claim 1, wherein the filtering out the target feature map from the plurality of feature maps of the first sample image based on the energy coefficients corresponding to the plurality of feature maps in a one-to-one manner comprises:
and discarding the characteristic map corresponding to the energy coefficient smaller than the threshold value.
5. The method of claim 1, the private data set being co-distributed with the public data set.
6. The method according to claim 1 or 2, wherein the plurality of feature maps of the sample image are obtained by:
carrying out local discrete cosine transform on the sample image to obtain a plurality of transform results;
combining values of the same frequency position in each transformation result to obtain a feature map, so as to obtain a plurality of feature maps corresponding to different frequency positions in the transformation result;
and obtaining the plurality of feature maps based on the plurality of feature maps of different frequency positions.
7. The method of claim 6, the deriving the plurality of feature maps based on a plurality of feature maps of the different frequency locations, comprising:
taking a plurality of feature maps of the different frequency positions as the plurality of feature maps; or,
and discarding parts of the plurality of feature maps of the different frequency positions to obtain the plurality of feature maps based on the SEnet network or based on a preset selection rule.
8. The method of claim 1, the desensitization treatment comprising:
carrying out loss processing on the target characteristic diagram;
and performing fusion processing on the target feature maps subjected to the loss processing to obtain one or more fusion feature maps of which the number is less than that of the target feature maps subjected to the loss processing.
9. The method of claim 8, wherein the processing the target feature map for loss comprises:
the values of the elements in one or more target feature maps are changed.
10. The method according to claim 8, wherein the fusing the loss-processed target feature maps to obtain one or more fused feature maps with a smaller number than the loss-processed target feature maps comprises:
combining the multiple target characteristic graphs after loss processing in a mode that two or more target characteristic graphs are in a group to obtain one or more combined results;
and for each combination result, calculating the target characteristic diagram according to a preset calculation mode to obtain a fusion characteristic diagram.
11. A training system for an image processing model, the system comprising:
the first acquisition module is used for acquiring a first sample image and a label thereof; wherein the first sample image is from a private data set;
the second acquisition module is used for acquiring a plurality of characteristic maps of the first sample image;
the characteristic diagram screening module is used for screening a target characteristic diagram from a plurality of characteristic diagrams of the first sample image on the basis of energy coefficients in one-to-one correspondence with the plurality of characteristic diagrams; the energy coefficients are obtained by training the first model using a second sample image and its label, the second sample image being from a public dataset;
a desensitization processing module, configured to perform desensitization processing on the target feature map, to obtain desensitization image data used for characterizing the first sample image;
the input module is used for inputting the desensitization image data into an image processing model as input characteristics to obtain a processing result;
and the parameter adjusting module is used for adjusting the parameters of the image processing model so as to reduce the difference between the processing result and the label.
12. An apparatus for training an image processing model, comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of any of claims 1-10.
13. A method of image processing, the method comprising:
acquiring a plurality of characteristic maps of an image to be processed;
screening a target characteristic diagram from the plurality of characteristic diagrams on the basis of energy coefficients in one-to-one correspondence with the plurality of characteristic diagrams, wherein the energy coefficients are obtained through model training;
and carrying out desensitization treatment on the target characteristic diagram to obtain desensitization image data for representing the image to be treated.
CN202110752725.6A 2021-07-02 2021-07-02 Training method and system of image processing model Pending CN113378982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110752725.6A CN113378982A (en) 2021-07-02 2021-07-02 Training method and system of image processing model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110752725.6A CN113378982A (en) 2021-07-02 2021-07-02 Training method and system of image processing model

Publications (1)

Publication Number Publication Date
CN113378982A true CN113378982A (en) 2021-09-10

Family

ID=77581006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110752725.6A Pending CN113378982A (en) 2021-07-02 2021-07-02 Training method and system of image processing model

Country Status (1)

Country Link
CN (1) CN113378982A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238826A (en) * 2022-09-15 2022-10-25 支付宝(杭州)信息技术有限公司 Model training method and device, storage medium and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257697A (en) * 2020-12-23 2021-01-22 支付宝(杭州)信息技术有限公司 Method and system for image processing, training of image recognition model and image recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257697A (en) * 2020-12-23 2021-01-22 支付宝(杭州)信息技术有限公司 Method and system for image processing, training of image recognition model and image recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIE HU ET AL: "Squeeze-and-Excitation Networks", 《CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Similar Documents

Publication Publication Date Title
US10747854B2 (en) Method for concealing data and data obfuscation device using the same
EP3812988A1 (en) Method for training and testing adaption network corresponding to obfuscation network capable of processing data to be concealed for privacy, and training device and testing device using the same
CN114913565B (en) Face image detection method, model training method, device and storage medium
Fredrikson et al. Model inversion attacks that exploit confidence information and basic countermeasures
EP3812970A1 (en) Method for learning and testing user learning network to be used for recognizing obfuscated data created by concealing original data to protect personal information and learning device and testing device using the same
CN110851835A (en) Image model detection method and device, electronic equipment and storage medium
EP3471060B1 (en) Apparatus and methods for determining and providing anonymized content within images
CN112257697A (en) Method and system for image processing, training of image recognition model and image recognition
CN112966737A (en) Method and system for image processing, training of image recognition model and image recognition
CN111931153B (en) Identity verification method and device based on artificial intelligence and computer equipment
CN111507320A (en) Detection method, device, equipment and storage medium for kitchen violation behaviors
US20200380168A1 (en) Image Access Management Device, Image Access Management Method, and Image Access Management System
US20220327189A1 (en) Personalized biometric anti-spoofing protection using machine learning and enrollment data
CN112041847A (en) Providing images with privacy tags
CN113378982A (en) Training method and system of image processing model
CN114612989A (en) Method and device for generating face recognition data set, electronic equipment and storage medium
Mallet et al. Deepfake Detection Analyzing Hybrid Dataset Utilizing CNN and SVM
CN113159214A (en) Image processing method and system
Pandey An amalgamated strategy for iris recognition employing neural network and hamming distance
CN112291188B (en) Registration verification method and system, registration verification server and cloud server
Ashiba Proposed framework for cancelable face recognition system
CN113793396A (en) Method for generating network training image reconstruction model based on confrontation
CN110457877A (en) User authen method and device, electronic equipment, computer readable storage medium
Shringare et al. Face Mask Detection Based Entry Control Using XAI and IoT
US20240119758A1 (en) Image data processing method, image data recognition method, training method for image recognition model, image data processing apparatus, training apparatus for image recognition model, and image recognition apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210910

RJ01 Rejection of invention patent application after publication