WO2021056765A1 - Image processing method and related apparatus - Google Patents

Image processing method and related apparatus

Info

Publication number
WO2021056765A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
similarity
feature data
iteration
image
Application number
PCT/CN2019/119180
Other languages
English (en)
French (fr)
Inventor
葛艺潇
陈大鹏
李鸿升
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority to SG11202010487PA
Priority to KR1020217019630A (published as KR20210095671A)
Priority to JP2021500683A (granted as JP7108123B2)
Priority to US17/077,251 (granted as US11429809B2)
Publication of WO2021056765A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an image processing method and related apparatus.
  • neural networks have been widely applied to various image recognition tasks (such as person re-identification and image classification) in recent years.
  • training a neural network requires a large amount of labeled data; through unsupervised learning, unlabeled data can instead be used to complete the training of the neural network.
  • the traditional unsupervised learning method uses a neural network trained on the source domain to recognize the unlabeled images in the target domain and assigns labels to those images, then uses these labels as supervision to train the neural network on the source domain and adjust its parameters, so as to obtain a neural network applicable to the target domain.
  • the present disclosure provides a technical solution for image processing.
  • in a first aspect, an image processing method includes: acquiring an image to be processed; and using a target neural network to perform feature extraction processing on the image to be processed to obtain target feature data of the image to be processed. The parameters of the target neural network are time-series averages of the parameters of a first neural network; the first neural network is trained under the supervision of a training image set and an average network; the parameters of the average network are time-series averages of the parameters of a second neural network; and the second neural network is trained under the supervision of the training image set and the target neural network.
  • in this aspect, the parameters of the target neural network and the parameters of the average network are obtained separately; the output of the target neural network is then used to supervise the second neural network, and the output of the average network is used to supervise the first neural network, so as to train the target neural network. This improves the training effect, and when the target neural network later performs recognition tasks on the target domain, it can extract more informative target feature data.
  • the first neural network being trained under the supervision of a training image set and an average network includes: obtaining the training image set, a first neural network to be trained, and a second neural network to be trained; and executing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, where x is a positive integer.
  • the i-th of the x first iterations includes: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration, to obtain the first neural network to be trained of the (i+1)-th first iteration; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration, to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • supervising the first neural network to be trained of the i-th first iteration through the average network of the i-th first iteration, and supervising the second neural network to be trained of the i-th first iteration through the target neural network of the i-th first iteration, reduces the impact that the correlation between the outputs of the two networks to be trained has on the training effect, thereby enhancing the training effect.
  • supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration, to obtain the first neural network to be trained of the (i+1)-th first iteration, includes: processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft triplet loss according to the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  • adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first soft triplet loss, determined from the first feature data set and the second feature data set, improves the feature extraction effect of that network on images of the target domain, and thereby improves the feature extraction effect of the target neural network on images of the target domain.
  • obtaining the first soft triplet loss according to the first feature data set and the second feature data set includes: determining the minimum similarity between the first feature data of a first image of the training image set in the first feature data set and the feature data in the positive sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between the second feature data of the first image in the second feature data set and the feature data in the positive sample feature data subset of the second feature data set, to obtain a second similarity, where a positive sample feature data subset includes the feature data of images whose label is the same as the first label of the first image; determining the maximum similarity between the first feature data and the feature data in the negative sample feature data subset of the first feature data set, to obtain a third similarity; and determining the maximum similarity between the second feature data and the feature data in the negative sample feature data subset of the second feature data set, to obtain a fourth similarity, where a negative sample feature data subset includes the feature data of images whose label differs from the first label.
  • normalizing the first similarity, the second similarity, the third similarity, and the fourth similarity into values between 0 and 1 yields a fifth, sixth, seventh, and eighth similarity that better match the true distribution of the data, thereby improving the training effect of the target neural network.
  • normalizing the first similarity, the second similarity, the third similarity, and the fourth similarity respectively to obtain the fifth similarity, the sixth similarity, the seventh similarity, and the eighth similarity includes: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; and determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
  • supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss, to obtain the first neural network to be trained of the (i+1)-th first iteration, includes: processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining a first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft triplet loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss, to obtain the first neural network to be trained of the (i+1)-th first iteration.
  • determining the first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft triplet loss includes: determining a first hard classification loss according to the difference between the first classification result and the first label; and determining the first loss according to the first hard classification loss and the first soft triplet loss.
  • before determining the first loss according to the first hard classification loss and the first soft triplet loss, the method further includes: processing the first image with the average network of the i-th first iteration to obtain a second classification result; and determining a first soft classification loss according to the difference between the first classification result and the second classification result. Determining the first loss according to the first hard classification loss and the first soft triplet loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, and the first soft triplet loss.
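  • the patent does not fix a formula for the first soft classification loss beyond "the difference between the first classification result and the second classification result"; a common realization of such a difference, sketched below under that assumption, is the KL divergence between the two class distributions:

```python
import torch.nn.functional as F

def soft_classification_loss(student_logits, teacher_logits):
    # Difference between the first classification result (network being
    # trained) and the second classification result (average network).
    # KL divergence is one common choice; the teacher output is detached
    # because it serves as supervision data, not a trainable quantity.
    teacher_prob = F.softmax(teacher_logits.detach(), dim=1)
    student_log_prob = F.log_softmax(student_logits, dim=1)
    return F.kl_div(student_log_prob, teacher_prob, reduction="batchmean")
```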
  • before determining the first loss according to the first hard classification loss, the first soft classification loss, and the first soft triplet loss, the method further includes: determining a first hard triplet loss according to the first similarity and the third similarity. Determining the first loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, the first soft triplet loss, and the first hard triplet loss.
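  • a minimal sketch of how the four terms might be combined into the first loss; the interpolation weights are illustrative assumptions, since the patent only states that the first loss is determined from these terms:

```python
def first_loss(hard_cls, soft_cls, soft_tri, hard_tri, w_cls=0.5, w_tri=0.5):
    # First hard classification loss, first soft classification loss,
    # first soft triplet loss and first hard triplet loss, combined with
    # assumed weights w_cls and w_tri.
    return ((1 - w_cls) * hard_cls + w_cls * soft_cls +
            (1 - w_tri) * hard_tri + w_tri * soft_tri)
```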
  • processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result includes: performing a first preprocessing on the training image set to obtain a first image set, where the first preprocessing includes any one of erasing processing, cropping processing, and flipping processing; and processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, where the second image is obtained by performing the first preprocessing on the first image, and the feature data of the second image in the first feature data set is the same as the feature data of the first image in the first feature data set.
  • the first image set is obtained by performing the first preprocessing on the images of the training image set, and the first image set is then input to the first neural network to be trained of the i-th first iteration and to the target neural network of the i-th first iteration, which reduces the probability of overfitting during training.
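  • one possible realization of the first preprocessing using torchvision transforms; the patent only requires that erasing, cropping, or flipping be applied, so the particular transforms and sizes below are assumptions:

```python
import torchvision.transforms as T

# Random flipping, cropping and erasing; each pass over the training image
# set yields a differently perturbed first image set, which reduces the
# probability of overfitting during training.
first_preprocess = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # flipping processing
    T.RandomCrop(224, padding=8),    # cropping processing (size assumed)
    T.ToTensor(),
    T.RandomErasing(p=0.5),          # erasing processing (on the tensor)
])
```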
  • processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set includes: processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
  • acquiring the training image set includes: acquiring a to-be-processed image set and a third neural network; and performing y second iterations on the third neural network to obtain the training image set, where y is a positive integer. The t-th of the y second iterations includes: sampling from the to-be-processed image set to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing the feature data of the images of the second image set and a classification result set containing the classification results of those images; clustering the feature data of the third feature data set to determine labels for the feature data, and attaching the label of each feature data to the corresponding image of the second image set to obtain a third image set; determining a third loss according to the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss.
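  • a sketch of the clustering step of one second iteration; k-means is an assumed choice of clustering algorithm (the patent only requires clustering the feature data), and all names are illustrative:

```python
from sklearn.cluster import KMeans

def label_by_clustering(third_feature_data_set, num_clusters):
    """third_feature_data_set: (N, D) array of feature data extracted by the
    third neural network from the sampled second image set. The returned
    cluster indices serve as the labels that are attached to the
    corresponding images to form the third image set."""
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(
        third_feature_data_set)
```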
  • the method further includes: using the target feature data to search a database, and obtaining an image whose feature data matches the target feature data as a target image.
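  • a minimal retrieval sketch under the assumption that "matching" means highest cosine similarity between stored feature data and the target feature data; the names and the metric are illustrative:

```python
import torch
import torch.nn.functional as F

def retrieve(target_feature, database_features, top_k=5):
    # Rank database images by cosine similarity to the target feature data;
    # the top-ranked images are returned as candidate target images.
    q = F.normalize(target_feature.unsqueeze(0), dim=1)   # (1, D)
    db = F.normalize(database_features, dim=1)            # (N, D)
    scores = (db @ q.t()).squeeze(1)                      # (N,)
    return scores.topk(top_k).indices
```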
  • the image to be processed includes a person object.
  • supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration, to obtain the second neural network to be trained of the (i+1)-th first iteration, includes: processing the training image set with the second neural network to be trained of the i-th first iteration to obtain a fourth feature data set, and processing the training image set with the target neural network of the i-th first iteration to obtain a fifth feature data set; obtaining a second soft triplet loss according to the fourth feature data set and the fifth feature data set; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • obtaining the second soft triplet loss according to the fourth feature data set and the fifth feature data set includes: determining the minimum similarity between the third feature data of the first image in the fourth feature data set and the feature data in the positive sample feature data subset of the fourth feature data set, to obtain a ninth similarity; determining the minimum similarity between the fourth feature data of the first image in the fifth feature data set and the feature data in the positive sample feature data subset of the fifth feature data set, to obtain an eleventh similarity, where a positive sample feature data subset includes the feature data of images whose label is the same as the first label; determining the maximum similarity between the third feature data and the feature data in the negative sample feature data subset of the fourth feature data set, to obtain a tenth similarity; and determining the maximum similarity between the fourth feature data and the feature data in the negative sample feature data subset of the fifth feature data set, to obtain a twelfth similarity, where a negative sample feature data subset includes the feature data of images whose label differs from the first label. The ninth, tenth, eleventh, and twelfth similarities are then used to obtain the second soft triplet loss.
  • supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss, to obtain the second neural network to be trained of the (i+1)-th first iteration, includes: processing the first image with the second neural network to be trained of the i-th first iteration to obtain a third classification result; determining a second loss of the second neural network to be trained of the i-th first iteration according to the third classification result, the first label, and the second soft triplet loss; and adjusting the parameters of the second neural network to be trained of the i-th first iteration based on the second loss, to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • determining the second loss of the second neural network to be trained of the i-th first iteration according to the third classification result, the first label, and the second soft triplet loss includes: determining a second hard classification loss according to the difference between the third classification result and the first label; and determining the second loss according to the second hard classification loss and the second soft triplet loss.
  • before determining the second loss according to the second hard classification loss and the second soft triplet loss, the method further includes: processing the first image with the target neural network of the i-th first iteration to obtain a fourth classification result; and determining a second soft classification loss according to the difference between the third classification result and the fourth classification result. Determining the second loss according to the second hard classification loss and the second soft triplet loss then includes: determining the second loss according to the second hard classification loss, the second soft classification loss, and the second soft triplet loss.
  • before determining the second loss according to the second hard classification loss, the second soft classification loss, and the second soft triplet loss, the method further includes: determining a second hard triplet loss according to the ninth similarity and the tenth similarity. Determining the second loss then includes: determining the second loss according to the second hard classification loss, the second soft classification loss, the second soft triplet loss, and the second hard triplet loss.
  • processing the first image of the training image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result includes: performing a second preprocessing on the training image set to obtain a fourth image set, where the second preprocessing includes any one of erasing processing, cropping processing, and flipping processing; and processing a third image of the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result, where the third image is obtained by performing the second preprocessing on the first image, the feature data of the third image in the fourth feature data set is the same as the feature data of the first image in the fourth feature data set, and the first preprocessing is different from the second preprocessing.
  • processing the training image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set includes: processing the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set.
  • in a second aspect, an image processing device includes: an acquisition unit configured to acquire an image to be processed; and a feature extraction processing unit configured to use a target neural network to perform feature extraction processing on the image to be processed to obtain target feature data of the image to be processed, where the parameters of the target neural network are time-series averages of the parameters of a first neural network, the first neural network is trained under the supervision of a training image set and an average network, the parameters of the average network are time-series averages of the parameters of a second neural network, and the second neural network is trained under the supervision of the training image set and the target neural network.
  • the first neural network being trained under the supervision of a training image set and an average network includes: obtaining the training image set, a first neural network to be trained, and a second neural network to be trained; and executing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, where x is a positive integer.
  • the i-th of the x first iterations includes: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration, to obtain the first neural network to be trained of the (i+1)-th first iteration; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration, to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration, to obtain the first neural network to be trained of the (i+1)-th first iteration, includes: processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft triplet loss according to the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  • obtaining the first soft triplet loss according to the first feature data set and the second feature data set includes: determining the minimum similarity between the first feature data of a first image of the training image set in the first feature data set and the feature data in the positive sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between the second feature data of the first image in the second feature data set and the feature data in the positive sample feature data subset of the second feature data set, to obtain a second similarity, where a positive sample feature data subset includes the feature data of images whose label is the same as the first label of the first image; determining the maximum similarity between the first feature data and the feature data in the negative sample feature data subset of the first feature data set, to obtain a third similarity; and determining the maximum similarity between the second feature data and the feature data in the negative sample feature data subset of the second feature data set, to obtain a fourth similarity, where a negative sample feature data subset includes the feature data of images whose label differs from the first label.
  • normalizing the first similarity, the second similarity, the third similarity, and the fourth similarity respectively to obtain the fifth similarity, the sixth similarity, the seventh similarity, and the eighth similarity includes: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; and determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
  • supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss, to obtain the first neural network to be trained of the (i+1)-th first iteration, includes: processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining a first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft triplet loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss, to obtain the first neural network to be trained of the (i+1)-th first iteration.
  • determining the first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft triplet loss includes: determining a first hard classification loss according to the difference between the first classification result and the first label; and determining the first loss according to the first hard classification loss and the first soft triplet loss.
  • before the first loss is determined according to the first hard classification loss and the first soft triplet loss, the average network of the i-th first iteration processes the first image to obtain a second classification result, and the first soft classification loss is determined according to the difference between the first classification result and the second classification result. Determining the first loss according to the first hard classification loss and the first soft triplet loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, and the first soft triplet loss.
  • before the first loss is determined according to the first hard classification loss, the first soft classification loss, and the first soft triplet loss, a first hard triplet loss is determined according to the first similarity and the third similarity. Determining the first loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, the first soft triplet loss, and the first hard triplet loss.
  • processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result includes: performing a first preprocessing on the training image set to obtain a first image set, where the first preprocessing includes any one of erasing processing, cropping processing, and flipping processing; and processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, where the second image is obtained by performing the first preprocessing on the first image, and the feature data of the second image in the first feature data set is the same as the feature data of the first image in the first feature data set.
  • processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set includes: processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
  • the acquisition unit is specifically configured to: acquire a to-be-processed image set and a third neural network; and perform y second iterations on the third neural network to obtain the training image set, where y is a positive integer.
  • the t-th of the y second iterations includes: sampling from the to-be-processed image set to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing the feature data of the images of the second image set and a classification result set containing the classification results of those images; clustering the feature data of the third feature data set to determine labels for the feature data, and attaching the label of each feature data to the corresponding image of the second image set to obtain a third image set; determining a third loss according to the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss.
  • the device further includes: a retrieval unit configured to use the target feature data to retrieve a database, and obtain an image with feature data matching the target feature data as the target image.
  • supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration, to obtain the second neural network to be trained of the (i+1)-th first iteration, includes: processing the training image set with the second neural network to be trained of the i-th first iteration to obtain a fourth feature data set, and processing the training image set with the target neural network of the i-th first iteration to obtain a fifth feature data set; obtaining a second soft triplet loss according to the fourth feature data set and the fifth feature data set; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • obtaining the second soft triplet loss according to the fourth feature data set and the fifth feature data set includes: determining the minimum similarity between the third feature data of the first image in the fourth feature data set and the feature data in the positive sample feature data subset of the fourth feature data set, to obtain a ninth similarity; determining the minimum similarity between the fourth feature data of the first image in the fifth feature data set and the feature data in the positive sample feature data subset of the fifth feature data set, to obtain an eleventh similarity, where a positive sample feature data subset includes the feature data of images whose label is the same as the first label; determining the maximum similarity between the third feature data and the feature data in the negative sample feature data subset of the fourth feature data set, to obtain a tenth similarity; and determining the maximum similarity between the fourth feature data and the feature data in the negative sample feature data subset of the fifth feature data set, to obtain a twelfth similarity, where a negative sample feature data subset includes the feature data of images whose label differs from the first label. The ninth, tenth, eleventh, and twelfth similarities are then used to obtain the second soft triplet loss.
  • supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss, to obtain the second neural network to be trained of the (i+1)-th first iteration, includes: processing the first image with the second neural network to be trained of the i-th first iteration to obtain a third classification result; determining a second loss of the second neural network to be trained of the i-th first iteration according to the third classification result, the first label, and the second soft triplet loss; and adjusting the parameters of the second neural network to be trained of the i-th first iteration based on the second loss, to obtain the second neural network to be trained of the (i+1)-th first iteration.
  • determining the second loss of the second neural network to be trained of the i-th first iteration according to the third classification result, the first label, and the second soft triplet loss includes: determining a second hard classification loss according to the difference between the third classification result and the first label; and determining the second loss according to the second hard classification loss and the second soft triplet loss.
  • before the second loss is determined according to the second hard classification loss and the second soft triplet loss, the target neural network of the i-th first iteration processes the first image to obtain a fourth classification result, and the second soft classification loss is determined according to the difference between the third classification result and the fourth classification result. Determining the second loss according to the second hard classification loss and the second soft triplet loss then includes: determining the second loss according to the second hard classification loss, the second soft classification loss, and the second soft triplet loss.
  • before the second loss is determined according to the second hard classification loss, the second soft classification loss, and the second soft triplet loss, a second hard triplet loss is determined according to the ninth similarity and the tenth similarity. Determining the second loss then includes: determining the second loss according to the second hard classification loss, the second soft classification loss, the second soft triplet loss, and the second hard triplet loss.
  • processing the first image of the training image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result includes: performing a second preprocessing on the training image set to obtain a fourth image set, where the second preprocessing includes any one of erasing processing, cropping processing, and flipping processing; and processing a third image of the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result, where the third image is obtained by performing the second preprocessing on the first image, the feature data of the third image in the fourth feature data set is the same as the feature data of the first image in the fourth feature data set, and the first preprocessing is different from the second preprocessing.
  • processing the training image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set includes: processing the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set.
  • in a third aspect, a processor is provided, where the processor is configured to execute the method of the first aspect or any one of its possible implementation manners.
  • in a fourth aspect, an electronic device is provided, including a processor, a sending device, an input device, an output device, and a memory.
  • the memory is used to store computer program code, the computer program code including computer instructions.
  • when the processor executes the computer instructions, the electronic device executes the method of the first aspect or any one of its possible implementation manners.
  • in a fifth aspect, a computer-readable storage medium is provided, which stores a computer program.
  • the computer program includes program instructions that, when executed by a processor of an electronic device, cause the processor to execute the method of the first aspect or any one of its possible implementation manners.
  • in a sixth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the method of the first aspect or any one of its possible implementation manners.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a training method provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of another training method provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of yet another training method provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of yet another image processing method provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the present disclosure.
  • neural networks have been widely applied to various image recognition tasks (such as person re-identification and image classification) in recent years.
  • the performance of a neural network on these tasks largely depends on its training effect, which in turn mainly depends on the number of training images: the more training images there are, the better the training effect, and the better the trained neural network performs on the corresponding image recognition task.
  • a training image refers to an image with annotation information (hereinafter referred to as a label).
  • for example, if the task to be performed is to classify the content of an image, that is, to determine which of apple, banana, pear, peach, orange, or watermelon the image contains, then the annotation information includes apple, banana, pear, peach, orange, and watermelon.
  • if the task to be performed is person re-identification, that is, identifying the identity of the person in an image, then the annotation information includes the identity of the person (such as Zhang San, Li Si, Wang Wu, Zhou Liu, etc.).
  • the tasks performed by the neural network on the above-mentioned labeled images and the tasks performed on the above-mentioned unlabeled images are correlated, and there is also a correlation between the above-mentioned labeled images and the above-mentioned unlabeled images.
  • for example, a large number of images containing pedestrians are collected on cloudy days by the surveillance cameras of city A (hereinafter referred to as the images of A); labeled data are obtained by annotating the identities of the pedestrians in the images of A, and a neural network a is trained with the labeled data, so that the trained neural network a can identify the pedestrians in images collected in A on cloudy days.
  • to identify the pedestrians in images collected in city B, the parameters of the trained neural network a can be adjusted through unsupervised transfer learning, so that the adjusted network can identify the pedestrians in the images collected in B.
  • the task the neural network a performs on the labeled images and the task it performs on the unlabeled images (the images collected in B) are both pedestrian identification, and both the labeled images and the unlabeled images contain pedestrians.
  • the labeled data, however, are all images collected on cloudy days, while the images collected in B include images collected on cloudy days, on sunny days, and on rainy days.
  • the environmental brightness differs between images collected in different weather, and environmental brightness has a considerable impact on the recognition accuracy of the neural network; a neural network trained on images collected on cloudy days therefore has low accuracy in identifying pedestrians in images collected on sunny days.
  • the parameters of the surveillance cameras in A and in B (such as the shooting angle) also differ, which likewise affects the accuracy with which the neural network identifies pedestrians in images collected by different cameras; because the camera parameters in A and in B are not the same, the neural network trained with the labeled data identifies the pedestrians in the images collected in B with low accuracy.
  • the set containing the above labeled images is called the source domain, and the set containing the above unlabeled images is called the target domain.
  • unsupervised transfer learning is a training method that adapts a neural network trained on the source domain for application on the target domain.
  • the traditional unsupervised learning method uses a neural network trained on the source domain to recognize the unlabeled images of the target domain and attaches labels to them (hereinafter referred to as pseudo-hard labels), then supervises the source-trained neural network with the pseudo-hard labels and adjusts its parameters to obtain the neural network applied to the target domain (hereinafter referred to as the applied neural network). Because the pseudo-hard labels contain errors, supervising the source-trained neural network with them gives a poor training effect, which in turn gives the applied neural network a poor feature extraction effect on the target domain, and finally a poor application effect on the target domain (for example, low accuracy in identifying pedestrians).
  • using the technical solutions provided by the embodiments of the present disclosure, a neural network can be obtained whose feature extraction effect on the target domain is better than that of the applied neural network obtained by the above traditional method, thereby improving the application effect on the target domain.
  • hardest within-class feature data: among the feature data of images with the same label, the two feature data with the smallest similarity are each other's hardest within-class feature data.
  • hardest out-of-class feature data: among the feature data of images with different labels, the two feature data with the greatest similarity are each other's hardest out-of-class feature data.
  • the hardest within-class feature data of an image in a feature data set: the hardest within-class feature data, within that feature data set, of the image's feature data.
  • the hardest out-of-class feature data of an image in a feature data set: the hardest out-of-class feature data, within that feature data set, of the image's feature data.
  • the feature data of image 1 is feature data 1
  • the feature data of image 2 is feature data 2
  • the feature data of image 3 is feature data 3
  • the feature data of image 4 is feature data 4
  • the feature data of image 5 is feature data 5.
  • the label of image 1 is the same as the labels of image 2 and image 3, and different from the labels of image 4 and image 5. If the similarity between feature data 1 and feature data 2 is greater than the similarity between feature data 1 and feature data 3, then feature data 3 is the hardest within-class feature data of feature data 1.
  • similarly, if the similarity between feature data 1 and feature data 5 is greater than the similarity between feature data 1 and feature data 4, then feature data 5 is the hardest out-of-class feature data of feature data 1. Assuming feature data set 1 contains feature data 1 through feature data 5, the hardest within-class feature data of image 1 in feature data set 1 is feature data 3, and the hardest out-of-class feature data of image 1 in feature data set 1 is feature data 5.
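  • the example above can be written out as a short sketch; cosine similarity is an assumed similarity measure, and the function merely restates the definitions of hardest within-class and out-of-class feature data:

```python
import torch
import torch.nn.functional as F

def hardest_similarities(anchor, feats, labels, anchor_label):
    """anchor: (D,) feature data of an image; feats: (N, D) feature data
    set; labels: (N,) labels of the corresponding images. Returns the
    similarities to the hardest within-class feature data (least similar,
    same label) and the hardest out-of-class feature data (most similar,
    different label)."""
    sims = F.cosine_similarity(anchor.unsqueeze(0), feats)
    pos = labels == anchor_label      # positive sample feature data subset
    hardest_in_class = sims[pos].min()
    hardest_out_of_class = sims[~pos].max()
    return hardest_in_class, hardest_out_of_class
```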
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • the execution subject of this embodiment may be a server or a terminal such as a mobile phone, a computer, or a tablet computer.
  • the above-mentioned image to be processed may be any digital image.
  • the image to be processed may contain a human object; it may include only a human face without the torso and limbs (hereinafter the torso and limbs are referred to as the human body), may include only the human body without the face, or may include only the lower limbs or the upper limbs.
  • the present disclosure does not limit which regions of the human body the image to be processed specifically contains.
  • the image to be processed may contain animals.
  • the image to be processed may include plants.
  • the present disclosure does not limit the content contained in the image to be processed.
  • the image to be processed may be obtained by receiving an image input by the user through an input component, where the input component includes a keyboard, a mouse, a touch screen, a touch pad, or an audio input device; or by receiving an image to be processed sent by a terminal, where the terminal includes a mobile phone, a computer, a tablet computer, a server, and so on.
  • the present disclosure does not limit the way of acquiring the image to be processed.
  • use the target neural network to perform feature extraction processing on the image to be processed to obtain the target feature data of the image to be processed.
  • the above target neural network is a neural network that has the function of extracting feature data from an image.
  • the target neural network can be stacked or composed in a certain way from neural network layers such as convolutional layers, pooling layers, normalization layers, fully connected layers, down-sampling layers, up-sampling layers, and classifiers.
  • the present disclosure does not limit the structure of the target neural network.
  • for example, if the target neural network includes multiple convolutional layers and a normalization layer, the convolutional layers and the normalization layer perform convolution processing and normalization processing on the image to be processed step by step, extracting the feature data of the image to be processed to obtain the target feature data.
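  • a minimal sketch of such a target neural network and of the feature extraction step; the architecture shown is an arbitrary example, since the patent does not limit the structure:

```python
import torch
import torch.nn as nn

# Convolutional layers plus normalization layers, pooled into a feature
# vector; the layer sizes here are illustrative only.
target_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                        # -> (batch, 128)
)

image = torch.rand(1, 3, 224, 224)       # image to be processed (size assumed)
with torch.no_grad():
    target_feature = target_net(image)   # target feature data
```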
  • supervising the neural network trained on the source domain with pseudo-hard labels makes that network worse and worse in precisely the directions it has not yet learned well, which in turn gives the applied neural network a poor application effect on the target domain.
  • Example 1: the neural network trained on the source domain has low accuracy in recognizing Li Si, that is, a high probability of recognizing Li Si as someone else. Suppose an image a containing Li Si is input into this network and the resulting pseudo-hard label is Wang Wu; Wang Wu is then used as supervision data to supervise the network's output and adjust its parameters. The network is thereby adjusted so that the feature data it extracts from image a approach the feature data of Wang Wu, and when the finally obtained applied neural network processes images containing Li Si, the extracted feature data are all close to the feature data of Wang Wu, so Li Si is recognized as Wang Wu.
  • for this reason, the present disclosure supervises the output of the applied neural network with the output of another neural network trained on the source domain (hereinafter referred to as the supervisory neural network), where the two networks have different application effects, in order to improve the application effect of the applied neural network on the target domain.
  • Example 2: the applied neural network has high accuracy in recognizing Zhang San but low accuracy in recognizing Zhou Liu, while the supervisory neural network has low accuracy in recognizing Zhang San but high accuracy in recognizing Zhou Liu.
  • if the output of the applied neural network and the output of the supervisory neural network supervise each other, that is, the output of the applied neural network is used to supervise the supervisory neural network and the output of the supervisory neural network is used to supervise the applied neural network, this mutual supervision can replace the supervision of the applied neural network by pseudo-hard labels.
  • during mutual-supervision training, however, the parameters of the applied neural network and of the supervisory neural network are updated at the same time, which makes the output of the applied neural network more and more similar to the output of the supervisory neural network (hereinafter this defect of mutual supervision is called the correlation defect).
  • because of the correlation defect, the supervisory neural network can learn the "weaknesses" of the applied neural network (such as the applied neural network's recognition of Zhou Liu's identity in Example 2), and the applied neural network can learn the "weaknesses" of the supervisory neural network (such as the supervisory neural network's recognition of Zhang San's identity in Example 2), so the correlation between the parameters of the applied neural network and the parameters of the supervisory neural network becomes higher and higher. In this way, the parameters of the applied neural network cannot be optimized, and the application effect of the applied neural network on the target domain cannot be improved.
  • therefore, the embodiments of the present disclosure propose another training method that "isolates" the applied neural network from the supervisory neural network to reduce the correlation between them, so as to reduce the impact of the correlation defect and obtain a target neural network whose effect on the target domain is better than that of the applied neural network obtained through plain mutual supervision.
  • this training method includes: determining the parameters of the target neural network from the time-series average of the parameters of the first neural network and supervising the second neural network with the output of the target neural network, while determining the parameters of the average network from the time-series average of the parameters of the second neural network and supervising the first neural network with the output of the average network, thereby completing the training of the target neural network.
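  • the structure of one such training step can be sketched as follows (an illustrative PyTorch rendering; the loss function, optimizer handling, and the averaging coefficient alpha are all assumptions rather than details fixed by the patent):

```python
import torch

def one_first_iteration(net1, net2, target_net, avg_net,
                        batch, labels, compute_loss, opt1, opt2, alpha=0.999):
    """net1/net2: first/second neural networks to be trained;
    target_net/avg_net: their temporally averaged counterparts.
    compute_loss(student_out, teacher_out, labels) stands for the combined
    supervision described above (classification and triplet terms)."""
    with torch.no_grad():            # supervisor outputs carry no gradient
        a_out = avg_net(batch)       # average network supervises net1
        t_out = target_net(batch)    # target network supervises net2

    loss1 = compute_loss(net1(batch), a_out, labels)
    opt1.zero_grad(); loss1.backward(); opt1.step()

    loss2 = compute_loss(net2(batch), t_out, labels)
    opt2.zero_grad(); loss2.backward(); opt2.step()

    # Update the averaged networks from the newly adjusted parameters.
    with torch.no_grad():
        for p_t, p1 in zip(target_net.parameters(), net1.parameters()):
            p_t.mul_(alpha).add_(p1, alpha=1 - alpha)
        for p_a, p2 in zip(avg_net.parameters(), net2.parameters()):
            p_a.mul_(alpha).add_(p2, alpha=1 - alpha)
```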
  • the first neural network and the second neural network are both neural networks trained on the source domain, and both can extract feature data from images and classify based on the feature data.
  • the structure of the first neural network and the structure of the second neural network may be the same or different, which is not limited in the present disclosure.
  • the parameters of the target neural network are the time-series averages of the parameters of the first neural network.
  • the first neural network is trained under the supervision of the training image set and the average network; that is, the labels of the images in the training image set and the output of the average network are used as supervision data to supervise the output of the first neural network, and the parameters of the first neural network are adjusted accordingly.
  • the parameters of the average network are the time-series averages of the parameters of the second neural network, which is trained under the supervision of the training image set and the target neural network; that is, the labels of the images in the training image set and the output of the target neural network are used as supervision data to supervise the output of the second neural network, and the parameters of the second neural network are adjusted accordingly.
  • the parameters of all four networks are updated each time a training cycle is completed.
  • that the parameters of the target neural network are the time-series averages of the parameters of the first neural network means that the parameters of the target neural network are averages of the parameters of the first neural network over different training cycles; likewise, that the parameters of the average network are the time-series averages of the parameters of the second neural network means that the parameters of the average network are averages of the parameters of the second neural network over different training cycles.
  • the average of the parameters of the first neural network over different training cycles reflects the average performance of the first neural network over the completed training cycles.
  • in specific implementations, determining the parameters of the target neural network is not limited to computing the average of the parameters of the first neural network over different training cycles; in the same way, determining the parameters of the average network is not limited to computing the average of the parameters of the second neural network over different training cycles.
  • the parameters of the target neural network can be determined by the following formula: E_T(θ₁) = α · E_{T-1}(θ₁) + (1 - α) · θ₁, where E_T(θ₁) is the parameter of the target neural network in the T-th training cycle, E_{T-1}(θ₁) is the parameter of the target neural network in the (T-1)-th training cycle, θ₁ is the parameter of the first neural network in the T-th training cycle, and α is an averaging coefficient between 0 and 1.
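  • a small numeric illustration of this recurrence (alpha = 0.9 is an arbitrary choice): the target parameter tracks a slowly moving average of the first network's parameter across training cycles:

```python
alpha = 0.9
E = 0.0                                       # E_0(theta_1)
for T, theta_1 in enumerate([1.0, 0.8, 1.2, 1.0], start=1):
    E = alpha * E + (1 - alpha) * theta_1     # E_T(theta_1)
    print(f"cycle {T}: E_T = {E:.4f}")
```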
  • the parameters of the average network can be determined through the above two possible implementation methods.
  • the parameters of the first neural network are obtained through supervised training; that is, under the supervision data, the loss of the first neural network and the loss of the second neural network are determined separately, the gradient for back-propagation through the first neural network is determined from the loss of the first neural network, and the gradient is propagated backward to update the parameters of the first neural network.
  • the parameters of the second neural network are likewise updated through backward gradient propagation.
• in contrast, the parameters of the target neural network and the parameters of the average network are not updated by gradient back propagation; instead, the parameters of the target neural network are determined from the average values of the parameters of the first neural network over different training cycles, and the parameters of the average network are determined from the average values of the parameters of the second neural network over different training cycles. The parameters of the target neural network and of the average network therefore change more slowly than the parameters of the first neural network and of the second neural network, which means the similarity between the output of the target neural network and the output of the second neural network is low, and the similarity between the output of the average network and the output of the first neural network is low.
• in this way, the second neural network and the first neural network are supervised by the output of the target neural network and the output of the average network respectively, so that the second neural network can learn the "advantages" of the target neural network and the first neural network can learn the "advantages" of the average network.
• since the parameters of the target neural network reflect the average performance of the first neural network over the completed training cycles, and the parameters of the average network reflect the average performance of the second neural network over the completed training cycles, making the second neural network learn the "advantages" of the target neural network is equivalent to making the second neural network learn the "advantages" of the first neural network, and making the first neural network learn the "advantages" of the average network is equivalent to making the first neural network learn the "advantages" of the second neural network.
• in addition, because the parameters of the target neural network are determined from the time-series average values of the parameters of the first neural network, the target neural network obtained by training can perform better on the target domain than the first neural network applied directly to the target domain.
• the above-mentioned "advantage" refers to a neural network having high accuracy when recognizing a certain category or individual; for example, continuing Example 2, one neural network may be more accurate at identifying Zhang San's identity while the other is more accurate at identifying Li Si's identity.
• in this embodiment, the parameters of the target neural network and the parameters of the average network are obtained respectively from the time-series averages of the parameters of the first neural network and of the second neural network; the output of the target neural network is then used to supervise the second neural network, and the output of the average network is used to supervise the first neural network.
• when the target neural network is used to perform related recognition tasks on the target domain, more informative target feature data can be extracted, and this information can improve the accuracy of recognition on the target domain.
• FIG. 3 is a schematic flowchart of a possible implementation of step 102 provided by an embodiment of the present disclosure.
  • the execution subject of this embodiment may be a server or a computer.
  • the execution subject of the training method in this embodiment may be different from the foregoing execution subject, or may be the same.
  • the training image set can be obtained by the traditional method shown in FIG. 4.
• in the traditional method, multiple unlabeled images on the target domain (hereinafter referred to as the image set to be processed) are input to a third neural network, which performs feature extraction processing on the images to obtain a third feature data set containing the feature data of the images in the image set to be processed; the feature data in the third feature data set are then divided by a clustering algorithm into a predetermined number of sets, and the images corresponding to the feature data in each set are given pseudo-hard labels.
• y second iterations are performed on the third neural network to obtain the above-mentioned training image set, where y is a positive integer.
• the t-th second iteration among the y second iterations includes:
• a second image set is sampled from the image set to be processed, and the images in the second image set are processed by the third neural network of the t-th second iteration to obtain a third feature data set containing the feature data of the images in the second image set and a classification result set containing the classification results of the images in the second image set. Clustering processing is performed on the feature data in the third feature data set to determine the labels of the feature data in the third feature data set, and the labels of the feature data in the third feature data set are added to the corresponding images in the second image set to obtain a third image set.
• a third loss is determined according to the difference between the classification results in the classification result set and the labels of the images in the third image set.
• the parameters of the third neural network of the t-th second iteration are adjusted based on the third loss to obtain the parameters of the third neural network of the t+1-th second iteration.
• for example, a second image set is sampled from the image set to be processed, and a further image set is then sampled from the image set to be processed, where the images in the latter differ from the images in the second image set. The third neural network of the t-th second iteration processes the second image set to obtain the labels of the images in the second image set and the parameters of the third neural network of the t+1-th second iteration; the third neural network of the t+1-th second iteration then processes the newly sampled image set to obtain the labels of its images and the parameters of the third neural network of the t+2-th second iteration.
• in this way, at least one image in the image set to be processed is given a label, and the training image set is obtained.
• the third neural network of the first second iteration is the third neural network itself.
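• as a minimal sketch of one such second iteration (the disclosure only speaks of "a clustering algorithm"; k-means, the function names, and the feature-extractor interface below are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_iteration(extract_features, images, num_sets):
    # Extract feature data with the third neural network's backbone
    # (stand-in callable), cluster it into the predetermined number of
    # sets, and return one pseudo-hard label per image.
    feats = extract_features(images)                         # (N, D) feature array
    labels = KMeans(n_clusters=num_sets).fit_predict(feats)  # cluster assignment
    return labels                                            # pseudo-hard labels
```

• the third loss would then be computed against these labels and back-propagated to update the third neural network's parameters.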
• Example 3: five images containing human objects are sampled from the image set to be processed, namely image a, image b, image c, image d, and image e. These 5 images are input to the third neural network to obtain their feature data, and the clustering algorithm divides the 5 images into 3 categories according to the identities of the person objects represented by the feature data, yielding 3 sets: a first set consisting of image a and image e, a second set consisting of image b, and a third set consisting of image c and image d.
• the classifier in the third neural network predicts the category to which each image belongs based on the feature data of these images (hereinafter referred to as the prediction result), and a total preliminary hard classification loss is determined based on the difference between each prediction result and the corresponding pseudo-hard label.
• that is, the preliminary hard classification loss of each image in the image set to be processed is obtained by separately calculating the difference between its prediction result and its label, and the average value of the preliminary hard classification losses of at least one image in the image set to be processed gives the total preliminary hard classification loss.
  • the predicted category of the image a output by the classifier of the third neural network is [0.7, 0.2, 0.1]
  • the probability that the identity of the character object in the image a is Zhang San is 0.7
  • the probability that the identity of the character object in image a is Li Si is 0.2
  • the probability that the identity of the character object in image a is Wang Wu is 0.1.
• the preliminary hard classification loss of image a can be determined by calculating the cross-entropy loss between the predicted category ([0.7, 0.2, 0.1]) and the pseudo-hard label ([1, 0, 0]). In the same way, the preliminary hard classification losses of image b, image c, image d, and image e can be determined, and the total preliminary hard classification loss is then obtained as the average of the preliminary hard classification losses of image a, image b, image c, image d, and image e.
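• reproducing the numbers of Example 3 in a short sketch (the helper name is hypothetical):

```python
import numpy as np

def cross_entropy(pred, onehot, eps=1e-12):
    # Cross-entropy between a predicted distribution and a one-hot pseudo-hard label.
    return float(-np.sum(np.asarray(onehot) * np.log(np.asarray(pred) + eps)))

loss_a = cross_entropy([0.7, 0.2, 0.1], [1, 0, 0])  # = -log(0.7) ≈ 0.357
```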
• Example 4 continues the above example.
• when calculating the preliminary hard ternary loss of image a, first calculate the similarity between the feature data of each image belonging to the same category as image a and the feature data of image a (hereinafter referred to as positive similarity), and the similarity between the feature data of each image belonging to a different category from image a and the feature data of image a (hereinafter referred to as negative similarity); the preliminary hard ternary loss of image a is then determined based on the minimum value of the positive similarities and the maximum value of the negative similarities.
• in the same way, the preliminary hard ternary losses of image b, image c, image d, and image e can be determined, and the total preliminary hard ternary loss is then obtained as the average of the preliminary hard ternary losses of image a, image b, image c, image d, and image e.
• the total preliminary hard classification loss and the total preliminary hard ternary loss are weighted and summed to obtain the third loss. The parameters of the third neural network are adjusted based on the third loss to obtain the third neural network of the second second iteration. This is repeated until at least one image in the image set to be processed carries a label (i.e. a pseudo-hard label), at which point the training image set is obtained.
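• a minimal sketch of the preliminary hard ternary loss described above, assuming cosine similarity and a hinge with margin m (both assumptions; the disclosure only fixes the use of the minimum positive similarity and the maximum negative similarity):

```python
import numpy as np

def preliminary_hard_triplet_loss(anchor, positives, negatives, m=0.3):
    # positives/negatives: (P, D)/(N, D) feature data of images with the
    # same/different pseudo-hard label as the anchor image.
    def cosine(a, b):
        return b @ a / (np.linalg.norm(b, axis=1) * np.linalg.norm(a) + 1e-12)

    s_pos_min = cosine(anchor, positives).min()  # minimum positive similarity
    s_neg_max = cosine(anchor, negatives).max()  # maximum negative similarity
    return max(s_neg_max - s_pos_min + m, 0.0)   # hinge with margin m
```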
• the first neural network to be trained and the second neural network to be trained are both neural networks trained on the source domain, and both have the function of extracting feature data from images and classifying according to the feature data.
  • the structure of the first neural network to be trained and the structure of the second neural network to be trained may be the same or different, which is not limited in the present disclosure.
  • FIG. 5 is a schematic diagram of the training of the i-th first iteration in the x-th first iteration provided by this embodiment.
• the i-th first iteration includes: using the above-mentioned training image set and the output of the average network of the i-th first iteration to supervise the first neural network to be trained of the i-th first iteration to obtain the first neural network to be trained of the i+1-th first iteration, and using the above-mentioned training image set and the output of the target neural network of the i-th first iteration to supervise the second neural network to be trained of the i-th first iteration to obtain the second neural network to be trained of the i+1-th first iteration.
• optionally, using the training image set to supervise the first neural network to be trained of the i-th first iteration to obtain the first neural network to be trained of the i+1-th first iteration, and using the training image set to supervise the second neural network to be trained of the i-th first iteration to obtain the second neural network to be trained of the i+1-th first iteration, may include the following steps: the first neural network to be trained of the i-th first iteration processes the first image in the above training image set to obtain a first classification result, and the average network of the i-th first iteration processes the first image to obtain a second classification result.
• the second neural network to be trained of the i-th first iteration processes the first image to obtain a third classification result, and the target neural network of the i-th first iteration processes the first image to obtain a fourth classification result. Then, according to the difference between the first classification result and the first label of the first image (i.e. the pseudo-hard label obtained in step 301), the first hard classification loss of the first neural network to be trained of the i-th first iteration is determined; according to the difference between the third classification result and the first label, the second hard classification loss of the second neural network to be trained of the i-th first iteration is determined.
• the first hard classification loss is used to supervise the first neural network to be trained of the i-th first iteration, and the second hard classification loss is used to supervise the second neural network to be trained of the i-th first iteration, so that both networks are supervised by the training image set.
• in addition, according to the difference between the first classification result and the second classification result, the first soft classification loss of the first neural network to be trained of the i-th first iteration is determined; according to the difference between the third classification result and the fourth classification result, the second soft classification loss of the second neural network to be trained of the i-th first iteration is determined.
• in this way, the average network of the i-th first iteration supervises the first neural network to be trained of the i-th first iteration, and the target neural network of the i-th first iteration supervises the second neural network to be trained of the i-th first iteration.
• the first hard classification loss and the first soft classification loss are weighted and summed to obtain the first loss of the first neural network to be trained of the i-th first iteration, and the second hard classification loss and the second soft classification loss are weighted and summed to obtain the second loss of the second neural network to be trained of the i-th first iteration. Then, the parameters of the first neural network to be trained of the i-th first iteration are adjusted based on the first loss to obtain the first neural network to be trained of the i+1-th first iteration.
• the parameters of the second neural network to be trained of the i-th first iteration are adjusted based on the second loss to obtain the second neural network to be trained of the i+1-th first iteration.
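• a minimal sketch of the soft classification losses (the cross-entropy form is an assumption; the disclosure only states that each soft loss is determined according to the difference between a to-be-trained network's classification result and its teacher's classification result):

```python
import numpy as np

def soft_classification_loss(student_probs, teacher_probs, eps=1e-12):
    # Cross-entropy of the to-be-trained network's prediction against the
    # teacher network's output used as a soft target.
    s, t = np.asarray(student_probs), np.asarray(teacher_probs)
    return float(-np.sum(t * np.log(s + eps)))

# e.g. first soft classification loss: the first network's [0.7, 0.3]
# supervised by the average network's [0.8, 0.2] (numbers as in the example below).
first_soft = soft_classification_loss([0.7, 0.3], [0.8, 0.2])
```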
• in the above manner, the parameters of the target neural network of the i-th first iteration can be determined according to the parameters of the target neural network of the i−1-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, and the parameters of the average network of the i-th first iteration can be determined according to the parameters of the average network of the i−1-th first iteration and the parameters of the second neural network to be trained of the i-th first iteration.
• optionally, the parameters of the target neural network of the i+1-th first iteration and the parameters of the average network of the i+1-th first iteration can be determined respectively according to the following two formulas: E_{i+1}(θ₁) = αE_i(θ₁) + (1 − α)θ₁^{i+1} and E_{i+1}(θ₂) = αE_i(θ₂) + (1 − α)θ₂^{i+1}, where E_{i+1}(θ₁) is the parameter of the target neural network of the i+1-th first iteration, E_i(θ₁) is the parameter of the target neural network of the i-th first iteration, E_{i+1}(θ₂) is the parameter of the average network of the i+1-th first iteration, E_i(θ₂) is the parameter of the average network of the i-th first iteration, θ₁^{i+1} is the parameter of the first neural network to be trained of the i+1-th first iteration, θ₂^{i+1} is the parameter of the second neural network to be trained of the i+1-th first iteration, and α is a smoothing coefficient greater than 0 and less than 1.
  • the above x is a positive integer, and the above i is a positive integer less than or equal to x.
• after the parameters of the first neural network to be trained of the i+1-th first iteration, the parameters of the second neural network to be trained of the i+1-th first iteration, the parameters of the target neural network of the i+1-th first iteration, and the parameters of the average network of the i+1-th first iteration are obtained, the i+1-th first iteration is performed. After the x-th first iteration has been executed, the parameters of the target neural network of the x-th first iteration are taken to obtain the target neural network.
• for example, suppose the training image set contains image 1, image 2, and image 3, and the pseudo-hard label of image 1 is [1, 0].
• the first neural network to be trained of the first first iteration (i.e. the first neural network to be trained) processes image 1 (i.e. the first image) in the above training image set to obtain the classification result [0.7, 0.3], and the second neural network to be trained of the first first iteration (i.e. the second neural network to be trained) processes image 1 to obtain the classification result [0.8, 0.2].
• the first hard classification loss is obtained by calculating the cross-entropy loss between [1, 0] and [0.7, 0.3], and the second hard classification loss is obtained by calculating the cross-entropy loss between [1, 0] and [0.8, 0.2].
• the first hard classification loss and the first soft classification loss are then weighted and summed to obtain the first loss, and the second hard classification loss and the second soft classification loss are weighted and summed to obtain the second loss.
• optionally, the first hard ternary loss of the first neural network to be trained of the i-th first iteration and the second hard ternary loss of the second neural network to be trained of the i-th first iteration can also be determined; in that case, the first hard classification loss, the first soft classification loss, and the first hard ternary loss are weighted and summed to obtain the first loss, and the second hard classification loss, the second soft classification loss, and the second hard ternary loss are weighted and summed to obtain the second loss.
• specifically, the first neural network to be trained of the i-th first iteration processes the above-mentioned training image set to obtain a first feature data set, and the second neural network to be trained of the i-th first iteration processes the above-mentioned training image set to obtain a fourth feature data set. The minimum similarity between the first feature data of the first image in the first feature data set and the feature data in the positive sample feature data subset of the first feature data set is determined to obtain the first similarity, and the minimum similarity between the third feature data of the first image in the fourth feature data set and the feature data in the positive sample feature data subset of the fourth feature data set is determined to obtain the ninth similarity.
• the negative sample feature data subset includes feature data of images whose labels differ from the above-mentioned first label, and the positive sample feature data subset includes feature data of images whose labels are the same as the above-mentioned first label.
  • the training image set includes image 1, image 2, image 3, image 4, and image 5.
• for example, the labels of image 1, image 3, and image 5 are all Zhang San, and the labels of image 2 and image 4 are all Li Si.
• the first feature data set includes the feature data of image 1 (the first feature data), the feature data of image 2 (hereinafter referred to as feature data 2), the feature data of image 3 (hereinafter referred to as feature data 3), the feature data of image 4 (hereinafter referred to as feature data 4), and the feature data of image 5 (hereinafter referred to as feature data 5).
• the fourth feature data set includes the feature data of image 1 (the third feature data), the feature data of image 2 (hereinafter referred to as feature data 6), the feature data of image 3 (hereinafter referred to as feature data 7), the feature data of image 4 (hereinafter referred to as feature data 8), and the feature data of image 5 (hereinafter referred to as feature data 9).
  • the positive sample feature data subset in the first feature data set includes feature data 3 and feature data 5
  • the negative sample feature data subset in the first feature data set includes feature data 2 and feature data 4.
  • the positive sample feature data subset in the fourth feature data set includes feature data 7 and feature data 9, and the negative sample feature data subset in the fourth feature data set includes feature data 6 and feature data 8.
• the similarities between the third feature data and feature data 6, feature data 7, feature data 8, and feature data 9 are calculated respectively; assuming that the similarity between the third feature data and feature data 7 is less than the similarity between the third feature data and feature data 9, the similarity between the third feature data and feature data 7 is the ninth similarity.
• assuming that the similarity between the third feature data and feature data 6 is less than the similarity between the third feature data and feature data 8, the similarity between the third feature data and feature data 8 is the tenth similarity.
• in this way, the first hard ternary loss can be determined from the first feature data, feature data 3, and feature data 4 according to formula (4): max(d₃ − d₁ + m, 0), and the second hard ternary loss can be determined from the third feature data, feature data 7, and feature data 8 according to formula (5): max(d₁₀ − d₉ + m, 0), where max(A, B) is the maximum value of A and B, d₁ is the first similarity, d₃ is the third similarity, d₉ is the ninth similarity, d₁₀ is the tenth similarity, and m is a margin greater than 0 and less than 1.
• for ease of description, the feature data in the positive sample feature data subset with the smallest similarity to the first feature data is called the most difficult within-class feature data of the first feature data, and the feature data in the negative sample feature data subset with the greatest similarity to the first feature data is called the most difficult out-of-class feature data of the first feature data.
• in the same way, the most difficult within-class feature data and the most difficult out-of-class feature data in the first feature data set can be determined for the feature data of the other images in the training image set (including image 2, image 3, image 4, and image 5), and the hard ternary loss of the first neural network to be trained can then be determined for each image based on its feature data in the first feature data set, its most difficult within-class feature data, and its most difficult out-of-class feature data. Likewise, the hard ternary loss of the second neural network to be trained can be determined for each image based on its feature data in the fourth feature data set, its most difficult within-class feature data, and its most difficult out-of-class feature data.
• the average value of the hard ternary losses of the first neural network to be trained over at least one image in the training image set is taken as the first hard ternary loss, and the average value of the hard ternary losses of the second neural network to be trained over at least one image in the training image set is taken as the second hard ternary loss.
• supervising the first neural network to be trained of the i-th first iteration with the first hard ternary loss will increase the similarity between the feature data of images belonging to the same category and reduce the similarity between the feature data of images belonging to different categories, so that different categories of images can be better distinguished and the recognition accuracy of image categories is improved.
• supervising the second neural network to be trained of the i-th first iteration with the second hard ternary loss can likewise improve its feature extraction effect, so that the second neural network to be trained of the i-th first iteration extracts feature data that is richer in image information.
• in this case, the first loss is determined based on the first hard classification loss, the first soft classification loss, and the first hard ternary loss, and the second loss is determined based on the second hard classification loss, the second soft classification loss, and the second hard ternary loss. The first neural network to be trained of the i-th first iteration is then adjusted based on the first loss, and the second neural network to be trained of the i-th first iteration is adjusted based on the second loss, so that the first neural network to be trained of the i-th first iteration is supervised by the training images and the average network of the i-th first iteration, and the second neural network to be trained of the i-th first iteration is supervised by the training images and the target neural network of the i-th first iteration.
• the i-th first iteration described above determines the positive sample feature data subset and the negative sample feature data subset through the labels of the images in the training image set, and these labels are the pseudo-hard labels obtained by the traditional method in step 301. Since a pseudo-hard label is data processed by one-hot encoding, that is, every value in a pseudo-hard label is either 0 or 1, pseudo-hard labels carry a large error; the positive sample feature data subsets and negative sample feature data subsets determined by the pseudo-hard labels therefore also carry a large error, which leads to a poor feature extraction effect on the target domain for the first neural network to be trained of the i+1-th first iteration, which in turn leads to low recognition accuracy on the target domain.
• for example, the labels of the images in the training image set contain two categories (Zhang San and Li Si). Since every value in a pseudo-hard label is either 0 or 1, the person object in each image in the training image set is labeled either Zhang San or Li Si. Assume that the person object in image 1 in the training image set is Zhang San and the category represented by the pseudo-hard label of image 1 is Zhang San; the person object in image 2 is Li Si, but the category represented by the pseudo-hard label of image 2 is Zhang San; and the person object in image 3 is Zhang San, but the category represented by the pseudo-hard label of image 3 is Li Si.
  • the feature data of image 1 in the first feature data set is feature data a
  • the feature data of image 2 in the first feature data set is feature data b
  • the feature data of image 3 in the first feature data set is feature data c.
  • the most difficult feature data within the class of feature data a is feature data b
  • the most difficult feature data outside the class of feature data a is feature data c.
• supervising the first neural network to be trained of the i-th first iteration with the first hard ternary loss determined by feature data a, feature data b, and feature data c, and adjusting it accordingly, will make the first neural network to be trained of the i-th first iteration increase the similarity between the feature data extracted from image 1 and the feature data extracted from image 2, and reduce the similarity between the feature data extracted from image 1 and the feature data extracted from image 3, to obtain the first neural network to be trained of the i+1-th first iteration.
• this is clearly wrong, because the person object in image 1 (Zhang San) and the person object in image 2 (Li Si) are not the same person, while the person object in image 3 is in fact Zhang San.
• for this reason, the embodiments of the present disclosure provide a method for supervising the first neural network to be trained of the i-th first iteration through soft labels to obtain a first soft ternary loss.
• supervising the first neural network to be trained of the i-th first iteration with the first soft ternary loss improves the recognition accuracy of the first neural network to be trained of the i+1-th first iteration, thereby improving the recognition accuracy of the target neural network.
  • FIG. 6 is a flowchart of another image processing method provided by an embodiment of the present disclosure.
• the average network of the i-th first iteration processes the above-mentioned training image set to obtain a second feature data set, and the target neural network of the i-th first iteration processes the above-mentioned training image set to obtain a fifth feature data set.
• the maximum similarity between the second feature data and the feature data in the negative sample feature data subset of the second feature data set is determined to obtain the fourth similarity, and the maximum similarity between the feature data of the first image in the fifth feature data set and the feature data in the negative sample feature data subset of the fifth feature data set is determined to obtain the twelfth similarity.
  • the feature data contained in the positive sample feature data subsets in different feature data sets are different, and the feature data contained in the negative sample feature data subsets in different feature data sets are also different.
• since the pseudo-hard labels divide the image categories in the training image set "too absolutely", the feature extraction effect on the target domain of the first neural network to be trained of the i+1-th first iteration is poor.
• for this reason, normalization processing is performed on the first similarity, the second similarity, the third similarity, the fourth similarity, the ninth similarity, the tenth similarity, the eleventh similarity, and the twelfth similarity respectively, converting each of them into a value between 0 and 1.
• the differences between the similarities obtained after the normalization processing are then used to determine the first soft ternary loss of the first neural network to be trained of the i-th first iteration and the second soft ternary loss of the second neural network to be trained of the i-th first iteration, so as to improve the feature extraction effect of the first neural network to be trained of the i+1-th first iteration.
• specifically, the sum of the second similarity and the fourth similarity is determined to obtain the first total similarity; the sum of the first similarity and the third similarity is determined to obtain the second total similarity; the sum of the ninth similarity and the tenth similarity is determined to obtain the third total similarity; and the sum of the eleventh similarity and the twelfth similarity is determined to obtain the fourth total similarity.
• the quotient of the second similarity and the first total similarity is calculated to obtain the fifth similarity, and the quotient of the fourth similarity and the first total similarity is calculated to obtain the sixth similarity; the quotient of the first similarity and the second total similarity is calculated to obtain the seventh similarity, and the quotient of the third similarity and the second total similarity is calculated to obtain the eighth similarity; the quotient of the ninth similarity and the third total similarity is calculated to obtain the thirteenth similarity, and the quotient of the tenth similarity and the third total similarity is calculated to obtain the fourteenth similarity; the quotient of the eleventh similarity and the fourth total similarity is calculated to obtain the fifteenth similarity, and the quotient of the twelfth similarity and the fourth total similarity is calculated to obtain the sixteenth similarity.
• the first soft ternary loss is determined based on the difference between the fifth similarity and the seventh similarity and the difference between the sixth similarity and the eighth similarity, and the second soft ternary loss is determined based on the difference between the thirteenth similarity and the fifteenth similarity and the difference between the fourteenth similarity and the sixteenth similarity.
• in the same way, for each image in the training image set, a soft ternary loss supervised by the average network of the i-th first iteration can be determined from the normalized similarities between the feature data of the image and its most difficult within-class and out-of-class feature data in the first and second feature data sets, and a soft ternary loss supervised by the target neural network of the i-th first iteration can be determined from the normalized similarities between the feature data of the image and its most difficult within-class and out-of-class feature data in the fourth and fifth feature data sets. The average value of the former over at least one image in the training image set gives the first soft ternary loss, and the average value of the latter over at least one image in the training image set gives the second soft ternary loss.
• in this way, the normalized similarities can be used as supervision data to improve the accuracy of the target neural network.
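• a minimal sketch of the first soft ternary loss (the cross-entropy form is an assumption; the disclosure only states that the loss is determined from the differences between the normalized similarities):

```python
import numpy as np

def soft_triplet_loss(s_pos_student, s_neg_student,
                      s_pos_teacher, s_neg_teacher, eps=1e-12):
    # Teacher side: e.g. fifth = second / first total, sixth = fourth / first total.
    fifth = s_pos_teacher / (s_pos_teacher + s_neg_teacher)
    sixth = s_neg_teacher / (s_pos_teacher + s_neg_teacher)
    # Student side: e.g. seventh = first / second total, eighth = third / second total.
    seventh = s_pos_student / (s_pos_student + s_neg_student)
    eighth = s_neg_student / (s_pos_student + s_neg_student)
    # Cross-entropy of the student's normalized pair against the teacher's pair.
    return float(-(fifth * np.log(seventh + eps) + sixth * np.log(eighth + eps)))
```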
• for example, the image set to be processed contains 10 images; according to the identities of the person objects in the images, the images are divided into those whose pseudo-hard label indicates the identity Zhang San (hereinafter referred to as the first type of image) and those whose pseudo-hard label indicates the identity Li Si (hereinafter referred to as the second type of image), with 5 images of each type.
  • the real identity of the person object in image 1 in the first type of image is Li Si
  • the real identity of the person object in image 2 in the second type of image is Zhang San.
• that is, the first type of image contains 4 images in which the identity is Zhang San and 1 image in which the identity is Li Si, so the distribution of the true label of the first type of image should be [0.8, 0.2].
• [0.8, 0.2] characterizes the identities of the person objects contained in the first type of image: the number of images in which the identity is Zhang San accounts for 0.8 of the total number of images of the first type, and the number of images in which the identity is Li Si accounts for 0.2 of the total number of images of the first type.
• similarly, the distribution of the true label of the second type of image should be [0.2, 0.8]: the number of images in which the identity is Zhang San accounts for 0.2 of the total number of images of the second type, and the number of images in which the identity is Li Si accounts for 0.8 of the total number of images of the second type.
• however, the pseudo-hard label of the first type of image is [1, 0] and the pseudo-hard label of the second type of image is [0, 1], which obviously does not conform to the distributions of the true labels of the first type of image and the second type of image.
• the soft labels obtained by the method provided in this embodiment take values between 0 and 1, which are more in line with the distribution of the true label of the first type of image and the distribution of the true label of the second type of image, so using the soft labels as supervision data can improve the accuracy of the trained networks.
  • the similarity in the embodiments of the present disclosure may be Euclidean distance or cosine similarity, which is not limited in the present disclosure.
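• as a concrete illustration of the two candidate metrics (a sketch only):

```python
import numpy as np

a, b = np.random.rand(256), np.random.rand(256)  # two feature data vectors
euclidean_distance = np.linalg.norm(a - b)
cosine_similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```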
• optionally, a first preprocessing can be performed on the images in the training image set to obtain a first image set; the first image set is then input to the first neural network to be trained of the i-th first iteration to obtain the first feature data set, and the first image set is input to the target neural network of the i-th first iteration to obtain the fifth feature data set.
  • the first preprocessing includes any one of erasing processing, trimming processing, and inversion processing.
• by performing the first preprocessing on the training image set, the probability that the first neural network to be trained of the i-th first iteration, the second neural network to be trained of the i-th first iteration, the target neural network of the i-th first iteration, and the average network of the i-th first iteration overfit during the training process can be reduced.
  • the second preprocessing may also be performed on the training image set to obtain a fourth image set.
  • the second preprocessing includes any one of erasing processing, trimming processing, and inversion processing, and the first preprocessing and the second preprocessing are different.
• the fourth image set is input to the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set, and the fourth image set is input to the average network of the i-th first iteration to obtain the second feature data set.
• in this way, the probability of overfitting of the first neural network to be trained of the i-th first iteration and the second neural network to be trained of the i-th first iteration during the training process can be further reduced.
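• a minimal sketch of the two preprocessings using torchvision (the crop size and the combination of operations are assumptions; the disclosure applies any one of erasing, trimming, and flipping per image):

```python
import torchvision.transforms as T

# First preprocessing, e.g. trimming (cropping) then erasing a random region.
first_preprocess = T.Compose([
    T.RandomCrop(224, pad_if_needed=True),
    T.ToTensor(),
    T.RandomErasing(p=1.0),
])

# Second preprocessing, different from the first, e.g. flipping then cropping.
second_preprocess = T.Compose([
    T.RandomHorizontalFlip(p=1.0),
    T.RandomCrop(224, pad_if_needed=True),
    T.ToTensor(),
])
```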
• for example, the training image set contains image 1 and image 2; image 1 is cropped to obtain image 3, image 2 is erased (any area in image 2 is erased) to obtain image 4, and image 3 and image 4 are taken as the first image set.
  • Image 1 is flipped to obtain image 5
  • image 2 is cropped to obtain image 6, and image 5 and image 6 are used as the fourth image set.
• image 3 and image 4 are input to the first neural network to be trained of the i-th first iteration to obtain the first feature data set containing the feature data of image 3 and the feature data of image 4, and to the target neural network of the i-th first iteration to obtain the fifth feature data set containing the feature data of image 3 and the feature data of image 4; image 5 and image 6 are input to the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set containing the feature data of image 5 and the feature data of image 6, and to the average network of the i-th first iteration to obtain the second feature data set containing the feature data of image 5 and the feature data of image 6.
• continuing the above example, the label of image 1, the label of image 3, and the label of image 5 are all the same, and the label of image 2, the label of image 4, and the label of image 6 are all the same.
• in this case, the second image is processed by the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image is processed by the target neural network of the i-th first iteration to obtain the fourth classification result, the fourth image is processed by the second neural network to be trained of the i-th first iteration to obtain the third classification result, and the fourth image is processed by the average network of the i-th first iteration to obtain the second classification result.
• it should be pointed out that, because the feature data in the first feature data set are obtained by the first neural network to be trained of the i-th first iteration processing the first image set rather than the training image set, the feature data of an image in the training image set in the first feature data set (or the second feature data set or the fourth feature data set or the fifth feature data set) refers to the feature data of that image after the first preprocessing or the second preprocessing.
• similarly, the most difficult within-class feature data and the most difficult out-of-class feature data of an image in the training image set in the first feature data set (or the second feature data set or the fourth feature data set or the fifth feature data set) refer to those of the image after the first preprocessing or the second preprocessing.
• it should be understood that, in the embodiments of the present disclosure, the structures of the first neural network to be trained, the first neural network, and the first neural network to be trained of the i-th first iteration are the same, but their parameters are different; the structures of the second neural network to be trained, the second neural network, and the second neural network to be trained of the i-th first iteration are the same, but their parameters are different.
• similarly, the structures of the target neural network and the target neural network of the i-th first iteration are the same, but their parameters are different, and the structures of the average network and the average network of the i-th first iteration are the same, but their parameters are different.
• the output of the target neural network of the i-th first iteration includes the fourth classification result and the fifth feature data set, the output of the second neural network to be trained of the i-th first iteration includes the third classification result and the fourth feature data set, and the output of the average network of the i-th first iteration includes the second classification result and the second feature data set.
• optionally, a sample image set may be obtained by sampling from the training image set, and the sample image set may be used as the training data of a first iteration or a second iteration.
• 603. Determine the first loss based on the above-mentioned first hard classification loss, the above-mentioned first soft classification loss, the above-mentioned first soft ternary loss, and the above-mentioned first hard ternary loss, and determine the second loss based on the above-mentioned second hard classification loss, the above-mentioned second soft classification loss, the above-mentioned second soft ternary loss, and the above-mentioned second hard ternary loss.
• specifically, the first hard ternary loss, the first hard classification loss, the first soft ternary loss, and the first soft classification loss are weighted and summed to obtain the first loss, and the second hard ternary loss, the second hard classification loss, the second soft ternary loss, and the second soft classification loss are weighted and summed to obtain the second loss.
• the weights of the weighted summation can be adjusted according to the actual use situation, which is not limited in the present disclosure.
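• a minimal sketch of the weighted summation (the weight values are placeholders; the disclosure leaves them adjustable):

```python
def total_loss(hard_cls, soft_cls, hard_tri, soft_tri,
               w_hard_cls=1.0, w_soft_cls=0.5, w_hard_tri=1.0, w_soft_tri=0.5):
    # Weighted sum of the four losses into the first (or second) loss.
    return (w_hard_cls * hard_cls + w_soft_cls * soft_cls
            + w_hard_tri * hard_tri + w_soft_tri * soft_tri)
```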
• in this embodiment, soft labels can be obtained from the first feature data set, the second feature data set, the fourth feature data set, and the fifth feature data set, and the first neural network to be trained of the i-th first iteration and the second neural network to be trained of the i-th first iteration can be supervised with the soft labels to obtain the first soft ternary loss and the second soft ternary loss.
• this can improve the recognition accuracy on the target domain of the first neural network to be trained of the i+1-th first iteration and the feature extraction effect on the target domain of the second neural network to be trained of the i+1-th first iteration, which in turn can improve the recognition accuracy of the target neural network on the target domain.
• based on the target feature data of the image to be processed obtained in the foregoing embodiments, the embodiments of the present disclosure also provide an application scenario of image retrieval: the database is retrieved using the target feature data, and an image whose feature data matches the target feature data is obtained as the target image.
  • the above-mentioned database may be established before acquiring the image to be processed.
  • the database includes the image and the characteristic data of the image, wherein the characteristic data of the image is related to the task performed by the target neural network on the target domain.
  • the target neural network is used to identify the identity of the human object in the image in the target domain.
  • the feature data of the image includes the characteristics of the human object in the image, including the clothing attributes, appearance characteristics and other characteristics that can be used to identify the identity of the human object.
  • Clothing attributes include at least one of the characteristics of items that decorate the human body (such as top color, trouser color, trouser length, hat style, shoe color, umbrella type, luggage category, presence or absence of masks, mask color).
  • Appearance characteristics include body type, gender, hairstyle, hair color, age group, whether to wear glasses, and whether to hold something on the chest.
  • Other features that can be used to identify the identity of human objects include: posture, viewing angle, stride length, and environmental brightness.
  • the target neural network is used to identify which of the apples, pears, and peaches are contained in the image in the target domain, and the characteristic data of the image includes characteristic information of apples, characteristic information of pears, or characteristic information of peaches.
• the target feature data is used to retrieve the database and feature data matching the target feature data is determined from the database; that is, the similarity between the target feature data and the feature data of each image in the database is determined, and the feature data of the images whose similarity reaches a threshold is taken as the feature data matching the target feature data, thereby determining the target image.
  • the number of target images can be one or more.
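• a minimal retrieval sketch, assuming cosine similarity and a placeholder threshold:

```python
import numpy as np

def retrieve(target_feat, db_feats, db_images, threshold=0.8):
    # Return the database images whose feature similarity to the target
    # feature data reaches the threshold.
    q = target_feat / (np.linalg.norm(target_feat) + 1e-12)
    d = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-12)
    sims = d @ q                                  # cosine similarity per image
    return [img for img, s in zip(db_images, sims) if s >= threshold]
```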
• it should be understood that the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • the apparatus 1 includes: an acquisition unit 11, a processing unit 12, and a retrieval unit 13, wherein:
  • the acquiring unit 11 is used to acquire an image to be processed
• the processing unit 12 is configured to perform feature extraction processing on the image to be processed using a target neural network to obtain target feature data of the image to be processed, where the parameters of the target neural network are the time-series average values of the parameters of the first neural network, the first neural network is trained under the supervision of the training image set and the average network, the parameters of the average network are the time-series average values of the parameters of the second neural network, and the second neural network is trained under the supervision of the training image set and the target neural network.
• in a possible implementation, the first neural network being obtained by training under the supervision of a training image set and an average network includes: obtaining the training image set, a first neural network to be trained, and a second neural network to be trained; and executing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, where x is a positive integer.
• the i-th first iteration among the x first iterations includes: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the i+1-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the i+1-th first iteration.
• in another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the i+1-th first iteration includes: processing the training image set by the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set by the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft ternary loss according to the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft ternary loss to obtain the first neural network to be trained of the i+1-th first iteration.
• in another possible implementation, obtaining the first soft ternary loss according to the first feature data set and the second feature data set includes: determining the minimum similarity between the first feature data of the first image in the training image set in the first feature data set and the feature data in the positive sample feature data subset of the first feature data set to obtain the first similarity; determining the minimum similarity between the second feature data of the first image in the second feature data set and the feature data in the positive sample feature data subset of the second feature data set to obtain the second similarity, where the positive sample feature data subset includes feature data of images whose labels are the same as the first label of the first image; determining the maximum similarity between the first feature data and the feature data in the negative sample feature data subset of the first feature data set to obtain the third similarity; determining the maximum similarity between the second feature data and the feature data in the negative sample feature data subset of the second feature data set to obtain the fourth similarity, where the negative sample feature data subset includes feature data of images whose labels differ from the first label; and obtaining the first soft ternary loss according to the first similarity, the second similarity, the third similarity, and the fourth similarity.
• in another possible implementation, performing normalization processing on the first similarity, the second similarity, the third similarity, and the fourth similarity respectively to obtain the fifth similarity, the sixth similarity, the seventh similarity, and the eighth similarity includes: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
• in another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft ternary loss to obtain the first neural network to be trained of the i+1-th first iteration includes: processing the first image by the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining the first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft ternary loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the i+1-th first iteration.
• in another possible implementation, determining the first loss of the first neural network to be trained of the i-th first iteration according to the first classification result, the first label, and the first soft ternary loss includes: determining the first hard classification loss according to the difference between the first classification result and the first label, and determining the first loss according to the first hard classification loss and the first soft ternary loss.
• in another possible implementation, the average network of the i-th first iteration processes the first image to obtain a second classification result, and the first soft classification loss is determined according to the difference between the first classification result and the second classification result; determining the first loss according to the first hard classification loss and the first soft ternary loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, and the first soft ternary loss.
• in another possible implementation, a first hard ternary loss is determined according to the first similarity and the third similarity; determining the first loss according to the first hard classification loss, the first soft classification loss, and the first soft ternary loss then includes: determining the first loss according to the first hard classification loss, the first soft classification loss, the first soft ternary loss, and the first hard ternary loss.
• in another possible implementation, before the first neural network to be trained of the i-th first iteration processes the first image in the training image set to obtain the first classification result, the method further includes: performing a first preprocessing on the training image set to obtain a first image set, where the first preprocessing includes any one of erasing processing, clipping processing, and flipping processing.
• processing the first image by the first neural network to be trained of the i-th first iteration to obtain the first classification result then includes: processing the second image in the first image set by the first neural network to be trained of the i-th first iteration to obtain the first classification result, where the second image is obtained by performing the first preprocessing on the first image, and the feature data of the second image in the first feature data set is the same as the feature data of the first image in the first feature data set.
• processing the training image set by the first neural network to be trained of the i-th first iteration to obtain the first feature data set includes: processing the first image set by the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
• in another possible implementation, the acquiring unit 11 is specifically configured to: acquire an image set to be processed and a third neural network, and perform y second iterations on the third neural network to obtain the training image set, where y is a positive integer.
• the t-th second iteration among the y second iterations includes: sampling from the image set to be processed to obtain a second image set, and processing the images in the second image set by the third neural network of the t-th second iteration to obtain a third feature data set containing the feature data of the images in the second image set and a classification result set containing the classification results of the images in the second image set; performing clustering processing on the feature data in the third feature data set to determine the labels of the feature data in the third feature data set, and adding the labels of the feature data in the third feature data set to the corresponding images in the second image set to obtain a third image set; determining a third loss according to the difference between the classification results in the classification result set and the labels of the images in the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the t+1-th second iteration.
• in another possible implementation, the device further includes: a retrieval unit 13 configured to retrieve a database using the target feature data, and obtain an image whose feature data matches the target feature data as a target image.
  • the parameters of the target neural network and the parameters of the average network are obtained respectively, and then the output of the target neural network is used to supervise the second neural network.
  • the target neural network is used to perform related recognition tasks on the target domain, more informative target feature data can be extracted, and this information can improve the accuracy of recognition in the target domain.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an image processing method and a related device. The method comprises: acquiring an image to be processed; and performing feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network. A corresponding device is also disclosed. Target feature data of the image to be processed is thereby obtained by performing feature extraction on the image to be processed.

Description

Image processing method and related device
The present disclosure claims priority to Chinese patent application No. 201910905445.7, entitled "Image processing method and related device", filed with the Chinese Patent Office on September 24, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing technology, and in particular to an image processing method and a related device.
Background
Thanks to their powerful performance, neural networks have been widely applied to various image recognition tasks (such as pedestrian re-identification and image classification) in recent years. Training a neural network requires a large amount of labeled data, so unlabeled data is used to train the neural network through unsupervised learning. Traditional unsupervised learning methods use a neural network trained on a source domain to recognize unlabeled images in a target domain and attach labels to those unlabeled images, then supervise the source-domain-trained neural network with these labels and adjust its parameters, obtaining a neural network applied on the target domain.
Summary
The present disclosure provides a technical solution for image processing.
In a first aspect, an image processing method is provided. The method comprises: acquiring an image to be processed; and performing feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network.
In this aspect, the parameters of the target neural network and of the average network are obtained by taking the temporal averages of the parameters of the first neural network and of the second neural network respectively; the second neural network is then supervised with the output of the target neural network, and the first neural network is supervised with the output of the average network to train the target neural network, which improves the training effect. When the target neural network is subsequently used to perform recognition tasks on the target domain, it can extract more informative target feature data.
In a possible implementation, training the first neural network under the supervision of the training image set and the average network comprises: acquiring the training image set, a first neural network to be trained and a second neural network to be trained; performing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, x being a positive integer; the i-th of the x first iterations comprising: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration; the parameters of the target neural network being a temporal average of the parameters of the first neural network comprising: determining the parameters of the target neural network of the i-th first iteration from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, i being a positive integer less than or equal to x; and, in the case of i=1, the parameters of the target neural network of the (i-1)-th first iteration being the same as the parameters of the first neural network to be trained.
In this implementation, in the i-th first iteration, the first neural network to be trained of the i-th first iteration is supervised by the average network of the i-th first iteration, and the second neural network to be trained of the i-th first iteration is supervised by the target neural network of the i-th first iteration. This reduces the impact on training of the correlation between the outputs of the two networks to be trained, thereby improving the training effect.
In another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th iteration to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft triplet loss from the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In this implementation, adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first soft triplet loss determined from the first and second feature data sets improves its feature extraction on images of the target domain, and thereby the feature extraction of the target neural network on the target domain.
In yet another possible implementation, obtaining the first soft triplet loss from the first feature data set and the second feature data set comprises: determining the minimum similarity between first feature data of a first image of the training image set in the first feature data set and feature data in a positive-sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between second feature data of the first image in the second feature data set and feature data in the positive-sample feature data subset of the second feature data set, to obtain a second similarity, the positive-sample feature data subset comprising feature data of images having the same label as a first label of the first image; determining the maximum similarity between the first feature data and feature data in a negative-sample feature data subset of the first feature data set, to obtain a third similarity; determining the maximum similarity between the second feature data and feature data in the negative-sample feature data subset of the second feature data set, to obtain a fourth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label; normalizing the first, second, third and fourth similarities respectively to obtain a fifth, sixth, seventh and eighth similarity; and obtaining the first soft triplet loss from the fifth, sixth, seventh and eighth similarities.
In this implementation, normalizing the first, second, third and fourth similarities converts them into values between 0 and 1, yielding fifth through eighth similarities that better match the true distribution of the data, thereby improving the training of the target neural network.
In yet another possible implementation, normalizing the first, second, third and fourth similarities respectively to obtain the fifth, sixth, seventh and eighth similarities comprises: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; and determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
In yet another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining a first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, determining the first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss comprises: determining a first hard classification loss from the difference between the first classification result and the first label; and determining the first loss from the first hard classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss and the first soft triplet loss, the method further comprises: processing the first image with the average network of the i-th first iteration to obtain a second classification result; and determining a first soft classification loss from the difference between the first classification result and the second classification result; determining the first loss from the first hard classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss, the first soft classification loss and the first soft triplet loss, the method further comprises: determining a first hard triplet loss from the first similarity and the third similarity; determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss.
In yet another possible implementation, processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result comprises: performing first preprocessing on the training image set to obtain a first image set, the first preprocessing comprising any one of erasing, cropping and flipping; and processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image being obtained by performing the first preprocessing on the first image, the feature data of the second image in the first feature data set being the same as the data of the first image in the first feature data set.
In this implementation, the first image set is obtained by performing first preprocessing on the images of the training image set and is then fed to the first neural network to be trained of the i-th first iteration and the target neural network of the i-th first iteration, which reduces the probability of overfitting during training.
In yet another possible implementation, processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set comprises: processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
In yet another possible implementation, acquiring the training image set comprises: acquiring a set of images to be processed and a third neural network; performing y second iterations on the third neural network to obtain the training image set, y being a positive integer; the t-th of the y second iterations comprising: sampling the set of images to be processed to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing feature data of the images of the second image set and a classification result set containing classification results of the images of the second image set; clustering the feature data of the third feature data set to determine labels of the feature data of the third feature data set, and adding the labels of the feature data of the third feature data set to the corresponding images of the second image set to obtain a third image set; determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration, t being a positive integer less than y.
In yet another possible implementation, the method further comprises: retrieving a database using the target feature data, and obtaining, as a target image, an image having feature data matching the target feature data.
In yet another possible implementation, the image to be processed contains a person object.
In yet another possible implementation, supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration comprises: processing the training image set with the second neural network to be trained of the i-th first iteration to obtain a fourth feature data set, and processing the training image set with the target neural network of the i-th first iteration to obtain a fifth feature data set; obtaining a second soft triplet loss from the fourth feature data set and the fifth feature data set; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, obtaining the second soft triplet loss from the fourth feature data set and the fifth feature data set comprises: determining the minimum similarity between third feature data of the first image in the fourth feature data set and feature data in the positive-sample feature data subset of the fourth feature data set, to obtain a ninth similarity; determining the minimum similarity between fourth feature data of the first image in the fifth feature data set and feature data in the positive-sample feature data subset of the fifth feature data set, to obtain an eleventh similarity, the positive-sample feature data subset comprising feature data of images having the same label as the first label; determining the maximum similarity between the third feature data and feature data in the negative-sample feature data subset of the fourth feature data set, to obtain a tenth similarity; determining the maximum similarity between the fourth feature data and feature data in the negative-sample feature data subset of the fifth feature data set, to obtain a twelfth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label; normalizing the ninth, tenth, eleventh and twelfth similarities respectively to obtain a thirteenth, fourteenth, fifteenth and sixteenth similarity; and obtaining the second soft triplet loss from the thirteenth, fourteenth, fifteenth and sixteenth similarities.
In yet another possible implementation, normalizing the ninth, tenth, eleventh and twelfth similarities respectively to obtain the thirteenth, fourteenth, fifteenth and sixteenth similarities comprises: determining the sum of the ninth and tenth similarities to obtain a third total similarity, and determining the sum of the eleventh and twelfth similarities to obtain a fourth total similarity; determining the quotient of the ninth similarity and the third total similarity to obtain the thirteenth similarity, and determining the quotient of the tenth similarity and the third total similarity to obtain the fourteenth similarity; and determining the quotient of the eleventh similarity and the fourth total similarity to obtain the fifteenth similarity, and determining the quotient of the twelfth similarity and the fourth total similarity to obtain the sixteenth similarity.
In yet another possible implementation, supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration comprises: processing the first image with the second neural network to be trained of the i-th first iteration to obtain a third classification result; determining a second loss of the second neural network to be trained of the i-th first iteration from the third classification result, the first label and the second soft triplet loss; and adjusting the parameters of the second neural network to be trained of the i-th first iteration based on the second loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, determining the second loss of the second neural network to be trained of the i-th first iteration from the third classification result, the first label and the second soft triplet loss comprises: determining a second hard classification loss from the difference between the third classification result and the first label; and determining the second loss from the second hard classification loss and the second soft triplet loss.
In yet another possible implementation, before the second loss is determined from the second hard classification loss and the second soft triplet loss, the method further comprises: processing the first image with the target neural network of the i-th first iteration to obtain a fourth classification result; and determining a second soft classification loss from the difference between the third classification result and the fourth classification result; determining the second loss from the second hard classification loss and the second soft triplet loss then comprises: determining the second loss from the second hard classification loss, the second soft classification loss and the second soft triplet loss.
In yet another possible implementation, before the second loss is determined from the second hard classification loss, the second soft classification loss and the second soft triplet loss, the method further comprises: determining a second hard triplet loss from the ninth similarity and the tenth similarity; determining the second loss from the second hard classification loss, the second soft classification loss and the second soft triplet loss then comprises: determining the second loss from the second hard classification loss, the second soft classification loss, the second soft triplet loss and the second hard triplet loss.
In yet another possible implementation, processing the first image of the training image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result comprises: performing second preprocessing on the training image set to obtain a fourth image set, the second preprocessing comprising any one of erasing, cropping and flipping; and processing a third image of the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result, the third image being obtained by performing the second preprocessing on the first image, the feature data of the third image in the fourth feature data set being the same as the data of the first image in the fourth feature data set, and the first preprocessing being different from the second preprocessing.
In yet another possible implementation, processing the training image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set comprises: processing the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set.
In a second aspect, an image processing device is provided. The device comprises: an acquiring unit configured to acquire an image to be processed; and a feature extraction processing unit configured to perform feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network.
In a possible implementation, training the first neural network under the supervision of the training image set and the average network comprises: acquiring the training image set, a first neural network to be trained and a second neural network to be trained; performing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, x being a positive integer; the i-th of the x first iterations comprising: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration; the parameters of the target neural network being a temporal average of the parameters of the first neural network comprising: determining the parameters of the target neural network of the i-th first iteration from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, i being a positive integer less than or equal to x; and, in the case of i=1, the parameters of the target neural network of the (i-1)-th first iteration being the same as the parameters of the first neural network to be trained.
In another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th iteration to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft triplet loss from the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, obtaining the first soft triplet loss from the first feature data set and the second feature data set comprises: determining the minimum similarity between first feature data of a first image of the training image set in the first feature data set and feature data in a positive-sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between second feature data of the first image in the second feature data set and feature data in the positive-sample feature data subset of the second feature data set, to obtain a second similarity, the positive-sample feature data subset comprising feature data of images having the same label as a first label of the first image; determining the maximum similarity between the first feature data and feature data in a negative-sample feature data subset of the first feature data set, to obtain a third similarity; determining the maximum similarity between the second feature data and feature data in the negative-sample feature data subset of the second feature data set, to obtain a fourth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label; normalizing the first, second, third and fourth similarities respectively to obtain a fifth, sixth, seventh and eighth similarity; and obtaining the first soft triplet loss from the fifth, sixth, seventh and eighth similarities.
In yet another possible implementation, normalizing the first, second, third and fourth similarities respectively to obtain the fifth, sixth, seventh and eighth similarities comprises: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; and determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
In yet another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining a first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, determining the first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss comprises: determining a first hard classification loss from the difference between the first classification result and the first label; and determining the first loss from the first hard classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss and the first soft triplet loss, the average network of the i-th first iteration processes the first image to obtain a second classification result; a first soft classification loss is determined from the difference between the first classification result and the second classification result; determining the first loss from the first hard classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss, the first soft classification loss and the first soft triplet loss, a first hard triplet loss is determined from the first similarity and the third similarity; determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss.
In yet another possible implementation, processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result comprises: performing first preprocessing on the training image set to obtain a first image set, the first preprocessing comprising any one of erasing, cropping and flipping; and processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image being obtained by performing the first preprocessing on the first image, the feature data of the second image in the first feature data set being the same as the data of the first image in the first feature data set.
In yet another possible implementation, processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set comprises: processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
In yet another possible implementation, the acquiring unit is specifically configured to: acquire a set of images to be processed and a third neural network; perform y second iterations on the third neural network to obtain the training image set, y being a positive integer; the t-th of the y second iterations comprising: sampling the set of images to be processed to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing feature data of the images of the second image set and a classification result set containing classification results of the images of the second image set; clustering the feature data of the third feature data set to determine labels of the feature data of the third feature data set, and adding the labels of the feature data of the third feature data set to the corresponding images of the second image set to obtain a third image set; determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration, t being a positive integer less than y.
In yet another possible implementation, the device further comprises: a retrieval unit configured to retrieve a database using the target feature data and obtain, as a target image, an image having feature data matching the target feature data.
In yet another possible implementation, supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration comprises: processing the training image set with the second neural network to be trained of the i-th first iteration to obtain a fourth feature data set, and processing the training image set with the target neural network of the i-th first iteration to obtain a fifth feature data set; obtaining a second soft triplet loss from the fourth feature data set and the fifth feature data set; and supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, obtaining the second soft triplet loss from the fourth feature data set and the fifth feature data set comprises: determining the minimum similarity between third feature data of the first image in the fourth feature data set and feature data in the positive-sample feature data subset of the fourth feature data set, to obtain a ninth similarity; determining the minimum similarity between fourth feature data of the first image in the fifth feature data set and feature data in the positive-sample feature data subset of the fifth feature data set, to obtain an eleventh similarity, the positive-sample feature data subset comprising feature data of images having the same label as the first label; determining the maximum similarity between the third feature data and feature data in the negative-sample feature data subset of the fourth feature data set, to obtain a tenth similarity; determining the maximum similarity between the fourth feature data and feature data in the negative-sample feature data subset of the fifth feature data set, to obtain a twelfth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label; normalizing the ninth, tenth, eleventh and twelfth similarities respectively to obtain a thirteenth, fourteenth, fifteenth and sixteenth similarity; and obtaining the second soft triplet loss from the thirteenth, fourteenth, fifteenth and sixteenth similarities.
In yet another possible implementation, normalizing the ninth, tenth, eleventh and twelfth similarities respectively to obtain the thirteenth, fourteenth, fifteenth and sixteenth similarities comprises: determining the sum of the ninth and tenth similarities to obtain a third total similarity, and determining the sum of the eleventh and twelfth similarities to obtain a fourth total similarity; determining the quotient of the ninth similarity and the third total similarity to obtain the thirteenth similarity, and determining the quotient of the tenth similarity and the third total similarity to obtain the fourteenth similarity; and determining the quotient of the eleventh similarity and the fourth total similarity to obtain the fifteenth similarity, and determining the quotient of the twelfth similarity and the fourth total similarity to obtain the sixteenth similarity.
In yet another possible implementation, supervising the second neural network to be trained of the i-th first iteration with the training image set and the second soft triplet loss to obtain the second neural network to be trained of the (i+1)-th first iteration comprises: processing the first image with the second neural network to be trained of the i-th first iteration to obtain a third classification result; determining a second loss of the second neural network to be trained of the i-th first iteration from the third classification result, the first label and the second soft triplet loss; and adjusting the parameters of the second neural network to be trained of the i-th first iteration based on the second loss to obtain the second neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, determining the second loss of the second neural network to be trained of the i-th first iteration from the third classification result, the first label and the second soft triplet loss comprises: determining a second hard classification loss from the difference between the third classification result and the first label; and determining the second loss from the second hard classification loss and the second soft triplet loss.
In yet another possible implementation, before the second loss is determined from the second hard classification loss and the second soft triplet loss, the target neural network of the i-th first iteration processes the first image to obtain a fourth classification result; a second soft classification loss is determined from the difference between the third classification result and the fourth classification result; determining the second loss from the second hard classification loss and the second soft triplet loss then comprises: determining the second loss from the second hard classification loss, the second soft classification loss and the second soft triplet loss.
In yet another possible implementation, before the second loss is determined from the second hard classification loss, the second soft classification loss and the second soft triplet loss, a second hard triplet loss is determined from the ninth similarity and the tenth similarity; determining the second loss from the second hard classification loss, the second soft classification loss and the second soft triplet loss then comprises: determining the second loss from the second hard classification loss, the second soft classification loss, the second soft triplet loss and the second hard triplet loss.
In yet another possible implementation, processing the first image of the training image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result comprises: performing second preprocessing on the training image set to obtain a fourth image set, the second preprocessing comprising any one of erasing, cropping and flipping; and processing a third image of the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the third classification result, the third image being obtained by performing the second preprocessing on the first image, the feature data of the third image in the fourth feature data set being the same as the data of the first image in the fourth feature data set, and the first preprocessing being different from the second preprocessing.
In yet another possible implementation, processing the training image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set comprises: processing the fourth image set with the second neural network to be trained of the i-th first iteration to obtain the fourth feature data set.
In a third aspect, a processor is provided, the processor being configured to perform the method of the first aspect or any possible implementation thereof.
In a fourth aspect, an electronic device is provided, comprising a processor, a sending device, an input device, an output device and a memory, the memory being configured to store computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of the first aspect or any possible implementation thereof.
In a fifth aspect, a computer-readable storage medium is provided, storing a computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to perform the method of the first aspect or any possible implementation thereof.
In a sixth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation thereof.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and are not restrictive of the present disclosure.
Brief Description of the Drawings
To describe the technical solutions in the embodiments or background of the present disclosure more clearly, the accompanying drawings required by the embodiments or background of the present disclosure are described below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another training method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of yet another training method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
The terms "first", "second" and the like in the description, claims and drawings of the present disclosure are used to distinguish different objects rather than to describe a particular order. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further comprises unlisted steps or units, or optionally further comprises other steps or units inherent to the process, method, product or device. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of several items or any combination of at least two of them; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C. Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Thanks to their powerful performance, neural networks have been widely applied to various image recognition tasks (such as pedestrian re-identification and image classification) in recent years. How well a neural network performs on these tasks depends largely on how well it is trained, and the training effect in turn depends mainly on the number of training images: the more training images, the better the training effect, and the better the trained network performs on the corresponding image recognition task.
A training image is an image with annotation information (hereinafter referred to as a label). For example, if the task is to classify the content of an image and decide which of apple, banana, pear, peach, orange and watermelon it contains, the annotation information includes apple, banana, pear, peach, orange and watermelon. As another example, if the task is pedestrian re-identification, i.e. recognizing the identity of the person in an image, the annotation information includes the identity of the person (e.g. Zhang San, Li Si, Wang Wu, Zhou Liu).
The more accurate the annotation information of the training images, the better the training effect; the better the annotation matches the true content of the image, the better the training. For example, labeling an image containing a pear as an apple is incorrect, and labeling an image containing Zhang San as Li Si is likewise incorrect. Training images with incorrect annotation degrade training, so traditional methods mostly complete annotation manually. However, when the number of training images is large, manual annotation is inefficient and labor costs are high. More and more, therefore, neural networks are trained by unsupervised transfer learning, i.e. applying a network trained on existing labeled images to unlabeled images to reduce labor costs.
The task the network performs on the labeled images is related to the task it performs on the unlabeled images, and the labeled and unlabeled images are also related. For example (example 1), a large number of images containing pedestrians under cloudy weather are collected by surveillance cameras in city A (hereinafter images of A), labeled data is obtained by annotating the identities of the pedestrians in the images of A, and a neural network a is trained on the labeled data so that the trained network a can recognize the identities of pedestrians in images collected in A under cloudy weather. Suppose the identities of pedestrians in images collected in B now need to be recognized. Training a new neural network (e.g. network b) on images annotated from those collected in B would incur high labor costs, so the parameters of the trained network a can instead be adjusted by unsupervised transfer learning so that the trained network a can recognize the identities of pedestrians in images collected in B. The tasks network a performs on the labeled images and on the unlabeled images (collected in B) are both pedestrian identification, and both kinds of images contain pedestrians.
Although the labeled and unlabeled images are related, they differ, so a network trained on the labeled images cannot be applied directly to the unlabeled images. Continuing example 1, the labeled data were all collected in cloudy weather, whereas the images collected in B include images collected in cloudy, sunny and rainy weather. The scene brightness differs across weather conditions, and different scene brightness strongly affects the recognition accuracy of the network: a network trained on images collected in cloudy weather has low accuracy on the identities of pedestrians in images collected in sunny weather. Moreover, the parameters of the surveillance cameras in A and in B differ (e.g. the shooting angle), which also makes the network's accuracy on pedestrians differ across cameras: because the camera parameters in A and B differ, the network trained on the labeled data has low accuracy on the identities of pedestrians in images collected in B.
Calling the set containing the labeled images the source domain and the set containing the unlabeled images the target domain, unsupervised transfer learning is a way of training that applies a neural network trained on the source domain to the target domain.
Traditional unsupervised learning methods use the source-domain-trained network to recognize the unlabeled target-domain images and attach labels to them (hereinafter pseudo hard labels), then supervise the source-domain-trained network with the pseudo hard labels and adjust its parameters, obtaining the network applied on the target domain (hereinafter the application neural network). Because pseudo hard labels contain errors, supervising the source-domain-trained network with them works poorly, so the application network extracts features poorly on the target domain, which further degrades the application effect on the target domain (e.g. low accuracy in identifying pedestrians). Applying the technical solutions provided by the embodiments of the present disclosure, a neural network can be obtained on the basis of the traditional method whose feature extraction on the target domain is better than that of the application network, thereby improving the application effect on the target domain.
Before elaborating the technical solutions of the embodiments of the present disclosure, a few concepts are first defined. 1. Hardest intra-class feature data: among the feature data of images with the same label, the two feature data with the smallest similarity. 2. Hardest out-of-class feature data: among the feature data of images with different labels, the two feature data with the largest similarity. 3. The hardest intra-class feature data of an image in a feature data set: the hardest intra-class feature data of that image's feature data in that feature data set. 4. The hardest out-of-class feature data of an image in a feature data set: the hardest out-of-class feature data of that image's feature data in that feature data set.
For example, suppose the feature data of image 1 is feature data 1, that of image 2 is feature data 2, that of image 3 is feature data 3, that of image 4 is feature data 4, and that of image 5 is feature data 5. The label of image 1 is the same as the labels of images 2 and 3, and different from the labels of images 4 and 5. If the similarity between feature data 1 and feature data 3 is smaller than that between feature data 1 and feature data 2, feature data 3 is the hardest intra-class feature data of feature data 1. If the similarity between feature data 1 and feature data 4 is smaller than that between feature data 1 and feature data 5, feature data 5 is the hardest out-of-class feature data of feature data 1. Supposing feature data set 1 contains feature data 1 through 5, the hardest intra-class feature data of image 1 in feature data set 1 is feature data 3, and the hardest out-of-class feature data of image 1 in feature data set 1 is feature data 5.
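To make these definitions concrete, a minimal sketch of how the hardest intra-class and hardest out-of-class feature data could be selected is given below; it assumes cosine similarity as the similarity measure and NumPy arrays as feature data, both of which are illustrative choices rather than requirements of the present disclosure.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between two feature vectors; higher means more similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hardest_feature_data(anchor, feats, labels, anchor_label):
    """Return (hardest intra-class, hardest out-of-class) feature data for
    the anchor: the least similar same-label feature data and the most
    similar different-label feature data."""
    positives = [f for f, l in zip(feats, labels) if l == anchor_label]
    negatives = [f for f, l in zip(feats, labels) if l != anchor_label]
    hardest_intra = min(positives, key=lambda f: cosine_similarity(anchor, f))
    hardest_outer = max(negatives, key=lambda f: cosine_similarity(anchor, f))
    return hardest_intra, hardest_outer
```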
The embodiments of the present disclosure are described below with reference to the accompanying drawings. Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
101. Acquire an image to be processed.
The execution subject of this embodiment may be a terminal such as a server, a mobile phone, a computer or a tablet. The image to be processed may be any digital image. For example, it may contain a person object; it may contain only a face without the torso and limbs (hereinafter the torso and limbs are referred to as the human body), or only the human body without the face, or only the lower limbs or the upper limbs. The present disclosure does not limit which body regions the image to be processed contains. As another example, the image may contain an animal; as yet another example, it may contain a plant. The present disclosure does not limit the content of the image to be processed.
The image to be processed may be acquired by receiving an image input by a user through an input component, the input components including a keyboard, a mouse, a touch screen, a touch pad, an audio input device and the like; or by receiving an image sent by a terminal, the terminals including a mobile phone, a computer, a tablet, a server and the like. The present disclosure does not limit how the image to be processed is acquired.
102. Perform feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed.
The target neural network is a neural network capable of extracting feature data from an image. For example, the target neural network may be stacked or composed in a certain way of neural network layers such as convolutional layers, pooling layers, normalization layers, fully connected layers, down-sampling layers, up-sampling layers and classifiers. The present disclosure does not limit the structure of the target neural network.
In a possible implementation, the target neural network comprises multiple convolutional layers and normalization layers; by performing convolution and normalization on the image to be processed through the convolutional and normalization layers of the target neural network in turn, the feature data of the image to be processed can be extracted to obtain the target feature data.
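By way of illustration, a minimal extractor of this convolution-plus-normalization kind is sketched below in PyTorch; the layer count and dimensions are hypothetical, and an actual target neural network may be organized quite differently.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Alternating convolution and normalization layers followed by a
    # projection; the output vector plays the role of the target feature data.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):                     # x: (N, 3, H, W) image batch
        return self.proj(self.body(x).flatten(1))
```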
As mentioned above, since the pseudo hard labels of the traditional method are produced by the network trained on the source domain, supervising that network with the pseudo hard labels makes it worse and worse during training in the directions it has learned poorly, which in turn degrades the application network's effect on the target domain.
For example, the source-domain-trained network has low accuracy on Li Si, i.e. a high probability of recognizing an image containing Li Si as someone else. Feeding image a containing Li Si into the source-domain-trained network yields the pseudo hard label Wang Wu; supervising the network's output with Wang Wu as supervision data and adjusting its parameters pushes the feature data extracted from image a toward the feature data of Wang Wu. As a result, when the finally obtained application network recognizes images containing Li Si, the extracted feature data are all close to Wang Wu's, so Li Si is recognized as Wang Wu.
In view of this defect of the traditional method, the present disclosure considers supervising the application network's output with the output of another network trained on the source domain (hereinafter the supervisory neural network) to improve the application network's effect on the target domain. Although the application network and the supervisory network are both trained on the source domain, their parameters differ, i.e. they recognize the identities of different people with different accuracy. For example (example 2), the application network has high accuracy on Zhang San but low accuracy on Zhou Liu, while the supervisory network has low accuracy on Zhang San but high accuracy on Zhou Liu.
Therefore, mutually supervising with the outputs of the application network and the supervisory network — supervising the supervisory network with the application network's output and supervising the application network with the supervisory network's output — can replace the pseudo-hard-label supervision of the application network. However, during mutually supervised training the parameters of the application network and the supervisory network are updated simultaneously, so the similarity between their outputs becomes higher and higher (hereinafter this defect of mutual supervision is called the correlation defect). During mutually supervised training the supervisory network then learns the application network's "weaknesses" (e.g. the application network's recognition of Zhou Liu in example 2) and the application network learns the supervisory network's "weaknesses" (e.g. the supervisory network's recognition of Zhang San in example 2), making the parameters of the two networks more and more similar. This prevents the application network's parameters from being optimized and hence its application effect on the target domain from improving.
In view of the correlation defect of mutually supervised training, the embodiments of the present disclosure propose another training method that "isolates" the application network from the supervisory network to reduce their correlation and thus the impact of the correlation defect, thereby obtaining a target neural network whose effect on the target domain is better than that of an application network trained by such mutual supervision. The training method comprises: determining the parameters of the target neural network from the temporal average of the parameters of a first neural network, supervising a second neural network with the output of the target neural network, meanwhile determining the parameters of an average network from the temporal average of the parameters of the second neural network, and supervising the first neural network with the output of the average network, thereby completing the training of the target neural network. Both the first and second neural networks are networks trained on the source domain that can extract feature data from images and classify based on feature data; their structures may be the same or different, which the present disclosure does not limit.
As shown in FIG. 2, the parameters of the target neural network are the temporal average of the parameters of the first neural network, and the first neural network is trained under the supervision of the training image set and the average network: the labels of the images of the training image set and the output of the average network serve as supervision data for the output of the first neural network, whose parameters are adjusted accordingly. The parameters of the average network are the temporal average of the parameters of the second neural network, and the second neural network is trained under the supervision of the training image set and the target neural network: the labels of the images of the training image set and the output of the target neural network serve as supervision data for the output of the second neural network, whose parameters are adjusted accordingly.
The training image set is fed into the first neural network, the second neural network, the target neural network and the average network respectively; supervising the first and second neural networks according to the supervision relationships shown in FIG. 2 over multiple training cycles updates the parameters of the target neural network, until the four networks of FIG. 2 (the first neural network, the second neural network, the target neural network and the average network) all converge and training stops.
In the above training of the four networks of FIG. 2, the parameters of all four networks are updated after each training cycle. That the parameters of the target neural network are the temporal average of the parameters of the first neural network means that they are the average of the first network's parameters over different training cycles, and that the parameters of the average network are the temporal average of the parameters of the second neural network means that they are the average of the second network's parameters over different training cycles.
It should be understood that the average of the first network's parameters over different training cycles reflects the average performance the first network has attained over the completed training cycles; when actually determining the parameters of the target neural network, one is not limited to computing the average of the first network's parameters over different training cycles. Likewise, when actually determining the parameters of the average network, one is not limited to computing the average of the second network's parameters over different training cycles.
In a possible implementation, the parameters of the target neural network can be determined by the following formula:

$E_T(\theta_1)=\alpha E_{T-1}(\theta_1)+(1-\alpha)\gamma_1^T$   ... formula (1)

where $E_T(\theta_1)$ denotes the parameters of the target neural network in the T-th training cycle, $E_{T-1}(\theta_1)$ the parameters of the target neural network in the (T-1)-th training cycle, $\gamma_1^T$ the parameters of the first neural network in the T-th training cycle, $\alpha$ is a number greater than or equal to 0 and less than 1, and $E_0(\theta_1)=E_1(\theta_1)=\gamma_1^1$.
In another possible implementation, the parameters of the target neural network can be determined by computing the average of the first network's parameters over different training cycles. For example, if the parameters of the first neural network before the k-th training cycle are $E_k(\theta)$ and the parameters of the first neural network before the (k+1)-th training cycle are $E_{k+1}(\theta)$, the parameters of the target neural network before the (k+1)-th training cycle are $E_{k+1}(\sigma)=(E_k(\theta)+E_{k+1}(\theta))/2$.
Similarly, the parameters of the average network can be determined in either of the two possible ways above.
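A sketch of the temporal-average update of formulas (1) through (3): the same per-tensor update serves both the target network (averaging the first network) and the average network (averaging the second network). The decay value `alpha=0.999` is a placeholder.

```python
def temporal_average_update(avg_params, net_params, alpha=0.999):
    """avg <- alpha * avg + (1 - alpha) * net, applied tensor by tensor,
    as in formulas (1)-(3); no gradients flow through this update."""
    for name, theta in net_params.items():
        avg_params[name] = alpha * avg_params[name] + (1.0 - alpha) * theta
    return avg_params

# After each training cycle:
# target_params  = temporal_average_update(target_params,  net1_params)
# average_params = temporal_average_update(average_params, net2_params)
```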
In the training process of the training method provided by the embodiments of the present disclosure, the parameters of the first neural network are obtained by supervised training: the losses of the first and second neural networks are determined under the supervision of the supervision data; the gradient back-propagated through the first network is determined from the first network's loss and is propagated by back-propagation to update the first network's parameters. Likewise, the parameters of the second neural network are updated by gradient back-propagation. The parameters of the target network and the average network, by contrast, are not updated by gradient back-propagation; they are determined from the averages over different training cycles of the parameters of the first network and of the second network, respectively. The parameters of the target and average networks therefore update more slowly than those of the first and second networks; that is, the similarity between the outputs of the target network and the second network is low, as is the similarity between the outputs of the average network and the first network. Supervising the second and first networks with the outputs of the target and average networks respectively then lets the second network learn the target network's "strengths" (i.e. the first network's "strengths") and the first network learn the average network's "strengths". Since the target network's parameters reflect the first network's average performance over the completed training cycles and the average network's parameters reflect the second network's, letting the second network learn the target network's "strengths" amounts to letting it learn the first network's "strengths", and letting the first network learn the average network's "strengths" amounts to letting it learn the second network's "strengths". Determining the target network's parameters from the temporal average of the first network's parameters then makes the trained target network perform better on the target domain than the application network. A "strength" here means high recognition accuracy of a network for some class or individual, e.g. the application network's recognition of Zhou Liu and the supervisory network's recognition of Zhang San in example 2.
In this embodiment, the parameters of the target neural network and of the average network are obtained by taking the temporal averages of the parameters of the first and second neural networks respectively; the second neural network is then supervised with the output of the target neural network, and the first neural network is supervised with the output of the average network to train the target neural network, which improves the training effect. When the target neural network is then used to perform the relevant recognition tasks on the target domain, it can extract more informative target feature data, and this information can improve recognition accuracy on the target domain.
The implementation of the training method of the embodiments is elaborated below. Referring to FIG. 3, FIG. 3 is a schematic flowchart of a possible implementation of step 102 provided by an embodiment of the present disclosure.
301. Acquire the training image set, a first neural network to be trained and a second neural network to be trained.
The execution subject of this embodiment may be a server or a computer; the execution subject of the training method of this embodiment may be the same as or different from the execution subject above. In this embodiment, the training image set can be obtained by the traditional method shown in FIG. 4: multiple unlabeled target-domain images (hereinafter the set of images to be processed) are fed into a third neural network trained on the source domain; the third neural network performs feature extraction on the images of the set to obtain a third feature data set containing their feature data; a clustering algorithm then partitions the feature data of the images into a predetermined number of sets, and a pseudo hard label is attached to the image corresponding to each feature data in each set.
In a possible implementation, performing y second iterations on the third neural network yields the training image set, where y is a positive integer. The t-th of the y second iterations comprises:
sampling the set of images to be processed to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing the feature data of the images of the second image set and a classification result set containing their classification results; clustering the feature data of the third feature data set to determine labels for the feature data, and adding these labels to the corresponding images of the second image set to obtain a third image set; determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration.
In the (t+1)-th second iteration, a third image set is again sampled from the set of images to be processed, its images differing from those of the second image set. Then, in the same way that the third neural network of the t-th second iteration processed the second image set to obtain the labels of its images and the parameters of the third neural network of the (t+1)-th second iteration, the third neural network of the (t+1)-th second iteration processes the third image set to obtain the labels of its images and the third neural network of the (t+2)-th second iteration, until at least one image of the set of images to be processed has been labeled and the training image set is obtained. The third neural network of the first second iteration is the third neural network itself.
Taking the first second iteration as an example (example 3), five images containing person objects — image a, image b, image c, image d and image e — are sampled from the set of images to be processed. The five images are fed into the third neural network to obtain their feature data, and a clustering algorithm divides the five images into three classes by the identity of the person object represented by their feature data, obtaining three sets: a first set consisting of images a and e, a second set consisting of image b, and a third set consisting of images c and d. The identity of the person objects in the images of the first set is determined to be Zhang San, and images a and e are each given the pseudo hard label [1, 0, 0], indicating that the identities of the person objects in images a and e belong to the first class (Zhang San). The identity of the person object in the image of the second set is determined to be Li Si, and image b is given the pseudo hard label [0, 1, 0], indicating that the identity of the person object in image b belongs to the second class (Li Si). The identity of the person objects in the images of the third set is determined to be Wang Wu, and images c and d are each given the pseudo hard label [0, 0, 1], indicating that the identities of the person objects in images c and d belong to the third class (Wang Wu).
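A sketch of this label-generation step; k-means is assumed as the clustering algorithm and the number of clusters is assumed fixed, neither of which is prescribed by the present disclosure.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_hard_labels(features, num_classes):
    """Cluster image feature data and return one-hot pseudo hard labels,
    e.g. [1, 0, 0] for an image assigned to the first cluster."""
    cluster_ids = KMeans(n_clusters=num_classes, n_init=10).fit_predict(features)
    labels = np.zeros((len(features), num_classes), dtype=np.float32)
    labels[np.arange(len(features)), cluster_ids] = 1.0
    return labels
```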
Meanwhile, the classifier of the third neural network predicts, from the feature data of these images, the class each image belongs to (hereinafter the prediction result), and a total preliminary hard classification loss is determined from the differences between the prediction results and the pseudo hard labels.
In a possible implementation, a preliminary hard classification loss is obtained by computing the difference between the prediction result and the label of each image of the set of images to be processed, and the total preliminary hard loss is determined as the average of the preliminary hard classification losses of at least one image of the set. Continuing example 3 (example 4), the classifier of the third neural network outputs the predicted class [0.7, 0.2, 0.1] for image a, meaning the probability that the identity of the person object in image a is Zhang San is 0.7, Li Si 0.2 and Wang Wu 0.1. The preliminary hard classification loss of image a is determined by computing the cross-entropy loss between the predicted class ([0.7, 0.2, 0.1]) and the pseudo hard label ([1, 0, 0]). The hard classification losses of images b, c, d and e are determined in the same way, and the total preliminary hard classification loss is the average of the preliminary hard classification losses of images a through e.
A preliminary hard triplet loss of each image of the set is also determined from the feature data of the images, and the total preliminary hard triplet loss is the average of the preliminary hard triplet losses of the images. Continuing example 4, to compute the preliminary hard triplet loss of image a, the similarities between image a's feature data and the feature data of the images of the same class as image a (hereinafter positive similarities) are computed, the similarities between image a's feature data and the feature data of the images of different classes (hereinafter negative similarities) are computed, and the preliminary hard triplet loss is determined from the minimum of the positive similarities and the maximum of the negative similarities. The preliminary hard triplet losses of images b, c, d and e are determined in the same way, and the total preliminary hard triplet loss is the average of the preliminary hard triplet losses of images a through e.
The total preliminary hard classification loss and the total preliminary hard triplet loss are then weighted and summed to obtain the third loss. The parameters of the third neural network are adjusted based on the third loss to obtain the third neural network of the second second iteration, and so on until at least one image of the set of images to be processed is labeled (i.e. given a pseudo hard label), yielding the training image set.
The first neural network to be trained and the second neural network to be trained are both networks trained on the source domain that can extract feature data from images and classify based on feature data; their structures may be the same or different, which the present disclosure does not limit.
302. Perform x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the target neural network.
Referring to FIG. 5, FIG. 5 is a schematic diagram of the training in the i-th of the x first iterations provided by this embodiment. The i-th first iteration comprises: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration.
In a possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set to obtain that of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set to obtain that of the (i+1)-th first iteration, may comprise the following steps. The first neural network to be trained of the i-th first iteration processes the first image to obtain a first classification result; the average network of the i-th first iteration processes the first image to obtain a second classification result; the second neural network to be trained of the i-th first iteration processes the first image to obtain a third classification result; and the target neural network of the i-th first iteration processes the first image to obtain a fourth classification result. A first hard classification loss of the first network to be trained of the i-th first iteration is then determined from the difference between the first classification result and the first label of the first image (the pseudo hard label obtained in step 301), and a second hard classification loss of the second network to be trained from the difference between the third classification result and the first label. Supervising the first network to be trained with the first hard classification loss and the second network to be trained with the second hard classification loss realizes supervision of both networks of the i-th first iteration by the training image set. A first soft classification loss of the first network to be trained of the i-th first iteration is determined from the difference between the first and second classification results, and a second soft classification loss of the second network of the i-th first iteration from the difference between the third and fourth classification results. Supervising the first network to be trained with the first soft classification loss and the second with the second soft classification loss realizes supervision of the first network by the average network of the i-th first iteration and of the second network by the target neural network of the i-th first iteration. The first hard and soft classification losses are then weighted and summed to obtain the first loss of the first network to be trained of the i-th first iteration, and the second hard and soft classification losses to obtain the second loss of the second network to be trained. The parameters of the first network to be trained of the i-th first iteration are adjusted based on the first loss to obtain the first network to be trained of the (i+1)-th first iteration, and those of the second network based on the second loss to obtain the second network to be trained of the (i+1)-th first iteration. Before the i-th first iteration is executed, the parameters of the target neural network of the i-th first iteration can be determined from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first network to be trained of the i-th first iteration, and the parameters of the average network of the i-th first iteration from the parameters of the average network of the (i-1)-th first iteration and the parameters of the second network to be trained of the i-th first iteration. In a possible implementation, in the (i+1)-th first iteration, the parameters of the target neural network and of the average network of the (i+1)-th first iteration can be determined by the following two formulas respectively:
$E_{i+1}(\theta_1)=\alpha E_i(\theta_1)+(1-\alpha)\theta_1^{i+1}$   ... formula (2)

$E_{i+1}(\theta_2)=\alpha E_i(\theta_2)+(1-\alpha)\theta_2^{i+1}$   ... formula (3)

where $E_{i+1}(\theta_1)$ denotes the parameters of the target neural network of the (i+1)-th first iteration, $E_i(\theta_1)$ the parameters of the target neural network of the i-th first iteration, $E_{i+1}(\theta_2)$ the parameters of the average network of the (i+1)-th first iteration, and $E_i(\theta_2)$ the parameters of the average network of the i-th first iteration; $\theta_1^{i+1}$ denotes the parameters of the first neural network to be trained of the (i+1)-th first iteration, and $\theta_2^{i+1}$ the parameters of the second neural network to be trained of the (i+1)-th first iteration. $\alpha$ is a number greater than or equal to 0 and less than 1, with $E_2(\theta_1)=E_1(\theta_1)=\theta_1^1$ and $E_2(\theta_2)=E_1(\theta_2)=\theta_2^1$. x is a positive integer and i is a positive integer less than or equal to x.
After the parameters of the first network to be trained, the second network to be trained, the target neural network and the average network of the (i+1)-th first iteration are determined, the (i+1)-th iteration is executed. After the x-th iteration is executed, the parameters of the target neural network of the x-th first iteration are adjusted to obtain the target neural network.
Taking the first iteration as an example, suppose the training image set contains images 1, 2 and 3, the pseudo hard label of image 1 is [1, 0], the classification result obtained by the first network to be trained of the first first iteration (i.e. the first neural network to be trained) processing image 1 of the training image set (i.e. the first image) is [0.7, 0.3], the classification result obtained by the second network to be trained of the first first iteration (i.e. the second neural network to be trained) processing image 1 is [0.8, 0.2], the classification result obtained by the target neural network of the first first iteration (i.e. the first neural network to be trained) processing image 1 is [0.7, 0.3], and the classification result obtained by the average network of the first first iteration (i.e. the second neural network to be trained) processing image 1 is [0.8, 0.2]. The first hard classification loss is obtained by computing the cross-entropy loss between [1, 0] and [0.7, 0.3]; the second hard classification loss by computing the cross-entropy loss between [1, 0] and [0.8, 0.2]; the first soft classification loss by computing the difference between [0.7, 0.3] and [0.7, 0.3]; and the second soft classification loss by computing the difference between [0.8, 0.2] and [0.8, 0.2]. The first hard and soft classification losses are weighted and summed to obtain the first loss, and the second hard and soft classification losses to obtain the second loss. The parameters of the first neural network to be trained are adjusted based on the first loss to obtain the first network to be trained of the second iteration, and the parameters of the second neural network to be trained based on the second loss to obtain the second network to be trained of the second iteration.
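The hard and soft classification losses of this example can be sketched as follows; cross-entropy against the pseudo hard label and a soft cross-entropy against the supervising network's prediction are natural concrete choices, and the weights are placeholders.

```python
import torch
import torch.nn.functional as F

def classification_losses(student_logits, teacher_logits, hard_label_idx,
                          w_hard=1.0, w_soft=1.0):
    """Weighted sum of the hard classification loss (difference from the
    pseudo hard label) and the soft classification loss (difference from
    the average/target network's classification result)."""
    hard = F.cross_entropy(student_logits, hard_label_idx)
    teacher_prob = F.softmax(teacher_logits, dim=1).detach()  # supervision only
    soft = -(teacher_prob * F.log_softmax(student_logits, dim=1)).sum(1).mean()
    return w_hard * hard + w_soft * soft
```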
Optionally, in the i-th iteration, before the first loss is obtained by weighting and summing the first hard and soft classification losses and the second loss by weighting and summing the second hard and soft classification losses, a first hard triplet loss of the first network to be trained of the i-th first iteration and a second hard triplet loss of the second network to be trained of the i-th first iteration may also be determined. The first loss is then obtained by weighting and summing the first hard classification loss, the first soft classification loss and the first hard triplet loss, and the second loss by weighting and summing the second hard classification loss, the second soft classification loss and the second hard triplet loss.
In a possible implementation, the first network to be trained of the i-th first iteration processes the training image set to obtain the first feature data set, and the second network to be trained of the i-th first iteration processes the training image set to obtain the fourth feature data set. The minimum similarity between the first feature data of the first image in the first feature data set and the feature data of the positive-sample feature data subset of the first feature data set is determined to obtain the first similarity, and the minimum similarity between the third feature data of the first image in the fourth feature data set and the feature data of the positive-sample subset of the fourth feature data set to obtain the ninth similarity. The maximum similarity between the first feature data and the feature data of the negative-sample subset of the first feature data set is determined to obtain the third similarity, and the maximum similarity between the third feature data and the feature data of the negative-sample subset of the fourth feature data set to obtain the tenth similarity. The first hard triplet loss can then be determined from the first and third similarities, and the second hard triplet loss from the ninth and tenth similarities. The negative-sample feature data subset comprises feature data of images having labels different from the first label, and the positive-sample feature data subset comprises feature data of images having the same label as the first label.
For example (example 5), the training image set contains images 1, 2, 3, 4 and 5, where the labels of images 1, 3 and 5 are all Zhang San and the labels of images 2 and 4 are all Li Si. The first feature data set includes the feature data of image 1 (the first feature data), of image 2 (hereinafter feature data 2), of image 3 (hereinafter feature data 3), of image 4 (hereinafter feature data 4) and of image 5 (hereinafter feature data 5). The fourth feature data set includes the feature data of image 1 (the third feature data), of image 2 (hereinafter feature data 6), of image 3 (hereinafter feature data 7), of image 4 (hereinafter feature data 8) and of image 5 (hereinafter feature data 9). The positive-sample subset of the first feature data set includes feature data 3 and 5, and its negative-sample subset includes feature data 2 and 4; the positive-sample subset of the fourth feature data set includes feature data 7 and 9, and its negative-sample subset includes feature data 6 and 8. The similarities between the first feature data and feature data 2, 3, 4 and 5 are computed; supposing the similarity between the first feature data and feature data 3 is smaller than that between the first feature data and feature data 5, the similarity between the first feature data and feature data 3 is the first similarity. Supposing the similarity between the first feature data and feature data 2 is smaller than that between the first feature data and feature data 4, the similarity between the first feature data and feature data 4 is the third similarity. The similarities between the third feature data and feature data 6, 7, 8 and 9 are computed; supposing the similarity between the third feature data and feature data 7 is smaller than that between the third feature data and feature data 9, the similarity between the third feature data and feature data 7 is the ninth similarity. Supposing the similarity between the third feature data and feature data 6 is smaller than that between the third feature data and feature data 8, the similarity between the third feature data and feature data 8 is the tenth similarity. The first hard triplet loss among the first feature data, feature data 3 and feature data 4 can then be determined by formula (4), and the second hard triplet loss among the third feature data, feature data 7 and feature data 8 by formula (5):
$L_1=\max(0,\ d_1+m-d_3)$   ... formula (4)

$L_2=\max(0,\ d_9+m-d_{10})$   ... formula (5)

where max(A, B) is the maximum of A and B, $d_1$ is the first similarity, $d_3$ the third similarity, $d_9$ the ninth similarity, $d_{10}$ the tenth similarity, and m is a number greater than 0 and less than 1.
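Formulas (4) and (5) can be written directly as code; here d is treated as a distance (the embodiments note that the similarity may be a Euclidean distance), so the loss pulls the hardest positive closer than the hardest negative by the margin m. The margin value is a placeholder.

```python
def hard_triplet_loss(d_hardest_pos, d_hardest_neg, m=0.3):
    """L = max(0, d_pos + m - d_neg), as in formulas (4)/(5), where
    d_hardest_pos / d_hardest_neg are the hardest intra-class and hardest
    out-of-class distances and 0 < m < 1 is the margin."""
    return max(0.0, d_hardest_pos + m - d_hardest_neg)

# The first/second hard triplet loss is the average of this quantity over
# the images of the training image set.
```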
The feature data of the positive-sample subset with the smallest similarity to the first feature data is called the hardest intra-class feature data of the first feature data, and the feature data of the negative-sample subset with the largest similarity to the first feature data is called its hardest out-of-class feature data. Similarly, the hardest intra-class and hardest out-of-class feature data of the other images of the training image set (images 2, 3, 4 and 5) in the first feature data set can be determined, and the hard triplet loss of the first network to be trained for each image can then be determined from that image's feature data in the first feature data set and its hardest intra-class and out-of-class feature data. Likewise, the hard triplet loss of the second network to be trained for each image can be determined from the image's feature data in the fourth feature data set and its hardest intra-class and out-of-class feature data.
Optionally, after the hard triplet losses of the first network to be trained and of the second network to be trained have been determined for each image of the training image set, the average over the training image set of the hard triplet losses of the first network is taken as the first hard triplet loss, and the average of the hard triplet losses of the second network over at least one image of the training image set as the second hard triplet loss.
Supervising the first network to be trained of the i-th first iteration with the first hard triplet loss makes it increase the similarity between feature data of images belonging to the same class and decrease the similarity between feature data of images belonging to different classes, so as to better distinguish images of different classes and improve recognition accuracy on image classes. Likewise, supervising the second network to be trained of the i-th first iteration with the second hard triplet loss improves its feature extraction from images, yielding feature data containing richer image information.
In this embodiment, the first loss is determined from the first hard classification loss, the first soft classification loss and the first hard triplet loss, and the second loss from the second hard classification loss, the second soft classification loss and the second hard triplet loss. The first network to be trained of the i-th first iteration is then adjusted based on the first loss and the second network based on the second loss, realizing supervision of the first network to be trained of the i-th first iteration by the training images and the average network of the i-th first iteration, and of the second network to be trained of the i-th first iteration by the training images and the target network of the i-th first iteration.
In the i-th iteration of the embodiment, the positive- and negative-sample feature data subsets are determined by the labels of the images of the training image set, and these labels are the pseudo hard labels obtained by the traditional method of step 301. Since the pseudo hard labels are one-hot encoded data — their values are either 0 or 1 — they contain large errors, so the positive- and negative-sample subsets determined from them also contain large errors. Consequently the first network to be trained of the (i+1)-th first iteration obtained after the i-th first iteration extracts features poorly on the target domain, which leads to low recognition accuracy on the target domain.
For example (example 6), the labels of the images of the training image set contain two classes (Zhang San and Li Si); since pseudo hard label values are either 0 or 1, the person object in each image of the training image set is either Zhang San or Li Si. Suppose the person object in image 1 of the training image set is Zhang San and the class represented by image 1's pseudo hard label is Zhang San; the person object in image 2 is Li Si but the class represented by image 2's pseudo hard label is Zhang San; and the person object in image 3 is Zhang San but the class represented by image 3's pseudo hard label is Li Si. The feature data of images 1, 2 and 3 in the first feature data set are feature data a, b and c respectively. The hardest intra-class feature data of feature data a is feature data b, and the hardest out-of-class feature data of feature data a is feature data c. Supervising the first network to be trained of the i-th first iteration with the first hard triplet loss determined from feature data a, b and c adjusts it so as to increase the similarity between the feature data extracted from image 1 and from image 2 and to decrease the similarity between the feature data extracted from image 1 and from image 3, obtaining the first network to be trained of the (i+1)-th first iteration. But the person object in image 1 (Zhang San) and the person object in image 2 (Li Si) are not the same person, and increasing the similarity between the feature data of images 1 and 2 clearly lowers the (i+1)-th first network's accuracy on Zhang San or Li Si. Likewise, the person objects in images 1 and 3 (both Zhang San) are the same person, and decreasing the similarity between the feature data of images 1 and 3 clearly lowers the network's accuracy on Zhang San or Li Si as well.
To reduce the impact of the pseudo hard labels as in example 6, the embodiments of the present disclosure provide a method of supervising the first network to be trained of the i-th first iteration with soft labels to obtain the first soft triplet loss. Supervising the first network to be trained of the i-th first iteration with the first soft triplet loss improves the recognition accuracy of the first network to be trained of the (i+1)-th first iteration, and thereby the recognition accuracy of the target neural network.
Referring to FIG. 6, FIG. 6 is a flowchart of another image processing method provided by an embodiment of the present disclosure.
601. Process the training image set with the average network of the i-th first iteration to obtain the second feature data set, and process the training image set with the target network of the i-th first iteration to obtain the fifth feature data set.
602. Obtain the first soft triplet loss and the second soft triplet loss from the first feature data set, the second feature data set, the fourth feature data set and the fifth feature data set.
The minimum similarity between the second feature data of the first image in the second feature data set and the feature data of the positive-sample subset of the second feature data set is determined to obtain the second similarity, and the maximum similarity between the second feature data and the feature data of the negative-sample subset of the second feature data set to obtain the fourth similarity. The minimum similarity between the fourth feature data of the first image in the fifth feature data set and the feature data of the positive-sample subset of the fifth feature data set is determined to obtain the eleventh similarity, and the maximum similarity between the fourth feature data and the feature data of the negative-sample subset of the fifth feature data set to obtain the twelfth similarity.
It should be understood that in the embodiments of the present disclosure the positive-sample feature data subsets of different feature data sets contain different feature data, as do the negative-sample feature data subsets of different feature data sets.
Because the pseudo hard label values are either 0 or 1, the division of the images of the training image set into classes is "too absolute", so the first network to be trained of the (i+1)-th first iteration extracts features poorly on the target domain. This embodiment normalizes the first, second, third, fourth, ninth, tenth, eleventh and twelfth similarities respectively, converting them into values between 0 and 1, and determines the first soft triplet loss of the first network to be trained of the i-th first iteration and the second soft triplet loss of the second network to be trained of the i-th first iteration from the differences between the similarities obtained by normalization, so as to improve the feature extraction of the first network to be trained of the (i+1)-th first iteration on the target domain.
In a possible implementation, the first total similarity is obtained by determining the sum of the second and fourth similarities, the second total similarity by determining the sum of the first and third similarities, the third total similarity by determining the sum of the ninth and tenth similarities, and the fourth total similarity by determining the sum of the eleventh and twelfth similarities. The fifth similarity is obtained by computing the quotient of the second similarity and the first total similarity; the sixth by computing the quotient of the fourth similarity and the first total similarity; the seventh by computing the quotient of the first similarity and the second total similarity; the eighth by computing the quotient of the third similarity and the second total similarity; the thirteenth by computing the quotient of the ninth similarity and the third total similarity; the fourteenth by computing the quotient of the tenth similarity and the third total similarity; the fifteenth by computing the quotient of the eleventh similarity and the fourth total similarity; and the sixteenth by computing the quotient of the twelfth similarity and the fourth total similarity. This completes the normalization of the first, second, third, fourth, ninth, tenth, eleventh and twelfth similarities. The fifth and sixth similarities are then used as supervision data (i.e. soft labels) for the first network to be trained of the i-th first iteration to adjust its parameters, and the fifteenth and sixteenth similarities as supervision data (i.e. soft labels) for the second network to be trained of the i-th first iteration to adjust its parameters. That is, the first soft triplet loss is determined from the difference between the fifth and seventh similarities and the difference between the sixth and eighth similarities, and the second soft triplet loss from the difference between the thirteenth and fifteenth similarities and the difference between the fourteenth and sixteenth similarities.
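A sketch of this normalization and of one way to penalize the student/teacher difference; the binary-cross-entropy form is an assumption consistent with using the normalized teacher similarities as soft labels, and it assumes the similarities are positive so that the quotients lie in (0, 1).

```python
import torch

def soft_triplet_loss(s_pos_student, s_neg_student, s_pos_teacher, s_neg_teacher):
    """Normalize hardest-positive / hardest-negative similarities by their
    sum (the quotients described above), then measure the difference between
    the student's and the teacher's normalized similarities."""
    p_student = s_pos_student / (s_pos_student + s_neg_student)
    p_teacher = (s_pos_teacher / (s_pos_teacher + s_neg_teacher)).detach()  # soft label
    return -(p_teacher * torch.log(p_student)
             + (1 - p_teacher) * torch.log(1 - p_student)).mean()
```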
Optionally, the soft triplet loss of the average network of the i-th first iteration is determined for each image of the training image set from the similarity between the image's hardest intra-class feature data in the second feature data set and the image's feature data in the second feature data set, and the similarity between the image's hardest out-of-class feature data in the second feature data set and its feature data in the second feature data set. The soft triplet loss of the target neural network of the i-th first iteration is determined for each image from the similarity between the image's hardest intra-class feature data in the fifth feature data set and its feature data in the fifth feature data set, and the similarity between its hardest out-of-class feature data in the fifth feature data set and its feature data in the fifth feature data set. The first soft triplet loss is then obtained by computing the average, over at least one image of the training image set, of the soft triplet losses of the average network of the i-th first iteration, and the second soft triplet loss by computing the average of the soft triplet losses of the target neural network of the i-th first iteration.
The similarities between 0 and 1 obtained by normalizing the first, second, third, fourth, ninth, tenth, eleventh and twelfth similarities are closer to the true distribution of the data than the pseudo hard labels, so using the normalized similarities as supervision data improves the recognition accuracy of the target neural network.
For example, the set of images to be processed contains 10 images; after the processing of step 301, the images are divided by the identity of their person objects into Zhang San and Li Si, with 5 images whose pseudo hard label indicates Zhang San (hereinafter first-class images) and 5 whose pseudo hard label indicates Li Si (hereinafter second-class images). But the true identity of the person object in image 1 of the first class is Li Si, and the true identity of the person object in image 2 of the second class is Zhang San. That is, the first class contains 4 images of Zhang San and 1 of Li Si, so the true label distribution of the first class should be [0.8, 0.2], where [0.8, 0.2] indicates that images of Zhang San make up 0.8 of the total number of first-class images and images of Li Si make up 0.2. Similarly, the true label distribution of the second class should be [0.2, 0.8], where [0.2, 0.8] indicates that images of Zhang San make up 0.2 of the total number of second-class images and images of Li Si make up 0.8. But the pseudo hard label of the first class is [1, 0] and that of the second class is [0, 1], which clearly does not match the true label distributions of the two classes. The soft labels obtained by the method of this embodiment are values between 0 and 1 and better match the true label distributions of the first- and second-class images, so supervising the first and second networks to be trained of the i-th iteration with soft labels as supervision data improves the feature extraction of the finally obtained target neural network on the target domain. Optionally, the similarity in the embodiments of the present disclosure may be a Euclidean distance or a cosine similarity, which the present disclosure does not limit.
Optionally, before the training image set is fed into the first network to be trained, the second network to be trained, the target neural network and the average network of the i-th first iteration, first preprocessing may be performed on the images of the training image set to obtain the first image set; the first image set is then fed into the first network to be trained of the i-th first iteration to obtain the first feature data set, and into the target neural network of the i-th first iteration to obtain the fifth feature data set. The first preprocessing comprises any one of erasing, cropping and flipping.
Performing the first preprocessing on the training image set reduces the probability that the first network to be trained, the second network to be trained, the target neural network and the average network of the i-th first iteration overfit during training.
Optionally, while the first preprocessing is performed on the training image set, second preprocessing may also be performed on the training image set to obtain the fourth image set. The second preprocessing comprises any one of erasing, cropping and flipping, and the first and second preprocessing are different. The fourth image set is fed into the second network to be trained of the i-th first iteration to obtain the fourth feature data set, and into the average network of the i-th first iteration to obtain the second feature data set.
Performing both the first and second preprocessing on the training image set further reduces the probability that the first network to be trained, the second network to be trained, the target neural network and the average network of the i-th first iteration overfit during training. A sketch of the two preprocessing branches follows the example below.
For example (example 7), the training image set contains images 1 and 2. Image 3 is obtained by cropping image 1, and image 4 by erasing image 2 (erasing an arbitrary region of image 2); images 3 and 4 form the first image set. Image 5 is obtained by flipping image 1, and image 6 by cropping image 2; images 5 and 6 form the fourth image set. Images 3 and 4 are fed into the first network to be trained of the i-th first iteration to obtain the first feature data set containing the feature data of images 3 and 4, and into the target neural network of the i-th first iteration to obtain the fifth feature data set containing the feature data of images 3 and 4; images 5 and 6 are fed into the second network to be trained of the i-th first iteration to obtain the fourth feature data set containing the feature data of images 5 and 6, and into the average network of the i-th first iteration to obtain the second feature data set containing the feature data of images 5 and 6.
The second image is obtained by performing the first preprocessing on the first image of the training image set, and the fourth image by performing the second preprocessing on the first image; the image content of the second and fourth images differs, but their labels are the same. Continuing example 7, the labels of images 1, 3 and 5 are all the same, and the labels of images 2, 4 and 6 are all the same.
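The two preprocessing branches of example 7 can be sketched with torchvision transforms; the concrete operations and parameters are placeholders, chosen only so that the two branches differ as required.

```python
import torchvision.transforms as T

# First preprocessing (e.g. cropping): produces the first image set fed to
# the first network to be trained and the target neural network.
first_preprocess = T.Compose([T.RandomCrop(224, padding=8), T.ToTensor()])

# Second preprocessing (e.g. flipping then erasing): produces the fourth
# image set fed to the second network to be trained and the average network.
second_preprocess = T.Compose([T.RandomHorizontalFlip(p=1.0), T.ToTensor(),
                               T.RandomErasing(p=1.0)])
```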
The first classification result can then be obtained by processing the second image with the first network to be trained of the i-th first iteration, the fourth classification result by processing the second image with the target neural network of the i-th first iteration, the third classification result by processing the fourth image with the second network to be trained of the i-th first iteration, and the second classification result by processing the fourth image with the average network of the i-th first iteration.
The feature data of the first feature data set obtained by the first network to be trained of the i-th first iteration processing the first image set differ from the feature data of the first feature data set obtained by processing the training image set. In this case, the hardest intra-class feature data of an image of the training image set in the first feature data set (or the second, fourth or fifth feature data set) mentioned above refers to the hardest intra-class feature data, in that feature data set, of the image after the first or second preprocessing, and the hardest out-of-class feature data of an image of the training image set in the first feature data set (or the second, fourth or fifth feature data set) refers to the hardest out-of-class feature data, in that feature data set, of the image after the first or second preprocessing.
It should be understood that in the embodiments of the present disclosure the first neural network to be trained, the first neural network and the first network to be trained of the i-th first iteration have the same structure but different parameters; the second neural network to be trained, the second neural network and the second network to be trained of the i-th first iteration have the same structure but different parameters; the target network and the target neural network of the i-th first iteration have the same structure but different parameters; and the average network and the average network of the i-th first iteration have the same structure but different parameters. As shown in FIG. 5, the output of the first network to be trained of the i-th first iteration comprises the first classification result and the first feature data set; the output of the target neural network of the i-th first iteration comprises the fourth classification result and the fifth feature data set; the output of the second network to be trained of the i-th first iteration comprises the third classification result and the fourth feature data set; and the output of the average network of the i-th first iteration comprises the second classification result and the second feature data set.
Optionally, if the hardware resources of the device executing the technical solutions provided by the present disclosure are insufficient to process at least one image of the training image set within one first iteration or one second iteration, a sample image set may be obtained by sampling the training image set and used as the training data of one first iteration or one second iteration.
603. Determine the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss, and determine the second loss from the second hard classification loss, the second soft classification loss, the second soft triplet loss and the second hard triplet loss.
The first loss is obtained by weighting and summing the first hard triplet loss, the first hard classification loss, the first soft triplet loss and the first soft classification loss, and the second loss by weighting and summing the second hard triplet loss, the second hard classification loss, the second soft triplet loss and the second soft classification loss. The weights of the weighted sums can be adjusted according to the actual use case, which the present disclosure does not limit.
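A sketch of this weighted sum; the weight values are placeholders to be tuned for the actual use case.

```python
def total_loss(hard_cls, soft_cls, hard_tri, soft_tri, w=(1.0, 0.5, 1.0, 0.5)):
    """First/second loss as a weighted sum of the hard classification,
    soft classification, hard triplet and soft triplet losses."""
    return w[0] * hard_cls + w[1] * soft_cls + w[2] * hard_tri + w[3] * soft_tri
```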
Applying the technical solution provided by this embodiment, soft labels can be obtained from the first, second, fourth and fifth feature data sets; supervising the first network to be trained of the i-th iteration and the second network to be trained of the i-th iteration with the soft labels yields the first and second soft triplet losses. Adjusting the parameters of the first network to be trained of the i-th iteration based on the first soft triplet loss yields the first network to be trained of the (i+1)-th iteration, and adjusting the second network to be trained of the i-th iteration based on the second soft triplet loss improves the recognition accuracy on the target domain of the first network of the (i+1)-th iteration and the feature extraction on the target domain of the second network of the (i+1)-th iteration, thereby improving the recognition accuracy of the target neural network on the target domain.
The embodiments of the present disclosure also provide an application scenario of image retrieval based on the target feature data of the image to be processed obtained in the embodiments: the target feature data is used to retrieve a database, and an image having feature data matching the target feature data is obtained as the target image.
The database may be built before the image to be processed is acquired and includes images and the feature data of the images, where the feature data relates to the task the target neural network performs on the target domain. For example, if the target neural network is used to recognize the identities of the person objects in the images of the target domain, the feature data of an image includes the features of the person object in the image, including clothing attributes, appearance features and other features usable for identifying the person object. Clothing attributes include at least one of the features of the articles adorning the body (such as the color of the top, the color of the trousers, trouser length, hat style, shoe color, whether an umbrella is carried, luggage category, whether a mask is worn, mask color). Appearance features include body shape, gender, hairstyle, hair color, age group, whether glasses are worn, and whether something is held against the chest. Other features usable for identifying a person object include posture, viewing angle, stride and scene brightness. As another example, if the target neural network is used to recognize which of apple, pear or peach an image of the target domain contains, the feature data of an image includes feature information of an apple, a pear or a peach.
Since every image in the database has feature data, retrieving the database with the target feature data means determining, from the database, the feature data matching the target feature data, i.e. determining the similarity between the target feature data and the feature data of the images of the database, and taking the feature data of images whose similarity reaches a threshold as the feature data matching the target feature data, thereby determining the target image. It should be understood that there may be one target image or multiple target images.
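A sketch of this retrieval step, assuming cosine similarity and an in-memory array of database feature data; the threshold value is a placeholder.

```python
import numpy as np

def retrieve(target_feat, db_feats, db_images, threshold=0.8):
    """Return the database images whose feature data matches the target
    feature data, i.e. whose similarity reaches the threshold."""
    q = target_feat / np.linalg.norm(target_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of every database image to the query
    return [img for img, s in zip(db_images, sims) if s >= threshold]
```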
Those skilled in the art can understand that the order in which the steps are written in the above methods of the specific implementations does not imply a strict execution order or any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
The methods of the embodiments of the present disclosure are elaborated above; the devices of the embodiments of the present disclosure are provided below.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure. The device 1 comprises an acquiring unit 11, a processing unit 12 and a retrieval unit 13, wherein:
the acquiring unit 11 is configured to acquire an image to be processed;
the feature extraction processing unit 12 is configured to perform feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network.
In a possible implementation, training the first neural network under the supervision of the training image set and the average network comprises: acquiring the training image set, a first neural network to be trained and a second neural network to be trained; performing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, x being a positive integer; the i-th of the x first iterations comprising: supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration; the parameters of the target neural network being a temporal average of the parameters of the first neural network comprising: determining the parameters of the target neural network of the i-th first iteration from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, i being a positive integer less than or equal to x; and, in the case of i=1, the parameters of the target neural network of the (i-1)-th first iteration being the same as the parameters of the first neural network to be trained.
In another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th iteration to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set; obtaining a first soft triplet loss from the first feature data set and the second feature data set; and supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, obtaining the first soft triplet loss from the first feature data set and the second feature data set comprises: determining the minimum similarity between first feature data of a first image of the training image set in the first feature data set and feature data in a positive-sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between second feature data of the first image in the second feature data set and feature data in the positive-sample feature data subset of the second feature data set, to obtain a second similarity, the positive-sample feature data subset comprising feature data of images having the same label as a first label of the first image; determining the maximum similarity between the first feature data and feature data in a negative-sample feature data subset of the first feature data set, to obtain a third similarity; determining the maximum similarity between the second feature data and feature data in the negative-sample feature data subset of the second feature data set, to obtain a fourth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label; normalizing the first, second, third and fourth similarities respectively to obtain a fifth, sixth, seventh and eighth similarity; and obtaining the first soft triplet loss from the fifth, sixth, seventh and eighth similarities.
In yet another possible implementation, normalizing the first, second, third and fourth similarities respectively to obtain the fifth, sixth, seventh and eighth similarities comprises: determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity; determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity; and determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
In yet another possible implementation, supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration comprises: processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result; determining a first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss; and adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
In yet another possible implementation, determining the first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss comprises: determining a first hard classification loss from the difference between the first classification result and the first label; and determining the first loss from the first hard classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss and the first soft triplet loss, the average network of the i-th first iteration processes the first image to obtain a second classification result; a first soft classification loss is determined from the difference between the first classification result and the second classification result; determining the first loss from the first hard classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss.
In yet another possible implementation, before the first loss is determined from the first hard classification loss, the first soft classification loss and the first soft triplet loss, a first hard triplet loss is determined from the first similarity and the third similarity; determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss then comprises: determining the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss.
In yet another possible implementation, processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result comprises: performing first preprocessing on the training image set to obtain a first image set, the first preprocessing comprising any one of erasing, cropping and flipping; and processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image being obtained by performing the first preprocessing on the first image, the feature data of the second image in the first feature data set being the same as the data of the first image in the first feature data set.
In yet another possible implementation, processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set comprises: processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
In yet another possible implementation, the acquiring unit 11 is specifically configured to: acquire a set of images to be processed and a third neural network; perform y second iterations on the third neural network to obtain the training image set, y being a positive integer; the t-th of the y second iterations comprising: sampling the set of images to be processed to obtain a second image set; processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing feature data of the images of the second image set and a classification result set containing classification results of the images of the second image set; clustering the feature data of the third feature data set to determine labels of the feature data of the third feature data set, and adding the labels of the feature data of the third feature data set to the corresponding images of the second image set to obtain a third image set; determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set; and adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration, t being a positive integer less than y.
In yet another possible implementation, the device further comprises: a retrieval unit 13 configured to retrieve a database using the target feature data and obtain, as a target image, an image having feature data matching the target feature data.
In this embodiment, the parameters of the target neural network and of the average network are obtained by taking the temporal averages of the parameters of the first and second neural networks respectively; the second neural network is then supervised with the output of the target neural network, and the first neural network is supervised with the output of the average network to train the target neural network, which improves the training effect. When the target neural network is then used to perform the relevant recognition tasks on the target domain, it can extract more informative target feature data, and this information can improve recognition accuracy on the target domain.
In some embodiments, the functions or modules of the device provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the description of the method embodiments above, which is not repeated here for brevity.
FIG. 8 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the present disclosure. The image processing device 2 comprises a processor 21, a memory 22, an input device 23 and an output device 24. The processor 21, the memory 22, the input device 23 and the output device 24 are coupled through connectors, which include various interfaces, transmission lines, buses, etc., which the embodiments of the present disclosure do not limit. It should be understood that in the embodiments of the present disclosure, coupling refers to interconnection in a specific way, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, buses and the like.
The processor 21 may be one or more graphics processing units (GPUs); where the processor 21 is one GPU, the GPU may be a single-core GPU or a multi-core GPU. Optionally, the processor 21 may be a processor group composed of multiple GPUs coupled to each other through one or more buses. Optionally, the processor may also be another type of processor, etc., which the embodiments of the present disclosure do not limit.
The memory 22 may be used to store computer program instructions and various computer program code, including program code for executing the solutions of the present disclosure. Optionally, the memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or compact disc read-only memory (CD-ROM), and is used for related instructions and data.
The input device 23 is used to input data and/or signals, and the output device 24 is used to output data and/or signals. The input device 23 and the output device 24 may be independent devices or an integrated device.
It can be understood that in the embodiments of the present disclosure, the memory 22 can be used not only to store related instructions but also related images; for example, the memory 22 can be used to store the neural network to be searched acquired through the input device 23, or the memory 22 can also be used to store the target neural network obtained by the processor 21 through searching, and so on. The embodiments of the present disclosure do not limit the specific data stored in the memory.
It can be understood that FIG. 8 shows only a simplified design of an image processing device. In practical applications, the image processing device may also contain other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all image processing devices that can implement the embodiments of the present disclosure fall within the scope of protection of the present disclosure.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present disclosure.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. Those skilled in the art can also clearly understand that the embodiments of the present disclosure are described with different emphases; for convenience and brevity of description, identical or similar parts may not be repeated in different embodiments, so for parts not described or not described in detail in one embodiment, reference may be made to the records of other embodiments.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; e.g. the division of the units is only a logical functional division, and there may be other divisions in actual implementation, e.g. multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g. infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium accessible to the computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. digital versatile disc (DVD)) or a semiconductor medium (e.g. solid state disk (SSD)), etc.
Those of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the related hardware; the program may be stored in a computer-readable storage medium, and when executed may include the processes of the above method embodiments. The aforementioned storage media include media that can store program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks or optical disks.

Claims (32)

  1. An image processing method, characterized in that the method comprises:
    acquiring an image to be processed;
    performing feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network.
  2. The method according to claim 1, characterized in that training the first neural network under the supervision of the training image set and the average network comprises:
    acquiring the training image set, a first neural network to be trained and a second neural network to be trained;
    performing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, x being a positive integer;
    the i-th of the x first iterations comprising:
    supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration;
    the parameters of the target neural network being a temporal average of the parameters of the first neural network comprising:
    determining the parameters of the target neural network of the i-th first iteration from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, i being a positive integer less than or equal to x;
    in the case of i=1, the parameters of the target neural network of the (i-1)-th first iteration being the same as the parameters of the first neural network to be trained.
  3. The method according to claim 2, characterized in that supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th iteration to obtain the first neural network to be trained of the (i+1)-th first iteration comprises:
    processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set;
    obtaining a first soft triplet loss from the first feature data set and the second feature data set;
    supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  4. The method according to claim 3, characterized in that obtaining the first soft triplet loss from the first feature data set and the second feature data set comprises:
    determining the minimum similarity between first feature data of a first image of the training image set in the first feature data set and feature data in a positive-sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between second feature data of the first image in the second feature data set and feature data in the positive-sample feature data subset of the second feature data set, to obtain a second similarity, the positive-sample feature data subset comprising feature data of images having the same label as a first label of the first image;
    determining the maximum similarity between the first feature data and feature data in a negative-sample feature data subset of the first feature data set, to obtain a third similarity; determining the maximum similarity between the second feature data and feature data in the negative-sample feature data subset of the second feature data set, to obtain a fourth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label;
    normalizing the first similarity, the second similarity, the third similarity and the fourth similarity respectively, to obtain a fifth similarity, a sixth similarity, a seventh similarity and an eighth similarity;
    obtaining the first soft triplet loss from the fifth similarity, the sixth similarity, the seventh similarity and the eighth similarity.
  5. The method according to claim 4, characterized in that normalizing the first similarity, the second similarity, the third similarity and the fourth similarity respectively to obtain the fifth, sixth, seventh and eighth similarities comprises:
    determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity;
    determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity;
    determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
  6. The method according to claim 5, characterized in that supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration comprises:
    processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result;
    determining a first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss;
    adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  7. The method according to claim 6, characterized in that determining the first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss comprises:
    determining a first hard classification loss from the difference between the first classification result and the first label;
    determining the first loss from the first hard classification loss and the first soft triplet loss.
  8. The method according to claim 7, characterized in that, before the first loss is determined from the first hard classification loss and the first soft triplet loss, the method further comprises:
    processing the first image with the average network of the i-th first iteration to obtain a second classification result;
    determining a first soft classification loss from the difference between the first classification result and the second classification result;
    determining the first loss from the first hard classification loss and the first soft triplet loss comprising:
    determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss.
  9. The method according to claim 8, characterized in that, before the first loss is determined from the first hard classification loss, the first soft classification loss and the first soft triplet loss, the method further comprises:
    determining a first hard triplet loss from the first similarity and the third similarity;
    determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss comprising:
    determining the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss.
  10. The method according to any one of claims 5 to 9, characterized in that processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result comprises:
    performing first preprocessing on the training image set to obtain a first image set, the first preprocessing comprising any one of erasing, cropping and flipping;
    processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image being obtained by performing the first preprocessing on the first image, the feature data of the second image in the first feature data set being the same as the data of the first image in the first feature data set.
  11. The method according to claim 10, characterized in that processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set comprises:
    processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
  12. The method according to any one of claims 2 to 11, characterized in that acquiring the training image set comprises:
    acquiring a set of images to be processed and a third neural network;
    performing y second iterations on the third neural network to obtain the training image set, y being a positive integer;
    the t-th of the y second iterations comprising:
    sampling the set of images to be processed to obtain a second image set, and processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing feature data of the images of the second image set and a classification result set containing classification results of the images of the second image set;
    clustering the feature data of the third feature data set to determine labels of the feature data of the third feature data set, and adding the labels of the feature data of the third feature data set to the corresponding images of the second image set to obtain a third image set;
    determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set;
    adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration, t being a positive integer less than y.
  13. The method according to any one of claims 1 to 12, characterized in that the method further comprises:
    retrieving a database using the target feature data, and obtaining, as a target image, an image having feature data matching the target feature data.
  14. The method according to any one of claims 1 to 13, characterized in that the image to be processed contains a person object.
  15. An image processing device, characterized in that the device comprises:
    an acquiring unit configured to acquire an image to be processed;
    a feature extraction processing unit configured to perform feature extraction on the image to be processed using a target neural network to obtain target feature data of the image to be processed, wherein parameters of the target neural network are a temporal average of parameters of a first neural network, the first neural network is trained under supervision of a training image set and an average network, parameters of the average network are a temporal average of parameters of a second neural network, and the second neural network is trained under supervision of the training image set and the target neural network.
  16. The device according to claim 15, characterized in that training the first neural network under the supervision of the training image set and the average network comprises:
    acquiring the training image set, a first neural network to be trained and a second neural network to be trained;
    performing x first iterations on the first neural network to be trained and the second neural network to be trained to obtain the first neural network and the second neural network, x being a positive integer;
    the i-th of the x first iterations comprising:
    supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th first iteration to obtain the first neural network to be trained of the (i+1)-th first iteration, and supervising the second neural network to be trained of the i-th first iteration with the training image set and the output of the target neural network of the i-th first iteration to obtain the second neural network to be trained of the (i+1)-th first iteration;
    the parameters of the target neural network being a temporal average of the parameters of the first neural network comprising:
    determining the parameters of the target neural network of the i-th first iteration from the parameters of the target neural network of the (i-1)-th first iteration and the parameters of the first neural network to be trained of the i-th first iteration, i being a positive integer less than or equal to x;
    in the case of i=1, the parameters of the target neural network of the (i-1)-th first iteration being the same as the parameters of the first neural network to be trained.
  17. The device according to claim 16, characterized in that supervising the first neural network to be trained of the i-th first iteration with the training image set and the output of the average network of the i-th iteration to obtain the first neural network to be trained of the (i+1)-th first iteration comprises:
    processing the training image set with the first neural network to be trained of the i-th first iteration to obtain a first feature data set, and processing the training image set with the average network of the i-th first iteration to obtain a second feature data set;
    obtaining a first soft triplet loss from the first feature data set and the second feature data set;
    supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  18. The device according to claim 17, characterized in that obtaining the first soft triplet loss from the first feature data set and the second feature data set comprises:
    determining the minimum similarity between first feature data of a first image of the training image set in the first feature data set and feature data in a positive-sample feature data subset of the first feature data set, to obtain a first similarity; determining the minimum similarity between second feature data of the first image in the second feature data set and feature data in the positive-sample feature data subset of the second feature data set, to obtain a second similarity, the positive-sample feature data subset comprising feature data of images having the same label as a first label of the first image;
    determining the maximum similarity between the first feature data and feature data in a negative-sample feature data subset of the first feature data set, to obtain a third similarity; determining the maximum similarity between the second feature data and feature data in the negative-sample feature data subset of the second feature data set, to obtain a fourth similarity, the negative-sample feature data subset comprising feature data of images having labels different from the first label;
    normalizing the first similarity, the second similarity, the third similarity and the fourth similarity respectively, to obtain a fifth similarity, a sixth similarity, a seventh similarity and an eighth similarity;
    obtaining the first soft triplet loss from the fifth similarity, the sixth similarity, the seventh similarity and the eighth similarity.
  19. The device according to claim 18, characterized in that normalizing the first similarity, the second similarity, the third similarity and the fourth similarity respectively to obtain the fifth, sixth, seventh and eighth similarities comprises:
    determining the sum of the second similarity and the fourth similarity to obtain a first total similarity, and determining the sum of the first similarity and the third similarity to obtain a second total similarity;
    determining the quotient of the second similarity and the first total similarity to obtain the fifth similarity, and determining the quotient of the fourth similarity and the first total similarity to obtain the sixth similarity;
    determining the quotient of the first similarity and the second total similarity to obtain the seventh similarity, and determining the quotient of the third similarity and the second total similarity to obtain the eighth similarity.
  20. The device according to claim 19, characterized in that supervising the first neural network to be trained of the i-th first iteration with the training image set and the first soft triplet loss to obtain the first neural network to be trained of the (i+1)-th first iteration comprises:
    processing the first image with the first neural network to be trained of the i-th first iteration to obtain a first classification result;
    determining a first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss;
    adjusting the parameters of the first neural network to be trained of the i-th first iteration based on the first loss to obtain the first neural network to be trained of the (i+1)-th first iteration.
  21. The device according to claim 20, characterized in that determining the first loss of the first neural network to be trained of the i-th first iteration from the first classification result, the first label and the first soft triplet loss comprises:
    determining a first hard classification loss from the difference between the first classification result and the first label;
    determining the first loss from the first hard classification loss and the first soft triplet loss.
  22. The device according to claim 21, characterized in that, before the first loss is determined from the first hard classification loss and the first soft triplet loss, the average network of the i-th first iteration processes the first image to obtain a second classification result;
    a first soft classification loss is determined from the difference between the first classification result and the second classification result;
    determining the first loss from the first hard classification loss and the first soft triplet loss comprising:
    determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss.
  23. The device according to claim 22, characterized in that, before the first loss is determined from the first hard classification loss, the first soft classification loss and the first soft triplet loss, a first hard triplet loss is determined from the first similarity and the third similarity;
    determining the first loss from the first hard classification loss, the first soft classification loss and the first soft triplet loss comprising:
    determining the first loss from the first hard classification loss, the first soft classification loss, the first soft triplet loss and the first hard triplet loss.
  24. The device according to any one of claims 19 to 23, characterized in that processing the first image of the training image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result comprises:
    performing first preprocessing on the training image set to obtain a first image set, the first preprocessing comprising any one of erasing, cropping and flipping;
    processing a second image of the first image set with the first neural network to be trained of the i-th first iteration to obtain the first classification result, the second image being obtained by performing the first preprocessing on the first image, the feature data of the second image in the first feature data set being the same as the data of the first image in the first feature data set.
  25. The device according to claim 24, characterized in that processing the training image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set comprises:
    processing the first image set with the first neural network to be trained of the i-th first iteration to obtain the first feature data set.
  26. The device according to any one of claims 16 to 25, characterized in that the acquiring unit is specifically configured to:
    acquire a set of images to be processed and a third neural network;
    perform y second iterations on the third neural network to obtain the training image set, y being a positive integer;
    the t-th of the y second iterations comprising:
    sampling the set of images to be processed to obtain a second image set, and processing the images of the second image set with the third neural network of the t-th second iteration to obtain a third feature data set containing feature data of the images of the second image set and a classification result set containing classification results of the images of the second image set;
    clustering the feature data of the third feature data set to determine labels of the feature data of the third feature data set, and adding the labels of the feature data of the third feature data set to the corresponding images of the second image set to obtain a third image set;
    determining a third loss from the difference between the classification results of the classification result set and the labels of the images of the third image set;
    adjusting the parameters of the third neural network of the t-th second iteration based on the third loss to obtain the parameters of the third neural network of the (t+1)-th second iteration, t being a positive integer less than y.
  27. The device according to any one of claims 15 to 26, characterized in that the device further comprises:
    a retrieval unit configured to retrieve a database using the target feature data and obtain, as a target image, an image having feature data matching the target feature data.
  28. The device according to any one of claims 15 to 27, characterized in that the image to be processed contains a person object.
  29. A processor, characterized in that the processor is configured to perform the method of any one of claims 1 to 14.
  30. An electronic device, characterized by comprising: a processor, a sending device, an input device, an output device and a memory, the memory being configured to store computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 14.
  31. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to perform the method of any one of claims 1 to 14.
  32. A computer program, characterized in that the computer program comprises computer-readable code which, when run in an electronic device, causes a processor in the electronic device to execute the method of any one of claims 1 to 14.
PCT/CN2019/119180 2019-09-24 2019-11-18 Image processing method and related device WO2021056765A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11202010487PA SG11202010487PA (en) 2019-09-24 2019-11-18 Image processing method and related device
KR1020217019630A KR20210095671A (ko) 2019-09-24 2019-11-18 Image processing method and related device
JP2021500683A JP7108123B2 (ja) 2019-09-24 2019-11-18 Image processing method, image processing device, processor, electronic apparatus, storage medium and computer program
US17/077,251 US11429809B2 (en) 2019-09-24 2020-10-22 Image processing method, image processing device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910905445.7 2019-09-24
CN201910905445.7A CN110647938B (zh) 2019-09-24 2019-09-24 Image processing method and related device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/077,251 Continuation US11429809B2 (en) 2019-09-24 2020-10-22 Image processing method, image processing device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021056765A1 2021-04-01

Family

ID=68992555

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119180 WO2021056765A1 (zh) 2019-09-24 2019-11-18 图像处理方法及相关装置

Country Status (6)

Country Link
JP (1) JP7108123B2 (zh)
KR (1) KR20210095671A (zh)
CN (1) CN110647938B (zh)
SG (1) SG11202010487PA (zh)
TW (1) TW202113692A (zh)
WO (1) WO2021056765A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222139A (zh) * 2021-04-27 2021-08-06 商汤集团有限公司 Neural network training method and apparatus, device, and computer storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11429809B2 (en) 2019-09-24 2022-08-30 Beijing Sensetime Technology Development Co., Ltd Image processing method, image processing device, and storage medium
CN110647938B (zh) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
CN111598124B (zh) * 2020-04-07 2022-11-11 深圳市商汤科技有限公司 Image processing and device, processor, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128609A (en) * 1997-10-14 2000-10-03 Ralph E. Rose Training a neural network using differential input
CN105894046A (zh) * 2016-06-16 2016-08-24 北京市商汤科技开发有限公司 Convolutional neural network training and image processing method and system, and computer device
CN108197670A (zh) * 2018-01-31 2018-06-22 国信优易数据有限公司 Pseudo-label generation model training method and device, and pseudo-label generation method and device
US20190042945A1 (en) * 2017-12-12 2019-02-07 Somdeb Majumdar Methods and arrangements to quantize a neural network with machine learning
CN110188829A (zh) * 2019-05-31 2019-08-30 北京市商汤科技开发有限公司 Neural network training method, target recognition method, and related products
CN110210535A (zh) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device, and image processing method and device
CN110647938A (zh) * 2019-09-24 2020-01-03 北京市商汤科技开发有限公司 Image processing method and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4599509B2 (ja) * 2004-09-08 2010-12-15 独立行政法人理化学研究所 Self-evolving pattern recognition system
CN108229468B (zh) * 2017-06-28 2020-02-21 北京市商汤科技开发有限公司 Vehicle appearance feature recognition and vehicle retrieval method and device, storage medium, and electronic device
CN108230359B (zh) * 2017-11-12 2021-01-26 北京市商汤科技开发有限公司 Target detection method and device, training method, electronic device, program, and medium
CN108009528B (zh) * 2017-12-26 2020-04-07 广州广电运通金融电子股份有限公司 Triplet Loss-based face authentication method and device, computer device, and storage medium


Also Published As

Publication number Publication date
SG11202010487PA (en) 2021-04-29
TW202113692A (zh) 2021-04-01
CN110647938B (zh) 2022-07-15
JP7108123B2 (ja) 2022-07-27
JP2022511221A (ja) 2022-01-31
KR20210095671A (ko) 2021-08-02
CN110647938A (zh) 2020-01-03

Similar Documents

Publication Publication Date Title
US11429809B2 (en) Image processing method, image processing device, and storage medium
CN111523621B (zh) 图像识别方法、装置、计算机设备和存储介质
WO2021056765A1 (zh) 图像处理方法及相关装置
Chen et al. Underwater object detection using Invert Multi-Class Adaboost with deep learning
CN110458107B (zh) 用于图像识别的方法和装置
CN111639544B (zh) 基于多分支跨连接卷积神经网络的表情识别方法
CN111783576B (zh) 基于改进型YOLOv3网络和特征融合的行人重识别方法
CN112784763B (zh) 基于局部与整体特征自适应融合的表情识别方法及系统
CN111164601A (zh) 情感识别方法、智能装置和计算机可读存储介质
CN110503076B (zh) 基于人工智能的视频分类方法、装置、设备和介质
WO2020192112A1 (zh) 人脸识别方法及装置
US20210117687A1 (en) Image processing method, image processing device, and storage medium
US20210383149A1 (en) Method for identifying individuals of oplegnathus punctatus based on convolutional neural network
CN112699265A (zh) 图像处理方法及装置、处理器、存储介质
CN114896434B (zh) 一种基于中心相似度学习的哈希码生成方法及装置
Xu et al. Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning
CN115457332A (zh) 基于图卷积神经网络和类激活映射的图像多标签分类方法
CN115169386A (zh) 一种基于元注意力机制的弱监督增类活动识别方法
Yan et al. A multi-task learning model for better representation of clothing images
CN111858999B (zh) 一种基于分段困难样本生成的检索方法及装置
Wang et al. A convolutional neural network combined with aggregate channel feature for face detection
Mao et al. A Transfer Learning Method with Multi-feature Calibration for Building Identification
Pinge et al. A novel video retrieval method based on object detection using deep learning
Anggoro et al. Classification of Solo Batik patterns using deep learning convolutional neural networks algorithm
Kuseh et al. A Survey of Deep learning techniques for maize leaf disease detection: Trends from 2016 to 2021 and Future Perspectives

Legal Events

Date Code Title Description
ENP Entry into the national phase: Ref document number: 2021500683; Country of ref document: JP; Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application: Ref document number: 19947022; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase: Ref document number: 20217019630; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase: Ref country code: DE
122 Ep: pct application non-entry in european phase: Ref document number: 19947022; Country of ref document: EP; Kind code of ref document: A1
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established: Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.10.2022)
