WO2021203882A1 - Pose detection and video processing method and apparatus, electronic device, and storage medium - Google Patents

Pose detection and video processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021203882A1
WO2021203882A1 PCT/CN2021/079122 CN2021079122W
Authority
WO
WIPO (PCT)
Prior art keywords
image
image set
neural network
trained
unlabeled
Prior art date
Application number
PCT/CN2021/079122
Other languages
English (en)
French (fr)
Inventor
赵扬波
张展鹏
Original Assignee
深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 filed Critical 深圳市商汤科技有限公司
Priority to JP2021564216A priority Critical patent/JP2022531763A/ja
Priority to KR1020217034492A priority patent/KR20210137213A/ko
Publication of WO2021203882A1 publication Critical patent/WO2021203882A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, a processor, an electronic device, and a storage medium.
  • Before a neural network can be used, it needs to be trained.
  • Training data is used to train the neural network to obtain a trained neural network, and the trained neural network is then applied in different application scenarios; when the application scenario differs from the conditions under which the training data was collected, the accuracy of the processing results obtained is low.
  • The present disclosure provides an image processing method and apparatus, a processor, an electronic device, and a storage medium.
  • In a first aspect, an image processing method includes:
  • An image processing neural network is used to process the image to be processed to obtain a processing result of the image to be processed. The image processing neural network is trained using an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed, and the acquisition condition of the labeled image set is different from the acquisition condition of the unlabeled image set.
  • The neural network is trained with the unlabeled image set and the labeled image set as training data, and the labels of the unlabeled image set can be determined based on the labeled image set, thereby reducing the labor cost of labeling the unlabeled image set and improving labeling efficiency.
  • The neural network is trained with the labeled image set, the unlabeled image set, and the labels of the unlabeled image set, so that it can learn information about the second acquisition condition during training; using the resulting image processing neural network to process the image to be processed therefore improves the accuracy of the processing result obtained.
  • the method further includes:
  • The first neural network to be trained is trained to obtain the image processing neural network.
  • the unlabeled image set is labeled based on the labeled image set, thereby saving labor costs and improving labeling efficiency.
  • the first neural network to be trained can learn the information of the acquisition conditions of the unlabeled image set during the training process, and obtain the image processing neural network. In this way, using the image processing neural network to process the image to be processed can improve the accuracy of the processing result.
  • Obtaining the labels of the unlabeled image set based on the labeled image set includes:
  • the second neural network to be trained is used to process the unlabeled image set to obtain the label of the unlabeled image set.
  • The first neural network to be trained is used to process the unlabeled image set to obtain the labels of the unlabeled image set; the labeled image set and the unlabeled image set are then used as training data, with the labels of the unlabeled image set as supervision information. The second neural network to be trained is trained in this way to increase the number of training cycles and improve the training effect, thereby improving the accuracy of the processing results obtained when the trained image processing neural network processes the image to be processed.
  • the labeled image set and the unlabeled image set are used as training data, and the label of the unlabeled image set is used as the supervision information of the unlabeled image set.
  • the training of the first neural network to be trained to obtain the image processing neural network includes:
  • the parameters of the second neural network to be trained are adjusted to obtain the image processing neural network.
  • The loss of the second neural network to be trained is obtained, and the parameters of the second neural network to be trained are adjusted based on that loss to obtain the image processing neural network.
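The two-stage scheme above (train a first network on the labeled set, pseudo-label the unlabeled set, then train a second network on the union) can be sketched as follows. This is an illustrative sketch only: the patent describes neural networks, while here a scikit-learn logistic-regression classifier and random feature vectors stand in for the networks and the image sets, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for the labeled and unlabeled image sets: feature vectors
# drawn from two well-separated clusters, representing two classes.
X_labeled = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y_labeled = np.array([0] * 20 + [1] * 20)
X_unlabeled = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])

# Step 1: train the "first neural network to be trained" on the labeled set only.
first_model = LogisticRegression().fit(X_labeled, y_labeled)

# Step 2: use it to produce pseudo-labels for the unlabeled set.
pseudo_labels = first_model.predict(X_unlabeled)

# Step 3: train the "second network" on labeled + pseudo-labeled data,
# where the pseudo-labels act as supervision information for the unlabeled set.
X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, pseudo_labels])
image_processing_model = LogisticRegression().fit(X_all, y_all)
```

The second stage sees more training data than the first, which is the "increase the number of training cycles and improve the training effect" point made above.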
  • Both the labels of the labeled image set and the labels of the unlabeled image set carry category information.
  • the method further includes:
  • the training image set includes the labeled image set and the unlabeled image set
  • The category of the first image is the same as the category of the second image, and the category of the first image is different from the category of the third image.
  • the obtaining the loss of the second neural network to be trained based on the first difference and the second difference includes:
  • the loss of the second neural network to be trained is obtained.
  • The triplet loss is obtained according to the first similarity and the second similarity; during training of the second neural network to be trained, its loss is determined according to the category loss and the triplet loss, which enables the second neural network to improve its ability to distinguish image categories during training.
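A minimal sketch of how the category loss and triplet loss described above could be combined, assuming cross-entropy for the category term and a margin-based triplet term over the first (anchor-positive) and second (anchor-negative) similarities. The margin value and function names are illustrative, not taken from the patent.

```python
import numpy as np

def triplet_loss(sim_pos, sim_neg, margin=0.2):
    """Triplet term: penalize when the negative pair is not at least
    `margin` less similar than the positive pair."""
    return max(0.0, sim_neg - sim_pos + margin)

def category_loss(probs, label):
    """Cross-entropy on the predicted class distribution (the category term)."""
    return -np.log(probs[label])

# Example: anchor close to its positive (sim 0.9), far from its negative (0.3),
# with hypothetical class probabilities for the anchor image.
probs = np.array([0.7, 0.2, 0.1])
total = category_loss(probs, 0) + triplet_loss(sim_pos=0.9, sim_neg=0.3)
```

Here the triplet term vanishes because the similarity gap already exceeds the margin, so only the category term contributes; swapping the two similarities would activate the triplet term.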
  • The first similarity is obtained by determining the similarity between the first image in the training image set and the second image in the training image set. Before determining the similarity between the first image in the training image set and the third image in the training image set to obtain the second similarity, the method further includes:
  • The hardest in-class image of the first image is determined as the second image, and the hardest out-of-class image of the first image is determined as the third image. The hardest in-class image is the image in the in-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in the out-of-class image set with the greatest similarity to the first image. The in-class image set includes images whose label is the same as that of the first image; the out-of-class image set includes images whose label is different from that of the first image.
  • In this way, the minimum similarity between images of the same class is made larger than the maximum similarity between images of different classes, so that the similarity between any two images of the same class is greater than the similarity between any two images of different classes.
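The hardest-in-class / hardest-out-of-class selection can be sketched as follows, assuming a precomputed pairwise similarity matrix; the function name and the example data are hypothetical.

```python
import numpy as np

def mine_triplet(sims, labels, anchor_idx):
    """For an anchor, pick the hardest in-class image (smallest similarity,
    same label) as the second image, and the hardest out-of-class image
    (largest similarity, different label) as the third image.
    `sims` is a precomputed pairwise-similarity matrix."""
    same = (labels == labels[anchor_idx])
    same[anchor_idx] = False                       # exclude the anchor itself
    diff = (labels != labels[anchor_idx])
    pos_idx = np.where(same)[0][np.argmin(sims[anchor_idx, same])]
    neg_idx = np.where(diff)[0][np.argmax(sims[anchor_idx, diff])]
    return pos_idx, neg_idx

labels = np.array([0, 0, 0, 1, 1])
sims = np.array([
    [1.0, 0.9, 0.4, 0.8, 0.2],
    [0.9, 1.0, 0.5, 0.3, 0.1],
    [0.4, 0.5, 1.0, 0.2, 0.3],
    [0.8, 0.3, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.3, 0.7, 1.0],
])
pos, neg = mine_triplet(sims, labels, anchor_idx=0)
```

For anchor 0, the least-similar same-class image is index 2 (similarity 0.4) and the most-similar different-class image is index 3 (similarity 0.8), so the triplet loss is computed on exactly the pairs that currently violate the desired ordering.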
  • the method further includes:
  • the second neural network to be trained is used to process the enhanced image set and the unlabeled image set to obtain the second result.
  • the number of images whose acquisition conditions are the same as the acquisition conditions of the unlabeled image set is increased, thereby improving the training effect of the second neural network to be trained.
  • the accuracy of the obtained processing result can be improved.
  • The data set enhancement processing includes at least one of the following: rotation, erasing, cropping, and blurring.
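The four enhancement operations just listed can be sketched with NumPy alone; a real pipeline would typically use an image-processing library, and the specific patch sizes and angles below are arbitrary illustrations.

```python
import numpy as np

def augment(img):
    """Sketch of the four data-set enhancement operations:
    rotation, erasing, cropping, and blurring."""
    rotated = np.rot90(img)                    # rotation (here, 90 degrees)
    erased = img.copy()
    erased[1:3, 1:3] = 0                       # erase (zero out) a small patch
    cropped = img[1:-1, 1:-1]                  # crop away a one-pixel border
    blurred = img.astype(float).copy()         # 3x3 box blur on the interior
    blurred[1:-1, 1:-1] = sum(
        img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return rotated, erased, cropped, blurred

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (6, 6)).astype(float)
rotated, erased, cropped, blurred = augment(img)
```

Applying these operations to the unlabeled image set yields the enhanced image set, increasing the number of images whose acquisition condition matches the unlabeled set, as described above.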
  • the acquisition condition of the image includes: parameters of the imaging device that acquires the image.
  • In a second aspect, an image processing device includes:
  • the acquiring part is configured to acquire the image to be processed
  • the processing part is configured to use an image processing neural network to process the image to be processed to obtain a processing result of the image to be processed;
  • The image processing neural network is trained using an unlabeled image set and a labeled image set as training data.
  • the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed;
  • the acquisition condition of the labeled image set is different from the acquisition condition of the unlabeled image set.
  • the acquisition part is further configured to acquire the unlabeled image set, the labeled image set, and the first neural network to be trained;
  • the processing part is further configured to obtain a label of the unlabeled image set based on the labeled image set;
  • the device also includes:
  • The training part is configured to use the labeled image set and the unlabeled image set as training data, with the labels of the unlabeled image set as supervision information for the unlabeled image set, and to train the first neural network to be trained to obtain the image processing neural network.
  • processing part is further configured to:
  • the second neural network to be trained is used to process the unlabeled image set to obtain the label of the unlabeled image set.
  • processing part is further configured to:
  • the parameters of the second neural network to be trained are adjusted to obtain the image processing neural network.
  • Both the labels of the labeled image set and the labels of the unlabeled image set carry category information.
  • The device further includes a first determining part configured to, before the loss of the second neural network to be trained is obtained according to the first difference and the second difference, determine the similarity between the first image in the training image set and the second image in the training image set to obtain the first similarity, and determine the similarity between the first image in the training image set and the third image in the training image set to obtain the second similarity.
  • the training image set includes the labeled image set and the unlabeled image set
  • The category of the first image is the same as the category of the second image, and the category of the first image is different from the category of the third image.
  • the second determining part is configured to obtain the triplet loss according to the difference between the first similarity and the second similarity
  • The processing part is also configured to obtain the category loss according to the first difference and the second difference, and to obtain the loss of the second neural network to be trained.
  • the device further includes:
  • The third determining part is configured to, before the similarity between the first image in the training image set and the second image in the training image set is determined to obtain the first similarity and the similarity between the first image in the training image set and the third image in the training image set is determined to obtain the second similarity, determine the hardest in-class image of the first image as the second image and the hardest out-of-class image of the first image as the third image. The hardest in-class image is the image in the in-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in the out-of-class image set with the greatest similarity to the first image. The in-class image set includes images with the same label as the first image; the out-of-class image set includes images with labels different from that of the first image.
  • the device further includes:
  • the processing part is configured as:
  • the second neural network to be trained is used to process the enhanced image set and the unlabeled image set to obtain the second result.
  • The data set enhancement processing includes at least one of the following: rotation, erasing, cropping, and blurring.
  • the acquisition condition of the image includes: parameters of the imaging device that acquires the image.
  • a processor is provided, and the processor is configured to execute a method as described in the above first aspect and any one of its possible implementation manners.
  • an electronic device including: a processor, a sending device, an input device, an output device, and a memory, where the memory is used to store computer program code, and the computer program code includes computer instructions.
  • the electronic device executes the method according to the first aspect and any one of its possible implementation manners.
  • a computer-readable storage medium stores a computer program.
  • the computer program includes program instructions.
  • the processor executes the method as described in the first aspect and any one of its possible implementation manners.
  • A computer program product includes a computer program or instructions that, when run on a computer, cause the computer to execute the method of the first aspect and any one of its possible implementation manners.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the disclosure
  • FIG. 2 is a schematic flowchart of another image processing method provided by an embodiment of the disclosure.
  • FIG. 3 is a schematic structural diagram of an image processing device provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the disclosure.
  • Neural networks have been widely used in image processing in recent years to perform various tasks, for example image classification tasks and image segmentation tasks.
  • Performing such a task is hereinafter called application, and the image processed when the neural network performs the task is called the application image.
  • The performance of a neural network during application largely depends on its training effect, which is influenced by many factors. The difference between the image quality of the training images and that of the application images is one of them; image quality includes image resolution, image signal-to-noise ratio, and image sharpness. This difference therefore includes at least one of the following: the difference between the resolution of the training image and that of the application image, the difference between their signal-to-noise ratios, and the difference between their sharpness. Another factor is the difference between the acquisition condition of the training image and the acquisition condition of the application image.
  • The difference between the acquisition conditions includes at least one of the following: the difference between the parameters of the imaging device that collects the training images (hereinafter the training imaging device) and the parameters of the imaging device that collects the application images (hereinafter the application imaging device), and the difference between the environment in which the training images are collected and the environment in which the application images are collected. The difference in imaging-device parameters includes the difference between the hardware configuration of the training imaging device and that of the application imaging device.
  • For example, the resolution of images collected by the training imaging device is 1920 × 1080, while the resolution of images collected by the application imaging device is 1280 × 1024; or the focal length range of the training imaging device is 10 mm–22 mm, while that of the application imaging device is 18 mm–135 mm.
  • the environment in which the image is collected includes at least one of the following: the weather in which the image is collected, and the scene in which the image is collected.
  • the weather for collecting images can be cloudy
  • the weather for collecting images can also be rainy
  • the weather for collecting images can also be sunny.
  • the environment of an image collected on a rainy day is different from that of an image collected on a sunny day
  • an environment of an image collected on a cloudy day is different from an environment of an image collected on a sunny day.
  • the scene can be the interior of a car, the scene can also be a waiting hall, and the scene can also be a highway.
  • The scene in which an image of the car interior is collected is different from the scene in which an image of the waiting hall is collected, and both differ from the scene in which an image of the highway is collected.
  • The training images are used to train the neural network to obtain a trained neural network, and the trained neural network is used to perform tasks, that is, to process the application image to obtain a processing result. For example, when performing an image classification task, the trained neural network processes the application image to obtain a classification result; when performing an image segmentation task, it processes the application image to obtain a segmentation result.
  • the accuracy of the above processing results is low.
  • For example, a surveillance camera at location A in a city collects images containing pedestrians on a cloudy day (hereinafter the images collected at A), and training images are obtained by labeling the identities of the pedestrians in the images collected at A.
  • The training images are used to train neural network a, so that the trained neural network a can be used to identify the identity of pedestrians in images collected at A.
  • Suppose it is then necessary to use the trained neural network a to identify the identity of pedestrians in images collected at B.
  • The images collected at B include images collected on cloudy days and images collected on sunny days.
  • the brightness and clarity of the environment in the images collected in different weather are different. The difference in brightness and clarity of the environment affects the recognition accuracy of the neural network.
  • When the trained neural network a is used to identify the identity of pedestrians in images collected on sunny or rainy days, the accuracy of the recognition result obtained is low.
  • Moreover, if the parameters of the surveillance camera at A differ from those of the surveillance camera at B (such as shooting angle of view or resolution), the accuracy with which the trained neural network a identifies pedestrians in images collected at B will also be low.
  • The traditional method trains the neural network using images collected under the first acquisition condition in the application scenario as training images. Since this method needs to label the images collected under the first acquisition condition, and the number of training images required is large, it incurs high labor costs and has low labeling efficiency.
  • the embodiments of the present disclosure provide a technical solution to improve the accuracy of the processing results obtained based on the neural network on the premise of reducing labor costs and improving labeling efficiency.
  • the execution subject of the embodiments of the present disclosure may be an image processing device, where the image processing device may be one of the following: a mobile phone, a computer, a server, and a tablet computer.
  • The embodiments of the present disclosure may also implement the image processing method of the present disclosure by a processor executing computer code.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • the image to be processed may contain any content.
  • the image to be processed may include a road.
  • the image to be processed may include roads and vehicles.
  • the image to be processed may include a person.
  • the present disclosure does not limit the content of the image to be processed.
  • the image processing apparatus receives the image to be processed input by the user through the input component.
  • the above-mentioned input components include: a keyboard, a mouse, a touch screen, a touch pad, and an audio input device.
  • the image processing apparatus receives the image to be processed sent by the first terminal.
  • the first terminal may be any of the following: a mobile phone, a computer, a tablet computer, a server, and a wearable device.
  • the image processing device may directly acquire the image to be processed through its own image acquisition component, such as a camera.
  • The image processing neural network is trained using the unlabeled image set and the labeled image set as training data, where the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed (hereinafter the second acquisition condition), and the acquisition condition of the labeled image set (hereinafter the third acquisition condition) is different from the acquisition condition of the unlabeled image set.
  • the images to be processed are the images collected in the waiting room, the images in the unlabeled image collection are also the images collected in the waiting room, and the images in the labeled image collection are not the images collected in the waiting room.
  • the image to be processed is the image collected by camera A, the images in the unlabeled image collection are also the images collected by camera A, and the images in the labeled image collection are the images collected by camera B.
  • The images in the labeled image set all carry labels.
  • For example, if the image classification task performed by the image processing network is to judge whether the content of an image is an apple, banana, pear, peach, orange, or watermelon, then the labels of the images in the labeled image set include apple, banana, pear, peach, orange, and watermelon.
  • If the task performed by the image processing network is pedestrian re-identification, that is, identifying the identity of the person contained in an image, then the labels of the images in the labeled image set include the identities of persons (such as Zhang San, Li Si, Wang Wu, and Zhou Liu).
  • If the task performed by the image processing network is to segment the pixel region covered by a person from an image, then the labels of the images in the labeled image set include person contours, and the pixel region enclosed by a person contour is the pixel region covered by the person.
  • The labels of the unlabeled image set can be determined based on the labeled image set, and the labeled image set, the unlabeled image set, and the labels of the unlabeled image set can then be used to train the neural network, so that when the trained neural network processes images acquired under the second acquisition condition, the accuracy of the obtained processing results is improved.
  • Suppose the images in the labeled image set are all collected in a waiting room, while the image to be processed and the images in the unlabeled image set are all collected inside a car. Since the images in the unlabeled image set carry no labels, the neural network cannot be trained using the unlabeled image set alone; and since the environment of the waiting room differs from the environment inside the car (for example, the lighting differs, and the objects present differ), training the neural network only on the labeled image set cannot make it learn information about the environment inside the car. The accuracy of the processing result obtained when such a network processes the image to be processed is therefore low.
  • By using the labeled image set and the unlabeled image set as training data, the labels of the unlabeled image set can be determined based on the labeled image set, so that the unlabeled image set and its labels can also serve as training data. The neural network can then learn information about the environment inside the car during training, thereby improving the accuracy of the processing result.
  • The neural network is trained using the labeled image set as training data to obtain a trained neural network.
  • the labeled image set includes image a
  • the unlabeled image set includes image b, where the label of image a is A.
  • A first intermediate loss is obtained, and the parameters of the neural network are adjusted based on the first intermediate loss to obtain the trained neural network. The trained neural network is used to process image b, and the second processing result obtained serves as the label of image b.
  • feature extraction processing is performed on the labeled image set to obtain the first intermediate feature data set.
  • the first intermediate feature data set is used as the training data
  • the label of the labeled image set is used as the supervision information of the first intermediate feature data set
  • the support vector machine (SVM) is trained to obtain the trained SVM.
  • The trained SVM is used to process the second intermediate feature data set to obtain labels for the second intermediate feature data set, which serve as the labels of the unlabeled image set.
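The SVM variant above might look roughly like this with scikit-learn, using random feature vectors as stand-ins for the first and second intermediate feature data sets that the network's feature extraction would produce; all names and data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the first intermediate feature data set (from the labeled
# image set) and the second (from the unlabeled image set).
labeled_features = np.vstack([rng.normal(0, 0.2, (15, 4)),
                              rng.normal(2, 0.2, (15, 4))])
labels = np.array(["A"] * 15 + ["B"] * 15)
unlabeled_features = np.vstack([rng.normal(0, 0.2, (5, 4)),
                                rng.normal(2, 0.2, (5, 4))])

# Train the SVM with the first intermediate feature data set as training data
# and the labels of the labeled image set as supervision information.
svm = SVC(kernel="linear").fit(labeled_features, labels)

# Process the second intermediate feature data set to obtain labels
# for the unlabeled image set.
pseudo_labels = svm.predict(unlabeled_features)
```

The design choice here is that the SVM replaces only the classification head: the neural network still supplies the features, while the SVM provides cheap, trainable supervision for the unlabeled images.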
  • the labeled image set includes image a and image b
  • the unlabeled image set includes image c.
  • the label of image a is A and the label of image b is B.
  • Clustering is performed on the unlabeled image set to obtain at least one cluster, where each cluster contains at least one image.
  • the labeled image set is divided based on the label to obtain at least one image set, where each image set includes at least one image, and the labels of the images in each image set are the same.
  • the label of the image set with the greatest similarity is regarded as the label of the cluster, that is, the label of the data in the cluster.
  • the labeled image set includes image a, image b, and image c
  • the unlabeled image set includes image d, image e, and image f.
  • The label of image a and the label of image b are both A, and the label of image c is B.
  • Determine the similarity between the first cluster and the first image set as s1, the similarity between the first cluster and the second image set as s2, the similarity between the second cluster and the first image set as s3, and the similarity between the second cluster and the second image set as s4.
  • If the image set with the greatest similarity to the first cluster is the first image set, the label of the first cluster is A, and it can be determined that the labels of image d and image e are both A. If the image set with the greatest similarity to the first cluster is the second image set, the label of the first cluster is B, and the labels of image d and image e are both B. Similarly, if the image set with the greatest similarity to the second cluster is the first image set, the label of image f is A; if it is the second image set, the label of image f is B.
  • In one implementation, the similarity between each image in the first cluster and each image in the first image set is determined separately to obtain a similarity set, and the maximum value in the similarity set is taken as the similarity between the first cluster and the first image set.
  • In another implementation, the similarity between each image in the first cluster and each image in the first image set is likewise determined to obtain a similarity set, and the minimum or average value in the similarity set is taken as the similarity between the first cluster and the first image set.
  • The similarity between the first cluster and the second image set, the similarity between the second cluster and the first image set, and the similarity between the second cluster and the second image set can be determined in the same way as the similarity between the first cluster and the first image set.
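The cluster-labeling procedure above (assign each cluster the label of the image set most similar to it) can be sketched as follows, using maximum pairwise cosine similarity as the cluster-to-set similarity, i.e. the first of the two implementations described. The feature vectors and names are hypothetical stand-ins for images.

```python
import numpy as np

def cluster_label(cluster, label_groups):
    """Assign to a cluster the label of the image set with the greatest
    similarity to it; cluster-to-set similarity is the maximum pairwise
    cosine similarity between their members."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    best_label, best_sim = None, -np.inf
    for label, group in label_groups.items():
        sim = max(cos(c, g) for c in cluster for g in group)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Feature stand-ins: images a, b carry label A; image c carries label B.
label_groups = {
    "A": [np.array([1.0, 0.1]), np.array([0.9, 0.2])],
    "B": [np.array([0.1, 1.0])],
}
first_cluster = [np.array([1.0, 0.0]), np.array([0.95, 0.1])]   # images d, e
second_cluster = [np.array([0.0, 1.0])]                          # image f
```

Calling `cluster_label(first_cluster, label_groups)` assigns label A to images d and e, and the second cluster receives label B, matching the worked example above.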
  • The neural network is trained with the unlabeled image set and the labeled image set as training data, and the labels of the unlabeled image set can be determined based on the labeled image set, thereby reducing the labor cost of labeling the unlabeled image set and improving labeling efficiency. The neural network is trained with the labeled image set, the unlabeled image set, and the labels of the unlabeled image set, so that it can learn information about the second acquisition condition during training; using the resulting image processing neural network to process the image to be processed therefore improves the accuracy of the processing result obtained.
  • FIG. 2 is a schematic flowchart of an image processing neural network training method provided by an embodiment of the present disclosure.
• The execution subject of this embodiment may or may not be the image processing device; that is, the execution subject of the training method of the image processing neural network and the execution subject that processes the image to be processed using the image processing neural network may be the same or different.
  • the disclosed embodiment does not limit the execution subject of this embodiment.
• The execution subject of this embodiment is referred to below as the training device, where the training device may be any of the following: a mobile phone, a computer, a tablet computer, a server, or a processor.
• For the method by which the training device obtains the unlabeled image set, refer to the method by which the image processing device obtains the unlabeled image set in step 101.
• For the method by which the training device obtains the labeled image set, refer to the method by which the image processing device obtains the labeled image set in step 101; the details are not repeated here.
  • the first neural network to be trained is any neural network.
  • the first neural network to be trained may be composed of a stack of at least one network layer among a convolutional layer, a pooling layer, a normalization layer, a fully connected layer, a down-sampling layer, an up-sampling layer, and a classifier.
  • the embodiment of the present disclosure does not limit the structure of the first neural network to be trained.
  • the training device receives the first neural network to be trained input by the user through the input component.
  • the above-mentioned input components include: a keyboard, a mouse, a touch screen, a touch pad, and an audio input device.
  • the training device receives the first neural network to be trained sent by the second terminal.
  • the above-mentioned second terminal may be any one of the following: a mobile phone, a computer, a tablet computer, a server, and a wearable device.
  • the training device may obtain the pre-stored first neural network to be trained from its own storage component.
  • step 102 based on the labeled image set, the label of the unlabeled image set can be obtained.
• This step adopts the first implementation of step 102: the labeled image set is used as training data to train the first neural network to be trained, obtaining the second neural network to be trained, and the second neural network to be trained is then used to process the unlabeled image set to obtain the labels of the unlabeled image set.
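The pseudo-labeling step just described can be sketched in miniature. As an illustrative assumption, a nearest-centroid classifier stands in for the second neural network to be trained; the function names and toy features are invented here, not specified by the patent.

```python
# Hypothetical sketch: train a simple classifier on the labeled set, then
# use it to assign labels (pseudo-labels) to the unlabeled set.

def train_centroids(labeled):
    """labeled: list of (feature_vector, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def pseudo_label(centroids, x):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

labeled_set = [([0.0, 0.0], "A"), ([0.2, 0.0], "A"), ([1.0, 1.0], "B")]
centroids = train_centroids(labeled_set)
unlabeled_set = [[0.1, 0.1], [0.9, 1.1]]
labels = [pseudo_label(centroids, x) for x in unlabeled_set]
```

In the disclosure, the labeled set, the unlabeled set, and these generated labels then all feed into further training.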
• After the labels of the unlabeled image set are obtained, the unlabeled image set can also be used as training data to train the first neural network to be trained.
• The factors affecting the training effect of a neural network also include the amount of training data; in general, the more training data, the better the training effect. Therefore, in the embodiment of the present disclosure, in the process of training the first neural network to be trained, the labeled image set and the unlabeled image set are used as training data, and the labels of the unlabeled image set are used as the supervision information of the unlabeled image set, to train the first neural network to be trained and improve the training effect. In this way, when the trained image processing neural network is used to process the image to be processed, the accuracy of the obtained processing result can be improved.
  • the labeled image set includes image a
  • the unlabeled image set includes image b
  • the label of image a is A
  • the label of image b is determined to be B after the processing in step 202.
  • Use the first neural network to be trained to process the image a to obtain the first intermediate result. Determine the difference between the first intermediate result and A, and get the first intermediate difference.
• Based on the first intermediate difference, the loss of the first neural network to be trained is determined, and the parameters of the first neural network to be trained are adjusted based on this loss to obtain the third neural network to be trained.
• The third neural network to be trained is used to process image b to obtain the second intermediate result. The difference between the second intermediate result and B is determined to obtain the second intermediate difference.
• Based on the second intermediate difference, the loss of the third neural network to be trained is determined, and the parameters of the third neural network to be trained are adjusted based on this loss to obtain the image processing neural network.
• The first neural network to be trained is used to process the unlabeled image set; after the labels of the unlabeled image set are obtained, the labeled image set and the unlabeled image set are used as training data, and the labels of the unlabeled image set are used as the supervision information of the unlabeled image set.
• The second neural network to be trained is then trained to increase the number of training cycles and improve the training effect, thereby improving the accuracy of the processing results obtained when the trained image processing neural network processes the image to be processed.
  • the neural network processes all the training data, that is, completes a training cycle.
  • the training data includes image a and image b.
  • the neural network processes image a to obtain the result of image a.
  • the loss of the neural network is obtained, and based on the loss of the neural network, the parameters of the neural network are adjusted to obtain the neural network after the first iteration.
  • the neural network after the first iteration processes image b to obtain the result of image b.
• The loss of the neural network after the first iteration is obtained, and based on this loss, the parameters of the neural network after the first iteration are adjusted to obtain the neural network after the second iteration.
• In the third iteration, the neural network after the second iteration processes image a to obtain the result of image a.
• The loss of the neural network after the second iteration is obtained, and based on this loss, the parameters of the neural network after the second iteration are adjusted to obtain the neural network after the third iteration.
  • the first training cycle includes the first iteration and the second iteration, and the third iteration belongs to the second training cycle.
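The cycle/iteration bookkeeping in the example above can be sketched as follows; the function name and the tuple layout are illustrative assumptions. One training cycle (epoch) ends once every item of training data has been processed, and each parameter update counts as one iteration.

```python
# Sketch: map iteration numbers to training cycles for a cyclic pass
# over the training data, matching the image a / image b example.

def schedule(training_data, num_iterations):
    """Return (iteration, cycle, item) tuples for a cyclic pass over the data."""
    n = len(training_data)
    return [(it, (it - 1) // n + 1, training_data[(it - 1) % n])
            for it in range(1, num_iterations + 1)]

steps = schedule(["image a", "image b"], 3)
```

As in the text, iterations 1 and 2 belong to the first training cycle, and iteration 3 begins the second cycle.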
  • the second neural network to be trained is used to process the labeled image set to obtain the first result
  • the second neural network to be trained is used to process the unlabeled image set to obtain the second result.
  • the first difference is obtained according to the difference between the labels of the first result and the labeled image set
  • the second difference is obtained according to the difference between the second result and the labels of the unlabeled image set.
• Based on the first difference and the second difference, the loss of the second neural network to be trained is obtained. Since the second neural network to be trained is obtained by training the first neural network to be trained with the labeled image set, the second neural network to be trained has undergone more training cycles than the first neural network to be trained.
• Therefore, after the labels of the unlabeled image set are obtained, training the second neural network to be trained with the labeled image set and the unlabeled image set as training data and the labels of the unlabeled image set as supervision information yields a better training effect than training the first neural network to be trained in the same way.
• The first iteration loss of the second neural network to be trained is determined based on the first difference, and the parameters of the second neural network to be trained are adjusted based on the first iteration loss to obtain the second neural network to be trained after the first iteration.
• Based on the second difference, the second iteration loss of the second neural network to be trained is determined, and the parameters of the second neural network to be trained after the first iteration are adjusted based on the second iteration loss to obtain the image processing neural network.
  • the first difference and the second difference may be weighted and summed, or a constant may be added after the weighted summation, etc.
• Based on the weighted result, the loss of the second neural network to be trained is obtained. For example, if the acquisition conditions of the unlabeled image set are the same as the acquisition conditions of the image to be processed, the weight of the second difference can be made larger than the weight of the first difference, so that the image processing neural network learns more information about the second acquisition condition through training; in this way, when the trained neural network processes the image to be processed, the accuracy of the obtained processing result can be improved.
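A minimal sketch of the weighted combination just described. The specific weight values and the bias are illustrative assumptions; the disclosure only states that the two differences may be weighted and summed, optionally with a constant added, and that the second (unlabeled-set) difference may be weighted more heavily when its acquisition conditions match those of the image to be processed.

```python
# Hypothetical sketch: combine the labeled-set difference and the
# unlabeled-set difference into one training loss by weighted sum.

def combined_loss(first_diff, second_diff, w1=0.3, w2=0.7, bias=0.0):
    # w2 > w1 reflects the case where the unlabeled set shares the
    # acquisition conditions of the image to be processed.
    return w1 * first_diff + w2 * second_diff + bias
```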
• Since the similarity between images of the same category should be greater than the similarity between images of different categories, if, during image classification, two images with lower similarity are assigned the same label while two images with higher similarity are assigned different labels, the accuracy of the processing results is reduced.
• Suppose the similarity between image a and image b is s1,
• the similarity between image a and image c is s2,
• and s1 is less than s2.
• If, in processing image a, image b, and image c, the neural network determines that the label of image a is the same as the label of image b, and that the label of image a is different from the label of image c, the processing result is wrong.
  • the following steps may be performed before the step of "obtaining the loss of the second neural network to be trained based on the first difference and the second difference":
  • the training image set includes a labeled image set and an unlabeled image set.
  • the label of the first image is the same as the label of the second image, that is, the category of the first image is the same as the category of the second image.
  • the label of the first image is different from the label of the third image, that is, the category of the first image is different from the category of the third image.
  • the similarity between the first image and the second image is determined as the first similarity.
• The similarity between the first image and the third image is determined as the second similarity.
• The similarity between two images may be one of the following: the Euclidean distance between the two images, the cosine similarity between the two images, the Mahalanobis distance between the two images, the Pearson correlation coefficient between the two images, or the Hamming distance between the two images.
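Three of the measures listed above can be sketched for feature vectors as follows; these are the standard textbook definitions, not patent-specific implementations, and the vector representation is an illustrative assumption.

```python
# Standard similarity/distance measures over feature vectors.
import math

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hamming_distance(a, b):
    """Number of differing positions, for equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))
```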
  • the first similarity is the similarity between images of the same type
  • the second similarity is the similarity between different types of images
  • the first similarity should be greater than the second similarity. Therefore, the triplet loss can be obtained based on the difference between the first similarity and the second similarity.
• Suppose the first similarity is s1
• and the second similarity is s2.
• The triplet loss is Lt, which is determined from s1 and s2.
• Lt satisfies the following formula:
• where n is a positive number.
• In another implementation, suppose the first similarity is s1
• and the second similarity is s2.
• The triplet loss is Lt, which is determined from s1 and s2.
• Lt satisfies the following formula:
• where k and n are both positive numbers.
• In yet another implementation, suppose the first similarity is s1
• and the second similarity is s2.
• The triplet loss is Lt, which is determined from s1 and s2.
• Lt satisfies the following formula:
• where k and n are both positive numbers.
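The formulas themselves are not reproduced in this text (they were elided in extraction), so the sketch below is an assumption: it uses the standard margin-based triplet form, which is consistent with the surrounding description that s1 (same-category similarity) should exceed s2 (different-category similarity) by a positive margin n. The scaled variant with k is likewise a plausible guess, not the patent's exact formula.

```python
# Hedged sketch of a triplet loss on similarities: penalize cases where
# the same-category similarity s1 does not exceed the different-category
# similarity s2 by at least the margin n.

def triplet_loss(s1, s2, n=0.2):
    return max(s2 - s1 + n, 0.0)

def scaled_triplet_loss(s1, s2, k=1.0, n=0.2):
    # Illustrative variant with an additional positive scale factor k.
    return max(k * (s2 - s1) + n, 0.0)
```

With s1 well above s2 the loss is zero; with the ordering violated, the loss grows with the violation.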
  • the step "obtain the loss of the second neural network to be trained based on the first difference and the second difference" includes the following steps:
• Based on the first difference and the second difference, the category loss is obtained.
• For the implementation process of this step, refer to the implementation process of "obtain the loss of the second neural network to be trained based on the first difference and the second difference" in step 203. It should be understood that, in this step, the loss obtained based on the first difference and the second difference is not the loss of the second neural network to be trained, but the category loss.
• Based on the category loss and the triplet loss, the loss of the second neural network to be trained is obtained.
• L, Lc, and Lt satisfy the following formula:
• where k1 and k2 are both positive numbers less than or equal to 1.
• Alternatively, L, Lc, and Lt satisfy the following formula:
• where k1 and k2 are both positive numbers less than or equal to 1.
• In yet another implementation, L, Lc, and Lt satisfy the following formula:
• where k1 and k2 are both positive numbers less than or equal to 1.
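The exact total-loss formulas are again elided in this text. A natural combination consistent with the stated constraints ("k1 and k2 are both positive numbers less than or equal to 1") is a weighted sum of the category loss Lc and the triplet loss Lt; the sketch below is that assumption, not the patent's verbatim formula.

```python
# Hedged sketch: combine category loss Lc and triplet loss Lt into the
# total loss L of the second neural network to be trained.

def total_loss(category_loss, triplet_loss_value, k1=1.0, k2=0.5):
    assert 0.0 < k1 <= 1.0 and 0.0 < k2 <= 1.0  # constraint stated in the text
    return k1 * category_loss + k2 * triplet_loss_value
```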
• Since the first similarity is determined by the first image and the second image,
• and the second similarity is determined by the first image and the third image,
• requiring only that the first similarity be greater than the second similarity may still lead to errors.
• Suppose the training image set includes image a, image b, image c, image d, and image e, where the category of image a, the category of image b, and the category of image e are all A, and the category of image c and the category of image d are both B.
• The similarity between image a and image b is s1,
• the similarity between image a and image c is s2,
• the similarity between image a and image d is s3,
• and the similarity between image a and image e is s4.
  • image a is the first image
  • image b is the second image
  • image c is the third image
• s1 is the first similarity,
• and s2 is the second similarity.
  • the embodiments of the present disclosure provide an implementation manner for determining the first image, the second image, and the third image, so as to reduce the probability of occurrence of the above-mentioned error, and thereby improve the accuracy of the processing result.
  • step 21 the following steps may be performed:
  • the most difficult image within the class of the first image is determined as the second image, and the most difficult image outside the class of the first image is determined as the third image.
  • the most difficult image pair within the class is the two images with the smallest similarity among images with the same label
  • the most difficult image pair outside the class is the two images with the greatest similarity among the images with different labels.
  • image b is called the most difficult image in the class of image a
  • image a is called the most difficult image in the class of image b.
• Image c is called the most difficult image outside the class of image d,
• and image d is called the most difficult image outside the class of image c.
• Suppose the category of image 1 is the same as the category of image 2 and the category of image 3, but different from the category of image 4 and the category of image 5.
• The similarity between image 1 and image 2 is smaller than the similarity between image 1 and image 3, and the similarity between image 1 and image 4 is smaller than the similarity between image 1 and image 5.
• Image 2 is then the most difficult image within the class of image 1,
• and image 5 is the most difficult image outside the class of image 1; that is, image 2 is the second image, and image 5 is the third image.
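The hard-example selection just described can be sketched as follows. The similarity function, toy 1-D features, and function names are illustrative assumptions; the disclosure only specifies the selection rule (smallest similarity within the class, greatest similarity outside it).

```python
# Sketch: pick the most difficult in-class image (second image) and the
# most difficult out-of-class image (third image) for a given anchor.

def hardest_pair(anchor, images, labels, anchor_label, sim):
    """Return (hardest in-class image, hardest out-of-class image)."""
    in_class = [(sim(anchor, x), x) for x, y in zip(images, labels)
                if y == anchor_label and x is not anchor]
    out_class = [(sim(anchor, x), x) for x, y in zip(images, labels)
                 if y != anchor_label]
    second = min(in_class)[1]   # smallest similarity among same-label images
    third = max(out_class)[1]   # greatest similarity among different-label images
    return second, third

sim = lambda a, b: -abs(a - b)          # toy 1-D similarity
images = [0.0, 0.4, 0.1, 2.0, 0.3]      # stand-ins for image 1..5
labels = ["A", "A", "A", "B", "B"]
second, third = hardest_pair(images[0], images, labels, "A", sim)
```

Here the least similar same-class image (0.4) and the most similar different-class image (0.3) are selected, mirroring the image 2 / image 5 example above.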
• In the embodiment of the present disclosure, the first similarity is determined based on the first image and the second image, the second similarity is determined based on the first image and the third image, and the loss of the second neural network to be trained is determined based on the difference between the first similarity and the second similarity, so that during training the second neural network to be trained learns to make the similarity between images of the same category greater than the similarity between images of different categories, thereby improving the accuracy of the processing results.
• Before the unlabeled image set is input to the second neural network to be trained, data enhancement processing may be performed on the unlabeled image set to obtain an enhanced image set.
• The enhanced image set, the unlabeled image set, and the labeled image set are then used as training data to train the second neural network to be trained. In this way, the training data of the second neural network to be trained is effectively expanded.
• In this case, the result of processing the unlabeled image set and the enhanced image set with the second neural network to be trained is used as the second result, and the second difference is obtained based on the difference between the second result and the labels of the unlabeled image set.
  • the unlabeled image set includes image a and image b, the label of image a is A, and the label of image b is B.
• The second neural network to be trained processes the unlabeled image set and the enhanced image set; the second result obtained includes result a, result b, result c, and result d, where result a is obtained by processing image a with the second neural network to be trained, result b by processing image b, result c by processing image c, and result d by processing image d.
• The aforementioned data enhancement processing includes at least one of the following: rotation processing, erasing processing, clipping processing, and blurring processing.
• Rotating the image means rotating it about its geometric center by a reference angle, where the reference angle can be adjusted according to the user's needs.
• Erasing the image removes the image content in any pixel region of the image, for example by adjusting the pixel values in that region to 0.
• Clipping the image cuts out an image of a predetermined size from the image, where the predetermined size can be adjusted according to the user's needs. Blurring the image blurs at least part of the content in the image.
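The four enhancement operations can be sketched on a toy image represented as a list of rows of pixel values. The function names, the fixed 90-degree rotation, and the 1-D mean blur are illustrative assumptions; the disclosure allows an adjustable angle and any blur.

```python
# Sketch of rotation, erasing, clipping (cropping), and blurring on a
# toy 2-D image stored as a list of rows.

def rotate90(img):
    """Rotate about the geometric center by 90 degrees (one allowed angle)."""
    return [list(row) for row in zip(*img[::-1])]

def erase(img, r0, c0, r1, c1):
    """Set an arbitrary pixel region [r0:r1, c0:c1] to 0."""
    return [[0 if r0 <= r < r1 and c0 <= c < c1 else v
             for c, v in enumerate(row)] for r, row in enumerate(img)]

def crop(img, r0, c0, h, w):
    """Cut out a sub-image of predetermined size h x w."""
    return [row[c0:c0 + w] for row in img[r0:r0 + h]]

def blur_row(row):
    """1-D mean blur as a stand-in for blurring part of the content."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out

img = [[1, 2], [3, 4]]
```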
  • the image set collected under the second acquisition condition is annotated based on the image set acquired under the first acquisition condition, thereby saving labor costs and improving annotation efficiency.
• An image processing neural network adapted to the acquisition conditions can be obtained based on the technical solutions provided by the embodiments of the present disclosure, where an image processing neural network adapted to the acquisition conditions means that, when images acquired under those acquisition conditions are processed by the neural network, the accuracy of the processing results is high.
  • the embodiments of the present disclosure also provide several possible application scenarios.
• Scenario 1: With the strengthening of security management awareness among governments, enterprises, and individuals, and the popularization of smart hardware devices, more and more access control devices with face recognition functions are being put into practical application.
  • the access control device collects the face image of the visitor through a camera as the image to be identified, and uses a neural network to process the image to be identified to determine the identity of the visitor.
• In different application scenarios, the acquisition conditions under which the access control device acquires the image to be recognized differ. Therefore, how to effectively improve the recognition accuracy of access control devices in different application scenarios is of great significance.
  • the gate of company A is equipped with access control equipment a and has been used for a period of time.
  • the acquisition conditions when the access control device a collects the image to be identified are different from the acquisition conditions when the access control device b collects the image to be identified.
  • the different acquisition conditions will result in low recognition accuracy of the image processing neural network.
  • the access control device a uses the first neural network
  • the first neural network is obtained by training the face images of the employees of company A (hereinafter referred to as image set c) collected by the access control device a.
• The images in image set c all carry labels, and the labels include the identities of the people in the images (such as Zhang San, Li Si, and Wang Wu).
  • the recognition accuracy obtained by applying the first neural network to the access control device b is low.
• In this case, the management personnel of company A can use access control device b to collect face images of the employees of company A (hereinafter referred to as image set d), and use the first neural network to process image set d to obtain the labels of image set d.
• Using image set c and image set d as training data, and the labels of image set d as supervision information for image set d, the first neural network is trained to obtain the second neural network. Deploying the second neural network on access control device b can improve the recognition accuracy of access control device b.
• Scenario 2: With the rapid increase in the number of cameras in public places, how to effectively determine the attributes of persons in massive video streams, and determine their whereabouts based on those attributes, is of great significance.
• The server can obtain the video stream collected by surveillance camera e (hereinafter referred to as the first video stream) through this communication connection, and use the third neural network to process the images in the first video stream to obtain the attributes of the persons in the first video stream, where the third neural network is obtained by training on images containing persons collected by surveillance camera e (hereinafter referred to as image set f).
  • the images in the image set f carry tags.
  • the tags include the attributes of the person.
• The attributes include at least one of the following: top color, pants color, pants length, hat style, shoe color, umbrella type, luggage category, presence or absence of a mask, hairstyle, and gender.
  • the first video stream includes image g and image h.
  • Use the third neural network to process the first video stream, and determine the attributes of the person in image g include: white shirt, black pants, no glasses, short hair, and woman, and determine the attributes of the person in image h include: white shirt, Black pants, white shoes, wearing glasses, wearing a mask, holding an umbrella, short hair, a man.
• Relevant law enforcement officers in place B installed a new surveillance camera i at an intersection to obtain the attributes of pedestrians at the intersection. Since the environment in the waiting room differs from the environment at the intersection, if the third neural network is used to process the video stream collected by surveillance camera i, the accuracy of the pedestrian attributes obtained is low.
• In this case, relevant law enforcement officers in place B can collect images containing pedestrians through surveillance camera i (hereinafter referred to as image set j), and use the third neural network to process image set j to obtain the labels of image set j.
• Using image set f and image set j as training data, and the labels of image set j as supervision information, the third neural network is trained to obtain the fourth neural network.
• Using the fourth neural network to process the second video stream collected by surveillance camera i can improve the accuracy of the pedestrian attributes obtained from the second video stream.
• Scenario 3: As there are more and more vehicles on the road, the prevention of road traffic accidents is receiving more and more attention. Human factors account for a large proportion of the causes of road traffic accidents, including distracted driving caused by the driver's inattention, loss of attention, and other reasons. Therefore, how to effectively monitor whether the driver is distracted driving is of great significance.
  • the vehicle-mounted terminal collects an image containing the driver's face from a camera installed on the vehicle, and uses a neural network to process the image containing the driver's face, so as to determine whether the driver is distracted driving.
  • Company C is the provider of driver attention monitoring solutions.
• Company C uses the images of the driver's face collected by the camera on model k of Company D (hereinafter referred to as image set m) to train the fifth neural network and obtain the sixth neural network.
• The images in image set m all carry labels, and the labels indicate whether the driver is distracted driving or not distracted driving.
  • the sixth neural network is deployed in model k, and the vehicle-mounted terminal of model k can use the sixth neural network to determine whether the driver is distracted driving.
• Company D has produced a new model (hereinafter referred to as model n) and hopes that Company C will provide a driver attention monitoring solution for model n. Since the camera installed on model k (hereinafter referred to as camera p) differs from the camera installed on model n, and the interior environment of model k differs from that of model n, if the sixth neural network is deployed on model n, the monitoring results obtained by the sixth neural network (indicating that the driver is distracted driving or not distracted driving) have low accuracy.
• In this case, the staff of Company C can collect images containing the driver's face through camera p (hereinafter referred to as image set q), and use the sixth neural network to process image set q to obtain the labels of image set q.
• Using image set m and image set q as training data, and the labels of image set q as supervision information, the sixth neural network is trained to obtain the seventh neural network.
• The seventh neural network is deployed on model n; the vehicle-mounted terminal of model n uses the seventh neural network to determine whether the driver is distracted driving, and the monitoring results obtained have high accuracy.
• Those skilled in the art can understand that, in the above methods of the specific implementations, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • the apparatus 1 includes: an acquiring part 11 and a processing part 12, wherein:
  • the obtaining part 11 is configured to obtain an image to be processed
  • the processing part 12 is configured to use an image processing neural network to process the image to be processed to obtain a processing result of the image to be processed;
• The image processing neural network is obtained by training with an unlabeled image set and a labeled image set as training data; the acquisition conditions of the unlabeled image set are the same as the acquisition conditions of the image to be processed, and the acquisition conditions of the labeled image set are different from the acquisition conditions of the unlabeled image set.
  • the acquisition part 11 is further configured to acquire the unlabeled image set, the labeled image set, and the first neural network to be trained;
  • the processing part 12 is further configured to obtain a label of the unlabeled image set based on the labeled image set;
  • the device 1 further includes:
• The training part 13 is configured to use the labeled image set and the unlabeled image set as training data, and the labels of the unlabeled image set as the supervision information of the unlabeled image set, and to train the first neural network to be trained to obtain the image processing neural network.
  • processing part 12 is further configured to:
  • the second neural network to be trained is used to process the unlabeled image set to obtain the label of the unlabeled image set.
  • processing part 12 is further configured to:
  • the parameters of the second neural network to be trained are adjusted to obtain the image processing neural network.
• Both the labels of the labeled image set and the labels of the unlabeled image set carry category information.
• The device 1 further includes: a first determining part 14, configured to, before the loss of the second neural network to be trained is obtained based on the first difference and the second difference, determine the similarity between the first image in the training image set and the second image in the training image set to obtain the first similarity, and determine the similarity between the first image in the training image set and the third image in the training image set to obtain the second similarity;
• the training image set includes the labeled image set and the unlabeled image set;
• the category of the first image is the same as the category of the second image, and the category of the first image is different from the category of the third image;
  • the second determining part 15 is configured to obtain the triplet loss according to the difference between the first similarity and the second similarity
• The processing part 12 is further configured to obtain a category loss according to the first difference and the second difference,
• and to obtain the loss of the second neural network to be trained according to the category loss and the triplet loss.
  • the device 1 further includes:
• The third determining part 16 is configured to, before the first similarity is obtained based on the similarity between the first image in the training image set and the second image in the training image set, and before the second similarity is obtained based on the similarity between the first image in the training image set and the third image in the training image set, determine the most difficult image within the class of the first image as the second image, and determine the most difficult image outside the class of the first image as the third image;
• the most difficult image within the class is the image in the in-class image set with the smallest similarity to the first image;
• the most difficult image outside the class is the image in the out-of-class image set with the greatest similarity to the first image;
  • the in-class image set includes images with the same label as the first image;
  • the out-of-class image set includes images with different labels from the first image.
  • the device 1 further includes:
• The data enhancement processing part 17 is configured to perform data enhancement processing on the unlabeled image set to obtain an enhanced image set, before the second result is obtained by processing the unlabeled image set using the second neural network to be trained;
  • the processing part 12 is configured to:
  • the second neural network to be trained is used to process the enhanced image set and the unlabeled image set to obtain the second result.
• The data enhancement processing includes at least one of the following: rotation processing, erasing processing, clipping processing, and blurring processing.
  • the acquisition condition of the image includes: parameters of the imaging device that acquires the image.
  • a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
  • the neural network is trained with the unlabeled image set and the labeled image set as training data, and the labels of the unlabeled image set can be determined based on the labeled image set, thereby reducing the labor cost of labeling the unlabeled image set and improving labeling efficiency.
  • the neural network is trained with the labeled image set, the unlabeled image set, and the labels of the unlabeled image set, so that the neural network learns information about the second acquisition condition during training; this improves the accuracy of the processing result obtained when the trained image processing neural network processes the image to be processed.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • FIG. 4 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the disclosure.
  • the image processing device 2 includes a processor 21, a memory 22, an input device 23, and an output device 24.
  • the processor 21, the memory 22, the input device 23, and the output device 24 are coupled through a connector, and the connector includes various interfaces, transmission lines, or buses, etc., which are not limited in the embodiment of the present disclosure.
  • coupling refers to mutual connection in a specific manner, including direct connection or indirect connection through other devices, for example, can be connected through various interfaces, transmission lines, buses, and the like.
  • the processor 21 may be one or more graphics processing units (GPUs). When the processor 21 is a GPU, the GPU may be a single-core GPU or a multi-core GPU. In some embodiments, the processor 21 may be a processor group composed of multiple GPUs, and the multiple processors are coupled to each other through one or more buses. In some embodiments, the processor may also be other types of processors, etc., which are not limited in the embodiments of the present disclosure.
  • the memory 22 may be used to store computer program instructions and various types of computer program codes including program codes used to execute the solutions of the present disclosure.
  • the memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for related instructions and data.
  • the input device 23 is used to input data and/or signals
  • the output device 24 is used to output data and/or signals.
  • the input device 23 and the output device 24 may be independent devices or a whole device.
  • the memory 22 can be used not only to store related instructions but also related data; for example, the memory 22 may store the image to be processed obtained through the input device 23, or the processing results obtained by the processor 21. The embodiment of the present disclosure does not limit the data stored in the memory.
  • FIG. 4 shows a simplified design of an image processing device.
  • the image processing device may also include other necessary components, including but not limited to any number of input/output devices, processors, and memories; all image processing devices that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above embodiments it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • software it can be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the process can be completed by a computer program instructing relevant hardware.
  • the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage media include media that can store program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
  • the neural network is trained with the unlabeled image set and the labeled image set as training data, and the labels of the unlabeled image set can be determined based on the labeled image set, thereby reducing the labor cost of labeling the unlabeled image set and improving labeling efficiency.
  • the neural network is trained with the labeled image set, the unlabeled image set, and the labels of the unlabeled image set, so that the neural network learns information about the second acquisition condition during training; this improves the accuracy of the processing result obtained when the trained image processing neural network processes the image to be processed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Provided are an image processing method and apparatus, a processor, an electronic device, and a storage medium. The image processing method includes: acquiring an image to be processed (101); and processing the image to be processed using an image processing neural network to obtain a processing result of the image to be processed (102). The image processing neural network is trained with an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed; the acquisition condition of the labeled image set differs from that of the unlabeled image set.

Description

Pose detection and video processing methods, apparatus, electronic device, and storage medium

Cross-reference to related applications

The present disclosure is based on, and claims priority to, Chinese patent application No. 202010264926.7 filed on April 7, 2020, the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, a processor, an electronic device, and a storage medium.

Background

Owing to their strong performance, neural networks have been widely applied to image processing in recent years. Before use, a neural network must be trained. In conventional approaches, a neural network is trained on training data, and the trained network is then applied to different application scenarios. However, when a network trained on data from one application scenario is applied to another, the accuracy of the resulting processing results is low.

Summary

The present disclosure provides an image processing method and apparatus, a processor, an electronic device, and a storage medium.

In a first aspect, an image processing method is provided, the method including:

acquiring an image to be processed;

processing the image to be processed using an image processing neural network to obtain a processing result of the image to be processed; the image processing neural network is trained with an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as that of the image to be processed; the acquisition condition of the labeled image set differs from that of the unlabeled image set.

In this aspect, training the neural network with the unlabeled and labeled image sets as training data allows the labels of the unlabeled set to be determined from the labeled set, reducing the labor cost of labeling the unlabeled set and improving labeling efficiency. Training the network with the labeled set, the unlabeled set, and the labels of the unlabeled set lets the network learn information about the second acquisition condition during training, which improves the accuracy of the processing result obtained when the trained image processing neural network processes the image to be processed.
In combination with any embodiment of the present disclosure, the method further includes:

acquiring the unlabeled image set, the labeled image set, and a first neural network to be trained;

obtaining labels of the unlabeled image set based on the labeled image set;

training the first neural network to be trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled image set as its supervision information, to obtain the image processing neural network.

In this embodiment, labeling the unlabeled image set based on the labeled image set saves labor cost and improves labeling efficiency. Training the first neural network to be trained with both image sets lets it learn, during training, information about the acquisition condition of the unlabeled image set, yielding the image processing neural network; using this network to process the image to be processed then improves the accuracy of the processing result.

In combination with any embodiment of the present disclosure, obtaining the labels of the unlabeled image set based on the labeled image set includes:

training the first neural network to be trained with the labeled image set as training data, to obtain a second neural network to be trained;

processing the unlabeled image set with the second neural network to be trained, to obtain the labels of the unlabeled image set.

In this embodiment, after the unlabeled image set is processed to obtain its labels, the second neural network to be trained is trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled set as its supervision information, which increases the number of training epochs and improves the training effect, thereby improving the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.
In combination with any embodiment of the present disclosure, training the first neural network to be trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled image set as its supervision information, to obtain the image processing neural network, includes:

processing the labeled image set with the second neural network to be trained to obtain a first result, and processing the unlabeled image set with the second neural network to be trained to obtain a second result;

obtaining a first difference according to the difference between the first result and the labels of the labeled image set, and a second difference according to the difference between the second result and the labels of the unlabeled image set;

obtaining a loss of the second neural network to be trained according to the first difference and the second difference;

adjusting parameters of the second neural network to be trained based on its loss, to obtain the image processing neural network.

In this embodiment, obtaining the loss of the second neural network to be trained from the first and second differences, and adjusting its parameters based on that loss, completes the training of the second neural network to be trained and yields the image processing neural network.

In combination with any embodiment of the present disclosure, the labels of the labeled image set and the labels of the unlabeled images both carry category information;

before the loss of the second neural network to be trained is obtained according to the first and second differences, the method further includes:

determining the similarity between a first image in a training image set and a second image in the training image set to obtain a first similarity, and determining the similarity between the first image and a third image in the training image set to obtain a second similarity; the training image set includes the labeled and unlabeled image sets; the category of the first image is the same as that of the second image and differs from that of the third image;

obtaining a triplet loss according to the difference between the first similarity and the second similarity;

obtaining the loss of the second neural network to be trained according to the first and second differences then includes:

obtaining a category loss according to the first difference and the second difference;

obtaining the loss of the second neural network to be trained according to the category loss and the triplet loss.

In this embodiment, deriving the triplet loss from the first and second similarities and, during training, determining the loss of the second neural network to be trained from the category loss and the triplet loss improves the second network's ability to distinguish image categories.
In combination with any embodiment of the present disclosure, before the first similarity and the second similarity are determined, the method further includes:

determining the hardest in-class image of the first image as the second image, and the hardest out-of-class image of the first image as the third image; the hardest in-class image is the image in the in-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in the out-of-class image set with the largest similarity to the first image; the in-class image set includes images whose labels are the same as the label of the first image; the out-of-class image set includes images whose labels differ from the label of the first image.

In this way, the smallest similarity between images of the same class exceeds the largest similarity between images of different classes, so that the similarity between any two images of the same class is greater than the similarity between any two images of different classes.
In combination with any embodiment of the present disclosure, before the unlabeled image set is processed with the second neural network to be trained to obtain the second result, the method further includes:

performing data enhancement processing on the unlabeled image set, to obtain an enhanced image set;

processing the unlabeled image set with the second neural network to be trained to obtain the second result then includes:

processing the enhanced image set and the unlabeled image set with the second neural network to be trained, to obtain the second result.

In this embodiment, data enhancement of the unlabeled image set increases the number of images whose acquisition condition matches that of the unlabeled image set, improving the training effect of the second neural network to be trained and thereby the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.

In combination with any embodiment of the present disclosure, the data enhancement processing includes at least one of: rotation, erasing, cropping, and blurring.

In combination with any embodiment of the present disclosure, the acquisition condition of an image includes parameters of the imaging device that acquires the image.
In a second aspect, an image processing apparatus is provided, the apparatus including:

an acquisition part, configured to acquire an image to be processed;

a processing part, configured to process the image to be processed using an image processing neural network to obtain a processing result of the image to be processed; the image processing neural network is trained with an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as that of the image to be processed; the acquisition condition of the labeled image set differs from that of the unlabeled image set.

In combination with any embodiment of the present disclosure, the acquisition part is further configured to acquire the unlabeled image set, the labeled image set, and a first neural network to be trained;

the processing part is further configured to obtain labels of the unlabeled image set based on the labeled image set;

the apparatus further includes:

a training part, configured to train the first neural network to be trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled image set as its supervision information, to obtain the image processing neural network.

In combination with any embodiment of the present disclosure, the processing part is further configured to:

train the first neural network to be trained with the labeled image set as training data, to obtain a second neural network to be trained;

process the unlabeled image set with the second neural network to be trained, to obtain the labels of the unlabeled image set.

In combination with any embodiment of the present disclosure, the processing part is further configured to:

process the labeled image set with the second neural network to be trained to obtain a first result, and process the unlabeled image set with the second neural network to be trained to obtain a second result;

obtain a first difference according to the difference between the first result and the labels of the labeled image set, and a second difference according to the difference between the second result and the labels of the unlabeled image set;

obtain a loss of the second neural network to be trained according to the first difference and the second difference;

adjust parameters of the second neural network to be trained based on its loss, to obtain the image processing neural network.

In combination with any embodiment of the present disclosure, the labels of the labeled image set and the labels of the unlabeled images both carry category information;

the apparatus further includes: a first determining part, configured to, before the loss of the second neural network to be trained is obtained according to the first and second differences, determine the similarity between a first image in the training image set and a second image in the training image set to obtain a first similarity, and the similarity between the first image and a third image in the training image set to obtain a second similarity; the training image set includes the labeled and unlabeled image sets; the category of the first image is the same as that of the second image and differs from that of the third image;

a second determining part, configured to obtain a triplet loss according to the difference between the first similarity and the second similarity;

the processing part is further configured to obtain a category loss according to the first difference and the second difference;

and to obtain the loss of the second neural network to be trained according to the category loss and the triplet loss.

In combination with any embodiment of the present disclosure, the apparatus further includes:

a third determining part, configured to, before the first and second similarities are determined, determine the hardest in-class image of the first image as the second image and the hardest out-of-class image of the first image as the third image; the hardest in-class image is the image in the in-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in the out-of-class image set with the largest similarity to the first image; the in-class image set includes images whose labels are the same as that of the first image; the out-of-class image set includes images whose labels differ from that of the first image.

In combination with any embodiment of the present disclosure, the apparatus further includes:

a data enhancement processing part, configured to perform data enhancement processing on the unlabeled image set, before the second result is obtained by processing the unlabeled image set with the second neural network to be trained, to obtain an enhanced image set;

the processing part is configured to:

process the enhanced image set and the unlabeled image set with the second neural network to be trained, to obtain the second result.

In combination with any embodiment of the present disclosure, the data enhancement processing includes at least one of: rotation, erasing, cropping, and blurring.

In combination with any embodiment of the present disclosure, the acquisition condition of an image includes parameters of the imaging device that acquires the image.

In a third aspect, a processor is provided, the processor being configured to execute the method of the first aspect or any possible implementation thereof.

In a fourth aspect, an electronic device is provided, including: a processor, a sending apparatus, an input apparatus, an output apparatus, and a memory; the memory is configured to store computer program code including computer instructions; when the processor executes the computer instructions, the electronic device executes the method of the first aspect or any possible implementation thereof.

In a fifth aspect, a computer-readable storage medium is provided, storing a computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect or any possible implementation thereof.

In a sixth aspect, a computer program product is provided, including a computer program or instructions that, when run on a computer, cause the computer to execute the method of the first aspect or any possible implementation thereof.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present disclosure or in the background more clearly, the drawings used in the embodiments or the background are described below.

The drawings here are incorporated into and form part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, explain its technical solutions.

FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the hardware structure of an image processing apparatus provided by an embodiment of the present disclosure.

Detailed description

To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure; all other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.

The terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "include" and "have" and their variants are intended to cover non-exclusive inclusion: a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include unlisted steps or units, or other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Owing to their strong performance, neural networks have in recent years been widely used in image processing to perform various tasks, for example image classification or image segmentation. For convenience, performing a task is referred to below as an application, and the images a network processes while performing a task are referred to as application images.

A network's performance in application depends largely on its training effect, which is influenced by many factors; the difference between the image quality of the training images and that of the application images is one of them. Image quality includes the resolution, signal-to-noise ratio, and sharpness of an image. The difference between training-image quality and application-image quality includes at least one of: a difference in resolution, a difference in signal-to-noise ratio, and a difference in sharpness. One cause of such quality differences is the difference between the acquisition condition of the training images and that of the application images.

In the embodiments of the present disclosure, the difference between the acquisition condition of the training images and the acquisition condition of the application images (hereinafter the first acquisition condition) includes at least one of: a difference between the parameters of the imaging device that acquires the training images (hereinafter the training imaging device) and the parameters of the imaging device that acquires the application images (hereinafter the application imaging device), and a difference between the environment in which the training images are acquired and the environment in which the application images are acquired.

The difference in imaging-device parameters includes differences in hardware configuration between the training and application imaging devices. For example, the training device may acquire images at a resolution of 1920×1080 while the application device acquires images at 1280×1024; or the training device's focal-length range may be 10 mm–22 mm while the application imaging device's is 18 mm–135 mm.

The environment in which images are acquired includes at least one of: the weather and the scene. For example, the weather may be cloudy, rainy, or sunny; the environment of images acquired in rain differs from that of images acquired in sunshine, and the environment of images acquired under cloud differs from that of images acquired in sunshine. Likewise, the scene may be a car interior, an airport waiting hall, or a highway; the scene of images of a car interior differs from that of images of a waiting hall, and the scene of images of a highway differs from that of images of a waiting hall.

A neural network is trained with the training images, and the trained network is then used to perform a task, i.e., to process application images and obtain processing results: classification results in an image classification task, segmentation results in an image segmentation task. When the training images and the application images differ, the accuracy of these processing results (including classification and segmentation results) is low.

For example, suppose surveillance cameras in city A capture images containing pedestrians on cloudy days (hereinafter images acquired at location A), and training images are obtained by labeling the identities of the pedestrians in those images. Neural network a is trained with these images so that the trained network a can identify pedestrians in images acquired at location A. Suppose trained network a must now identify pedestrians in images acquired at location B. Since the training images were all captured on cloudy days, whereas the images from location B include images captured on cloudy, sunny, and rainy days, and since ambient brightness and sharpness differ with the weather and affect recognition accuracy, using trained network a to identify pedestrians in images captured on sunny or rainy days yields results of low accuracy. In addition, the parameters of the surveillance cameras at locations A and B differ (e.g., in shooting angle and resolution), which also lowers the accuracy with which trained network a identifies pedestrians in images acquired at location B.

To improve the accuracy of the processing results, conventional methods train the network with images acquired under the first acquisition condition of the application scenario as training images. But this requires labeling those images, and since the number of training images for a neural network is large, it incurs heavy labor costs and low labeling efficiency.

On this basis, the embodiments of the present disclosure provide a technical solution that improves the accuracy of neural-network processing results while reducing labor cost and improving labeling efficiency.

The execution subject of the embodiments of the present disclosure may be an image processing apparatus, which may be one of: a mobile phone, a computer, a server, or a tablet. The image processing method of the present application may also be implemented by a processor executing computer code.

The embodiments of the present disclosure are described below with reference to the drawings.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.

101. Acquire an image to be processed.

In the embodiments of the present disclosure, the image to be processed may contain any content: for example, a road; or a road and a vehicle; or a person. The present disclosure does not limit the content of the image to be processed.

In one implementation of acquiring the image to be processed, the image processing apparatus receives an image input by a user through an input component, where the input component includes a keyboard, a mouse, a touch screen, a touch pad, an audio input device, and the like.

In another implementation, the image processing apparatus receives the image to be processed sent by a first terminal, where the first terminal may be any of: a mobile phone, a computer, a tablet, a server, or a wearable device.

In yet another implementation, the image processing apparatus may directly acquire the image to be processed through its own image acquisition component, such as a camera.

102. Process the image to be processed using an image processing neural network to obtain a processing result of the image to be processed.
In the embodiments of the present disclosure, the image processing neural network is trained with an unlabeled image set and a labeled image set as training data, where the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed (hereinafter the second acquisition condition), and the acquisition condition of the labeled image set (hereinafter the third acquisition condition) differs from that of the unlabeled image set.

For example, if the image to be processed is captured in an airport waiting hall, the images in the unlabeled set are also captured in the waiting hall, while the images in the labeled set are not. Or the image to be processed and the images in the unlabeled set are captured by camera A while the images in the labeled set are captured by camera B.

In the embodiments of the present disclosure, the images in the labeled set all carry labels. For example, if the image classification task performed with the image processing network is to decide whether an image contains an apple, banana, pear, peach, orange, or watermelon, the labels of the labeled images are one of those classes. If the task is pedestrian re-identification, i.e., identifying the person in the image, the labels include the person's identity (e.g., Zhang San, Li Si, Wang Wu, Zhou Liu). If the task is to segment the pixel region covered by a person from the image, the labels include the person's contour, and the pixel region enclosed by that contour is the region covered by the person.

When the unlabeled and labeled image sets are used as training data for the neural network, the labels of the unlabeled set can be determined from the labeled set, and the network can then be trained with the labeled set, the unlabeled set, and the labels of the unlabeled set; this improves the accuracy obtained when the trained network processes images acquired under the second acquisition condition.

For example, suppose the images in the labeled set are all captured in an airport waiting hall, while the image to be processed and the images in the unlabeled set are all captured inside a car. Since the unlabeled images carry no labels, the unlabeled set alone cannot be used to train the network; and since the waiting-hall environment differs from the car interior (e.g., in lighting and in the objects present), training on the labeled set alone cannot teach the network about the car-interior environment, so the processing result for the image to be processed would have low accuracy. In the embodiments of the present disclosure, using both the labeled and unlabeled sets as training data allows the labels of the unlabeled set to be determined from the labeled set, so that the unlabeled set and its labels can also serve as training data; the network thereby learns about the car-interior environment during training, improving the accuracy of the processing result.
In one implementation of determining the labels of the unlabeled image set based on the labeled image set (hereinafter the first implementation), the labeled image set is used as training data to train the neural network, yielding a trained network. The trained network processes the unlabeled image set to obtain its labels. The labeled set, the unlabeled set, and the labels of the unlabeled set are then used to train the trained network further, yielding the image processing neural network. For example, the labeled set includes image a with label A and the unlabeled set includes image b. The network processes image a to obtain a first processing result; a first intermediate loss is obtained from the difference between the first processing result and A; the network's parameters are adjusted based on the first intermediate loss to obtain the trained network; the trained network then processes image b to obtain a second processing result, which serves as the label of image b.
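The first implementation (train on the labeled set, then use the trained model to pseudo-label the unlabeled set) can be sketched as follows. The nearest-centroid classifier here is a hypothetical stand-in for the trained neural network, since the text does not fix a network architecture:

```python
import numpy as np

def pseudo_label(labeled_x, labeled_y, unlabeled_x):
    # Stand-in for "process the unlabeled set with the trained network":
    # assign each unlabeled sample the label of the nearest class centroid.
    classes = np.unique(labeled_y)
    centroids = np.stack([labeled_x[labeled_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(unlabeled_x[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

labeled_x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labeled_y = np.array([0, 0, 1, 1])               # labels of the labeled set
unlabeled_x = np.array([[0.2, 0.1], [4.9, 5.2]])  # unlabeled images (features)
print(pseudo_label(labeled_x, labeled_y, unlabeled_x).tolist())  # [0, 1]
```

The pseudo-labels then join the labeled set as supervision for a further round of training, as the text describes.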
In another implementation of determining the labels of the unlabeled image set based on the labeled image set (hereinafter the second implementation), feature extraction is performed on the labeled image set to obtain a first intermediate feature data set. A support vector machine (SVM) is trained with the first intermediate feature data set as training data and the labels of the labeled set as the supervision information of the first intermediate feature data set, yielding a trained SVM. Feature extraction is performed on the unlabeled image set to obtain a second intermediate feature data set, which the trained SVM processes to obtain labels for the second intermediate feature data set, used as the labels of the unlabeled image set. For example, the labeled set includes image a with label A and image b with label B, and the unlabeled set includes image c. Feature extraction on images a and b yields their feature data, which is used to train the SVM; feature extraction on image c yields its feature data, which the trained SVM processes to obtain a target processing result, used as the label of image c.
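The SVM variant can be sketched with scikit-learn (an assumption; the text names no library). The feature vectors below stand in for the first and second intermediate feature data sets that a feature extractor would produce:

```python
import numpy as np
from sklearn.svm import SVC

# features extracted from the labeled image set (first intermediate feature data set)
feat_labeled = np.array([[0.0], [0.2], [5.0], [5.2]])
labels = np.array(["A", "A", "B", "B"])

svm = SVC(kernel="linear")
svm.fit(feat_labeled, labels)        # train the SVM with the labels as supervision

# features extracted from the unlabeled image set (second intermediate feature data set)
feat_unlabeled = np.array([[0.1], [5.1]])
pseudo = svm.predict(feat_unlabeled)  # pseudo-labels for the unlabeled set
print(pseudo.tolist())  # ['A', 'B']
```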
In yet another implementation of determining the labels of the unlabeled image set based on the labeled image set (hereinafter the third implementation), the unlabeled image set is clustered to obtain at least one cluster, each cluster containing at least one image. The labeled image set is partitioned by label into at least one image set, each containing at least one image, with all images in an image set carrying the same label. For each cluster, the image set with the largest similarity to it is determined as the maximum-similarity image set, and the label of the maximum-similarity image set is taken as the label of the cluster, i.e., of the data in the cluster. For example, the labeled set includes images a, b, and c with labels A, A, and B, and the unlabeled set includes images d, e, and f. Clustering the unlabeled set yields a first cluster {d, e} and a second cluster {f}. Partitioning the labeled set by label yields a first image set {a, b} with label A and a second image set {c} with label B. Let the similarity between the first cluster and the first image set be s1, between the first cluster and the second image set s2, between the second cluster and the first image set s3, and between the second cluster and the second image set s4. If s1 > s2, the first cluster's maximum-similarity set is the first image set and its label is A, so images d and e are both labeled A; if s1 < s2, they are both labeled B. Likewise, if s3 > s4, the second cluster's label is A and image f is labeled A; if s3 < s4, image f is labeled B.

In one implementation of determining the similarity between the first cluster and the first image set, suppose the centroid of the first cluster is image A and the centroid of the first image set is image B; the similarity between image A and image B is taken as the similarity between the first cluster and the first image set.

In another implementation, the similarity between every image in the first cluster and every image in the first image set is determined, yielding a similarity set, and the maximum value of that set is taken as the similarity between the first cluster and the first image set.

In yet another implementation, the minimum or average value of that similarity set is taken as the similarity between the first cluster and the first image set.

Similarly, the similarities between the first cluster and the second image set, between the second cluster and the first image set, and between the second cluster and the second image set can be determined in the same ways.
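The clustering variant, using the centroid-to-centroid option among the similarity definitions described above (negative Euclidean distance as the similarity is an assumption), can be sketched as:

```python
import numpy as np

def label_clusters(clusters, label_groups):
    """clusters: list of arrays of unlabeled features; label_groups: dict
    mapping a label to the array of labeled features carrying that label.
    Each cluster receives the label of its maximum-similarity image set."""
    result = {}
    for i, cluster in enumerate(clusters):
        c = cluster.mean(axis=0)                      # cluster centroid
        sims = {lab: -np.linalg.norm(c - g.mean(axis=0))
                for lab, g in label_groups.items()}   # centroid-to-centroid similarity
        result[i] = max(sims, key=sims.get)
    return result

clusters = [np.array([[0.0, 0.1], [0.1, 0.0]]), np.array([[4.0, 4.0]])]
groups = {"A": np.array([[0.0, 0.0]]), "B": np.array([[4.1, 3.9]])}
print(label_clusters(clusters, groups))  # {0: 'A', 1: 'B'}
```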
In the embodiments of the present disclosure, training the neural network with the unlabeled and labeled image sets as training data allows the labels of the unlabeled set to be determined from the labeled set, reducing the labor cost of labeling the unlabeled set and improving labeling efficiency. Training the network with the labeled set, the unlabeled set, and the labels of the unlabeled set lets the network learn information about the second acquisition condition during training, improving the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a training method for the image processing neural network provided by an embodiment of the present disclosure. The execution subject of this embodiment may or may not be the image processing apparatus; that is, the subject that trains the image processing neural network and the subject that uses it to process the image to be processed may be the same or different, and the embodiments of the present disclosure do not limit this. For convenience, the execution subject of this embodiment is referred to below as the training apparatus, which may be any of: a mobile phone, a computer, a tablet, a server, or a processor.

201. Acquire the unlabeled image set, the labeled image set, and a first neural network to be trained.

For the way the training apparatus acquires the unlabeled image set and the labeled image set, refer to the ways the image processing apparatus acquires images in step 101; details are not repeated here.

In the embodiments of the present disclosure, the first neural network to be trained is an arbitrary neural network; for example, it may be a stack of at least one of: convolutional layers, pooling layers, normalization layers, fully connected layers, downsampling layers, upsampling layers, and classifiers. The embodiments of the present disclosure do not limit its structure.

In one implementation of acquiring the first neural network to be trained, the training apparatus receives it from a user through an input component, where the input component includes a keyboard, a mouse, a touch screen, a touch pad, an audio input device, and the like.

In another implementation, the training apparatus receives the first neural network to be trained from a second terminal, which may be any of: a mobile phone, a computer, a tablet, a server, or a wearable device.

In yet another implementation, the training apparatus may obtain a pre-stored first neural network to be trained from its own storage component.

202. Obtain labels of the unlabeled image set based on the labeled image set.
As described in step 102, the labels of the unlabeled image set can be obtained based on the labeled image set. This step adopts the first implementation in step 102: the labeled image set is used as training data to train the first neural network to be trained, yielding a second neural network to be trained, which then processes the unlabeled image set to obtain its labels.

203. Train the first neural network to be trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled image set as its supervision information, to obtain the image processing neural network.

After the labels of the unlabeled image set are obtained, the unlabeled set can be used as training data to train the first neural network to be trained.

The factors affecting a network's training effect also include the amount of training data: the more training data, the better the training effect. Therefore, in the embodiments of the present disclosure, during training of the first neural network to be trained, both the labeled and unlabeled sets are used as training data, with the labels of the unlabeled set as its supervision information, to improve the training effect; this improves the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.

For example, the labeled set includes image a with label A, and the unlabeled set includes image b, whose label is determined to be B in step 202. The first neural network to be trained processes image a to obtain a first intermediate result; the difference between the first intermediate result and A gives a first intermediate difference; the loss of the first neural network to be trained is determined from the first intermediate difference, and adjusting its parameters based on that loss yields a third neural network to be trained. The third network processes image b to obtain a second intermediate result; the difference between the second intermediate result and B gives a second intermediate difference; the loss of the third network is determined from the second intermediate difference, and adjusting its parameters based on that loss yields the image processing neural network.

The factors affecting a network's training effect also include the number of training epochs: the more epochs, the better the effect. Here, after the unlabeled set is processed to obtain its labels, the second neural network to be trained is trained with both sets as training data and the labels of the unlabeled set as its supervision information, increasing the number of epochs and improving the training effect, and thus the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.

During training, one epoch is completed when the network has processed all the training data. For example, the training data includes images a and b. In the first iteration, the network processes image a to obtain a result for image a; the network's loss is obtained from this result and image a's label, and its parameters are adjusted based on that loss, yielding the network after the first iteration. In the second iteration, the network after the first iteration processes image b; its loss is obtained from the result and image b's label, and its parameters are adjusted, yielding the network after the second iteration. In the third iteration, the network after the second iteration processes image a again; its loss is obtained from the result and image a's label, and its parameters are adjusted, yielding the network after the third iteration. The first epoch consists of the first and second iterations; the third iteration belongs to the second epoch.
In one possible implementation, the second neural network to be trained processes the labeled image set to obtain a first result, and processes the unlabeled image set to obtain a second result. A first difference is obtained from the difference between the first result and the labels of the labeled set, and a second difference from the difference between the second result and the labels of the unlabeled set. The loss of the second neural network to be trained is obtained from the first and second differences. Since the second network was obtained by training the first network on the labeled set, its number of completed training epochs is larger than that of the first network. Therefore, after the labels of the unlabeled set are obtained, training the second network with both sets as training data and the supervision information of the unlabeled set works better than training the first network in the same way.

In one implementation of obtaining the loss of the second neural network to be trained from the first and second differences, the first-iteration loss of the second network is determined from the first difference, and its parameters are adjusted based on that loss, yielding the second network after the first iteration. The second-iteration loss is then determined from the second difference, and the parameters of the network after the first iteration are adjusted based on it, yielding the image processing neural network.

In another implementation, the loss of the second neural network to be trained may be obtained as a weighted sum of the first and second differences, possibly plus a constant. For example, since the acquisition condition of the unlabeled set matches that of the image to be processed, the weight of the second difference can be made larger than that of the first difference, so that the image processing neural network learns more information about the second acquisition condition during training, improving the accuracy of the results obtained when the trained network processes the image to be processed.

Since the similarity between images of the same category should be greater than the similarity between images of different categories, accuracy drops if, during classification, two images with low similarity are given the same label while two images with high similarity are given different labels. For example, the similarity between images a and b is s1 and between images a and c is s2, with s1 < s2. If, while processing images a, b, and c, the network determines the labels of a and b to be the same but the labels of a and c to be different, the resulting processing result is wrong.
To further improve the accuracy of the processing results, as an optional embodiment, the following steps may be executed before the step "obtain the loss of the second neural network to be trained according to the first difference and the second difference":

21. Determine the similarity between a first image in the training image set and a second image in the training image set to obtain a first similarity, and determine the similarity between the first image and a third image in the training image set to obtain a second similarity.

In this step, the training image set includes the labeled and unlabeled image sets. The label of the first image is the same as that of the second image, i.e., they belong to the same category; the label of the first image differs from that of the third image, i.e., they belong to different categories. The similarity between the first and second images is taken as the first similarity, and the similarity between the first and third images as the second similarity.

In the embodiments of the present disclosure, the similarity between two images may be one of: the Euclidean distance between them, their cosine similarity, the Mahalanobis distance between them, their Pearson correlation coefficient, or the Hamming distance between them.

22. Obtain a triplet loss according to the difference between the first similarity and the second similarity.

Since the first similarity is between images of the same class and the second similarity is between images of different classes, the first similarity should exceed the second; the triplet loss can therefore be obtained from the difference between the first and second similarities.
In one possible implementation, assuming the first similarity is s1, the second similarity is s2, and the triplet loss is Lt, then s1, s2, and Lt satisfy:

Lt = max(s2 − s1 + m, 0) … formula (1)

where m is a positive number. (The original renders this formula as an image; the standard margin form is reconstructed here.)

In another possible implementation, s1, s2, and Lt satisfy formula (2), where k and n are both positive numbers. (Formula image not reproduced.)

In yet another possible implementation, s1, s2, and Lt satisfy formula (3), where k and n are both positive numbers. (Formula image not reproduced.)
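A margin-form triplet loss over the two similarities — one common reading of formulas (1)–(3), whose images are not reproduced in the text, so the exact form is an assumption — can be sketched as:

```python
def triplet_loss(s1, s2, m=0.3):
    # Penalize whenever the out-of-class similarity s2 plus margin m
    # exceeds the in-class similarity s1; zero loss otherwise.
    return max(s2 - s1 + m, 0.0)

print(triplet_loss(0.9, 0.2))             # 0.0 (s1 comfortably exceeds s2)
print(round(triplet_loss(0.4, 0.5), 6))   # 0.4 (margin violated)
```

Minimizing this pushes same-category similarities above different-category similarities by at least the margin m.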
23. The step "obtain the loss of the second neural network to be trained according to the first difference and the second difference" includes the following steps:

1. Obtain a category loss according to the first difference and the second difference.

For the implementation of this step, refer to the implementation of "obtain the loss of the second neural network to be trained according to the first difference and the second difference" in step 203. It should be understood that in this step, the loss obtained from the first and second differences is not the loss of the second neural network to be trained, but the category loss.

2. Obtain the loss of the second neural network to be trained according to the category loss and the triplet loss.

In one possible implementation, assuming the loss of the second neural network to be trained is L, the category loss is Lc, and the triplet loss is Lt, then L, Lc, and Lt satisfy:

L = k1·Lc + k2·Lt … formula (4)

where k1 and k2 are both positive numbers less than or equal to 1.

In another possible implementation, L, Lc, and Lt satisfy formula (5), where k1 and k2 are both positive numbers less than or equal to 1. (Formula image not reproduced.)

In yet another possible implementation, L, Lc, and Lt satisfy:

L = (k1·Lc + k2·Lt)² … formula (6)

where k1 and k2 are both positive numbers less than or equal to 1.
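The weighted-sum combination of the category loss and triplet loss, as in formula (4), is straightforward (the weights 0.7 and 0.3 are illustrative, not from the text):

```python
def total_loss(l_class, l_triplet, k1=0.7, k2=0.3):
    # Weighted sum of the category loss and the triplet loss (formula (4));
    # k1 and k2 are positive weights no greater than 1.
    return k1 * l_class + k2 * l_triplet

print(round(total_loss(0.8, 0.4), 6))  # 0.68
```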
Because even images of the same category differ in similarity to one another, determining the first similarity from the first and second images and the second similarity from the first and third images, and making the first similarity larger than the second, can still leave errors. For example, the training image set includes images a, b, c, d, and e, where images a, b, and e have category A and images c and d have category B. The similarity between a and b is s1, between a and c is s2, between a and d is s3, and between a and e is s4. With image a as the first image, image b as the second image, and image c as the third image, s1 is the first similarity and s2 the second similarity. Training can make s1 greater than s2, but cannot guarantee s1 > s3, s4 > s2, or s4 > s3 — and s1 < s3, s4 < s2, and s4 < s3 are all errors.

Such errors degrade training and thereby reduce the accuracy of the processing results. The embodiments of the present disclosure therefore provide a way to determine the first, second, and third images that reduces the probability of these errors and improves accuracy.

As an optional embodiment, the following step may be executed before step 21:

Determine the hardest in-class image of the first image as the second image, and the hardest out-of-class image of the first image as the third image.

In the embodiments of the present disclosure, the hardest in-class image pair is the pair of identically labeled images with the smallest similarity, and the hardest out-of-class image pair is the pair of differently labeled images with the largest similarity. If the hardest in-class pair contains images a and b, then image b is the hardest in-class image of image a, and image a is the hardest in-class image of image b. If the hardest out-of-class pair contains images c and d, then image c is the hardest out-of-class image of image d, and image d is the hardest out-of-class image of image c.

For example, suppose images 1, 2, and 3 share a category, image 1's category differs from those of images 4 and 5, the similarity between images 1 and 2 is smaller than that between images 1 and 3, and the similarity between images 1 and 4 is smaller than that between images 1 and 5. With image 1 as the first image, the hardest in-class pair includes images 1 and 2 and the hardest out-of-class pair includes images 1 and 5; image 2 is the hardest in-class image of image 1 and image 5 is its hardest out-of-class image, i.e., image 2 is the second image and image 5 the third image.

By taking the hardest in-class image of the first image as the second image and its hardest out-of-class image as the third image, determining the first similarity from the first and second images and the second similarity from the first and third images, and determining the loss of the second neural network to be trained based on the difference between the two similarities, the second network's ability to distinguish image categories improves during training.
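Hardest in-class / out-of-class selection for one anchor can be sketched as follows (the similarity row is assumed precomputed and to exclude the anchor itself):

```python
import numpy as np

def hardest_pair(sim_row, labels, anchor_label):
    # sim_row[i]: similarity between the anchor and training image i.
    same = labels == anchor_label
    pos = np.where(same)[0]
    neg = np.where(~same)[0]
    hardest_pos = pos[sim_row[pos].argmin()]   # same label, smallest similarity
    hardest_neg = neg[sim_row[neg].argmax()]   # different label, largest similarity
    return int(hardest_pos), int(hardest_neg)

sims = np.array([0.9, 0.6, 0.7, 0.8])
labels = np.array(["A", "A", "B", "B"])
print(hardest_pair(sims, labels, "A"))  # (1, 3)
```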
To further increase the number of training images whose acquisition condition is the second acquisition condition, and thereby improve the training effect of the second neural network to be trained, in some embodiments data enhancement processing may be performed on the unlabeled image set before it is input to the second network, yielding an enhanced image set; the enhanced set and the unlabeled set are then both used as training data for the second network. This expands the training data of the second neural network to be trained.

Since the enhanced image set has the same labels as the unlabeled image set, the result of processing the unlabeled and enhanced sets with the second network is taken as the second result, and the second difference is obtained from the difference between the second result and the labels of the unlabeled set.

For example (example 2), the unlabeled set includes image a with label A and image b with label B. Data enhancement on image a yields image c, and on image b yields image d, so image c's label is A and image d's label is B. The second network processes the unlabeled and enhanced sets, and the obtained second result includes results a, b, c, and d, obtained by processing images a, b, c, and d with the second network, respectively.

In some embodiments of the present disclosure, the data enhancement processing includes at least one of: rotation, erasing, cropping, and blurring.

Rotation rotates the image about its geometric center by a reference angle, which can be adjusted according to the user's needs. Erasing removes the image content in an arbitrary pixel region, e.g., by setting the pixel values in that region to 0. Cropping cuts out an image of a predetermined size from the image, where the predetermined size can be adjusted according to the user's needs. Blurring makes at least part of the image content blurry.
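Minimal numpy versions of the four enhancement operations just described (toy implementations; a real pipeline would use an image-processing library, and the region sizes and angles here are illustrative):

```python
import numpy as np

def augment(img, mode):
    if mode == "rotate":   # rotate 90 degrees about the center
        return np.rot90(img)
    if mode == "erase":    # remove content in a pixel region by zeroing it
        out = img.copy()
        out[0:2, 0:2] = 0
        return out
    if mode == "crop":     # cut out a sub-image of predetermined size
        return img[1:3, 1:3]
    if mode == "blur":     # 2x2 box average as a crude blur
        return (img[:-1, :-1] + img[1:, :-1] + img[:-1, 1:] + img[1:, 1:]) / 4
    raise ValueError(mode)

img = np.arange(16.0).reshape(4, 4)
print(augment(img, "crop").shape)           # (2, 2)
print(float(augment(img, "rotate")[0, 0]))  # 3.0
```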
In the training method for the image processing neural network provided by the embodiments of the present disclosure, the image set acquired under the second acquisition condition is labeled based on the image set acquired under the first acquisition condition, saving labor cost and improving labeling efficiency. Training the first neural network to be trained with the image sets acquired under the first and second acquisition conditions yields the image processing neural network, which processes images acquired under the second acquisition condition with high accuracy. For any acquisition condition, an adapted image processing neural network can be obtained based on the technical solutions of the present disclosure, where a network adapted to an acquisition condition means one that processes images acquired under that condition with highly accurate results.

Based on the technical solutions provided by the embodiments of the present disclosure, several possible application scenarios are also provided.

Scenario 1: With growing security awareness among governments, enterprises, and individuals and the spread of intelligent hardware, more and more access control devices with face recognition are put into practical use. In some embodiments, an access control device captures a visitor's face image through a camera as an image to be recognized and processes it with a neural network to determine the visitor's identity. However, the acquisition conditions under which access control devices capture images to be recognized differ across application scenarios, so effectively improving the recognition accuracy of access control devices in different scenarios is of great significance.

For example, Company A's gate has used access control device a for some time, and the company has newly installed access control device b inside its office building; that is, device a is installed outdoors and device b indoors. Clearly, the acquisition condition under which device a captures images differs from that of device b, and this difference lowers the recognition accuracy of the image processing neural network. Suppose device a uses a first neural network trained on face images of Company A's employees captured by device a (hereinafter image set c); note that the images in set c all carry labels, and the labels include the identity of the person in the image (e.g., Zhang San, Li Si, Wang Wu). Applying the first neural network to device b yields low recognition accuracy.

To improve device b's recognition accuracy, Company A's administrators can capture face images of Company A's employees through device b (hereinafter image set d) and process set d with the first neural network to obtain its labels. Training the first neural network with sets c and d as training data and set d's labels as set d's supervision information yields a second neural network; deploying the second neural network on device b improves device b's recognition accuracy.

Scenario 2: As the number of cameras in public places grows rapidly, effectively determining the attributes of people in massive video streams, and determining people's whereabouts from those attributes, is of great significance.

At location B, a server has a communication connection with a surveillance camera in an airport waiting hall (hereinafter surveillance camera e), through which the server obtains the video stream captured by camera e (hereinafter the first video stream) and processes the images in it with a third neural network to obtain the attributes of the people in the first video stream. The third neural network was trained on person-containing images captured by camera e (hereinafter image set f); note that the images in set f all carry labels, and the labels include person attributes, including at least one of: top color, trouser color, trouser length, hat style, shoe color, whether an umbrella is carried, bag type, whether a mask is worn, hairstyle, and gender. For example, the first video stream includes images g and h. Processing the first video stream with the third network determines that the attributes of the person in image g include: white top, black trousers, no glasses, short hair, woman; and that the attributes of the person in image h include: white top, black trousers, white shoes, glasses, mask, umbrella in hand, short hair, man.

The relevant law enforcement officers at location B have newly installed surveillance camera i at an intersection to obtain the attributes of pedestrians there. Since the waiting-hall environment differs from the intersection environment, using the third neural network on camera i's images yields pedestrian attributes of low accuracy.

Based on the technical solutions provided by the embodiments of the present disclosure, the law enforcement officers at location B can capture pedestrian-containing images through camera i (hereinafter image set j) and process set j with the third neural network to obtain its labels. Training the third neural network with sets f and j as training data and set j's labels as set j's supervision information yields a fourth neural network; processing the second video stream captured by camera i with the fourth neural network improves the accuracy of the obtained pedestrian attributes.

Scenario 3: With ever more vehicles on the road, preventing road traffic accidents draws increasing attention; human factors account for a large share of accident causes, including distracted driving caused by lapses or declines in driver attention. Effectively monitoring whether a driver is driving distracted is therefore of great significance.

A vehicle terminal captures images containing the driver's face through a camera installed on the vehicle and processes those images with a neural network to determine whether the driver is driving distracted.

Company C is a supplier of driver-attention monitoring solutions. Using images containing drivers' faces captured by the camera of Company D's vehicle model k (hereinafter image set m), Company C trains a fifth neural network to obtain a sixth neural network; note that the images in set m all carry labels indicating distracted driving or non-distracted driving. With the sixth network deployed on model k, model k's vehicle terminal can use it to determine whether the driver is driving distracted.

Company D has now produced a new vehicle model (hereinafter model n) and hopes Company C will provide a driver-attention monitoring solution for model n. Since the camera installed on model k (hereinafter camera p) differs from the camera installed on model n, and since the interior environments of models k and n differ, deploying the sixth neural network on model n yields monitoring results (distracted driving or non-distracted driving) of low accuracy.

Based on the technical solutions provided by the embodiments of the present disclosure, Company C's staff can capture images containing drivers' faces through camera p (hereinafter image set q) and process set q with the sixth neural network to obtain its labels. Training the sixth neural network with sets m and q as training data and set q's labels as set q's supervision information yields a seventh neural network; deployed on model n, model n's vehicle terminal uses the seventh network to determine whether the driver is driving distracted, with highly accurate monitoring results.
Those skilled in the art can understand that, in the above methods of the detailed description, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

The methods of the embodiments of the present disclosure are detailed above; the apparatuses of the embodiments of the present disclosure are provided below.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. The apparatus 1 includes an acquisition part 11 and a processing part 12, where:

the acquisition part 11 is configured to acquire an image to be processed;

the processing part 12 is configured to process the image to be processed using an image processing neural network to obtain a processing result of the image to be processed; the image processing neural network is trained with an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as that of the image to be processed; the acquisition condition of the labeled image set differs from that of the unlabeled image set.

In combination with any embodiment of the present disclosure, the acquisition part 11 is further configured to acquire the unlabeled image set, the labeled image set, and a first neural network to be trained;

the processing part 12 is further configured to obtain labels of the unlabeled image set based on the labeled image set;

the apparatus 1 further includes:

a training part 13, configured to train the first neural network to be trained with the labeled and unlabeled image sets as training data and the labels of the unlabeled image set as its supervision information, to obtain the image processing neural network.
In combination with any embodiment of the present disclosure, the processing part 12 is further configured to:

train the first neural network to be trained with the labeled image set as training data, to obtain a second neural network to be trained;

process the unlabeled image set with the second neural network to be trained, to obtain the labels of the unlabeled image set.

In combination with any embodiment of the present disclosure, the processing part 12 is further configured to:

process the labeled image set with the second neural network to be trained to obtain a first result, and process the unlabeled image set with the second neural network to be trained to obtain a second result;

obtain a first difference according to the difference between the first result and the labels of the labeled image set, and a second difference according to the difference between the second result and the labels of the unlabeled image set;

obtain a loss of the second neural network to be trained according to the first difference and the second difference;

adjust parameters of the second neural network to be trained based on its loss, to obtain the image processing neural network.

In combination with any embodiment of the present disclosure, the labels of the labeled image set and the labels of the unlabeled images both carry category information;

the apparatus 1 further includes: a first determining part 14, configured to, before the loss of the second neural network to be trained is obtained according to the first and second differences, determine the similarity between a first image in the training image set and a second image in the training image set to obtain a first similarity, and the similarity between the first image and a third image in the training image set to obtain a second similarity; the training image set includes the labeled and unlabeled image sets; the category of the first image is the same as that of the second image and differs from that of the third image;

a second determining part 15, configured to obtain a triplet loss according to the difference between the first similarity and the second similarity;

the processing part 12 is further configured to obtain a category loss according to the first difference and the second difference;

and to obtain the loss of the second neural network to be trained according to the category loss and the triplet loss.

In combination with any embodiment of the present disclosure, the apparatus 1 further includes:

a third determining part 16, configured to, before the first and second similarities are determined, determine the hardest in-class image of the first image as the second image and the hardest out-of-class image of the first image as the third image; the hardest in-class image is the image in the in-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in the out-of-class image set with the largest similarity to the first image; the in-class image set includes images whose labels are the same as that of the first image; the out-of-class image set includes images whose labels differ from that of the first image.

In combination with any embodiment of the present disclosure, the apparatus 1 further includes:

a data enhancement processing part 17, configured to perform data enhancement processing on the unlabeled image set, before the second result is obtained by processing the unlabeled image set with the second neural network to be trained, to obtain an enhanced image set;

the processing part 12 is configured to:

process the enhanced image set and the unlabeled image set with the second neural network to be trained, to obtain the second result.

In combination with any embodiment of the present disclosure, the data enhancement processing includes at least one of: rotation, erasing, cropping, and blurring.

In combination with any embodiment of the present disclosure, the acquisition condition of an image includes parameters of the imaging device that acquires the image.
In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.

In the embodiments of the present disclosure, training the neural network with the unlabeled and labeled image sets as training data allows the labels of the unlabeled set to be determined from the labeled set, reducing the labor cost of labeling the unlabeled image set and improving labeling efficiency. Training the network with the labeled set, the unlabeled set, and the labels of the unlabeled set lets the network learn information about the second acquisition condition during training, improving the accuracy of the results obtained when the trained image processing neural network processes the image to be processed.
In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for their implementation, refer to the description of those embodiments, which for brevity is not repeated here.

FIG. 4 is a schematic diagram of the hardware structure of an image processing apparatus provided by an embodiment of the present disclosure. The image processing apparatus 2 includes a processor 21, a memory 22, an input device 23, and an output device 24. The processor 21, memory 22, input device 23, and output device 24 are coupled through connectors, which include various interfaces, transmission lines, or buses; the embodiments of the present disclosure do not limit this. It should be understood that in the various embodiments of the present disclosure, coupling refers to interconnection in a specific way, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, or buses.

The processor 21 may be one or more graphics processing units (GPUs); when the processor 21 is one GPU, it may be a single-core or multi-core GPU. In some embodiments, the processor 21 may be a processor group composed of multiple GPUs coupled to each other through one or more buses. In some embodiments, the processor may also be another type of processor, which the embodiments of the present disclosure do not limit.

The memory 22 may be used to store computer program instructions and various kinds of computer program code, including program code for executing the solutions of the present disclosure. Optionally, the memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for related instructions and data.

The input device 23 is used to input data and/or signals, and the output device 24 is used to output data and/or signals; they may be independent devices or one integrated device.

It is understood that in the embodiments of the present disclosure, the memory 22 may store not only related instructions but also related data; for example, the memory 22 may store the image to be processed obtained through the input device 23, or the processing results obtained by the processor 21; the embodiments of the present disclosure do not limit the data stored in the memory.

It can be understood that FIG. 4 shows a simplified design of an image processing apparatus. In practical applications, the image processing apparatus may also include other necessary elements, including but not limited to any number of input/output devices, processors, and memories; all image processing apparatuses that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present disclosure.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those skilled in the art can also clearly understand that the embodiments of the present disclosure each have their own emphasis; for convenience and brevity, identical or similar parts may not be repeated in different embodiments, so parts not described, or not described in detail, in one embodiment may refer to the descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The aforementioned storage medium includes various media capable of storing program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, or optical discs.
Industrial Applicability
In the embodiments of the present disclosure, a neural network is trained using an unlabeled image set and a labeled image set as training data, and labels for the unlabeled image set can be determined based on the labeled image set, thereby reducing the labor cost of annotating the unlabeled image set and improving annotation efficiency. By training the neural network with the labeled image set, the unlabeled image set, and the labels of the unlabeled image set, the network can learn information about the second acquisition condition during training, which improves the accuracy of the processing result obtained when the trained image processing neural network processes the image to be processed.

Claims (18)

  1. An image processing method, the method comprising:
    acquiring an image to be processed;
    processing the image to be processed using an image processing neural network to obtain a processing result of the image to be processed; the image processing neural network is trained using an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed; the acquisition condition of the labeled image set is different from the acquisition condition of the unlabeled image set.
  2. The method according to claim 1, wherein the method further comprises:
    acquiring the unlabeled image set, the labeled image set, and a first neural network to be trained;
    obtaining labels of the unlabeled image set based on the labeled image set;
    training the first neural network to be trained using the labeled image set and the unlabeled image set as training data and the labels of the unlabeled image set as supervision information for the unlabeled image set, to obtain the image processing neural network.
  3. The method according to claim 2, wherein the obtaining the labels of the unlabeled image set based on the labeled image set comprises:
    training the first neural network to be trained using the labeled image set as training data, to obtain a second neural network to be trained;
    processing the unlabeled image set using the second neural network to be trained, to obtain the labels of the unlabeled image set.
  4. The method according to claim 3, wherein the training the first neural network to be trained using the labeled image set and the unlabeled image set as training data and the labels of the unlabeled image set as supervision information for the unlabeled image set, to obtain the image processing neural network, comprises:
    processing the labeled image set using the second neural network to be trained to obtain a first result, and processing the unlabeled image set using the second neural network to be trained to obtain a second result;
    obtaining a first difference according to the difference between the first result and the labels of the labeled image set, and obtaining a second difference according to the difference between the second result and the labels of the unlabeled image set;
    obtaining a loss of the second neural network to be trained according to the first difference and the second difference;
    adjusting parameters of the second neural network to be trained based on the loss of the second neural network to be trained, to obtain the image processing neural network.
  5. The method according to claim 4, wherein both the labels of the labeled image set and the labels of the unlabeled image set carry category information;
    before the obtaining the loss of the second neural network to be trained according to the first difference and the second difference, the method further comprises:
    determining a similarity between a first image in a training image set and a second image in the training image set to obtain a first similarity, and determining a similarity between the first image in the training image set and a third image in the training image set to obtain a second similarity; the training image set comprises the labeled image set and the unlabeled image set; the category of the first image is the same as the category of the second image, and the category of the first image is different from the category of the third image;
    obtaining a triplet loss according to a difference between the first similarity and the second similarity;
    the obtaining the loss of the second neural network to be trained according to the first difference and the second difference comprises:
    obtaining a category loss according to the first difference and the second difference;
    obtaining the loss of the second neural network to be trained according to the category loss and the triplet loss.
  6. The method according to claim 5, wherein before the determining the similarity between the first image in the training image set and the second image in the training image set to obtain the first similarity, and the determining the similarity between the first image in the training image set and the third image in the training image set to obtain the second similarity, the method further comprises:
    determining the hardest intra-class image of the first image as the second image, and determining the hardest out-of-class image of the first image as the third image; the hardest intra-class image is the image in an intra-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in an out-of-class image set with the largest similarity to the first image; the intra-class image set comprises images whose labels are the same as the label of the first image; the out-of-class image set comprises images whose labels are different from the label of the first image.
  7. The method according to any one of claims 4 to 6, wherein before the processing the unlabeled image set using the second neural network to be trained to obtain the second result, the method further comprises:
    performing data augmentation on the unlabeled image set to obtain an augmented image set;
    the processing the unlabeled image set using the second neural network to be trained to obtain the second result comprises:
    processing the augmented image set and the unlabeled image set using the second neural network to be trained to obtain the second result.
  8. An image processing apparatus, the apparatus comprising:
    an acquisition part, configured to acquire an image to be processed;
    a processing part, configured to process the image to be processed using an image processing neural network to obtain a processing result of the image to be processed; the image processing neural network is trained using an unlabeled image set and a labeled image set as training data; the acquisition condition of the unlabeled image set is the same as the acquisition condition of the image to be processed; the acquisition condition of the labeled image set is different from the acquisition condition of the unlabeled image set.
  9. The image processing apparatus according to claim 8, wherein
    the acquisition part is further configured to acquire the unlabeled image set, the labeled image set, and a first neural network to be trained;
    the processing part is further configured to obtain labels of the unlabeled image set based on the labeled image set;
    the apparatus further comprises a training part, configured to train the first neural network to be trained using the labeled image set and the unlabeled image set as training data and the labels of the unlabeled image set as supervision information for the unlabeled image set, to obtain the image processing neural network.
  10. The image processing apparatus according to claim 9, wherein the processing part is further configured to:
    train the first neural network to be trained using the labeled image set as training data, to obtain a second neural network to be trained;
    process the unlabeled image set using the second neural network to be trained, to obtain the labels of the unlabeled image set.
  11. The image processing apparatus according to claim 10, wherein the processing part is further configured to:
    process the labeled image set using the second neural network to be trained to obtain a first result, and process the unlabeled image set using the second neural network to be trained to obtain a second result;
    obtain a first difference according to the difference between the first result and the labels of the labeled image set, and obtain a second difference according to the difference between the second result and the labels of the unlabeled image set;
    obtain a loss of the second neural network to be trained according to the first difference and the second difference;
    adjust parameters of the second neural network to be trained based on the loss of the second neural network to be trained, to obtain the image processing neural network.
  12. The image processing apparatus according to claim 11, wherein both the labels of the labeled image set and the labels of the unlabeled image set carry category information;
    the apparatus further comprises a first determining part, configured to, before the loss of the second neural network to be trained is obtained according to the first difference and the second difference, determine a similarity between a first image in a training image set and a second image in the training image set to obtain a first similarity, and determine a similarity between the first image in the training image set and a third image in the training image set to obtain a second similarity; the training image set comprises the labeled image set and the unlabeled image set; the category of the first image is the same as the category of the second image, and the category of the first image is different from the category of the third image;
    a second determining part, configured to obtain a triplet loss according to a difference between the first similarity and the second similarity;
    the processing part is further configured to obtain a category loss according to the first difference and the second difference,
    and to obtain the loss of the second neural network to be trained according to the category loss and the triplet loss.
  13. The image processing apparatus according to claim 12, wherein the apparatus further comprises:
    a third determining part, configured to, before the similarity between the first image in the training image set and the second image in the training image set is determined to obtain the first similarity and the similarity between the first image in the training image set and the third image in the training image set is determined to obtain the second similarity, determine the hardest intra-class image of the first image as the second image, and determine the hardest out-of-class image of the first image as the third image; the hardest intra-class image is the image in an intra-class image set with the smallest similarity to the first image; the hardest out-of-class image is the image in an out-of-class image set with the largest similarity to the first image; the intra-class image set comprises images whose labels are the same as the label of the first image; the out-of-class image set comprises images whose labels are different from the label of the first image.
  14. The image processing apparatus according to any one of claims 10 to 12, wherein the apparatus further comprises:
    a data augmentation processing part, configured to, before the unlabeled image set is processed by the second neural network to be trained to obtain the second result, perform data augmentation on the unlabeled image set to obtain an augmented image set;
    the processing part is configured to process the augmented image set and the unlabeled image set using the second neural network to be trained to obtain the second result.
  15. A processor, configured to execute the method according to any one of claims 1 to 7.
  16. An electronic device, comprising a processor, a sending apparatus, an input apparatus, an output apparatus, and a memory, the memory being configured to store computer program code comprising computer instructions, wherein when the processor executes the computer instructions, the electronic device performs the method according to any one of claims 1 to 7.
  17. A computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 7.
  18. A computer program comprising computer-readable code which, when run in an electronic device and executed by a processor in the electronic device, implements the method according to any one of claims 1 to 7.
PCT/CN2021/079122 2020-04-07 2021-03-04 姿态检测及视频处理方法、装置、电子设备和存储介质 WO2021203882A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021564216A JP2022531763A (ja) 2020-04-07 2021-03-04 画像処理方法及び装置、プロセッサ、電子機器並びに記憶媒体
KR1020217034492A KR20210137213A (ko) 2020-04-07 2021-03-04 이미지 처리 방법 및 장치, 프로세서, 전자 기기, 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010264926.7 2020-04-07
CN202010264926.7A CN111598124B (zh) 2020-04-07 2020-04-07 图像处理及装置、处理器、电子设备、存储介质

Publications (1)

Publication Number Publication Date
WO2021203882A1 true WO2021203882A1 (zh) 2021-10-14

Family

ID=72185159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079122 WO2021203882A1 (zh) 2020-04-07 2021-03-04 姿态检测及视频处理方法、装置、电子设备和存储介质

Country Status (5)

Country Link
JP (1) JP2022531763A (zh)
KR (1) KR20210137213A (zh)
CN (1) CN111598124B (zh)
TW (1) TW202139062A (zh)
WO (1) WO2021203882A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598124B (zh) * 2020-04-07 2022-11-11 深圳市商汤科技有限公司 图像处理及装置、处理器、电子设备、存储介质
US20220147761A1 (en) * 2020-11-10 2022-05-12 Nec Laboratories America, Inc. Video domain adaptation via contrastive learning
CN112749652B (zh) * 2020-12-31 2024-02-20 浙江大华技术股份有限公司 身份信息确定的方法和装置、存储介质及电子设备
KR102403174B1 (ko) * 2021-12-21 2022-05-30 주식회사 인피닉 중요도에 따른 데이터 정제 방법 및 이를 실행시키기 위하여 기록매체에 기록된 컴퓨터 프로그램
CN114742828B (zh) * 2022-06-09 2022-10-14 武汉东方骏驰精密制造有限公司 基于机器视觉的工件定损智能分析方法及装置
TWI825980B (zh) * 2022-09-07 2023-12-11 英業達股份有限公司 記憶體內計算的模擬器的設定方法

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046196A (zh) * 2015-06-11 2015-11-11 西安电子科技大学 基于级联卷积神经网络的前车车辆信息结构化输出方法
US20160180151A1 (en) * 2014-12-17 2016-06-23 Google Inc. Generating numeric embeddings of images
CN106096538A (zh) * 2016-06-08 2016-11-09 中国科学院自动化研究所 基于定序神经网络模型的人脸识别方法及装置
CN106971556A (zh) * 2017-05-16 2017-07-21 中山大学 基于双网络结构的卡口车辆重识别方法
JP2019083002A (ja) * 2017-10-27 2019-05-30 アドビ インコーポレイテッド トリプレット損失ニューラル・ネットワーク・トレーニングを使用するフォント認識の改善
CN109902798A (zh) * 2018-05-31 2019-06-18 华为技术有限公司 深度神经网络的训练方法和装置
CN110532345A (zh) * 2019-07-15 2019-12-03 北京小米智能科技有限公司 一种未标注数据的处理方法、装置及存储介质
CN110647938A (zh) * 2019-09-24 2020-01-03 北京市商汤科技开发有限公司 图像处理方法及相关装置
CN110889463A (zh) * 2019-12-10 2020-03-17 北京奇艺世纪科技有限公司 一种样本标注方法、装置、服务器及机器可读存储介质
CN111598124A (zh) * 2020-04-07 2020-08-28 深圳市商汤科技有限公司 图像处理及装置、处理器、电子设备、存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318889B2 (en) * 2017-06-26 2019-06-11 Konica Minolta Laboratory U.S.A., Inc. Targeted data augmentation using neural style transfer
CN110188829B (zh) * 2019-05-31 2022-01-28 北京市商汤科技开发有限公司 神经网络的训练方法、目标识别的方法及相关产品
CN110472737B (zh) * 2019-08-15 2023-11-17 腾讯医疗健康(深圳)有限公司 神经网络模型的训练方法、装置和医学图像处理系统


Also Published As

Publication number Publication date
JP2022531763A (ja) 2022-07-11
TW202139062A (zh) 2021-10-16
CN111598124A (zh) 2020-08-28
CN111598124B (zh) 2022-11-11
KR20210137213A (ko) 2021-11-17


Legal Events

ENP (Entry into the national phase): Ref document number 20217034492; Country of ref document: KR; Kind code: A
ENP (Entry into the national phase): Ref document number 2021564216; Country of ref document: JP; Kind code: A
121 (Ep: the EPO has been informed by WIPO that EP was designated in this application): Ref document number 21785629; Country of ref document: EP; Kind code: A1
NENP (Non-entry into the national phase): Ref country code: DE
32PN (Ep: public notification in the EP bulletin as address of the addressee cannot be established): Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.01.2023)
122 (Ep: PCT application non-entry in European phase): Ref document number 21785629; Country of ref document: EP; Kind code: A1