WO2023093151A1 - Image screening method and device, electronic equipment and storage medium - Google Patents
Image screening method and device, electronic equipment and storage medium
- Publication number
- WO2023093151A1 (PCT/CN2022/114469)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- score
- image
- human body
- processed
- light
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 82
- 238000012216 screening Methods 0.000 title claims abstract description 40
- 230000008569 process Effects 0.000 claims abstract description 19
- 238000011156 evaluation Methods 0.000 claims description 121
- 238000012549 training Methods 0.000 claims description 97
- 230000000875 corresponding effect Effects 0.000 claims description 41
- 238000012545 processing Methods 0.000 claims description 40
- 238000000605 extraction Methods 0.000 claims description 26
- 238000004590 computer program Methods 0.000 claims description 18
- 238000002372 labelling Methods 0.000 claims description 18
- 238000013528 artificial neural network Methods 0.000 claims description 13
- 230000002596 correlated effect Effects 0.000 claims description 11
- 238000013507 mapping Methods 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 5
- 238000007781 pre-processing Methods 0.000 claims description 5
- 238000004422 calculation algorithm Methods 0.000 description 14
- 238000001514 detection method Methods 0.000 description 10
- 230000006870 function Effects 0.000 description 10
- 238000010586 diagram Methods 0.000 description 9
- 238000005516 engineering process Methods 0.000 description 8
- 238000001303 quality assessment method Methods 0.000 description 8
- 238000004891 communication Methods 0.000 description 5
- 230000003287 optical effect Effects 0.000 description 5
- 238000013135 deep learning Methods 0.000 description 4
- 230000002159 abnormal effect Effects 0.000 description 3
- 230000009471 action Effects 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000004821 distillation Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000013441 quality evaluation Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Definitions
- the present application relates to the technical field of image processing, in particular to an image screening method and device, electronic equipment and storage media.
- Person re-identification, also known as pedestrian re-identification, is a technology that uses computer vision to determine whether a specific pedestrian exists in an image or video sequence: given a pedestrian image, the same pedestrian is retrieved across devices. This technology aims to make up for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking technologies, and can be widely used in intelligent video detection, intelligent security and other fields.
- Embodiments of the present application provide an image screening method and device, electronic equipment, and a storage medium, which can accurately screen images meeting requirements for pedestrian re-identification.
- the first aspect of the embodiments of the present application provides an image screening method, including:
- Using the network model to process the human body image to be processed to obtain the quality score of the human body image to be processed; wherein the network model is used to extract image features of the human body image to be processed, determine, according to the image features, the blur score used to represent the degree of blur, the light score used to represent the brightness of the light, and the half-body score used to represent the completeness of the human body in the human body image to be processed, and determine the quality score of the human body image to be processed according to the blur score, the light score and the half-body score;
- Image screening is performed according to the quality score.
- The network model of the embodiments of the present application considers fuzzy recognition, light recognition and half-body recognition, and can effectively filter blurred human body images, poorly lit human body images and half-body human body images, thereby accurately screening images that meet the requirements of pedestrian re-identification and reducing the probability of misidentification by the pedestrian re-identification algorithm.
- The network model includes a feature extraction network, a fuzzy recognition network, a light recognition network, and a half-body recognition network; using the network model to process the human body image to be processed to obtain the quality score of the human body image to be processed includes:
- the quality score is calculated based on the blur score, the light score and the half-body score.
- The fuzzy recognition network, the light recognition network, and the half-body recognition network in the embodiments of the present application calculate their respective scores according to the image features of the human body image to be processed extracted by the feature extraction network, making full use of the image features extracted by the feature extraction network and improving the generalization ability of the network model.
- the calculating the fuzzy score according to the image features by using the fuzzy recognition network includes:
- Using the fuzzy recognition network to perform linear mapping and normalization processing on the image features, and predicting the quality prediction scores of the human body image to be processed on multiple fuzzy evaluation levels;
- the blur score is calculated according to quality prediction scores of the human body image to be processed on the plurality of blur evaluation levels.
- the blur score can be calculated according to the quality prediction scores of the human body image to be processed at multiple blur evaluation levels, and the image features can be converted into a quality score for measuring the blur degree of the image, which can accurately quantify the blur degree of the image.
- the calculating the fuzzy score according to the quality prediction scores of the human body image to be processed on the multiple fuzzy evaluation levels includes:
- the fuzzy score is calculated by the following formula:
- Q blur indicates the blur score
- P i-blur indicates the quality prediction score of the blur evaluation level
- w i-blur indicates the weight value corresponding to the blur evaluation level
- w i-blur is positively correlated with the image quality corresponding to the blur evaluation level.
- k is an integer greater than or equal to 1.
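- For example, the blur score may be computed as a weighted sum of the form Q blur = w 1-blur × P 1-blur + w 2-blur × P 2-blur + … + w k-blur × P k-blur over the k blur evaluation levels; the light score and the half-body score may be computed by analogous weighted sums over their respective evaluation levels.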
- The calculating of the light score according to the image features by using the light recognition network includes:
- the light score is calculated according to the quality prediction scores of the human body image to be processed on the multiple light evaluation levels.
- the light score can be calculated according to the quality prediction scores of the human body image to be processed at multiple light evaluation levels, and the image features can be converted into a quality score for measuring the light brightness of the image, which can accurately quantify the light brightness of the image.
- the calculating the light score according to the quality prediction scores of the human body image to be processed on the multiple light evaluation levels includes:
- Q light represents the light score
- P i-light represents the quality prediction score of the light evaluation level
- w i-light represents the weight value corresponding to the light evaluation level
- w i-light is positively correlated with the image quality corresponding to the light evaluation level.
- k is an integer greater than or equal to 1.
- The calculating of the half-body score according to the image features by using the half-body recognition network includes:
- using the half-body recognition network to linearly map and normalize the image features, and predicting the quality prediction scores of the human body image to be processed on multiple half-body evaluation levels;
- the half-body score is calculated according to the quality prediction scores of the human body image to be processed on the plurality of half-body evaluation levels.
- The half-body score can be calculated according to the quality prediction scores of the human body image to be processed on multiple half-body evaluation levels, and the image features are converted into a quality score for measuring the half-body degree of the image, which can accurately quantify the half-body degree of the image.
- the calculating the half-body score according to the quality prediction scores of the human body image to be processed on the multiple half-body evaluation levels includes:
- the half body score is calculated by the following formula:
- Q half indicates the half-body score
- P i-half indicates the quality prediction score of the half-body evaluation level
- w i-half indicates the weight value corresponding to the half-body evaluation level
- w i-half is positively correlated with the image quality corresponding to the half-body evaluation level.
- k is an integer greater than or equal to 1.
- The calculating of the quality score based on the blur score, the light score and the half-body score includes:
- the weighted summation formula includes:
- Q total represents the quality score of the human body image to be processed
- Q blur represents the blur score
- Q light represents the light score
- Q half represents the half body score
- w b represents the weight value of the blur score
- w l represents the weight value of the light score
- w h represents the weight value of the half body score.
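- For example, the weighted summation formula may take the form Q total = w b × Q blur + w l × Q light + w h × Q half.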
- The quality score of the human body image to be processed in the embodiments of the present application is calculated based on the fuzzy score, the light score and the half-body score, and the quality score of the human body image to be processed can be accurately calculated, so as to accurately measure, according to the quality score, whether the human body image to be processed meets the requirements of pedestrian re-identification.
- the feature extraction network includes a lightweight neural network.
- The feature extraction network is trained with a lightweight neural network, so that the accuracy and performance of the model are well balanced, which is conducive to deployment and use in actual products and can meet the quality filtering requirements for human body images under real-time processing of multi-channel video.
- the method also includes:
- Acquiring training data, the training data including training images and corresponding labeling results;
- Preprocessing the training images, inputting the preprocessed training images into the network model for multi-task learning training, and obtaining training results;
- The embodiments of the present application can quantify the quality of the human body image through the training method of multi-task learning, transform it into a multi-task, multi-classification problem in deep learning, and automatically fuse the multi-classification results to obtain the quality score of the human body image. The framework is simple and effective, with strong explanatory power.
- the acquisition of training data includes:
- a training image is acquired, and an annotation result corresponding to the training image is generated through an image quality assessment algorithm.
- Training data generation is simpler: more automatically generated training data can be obtained relatively easily, and the labeling cost is low.
- the acquisition of training data includes:
- Obtaining cropped third human body images; wherein, after the third human body image is input into a human body scribing model to obtain a scribing result, the cropped third human body images are obtained by performing different degrees of cropping based on the scribing result, and the third human body images that have been cropped to different degrees are marked with different half-body evaluation levels.
- the method also includes:
- If the quality score is greater than a first threshold, it is determined that the human body image to be processed meets the requirements for pedestrian re-identification;
- If the quality score is less than the first threshold, it is determined that the human body image to be processed does not meet the requirements for pedestrian re-identification.
- the method further includes:
- the second aspect of the embodiments of the present application provides an image screening device, including:
- An acquisition unit configured to acquire human body images to be processed
- An image processing unit configured to use a network model to process the human body image to be processed to obtain a quality score of the human body image to be processed;
- the network model is used to extract image features of the human body image to be processed, determine, according to the image features, the fuzzy score used to represent the degree of blur, the light score used to represent the brightness of the light, and the half-body score used to represent the completeness of the human body in the human body image to be processed, and determine the quality score of the human body image to be processed according to the fuzzy score, the light score and the half-body score;
- the image screening unit is configured to perform image screening according to the quality score.
- The third aspect of the embodiments of the present application provides an electronic device, including a processor and a memory, the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the steps described in the first aspect of the embodiments of the present application.
- A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program includes program instructions, and when the program instructions are executed by a processor, the processor is caused to execute part or all of the steps described in the first aspect of the embodiments of this application.
- A fifth aspect of the embodiments of the present application provides a computer program product, wherein the computer program product includes a computer program, the computer program includes program instructions, and when the program instructions are executed by a processor, the processor executes part or all of the steps described in the first aspect of the embodiments of the present application.
- the computer program product may be a software installation package.
- The image screening method of the embodiments of the present application can use the network model to process the human body image to be processed to obtain a comprehensive quality score of the human body image to be processed. Since the network model can extract the image features of the human body image to be processed, it can determine, according to the extracted image features, the blur score representing the degree of blur, the light score representing the brightness of the light, and the half-body score representing the completeness of the human body in the image, and then determine the quality score of the human body image to be processed according to the blur score, the light score and the half-body score.
- The network model considers fuzzy recognition, light recognition and half-body recognition, and can effectively filter blurred human body images, poorly lit human body images and half-body human body images, thereby accurately screening images that meet the requirements of pedestrian re-identification and reducing the probability of misidentification by the pedestrian re-identification algorithm.
- FIG. 1 is a schematic flow diagram of an image screening method provided in an embodiment of the present application
- FIG. 2 is a framework diagram of a network model provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of a training process of a network model provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of an image screening device provided in an embodiment of the present application.
- Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- The electronic devices involved in the embodiments of the present application may include various handheld devices with computing capabilities, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
- FIG. 1 is a schematic flowchart of an image screening method provided in an embodiment of the present application. As shown in Fig. 1, the image screening method may include the following steps.
- the electronic device acquires a human body image to be processed.
- the image screening method in the embodiment of the present application can be used to screen images that meet the requirements of pedestrian re-identification.
- Person re-identification, also known as pedestrian re-identification, is a technology that uses computer vision to determine whether a specific pedestrian exists in an image or video sequence: given a pedestrian image, the same pedestrian is retrieved across devices. This technology aims to make up for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking technologies, and can be widely used in intelligent video detection, intelligent security and other fields.
- the human body image to be processed may be a human body image (including an image of a human body) in a video or a picture captured by a camera.
- the image of the human body to be processed may be an image intercepted from a video captured by multiple video capture devices included in the video detection system (or intelligent security system).
- A half-body human body image will lead to the loss of part of the human body information, and it is difficult for commonly used pedestrian re-identification algorithms to extract effective pedestrian human body features from a half-body human body image, so misidentification is also likely to occur during feature comparison.
- Half-body human body images refer to incomplete images of human body parts, which can be images of only the lower body or upper body, images of only the head and shoulders, images of only the head plus half body, images with only the calves and below missing, images with only the head missing, etc. Generally speaking, the more serious the missing parts of the human body (the larger the missing percentage), the more difficult it is to re-identify the person.
- The electronic device uses the network model to process the image of the human body to be processed to obtain the quality score of the image of the human body to be processed; the network model is used to extract the image features of the image of the human body to be processed and, according to the image features, determine the blur score, the light score and the half-body score of the image of the human body to be processed, from which the quality score is determined.
- the network model can be obtained by using multi-task learning training methods.
- the multi-tasks include fuzzy recognition tasks, light recognition tasks and half-body recognition tasks.
- the quality score is calculated based on the parameters learned by the multi-task learning training method.
- Previous image quality assessment provides a quantitative value of the difference between a reference image and an evaluation image through a specific mathematical model, and evaluates image quality with indicators such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
- The network model in the embodiments of the present application can be a neural network model. The network model can be obtained by using a multi-task learning training method, quantifying the quality of the human body image to be processed into multiple tasks in deep learning, and the quality score of the human body image to be processed can be calculated based on the parameters learned by the multi-task learning training method.
- the framework of the network model is simple and effective.
- the network model in the embodiment of the present application is a trained model.
- the fuzzy identification task is used to identify the possibility of blurring in the human body image to be processed, and the possibility of blurring in the human body image to be processed can be quantified by the blur score.
- the light recognition task is used to identify the possibility of abnormal light in the human body image to be processed, and the possibility of abnormal light in the human body image to be processed can be quantified by the light score.
- the half-body recognition task is used to identify the possibility of a half-body in the human body image to be processed, and the possibility of a half-body in the human body image to be processed can be quantified by the half-body score.
- the multi-task learning method can facilitate the expansion of the network model, such as adding new tasks, such as occlusion tasks, human body discrimination tasks, etc.
- the network model obtained by the training method of multi-task learning is equivalent to the fusion of all tasks, and its generalization ability is better than that of the network model obtained by the training method of single-task learning.
- The network model includes a feature extraction network, a fuzzy recognition network, a light recognition network, and a half-body recognition network; in step 102, the electronic device using the network model to process the human body image to be processed to obtain the quality score of the human body image to be processed may specifically include the following steps:
- (11) The electronic device uses the feature extraction network to extract image features from the human body image to be processed;
- (12) The electronic device uses the fuzzy recognition network to calculate the blur score according to the image features, uses the light recognition network to calculate the light score according to the image features, and uses the half-body recognition network to calculate the half-body score according to the image features;
- (13) The electronic device calculates the quality score based on the blur score, the light score and the half-body score.
- the feature extraction network is used to extract image features from human body images to be processed
- the fuzzy recognition network is used to calculate fuzzy scores according to image features
- the light recognition network is used to calculate light scores according to image features
- The half-body recognition network is used to calculate half-body scores according to image features.
- the fuzzy recognition network, light recognition network, and half-body recognition network all calculate their respective scores based on the image features of the human body image to be processed extracted by the feature extraction network, making full use of the image features extracted by the feature extraction network and improving the generalization ability of the network model .
- the feature extraction network may include a lightweight neural network.
- A lightweight neural network refers to a neural network with a small number of parameters and few floating-point operations (FLOPs).
- the lightweight neural network can be applied to mobile devices (such as mobile phones, Tablet PCs), embedded devices and other electronic devices with relatively weak computing capabilities.
- the lightweight neural network can be a simple few layers of convolution, or it can be mobilenet, shufflenet, SNet, etc.
- Corresponding to lightweight neural networks are large neural networks, such as AlexNet, VGG, ResNet, etc. Although these large neural networks have good recognition effects, their parameters and calculations are huge, which makes them unsuitable to run on mobile terminals and embedded devices.
- the feature extraction network is trained with a lightweight neural network, which makes the accuracy and performance of the model better balanced, which is conducive to deployment and use in actual products, and can meet the quality filtering requirements of human body images under multi-channel video real-time processing.
- the feature extraction network can also improve the results through distillation learning.
- FIG. 2 is a framework diagram of a network model provided by an embodiment of the present application.
- the network model includes a feature extraction network, a fuzzy recognition network, a light recognition network and a bust recognition network.
- the feature extraction network is equivalent to the backbone part of the model, also known as the backbone, which is used to extract the deep learning features of the image.
- the fuzzy recognition network, light recognition network and half-body recognition network are used for classification tasks, and the scores of each classification are calculated to identify whether the image is blurred, whether the light is abnormal, and whether there is a half-body, and then obtain an accurate quality score.
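- As a non-limiting illustration of the framework in FIG. 2, the following PyTorch sketch shows one possible way to assemble such a model; the backbone choice, feature dimension and head sizes are assumptions for illustration, not the implementation of the embodiments.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small


class QualityNet(nn.Module):
    """Sketch: lightweight backbone + three classification heads (blur / light / half-body)."""

    def __init__(self, num_blur=3, num_light=3, num_half=6):
        super().__init__()
        self.backbone = mobilenet_v3_small(weights=None).features  # assumed lightweight feature extraction network
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 576  # output channels of mobilenet_v3_small features (assumption tied to this backbone)
        # Each head is a fully connected layer; softmax normalization yields per-level prediction scores.
        self.blur_head = nn.Linear(feat_dim, num_blur)
        self.light_head = nn.Linear(feat_dim, num_light)
        self.half_head = nn.Linear(feat_dim, num_half)

    def forward(self, x):
        f = self.pool(self.backbone(x)).flatten(1)            # image features
        p_blur = torch.softmax(self.blur_head(f), dim=1)       # scores over blur evaluation levels
        p_light = torch.softmax(self.light_head(f), dim=1)     # scores over light evaluation levels
        p_half = torch.softmax(self.half_head(f), dim=1)       # scores over half-body evaluation levels
        return p_blur, p_light, p_half
```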
- In step (12), the electronic device uses the fuzzy recognition network to calculate the fuzzy score according to the image features, which specifically includes the following steps:
- the electronic device uses the fuzzy recognition network to linearly map and normalize the image features, and predict the quality prediction scores of the human body image to be processed on multiple fuzzy evaluation levels;
- the electronic device calculates the blur score according to the quality prediction scores of the human body image to be processed on the multiple blur evaluation levels.
- the fuzzy recognition network can be implemented through a fully connected layer, and the image features are linearly mapped and normalized through the fully connected layer, and converted into quality prediction scores for each fuzzy evaluation level of the fuzzy recognition task.
- the fuzzy evaluation grades may include three grades: extremely fuzzy, medium fuzzy, and clear.
- The electronic device may calculate the blur score according to the quality prediction scores of the to-be-processed human body image at multiple blur evaluation levels in a weighted calculation manner. For example, the blur score can be calculated with reference to the following formula.
- Q blur represents the blur score
- P i-blur represents the quality prediction score of each blur evaluation level
- w i-blur represents the weight value corresponding to each blur evaluation level.
- w i-blur can be a weight parameter obtained from training, or a preset weight parameter.
- the quality prediction score of each fuzzy evaluation level can also be obtained by predicting probability.
- the fuzzy evaluation level can include three dimensions: extremely fuzzy, medium fuzzy, and clear.
- The higher the blur score, the higher the image quality of the blur dimension of the human body image to be processed (the lower the blur degree and the higher the clarity); the lower the blur score, the lower the image quality of the blur dimension of the human body image to be processed (the higher the blur degree).
- the blur score can be calculated according to the quality prediction scores of the human body image to be processed at multiple blur evaluation levels, and the image features can be converted into a quality score for measuring the blur degree of the image, and the blur degree of the image can be accurately quantified according to the blur score.
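- As an illustrative sketch of this weighted calculation (the weight and score values below are hypothetical, not values from the embodiments), the blur score can be obtained from the per-level prediction scores as follows:

```python
def weighted_score(pred_scores, level_weights):
    """Combine per-level quality prediction scores with level weights (ordered consistently)."""
    return sum(p * w for p, w in zip(pred_scores, level_weights))


# Hypothetical example with the three blur evaluation levels: extremely blurry, medium blurry, clear.
p_blur = [0.1, 0.2, 0.7]   # quality prediction scores output by the blur recognition network
w_blur = [0.0, 0.5, 1.0]   # assumed weights, increasing with image quality
q_blur = weighted_score(p_blur, w_blur)   # 0.1*0.0 + 0.2*0.5 + 0.7*1.0 = 0.8
```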
- In step (12), the electronic device uses the light recognition network to calculate the light score according to the image features, which specifically includes the following steps:
- the electronic device uses the light recognition network to linearly map and normalize the image features, and predict the quality prediction scores of the human body image to be processed on multiple light evaluation levels;
- the electronic device calculates the light score according to the quality prediction scores of the human body image to be processed on the multiple light evaluation levels.
- the light recognition network can be implemented through a fully connected layer, through which image features are linearly mapped and normalized, and converted into quality prediction scores for each light evaluation level of the light recognition task.
- the light evaluation level may include three dimensions: extremely poor light, medium light, and good light.
- the electronic device may calculate the light score according to the quality prediction scores of the to-be-processed human body image at multiple light evaluation levels in a weighted calculation manner.
- the light fraction can be calculated with reference to the following formula.
- Q light represents the light score
- P i-light represents the quality prediction score of each light evaluation level
- w i-light represents the weight value corresponding to each light evaluation level.
- w i-light can be a weight parameter obtained from training, or a preset weight parameter.
- the quality prediction score of each light evaluation level can also be obtained by means of prediction probability.
- the light evaluation level can include three dimensions: extremely poor light, medium light, and good light.
- the predicted quality prediction scores of these three light evaluation levels are: 0.8, 0.1, and 0.1.
- The higher the light score, the higher the image quality of the light dimension of the human body image to be processed (the better the light); the lower the light score, the lower the image quality of the light dimension of the human body image to be processed (the worse the light).
- The light score can be calculated according to the quality prediction scores of the human body image to be processed on multiple light evaluation levels, and the image features can be converted into a quality score for measuring the light brightness of the image, so the light brightness of the image can be accurately quantified according to the light score.
- In step (12), the electronic device uses the half-body recognition network to calculate the half-body score according to the image features, which specifically includes the following steps:
- the electronic device uses the half-body recognition network to perform linear mapping and normalization processing on the image features, and predicts the quality prediction scores of the human body image to be processed on multiple half-body evaluation levels;
- the electronic device calculates the half-body score according to the quality prediction scores of the human body image to be processed on the multiple half-body evaluation levels.
- the half-body recognition network can be realized by a fully connected layer, and the image features are linearly mapped and normalized through the fully connected layer, and converted into quality prediction scores for each half-body evaluation level of the half-body recognition task.
- Half-body evaluation grades can include six grades: head and shoulders only, head plus half body only, calf and below missing only, complete human body, lower body only, and head missing only.
- The electronic device may calculate the half-body score according to the quality prediction scores of the to-be-processed human body image on multiple half-body evaluation levels in a weighted calculation manner. For example, the half-body score can be calculated with reference to the following formula.
- Q half represents the half-body score
- p i-half represents the quality prediction score of each half-body evaluation level
- w i-half represents the weight value corresponding to each half-body evaluation level.
- w i-half can be a weight parameter obtained from training, or a preset weight parameter.
- the quality prediction score of each half-body evaluation level can also be obtained by means of prediction probability.
- the half-body evaluation level can include six dimensions: only head and shoulders, only lower body, only head plus half body, only missing calves and below, only head missing, and complete human body.
- For example, the predicted quality prediction scores of these six half-body evaluation levels are: 0, 0.8, 0.1, 0.1, 0, 0. If the weights of these six half-body evaluation levels are 1, 2, 3, 4, 5 and 6 respectively, the half-body score of the human body image to be processed can then be obtained by weighted calculation.
- The lower the half-body score, the lower the image quality of the half-body dimension of the image, and the more parts of the human body are missing from the image.
- The half-body score can be calculated according to the quality prediction scores of the human body image to be processed on multiple half-body evaluation levels, and the image features are converted into a quality score for measuring the half-body degree of the image, so the half-body degree of the image can be accurately quantified according to the half-body score.
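- A minimal sketch of the six-level example above, assuming the scores are combined by simple weighted summation (any scaling or normalization used in a real implementation is omitted):

```python
# Levels and numbers taken from the example above; the plain weighted-summation form is an assumption.
p_half = [0.0, 0.8, 0.1, 0.1, 0.0, 0.0]   # head-shoulders, lower body, head+half body,
                                          # calves-and-below missing, head missing, complete body
w_half = [1, 2, 3, 4, 5, 6]               # example weights, increasing with image quality
q_half = sum(p * w for p, w in zip(p_half, w_half))   # 0.8*2 + 0.1*3 + 0.1*4 = 2.3
```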
- step (13) may include the following steps:
- The electronic device calculates the quality score based on the blur score, the light score, and the half-body score in a weighted summation manner.
- the quality score of the human body image to be processed can be calculated by the following formula:
- Q total represents the quality score of the human body image to be processed
- Q blur represents the blur score
- Q light represents the light score
- Q half represents the half body score
- w b represents the weight value of the blur score
- w l represents the weight value of the light score
- w h represents the weight value of the half-body score.
- w b , w l , and w h may be weight parameters obtained from training, or preset weight parameters.
- The quality score of the human body image to be processed in the embodiments of the present application is calculated based on the blur score, the light score and the half-body score, which can accurately calculate the quality score of the human body image to be processed, so as to accurately measure, according to the quality score, whether the human body image to be processed meets the requirements of pedestrian re-identification.
- the electronic device performs image screening according to the quality score.
- image screening may be performed according to the quality score.
- the higher the quality score the higher the image quality of the human body image to be processed.
- If the quality score of the human body image to be processed is greater than the first threshold, it is considered that the human body image to be processed meets the requirements of pedestrian re-identification.
- Human body images with quality scores greater than the first threshold can be screened out and pedestrian re-identification is performed on them, while human body images with quality scores smaller than the first threshold are filtered out, thereby accurately screening images that meet the requirements of pedestrian re-identification and reducing the probability of misidentification by the pedestrian re-identification algorithm.
- the first threshold may be preset, and the first threshold may be stored in a memory (for example, a non-volatile memory) of the electronic device.
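- For illustration, the screening described above can be sketched as follows; the weight values and the first threshold here are assumed numbers, not values prescribed by the embodiments:

```python
FIRST_THRESHOLD = 0.6   # hypothetical preset value stored in the device memory


def screen(q_blur, q_light, q_half, w_b=0.4, w_l=0.3, w_h=0.3):
    """Weighted summation of the three scores, then threshold-based screening."""
    q_total = w_b * q_blur + w_l * q_light + w_h * q_half
    meets_reid_requirements = q_total > FIRST_THRESHOLD   # True: keep for pedestrian re-identification
    return q_total, meets_reid_requirements
```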
- After the electronic device determines that the human body image to be processed meets the requirements for pedestrian re-identification, it can input the human body image to be processed into the pedestrian re-identification network to perform pedestrian re-identification.
- the method shown in Figure 1 may also include the following steps:
- If the quality score of the human body image to be processed is less than the first threshold, the electronic device determines that the human body image to be processed does not meet the requirements for pedestrian re-identification.
- Human body images to be processed that do not meet the requirements of pedestrian re-identification can be directly filtered out and will not be input into the pedestrian re-identification network, reducing interference with pedestrian re-identification.
- pedestrian re-identification may be performed on the human body image to be processed.
- The method shown in Figure 1 can be applied in a software development kit (SDK) for video streams or image streams: the network model is loaded in the SDK, the quality score of an image can be calculated according to the input human body image to be processed, and low-quality human body images can also be filtered by thresholding.
- The method shown in Figure 1 can be applied in the following scenarios: (1) when processing video stream or picture stream data, quality filtering needs to be performed on the human body pictures detected by the detection model before feature extraction is performed on them; (2) when processing video stream data, the quality score of the human body pictures detected by the detection model needs to be obtained before tracking or frame selection (selecting high-quality human bodies); (3) when performing tasks such as joint face and human body clustering, human behavior clustering, human body clustering and real-time clustering, the human body quality score is needed.
- the network model of the embodiment of the present application considers fuzzy recognition, light recognition and half-body recognition.
- The network model can calculate the quality score of the human body image to be processed through the blur degree, light degree and half-body degree of the image, and can effectively filter blurred human body images, poorly lit human body images and half-body human body images through the threshold method, so as to accurately screen images that meet the requirements of pedestrian re-identification and reduce the probability of misidentification by the pedestrian re-identification algorithm.
- FIG. 3 is a schematic diagram of a training flow of a network model provided by an embodiment of the present application. As shown in FIG. 3 , the training process of the network model may include the following steps.
- the electronic device acquires training data, where the training data includes training images and corresponding labeling results.
- the network model can be trained in a supervised manner.
- the training image may be a randomly captured image containing a human body, or a human body detection result image or a human body snapshot image from a video stream or an image stream.
- the network models shown in Fig. 1 and Fig. 2 can be trained by the method shown in Fig. 3 .
- the training images in this embodiment of the present application may be images meeting the requirements of multi-task learning of the network model.
- the multitasking may include an ambiguity recognition task, a light recognition task, and a half-body recognition task.
- the training images may be feature images that can reflect the requirements of the first task.
- the first task is any one of the multi-tasks.
- Multi-tasks may include fuzzy recognition tasks, light recognition tasks, and half-body recognition tasks. Annotation results can exist in the form of labels or in the form of scores.
- For example, if the first task is a fuzzy recognition task, the corresponding training image may significantly reflect whether the image is blurry (for example, a clear human body image, or an extremely blurred human body image).
- the training data meeting the requirements of the first task may also meet the requirements of other tasks.
- If a training image is not only a blurry picture, but also has poor lighting and only contains the lower body of a person, the training image not only meets the requirements of the fuzzy recognition task, but also meets the requirements of the light recognition task and the half-body recognition task.
- The labeling result labels the training image. If the labeling result exists in the form of a label and the first task is a fuzzy recognition task, the corresponding labeling result can be any one of "extremely blurry", "medium blurry" and "clear". If the first task is a light recognition task, the corresponding labeling result may be any one of "extremely poor light", "medium light", and "good light". If the first task is a half-body recognition task, the corresponding labeling result may be any one of "only head and shoulders", "only head plus half body", "only missing calves and below", "complete human body", "only lower body", and "only head missing".
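- For illustration only, one possible way to encode these label schemes for multi-classification training is sketched below (the integer indices are arbitrary assumptions):

```python
# Hypothetical label encodings for the three annotation schemes described above.
BLUR_LEVELS = {"extremely blurry": 0, "medium blurry": 1, "clear": 2}
LIGHT_LEVELS = {"extremely poor light": 0, "medium light": 1, "good light": 2}
HALF_BODY_LEVELS = {
    "only head and shoulders": 0, "only head plus half body": 1,
    "only missing calves and below": 2, "complete human body": 3,
    "only lower body": 4, "only head missing": 5,
}
```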
- the labeling results can be marked manually or by software algorithms.
- the electronic device acquires training data, which may include the following steps: the electronic device acquires a training image, and generates an annotation result corresponding to the training image through an image quality assessment algorithm.
- The image quality assessment algorithm can draw on image quality assessment (IQA), which mainly considers the exposure, clarity, color, texture, shake, focus, artifacts, noise, etc. of the image.
- The image quality evaluation algorithm of the embodiments of the present application mainly evaluates the quality of the human body image from the dimensions of blur degree, light intensity, and half-body degree.
- Training images can be obtained from captured video streams and picture streams, or can be obtained by performing corresponding image processing on clear human body images, well-lit human body images, and full-body human body images. Due to the redefinition of human body quality (blur degree, light intensity, half-body degree), training data generation is simpler, and more automatically generated training data can be obtained relatively easily (for example, blur processing can be performed on clear human body images, over-exposure or over-darkening can be performed on well-lit human body images, and half-body cropping can be performed on full-body human body images, etc.), and the labeling cost is low.
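- A minimal sketch of such automatic training-data generation with Pillow; the blur radii, brightness factors and crop boxes are assumed values for illustration:

```python
from PIL import Image, ImageEnhance, ImageFilter


def make_training_samples(img_path):
    """Sketch: derive labeled samples from one clear, well-lit, full-body human image."""
    img = Image.open(img_path)
    w, h = img.size
    samples = []
    # Blur recognition task: different Gaussian blur radii map to different blur levels.
    samples.append((img.filter(ImageFilter.GaussianBlur(radius=8)), "extremely blurry"))
    samples.append((img.filter(ImageFilter.GaussianBlur(radius=3)), "medium blurry"))
    samples.append((img, "clear"))
    # Light recognition task: over-exposed / over-dark versions of a well-lit image.
    samples.append((ImageEnhance.Brightness(img).enhance(3.0), "extremely poor light"))
    samples.append((ImageEnhance.Brightness(img).enhance(0.2), "extremely poor light"))
    samples.append((img, "good light"))
    # Half-body recognition task: crops of a full-body image map to half-body levels.
    samples.append((img.crop((0, 0, w, h // 4)), "only head and shoulders"))
    samples.append((img.crop((0, h // 2, w, h)), "only lower body"))
    samples.append((img, "complete human body"))
    return samples
```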
- the electronic device preprocesses the training image, and inputs the preprocessed training image into the network model to perform multi-task learning training, and obtains a training result.
- The preprocessing may be to scale the training images to the same size and normalize them, so that the model does not need to adapt to images of different sizes, which facilitates training of the model.
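- A minimal preprocessing sketch with torchvision; the target size and normalization statistics are assumptions, not values from the embodiments:

```python
from torchvision import transforms

# Scale training images to one size and normalize them before feeding the network.
preprocess = transforms.Compose([
    transforms.Resize((256, 128)),                       # assumed input size for human body crops
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],     # common ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```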
- the training result can be the probability of a certain label.
- the training results can be the probabilities of "extremely fuzzy”, “medium fuzzy” and "clear”.
- the electronic device optimizes the multi-task model parameters of the network model according to the training result and the labeling result.
- the electronic device may use a loss function to evaluate the fitting effect of the model parameters of the first task of the network model, and then optimize the model parameters of the first task.
- Based on the loss function, the model parameters can be optimized through an adaptive gradient descent method.
- If the first task is a fuzzy recognition task, the model parameters of the first task of the network model may include model parameters of the feature extraction network and the fuzzy recognition network. If the first task is a light recognition task, the model parameters of the first task of the network model may include model parameters of the feature extraction network and the light recognition network. If the first task is a half-body recognition task, the model parameters of the first task of the network model may include model parameters of the feature extraction network and the half-body recognition network.
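- A minimal sketch of one multi-task optimization step, assuming the QualityNet sketch shown earlier and equal task weights; the optimizer settings are assumptions:

```python
import torch
import torch.nn as nn

model = QualityNet()                        # the sketch shown earlier
nll = nn.NLLLoss()                          # the heads already output softmax probabilities
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # an adaptive gradient-descent method


def train_step(images, blur_labels, light_labels, half_labels):
    p_blur, p_light, p_half = model(images)
    loss = (nll(torch.log(p_blur.clamp_min(1e-8)), blur_labels)      # fuzzy recognition task loss
            + nll(torch.log(p_light.clamp_min(1e-8)), light_labels)  # light recognition task loss
            + nll(torch.log(p_half.clamp_min(1e-8)), half_labels))   # half-body recognition task loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```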
- the electronic device in steps 301 to 303 and the electronic device in steps 101 to 103 may be different devices, or may be the same device.
- a trained network model can be obtained.
- the trained network model can be directly deployed on the electronic device, or the trained network model can be deployed on other electronic devices after model transformation.
- the trained network model can be a Pytorch model, and the Pytorch model can be converted into a Kestrel model and then deployed on other electronic devices.
- In step 301, the electronic device acquires training data, which includes:
- the electronic device acquires a manually marked first human body image; wherein, the marking of the first human body image is used to indicate a light evaluation level.
- the training data for the light recognition task can be obtained by manual labeling.
- In step 301, the electronic device acquires training data, which includes:
- the electronic device acquires the second human body images processed by different degrees of Gaussian blur; wherein, the second human body images processed by different degrees of Gaussian blur are respectively marked with different blur evaluation levels.
- the training data for the fuzzy recognition task can be obtained by processing the human body image by means of Gaussian blur processing.
- For example, for a picture with a high degree of Gaussian blur processing, its blur evaluation level is extremely blurry; for a picture with a moderate degree of Gaussian blur processing, its blur evaluation level is medium blurry; and for a clear picture with a low degree of Gaussian blur processing or no Gaussian blur processing, its blur evaluation level is clear.
- In step 301, the electronic device acquires training data, which includes:
- The electronic device acquires cropped third human body images; wherein, after the third human body image is input into a human body scribing model to obtain a scribing result, different degrees of cropping are performed based on the scribing result, and the third human body images that have undergone different degrees of cropping are labeled with different half-body evaluation levels.
- the training data for the half-body recognition task can be obtained by processing the human body image by means of image cropping.
- Part of the area containing the complete human body image can be cropped to obtain pictures with different half-body evaluation levels.
- the half-body evaluation level can include six levels: only head and shoulders, only head plus half body, only calf and below missing, complete human body, only lower body, and only head missing.
- The embodiments of the present application can quantify the quality of human body images through the training method of multi-task learning and convert it into multi-task, multi-classification in deep learning (for example, the multi-classification corresponding to the fuzzy recognition task includes the above-mentioned multiple fuzzy evaluation levels; the multi-classification corresponding to the light recognition task includes the above-mentioned multiple light evaluation levels; and the multi-classification corresponding to the half-body recognition task includes the above-mentioned multiple half-body evaluation levels), and the multi-classification results are automatically fused to obtain the quality score of the human body picture. The framework is simple and efficient, interpretable, uses multi-task learning, and is easy to scale.
- the electronic device includes hardware structures and/or software modules corresponding to each function.
- The present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
- the embodiment of the present application may divide the electronic device into functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units. It should be noted that the division of units in the embodiment of the present application is schematic, and is only a logical function division, and there may be another division manner in actual implementation.
- FIG. 4 is a schematic structural diagram of an image screening device provided in an embodiment of the present application.
- the image screening device 400 is applied to electronic equipment.
- The image screening device 400 may include an acquisition unit 401, an image processing unit 402 and an image screening unit 403, wherein:
- An acquisition unit 401 configured to acquire an image of a human body to be processed
- the image processing unit 402 is configured to use a network model to process the human body image to be processed to obtain a quality score of the human body image to be processed; the network model is used to extract image features of the human body image to be processed, determine, according to the image features, the fuzzy score used to represent the degree of blur, the light score used to represent the brightness of the light, and the half-body score used to represent the completeness of the human body in the human body image to be processed, and determine the quality score of the human body image to be processed according to the fuzzy score, the light score and the half-body score; the image screening unit 403 is configured to perform image screening according to the quality score.
- The network model includes a feature extraction network, a fuzzy recognition network, a light recognition network and a half-body recognition network; the image processing unit 402 using the network model to process the human body image to be processed to obtain the quality score of the human body image to be processed includes: using the feature extraction network to extract image features from the human body image to be processed; using the fuzzy recognition network to calculate the fuzzy score according to the image features, using the light recognition network to calculate the light score according to the image features, and using the half-body recognition network to calculate the half-body score according to the image features; and calculating the quality score based on the fuzzy score, the light score and the half-body score.
- The image processing unit 402 using the blur recognition network to calculate the blur score according to the image features includes: using the blur recognition network to perform linear mapping and normalization processing on the image features, and predicting quality prediction scores of the human body image to be processed at multiple blur evaluation levels; and calculating the blur score according to the quality prediction scores of the human body image to be processed at the multiple blur evaluation levels.
- the image processing unit 402 calculates the blur score according to the quality prediction scores of the human body image to be processed on the multiple blur evaluation levels, including:
- the fuzzy score is calculated by the following formula:
- Q blur indicates the blur score
- P i-blur indicates the quality prediction score of the blur evaluation level
- w i-blur indicates the weight value corresponding to the blur evaluation level
- w i-blur is positively correlated with the image quality corresponding to the blur evaluation level.
- k is an integer greater than or equal to 1.
- the image processing unit 402 using the light recognition network to calculate the light score according to the image features includes: using the light recognition network to perform linear mapping and normalization on the image features , predicting quality prediction scores of the human body image to be processed at multiple light evaluation levels; calculating the light score according to the quality prediction scores of the human body image to be processed at the multiple light evaluation levels.
- the image processing unit 402 calculates the light score according to the quality prediction scores of the human body image to be processed on the multiple light evaluation levels, including:
- Q light represents the light score
- P i-light represents the quality prediction score of the light evaluation level
- w i-light represents the weight value corresponding to the light evaluation level
- w i-light is positively correlated with the image quality corresponding to the light evaluation level.
- k is an integer greater than or equal to 1.
- the image processing unit 402 using the half-body recognition network to calculate the half-body score according to the image features includes: using the half-body recognition network to perform linear mapping and normalization processing on the image features to predict quality prediction scores of the human body image to be processed at multiple half-body evaluation levels; and calculating the half-body score according to the quality prediction scores of the human body image to be processed at the multiple half-body evaluation levels.
- the image processing unit 402 calculating the half-body score according to the quality prediction scores of the human body image to be processed at the multiple half-body evaluation levels includes calculating the half-body score by the following formula:

  $Q_{half}=\sum_{i=0}^{k-1} P_{i\text{-}half}\, w_{i\text{-}half}$

  where $Q_{half}$ indicates the half-body score, $P_{i\text{-}half}$ indicates the quality prediction score at the i-th half-body evaluation level, and $w_{i\text{-}half}$ indicates the weight value corresponding to the i-th half-body evaluation level; $w_{i\text{-}half}$ is positively correlated with the image quality corresponding to the half-body evaluation level, and k is an integer greater than or equal to 1.
- the image processing unit 402 calculating the quality score based on the blur score, the light score and the half-body score includes: calculating the quality score from the blur score, the light score and the half-body score using a weighted summation formula;
- the weighted summation formula includes:

  $Q_{total}=w_b\cdot Q_{blur}+w_l\cdot Q_{light}+w_h\cdot Q_{half}$

  where $Q_{total}$ represents the quality score of the human body image to be processed, $Q_{blur}$ represents the blur score, $Q_{light}$ represents the light score, $Q_{half}$ represents the half-body score, $w_b$ represents the weight value of the blur score, $w_l$ represents the weight value of the light score, and $w_h$ represents the weight value of the half-body score.
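A corresponding sketch of this fusion step is shown below; the embodiment allows $w_b$, $w_l$ and $w_h$ to be either learned or preset, so the default values used here are placeholders only.

```python
def quality_score(q_blur, q_light, q_half, w_b=0.4, w_l=0.3, w_h=0.3):
    """Q_total = w_b*Q_blur + w_l*Q_light + w_h*Q_half (placeholder weights)."""
    return w_b * q_blur + w_l * q_light + w_h * q_half
```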
- the feature extraction network includes a lightweight neural network.
- the image screening device 400 may also include a training unit 404;
- the training unit 404 is configured to acquire training data, where the training data includes training images and corresponding labeling results; to preprocess the training images and input the preprocessed training images into the network model for multi-task learning training to obtain a training result; and to optimize the multi-task model parameters of the network model according to the training result and the labeling result.
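A minimal multi-task training step consistent with this description might look as follows, assuming a model such as the BodyQualityNet sketch above that returns one set of logits per task; the equal weighting of the three cross-entropy losses is an assumption of the sketch, not something fixed by the embodiment.

```python
import torch.nn as nn

def train_step(model, optimizer, images, blur_labels, light_labels, half_labels):
    """One multi-task learning step: a shared backbone and three classification losses."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    blur_logits, light_logits, half_logits = model(images)
    # Sum the three task losses so that all tasks update the shared parameters.
    loss = (criterion(blur_logits, blur_labels)
            + criterion(light_logits, light_labels)
            + criterion(half_logits, half_labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```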
- the training unit 404 acquiring training data includes: acquiring training images, and generating annotation results corresponding to the training images through an image quality assessment algorithm.
- the training unit 404 acquiring training data includes: acquiring a first human body image marked manually; wherein, the marking of the first human body image is used to indicate a light evaluation level.
- the training unit 404 acquiring training data includes: acquiring second human body images that have undergone different degrees of Gaussian blur processing; wherein the second human body images that have undergone different degrees of Gaussian blur processing are respectively labeled with different blur evaluation levels.
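One way to realize this kind of automatic labeling is sketched below with OpenCV; the kernel sizes and the three-level labeling scheme are illustrative assumptions.

```python
import cv2

def make_blur_samples(image_bgr):
    """Generate blur-task samples from one clear body crop as (image, blur_evaluation_level) pairs."""
    return [
        (image_bgr, 2),                                 # clear
        (cv2.GaussianBlur(image_bgr, (9, 9), 0), 1),    # moderately blurred
        (cv2.GaussianBlur(image_bgr, (25, 25), 0), 0),  # extremely blurred
    ]
```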
- the training unit 404 acquiring training data includes: acquiring cropped third human body images; wherein the cropped third human body images are obtained by inputting a third human body image into a human body line-marking model to obtain a line-marking result and then cropping the image to different degrees based on that result, and the third human body images cropped to different degrees are respectively labeled with different half-body evaluation levels.
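The crop-based labeling could be sketched as follows; since the output format of the body line-marking model is not specified here, `body_lines` is a hypothetical mapping from body-line names to row indices, and the level labels are illustrative.

```python
def make_halfbody_samples(full_body_img, body_lines):
    """Crop one full-body image at different body lines to create half-body samples."""
    samples = [(full_body_img, 3)]                                  # complete human body
    samples.append((full_body_img[:body_lines['shoulder'], :], 0))  # head and shoulders only
    samples.append((full_body_img[:body_lines['waist'], :], 1))     # head plus upper body only
    samples.append((full_body_img[:body_lines['knee'], :], 2))      # calves and below missing
    samples.append((full_body_img[body_lines['waist']:, :], 4))     # lower body only
    return samples
```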
- the image screening device 400 may also include a determining unit (not shown in FIG. 4);
- a determining unit configured to determine that the human body image to be processed meets the requirements for pedestrian re-identification when the quality score is greater than a first threshold
- the determination unit is further configured to determine that the human body image to be processed does not meet the requirements for pedestrian re-identification when the quality score is less than a first threshold.
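The threshold decision itself is a simple comparison; the threshold value below is a placeholder rather than a value given by the embodiment.

```python
def meets_reid_requirements(q_total, first_threshold=2.0):
    """True if the image qualifies for pedestrian re-identification, False otherwise."""
    return q_total > first_threshold
```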
- the image screening device 400 may also include a pedestrian re-identification unit (not shown in FIG. 4 );
- a pedestrian re-identification unit configured to perform pedestrian re-identification on the to-be-processed human body image after the image evaluation unit 403 determines that the to-be-processed human body image meets the requirements for pedestrian re-identification.
- the acquisition unit 401 , image processing unit 402 , image evaluation unit 403 , training unit 404 , determination unit and pedestrian re-identification unit in the embodiment of the present application may correspond to processors in electronic devices.
- the network model can be used to process the human body image to be processed to obtain a comprehensive quality score of the image; since the network model can extract image features of the image, the blur score representing the degree of blur, the light score representing the brightness of the lighting and the half-body score representing the completeness of the human body can be determined from the extracted features, and the quality score of the image can then be determined from these three scores;
- the network model takes blur recognition, light recognition and half-body recognition into account, and can effectively filter out blurred human body images, poorly lit human body images and half-body human body images, thereby accurately screening images that meet the requirements of pedestrian re-identification and reducing the probability that the pedestrian re-identification algorithm misidentifies.
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the electronic device 500 includes a processor 501 and a memory 502, which are interconnected through a communication bus 503.
- the communication bus 503 may be a Peripheral Component Interconnect (PCI for short) bus or an Extended Industry Standard Architecture (EISA for short) bus or the like.
- the communication bus 503 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5 , but it does not mean that there is only one bus or one type of bus.
- the memory 502 is used to store a computer program, the computer program includes program instructions, and the processor 501 is configured to call the program instructions to execute the method shown in FIG. 1 or FIG. 3.
- the processor 501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in the above solutions.
- the memory 502 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory can exist independently and be connected to the processor through the bus. Memory can also be integrated with the processor.
- the electronic device 500 may also include common components such as a communication interface and an antenna, which will not be described in detail here.
- the network model can be used to process the human body image to be processed to obtain a comprehensive quality score of the image; since the network model can extract image features of the image, the blur score representing the degree of blur, the light score representing the brightness of the lighting and the half-body score representing the completeness of the human body can be determined from the extracted features, and the quality score of the image can then be determined from these three scores;
- the network model takes blur recognition, light recognition and half-body recognition into account, and can effectively filter out blurred human body images, poorly lit human body images and half-body human body images, thereby accurately screening images that meet the requirements of pedestrian re-identification and reducing the probability that the pedestrian re-identification algorithm misidentifies.
- An embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps of any image screening method described in the above method embodiments.
- the disclosed device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented not only in the form of hardware, but also in the form of software program modules.
- the integrated units may be stored in a computer-readable memory if implemented in the form of a software program module and sold or used as an independent product.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
Abstract
本申请实施例提供一种图像筛选方法及装置、电子设备和存储介质,该图像筛选方法包括:电子设备获取待处理人体图像;利用网络模型对待处理人体图像进行处理,得到待处理人体图像的质量分数;网络模型用于提取待处理人体图像的图像特征,根据图像特征,确定待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据模糊分数、光线分数和半身分数确定待处理人体图像的质量分数;根据质量分数进行图像筛选。本申请实施例可以准确的筛选符合行人重识别的要求的图像。
Description
本申请要求2021年11月25日提交、申请号为202111415866.5,发明名称为“图像筛选方法及装置、电子设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及图像处理技术领域,具体涉及一种图像筛选方法及装置、电子设备和存储介质。
行人重识别(person re-identification)也称行人再识别,是利用计算机视觉技术判断图像或者视频序列中是否存在特定行人的技术。给定一张行人图像,检索跨设备下的该行人图像。该技术旨在弥补目前固定的摄像头的视觉局限,并可与行人检测和行人跟踪技术相结合,可广泛应用于智能视频检测、智能安防等领域。
目前在大规模人体抓拍图或检测人体图片下,由于抓拍的图像质量较差,难以准确的筛选符合行人重识别的要求的图像。
发明内容
本申请实施例提供一种图像筛选方法及装置、电子设备和存储介质,可以准确的筛选符合行人重识别的要求的图像。
本申请实施例的第一方面提供了一种图像筛选方法,包括:
获取待处理人体图像;
利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数;所述网络模型用于提取所述待处理人体图像的图像特征,根据所述图像特征,确定所述待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数;
根据所述质量分数进行图像筛选。
本申请实施例的网络模型考虑了模糊识别、光线识别和半身识别,可以有效的过滤模糊的人体图像、光线差的人体图像和半身人体图像,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
可选的,所述网络模型包括特征提取网络、模糊识别网络、光线识别网络和半身识别网络;所述利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数,包括:
利用所述特征提取网络从所述待处理人体图像提取图像特征;
利用所述模糊识别网络根据所述图像特征计算所述模糊分数,利用所述光线识别网络根据所述图像特 征计算所述光线分数,利用所述半身识别网络根据所述图像特征计算所述半身分数;
基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数。
本申请实施例的模糊识别网络、光线识别网络、半身识别网络都根据特征提取网络提取的待处理人体图像的图像特征来计算各自的分数,充分利用了特征提取网络提取的图像特征,提高了网络模型的泛化能力。
可选的,所述利用所述模糊识别网络根据所述图像特征计算所述模糊分数,包括:
利用所述模糊识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个模糊评价等级上的质量预测分;
根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数。
本申请实施例可以根据待处理人体图像在多个模糊评价等级上的质量预测分计算模糊分数,将图像特征转换为衡量图像的模糊程度的质量得分,可以准确量化图像的模糊程度。
可选的,所述根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数,包括:
通过如下公式计算模糊分数:

$Q_{blur}=\sum_{i=0}^{k-1} P_{i\text{-}blur}\, w_{i\text{-}blur}$

其中,Q_{blur}表示模糊分数,P_{i-blur}表示模糊评价等级的质量预测分,w_{i-blur}表示模糊评价等级对应的权重值;w_{i-blur}的分布与模糊评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述利用所述光线识别网络根据所述图像特征计算所述光线分数,包括:
利用所述光线识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个光线评价等级上的质量预测分;
根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数。
本申请实施例可以根据待处理人体图像在多个光线评价等级上的质量预测分计算光线分数,将图像特征转换为衡量图像的光线明亮程度的质量得分,可以准确量化图像的光线明亮程度。
可选的,所述根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数,包括:
通过如下公式计算光线分数:

$Q_{light}=\sum_{i=0}^{k-1} P_{i\text{-}light}\, w_{i\text{-}light}$

其中,Q_{light}表示光线分数,P_{i-light}表示光线评价等级的质量预测分,w_{i-light}表示光线评价等级对应的权重值;w_{i-light}的分布与光线评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述利用所述半身识别网络根据所述图像特征计算所述半身分数,包括:
利用所述半身识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个半身评价等级上的质量预测分;
根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数。
本申请实施例可以根据待处理人体图像在多个半身评价等级上的质量预测分计算半身分数,将图像特征转换为衡量图像的半身程度的质量得分,可以准确量化图像的半身程度。
可选的,所述根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数,包括:
通过如下公式计算半身分数:

$Q_{half}=\sum_{i=0}^{k-1} P_{i\text{-}half}\, w_{i\text{-}half}$

其中,Q_{half}表示半身分数,P_{i-half}表示半身评价等级的质量预测分,w_{i-half}表示半身评价等级对应的权重值;w_{i-half}与半身评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数,包括:
通过加权求和公式基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数;
所述加权求和公式包括:
$Q_{total}=w_b\cdot Q_{blur}+w_l\cdot Q_{light}+w_h\cdot Q_{half}$

其中,Q_{total}表示所述待处理人体图像的质量分数,Q_{blur}表示模糊分数,Q_{light}表示光线分数,Q_{half}表示半身分数,w_b表示所述模糊分数的权重值,w_l表示所述光线分数的权重值,w_h表示所述半身分数的权重值。
本申请实施例的待处理人体图像的质量分数是基于模糊分数、光线分数和半身分数计算得到的,可以准确的计算待处理人体图像的质量分数,从而准确的衡量该待处理人体图像是否符合行人重识别的要求。
可选的,所述特征提取网络包括轻量级神经网络。
本申请实施例中,特征提取网络采用轻量级神经网络进行训练,使得模型的精度、性能得到了较好的平衡,利于实际产品中部署和使用,可以满足多路视频实时处理下的人体图像的质量过滤需求。
可选的,所述方法还包括:
获取训练数据,所述训练数据包括训练图像和对应的标注结果;
对所述训练图像进行预处理,将预处理后的训练图像输入所述网络模型进行多任务学习的训练,得到训练结果;
根据所述训练结果和所述标注结果对所述网络模型的多任务的模型参数进行优化。
本申请实施例通过多任务学习的训练方式,可以把人体图像质量量化,转化为深度学习中的多任务和多分类,并对多分类结果进行自动融合,得到人体图片的质量分数,框架简单有效,解释性强。
可选的,所述获取训练数据,包括:
获取训练图像,通过图像质量评估算法生成与所述训练图像对应的标注结果。
由于对人体质量进行了重新定义(模糊程度、光线强弱、半身程度),训练数据生成更为简单,可以较为轻松地获得更多的自动生成的训练数据,并且标注成本低。
可选的,所述获取训练数据,包括:
获取人工标注的第一人体图像;其中,所述第一人体图像的标注用于指示光线评价等级。
可选的,所述获取训练数据,包括:
获取经过不同程度高斯模糊处理的第二人体图像;其中,所述经过不同程度高斯模糊处理的第二人体图像分别被标注为不同的模糊评价等级。
可选的,所述获取训练数据,包括:
获取裁剪后的第三人体图像;其中,所述裁剪后的第三人体图像由所述第三人体图像输入人体划线模型得到划线结果后,基于所述划线结果进行不同程度的裁剪得到,经过不同程度的裁剪的所述第三人体图像分别被标注为不同的半身评价等级。
可选的,所述方法还包括:
在所述质量分数大于第一阈值的情况下,确定所述待处理人体图像符合行人重识别的要求;
在所述质量分数小于第一阈值的情况下,确定所述待处理人体图像不符合行人重识别的要求。
其中,对于不符合行人重识别的要求的待处理人体图像,则可以直接过滤,不会将其输入行人重识别网络,降低对行人重识别造成干扰的概率。
可选的,所述确定所述待处理人体图像符合行人重识别的要求之后,所述方法还包括:
对所述待处理人体图像进行行人重识别。
本申请实施例的第二方面提供了一种图像筛选装置,包括:
获取单元,用于获取待处理人体图像;
图像处理单元,用于利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数;所述网络模型用于提取所述待处理人体图像的图像特征,根据所述图像特征,确定所述待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数;图像筛选单元,用于根据所述质量分数进行图像筛选。
本申请实施例的第三方面提供了一种电子设备,包括处理器和存储器,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行如本申请实施例第一方面中的步骤指令。
本申请实施例的第四方面提供了一种计算机可读存储介质,其中,上述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行如本申请实施例第一方面中所描述的部分或全部步骤。
本申请实施例的第五方面提供了一种计算机程序产品,其中,上述计算机程序产品包括计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行如本申请实施例第一方面中所描述的部分或全部步骤。该计算机程序产品可以为一个软件安装包。
本申请实施例的图像筛选方法,可以利用网络模型对待处理人体图像进行处理,得到待处理人体图像的综合质量分数;由于网络模型可以提取待处理人体图像的图像特征,根据提取的图像特征确定待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数。该网络模型考虑了模糊识别、光线识别和半身识别,可以有效的过滤模糊的人体图像、光线差的人体图像和半身人体图像,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种图像筛选方法的流程示意图;
图2是本申请实施例提供的一种网络模型的框架图;
图3是本申请实施例提供的一种网络模型的训练流程示意图;
图4为本申请实施例提供的一种图像筛选装置的结构示意图;
图5是本申请实施例提供的一种电子设备的结构示意图。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。
在本申请中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少 一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本申请所描述的实施例可以与其它实施例相结合。
本申请实施例所涉及到的电子设备可以包括各种具有运算能力的手持设备、车载设备、可穿戴设备、计算设备或连接到无线调制解调器的其他处理设备,以及各种形式的用户设备(user equipment,UE),移动台(mobile station,MS),电子设备(terminal device)等等。为方便描述,上面提到的设备统称为电子设备。
请参阅图1,图1是本申请实施例提供的一种图像筛选方法的流程示意图。如图1所示,该图像筛选方法可以包括如下步骤。
101,电子设备获取待处理人体图像。
本申请实施例中的图像筛选方法可以用于筛选符合行人重识别的要求的图像。行人重识别(person re-identification)也称行人再识别,是利用计算机视觉技术判断图像或者视频序列中是否存在特定行人的技术。给定一张行人图像,检索跨设备下的该行人图像。该技术旨在弥补目前固定的摄像头的视觉局限,并可与行人检测和行人跟踪技术相结合,可广泛应用于智能视频检测、智能安防等领域。
待处理人体图像可以是摄像头拍摄的视频或图片中的人体图像(包含人体的图像)。在智能视频检测、智能安防等领域,待处理人体图像可以是视频检测系统(或智能安防系统)包含的多个视频拍摄装置拍摄的视频中截取的图像。
如果拍摄的人体图像的分辨率较差,容易导致人体图像中的人体细节模糊不清。常用的行人重识别算法难以从分辨率较差的人体图像中提取出有效的行人人体特征,从而在特征比对的时候容易出现误识别。
如果拍摄的人体图像的光线较差,对于光线差的人体图像,由于光线昏暗导致人体衣服颜色及部位细节难以分辨,常用的行人重识别算法难以从光线差的人体图像中提取出有效的行人人体特征,从而在特征比对的时候容易出现误识别。
如果拍摄的人体图像是半身人体图像,半身人体图像会导致人体部分信息丢失,常用的行人重识别算法难以从半身人体图像中提取出有效的行人人体特征,从而在特征比对的时候也容易出现误识别。半身人体图像,指的是人体部位不完整的图像,可以只有下半身的图像,也可以是上半身的图像,还可以是只有头肩的图像、只有头加上半身的图像、只有小腿及以下的图像、只有头部缺失的图像等。一般而言,人体部位缺失得越严重(缺失百分比越大),行人重识别难度通常也越大。
102,电子设备利用网络模型对待处理人体图像进行处理,得到待处理人体图像的质量分数;网络模型用于提取待处理人体图像的图像特征,根据图像特征,确定待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据模糊分数、 光线分数和半身分数确定待处理人体图像的质量分数。
网络模型可以采用多任务学习的训练方式得到,多任务包括模糊识别任务、光线识别任务和半身识别任务,质量分数基于多任务学习的训练方式学习到的参数计算得到。
本申请实施例中,以往的图片质量评估(image quality assessment,IQA)通过某种特定的数学模型给出参考图像和评估图像之间的差异量化值,借助峰值信噪比(peak signal to noise ratio,PSNR)、结构相似度(structural similarity,SSIM)等评价指标来评估图像质量高低。本申请实施例的网络模型可以是神经网络模型,网络模型可以采用多任务学习的训练方式得到,把待处理人体图像的质量量化,转化为深度学习中的多任务,可以基于多任务学习的训练方式学习到的参数计算得到待处理人体图像的质量分数。网络模型的框架简单有效。本申请实施例的网络模型是训练好的模型。
模糊识别任务,用于识别待处理人体图像存在模糊的可能性,可以通过模糊分数来量化待处理人体图像存在模糊的可能性。光线识别任务,用于识别待处理人体图像存在光线异常的可能性,可以通过光线分数来量化待处理人体图像存在光线异常的可能性。半身识别任务,用于识别待处理人体图像存在半身的可能性,可以通过半身分数来量化待处理人体图像存在半身的可能性。
此外,采用多任务学习的方式,可以方便网络模型进行扩展,如增加新的任务,如遮挡任务、人体判别任务等。采用多任务学习的训练方式得到的网络模型相当于融合了所有任务,其泛化能力比采用单任务学习的训练方式得到的网络模型要好。
可选的,所述网络模型包括特征提取网络、模糊识别网络、光线识别网络和半身识别网络;步骤102中,电子设备利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数,具体可以包括如下步骤:
(11)电子设备利用所述特征提取网络从所述待处理人体图像提取图像特征;
(12)电子设备利用所述模糊识别网络根据所述图像特征计算所述模糊分数,利用所述光线识别网络根据所述图像特征计算所述光线分数,利用所述半身识别网络根据所述图像特征计算所述半身分数;
(13)电子设备基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数。
本申请实施例中,特征提取网络用于从待处理人体图像提取图像特征,模糊识别网络用于根据图像特征计算模糊分数,光线识别网络用于根据图像特征计算光线分数,半身识别网络用于根据图像特征计算半身分数。模糊识别网络、光线识别网络、半身识别网络都根据特征提取网络提取的待处理人体图像的图像特征来计算各自的分数,充分利用了特征提取网络提取的图像特征,提高了网络模型的泛化能力。
本申请实施例的多个任务(模糊识别任务、光线识别任务和半身识别任务)共享一个网络模型结构,网络模型结构里面的参数在优化的时候会被上述多个任务影响。这样在上述多个任务收敛的时候,网络模型结构就相当于融合了所有任务,因此其泛化能力比单任务要好。
其中,特征提取网络可以包括轻量级神经网络。轻量级神经网络指的是参数数量较小和每秒浮点运算次数(floating-point operations per second,FLOPs)较少的神经网络,轻量级神经网络可以应用在移动设备(比如,手机、平板电脑)、嵌入式设备等计算能力相对较弱的电子设备上。轻量级神经网络可以是简单的几层卷积、也可以是mobilenet,shufflenet、SNet等。与轻量级神经网络对应的是大型神经网络,比如AlexNet、VGG、ResNet等,这些大型神经网络虽然识别效果不错,但是模型的参数量和计算量巨大,不适合在移动端、嵌入式设备上运行。
特征提取网络采用轻量级神经网络进行训练,使得模型的精度、性能得到了较好的平衡,利于实际产品中部署和使用,可以满足多路视频实时处理下的人体图像的质量过滤需求。特征提取网络还可以通过蒸馏学习来进行结果提升。
请参阅图2,图2是本申请实施例提供的一种网络模型的框架图。如图2所示,该网络模型包括特征提取网络、模糊识别网络、光线识别网络和半身识别网络。特征提取网络相当于模型的主干部分,也可以称为backbone,作用是用来提取图像的深度学习特征。模糊识别网络、光线识别网络和半身识别网络用于做分类任务,计算各个分类的分值,从而识别图像是否模糊、是否光线异常以及是否存在半身,进而得到准确的质量分数。
可选的,步骤(12)中,电子设备利用所述模糊识别网络根据所述图像特征计算所述模糊分数,具体包括如下步骤:
(1211)电子设备利用所述模糊识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个模糊评价等级上的质量预测分;
(1212)电子设备根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数。
本申请实施例中,模糊识别网络可以通过全连接层来实现,通过全连接层将图像特征进行线性映射并归一化处理,转化为模糊识别任务的各个模糊评价等级的质量预测分。模糊评价等级可以包括极其模糊、中等模糊、清晰三个等级。
电子设备可以通过加权计算的方式根据待处理人体图像在多个模糊评价等级上的质量预测分计算模糊分数。举例来说,可以参见如下公式计算模糊分数。
$Q_{blur}=\sum_{i=0}^{k-1} P_{i\text{-}blur}\, w_{i\text{-}blur}$

其中,Q_{blur}表示模糊分数,P_{i-blur}表示各个模糊评价等级的质量预测分,w_{i-blur}表示各个模糊评价等级对应的权重值。其中,w_{i-blur}的分布与模糊评价等级对应的图像质量高低正相关。例如,i=2表示清晰,则i=2时,输出的P_{2-blur}乘以的权重最高;i=0表示极其模糊,则i=0时,输出的P_{0-blur}乘以的权重最低。
w_{i-blur}可以是训练得到的权重参数,也可以是预先设定的权重参数。各个模糊评价等级的质量预测分也可以通过预测概率的方式得到。
比如,模糊评价等级可以包括极其模糊、中等模糊、清晰三个维度,预测得到的这三个模糊评价等级的质量预测分分别为:0.3、0.6、0.1,如果这三个模糊评价等级的权值分别为:1、3、5,则可以根据该待处理人体图像在多个模糊评价等级上的质量预测分计算得到模糊分数Q_{blur}=0.3*1+0.6*3+0.1*5=2.6,"*"代表乘法运算。其中,模糊分数越高,表示该待处理人体图像的模糊维度的图像质量越高(模糊程度越低,清晰程度越高);模糊分数越低,表示该待处理人体图像的模糊维度的图像质量越低(模糊程度越高)。
本申请实施例可以根据待处理人体图像在多个模糊评价等级上的质量预测分计算模糊分数,将图像特征转换为衡量图像的模糊程度的质量得分,可以根据模糊分数准确量化图像的模糊程度。
可选的,步骤(12)中,电子设备利用所述光线识别网络根据所述图像特征计算所述光线分数,具体包括如下步骤:
(1221)电子设备利用所述光线识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个光线评价等级上的质量预测分;
(1222)电子设备根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数。
本申请实施例中,光线识别网络可以通过全连接层来实现,通过全连接层将图像特征进行线性映射并归一化处理,转化为光线识别任务的各个光线评价等级的质量预测分。光线评价等级可以包括光线极差、光线中等、光线良好三个维度。
电子设备可以通过加权计算的方式根据待处理人体图像在多个光线评价等级上的质量预测分计算光线分数。举例来说,可以参照如下公式计算光线分数。
$Q_{light}=\sum_{i=0}^{k-1} P_{i\text{-}light}\, w_{i\text{-}light}$

其中,Q_{light}表示光线分数,P_{i-light}表示各个光线评价等级的质量预测分,w_{i-light}表示各个光线评价等级对应的权重值。其中,w_{i-light}的分布与光线评价等级对应的图像质量高低正相关。例如,i=2表示光线良好,则i=2时,输出的P_{2-light}乘以的权重最高;i=0表示光线极差,则i=0时,输出的P_{0-light}乘以的权重最低。
w_{i-light}可以是训练得到的权重参数,也可以是预先设定的权重参数。各个光线评价等级的质量预测分也可以通过预测概率的方式得到。
比如,光线评价等级可以包括光线极差、光线中等、光线良好三个维度,预测得到的这三个光线评价等级的质量预测分分别为:0.8、0.1、0.1,如果这三个光线评价等级的权值分别为:1、2、3,则可以根据该待处理人体图像在多个光线评价等级上的质量预测分计算得到光线分数Q_{light}=0.8*1+0.1*2+0.1*3=1.3,"*"代表乘法运算。其中,光线分数越高,表示该待处理人体图像的光线维度的图像质量越高(光线明亮程度越高),光线分数越低,表示该待处理人体图像的光线维度的图像质量越低(光线明亮程度越低)。
本申请实施例可以根据待处理人体图像在多个光线评价等级上的质量预测分计算光线分数,将图像特征转换为衡量图像的光线明亮程度的质量得分,可以根据光线分数准确量化图像的光线明亮程度。
可选的,步骤(12)中,电子设备利用所述半身识别网络根据所述图像特征计算半身分数,具体包括如下步骤:
(1231)电子设备利用所述半身识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个半身评价等级上的质量预测分;
(1232)电子设备根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数。
本申请实施例中,半身识别网络可以通过全连接层来实现,通过全连接层将图像特征进行线性映射并归一化处理,转化为半身识别任务的各个半身评价等级的质量预测分。半身评价等级可以包括只有头肩、只有头加上半身、只有小腿及以下缺失、完整人体、只有下半身、只有头部缺失这六个等级。
电子设备可以通过加权计算的方式根据待处理人体图像在多个半身评价等级上的质量预测分计算半身分数。举例来说,可以参见如下公式计算半身分数。
$Q_{half}=\sum_{i=0}^{k-1} P_{i\text{-}half}\, w_{i\text{-}half}$

其中,Q_{half}表示半身分数,P_{i-half}表示各个半身评价等级的质量预测分,w_{i-half}表示各个半身评价等级对应的权重值。w_{i-half}可以是训练得到的权重参数,也可以是预先设定的权重参数。各个半身评价等级的质量预测分也可以通过预测概率的方式得到。
比如,半身评价等级可以包括只有头肩、只有下半身、只有头加上半身、只有小腿及以下缺失、只有头部缺失、完整人体这六个维度,预测得到的这六个半身评价等级的质量预测分分别为:0、0.8、0.1、0.1、0、0,如果这六个半身评价等级的权值分别为:1、2、3、4、5、6,则可以根据该待处理人体图像在多个半身评价等级上的质量预测分计算得到半身分数Q_{half}=0*1+0.8*2+0.1*3+0.1*4+0*5+0*6=2.3,"*"代表乘法运算。其中,半身分数越高,表示该待处理人体图像的半身维度的图像质量越高(半身程度越高,人体图像缺失的部位越少,越接近全身),半身分数越低,表示该待处理人体图像的半身维度的图像质量越低(半身程度越低,人体图像缺失的部位越多)。
本申请实施例可以根据待处理人体图像在多个半身评价等级上的质量预测分计算半身分数,将图像特征转换为衡量图像的半身程度的质量得分,可以根据半身分数准确量化图像的半身程度。
可选的,步骤(13)可以包括如下步骤:
电子设备通过加权求和的方式基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数。
本申请实施例中,可以通过如下公式计算待处理人体图像的质量分数:
$Q_{total}=w_b\cdot Q_{blur}+w_l\cdot Q_{light}+w_h\cdot Q_{half}$

其中,Q_{total}表示该待处理人体图像的质量分数,Q_{blur}表示模糊分数,Q_{light}表示光线分数,Q_{half}表示半身分数,w_b表示所述模糊分数的权重值,w_l表示所述光线分数的权重值,w_h表示所述半身分数的权重值。w_b、w_l、w_h可以是训练得到的权重参数,也可以是预先设定的权重参数。
由于人体图像的模糊、光线差、半身会对行人重识别带来较大的干扰,本申请实施例的待处理人体图像的质量分数是基于模糊分数、光线分数和半身分数计算得到的,可以准确的计算待处理人体图像的质量分数,从而根据质量分数准确的衡量该待处理人体图像是否符合行人重识别的要求。
103,电子设备根据质量分数进行图像筛选。
本申请实施例中,可以根据质量分数的大小来进行图像筛选。一般而言,质量分数越高,待处理人体图像的图像质量越高。
基于行人重识别的图像筛选而言,在待处理人体图像的质量分数大于第一阈值的情况下,认为该待处理人体图像符合行人重识别的要求。可以将质量分数大于第一阈值的人体图像筛选出来,对该人体图像进行行人重识别,将质量分数小于第一阈值的人体图像过滤掉,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
本申请实施例中,第一阈值可以预先进行设定,第一阈值可以存储在电子设备的存储器(比如,非易失性存储器)中。
电子设备确定待处理人体图像符合行人重识别的要求后,可以将该待处理人体图像输入行人重识别网络,进行行人重识别。
可选的,图1所示的方法还可以包括如下步骤:
在所述质量分数小于第一阈值的情况下,电子设备确定所述待处理人体图像不符合行人重识别的要求。对于不符合行人重识别的要求的待处理人体图像,则可以直接过滤,不会将其输入行人重识别网络,降低对行人重识别造成干扰的概率。
其中,对于符合行人重识别的要求的待处理人体图像,可以对该待处理人体图像进行行人重识别。
图1所示的方法可以应用在视频流或图片流的软件开发工具包(software development kit,SDK)中,在该SDK中加载网络模型,可以根据输入的待处理人体图像计算该图像的质量分数,也可以通过阈值法对低质量的人体图像进行过滤。
图1所示的方法可以应用在如下场景:(1)在处理视频流或图片流数据时,对于检测模型检测到的人 体图片进行特征提取前,需要对人体图片进行质量过滤的场景;(2)在处理视频流数据时,对于检测模型检测到的人体图片进行跟踪或选帧(选择高质量人体)前,需要获取人体图片质量分数的场景;(3)在进行人脸人体联合聚类、人体行为聚类、人体聚类、实时聚类等任务时,需要用到人体质量分数的场景。
本申请实施例的网络模型考虑了模糊识别、光线识别和半身识别,网络模型可以通过图像的模糊程度、光线程度和半身程度来计算待处理人体图像的质量分数,可以通过阈值法有效的过滤模糊的人体图像、光线差的人体图像和半身人体图像,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
请参阅图3,图3是本申请实施例提供的一种网络模型的训练流程示意图。如图3所示,该网络模型的训练流程可以包括如下步骤。
301,电子设备获取训练数据,训练数据包括训练图像和对应的标注结果。
本申请实施例中,网络模型可以采用有监督的方式进行训练。训练图像可以是随机抓拍的包含人体的图像,可以是来自视频流或者图片流的人体检测结果图或人体抓拍图。图1和图2所示的网络模型可以通过图3所示的方法进行训练。
本申请实施例的训练图像可以是符合网络模型的多任务学习的需求的图像。该多任务可以包括模糊识别任务、光线识别任务和半身识别任务。该训练图像可以是能够反映第一任务的需求的特征图像。第一任务为多任务中的任一个任务。多任务可以包括模糊识别任务、光线识别任务和半身识别任务。标注结果可以是以标签的形式存在,也可以是以分数的形式存在。
比如,若第一任务为模糊识别任务,则对应的训练图像可以显著反映是否模糊的图像(比如,清晰的人体图像,或者极度模糊的人体图像)。
需要说明的是,符合第一任务需求的训练数据,也有可能符合其他任务的需求。比如,训练图像不仅是模糊的图片,而且光线较差,并且只包含人的下半身,则该训练图像不仅符合模糊识别任务的需求,而且符合光线识别任务和半身识别任务的需求。
标注结果可以对该训练图像进行标注,如果标注结果以标签的形式存在,若第一任务是模糊识别任务,则对应的标注结果可以是“极其模糊”、“中等模糊”、“清晰”中的任一种。若第一任务是光线识别任务,则对应的标注结果可以是“光线极差”、“光线中等”、“光线良好”中的任一种。若第一任务是半身识别任务,则对应的标注结果可以是“只有头肩”、“只有头加上半身”、“只有小腿及以下缺失”、“完整人体”、“只有下半身”、“只有头部缺失”中的任一种。
标注结果可以是人工标注,也可以通过软件算法进行标注。
可选的,步骤301中,电子设备获取训练数据,可以包括如下步骤:电子设备获取训练图像,通过图像质量评估算法生成与所述训练图像对应的标注结果。
本申请实施例中,图像质量评估算法可以从图像质量评估(image quality assessment,IQA)出发进行研究,主要考虑图像的曝光、清晰度、颜色、质感、抖动、对焦、伪影、噪声等。本申请实施例的图像质量评估算法对人体图像的质量的评估主要从模糊程度、光线强弱、半身程度等维度进行质量研究。
训练图像可以从拍摄的视频流和图片流中获取,也可以对清晰的人体图像、光线良好的人体图像、全身人体图像进行相应的图像处理得到。由于对人体质量进行了重新定义(模糊程度、光线强弱、半身程度),训练数据生成更为简单,可以较为轻松地获得更多的自动生成的训练数据(比如,可以对清晰的人体图像进行模糊化处理,可以对光线良好的人体图像进行光线过曝或者过暗处理,可以对全身人体图像进行半身截取等),并且标注成本低。
302,电子设备对训练图像进行预处理,将预处理后的训练图像输入网络模型进行多任务学习的训练,得到训练结果。
本申请实施例中,预处理可以是将训练图像进行尺寸缩放到相同的大小,对训练图像的尺寸进行归一化处理,使得模型无需适配不同尺寸的图像,可以方便模型进行训练。
训练结果可以是某一个标签的概率。比如,对于模糊识别任务,则训练结果可以是“极其模糊”、“中等模糊”、“清晰”这三者的概率。
303,电子设备根据训练结果和标注结果对网络模型的多任务的模型参数进行优化。
本申请实施例中,电子设备可以采用损失函数评估网络模型的第一任务的模型参数的拟合效果,进而对第一任务的模型参数进行优化。比如,损失函数可以通过自适应梯度下降方法对模型参数进行优化。
若第一任务是模糊识别任务,则网络模型的第一任务的模型参数可以包括特征提取网络和模糊识别网络的模型参数。若第一任务是光线识别任务,则网络模型的第一任务的模型参数可以包括特征提取网络和光线识别网络的模型参数。若第一任务是半身识别任务,则网络模型的第一任务的模型参数可以包括特征提取网络和半身识别网络的模型参数。
其中,步骤301至步骤303的电子设备与步骤101至步骤103的电子设备可以是不同的设备,也可以是相同的设备。
本申请实施例中,电子设备根据训练结果和标注结果对网络模型的第一任务的模型参数进行优化后,可以得到训练好的网络模型。训练好的网络模型可以直接部署在该电子设备上,也可以将该训练好的网络模型进行模型转化后部署在其他电子设备上。比如,训练好的网络模型可以是Pytorch模型,可以将Pytorch模型转化为Kestrel模型后部署在其他电子设备上。
可选的,步骤301中,电子设备获取训练数据,包括:
电子设备获取人工标注的第一人体图像;其中,所述第一人体图像的标注用于指示光线评价等级。
本申请实施例中,对于光线识别任务的训练数据,可以通过人工标注的方式进行获取。
可选的,步骤301中,电子设备获取训练数据,包括:
电子设备获取经过不同程度高斯模糊处理的第二人体图像;其中,所述经过不同程度高斯模糊处理的第二人体图像分别被标注为不同的模糊评价等级。
本申请实施例中,对于模糊识别任务的训练数据,可以通过高斯模糊处理的方式对人体图像进行处理后得到。比如,对于高斯模糊处理程度较高的图片,其模糊评价等级为极其模糊,对于高斯模糊处理程度一般的图片,其模糊评价等级为中等模糊,对于高斯模糊处理程度较低或者未经过高斯模糊处理的清晰图片,其模糊评价等级为清晰。
可选的,步骤301中,电子设备获取训练数据,包括:
电子设备获取裁剪后的第三人体图像;其中,所述裁剪后的第三人体图像由所述第三人体图像输入人体划线模型得到划线结果后,基于所述划线结果进行不同程度的裁剪得到,经过不同程度的裁剪的所述第三人体图像分别被标注为不同的半身评价等级。
本申请实施例中,对于半身识别任务的训练数据,可以通过图像裁剪的方式对人体图像进行处理后得到。可以将包含完整人体图像的部分区域进行裁剪,得到不同半身评价等级的图片。举例来说,半身评价等级可以包括只有头肩、只有头加上半身、只有小腿及以下缺失、完整人体、只有下半身、只有头部缺失这六个等级。
本申请实施例通过多任务学习的训练方式,可以把人体图像质量量化,转化为深度学习中的多任务和多分类(比如,模糊识别任务对应的多分类包括:上述多个模糊评价等级;光线识别任务对应的多分类包括:上述多个光线评价等级;半身识别任务对应的多分类包括:上述多个半身评价等级),并对多分类结果进行自动融合,得到人体图片的质量分数,框架简单有效,解释性强,采用多任务学习,方便进行扩展。
上述主要从方法侧执行过程的角度对本申请实施例的方案进行了介绍。可以理解的是,电子设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所提供的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对电子设备进行功能单元的划分,例如,可以对应各个功能划分各个功能单元,也可以将两个或两个以上的功能集成在一个处理单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
与上述一致的,请参阅图4,图4为本申请实施例提供的一种图像筛选装置的结构示意图,该图像筛 选装置400应用于电子设备,该图像筛选装置400可以包括获取单元401、图像处理单元402和图像评估单元403,其中:
获取单元401,用于获取待处理人体图像;
图像处理单元402,用于利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数;所述网络模型用于提取所述待处理人体图像的图像特征,根据所述图像特征,确定所述待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数;图像筛选单元403,用于根据所述质量分数进行图像筛选。
可选的,所述网络模型包括特征提取网络、模糊识别网络、光线识别网络和半身识别网络;所述图像处理单元402利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数,包括:利用所述特征提取网络从所述待处理人体图像提取图像特征;利用所述模糊识别网络根据所述图像特征计算所述模糊分数,利用所述光线识别网络根据所述图像特征计算所述光线分数,利用所述半身识别网络根据所述图像特征计算所述半身分数;基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数。
可选的,所述图像处理单元402利用所述模糊识别网络根据所述图像特征计算所述模糊分数,包括:利用所述模糊识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个模糊评价等级上的质量预测分;根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数。
可选的,所述图像处理单元402根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数,包括:
通过如下公式计算模糊分数:

$Q_{blur}=\sum_{i=0}^{k-1} P_{i\text{-}blur}\, w_{i\text{-}blur}$

其中,Q_{blur}表示模糊分数,P_{i-blur}表示模糊评价等级的质量预测分,w_{i-blur}表示模糊评价等级对应的权重值;w_{i-blur}的分布与模糊评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述图像处理单元402所述利用所述光线识别网络根据所述图像特征计算所述光线分数,包括:利用所述光线识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个光线评价等级上的质量预测分;根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数。
可选的,所述图像处理单元402根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数,包括:
通过如下公式计算光线分数:

$Q_{light}=\sum_{i=0}^{k-1} P_{i\text{-}light}\, w_{i\text{-}light}$

其中,Q_{light}表示光线分数,P_{i-light}表示光线评价等级的质量预测分,w_{i-light}表示光线评价等级对应的权重值;w_{i-light}的分布与光线评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述图像处理单元402利用所述半身识别网络根据所述图像特征计算所述半身分数,包括:利用所述半身识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个半身评价等级上的质量预测分;根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数。
可选的,所述图像处理单元402根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数,包括:
通过如下公式计算半身分数:

$Q_{half}=\sum_{i=0}^{k-1} P_{i\text{-}half}\, w_{i\text{-}half}$

其中,Q_{half}表示半身分数,P_{i-half}表示半身评价等级的质量预测分,w_{i-half}表示半身评价等级对应的权重值;w_{i-half}与半身评价等级对应的图像质量高低正相关。其中,k为大于或等于1的整数。
可选的,所述图像处理单元402基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数,包括:通过加权求和公式基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数;
所述加权求和公式包括:
$Q_{total}=w_b\cdot Q_{blur}+w_l\cdot Q_{light}+w_h\cdot Q_{half}$

其中,Q_{total}表示所述待处理人体图像的质量分数,Q_{blur}表示模糊分数,Q_{light}表示光线分数,Q_{half}表示半身分数,w_b表示所述模糊分数的权重值,w_l表示所述光线分数的权重值,w_h表示所述半身分数的权重值。
可选的,所述特征提取网络包括轻量级神经网络。
可选的,该图像筛选装置400还可以包括训练单元404;
所述训练单元404,用于获取训练数据,所述训练数据包括训练图像和对应的标注结果;对所述训练图像进行预处理,将预处理后的训练图像输入所述网络模型进行多任务学习的训练,得到训练结果;根据所述训练结果和所述标注结果对所述网络模型的多任务的模型参数进行优化。
可选的,所述训练单元404获取训练数据,包括:获取训练图像,通过图像质量评估算法生成与所述训练图像对应的标注结果。
可选的,所述训练单元404获取训练数据,包括:获取人工标注的第一人体图像;其中,所述第一人体图像的标注用于指示光线评价等级。
可选的,所述训练单元404获取训练数据,包括:获取经过不同程度高斯模糊处理的第二人体图像; 其中,所述经过不同程度高斯模糊处理的第二人体图像分别被标注为不同的模糊评价等级。
可选的,所述训练单元404获取训练数据,包括:获取裁剪后的第三人体图像;其中,所述裁剪后的第三人体图像由所述第三人体图像输入人体划线模型得到划线结果后,基于所述划线结果进行不同程度的裁剪得到,经过不同程度的裁剪的所述第三人体图像分别被标注为不同的半身评价等级。
可选的,该图像筛选装置400还可以确定单元(图4未示出);
确定单元,用于在所述质量分数大于第一阈值的情况下,确定所述待处理人体图像符合行人重识别的要求;
所述确定单元,还用于在所述质量分数小于第一阈值的情况下,确定所述待处理人体图像不符合行人重识别的要求。
可选的,该图像筛选装置400还可以包括行人重识别单元(图4未示出);
行人重识别单元,用于在所述图像评估单元403确定所述待处理人体图像符合行人重识别的要求之后,对所述待处理人体图像进行行人重识别。
其中,本申请实施例中的获取单元401、图像处理单元402、图像评估单元403、训练单元404、确定单元和行人重识别单元可以对应电子设备中的处理器。
本申请实施例中,可以利用网络模型对待处理人体图像进行处理,得到待处理人体图像的综合质量分数;由于网络模型可以提取待处理人体图像的图像特征,根据提取的图像特征确定待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数。该网络模型考虑了模糊识别、光线识别和半身识别,可以有效的过滤模糊的人体图像、光线差的人体图像和半身人体图像,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
请参阅图5,图5是本申请实施例提供的一种电子设备的结构示意图,如图5所示,该电子设备500包括处理器501和存储器502,处理器501、存储器502可以通过通信总线503相互连接。通信总线503可以是外设部件互连标准(Peripheral Component Interconnect,简称PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,简称EISA)总线等。通信总线503可以分为地址总线、数据总线、控制总线等。为便于表示,图5中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。存储器502用于存储计算机程序,计算机程序包括程序指令,处理器501被配置用于调用程序指令,上述程序包括用于执行图1或图3所示的方法。
处理器501可以是通用中央处理器(CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制以上方案程序执行的集成电路。
存储器502可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静 态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
此外,该电子设备500还可以包括通信接口、天线等通用部件,在此不再详述。
本申请实施例中,可以利用网络模型对待处理人体图像进行处理,得到待处理人体图像的综合质量分数;由于网络模型可以提取待处理人体图像的图像特征,根据提取的图像特征确定待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数。该网络模型考虑了模糊识别、光线识别和半身识别,可以有效的过滤模糊的人体图像、光线差的人体图像和半身人体图像,从而准确的筛选符合行人重识别的要求的图像,降低行人重识别算法出现误识别的概率。
本申请实施例还提供一种计算机可读存储介质,其中,该计算机可读存储介质存储用于电子数据交换的计算机程序,该计算机程序使得计算机执行如上述方法实施例中记载的任何一种图像筛选方法的部分或全部步骤。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在申请明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。
所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储器中,存储器可以包括:闪存盘、只读存储器、随机存取器、磁盘或光盘等。
以上对本申请实施例进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。
Claims (18)
- 一种图像筛选方法,其特征在于,包括:获取待处理人体图像;利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数;所述网络模型用于提取所述待处理人体图像的图像特征,根据所述图像特征,确定所述待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数;根据所述质量分数进行图像筛选。
- 根据权利要求1所述的方法,其特征在于,所述网络模型包括特征提取网络、模糊识别网络、光线识别网络和半身识别网络;所述利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数,包括:利用所述特征提取网络从所述待处理人体图像提取图像特征;利用所述模糊识别网络根据所述图像特征计算所述模糊分数,利用所述光线识别网络根据所述图像特征计算所述光线分数,利用所述半身识别网络根据所述图像特征计算所述半身分数;基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数。
- 根据权利要求2所述的方法,其特征在于,所述利用所述模糊识别网络根据所述图像特征计算所述模糊分数,包括:利用所述模糊识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个模糊评价等级上的质量预测分;根据所述待处理人体图像在所述多个模糊评价等级上的质量预测分计算所述模糊分数。
- 根据权利要求2所述的方法,其特征在于,所述利用所述光线识别网络根据所述图像特征计算所述光线分数,包括:利用所述光线识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个 光线评价等级上的质量预测分;根据所述待处理人体图像在所述多个光线评价等级上的质量预测分计算所述光线分数。
- 根据权利要求2所述的方法,其特征在于,所述利用所述半身识别网络根据所述图像特征计算所述半身分数,包括:利用所述半身识别网络将所述图像特征进行线性映射并归一化处理,预测所述待处理人体图像在多个半身评价等级上的质量预测分;根据所述待处理人体图像在所述多个半身评价等级上的质量预测分计算所述半身分数。
- 根据权利要求2~8任一项所述的方法,其特征在于,所述基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数,包括:通过加权求和公式基于所述模糊分数、所述光线分数和所述半身分数计算所述质量分数;所述加权求和公式包括:Q_{total}=w_b*Q_{blur}+w_l*Q_{light}+w_h*Q_{half};Q_{total}表示所述待处理人体图像的质量分数,Q_{blur}表示模糊分数,Q_{light}表示光线分数,Q_{half}表示半身分数,w_b表示所述模糊分数的权重值,w_l表示所述光线分数的权重值,w_h表示所述半身分数的权重值。
- 根据权利要求2~8任一项所述的方法,其特征在于,所述特征提取网络包括轻量级神经网络。
- 根据权利要求1~10任一项所述的方法,其特征在于,所述方法还包括:获取训练数据,所述训练数据包括训练图像和对应的标注结果;对所述训练图像进行预处理,将预处理后的训练图像输入所述网络模型进行多任务学习的训练,得到 训练结果;根据所述训练结果和所述标注结果对所述网络模型的多任务的模型参数进行优化。
- 根据权利要求11所述的方法,其特征在于,所述获取训练数据,包括:获取人工标注的第一人体图像;其中,所述第一人体图像的标注用于指示光线评价等级。
- 根据权利要求11所述的方法,其特征在于,所述获取训练数据,包括:获取经过不同程度高斯模糊处理的第二人体图像;其中,所述经过不同程度高斯模糊处理的第二人体图像分别被标注为不同的模糊评价等级。
- 根据权利要求11所述的方法,其特征在于,所述获取训练数据,包括:获取裁剪后的第三人体图像;其中,所述裁剪后的第三人体图像由所述第三人体图像输入人体划线模型得到划线结果后,基于所述划线结果进行不同程度的裁剪得到,经过不同程度的裁剪的所述第三人体图像分别被标注为不同的半身评价等级。
- 一种图像筛选装置,其特征在于,包括:获取单元,用于获取待处理人体图像;图像处理单元,用于利用网络模型对所述待处理人体图像进行处理,得到所述待处理人体图像的质量分数;所述网络模型用于提取所述待处理人体图像的图像特征,根据所述图像特征,确定所述待处理人体图像的用于表示模糊程度的模糊分数、用于表示光线明亮程度的光线分数以及用于表示半身人体的完整程度的半身分数,根据所述模糊分数、所述光线分数和所述半身分数确定所述待处理人体图像的质量分数;图像筛选单元,用于根据所述质量分数进行图像筛选。
- 一种电子设备,其特征在于,包括处理器和存储器,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行如权利要求1~14任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行如权利要求1~14任一项所述的方法。
- 一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备的处理器中运行时,所述电子设备中的处理器执行用于实现权利要求1~14任一项所述的方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111415866.5 | 2021-11-25 | ||
CN202111415866.5A CN114066877A (zh) | 2021-11-25 | 2021-11-25 | 图像筛选方法及装置、电子设备和存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023093151A1 true WO2023093151A1 (zh) | 2023-06-01 |
Family
ID=80276452
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/114469 WO2023093151A1 (zh) | 2021-11-25 | 2022-08-24 | 图像筛选方法及装置、电子设备和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114066877A (zh) |
WO (1) | WO2023093151A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117097989A (zh) * | 2023-10-18 | 2023-11-21 | 北京道仪数慧科技有限公司 | 一种图像优选的处理方法和装置 |
CN118154571A (zh) * | 2024-04-01 | 2024-06-07 | 深圳大学 | 一种ai图像的图像质量评价方法、系统、设备及介质 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114066877A (zh) * | 2021-11-25 | 2022-02-18 | 深圳市商汤科技有限公司 | 图像筛选方法及装置、电子设备和存储介质 |
CN116052220B (zh) * | 2023-02-07 | 2023-11-24 | 北京多维视通技术有限公司 | 行人重识别方法、装置、设备及介质 |
- 2021-11-25: CN application CN202111415866.5A filed (published as CN114066877A, status: active, pending)
- 2022-08-24: WO application PCT/CN2022/114469 filed (published as WO2023093151A1, status: unknown)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108961209A (zh) * | 2018-05-22 | 2018-12-07 | 深圳市商汤科技有限公司 | 行人图像质量评价方法、电子设备及计算机可读介质 |
CN112800923A (zh) * | 2021-01-22 | 2021-05-14 | 北京市商汤科技开发有限公司 | 人体图像质量检测方法及装置、电子设备、存储介质 |
CN112733814A (zh) * | 2021-03-30 | 2021-04-30 | 上海闪马智能科技有限公司 | 一种基于深度学习的行人徘徊滞留检测方法、系统及介质 |
CN114066877A (zh) * | 2021-11-25 | 2022-02-18 | 深圳市商汤科技有限公司 | 图像筛选方法及装置、电子设备和存储介质 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117097989A (zh) * | 2023-10-18 | 2023-11-21 | 北京道仪数慧科技有限公司 | 一种图像优选的处理方法和装置 |
CN117097989B (zh) * | 2023-10-18 | 2024-01-02 | 北京道仪数慧科技有限公司 | 一种图像优选的处理方法和装置 |
CN118154571A (zh) * | 2024-04-01 | 2024-06-07 | 深圳大学 | 一种ai图像的图像质量评价方法、系统、设备及介质 |
Also Published As
Publication number | Publication date |
---|---|
CN114066877A (zh) | 2022-02-18 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22897246; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE