WO2023246921A1 - Target attribute recognition method and apparatus, and model training method and apparatus - Google Patents

Target attribute recognition method and apparatus, and model training method and apparatus Download PDF

Info

Publication number
WO2023246921A1
WO2023246921A1 (PCT/CN2023/101952, priority CN2023101952W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
mask
attribute recognition
attributes
Prior art date
Application number
PCT/CN2023/101952
Other languages
French (fr)
Chinese (zh)
Inventor
Liu Xianbin (刘宪彬)
An Zhanfu (安占福)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Publication of WO2023246921A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Definitions

  • the present application relates to the field of computer vision, and in particular to a target attribute recognition method, model training method and device.
  • This application provides a target attribute recognition method, training method and device.
  • a target attribute identification method which specifically includes:
  • according to the target mask, the target attribute recognition model is used to perform a mask operation on the image to be recognized and obtain a target mask image;
  • the target attribute recognition model is used to perform target attribute recognition, and the attributes of the target in the image to be recognized are output, where the attributes include multi-label attributes of the target.
  • a preset target attribute recognition model to perform target recognition on the received image to be recognized, and outputting the target mask further includes:
  • a second feature map is obtained by pixel space alignment based on a segmentation algorithm
  • the target attribute recognition model is used to perform area detection on the second feature map and output a target mask.
  • the target attribute recognition model includes a feature extraction network, a first feature map pyramid network and a region generation network;
  • the use of the target attribute recognition model to extract features from the image to be recognized and output the first feature map further includes:
  • the method of using the target attribute recognition model to perform region detection on the first feature map and outputting a plurality of region filtering frames further includes: using the region generation network to perform region detection on the first feature map according to preset anchor frames and outputting a plurality of region filtering frames.
  • the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch;
  • Using the target attribute recognition model to perform region detection on the second feature map and outputting a target mask further includes:
  • the step of using the target attribute recognition model to perform a masking operation on the image to be recognized and obtaining the target mask image according to the target mask further includes: multiplying the target mask with the image to be recognized to obtain the target mask image;
  • the step of using the target attribute recognition model to perform target attribute recognition according to the target mask image and outputting the attributes of the target in the image to be recognized further includes: using the corresponding attribute recognition model in the target attribute recognition model, according to the target classification, to perform target attribute recognition on the target mask image and output the attributes of the target in the image to be recognized.
  • the attribute recognition model is a multi-task multi-label classification model.
  • using the target attribute recognition model to perform a masking operation on the image to be recognized and obtaining the target mask image further includes: multiplying the output target frame with the image to be recognized to obtain a target frame mask image, and then multiplying the target mask with the target frame mask image to obtain the target mask image;
  • the step of using the target attribute recognition model to perform target attribute recognition according to the target mask image and outputting the attributes of the target in the image to be recognized further includes: using the corresponding attribute recognition model in the target attribute recognition model, according to the target classification, to perform target attribute recognition on the target mask image and output the attributes of the target in the image to be recognized.
  • the attribute recognition model is a multi-task multi-label classification model.
  • the feature extraction network is one of a VGG network, a GoogLeNet network, a ResNet network, and a ResNeXt network.
  • according to the pedestrian mask, the pedestrian attribute recognition model is used to perform a mask operation on the image to be recognized and obtain a pedestrian mask image;
  • the pedestrian attribute recognition model is used to perform pedestrian attribute recognition, and the attributes of the pedestrian in the image to be recognized are output, where the attributes include multi-label attributes of the pedestrian.
  • the multi-label attributes include at least three of gender attributes, headgear attributes, hairstyle attributes, clothing attributes, clothing color attributes, accessories attributes, occlusion attributes, truncation attributes and orientation attributes.
  • a model training method including:
  • the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function,
  • losses of the mask prediction branch, the regression prediction branch and the classification prediction branch are calculated through the preset loss functions to adjust the model parameters;
  • the model parameters of the target attribute recognition model are adjusted through the multi-label classification loss function.
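As an illustrative sketch (not taken from the patent), the two-stage training objective described above, per-branch losses plus a multi-label classification loss, might be combined as follows; the function names, smooth-L1 regression loss, and the unweighted sum are all assumptions:

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability/label pair."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def smooth_l1(pred, target):
    """Smooth-L1 loss, commonly used for box regression."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def total_loss(cls_prob, cls_label, box_pred, box_target,
               mask_probs, mask_labels, attr_probs, attr_labels):
    # Branch losses: classification, regression, per-pixel mask BCE.
    l_cls = bce(cls_prob, cls_label)
    l_reg = sum(smooth_l1(p, t) for p, t in zip(box_pred, box_target))
    l_mask = sum(bce(p, y) for p, y in zip(mask_probs, mask_labels)) / len(mask_probs)
    # Multi-label attribute loss: independent BCE per attribute label.
    l_attr = sum(bce(p, y) for p, y in zip(attr_probs, attr_labels)) / len(attr_probs)
    return l_cls + l_reg + l_mask + l_attr
```

In practice each term would carry a tuned weight; the plain sum here only illustrates the structure of the objective.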
  • a target attribute identification device including:
  • a target mask acquisition unit used to perform target recognition on the received image to be recognized, and output a target mask, where the target mask is obtained by pixel space alignment based on a segmentation algorithm;
  • a target mask image acquisition unit configured to perform a masking operation on the image to be recognized according to the target mask and acquire the target mask image
  • a target attribute recognition unit is configured to perform target attribute recognition on the target mask image, and output attributes of the target in the image to be recognized, where the attributes include multi-label attributes of the target.
  • a pedestrian attribute recognition device including:
  • a pedestrian mask acquisition unit is used to perform pedestrian recognition on the received image to be recognized, and output a pedestrian mask, which is obtained by pixel space alignment based on a segmentation algorithm;
  • a pedestrian mask image acquisition unit configured to perform a masking operation on the image to be recognized according to the pedestrian mask and obtain a pedestrian mask image
  • a pedestrian attribute recognition unit is configured to perform pedestrian attribute recognition on the pedestrian mask image and output attributes of the pedestrian in the image to be recognized, where the attributes include multi-label attributes of the pedestrian.
  • a model training device including:
  • the labeling unit is used to obtain multiple sample recognition images and label the targets of each sample recognition image according to the pixel space alignment;
  • the training unit is used to perform target recognition training on the target attribute recognition model using multiple labeled sample recognition images.
  • a computer-readable storage medium is provided with a computer program stored thereon
  • the program when executed by the processor, implements a method as described in one aspect
  • the program when executed by the processor implements a method as described in another aspect
  • the program when executed by the processor, implements a method as described in yet another aspect.
  • a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described in any of the above aspects.
  • Figure 1 shows a flow chart of a target attribute identification method according to an embodiment of the present application
  • Figure 2 shows a block diagram of a target attribute identification method according to another embodiment of the present application
  • Figure 3 shows a schematic diagram of an anchor frame according to an embodiment of the present application
  • Figure 4 shows a schematic diagram of the target mask and the target mask image according to an embodiment of the present application
  • Figure 5 shows a schematic diagram of an image to be recognized and target attributes according to an embodiment of the present application
  • Figure 6 shows a structural diagram of a target attribute identification device according to another embodiment of the present application.
  • Figure 7 shows a structural diagram of a pedestrian attribute recognition device according to another embodiment of the present application.
  • Figure 8 shows a structural diagram of a model training device according to another embodiment of the present application.
  • Figure 9 shows a schematic structural diagram of a computer device according to another embodiment of the present application.
  • For example, the YOLACT algorithm is used to filter out pedestrian attribute background information and splice feature maps of different sizes for multi-task network prediction, and a gradient-weighted loss function is used to train the model;
  • another example uses human posture key points to obtain the human body area: the extracted detail key points are combined with shallow features, the extracted human body area is combined with deep features, the combined data and the deep features are respectively input into a regional guidance module to obtain multiple prediction vectors, and the multiple prediction vectors are fused to obtain the final prediction result.
  • the above methods all require additional key point detection, a step that demands high computing power from the device and increases the corresponding processing time.
  • one embodiment of the present application provides a target attribute identification method, which is implemented based on a segmentation algorithm.
  • the method includes:
  • according to the target mask, the target attribute recognition model is used to perform a mask operation on the image to be recognized and obtain a target mask image;
  • the target attribute recognition model is used to perform target attribute recognition, and the attributes of the target in the image to be recognized are output, where the attributes include multi-label attributes of the target.
  • Compared with identification methods that use additional key points, the segmentation-based target attribute identification method of the embodiment of the present application bypasses the key-point processing step, reducing the performance requirements on the hardware and shortening the identification time. It can also filter out non-target areas to the greatest extent and perform attribute recognition through the target mask image, which avoids environmental interference with attribute recognition, significantly improves recognition speed and accuracy, and enables rapid filtering and assisted search, greatly improving work efficiency, with broad application prospects.
  • the image to be recognized 100 is read, target recognition 200 is performed, and a target mask 300 is output.
  • A feature map (Feature Map) is the result of convolving the input image with a neural network; its resolution depends on the stride of the preceding convolutions.
  • Region detection 230, that is, using the Region Proposal Network (RPN) to extract candidate frames for "region selection" and output multiple region filtering frames 240; regional feature matching 250, which outputs the second feature map 260; and then performing region detection again.
  • RPN Region Proposal Network
  • an image 100 to be recognized is input into a preset backbone convolutional neural network (Backbone Convolutional Neural Networks, Backbone CNN) that has completed training.
  • the backbone convolutional neural network is mainly used to extract feature maps of the image 100 to be recognized for use by subsequent networks.
  • the feature extraction network is a VGG network, a GoogLeNet network, a ResNet network, or a ResNeXt network.
  • Feature extraction is performed on the image to be identified through one of the above feature extraction networks.
  • VGG Visual Geometry Group
  • a deep convolutional neural network constructed from a series of small 3x3 convolution kernels and pooling layers, which has the characteristics of a simple structure and strong applicability.
  • the convolution block is called the Inception block.
  • the Inception block is equivalent to a sub-network with 4 paths.
  • Information is extracted in parallel through convolution layers and max-pooling layers with different window shapes, and 1 × 1 convolutional layers are used to reduce the channel dimension at the per-pixel level, thereby reducing model complexity.
  • the ResNeXt network adopts both the stacking idea of the VGG network and the split-transform-merge idea of the inception block, which has stronger scalability and basically does not change or reduce the complexity of the model while increasing the accuracy.
  • the ResNet network is a residual learning structure proposed to address the problem that deeper neural networks are difficult to train. It increases the depth of the network while reducing the number of parameters, and is widely used in detection, segmentation, recognition and other fields.
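The residual learning idea can be sketched in a few lines: a block learns a residual mapping and adds its input back, so an identity mapping is trivially available even in very deep stacks. This is a hypothetical minimal illustration, not the patent's network:

```python
def residual_block(x, transform):
    """Residual learning: output transform(x) + x, so the block only
    needs to learn the residual between its input and the target mapping."""
    fx = transform(x)  # the learned residual branch
    return [a + b for a, b in zip(fx, x)]  # skip connection adds the input back
```

When the residual branch outputs zeros, the block degenerates to the identity, which is why adding such blocks does not make a deep network harder to train.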
  • the feature extraction network adopts ResNet50 network.
  • the ResNet50 network outputs multiple feature maps.
  • This embodiment of the present application uses a feature map pyramid network (Feature Pyramid Network, FPN) to fuse the feature maps output by the last three layers and output the feature map 220.
  • FPN Feature Pyramid Network
  • The Feature Pyramid Network is a top-down feature fusion method and a multi-scale target detection algorithm that uses more than one feature prediction layer. Feature maps from multiple stages are fused together so that not only the semantic features of the high-level feature maps but also the low-level contour features are extracted.
  • the ResNet50 network is used to extract feature maps of the image 100 to be recognized, and the FPN network is further used to perform feature fusion and form the first feature map 220.
  • fusing the feature maps extracted at the various stages of the ResNet50 network captures not only the semantic features of the high-level feature maps but also the low-level contour features, which solves the problem that smaller objects cannot be detected.
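A minimal sketch of one top-down FPN fusion step: the coarser (higher-level) map is upsampled and added to the laterally connected finer map. This pure-Python illustration uses nearest-neighbour upsampling on 2-D lists; a real FPN uses learned 1 × 1 lateral convolutions before the addition:

```python
def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def fpn_merge(top, lateral):
    """One top-down FPN step: upsample the coarser map, add the lateral map."""
    up = upsample2x(top)
    return [[up[i][j] + lateral[i][j] for j in range(len(lateral[0]))]
            for i in range(len(lateral))]
```

Chaining `fpn_merge` from the deepest stage down to the shallowest yields the fused pyramid levels described above.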
  • the first feature map 220 is then input into the RPN network to perform region detection 230, thereby extracting the region filtering frames 240.
  • the first feature map 220 is subjected to a 3 × 3 convolution operation to obtain a feature map with 256 channels, whose size is the same as that of the first feature map 220.
  • the feature map with 256 channels is regarded as H × W vectors, each of 256 dimensions. Two fully connected operations are performed on each vector, yielding 2 scores and 4 coordinates respectively, which is equivalent to performing two 1 × 1 convolutions on the 256-channel feature map, resulting in feature maps of size 2 × H × W and 4 × H × W.
  • the 2 × H × W feature map, that is, 2 confidence levels, represents the scores of the foreground and the background. Because the RPN network is only responsible for extracting the region filtering frames 240 and does not need to judge the category of items in the image 100 to be recognized, the foreground and background confidences are used to determine whether a region contains an item;
  • the 4 × H × W feature map, that is, 4 coordinates, represents the offset coordinates (x, y, w, h).
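The equivalence between per-vector fully connected operations and 1 × 1 convolutions can be sketched as a per-pixel linear map over channels. This is an illustrative pure-Python version with tiny channel counts (the actual RPN head maps 256 input channels to 2 score channels and 4 coordinate channels):

```python
def conv1x1(fm, weights):
    """1x1 convolution: a per-pixel linear map from C_in to C_out channels.
    fm is laid out as [C_in][H][W]; weights as [C_out][C_in]."""
    c_in, h, w = len(fm), len(fm[0]), len(fm[0][0])
    return [[[sum(weights[o][c] * fm[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for o in range(len(weights))]
```

Applying one such map with 2 output channels and another with 4 output channels to the same H × W feature map produces exactly the 2 × H × W and 4 × H × W outputs described above.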
  • the offset coordinates are coordinates on the image 100 to be recognized. Since the image 100 to be recognized differs from the first feature map 220 in width and height, anchor points (Anchors) are introduced to obtain coordinates on the image 100 to be recognized. Specifically:
  • if the scaling ratio between the image 100 to be recognized and the first feature map 220 is 8:1, the mapped box is 8 × 8; the upper left corner or the center point of this box is set as the anchor point, and several anchor boxes (Anchor Boxes) are generated from this anchor point according to pre-configured rules.
  • the number of anchor frames is K, that is, each anchor point generates K frames.
  • the first feature map 220 includes H × W points, each corresponding to K frames on the image 100 to be recognized, so there are H × W × K candidate frames in total. Through the RPN, it is judged whether these frames contain objects and what their offset coordinates on the image 100 to be recognized are, yielding the region filtering frames 240.
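The H × W × K anchor enumeration can be sketched as follows; the scale and aspect-ratio defaults are illustrative, not values specified by the patent:

```python
def make_anchors(h, w, stride, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate K = len(scales) * len(ratios) anchor boxes (cx, cy, bw, bh)
    at every point of an H x W feature map, mapped back to image
    coordinates by the feature-map stride."""
    anchors = []
    for i in range(h):
        for j in range(w):
            # anchor point: center of the stride x stride cell on the image
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    bw = s * stride * (r ** 0.5)   # width scaled by sqrt(ratio)
                    bh = s * stride / (r ** 0.5)   # height scaled inversely
                    anchors.append((cx, cy, bw, bh))
    return anchors
```

With 3 scales and 3 ratios, K = 9 and the total count is H × W × 9, matching the H × W × K candidate frames described above.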
  • ROI Pooling region of interest pooling
  • because ROI Pooling rounds down, it easily introduces errors and cannot guarantee that the feature layer corresponds exactly to the pixels of the input layer, which fails to meet the requirements of the semantic segmentation task. The embodiment of the present application therefore adopts the ROI Align method, which cancels the rounding operation and instead uses bilinear interpolation to obtain the pixel values at four fixed point coordinates, making the discontinuous operations continuous, effectively reducing error, realizing pixel space alignment, and completing the regional feature matching 250.
  • the embodiment of the present application uses the ROI Align method to perform regional feature matching 250 on the region filtering frames and output the second feature map 260; that is, pixel space alignment is performed based on the segmentation algorithm to obtain the second feature map 260, which identifies the precise coordinate pixel values of the object to be detected (the foreground object) in the image 100 to be recognized.
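The bilinear interpolation at the heart of ROI Align can be sketched as a single-point sampler; ROI Align samples several such real-valued points per output bin and averages them, with no rounding anywhere:

```python
def bilinear(img, y, x):
    """Bilinearly interpolate a 2-D map at real-valued coordinates (y, x),
    blending the four surrounding integer-grid pixels."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```

Because the sampled coordinates are continuous, the mapping from feature map to ROI bin stays pixel-aligned, which is exactly the property ROI Pooling's rounding destroys.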
  • the embodiments of this application take into account the performance and accuracy requirements of the target attribute recognition model.
  • On the one hand, introducing the RPN network for region detection can significantly improve the detection speed and makes it easier to combine with other neural networks; on the other hand, using the ROI Align method achieves pixel space alignment, which can effectively reduce errors.
  • the region detection 270 is performed again and the target mask 300 is obtained. Specifically, it includes inputting the second feature map into three prediction branches respectively.
  • the second feature map 260 is introduced into the classification prediction branch to perform classification prediction and output target classification.
  • a softmax layer is connected after the classification prediction branch. The softmax layer receives an N-dimensional vector as input and converts the value of each dimension into a real number between (0, 1), mapping the output of the fully connected layer into a probability distribution; in the embodiment of the present application it is specifically used to implement foreground and background classification.
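The softmax mapping described above, converting an N-dimensional vector into a probability distribution over (0, 1), can be written directly:

```python
import math

def softmax(v):
    """Map an N-dimensional vector to a probability distribution in (0, 1)."""
    m = max(v)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]
```

For the two-class foreground/background case, the layer simply turns the two raw scores into complementary probabilities.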
  • the second feature map 260 is introduced into the regression prediction branch to perform regression prediction and output the target box. A bounding box regression layer (Bounding Box Regression, bbox reg) is connected, and the regression prediction obtains more accurate coordinate pixel values, namely the precise coordinates of the object to be detected (the foreground object) identified in the image 100 to be recognized.
  • the second feature map 260 is introduced into the mask prediction branch to perform mask prediction and output the target mask. A head layer (Head) is connected, which expands the output dimension of the second feature map 260 to increase mask prediction accuracy, and a fully convolutional network (FCN) operation is then performed in each ROI to generate the target mask 300 as shown in Figure 4.
  • FCN Fully Convolutional Network
  • the embodiment of this application obtains target classification, target box and target mask through three branch operations respectively.
  • the embodiment of the present application operates the three branches sequentially. For example, in the prediction stage, the classification prediction and regression prediction operations are performed first, and the obtained results are passed into the mask prediction branch to obtain the target mask quickly and accurately.
  • a mask operation 400 is performed using the target mask 300 and the image to be recognized 100, and a target mask image 500 is output.
  • the target mask 300 includes two elements, 0 and 1, where 0 represents black and 1 represents transparent.
  • the mask operation 400 generates a slice picture according to the target mask 300; that is, a multiplication operation is performed between the image to be recognized 100 and the target mask 300. A 0 in the target mask 300 sets the RGB value of the original picture to 0, while a 1 in the target mask 300 leaves the RGB value of the image 100 to be recognized unchanged.
  • the target mask image 500 is generated by segmenting the target to be measured from the image.
  • the target mask image 500 does not contain the background of the environment and can effectively reduce the noise caused by the environment.
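The mask operation, multiplying the 0/1 target mask into the RGB image so that background pixels become black while target pixels are kept, can be sketched as:

```python
def apply_mask(image, mask):
    """Multiply a binary mask (0 = black out, 1 = keep) into an RGB image.
    image is laid out as [H][W][3]; mask as [H][W] of 0/1 values."""
    return [[[c * mask[i][j] for c in image[i][j]]
             for j in range(len(mask[0]))]
            for i in range(len(mask))]
```

The result contains no environmental background, which is what lets the subsequent attribute classifier ignore environmental noise.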
  • the target mask image 500 is used to perform target attribute recognition 600, and the target attribute 700 is output.
  • a convolution operation is performed on the target mask image 500, and multi-task multi-label classification is performed through multi-layer convolution operations.
  • the recognition results are shown in Figure 5.
  • the embodiment of the present application uses the target attribute recognition method based on the segmentation algorithm to complete the target attribute recognition of the image 100 to be recognized, and output the target attribute 700.
  • the attribute recognition model performs target attribute recognition on the target mask image and outputs the attributes of the target in the image to be recognized.
  • the attribute recognition model is a multi-task multi-label classification model.
  • the target mask obtained by the mask prediction branch is multiplied with the image to be recognized to obtain the target mask image, and the attribute recognition model of the corresponding multi-task multi-label classification model is selected according to the target classification obtained by the classification prediction branch. Perform attribute recognition on the target mask image and output the attributes of the target.
  • for example, if the target is a vehicle, the attribute recognition model of the corresponding multi-task multi-label vehicle classification model is selected to perform attribute recognition on the target mask image and output the vehicle's attributes; if the target is a dog, the attribute recognition model of the corresponding dog classification model is selected to perform attribute recognition on the target mask image and output the dog's attributes; if the target is a pedestrian, the attribute recognition model of the corresponding multi-task multi-label pedestrian classification model is selected to perform attribute recognition on the target mask image and output the pedestrian's attributes.
  • the target attribute recognition model is used to perform a masking operation on the image to be recognized and obtain a target mask image.
  • the output target frame is first multiplied with the image to be recognized to obtain the target frame mask image, and the target mask is then multiplied with the target frame mask image to obtain the target mask image. The attribute recognition model is then used to perform target attribute recognition and output the attributes of the target in the image to be recognized; specifically, the corresponding attribute recognition model in the target attribute recognition model is used, according to the target classification, to perform target attribute recognition on the target mask image and output the attributes of the target in the image to be recognized. The attribute recognition model is a multi-task multi-label classification model.
  • the target frame obtained through the regression prediction branch is multiplied with the image to be recognized to obtain the target frame mask image, and the target mask obtained by the mask prediction branch is then multiplied with the target frame mask image to obtain the target mask image. The attribute recognition model of the corresponding multi-task multi-label classification model is selected, according to the target classification obtained by the classification prediction branch, to identify the attributes of the target mask image and output the attributes of the target, which can further improve the accuracy of obtaining the target mask image.
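A sketch of this two-step masking: a binary frame mask is built from the target box, then multiplied element-wise with the instance mask. The (x0, y0, x1, y1) box convention and list-based layout are illustrative choices, not from the patent:

```python
def box_mask(h, w, box):
    """Binary H x W mask that is 1 inside the (x0, y0, x1, y1) target frame."""
    x0, y0, x1, y1 = box
    return [[1 if x0 <= j < x1 and y0 <= i < y1 else 0 for j in range(w)]
            for i in range(h)]

def combine_masks(frame_mask, instance_mask):
    """Element-wise product of the frame mask and the instance mask, so a
    pixel survives only if it is inside the box AND on the segmented target."""
    return [[frame_mask[i][j] * instance_mask[i][j]
             for j in range(len(frame_mask[0]))]
            for i in range(len(frame_mask))]
```

Multiplying the combined mask into the image then yields the refined target mask image.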
  • the embodiment of the present application selects the ResNet50 network to extract feature maps of multiple stages of the image to be recognized 100, and further uses the FPN network to fuse the features of at least one stage together to form the first feature map 220. The features extracted at each stage of the ResNet50 network thus capture not only the semantic features of the high-level feature maps but also the low-level contour features, solving the problem that smaller objects cannot be detected. Meanwhile, the embodiment introduces the RPN network for region detection; since the RPN does not need to search all candidate frames, the detection speed is significantly improved and the network is easier to combine with other neural networks. The embodiment further uses ROI Align to achieve pixel space alignment, which can effectively reduce errors. The classification prediction and regression prediction operations are then performed, and the obtained results are passed into the mask prediction branch to obtain the target mask 300 quickly and accurately. The target mask image 500 does not contain the environmental background, which effectively reduces environmental noise, and a convolution operation is finally performed for attribute recognition.
  • the target attribute identification method of the embodiment of the present application can be further extended into a pedestrian attribute identification method based on a segmentation algorithm. The parts that are the same as or common to the first embodiment of this application will not be described again, and only the parts specific to pedestrian recognition are explained.
  • In the security field, as the number of scenes that need to be monitored grows, the density of people flow increases, and monitoring generally runs 7 × 24 hours, the amount of monitoring data surges. In this situation, relying solely on manpower for investigation is time-consuming and labor-intensive, and accuracy cannot be guaranteed, so there is an urgent need to use computer vision algorithms to complete automated monitoring and achieve rapid identification and accurate search.
  • pedestrian attributes are the most critical factor in the pedestrian recognition process.
  • Using computer vision, deep learning algorithms, and the flexibility and speed of convolutional neural networks, the image to be recognized is segmented to retain only the pedestrian area of interest, and pedestrian features are extracted from that area to complete the identification of pedestrian attributes, which can greatly improve work efficiency.
  • the second embodiment of the present application provides a method for identifying pedestrian attributes, which is implemented based on a segmentation algorithm.
  • the method includes:
  • according to the pedestrian mask, the pedestrian attribute recognition model is used to perform a mask operation on the image to be recognized and obtain a pedestrian mask image;
  • the pedestrian attribute recognition model is used to perform pedestrian attribute recognition, and the attributes of the pedestrian in the image to be recognized are output, where the attributes include multi-label attributes of the pedestrian.
  • the pedestrian attribute recognition method can use a preset pedestrian attribute recognition model to perform pedestrian recognition on the received image to be recognized and output a pedestrian mask, use the pedestrian mask to segment the pedestrian mask image from the image to be recognized, identify the attributes of the pedestrian from the pedestrian mask image, and finally output the multi-label attributes of the pedestrian, with a high degree of recognition and accuracy.
  • the multi-label attributes include at least three of gender attributes, headgear attributes, hairstyle attributes, clothing attributes, clothing color attributes, accessory attributes, occlusion attributes, truncation attributes and orientation attributes.
  • an image 100 to be recognized is obtained.
  • the source of the image to be recognized includes, but is not limited to, a frame of a video file or a frame of a surveillance video stream.
  • the image to be recognized 100 is input into a preset backbone convolutional neural network that has completed training. Taking into account both recognition speed and recognition accuracy, the ResNet50 network of the ResNet series is selected to extract feature maps of multiple stages, and the Feature Pyramid Network (FPN) is then introduced to fuse the feature maps of at least one stage together and output the first feature map.
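The core of FPN-style fusion is a top-down pathway: the coarse, semantically strong feature map from a deeper stage is upsampled and merged with a higher-resolution map from a shallower stage. The sketch below shows only that merge step in numpy; a real FPN (including the one implied here) also applies learned 1×1 lateral convolutions and 3×3 output convolutions, so this is a simplified illustration, not the embodiment's network:

```python
import numpy as np

def upsample2x(feat: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling of an H x W feature map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(coarse: np.ndarray, fine: np.ndarray) -> np.ndarray:
    """Top-down fusion: upsample the deep (coarse) map and add the shallow (fine) one."""
    return upsample2x(coarse) + fine

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])   # deep stage, low resolution
fine = np.ones((4, 4))            # shallower stage, twice the resolution
fused = fpn_fuse(coarse, fine)
```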
  • the pedestrian mask image 500 does not contain the background of the environment and can effectively reduce the noise caused by the environment.
  • the pedestrian mask image 500 is subjected to multi-task multi-label classification and pedestrian attributes are output.
  • the recognition results are shown in Figure 5.
  • the pedestrian attributes include but are not limited to at least three of the gender attribute, headgear attribute, hairstyle attribute, clothing attribute, clothing color attribute, accessory attribute, occlusion attribute, truncation attribute and orientation attribute.
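Multi-task multi-label classification differs from ordinary single-label classification in that each attribute is predicted independently, typically with a per-attribute sigmoid rather than one softmax over all labels. A minimal sketch of the decision step follows; the attribute names, logit values, and the 0.5 threshold are illustrative assumptions:

```python
import math

ATTRIBUTES = ["male", "wearing_hat", "long_hair", "backpack", "occluded"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_attributes(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Turn raw per-attribute logits into the list of attributes judged present."""
    return [name for name, z in zip(ATTRIBUTES, logits)
            if sigmoid(z) > threshold]

labels = predict_attributes([2.1, -1.3, 0.4, -0.2, 3.0])
```

Because every label gets its own independent probability, a pedestrian can simultaneously be, say, male, long-haired, and occluded, which is exactly the multi-label output described above.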
  • the third embodiment of this application provides a model training method, including:
  • the target attribute recognition model is trained through labeled sample recognition images. For example, labeled sample recognition images of pedestrians, vehicles, and dogs are input into the target attribute recognition model.
  • the target attribute recognition model performs feature extraction on the sample recognition image and outputs a first feature map; region detection is performed on the first feature map through the region detection model of the target attribute recognition model, which outputs a plurality of region filtering frames; regional feature matching is performed on the region filtering frames through the regional feature matching model of the target attribute recognition model, which outputs a second feature map; region detection is then performed on the second feature map through the region detection model of the target attribute recognition model, which outputs the target mask; and the target mask image is then obtained through the mask operation.
  • the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function.
  • training the target attribute recognition model for target recognition using multiple labeled sample recognition images further includes:
  • the mask prediction branch, regression prediction branch and classification prediction branch each perform calculation through a preset loss function and adjust the model parameters;
  • the model parameters of the attribute recognition model are adjusted through the multi-label classification loss function.
  • each prediction branch calculates its loss function to obtain a total loss value, and the loss value is judged against a preset first accuracy threshold until the first accuracy threshold is met; similarly, the loss value calculated by the multi-label classification loss function is judged against a preset second accuracy threshold until the second accuracy threshold is met.
  • the open source MS COCO data set is selected as the training set
  • the sample material with fine annotations of pedestrians in the Cityscapes data set is selected as the first test set
  • the sample materials manually organized and labeled from the backup video data of a security system that has been running for one year are used as the second test set.
  • the MS COCO data set is annotated with 80 different object categories, which makes it very suitable for training pedestrian detection models and for effectively distinguishing pedestrians from other related objects in the sample materials, such as cars, cats, dogs, trees and signs.
  • the model trained by the convolutional neural network using the training set is sensitive to the target resolution; especially for pedestrian detection problems, a low target recall rate occurs when a model trained on one data set is used to test pedestrians in another data set, and the image resolution of the MS COCO data set is inconsistent.
  • the input image is uniformly processed to a resolution of 1024×1024; on the premise of preserving the original aspect ratio of the sample image, the remaining parts are filled with zeros.
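This aspect-ratio-preserving resize with zero padding ("letterboxing") can be sketched as follows. The nearest-neighbor resize via index mapping is a simplification; a production pipeline would use bilinear interpolation from an imaging library, and the function name is an assumption:

```python
import numpy as np

def letterbox(image: np.ndarray, size: int = 1024) -> np.ndarray:
    """Scale so the longer side equals `size` (keeping aspect ratio),
    then zero-pad the rest to produce a size x size image."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index mapping.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    canvas[:new_h, :new_w] = resized  # zero padding fills the remainder
    return canvas

# A 512x256 image scales to 1024x512; the right half stays zero.
out = letterbox(np.ones((512, 256, 3), dtype=np.uint8), size=1024)
```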
  • the Cityscapes data set contains 5,000 sample images with fine annotations, and not all of the images contain pedestrians; therefore, in the embodiment of this application, 2,900 images containing pedestrians are screened out as the first test set for pedestrian detection testing.
  • the second test set was obtained from the backup video data of the real security system and was uniformly processed to a resolution of 1024×1024, with the relevant information manually annotated by relevant technical personnel; a total of 500 images were used as the second test set for testing.
  • the first test set and the second test set may also be mixed into a third test set, which will not be described again here.
  • the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function.
  • performing target recognition training on the target attribute recognition model using multiple labeled sample recognition images further includes:
  • the mask prediction branch selects the cross-entropy loss function
  • the regression prediction branch selects the smooth L1 loss function
  • the classification prediction branch selects the cross-entropy loss function
  • and the first accuracy threshold is set to calculate and adjust the model parameters.
  • the first accuracy threshold is set to 90%.
  • for the multi-label classification loss function, the cross-entropy loss function is selected; in the embodiment of this application, the second accuracy threshold is set to 90%, and model parameter adjustment is performed on the attribute recognition model accordingly.
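The loss functions named above, cross-entropy for the mask and classification branches and smooth L1 for the regression branch, can be written in scalar form as below. The unweighted sum and the sample numbers are illustrative assumptions; real implementations average over pixels/anchors and may weight the terms:

```python
import math

def binary_cross_entropy(p: float, y: float, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1: quadratic near zero, linear for large errors."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

# Total loss as an (assumed) unweighted sum of the three branch losses.
mask_loss = binary_cross_entropy(0.9, 1.0)   # mask prediction branch
cls_loss = binary_cross_entropy(0.8, 1.0)    # classification prediction branch
box_loss = smooth_l1(10.5, 10.0)             # regression prediction branch
total = mask_loss + cls_loss + box_loss
```

Smooth L1 is the usual choice for box regression because it is less sensitive to outlier coordinates than a pure L2 loss while still being smooth near zero.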
  • the embodiments of this application make targeted designs in the selection of data sets, neural network architecture, and loss function selection.
  • an efficient and stable model can be trained, thereby realizing target attribute recognition or pedestrian attribute recognition based on the segmentation algorithm.
  • this application does not specifically limit other details of model training, such as the selection of initial parameters or GPU hardware; those skilled in the art may make selections based on actual application requirements, and these details will not be described again here.
  • this application also provides a target attribute identification device 700, which includes:
  • the target mask acquisition unit 701 is used to perform target recognition on the received image to be recognized, and output a target mask, which is obtained by pixel space alignment based on a segmentation algorithm;
  • the target mask image acquisition unit 702 is configured to perform a mask operation on the image to be recognized according to the target mask and acquire the target mask image;
  • the target attribute recognition unit 703 is configured to perform target attribute recognition on the target mask image, and output attributes of the target in the image to be recognized, where the attributes include multi-label attributes of the target.
  • this application also provides a pedestrian attribute recognition device 800, which includes:
  • the pedestrian mask acquisition unit 801 is used to perform pedestrian recognition on the received image to be recognized, and output a pedestrian mask, which is obtained by pixel space alignment based on a segmentation algorithm;
  • a pedestrian mask image acquisition unit 802 configured to perform a mask operation on the image to be recognized according to the pedestrian mask and obtain a pedestrian mask image
  • Pedestrian attribute recognition unit 803 is used to perform pedestrian attribute recognition on the pedestrian mask image and output the attributes of the pedestrian in the image to be recognized, where the attributes include multi-label attributes of the pedestrian.
  • this application also provides a model training device 900, which includes:
  • Annotation unit 901 is used to obtain multiple sample identification images and annotate the targets of each sample identification image according to the pixel space alignment;
  • the training unit 902 is used to perform target recognition training on the target attribute recognition model using multiple labeled sample recognition images.
  • Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the target attribute recognition method based on the segmentation algorithm, or the pedestrian attribute recognition method based on the segmentation algorithm, or the model training method.
  • the computer-readable storage medium may be any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more conductors, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • FIG. 9 shows a schematic structural diagram of a computer device according to another embodiment of the present application.
  • the computer device 12 shown in FIG. 9 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present application.
  • computer device 12 is embodied in the form of a general purpose computing device.
  • the components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components, including system memory 28 and processing unit 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12, including volatile and nonvolatile media, removable and non-removable media.
  • System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 9, commonly referred to as a "hard drive").
  • a disk drive for reading and writing to removable non-volatile magnetic disks (such as "floppy disks") may be provided, as well as an optical disk drive for reading and writing to removable non-volatile optical disks (such as CD-ROM, DVD-ROM or other optical media).
  • each drive may be connected to bus 18 through one or more data media interfaces.
  • the memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • a program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28; each of these examples or some combination thereof may include an implementation of a network environment.
  • Program modules 42 generally perform functions and/or methods in the embodiments described herein.
  • Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., network card, modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interface 22.
  • computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown in FIG. 9, network adapter 20 communicates with other modules of computer device 12 via bus 18.
  • the processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the target attribute recognition method based on the segmentation algorithm, or the pedestrian attribute recognition method based on the segmentation algorithm, or the model training method.

Abstract

The present application discloses a target attribute recognition method and apparatus, and a model training method and apparatus. The target attribute recognition method comprises: using a preset target attribute recognition model to perform target recognition on a received image to be recognized, and outputting a target mask; using the target mask to segment a target mask image from the image to be recognized; and performing target attribute recognition on the target mask image, and finally outputting a multi-label attribute of a target. According to the present application, a region unrelated to the target is filtered by means of a segmentation algorithm, and attribute recognition is performed by means of a pedestrian mask image, so that interference caused by the environment can be avoided, and the recognition speed and accuracy are remarkably improved. In particular, in the security field, according to a pedestrian attribute recognition method in an embodiment of the present application, automated monitoring can be implemented, rapid filtering and assisted search are effectively realized, the working efficiency is improved, and the present application thus has a wide application prospect.

Description

Target attribute recognition method, model training method and apparatus
This application claims priority to the Chinese patent application No. 202210714705.4, entitled "A target attribute recognition method, training method and apparatus based on a segmentation algorithm", filed on June 23, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computer vision, and in particular to a target attribute recognition method, a model training method and an apparatus.
Background
With the increase in video surveillance scenarios, a large amount of video data is generated. How to quickly and accurately identify targets in such large amounts of video data is a problem that urgently needs to be solved.
Summary
The present application provides a target attribute recognition method, a training method and an apparatus.
In one aspect, a target attribute recognition method is provided, specifically including:
using a preset target attribute recognition model to perform target recognition on a received image to be recognized, and outputting a target mask, the target mask being obtained by pixel space alignment based on a segmentation algorithm;
according to the target mask, using the target attribute recognition model to perform a mask operation on the image to be recognized and obtain a target mask image;
according to the target mask image, using the target attribute recognition model to perform target attribute recognition, and outputting attributes of the target in the image to be recognized, the attributes including multi-label attributes of the target.
Further, using the preset target attribute recognition model to perform target recognition on the received image to be recognized and outputting the target mask further includes:
using the target attribute recognition model to perform feature extraction on the image to be recognized and output a first feature map;
using the target attribute recognition model to perform region detection on the first feature map and output a plurality of region filtering frames;
using the target attribute recognition model to perform regional feature matching on the region filtering frames and output a second feature map, the second feature map being obtained by pixel space alignment based on the segmentation algorithm;
using the target attribute recognition model to perform region detection on the second feature map and output the target mask.
Further, the target attribute recognition model includes a feature extraction network, a first feature map pyramid network and a region generation network;
using the target attribute recognition model to perform feature extraction on the image to be recognized and output the first feature map further includes:
using the feature extraction network to perform feature extraction on the image to be recognized and output a multi-layer original feature map;
using the first feature map pyramid network to output the first feature map according to at least one layer of the original feature map;
using the target attribute recognition model to perform region detection on the first feature map and output a plurality of region filtering frames further includes: according to preset anchor frames, using the region generation network to perform region detection on the first feature map and output a plurality of region filtering frames.
Further, the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch;
using the target attribute recognition model to perform region detection on the second feature map and output the target mask further includes:
using the mask prediction branch to perform region prediction on the second feature map and output the target mask;
using the regression prediction branch to perform region prediction on the second feature map and output a target frame;
using the classification prediction branch to perform classification prediction on the second feature map and output a target classification.
Further, according to the target mask, using the target attribute recognition model to perform the mask operation on the image to be recognized and obtain the target mask image further includes: performing a multiplication operation on the target mask and the image to be recognized to obtain the target mask image;
according to the target mask image, using the target attribute recognition model to perform target attribute recognition and outputting the attributes of the target in the image to be recognized further includes: according to the target classification, using a corresponding attribute recognition model in the target attribute recognition model to perform target attribute recognition on the target mask image, and outputting the attributes of the target in the image to be recognized, the attribute recognition model being a multi-task multi-label classification model.
Further, according to the target mask, using the target attribute recognition model to perform the mask operation on the image to be recognized and obtain the target mask image further includes: performing a multiplication operation on the output target frame and the image to be recognized to obtain a target frame mask image, and performing a multiplication operation on the target mask and the target frame mask image to obtain the target mask image;
according to the target mask image, using the target attribute recognition model to perform target attribute recognition and outputting the attributes of the target in the image to be recognized further includes: according to the target classification, using the corresponding attribute recognition model in the target attribute recognition model to perform target attribute recognition on the target mask image, and outputting the attributes of the target in the image to be recognized, the attribute recognition model being a multi-task multi-label classification model.
Further, the feature extraction network is one of a VGG network, a GoogLeNet network, a ResNet network, and a ResNeXt network.
In another aspect, a pedestrian attribute recognition method is provided:
using a preset pedestrian attribute recognition model to perform pedestrian recognition on a received image to be recognized, and outputting a pedestrian mask, the pedestrian mask being obtained by pixel space alignment based on a segmentation algorithm;
according to the pedestrian mask, using the pedestrian attribute recognition model to perform a mask operation on the image to be recognized and obtain a pedestrian mask image;
according to the pedestrian mask image, using the pedestrian attribute recognition model to perform pedestrian attribute recognition, and outputting attributes of the pedestrian in the image to be recognized, the attributes including multi-label attributes of the pedestrian.
Further, the multi-label attributes include at least three of a gender attribute, a headgear attribute, a hairstyle attribute, a clothing attribute, a clothing color attribute, an accessory attribute, an occlusion attribute, a truncation attribute and an orientation attribute.
In yet another aspect, a model training method is provided, including:
obtaining a plurality of sample recognition images, and annotating the target of each sample recognition image according to pixel space alignment;
performing target recognition training on a target attribute recognition model using the plurality of annotated sample recognition images.
Further, the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function;
performing target recognition training on the target attribute recognition model using the plurality of annotated sample recognition images further includes:
according to a preset first accuracy threshold, the mask prediction branch, the regression prediction branch and the classification prediction branch each performing calculation through a preset loss function and adjusting model parameters;
according to a preset second accuracy threshold, adjusting model parameters of the target attribute recognition model through the multi-label classification loss function.
In a further aspect, a target attribute recognition apparatus is provided, including:
a target mask acquisition unit, configured to perform target recognition on a received image to be recognized and output a target mask, the target mask being obtained by pixel space alignment based on a segmentation algorithm;
a target mask image acquisition unit, configured to perform a mask operation on the image to be recognized according to the target mask and obtain a target mask image;
a target attribute recognition unit, configured to perform target attribute recognition on the target mask image and output attributes of the target in the image to be recognized, the attributes including multi-label attributes of the target.
In still another aspect, a pedestrian attribute recognition apparatus is provided, including:
a pedestrian mask acquisition unit, configured to perform pedestrian recognition on a received image to be recognized and output a pedestrian mask, the pedestrian mask being obtained by pixel space alignment based on a segmentation algorithm;
a pedestrian mask image acquisition unit, configured to perform a mask operation on the image to be recognized according to the pedestrian mask and obtain a pedestrian mask image;
a pedestrian attribute recognition unit, configured to perform pedestrian attribute recognition on the pedestrian mask image and output attributes of the pedestrian in the image to be recognized, the attributes including multi-label attributes of the pedestrian.
In still another aspect, a model training apparatus is provided, including:
an annotation unit, configured to obtain a plurality of sample recognition images and annotate the target of each sample recognition image according to pixel space alignment;
a training unit, configured to perform target recognition training on a target attribute recognition model using the plurality of annotated sample recognition images.
In still another aspect, a computer-readable storage medium is provided, on which a computer program is stored,
the program, when executed by a processor, implementing the method described in the one aspect;
or
the program, when executed by a processor, implementing the method described in the another aspect;
or
the program, when executed by a processor, implementing the method described in the yet another aspect.
In still another aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor,
the processor, when executing the program, implementing the method described in the one aspect;
or
the processor, when executing the program, implementing the method described in the another aspect;
or
the processor, when executing the program, implementing the method described in the yet another aspect.
Brief Description of the Drawings
Figure 1 shows a flow chart of the target attribute recognition method according to an embodiment of the present application;
Figure 2 shows a block diagram of the target attribute recognition method according to another embodiment of the present application;
Figure 3 shows a schematic diagram of anchor frames according to an embodiment of the present application;
Figure 4 shows a schematic diagram of a target mask and a target mask image according to an embodiment of the present application;
Figure 5 shows a schematic diagram of an image to be recognized and target attributes according to an embodiment of the present application;
Figure 6 shows a structural diagram of a target attribute recognition apparatus according to another embodiment of the present application;
Figure 7 shows a structural diagram of a pedestrian attribute recognition apparatus according to another embodiment of the present application;
Figure 8 shows a structural diagram of a model training apparatus according to another embodiment of the present application;
Figure 9 shows a schematic structural diagram of a computer device according to another embodiment of the present application.
具体实施方式Detailed Description of Embodiments
为了更清楚地说明本申请方案,下面结合实施例和附图对本申请方案做进一步的说明。附图中相似的部件以相同的附图标记进行表示。本领域技术人员应当理解,下面所具体描述的内容是说明性的而非限制性的,不应以此限制本申请的保护范围。In order to explain the solution of the present application more clearly, the solution of the present application will be further described below in conjunction with the embodiments and drawings. Similar parts are designated with the same reference numerals in the drawings. Those skilled in the art should understand that the content specifically described below is illustrative rather than restrictive, and should not be used to limit the scope of protection of the present application.
在安防场景中,通常采用计算机视觉算法实现自动化监控。具体的,通过深度学习算法,首先对待处理图像进行分割,只保留感兴趣的区域,其次针对行人区域进行行人特征的提取,完成属性的识别。伴随监控场景的增多,产生大量的视频数据,在海量数据量的前提下,如何能够快速、准确的过滤行人属性,准确查找到目标行人成为亟待解决的问题。In security scenarios, computer vision algorithms are usually used to achieve automated monitoring. Specifically, through the deep learning algorithm, the image to be processed is first segmented and only the area of interest is retained. Secondly, pedestrian features are extracted for the pedestrian area to complete attribute recognition. With the increase of surveillance scenes, a large amount of video data is generated. Under the premise of massive data volume, how to quickly and accurately filter pedestrian attributes and accurately find target pedestrians has become an urgent problem to be solved.
在相关技术中,进行行人属性识别时,广泛使用计算机视觉技术,通过深度学习算法,例如针对行人区域进行行人特征的提取,完成属性的识别。相对于以前的传统图像处理,目前的特征提取主流方式都是使用卷积神经网络,使用深度学习的方法来解决此类问题,例如,使用YOLACT算法对行人属性背景信息进行过滤,拼接不同大小的特征图进行多任务网络预测,提出梯度权重损失函数进行模型的训练;再例如使用人体姿态关键点获取人体区域,将提取的细节关键点和浅层特征进行结合,将提取的人体区域和深层特征进行结合,将结合后的数据和深层特征分别输入到区域引导模块得到多个预测向量,将多个预测向量进行融合,得到最终的预测结果。然而,上述方法均需要进行额外的关键点检测,该步骤对设备的计算能力要求较高,并且需要增加相应的处理时间,考虑到在实际的应用中,存在数据量巨大的待识别图像时、以及实时判别能力要求时,对识别速度和准确率提出了较高要求,因此如何快速、准确地识别目标属性成为亟待解决的技术问题。In the related art, pedestrian attribute recognition makes wide use of computer vision and deep learning algorithms, for example extracting pedestrian features from the pedestrian region to complete attribute recognition. Compared with traditional image processing, the mainstream approach to feature extraction now uses convolutional neural networks and deep learning methods. For example, one method uses the YOLACT algorithm to filter out background information for pedestrian attributes, concatenates feature maps of different sizes for multi-task network prediction, and proposes a gradient-weighted loss function to train the model. Another method obtains the human body region from human pose keypoints, combines the extracted detail keypoints with shallow features and the extracted body region with deep features, feeds the combined data and the deep features into a region guidance module to obtain multiple prediction vectors, and fuses these vectors into the final prediction. However, all of the above methods require additional keypoint detection, a step that demands considerable computing power from the device and adds processing time. Considering that practical applications involve huge volumes of images to be recognized and require real-time discrimination, high demands are placed on recognition speed and accuracy. Therefore, how to quickly and accurately identify target attributes has become an urgent technical problem to be solved.
针对上述情况,如图1所示,本申请的一个实施例提供了一种目标属性识别方法,该目标属性识别方法基于分割算法实现。该方法包括:In response to the above situation, as shown in Figure 1, one embodiment of the present application provides a target attribute identification method, which is implemented based on a segmentation algorithm. The method includes:
使用预设置的目标属性识别模型对接收的待识别图像进行目标识别,并输出目标掩码,所述目标掩码为基于分割算法进行像素空间对齐获得的;Use a preset target attribute recognition model to perform target recognition on the received image to be recognized, and output a target mask, which is obtained by pixel space alignment based on a segmentation algorithm;
根据所述目标掩码,使用所述目标属性识别模型对所述待识别图像进行掩码操作并获取目标掩码图像;According to the target mask, use the target attribute recognition model to perform a mask operation on the image to be recognized and obtain a target mask image;
根据所述目标掩码图像,使用所述目标属性识别模型进行目标属性识别,并输出待识别图像的目标的属性,所述属性包括所述目标的多标签属性。According to the target mask image, the target attribute recognition model is used to perform target attribute recognition, and attributes of the target of the image to be recognized are output, where the attributes include multi-label attributes of the target.
本申请实施例通过所述基于分割算法的目标属性识别方法,相对于所述使用额外的关键点的识别方法,绕开了处理关键点的步骤,降低了对硬件的性能要求,缩短了识别时间,并且能够最大程度的过滤掉非目标区域,通过目标掩码图像进行属性识别,能够避免环境对属性识别造成的干扰,显著提高识别速度和准确率,能够实现快速的过滤和协助查找,极大的提高工作效率,具有广泛的应用前景。Compared with recognition methods that rely on additional keypoints, the segmentation-based target attribute recognition method of this embodiment bypasses the keypoint-processing step, lowers the hardware performance requirements, and shortens the recognition time. It filters out non-target regions to the greatest extent, and performing attribute recognition on the target mask image avoids interference from the environment, significantly improving recognition speed and accuracy. It enables fast filtering and assisted search, greatly improves work efficiency, and has broad application prospects.
在一个具体的示例中,如图2所示,所述属性识别分为三个步骤:In a specific example, as shown in Figure 2, the attribute identification is divided into three steps:
首先,读取所述待识别图像100,进行目标识别200,输出目标掩码300。First, the image to be recognized 100 is read, target recognition 200 is performed, and a target mask 300 is output.
在本申请实施例中,具体包括如下步骤:In the embodiment of this application, the following steps are specifically included:
对所述待识别图像100进行特征提取210,输出第一特征图220,即Feature Map,为输入图像经过神经网络卷积获取的结果,其分辨率大小取决于先前卷积核的步长。Feature extraction 210 is performed on the image 100 to be recognized, and a first feature map 220 (Feature Map) is output, i.e. the result obtained by convolving the input image through the neural network; its resolution is determined by the strides of the preceding convolution kernels.
区域检测230,即使用提取候选框的网络区域生成网络(Region Proposal Network,RPN)进行“区域选取”并输出多个区域筛选框240,区域特征匹配250并输出第二特征图260,再次进行区域检测270并输出目标掩码300。Region detection 230: a Region Proposal Network (RPN), the network that extracts candidate boxes, performs "region selection" and outputs multiple region proposal boxes 240; region feature matching 250 is performed and the second feature map 260 is output; region detection 270 is then performed again and the target mask 300 is output.
具体的,将一张待识别图像100输入到一个预置的已完成训练的主干卷积神经网络中(Backbone Convolutional Neural Networks,Backbone CNN),所述主干卷积神经网络主要用于提取所述待识别图像100的特征图以供后续网络使用。Specifically, an image 100 to be recognized is input into a preset, already-trained backbone convolutional neural network (Backbone CNN); the backbone network is mainly used to extract feature maps of the image 100 to be recognized for use by subsequent networks.
在一个可选的实施例中,所述特征提取网络为vgg网络、googlenet网络、resnet网络、或者resnext网络。In an optional embodiment, the feature extraction network is a vgg network, a googlenet network, a resnet network, or a resnext network.
通过上述特征提取网络中的一个对待识别图像进行特征提取。Feature extraction is performed on the image to be identified through one of the above feature extraction networks.
具体的,所述VGG(视觉几何组网络,Visual Geometry Group)网络中,通过使用一系列大小为3x3的小尺寸卷积核和池化层构造深度卷积神经网络,具有结构简单、应用性强的特点。Specifically, the VGG (Visual Geometry Group) network constructs a deep convolutional neural network from a series of small 3x3 convolution kernels and pooling layers, and is characterized by a simple structure and strong applicability.
在所述GoogLeNet网络中,卷积块被称为Inception块,Inception块相当于一个有4条路径的子网络,通过不同窗口形状的卷积层和最大汇聚层来并行抽取信息,并使用1×1卷积层减少每像素级别上的通道维数从而降低模型复杂度。In the GoogLeNet network, the convolution block is called an Inception block. An Inception block is equivalent to a sub-network with four paths: it extracts information in parallel through convolution layers and max-pooling layers with different window shapes, and uses 1×1 convolution layers to reduce the per-pixel channel dimension and thereby the model complexity.
ResNeXt网络同时采用了VGG网络的堆叠思想和inception块的split-transform-merge思想,具有更强的可扩展性,在增加准确率的同时基本不会改变或降低模型的复杂度。The ResNeXt network adopts both the stacking idea of the VGG network and the split-transform-merge idea of the Inception block; it is more extensible and improves accuracy while essentially leaving the model complexity unchanged.
ResNet网络是针对更深层次的神经网络难以训练的问题、提出的一种残差学习的结构,在增加了网络深度的同时减少参数的数量,在检测、分割、识别等领域获得广泛应用。The ResNet network is a residual learning structure proposed to address the problem that deeper neural networks are difficult to train. It increases the depth of the network while reducing the number of parameters, and is widely used in detection, segmentation, recognition and other fields.
在本申请实施例中,所述特征提取网络采用ResNet50网络。所述ResNet50网络输出多个特征图,本申请实施例利用特征图金字塔网络(Feature Pyramid Network,FPN)将最后三层输出的特征图进行融合并输出特征图220。In this embodiment of the present application, the feature extraction network adopts ResNet50 network. The ResNet50 network outputs multiple feature maps. This embodiment of the present application uses a feature map pyramid network (Feature Pyramid Network, FPN) to fuse the feature maps output by the last three layers and output the feature map 220.
其中,特征图金字塔网络(Feature Pyramid Network,FPN)是一种自顶向下的特征融合方法,并且是一种多尺度的目标检测算法,即使用大于1个的特征预测层,将多个阶段的特征图融合在一起,既提取高层特征图的语义特征,又提取低层的轮廓特征。The Feature Pyramid Network (FPN) is a top-down feature fusion method and a multi-scale object detection algorithm: it uses more than one feature prediction layer and fuses feature maps from multiple stages, extracting both the semantic features of the high-level feature maps and the low-level contour features.
值得说明的是,本申请对FPN网络进行特征图融合的数量不作具体限定,本领域技术人员应当根据实际应用需求,例如网络的处理速度和特征图的性能选择适当数量的特征图进行融合,在此不再赘述。It is worth noting that this application does not specifically limit the number of feature maps for FPN network fusion. Those skilled in the art should select an appropriate number of feature maps for fusion based on actual application requirements, such as the processing speed of the network and the performance of the feature maps. This will not be described again.
本申请实施例通过采用ResNet50网络提取所述待识别图像100的特征图,并进一步使用FPN网络进行特征融合并形成所述第一特征图220,能够通过ResNet50网络在各个阶段提取的特征图,既能够提取高层特征图的语义特征,又能够提取低层的轮廓特征,从而解决较小物体无法检测的问题。In this embodiment, the ResNet50 network is used to extract feature maps of the image 100 to be recognized, and the FPN network further fuses these features to form the first feature map 220. Using the feature maps that ResNet50 extracts at its various stages, both the semantic features of the high-level maps and the low-level contour features can be captured, solving the problem that smaller objects cannot be detected.
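The top-down fusion FPN performs on the backbone stages can be sketched in a few lines. This is a minimal illustration under simplifying assumptions — nearest-neighbour upsampling and identity lateral connections in place of the 1×1 lateral and smoothing convolutions a real FPN uses — not the patent's exact network:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(c3, c4, c5):
    """Top-down fusion of the last three backbone stages.

    c3, c4, c5: (C, H, W) maps at strides 8/16/32, so c4 is half the
    spatial size of c3 and c5 half the size of c4.  Lateral 1x1
    convolutions are omitted (identity) to keep the sketch minimal.
    """
    p5 = c5
    p4 = c4 + upsample2x(p5)   # inject high-level semantics into c4
    p3 = c3 + upsample2x(p4)   # highest-resolution fused map
    return p3, p4, p5

# Toy maps: 8 channels; resolutions 32x32 / 16x16 / 8x8.
c3 = np.ones((8, 32, 32)); c4 = np.ones((8, 16, 16)); c5 = np.ones((8, 8, 8))
p3, p4, p5 = fpn_fuse(c3, c4, c5)
print(p3.shape)   # (8, 32, 32) — same resolution as c3
```

The fused `p3` keeps the fine spatial resolution of the low-level map while carrying the coarse, semantically rich signal propagated down from `c5`, which is what lets smaller objects be detected.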
基于所述第一特征图220,输入RPN网络进行区域检测230,从而提取所述区域筛选框240。The first feature map 220 is fed into the RPN network for region detection 230, thereby extracting the region proposal boxes 240.
具体的,将所述第一特征图220进行3×3的卷积操作,得到一个通道(channel)数256的特征图,其尺寸和所述第一特征图220相同。例如,所述第一特征图220的长为H,宽为W,则所述通道数为256的特征图,视为具有H×W个向量,每个向量是256维,继续对此向量做两次全连接操作,分别得到2个分数和4个坐标,等同于对所述通道数为256的特征图做两次1×1的卷积,得到一个2×H×W和一个4×H×W大小的特征图。Specifically, a 3×3 convolution is applied to the first feature map 220, yielding a feature map with 256 channels of the same size as the first feature map 220. For example, if the first feature map 220 has height H and width W, the 256-channel feature map can be viewed as H×W vectors of 256 dimensions each. Applying two fully connected operations to each vector yields 2 scores and 4 coordinates respectively, which is equivalent to performing two 1×1 convolutions on the 256-channel feature map, producing a 2×H×W and a 4×H×W feature map.
具体的,2×H×W的特征图,即2个置信度,表示前景和背景的分数,因为所述RPN网络只负责提取所述区域筛选框240,不需要判断所述待识别图像100中物品的类别,因此利用前景和背景的置信度判断是否为物品;4×H×W大小的特征图,即4个坐标,表示在所述待识别图像100中的偏移坐标(x,y,w,h)。Specifically, the 2×H×W feature map holds 2 confidence scores representing the foreground and background. Since the RPN network is only responsible for extracting the region proposal boxes 240 and does not need to determine the category of the objects in the image 100 to be recognized, the foreground/background confidences are used to judge whether something is an object. The 4×H×W feature map holds 4 coordinates, representing the offset coordinates (x, y, w, h) in the image 100 to be recognized.
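The equivalence noted above — two fully connected operations per 256-dimensional pixel vector being the same as two 1×1 convolutions — can be checked numerically. The weights below are random and purely illustrative; only the output shapes (2×H×W scores, 4×H×W offsets) follow the description:

```python
import numpy as np

# A 1x1 convolution over a (C, H, W) map is exactly a fully connected
# layer applied independently at every pixel: here 256 channels are
# mapped to 2 objectness scores and 4 box offsets, as in the RPN heads.
rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 5, 7))   # toy 256-channel map, H=5, W=7

w_cls = rng.standard_normal((2, 256))     # 1x1 conv kernels == FC weights
w_reg = rng.standard_normal((4, 256))

scores = np.einsum('oc,chw->ohw', w_cls, feat)  # (2, H, W) fg/bg scores
coords = np.einsum('oc,chw->ohw', w_reg, feat)  # (4, H, W) offsets (x, y, w, h)

print(scores.shape, coords.shape)   # (2, 5, 7) (4, 5, 7)
```

At any single pixel, the 1×1 convolution output equals the fully connected product `w_cls @ feat[:, i, j]`, which is why the two descriptions are interchangeable.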
值得注意的是,所述偏移坐标是所述待识别图像100的坐标,因所述待识别图像100与所述第一特征图220的宽和高不同,为了获取所述待识别图像100中的图片坐标,引入锚点(Anchor)。具体包括:It is worth noting that the offset coordinates are coordinates in the image 100 to be recognized. Since the width and height of the image 100 to be recognized differ from those of the first feature map 220, anchor points (Anchors) are introduced to obtain the picture coordinates in the image 100 to be recognized. Specifically:
在所述第一特征图220中随机选取一个点,该点能够映射到所述待识别图像100的一个框,例如所述待识别图像100与所述第一特征图220的缩放比例为8:1,则所述映射的框为8×8,设置此框的左上角或者中心点为所述锚点,基于此锚点按照预先配置的规则生成若干锚框(Anchor Box),每个锚框的大小由缩放比(scale)和宽高比(ratio)两个参数来确定,例如预先设置scale=[128],ratio=[0.5,1,1.5],则每个像素点可以产生3个不同大小的框。如图3所示,三个框面积相同,通过ratio的值来改变其长宽比,从而产生不同形状的框。值得注意的是,本申请对所述锚框的个数、缩放比例、以及宽高比不作具体限定,本领域技术人员应当根据实际应用需求,例如网络的处理速度和性能进行适当的选择,在此不再赘述。A point is randomly selected in the first feature map 220; this point maps to a box in the image 100 to be recognized. For example, if the scaling ratio between the image 100 to be recognized and the first feature map 220 is 8:1, the mapped box is 8×8. The upper-left corner or centre of this box is set as the anchor point, and several anchor boxes are generated from it according to pre-configured rules. The size of each anchor box is determined by two parameters, scale and aspect ratio (ratio); for example, with scale=[128] and ratio=[0.5,1,1.5] preset, each pixel point can produce 3 boxes of different shapes. As shown in Figure 3, the three boxes have the same area, and the ratio value changes their aspect ratio, producing boxes of different shapes. It is worth noting that this application does not specifically limit the number, scale, or aspect ratio of the anchor boxes; those skilled in the art should make an appropriate choice according to actual application requirements, such as the processing speed and performance of the network, which will not be repeated here.
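The scale/ratio anchor generation described above can be sketched as follows. The w/h parametrisation (w = s·√r, h = s/√r) is one common convention that keeps the area equal to scale² for every ratio, matching the equal-area boxes of Figure 3; the patent does not fix the exact formula:

```python
import numpy as np

def make_anchors(cx, cy, scales=(128,), ratios=(0.5, 1, 1.5)):
    """Generate anchor boxes (x1, y1, x2, y2) centred on an anchor point.

    All boxes of one scale share the same area (scale**2); the ratio
    only reshapes them.  This w/h parametrisation is a common
    convention, assumed here for illustration.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # area w*h == s*s for every ratio
            h = s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = make_anchors(64, 64)   # scale=[128], ratio=[0.5, 1, 1.5]
areas = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
print(anchors.shape)                   # (3, 4): one box per ratio
print(np.allclose(areas, 128 * 128))   # True — equal areas, different shapes
```

Repeating this at every feature-map point gives the H×W×K candidate set that the RPN then scores.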
在本申请实施例中,例如所述锚框个数为K,即每个锚点产生K个框,所述第一特征图220,包含H×W个点,每个点对应所述待识别图像100有K个框,则总共有H×W×K个所述区域筛选框240,通过所述RPN判断这些框是否是物体以及其在所述待识别图像100上的偏移坐标,即得到所述区域筛选框240。In this embodiment, suppose the number of anchor boxes is K, i.e. each anchor point produces K boxes. The first feature map 220 contains H×W points, and each point corresponds to K boxes in the image 100 to be recognized, so there are H×W×K region proposal boxes 240 in total. The RPN judges whether each of these boxes is an object and determines its offset coordinates on the image 100 to be recognized, yielding the region proposal boxes 240.
进一步地,考虑到相关技术中采用的感兴趣区域池化(region of interest pooling,ROI Pooling)来处理候选区域尺寸不同的问题,由于ROI Pooling采用向下取整的方式容易导致产生误差且无法保证所述特征层和所述输入层像素精确对应,无法达到语义分割任务的要求。因此本申请实施例采用ROI对齐(ROI Align)的方式,取消取整操作,改用双线性插值得到固定四个点坐标的像素值,从而使得不连续的操作变得连续起来,能够有效降低误差,实现所述像素空间对齐,完成所述区域特征匹配250。换句话说,本申请实施例使用ROI Align的方式实现对所述区域筛选框进行区域特征匹配250并输出第二特征图260,即基于分割算法进行像素空间对齐获得第二特征图260;实现在所述待识别图像100中识别出待测物品(前景物体)的精确的坐标像素值。Furthermore, the related art handles candidate regions of different sizes with region of interest pooling (ROI Pooling). Because ROI Pooling rounds coordinates down, it easily introduces errors and cannot guarantee exact pixel correspondence between the feature layer and the input layer, so it cannot meet the requirements of a semantic segmentation task. This embodiment therefore adopts ROI Align, which cancels the rounding operation and instead uses bilinear interpolation to obtain the pixel values at four fixed sampling coordinates, making the formerly discontinuous operation continuous. This effectively reduces the error, achieves the pixel space alignment, and completes the region feature matching 250. In other words, this embodiment uses ROI Align to perform region feature matching 250 on the region proposal boxes and output the second feature map 260, i.e. the second feature map 260 is obtained through pixel space alignment based on the segmentation algorithm, so that the precise coordinate pixel values of the object to be detected (the foreground object) are identified in the image 100 to be recognized.
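The bilinear interpolation that ROI Align substitutes for rounding can be illustrated on a tiny feature map. This is a generic sketch of the operation only, not the patent's implementation (which samples four such points per output bin and then pools them):

```python
import numpy as np

def bilinear(fmap, x, y):
    """Bilinearly interpolate a (H, W) feature map at a fractional
    point (x, y) — the operation ROI Align uses instead of rounding.
    Assumes the four neighbouring pixels lie inside the map."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    return (fmap[y0, x0] * (1 - dx) * (1 - dy) +
            fmap[y0, x1] * dx * (1 - dy) +
            fmap[y1, x0] * (1 - dx) * dy +
            fmap[y1, x1] * dx * dy)

f = np.array([[0.0, 1.0],
              [2.0, 3.0]])
print(bilinear(f, 0.5, 0.5))   # 1.5 — the exact sub-pixel value,
                               # where rounding would snap to a corner pixel
```

Because the result varies continuously with (x, y), the quantisation error that ROI Pooling introduces by rounding disappears, which is what "making the discontinuous operation continuous" refers to.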
本申请实施例考虑到目标属性识别模型的性能和准确度的要求,一方面通过引入RPN网络进行区域检测能够显著提高检测速度,并且更容易与其他神经网络结合;另一方面通过采用ROI Align的方式实现所述像素空间对齐,能够有效降低误差。Considering the performance and accuracy requirements of the target attribute recognition model, this embodiment, on the one hand, introduces the RPN network for region detection, which significantly increases detection speed and combines more easily with other neural networks; on the other hand, it achieves the pixel space alignment through ROI Align, which effectively reduces errors.
基于所述第二特征图260再次进行区域检测270并获取所述目标掩码300。具体包括分别将所述第二特征图输入三个预测分支。Based on the second feature map 260, the region detection 270 is performed again and the target mask 300 is obtained. Specifically, it includes inputting the second feature map into three prediction branches respectively.
具体的,将所述第二特征图260引入所述分类预测分支进行分类预测并输出目标分类,在一个全连接层后接入一个softmax层,softmax层接收一个N维向量作为输入,把每一维的值转换成(0,1)之间的一个实数,实现将所述全连接层的输出映射成一个概率的分布,在本申请实施例中具体用于实现前景和背景分类。Specifically, the second feature map 260 is fed into the classification prediction branch for classification prediction and the target classification is output. A softmax layer follows a fully connected layer: the softmax layer receives an N-dimensional vector as input and converts the value of each dimension into a real number in (0, 1), thereby mapping the output of the fully connected layer to a probability distribution; in this embodiment it is specifically used for foreground/background classification.
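A minimal sketch of the softmax mapping just described, turning fully connected outputs into a probability distribution over foreground and background (the logit values are invented for illustration):

```python
import numpy as np

def softmax(v):
    """Map an N-dim vector to values in (0, 1) that sum to 1."""
    e = np.exp(v - v.max())   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0])   # FC outputs: (foreground, background)
probs = softmax(logits)
print(probs.sum())               # 1.0 — a valid probability distribution
print(probs[0] > probs[1])       # True — classified as foreground
```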
具体的,将所述第二特征图260引入所述回归预测分支进行回归预测并输出目标框,在一个全连接层后接入一个边框回归层(Bounding Box Regression,bbox reg),通过回归预测得到更加精确的坐标像素值,所述坐标像素值为所述待识别图像100中识别出待测物品(前景物体)的精确坐标。Specifically, the second feature map 260 is fed into the regression prediction branch for regression prediction and the target box is output. A bounding box regression layer (bbox reg) follows a fully connected layer; the regression prediction yields more accurate coordinate pixel values, namely the precise coordinates of the object to be detected (the foreground object) identified in the image 100 to be recognized.
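As an illustration of how a bounding-box regression output refines a proposal, the sketch below applies predicted offsets (dx, dy, dw, dh) using the common R-CNN parametrisation; the patent does not spell out its exact formula, so this parametrisation is an assumption:

```python
import numpy as np

def apply_deltas(box, deltas):
    """Refine an (x, y, w, h) proposal with predicted offsets
    (dx, dy, dw, dh).  Centre shifts are proportional to box size,
    width/height are scaled multiplicatively — the usual R-CNN
    convention, assumed here for illustration."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    return (x + dx * w,
            y + dy * h,
            w * np.exp(dw),
            h * np.exp(dh))

refined = apply_deltas((100.0, 50.0, 64.0, 128.0), (0.1, -0.05, 0.0, 0.2))
print(refined[0])   # 106.4 — centre nudged right by dx * w
```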
具体的,将所述第二特征图260引入所述掩码预测分支进行预测并输出目标掩码,在一个头部层(Head)后接入一个全连接层,所述头部层将所述第二特征图260的输出维度扩大,增加掩码预测精确度,然后在每一个ROI里面进行全连接网络(FCN)操作,生成如图4所示的所述目标掩码300。Specifically, the second feature map 260 is fed into the mask prediction branch and the target mask is output. A fully connected layer follows a head layer (Head): the head layer expands the output dimension of the second feature map 260 to increase mask prediction accuracy, and a fully connected network (FCN) operation is then performed within each ROI to generate the target mask 300 shown in Figure 4.
本申请实施例通过三个分支分别操作获得目标分类、目标框和目标掩码。The embodiment of this application obtains target classification, target box and target mask through three branch operations respectively.
在一个可选的实施例中,本申请实施例通过三个分支依次操作,例如在预测阶段先进行所述分类预测和回归预测操作,将所得结果传入所述掩码预测分支,快速、准确的得到所述目标掩码。In an optional embodiment, the three branches operate sequentially: for example, in the prediction stage the classification prediction and regression prediction are performed first, and their results are passed into the mask prediction branch, so that the target mask is obtained quickly and accurately.
其次,使用所述目标掩码300和所述待识别图像100进行掩码操作400,并输出目标掩码图像500。在本申请实施例中,如图4所述,所述目标掩码300包含两种元素0和1,0代表黑色,1代表透明。所述掩码操作400为根据所述目标掩码300生成切片图片,即所述待识别图像100和所述目标掩码300之间进行相乘操作,所述目标掩码300的0将原图片RGB数值置为0,所述目标掩码300的1不改变所述待识别图像100的RGB数值。如图4所示,生成的所述目标掩码图像500是将待测目标从图像中分割出来。所述目标掩码图像500不包含环境的背景,能够有效降低环境带来的噪声。Secondly, a mask operation 400 is performed with the target mask 300 and the image 100 to be recognized, and the target mask image 500 is output. In this embodiment, as shown in Figure 4, the target mask 300 contains two element values, 0 and 1, where 0 represents black and 1 represents transparent. The mask operation 400 generates a slice picture from the target mask 300: the image 100 to be recognized and the target mask 300 are multiplied element-wise, so a 0 in the target mask 300 sets the corresponding RGB values of the original picture to 0, while a 1 leaves the RGB values of the image 100 unchanged. As shown in Figure 4, the generated target mask image 500 segments the target to be detected out of the image. The target mask image 500 contains no environmental background and can effectively reduce the noise introduced by the environment.
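The mask operation 400 — element-wise multiplication of the 0/1 target mask with the image — can be shown on a toy 2×2 RGB image:

```python
import numpy as np

# Toy 2x2 RGB image and a binary mask (1 = keep pixel, 0 = black out),
# mirroring the multiplication between image 100 and mask 300.
image = np.array([[[10, 20, 30], [40, 50, 60]],
                  [[70, 80, 90], [11, 12, 13]]], dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)

masked = image * mask[:, :, None]   # broadcast mask over the RGB channels

print(masked[0, 1].tolist())   # [0, 0, 0]   — background pixel zeroed
print(masked[0, 0].tolist())   # [10, 20, 30] — target pixel unchanged
```

Pixels where the mask is 0 become black, while target pixels pass through untouched, which is exactly how the background is stripped from the target mask image 500.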
最后,使用所述目标掩码图像500,进行目标属性识别600,并输出目标属性700。Finally, the target mask image 500 is used to perform target attribute recognition 600, and the target attribute 700 is output.
在本申请实施例中,对所述目标掩码图像500进行卷积操作,通过多层卷积操作,进行多任务多标签分类,识别结果如图5所示。In this embodiment of the present application, a convolution operation is performed on the target mask image 500, and multi-task multi-label classification is performed through multi-layer convolution operations. The recognition results are shown in Figure 5.
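Multi-task multi-label classification differs from the foreground/background softmax in that each attribute is an independent binary label; a common convention (assumed here, not stated in the patent) is a per-label sigmoid with a 0.5 threshold. The attribute names and logit values below are invented for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical per-attribute logits from the final convolutional layers;
# each attribute is scored independently, so a sigmoid is applied per
# output rather than a softmax across all labels.
labels = ["male", "hat", "long hair", "backpack"]
logits = np.array([1.8, -2.3, 0.4, 2.6])

scores = sigmoid(logits)
predicted = [l for l, s in zip(labels, scores) if s > 0.5]  # 0.5 threshold
print(predicted)   # ['male', 'long hair', 'backpack']
```

Because the labels are independent, any subset of attributes can fire at once, which is what makes the output "multi-label" rather than a single-class decision.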
至此,本申请实施例使用所述基于分割算法的目标属性识别方法,完成对所述待识别图像100的目标属性识别,并输出目标属性700。At this point, the embodiment of the present application uses the target attribute recognition method based on the segmentation algorithm to complete the target attribute recognition of the image 100 to be recognized, and output the target attribute 700.
在一个可选的实施例中,先使用所述目标属性识别模型的掩码预测分支对所述第二特征图进行区域预测并输出目标掩码,将所述目标掩码与所述待识别图像进行乘法操作并获取目标掩码图像;再使用所述目标属性识别模型进行目标属性识别并输出待识别图像的目标的属性,具体的,根据所述目标分类使用所述目标属性识别模型中对应的属性识别模型对所述目标掩码图像进行目标属性识别,并输出待识别图像的目标的属性,所述属性识别模型为多任务多标签分类模型。In an optional embodiment, the mask prediction branch of the target attribute recognition model first performs region prediction on the second feature map and outputs a target mask; the target mask is multiplied with the image to be recognized to obtain the target mask image. The target attribute recognition model is then used for target attribute recognition and outputs the attributes of the target in the image to be recognized. Specifically, according to the target classification, the corresponding attribute recognition model within the target attribute recognition model performs target attribute recognition on the target mask image and outputs the attributes of the target; the attribute recognition model is a multi-task multi-label classification model.
本申请实施例通过掩码预测分支获取的目标掩码与待识别图像进行乘法操作以获取目标掩码图像,并根据分类预测分支获取的目标分类选择对应的多任务多标签分类模型的属性识别模型对目标掩码图像进行属性识别并输出目标的属性。In this embodiment, the target mask obtained by the mask prediction branch is multiplied with the image to be recognized to obtain the target mask image; according to the target classification obtained by the classification prediction branch, the attribute recognition model of the corresponding multi-task multi-label classification model is selected to perform attribute recognition on the target mask image and output the attributes of the target.
具体的,例如目标为车辆,则选择对应的多任务多标签车辆分类模型的属性识别模型对目标掩码图像进行属性识别并输出车辆的属性;例如目标为狗,则选择对应的多任务多标签狗分类模型的属性识别模型对目标掩码图像进行属性识别并输出狗的属性;例如目标为行人,则选择对应的多任务多标签行人分类模型的属性识别模型对目标掩码图像进行属性识别并输出行人的属性。Specifically, if the target is a vehicle, the attribute recognition model of the corresponding multi-task multi-label vehicle classification model is selected to perform attribute recognition on the target mask image and output the vehicle's attributes; if the target is a dog, the attribute recognition model of the corresponding multi-task multi-label dog classification model is selected to output the dog's attributes; if the target is a pedestrian, the attribute recognition model of the corresponding multi-task multi-label pedestrian classification model is selected to output the pedestrian's attributes.
在另一个可选的实施例中,使用所述目标属性识别模型对所述待识别图像进行掩码操作并获取目标掩码图像,具体的,先将所述输出目标框与所述待识别图像进行乘法操作以获取目标框掩码图像,再将所述目标掩码与目标框掩码图像进行乘法操作并获取目标掩码图像;再使用所述属性识别模型进行目标属性识别并输出待识别图像的目标的属性,具体的,根据所述目标分类使用所述目标属性识别模型中对应的属性识别模型对所述目标掩码图像进行目标属性识别,并输出待识别图像的目标的属性,所述属性识别模型为多任务多标签分类模型。In another optional embodiment, the target attribute recognition model performs the mask operation on the image to be recognized to obtain the target mask image. Specifically, the output target box is first multiplied with the image to be recognized to obtain a target-box mask image, and the target mask is then multiplied with the target-box mask image to obtain the target mask image. The attribute recognition model is then used for target attribute recognition and outputs the attributes of the target in the image to be recognized: according to the target classification, the corresponding attribute recognition model within the target attribute recognition model performs target attribute recognition on the target mask image and outputs the attributes of the target; the attribute recognition model is a multi-task multi-label classification model.
本申请实施例通过回归预测分支获取的目标框与待识别图像进行乘法操作以获取目标框掩码图像,然后通过掩码预测分支获取的目标掩码与目标框掩码图像进行乘法操作以获取目标掩码图像,并根据分类预测分支获取的目标分类选择对应的多任务多标签分类模型的属性识别模型对目标掩码图像进行属性识别并输出目标的属性,能够进一步提高目标掩码图像的获取准确率。In this embodiment, the target box obtained by the regression prediction branch is multiplied with the image to be recognized to obtain the target-box mask image, and the target mask obtained by the mask prediction branch is then multiplied with the target-box mask image to obtain the target mask image; according to the target classification obtained by the classification prediction branch, the attribute recognition model of the corresponding multi-task multi-label classification model is selected to perform attribute recognition on the target mask image and output the attributes of the target. This can further improve the accuracy with which the target mask image is obtained.
本申请实施例选用ResNet50网络提取所述待识别图像100的多个阶段的特征图,并进一步使用FPN网络,将至少一个阶段的特征融合在一起,形成所述第一特征图220,从而利用ResNet50网络各个阶段提取到的特征,既提取高层特征图的语义特征,又提取低层的轮廓特征,解决较小物体无法检测的问题;同时,本申请实施例引入RPN网络进行区域检测,所述RPN网络不需要查找所有的区域筛选框,能够显著提高检测速度,并且更容易与其他神经网络结合;另一方面,本申请实施例采用ROI Align的方式实现所述像素空间对齐,能够有效降低误差;然后,进行所述分类预测和回归预测操作,将所得结果传入所述掩码预测分支,快速、准确的得到所述目标掩码300;所述目标掩码图像500不包含环境的背景能够有效降低环境带来的噪声;对所述目标掩码图像500进行卷积操作,通过多层卷积操作,对提取的特征进行多任务多标签分类,从而提取出所述目标属性700;本申请实施例能够实现快速的过滤和协助查询,极大的提高工作效率,具有广泛的应用前景。This embodiment selects the ResNet50 network to extract feature maps of the image 100 to be recognized at multiple stages, and further uses the FPN network to fuse the features of at least one stage into the first feature map 220; using the features extracted at the various stages of ResNet50, both the semantic features of the high-level maps and the low-level contour features are captured, solving the problem that smaller objects cannot be detected. Meanwhile, this embodiment introduces the RPN network for region detection; the RPN network does not need to search all possible region proposal boxes, which significantly increases detection speed and makes it easier to combine with other neural networks. On the other hand, this embodiment achieves the pixel space alignment through ROI Align, which effectively reduces errors. The classification prediction and regression prediction are then performed and their results are passed into the mask prediction branch, obtaining the target mask 300 quickly and accurately. The target mask image 500 contains no environmental background and can effectively reduce the noise introduced by the environment. A convolution operation is performed on the target mask image 500, and multi-task multi-label classification is applied to the extracted features through multi-layer convolution operations, thereby extracting the target attributes 700. This embodiment enables fast filtering and assisted queries, greatly improves work efficiency, and has broad application prospects.
基于本申请实施例的目标属性识别方法,在实际应用中,例如在商超、街道等监控场景中,本申请实施例所述目标属性识别方法进一步能够扩展成一种基于分割算法的行人属性识别方法,其中与本申请所述第一个实施例相同和共性的部分不再赘述,仅针对行人识别特殊的部分做出具体说明。在安防领域伴随需监控的场景增多,人流密集程度增加,监控时长普遍需要7×24小时,导致监控数据量激增,在此情况下,如单纯依靠人力进行排查,耗时耗力,且准确性无法保证,因此迫切需要利用计算机视觉算法来完成自动化的监控,实现快速识别和精确查找。Based on the target attribute recognition method of the embodiments above, in practical applications such as surveillance of shopping malls and streets, the method can be further extended into a pedestrian attribute recognition method based on a segmentation algorithm; parts that are the same as in the first embodiment of this application are not repeated, and only the parts specific to pedestrian recognition are explained. In the security field, as the number of scenes to be monitored and the density of pedestrian traffic increase, and monitoring generally runs 7×24 hours, the volume of monitoring data surges. In this situation, relying solely on manual screening is time-consuming and labor-intensive, and accuracy cannot be guaranteed; there is therefore an urgent need for computer vision algorithms that complete automated monitoring and achieve fast recognition and precise search.
在行人识别领域中,行人属性是行人识别过程中最为关键的因素,利用计算机视觉的方式,通过深度学习算法,利用卷积神经网络灵活和快速的优势,对待识别图像进行分割,只保留感兴趣的行人区域,并针对行人区域进行行人特征提取,完成行人属性的识别,能够极大的提高工作效率。In the field of pedestrian recognition, pedestrian attributes are the most critical factor in the recognition process. Using computer vision and deep learning algorithms, and exploiting the flexibility and speed of convolutional neural networks, the image to be recognized is segmented so that only the pedestrian region of interest is retained, pedestrian features are extracted from that region, and pedestrian attribute recognition is completed, which can greatly improve work efficiency.
本申请的第二个实施例提供了一种行人属性识别方法,该行人属性识别方法基于分割算法实现。该方法包括:The second embodiment of the present application provides a method for identifying pedestrian attributes, which is implemented based on a segmentation algorithm. The method includes:
使用预设置的行人属性识别模型对接收的待识别图像进行行人识别,并输出行人掩码,所述行人掩码为基于分割算法进行像素空间对齐获得的;Use a preset pedestrian attribute recognition model to perform pedestrian recognition on the received image to be recognized, and output a pedestrian mask, which is obtained by pixel space alignment based on a segmentation algorithm;
根据所述行人掩码,使用所述行人属性识别模型对所述待识别图像进行掩码操作并获取行人掩码图像;According to the pedestrian mask, use the pedestrian attribute recognition model to perform a mask operation on the image to be recognized and obtain a pedestrian mask image;
根据所述行人掩码图像,使用所述行人属性识别模型进行行人属性识别,并输出待识别图像的行人的属性,所述属性包括所述行人的多标签属性。According to the pedestrian mask image, the pedestrian attribute recognition model is used to perform pedestrian attribute recognition, and the attributes of the pedestrian in the image to be recognized are output, where the attributes include multi-label attributes of the pedestrian.
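The three steps above ultimately hinge on a masking step: a binary pedestrian mask is multiplied element-wise with the image so that only pedestrian pixels survive for attribute recognition. A minimal single-channel sketch of that mask operation (the function name and toy data are illustrative assumptions, not part of the claimed method):

```python
def apply_mask(image, mask):
    """Element-wise multiply a binary mask with a single-channel image.

    Pixels where the mask is 0 (background) are zeroed out, so only the
    pedestrian region of interest is retained for attribute recognition.
    """
    return [
        [pixel * m for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]

# Toy 4x4 grayscale "image" and a binary pedestrian mask (1 = pedestrian pixel).
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 11, 12, 13],
    [14, 15, 16, 17],
]
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
masked = apply_mask(image, mask)
```

In a real implementation the same multiplication is applied per color channel over the whole frame.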
在本申请实施例中，行人属性识别方法能够使用预设置的行人属性识别模型对接收的待识别图像进行行人识别，并输出行人掩码，利用所述行人掩码从所述待识别图像中分割出所述行人掩码图像，并针对所述行人掩码图像进行所述行人属性的识别，最终输出所述行人的多标签属性，具有较高的识别度和准确率。In the embodiment of the present application, the pedestrian attribute recognition method can use a preset pedestrian attribute recognition model to perform pedestrian recognition on the received image to be recognized and output a pedestrian mask, use the pedestrian mask to segment the pedestrian mask image from the image to be recognized, and perform pedestrian attribute recognition on the pedestrian mask image, finally outputting the multi-label attributes of the pedestrian with a high degree of recognizability and accuracy.
在一个可选的实施例中,所述多标签属性包括性别属性、头饰属性、发型属性、服饰属性、服饰颜色属性、配饰属性、遮挡属性、截断属性和朝向属性中的至少三个。In an optional embodiment, the multi-label attributes include at least three of gender attributes, headgear attributes, hairstyle attributes, clothing attributes, clothing color attributes, accessory attributes, occlusion attributes, truncation attributes and orientation attributes.
以一个具体的示例进行说明:Let’s illustrate with a specific example:
首先获取一张所述待识别图像100，所述待识别图像的来源，包括但不限定于视频文件中的某一帧或者监控视频流中的某一帧，将所述待识别图像100输入到一个预置的已完成训练的主干卷积神经网络中，综合考虑到识别速度和识别精度的要求，选用ResNet系列的ResNet50网络提取多个阶段的特征图，再引入特征图金字塔网络(Feature Pyramid Network,FPN)，将至少一个阶段的特征图融合在一起并输出第一特征图，既提取高层特征图的语义特征，又提取低层的轮廓特征。基于所述第一特征图220输入RPN网络进行区域检测230以提取所述区域筛选框240，再采用ROI对齐(ROI Align)的方式实现所述像素空间对齐以完成区域特征匹配250并输出第二特征图260，再次进行区域检测270以提取所述行人掩码300，然后将所述行人掩码300和所述待识别图像100进行所述掩码操作400并输出行人掩码图像500。所述行人掩码图像500不包含环境的背景，能够有效降低环境带来的噪声。First, an image to be recognized 100 is obtained; its source includes, but is not limited to, a frame of a video file or a frame of a surveillance video stream. The image to be recognized 100 is input into a preset, trained backbone convolutional neural network. Taking both recognition speed and recognition accuracy requirements into account, the ResNet50 network of the ResNet series is selected to extract feature maps at multiple stages, and a Feature Pyramid Network (FPN) is then introduced to fuse the feature maps of at least one stage together and output a first feature map, extracting both the semantic features of the high-level feature maps and the low-level contour features. Based on the first feature map 220, an RPN network performs region detection 230 to extract region filtering frames 240; ROI Align is then used to achieve the pixel space alignment, completing regional feature matching 250 and outputting a second feature map 260. Region detection 270 is performed again to extract the pedestrian mask 300, and the mask operation 400 is then performed on the pedestrian mask 300 and the image to be recognized 100 to output a pedestrian mask image 500. The pedestrian mask image 500 does not contain the environmental background, which can effectively reduce the noise introduced by the environment.
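The pixel space alignment provided by ROI Align comes from sampling feature values at fractional coordinates with bilinear interpolation, instead of rounding them to the nearest cell as ROI Pooling does. A simplified single-channel sketch of that sampling step (the real operator additionally averages several sampled points per output cell; this is an illustration, not the patented implementation):

```python
def bilinear_sample(feature_map, x, y):
    """Sample feature_map at fractional coordinates (x, y) by bilinear
    interpolation, as ROI Align does, avoiding quantization error."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(feature_map[0]) - 1)
    y1 = min(y0 + 1, len(feature_map) - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the top and bottom rows, then along y.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

# Tiny 2x2 feature map; (0.5, 0.5) falls exactly between all four cells.
fmap = [
    [0.0, 1.0],
    [2.0, 3.0],
]
center = bilinear_sample(fmap, 0.5, 0.5)
```

Because no coordinates are rounded, the extracted region features stay aligned with the mask prediction at pixel granularity.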
最后，对所述行人掩码图像500进行多任务多标签分类并输出行人属性，识别结果如图5所示，所述行人属性包括但不限定于性别属性、头饰属性、发型属性、服饰属性、服饰颜色属性、配饰属性、遮挡属性、截断属性和朝向属性中至少三个属性。Finally, multi-task multi-label classification is performed on the pedestrian mask image 500 and the pedestrian attributes are output; the recognition results are shown in Figure 5. The pedestrian attributes include, but are not limited to, at least three of the gender attribute, headgear attribute, hairstyle attribute, clothing attribute, clothing color attribute, accessory attribute, occlusion attribute, truncation attribute and orientation attribute.
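Multi-label classification differs from single-label softmax classification in that each attribute receives an independent sigmoid score, so several attributes can be predicted for one pedestrian at once. A simplified sketch of the decision step (the attribute names and the 0.5 threshold are illustrative assumptions):

```python
import math

def predict_multilabel(logits, labels, threshold=0.5):
    """Turn per-attribute logits into a set of predicted attribute labels.

    Each attribute is scored independently with a sigmoid, so a pedestrian
    can carry several labels simultaneously (unlike softmax classification,
    which forces exactly one winning class).
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p > threshold]

# Hypothetical logits from the classification head for three attributes.
attrs = predict_multilabel([2.0, -1.0, 0.5], ["female", "hat", "backpack"])
```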
本申请的第三个实施例提供了一种模型训练方法,包括:The third embodiment of this application provides a model training method, including:
获取多个样本识别图像,并对各样本识别图像的目标按照像素空间对齐方式进行标注;Obtain multiple sample recognition images, and label the targets of each sample recognition image according to the pixel space alignment;
使用已标注的多个样本识别图像对目标属性识别模型进行目标识别训练。Use the labeled multiple sample recognition images to train the target attribute recognition model for target recognition.
在本申请实施例中，通过已标注的样本识别图像对目标属性识别模型进行训练，例如将已标注的行人、车辆、狗的样本识别图像输入到目标属性识别模型中，通过目标属性识别模型的目标识别模型对所述样本识别图像进行特征提取并输出第一特征图，通过目标属性识别模型的区域检测模型对第一特征图进行区域检测并输出多个区域筛选框，通过目标属性识别模型的区域特征匹配模型对区域筛选框进行区域特征匹配并输出第二特征图，通过目标属性识别模型的区域检测模型对第二特征图进行区域检测并输出目标掩码；再通过掩码操作获取目标掩码图像，并利用目标属性识别模型的属性识别模型进行目标属性识别，根据获取的目标属性判断目标属性识别模型的准确率，如果未达到预设目标，则进一步调整参数并继续训练，直到满足预设目标为止。In the embodiment of the present application, the target attribute recognition model is trained with labeled sample recognition images. For example, labeled sample recognition images of pedestrians, vehicles and dogs are input into the target attribute recognition model. The target recognition model of the target attribute recognition model performs feature extraction on the sample recognition images and outputs a first feature map; the region detection model of the target attribute recognition model performs region detection on the first feature map and outputs a plurality of region filtering frames; the regional feature matching model performs regional feature matching on the region filtering frames and outputs a second feature map; the region detection model then performs region detection on the second feature map and outputs a target mask. A target mask image is then obtained through the mask operation, and the attribute recognition model of the target attribute recognition model performs target attribute recognition. The accuracy of the target attribute recognition model is judged according to the obtained target attributes; if the preset goal is not reached, the parameters are further adjusted and training continues until the preset goal is met.
进一步地，所述目标属性识别模型包括掩码预测分支、回归预测分支和分类预测分支，以及多标签分类损失函数，所述使用已标注的多个样本识别图像对目标属性识别模型进行目标识别训练进一步包括：Further, the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function. Using the labeled sample recognition images to perform target recognition training on the target attribute recognition model further includes:
根据预设置的第一准确率阈值,所述掩码预测分支、回归预测分支和分类预测分支分别通过预设置的损失函数进行计算并调整模型参数;According to the preset first accuracy threshold, the mask prediction branch, regression prediction branch and classification prediction branch are respectively calculated through the preset loss function and the model parameters are adjusted;
根据预设置的第二准确率阈值,通过所述多标签分类损失函数对所述属性识别模型进行模型参数调整。According to the preset second accuracy threshold, the model parameters of the attribute recognition model are adjusted through the multi-label classification loss function.
在本申请实施例中，根据预设置的第一准确率阈值，例如90%，各预测分支进行损失函数的计算并获得总的损失值，使用第一准确率阈值判断所述损失值，直到满足第一准确率阈值为止；同理，根据预设置的第二准确率阈值，对多标签分类损失函数计算的损失值进行判断，直到满足第二准确率阈值为止。In the embodiment of the present application, according to a preset first accuracy threshold, for example 90%, each prediction branch calculates its loss function and a total loss value is obtained; the loss value is evaluated against the first accuracy threshold until the first accuracy threshold is met. Similarly, according to a preset second accuracy threshold, the loss value calculated by the multi-label classification loss function is evaluated until the second accuracy threshold is met.
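The threshold-driven adjustment described here amounts to a loop that keeps updating parameters until the measured accuracy reaches the preset threshold. A toy sketch with simulated stand-ins for training and evaluation (the `evaluate`/`adjust` callables and the 0.05 increment are purely illustrative, not the actual optimization procedure):

```python
def train_until_threshold(evaluate, adjust, threshold=0.90, max_steps=1000):
    """Repeat parameter adjustment until accuracy meets the preset threshold,
    with a step cap as a safety valve against non-converging training."""
    steps = 0
    while evaluate() < threshold and steps < max_steps:
        adjust()
        steps += 1
    return evaluate(), steps

# Simulated stand-ins: accuracy starts at 0.60 and rises 0.05 per adjustment.
state = {"acc": 0.60}
accuracy, steps = train_until_threshold(
    evaluate=lambda: state["acc"],
    adjust=lambda: state.update(acc=state["acc"] + 0.05),
)
```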
在本申请实施例中，选用开源的MS COCO数据集做为训练集，选用Cityscapes数据集中带有行人的精细标注的样本材料作为第一测试集，选用从一个已运行一年的安防系统备份视频资料中手工整理和标注的样本素材做为第二测试集。In the embodiment of the present application, the open-source MS COCO data set is selected as the training set, the finely annotated sample material containing pedestrians in the Cityscapes data set is selected as the first test set, and sample material manually organized and labeled from the backup video footage of a security system that has been running for one year is selected as the second test set.
所述MS COCO数据集，预置了80种不同的物体，非常适合训练行人检测模型的情况，能够有效的区分样本素材中行人和其他相关物体，如车、猫、狗、树木和标识牌等。另外，由于卷积神经网络用训练集训练的模型对目标分辨率是敏感的，特别是针对行人检测问题，当使用一个数据集训练的模型去测试另一个数据集的行人时，会出现目标召回率低的问题，并且恰好所述MS COCO数据集的图像分辨率并不一致，为提高模型的准确度，把输入图像统一处理为1024×1024分辨率，在保证样本图像原始纵横比的前提下，对于其他部分进行补0处理。The MS COCO data set is preset with 80 different object categories, which makes it well suited to training pedestrian detection models and enables effective discrimination between pedestrians and other related objects in the sample material, such as cars, cats, dogs, trees and signs. In addition, a model trained by a convolutional neural network on a training set is sensitive to target resolution; in particular, for pedestrian detection, a low target recall rate occurs when a model trained on one data set is used to test pedestrians in another data set. Moreover, the image resolutions in the MS COCO data set are inconsistent. To improve the accuracy of the model, the input images are therefore uniformly processed to a 1024×1024 resolution; on the premise of preserving the original aspect ratio of each sample image, the remaining area is zero-padded.
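The 1024×1024 preprocessing described above — scale so the longer side becomes 1024, keep the aspect ratio, and zero-pad the rest — can be sketched as a small geometry calculation (the function name and the right/bottom padding convention are assumptions for illustration):

```python
def letterbox_size(width, height, target=1024):
    """Compute the aspect-ratio-preserving resize and the zero padding
    needed to fit an image into a target x target square."""
    scale = target / max(width, height)          # longer side becomes `target`
    new_w, new_h = round(width * scale), round(height * scale)
    pad_right, pad_bottom = target - new_w, target - new_h  # filled with zeros
    return (new_w, new_h), (pad_right, pad_bottom)

# A 2048x1024 surveillance frame is halved to 1024x512, then padded below.
resized, padding = letterbox_size(2048, 1024)
```

The actual pixel resampling and padding would then be done by any image library; only the size arithmetic is shown here.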
所述Cityscapes数据集包含5000张带有精细标注的样本图片，其中并非所有图像中都包含行人，因此本申请实施例中，针对行人检测筛选出带有行人的2900张图像做为所述第一测试集进行测试。The Cityscapes data set contains 5,000 finely annotated sample images, not all of which contain pedestrians. Therefore, in the embodiment of the present application, 2,900 images containing pedestrians are screened out for pedestrian detection and used as the first test set for testing.
所述第二测试集是从真实的安防系统的备份视频资料中获取，并且统一处理为1024×1024分辨率，由相关技术人员手工标注相关信息，总共500张做为第二测试集进行测试。在一个可选的实施例中，也可以将所述第一测试集和所述第二测试集混合成第三测试集，在此不再赘述。The second test set is obtained from the backup video footage of a real security system and uniformly processed to a 1024×1024 resolution, with the relevant information manually annotated by technical personnel; a total of 500 images are used as the second test set for testing. In an optional embodiment, the first test set and the second test set may also be mixed into a third test set, which will not be described again here.
在本申请实施例中，所述目标属性识别模型包括掩码预测分支、回归预测分支和分类预测分支，以及多标签分类损失函数，所述使用已标注的多个样本识别图像对目标属性识别模型进行目标识别训练进一步包括：In the embodiment of the present application, the target attribute recognition model includes a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function. Using the labeled sample recognition images to perform target recognition training on the target attribute recognition model further includes:
根据预设置的第一准确率阈值，所述掩码预测分支选用交叉熵损失函数、所述回归预测分支选用smooth L1损失函数，所述分类预测分支选用交叉熵损失函数，并设置第一准确率阈值，以便计算和调整模型参数。在本申请实施例中设定第一准确率阈值为90%；另一方面，在所述多标签分类中，选用交叉熵损失函数，在本申请实施例中设定第二准确率阈值为90%，据此对所述属性识别模型进行模型参数调整。According to the preset first accuracy threshold, the mask prediction branch uses a cross-entropy loss function, the regression prediction branch uses a smooth L1 loss function, and the classification prediction branch uses a cross-entropy loss function; the first accuracy threshold is set so that the model parameters can be calculated and adjusted. In the embodiment of the present application, the first accuracy threshold is set to 90%. On the other hand, for the multi-label classification, a cross-entropy loss function is selected; in the embodiment of the present application, the second accuracy threshold is set to 90%, according to which the model parameters of the attribute recognition model are adjusted.
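The loss functions named above — cross-entropy for the mask and classification branches, smooth L1 for box regression, and cross-entropy over independent labels for the multi-label attribute head — can be sketched in scalar form as follows. The toy input values and the unweighted summing of branch losses are assumptions; the text does not specify how the branches are weighted:

```python
import math

def cross_entropy(p_true_class):
    """Cross-entropy for the probability assigned to the true class."""
    return -math.log(p_true_class)

def smooth_l1(diff, beta=1.0):
    """Smooth L1: quadratic near zero, linear for large errors,
    making box regression robust to outliers."""
    return 0.5 * diff * diff / beta if abs(diff) < beta else abs(diff) - 0.5 * beta

def multilabel_bce(probs, targets):
    """Binary cross-entropy summed over independent attribute labels."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, targets)
    )

cls_loss = cross_entropy(0.8)                # classification / mask branch term
box_loss = smooth_l1(0.5) + smooth_l1(2.0)   # two regression residuals
attr_loss = multilabel_bce([0.9, 0.2], [1, 0])
```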
本申请实施例根据行人属性识别的具体需求，在数据集的选取、神经网络架构和损失函数的选取等方面做出针对性的设计，运用所述模型训练方法，能够训练出高效且稳定的模型，从而实现基于分割算法的目标属性识别、或者行人属性识别。需要说明的是，本申请对模型训练的其他细节信息，例如初始参数的选择、GPU硬件的选择等不作具体限定，本领域技术人员应当根据实际应用需求进行选择即可，在此不再赘述。According to the specific requirements of pedestrian attribute recognition, the embodiments of the present application make targeted designs in the selection of data sets, neural network architecture and loss functions. Using the described model training method, an efficient and stable model can be trained, thereby realizing target attribute recognition, or pedestrian attribute recognition, based on a segmentation algorithm. It should be noted that the present application does not specifically limit other details of model training, such as the selection of initial parameters or GPU hardware; those skilled in the art may make such selections according to actual application requirements, and they will not be described again here.
相应地,本申请还提供一种目标属性识别装置700,如图6所示包括:Correspondingly, this application also provides a target attribute identification device 700, which includes:
目标掩码获取单元701,用于对接收的待识别图像进行目标识别,并输出目标掩码,所述目标掩码为基于分割算法进行像素空间对齐获得的;The target mask acquisition unit 701 is used to perform target recognition on the received image to be recognized, and output a target mask, which is obtained by pixel space alignment based on a segmentation algorithm;
目标掩码图像获取单元702,用于根据所述目标掩码对所述待识别图像进行掩码操作并获取目标掩码图像;The target mask image acquisition unit 702 is configured to perform a mask operation on the image to be recognized according to the target mask and acquire the target mask image;
目标属性识别单元703,用于对所述目标掩码图像进行目标属性识别,并输出待识别图像的目标的属性,所述属性包括所述目标的多标签属性。The target attribute recognition unit 703 is configured to perform target attribute recognition on the target mask image, and output attributes of the target in the image to be recognized, where the attributes include multi-label attributes of the target.
前述实施方式也适用于本申请实施例提供的目标属性识别装置,在本申请实施例中不再详细描述。前述实施例和随之带来的有益效果同样适用于本申请实施例,因此,相同的部分不再赘述。The foregoing embodiments are also applicable to the target attribute identification device provided in the embodiments of the present application, and will not be described in detail in the embodiments of the present application. The aforementioned embodiments and the accompanying beneficial effects are also applicable to the embodiments of the present application, and therefore the same parts will not be described again.
相应地,本申请还提供一种行人属性识别装置800,如图7所示包括:Correspondingly, this application also provides a pedestrian attribute recognition device 800, which includes:
行人掩码获取单元801,用于对接收的待识别图像进行行人识别,并输出行人掩码,所述行人掩码为基于分割算法进行像素空间对齐获得的;The pedestrian mask acquisition unit 801 is used to perform pedestrian recognition on the received image to be recognized, and output a pedestrian mask, which is obtained by pixel space alignment based on a segmentation algorithm;
行人掩码图像获取单元802,用于根据所述行人掩码对所述待识别图像进行掩码操作并获取行人掩码图像;A pedestrian mask image acquisition unit 802, configured to perform a mask operation on the image to be recognized according to the pedestrian mask and obtain a pedestrian mask image;
行人属性识别单元803，用于对所述行人掩码图像进行行人属性识别，并输出待识别图像的行人的属性，所述属性包括所述行人的多标签属性。The pedestrian attribute recognition unit 803 is configured to perform pedestrian attribute recognition on the pedestrian mask image and output the attributes of the pedestrian in the image to be recognized, where the attributes include multi-label attributes of the pedestrian.
前述实施方式也适用于本申请实施例提供的行人属性识别装置,在本申请实施例中不再详细描述。前述实施例和随之带来的有益效果同样适用于本申请实施例,因此,相同的部分不再赘述。The foregoing embodiments are also applicable to the pedestrian attribute recognition device provided in the embodiments of the present application, and will not be described in detail in the embodiments of the present application. The aforementioned embodiments and the accompanying beneficial effects are also applicable to the embodiments of the present application, and therefore the same parts will not be described again.
相应地,本申请还提供一种模型训练装置900,如图8所示包括:Correspondingly, this application also provides a model training device 900, which includes:
标注单元901,用于获取多个样本识别图像,并对各样本识别图像的目标按照像素空间对齐方式进行标注;Annotation unit 901 is used to obtain multiple sample identification images and annotate the targets of each sample identification image according to the pixel space alignment;
训练单元902,用于使用已标注的多个样本识别图像对目标属性识别模型进行目标识别训练。The training unit 902 is used to perform target recognition training on the target attribute recognition model using multiple labeled sample recognition images.
前述实施方式也适用于本申请实施例提供的模型训练装置,在本申请实施例中不再详细描述。前述实施例和随之带来的有益效果同样适用于本申请实施例,因此,相同的部分不再赘述。The foregoing embodiments are also applicable to the model training device provided in the embodiments of the present application, and will not be described in detail in the embodiments of the present application. The aforementioned embodiments and the accompanying beneficial effects are also applicable to the embodiments of the present application, and therefore the same parts will not be described again.
本申请的另一个实施例提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现：所述基于分割算法的目标属性识别方法，或者基于分割算法的行人属性识别方法，或者模型训练方法。Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the target attribute recognition method based on a segmentation algorithm, or the pedestrian attribute recognition method based on a segmentation algorithm, or the model training method.
在实际应用中，所述计算机可读存储介质可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。计算机可读存储介质的更具体的例子（非穷举的列表）包括：具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请实施例中，计算机可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。In practical applications, the computer-readable storage medium may be any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more conductors, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present application, the computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus or device.
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质，该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、电线、光缆、RF等等,或者上述的任意合适的组合。Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码，所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++，还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中，远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机，或者，可以连接到外部计算机（例如利用因特网服务提供商来通过因特网连接）。Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
如图9所示,本申请的另一个实施例提供的一种计算机设备的结构示意图。图9显示的计算机设备12仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。As shown in Figure 9, another embodiment of the present application provides a schematic structural diagram of a computer device. The computer device 12 shown in FIG. 9 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present application.
如图9所示,计算机设备12以通用计算设备的形式表现。计算机设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。As shown in Figure 9, computer device 12 is embodied in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components, including system memory 28 and processing unit 16.
总线18表示几类总线结构中的一种或多种，包括存储器总线或者存储器控制器，外围总线，图形加速端口，处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说，这些体系结构包括但不限于工业标准体系结构(ISA)总线，微通道体系结构(MAC)总线，增强型ISA总线、视频电子标准协会(VESA)局域总线以及外围组件互连(PCI)总线。The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics accelerated port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
计算机设备12典型地包括多种计算机系统可读介质。这些介质可以是任何能够被计算机设备12访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12, including volatile and nonvolatile media, removable and non-removable media.
系统存储器28可以包括易失性存储器形式的计算机系统可读介质，例如随机存取存储器(RAM)30和/或高速缓存存储器32。计算机设备12可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例，存储系统34可以用于读写不可移动的、非易失性磁介质（图9未显示，通常称为“硬盘驱动器”）。尽管图9中未示出，可以提供用于对可移动非易失性磁盘（例如“软盘”）读写的磁盘驱动器，以及对可移动非易失性光盘（例如CD-ROM，DVD-ROM或者其它光介质）读写的光盘驱动器。在这些情况下，每个驱动器可以通过一个或者多个数据介质接口与总线18相连。存储器28可以包括至少一个程序产品，该程序产品具有一组（例如至少一个）程序模块，这些程序模块被配置以执行本申请各实施例的功能。The system memory 28 may include a computer system readable medium in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 9, commonly referred to as a "hard drive"). Although not shown in Figure 9, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), as well as an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media), may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the embodiments of the present application.
具有一组（至少一个）程序模块42的程序/实用工具40，可以存储在例如存储器28中，这样的程序模块42包括但不限于操作系统、一个或者多个应用程序、其它程序模块以及程序数据，这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本申请所描述的实施例中的功能和/或方法。A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28; such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods in the embodiments described in the present application.
计算机设备12也可以与一个或多个外部设备14（例如键盘、指向设备、显示器24等）通信，还可与一个或者多个使得用户能与该计算机设备12交互的设备通信，和/或与使得该计算机设备12能与一个或多个其它计算设备进行通信的任何设备（例如网卡，调制解调器等等）通信。这种通信可以通过输入/输出(I/O)接口22进行。并且，计算机设备12还可以通过网络适配器20与一个或者多个网络（例如局域网(LAN)，广域网(WAN)和/或公共网络，例如因特网）通信。如图9所示，网络适配器20通过总线18与计算机设备12的其它模块通信。应当明白，尽管图9中未示出，可以结合计算机设备12使用其它硬件和/或软件模块，包括但不限于：微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Furthermore, the computer device 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20. As shown in Figure 9, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be understood that, although not shown in Figure 9, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and so on.
处理单元16通过运行存储在系统存储器28中的程序，从而执行各种功能应用以及数据处理，例如一种基于分割算法的目标属性识别方法，或者一种基于分割算法的行人属性识别方法，或者一种模型训练方法。The processing unit 16 executes the programs stored in the system memory 28, thereby performing various functional applications and data processing, for example, a target attribute recognition method based on a segmentation algorithm, or a pedestrian attribute recognition method based on a segmentation algorithm, or a model training method.
显然，本申请的上述实施例仅仅是为清楚地说明本申请所作的举例，而并非是对本申请的实施方式的限定，对于所属领域的普通技术人员来说，在上述说明的基础上还可以做出其它不同形式的变化或变动，这里无法对所有的实施方式予以穷举，凡是属于本申请的技术方案所引伸出的显而易见的变化或变动仍处于本申请的保护范围之列。Obviously, the above embodiments of the present application are merely examples given to clearly illustrate the present application, and are not intended to limit the implementations of the present application. For those of ordinary skill in the art, other changes or modifications in different forms may be made on the basis of the above description; it is impossible to exhaustively list all implementations here, and any obvious change or modification derived from the technical solutions of the present application remains within the protection scope of the present application.

Claims (13)

  1. 一种目标属性识别方法,所述方法包括:A target attribute identification method, the method includes:
    使用预设置的目标属性识别模型对接收的待识别图像进行目标识别,并输出目标掩码,所述目标掩码为基于分割算法进行像素空间对齐获得的;Use a preset target attribute recognition model to perform target recognition on the received image to be recognized, and output a target mask, which is obtained by pixel space alignment based on a segmentation algorithm;
    根据所述目标掩码,使用所述目标属性识别模型对所述待识别图像进行掩码操作并获取目标掩码图像;According to the target mask, use the target attribute recognition model to perform a mask operation on the image to be recognized and obtain a target mask image;
    根据所述目标掩码图像,使用所述目标属性识别模型进行目标属性识别,并输出所述待识别图像中的目标的属性,所述属性包括所述目标的多标签属性。According to the target mask image, the target attribute recognition model is used to perform target attribute recognition, and attributes of the target in the image to be recognized are output, where the attributes include multi-label attributes of the target.
  2. 根据权利要求1所述的目标属性识别方法,所述使用预设置的目标属性识别模型对接收的待识别图像进行目标识别,并输出目标掩码进一步包括:The target attribute recognition method according to claim 1, wherein using a preset target attribute recognition model to perform target recognition on the received image to be recognized and outputting the target mask further includes:
    使用所述目标属性识别模型对所述待识别图像进行特征提取并输出第一特征图;Use the target attribute recognition model to perform feature extraction on the image to be recognized and output a first feature map;
    使用所述目标属性识别模型对所述第一特征图进行区域检测并输出多个区域筛选框;Use the target attribute recognition model to perform region detection on the first feature map and output multiple region filtering frames;
    使用所述目标属性识别模型对所述区域筛选框进行区域特征匹配并输出第二特征图,所述第二特征图为基于分割算法进行像素空间对齐获得的;Use the target attribute recognition model to perform regional feature matching on the regional filtering frame and output a second feature map, where the second feature map is obtained by pixel space alignment based on a segmentation algorithm;
    使用所述目标属性识别模型对所述第二特征图进行区域检测并输出目标掩码。The target attribute recognition model is used to perform area detection on the second feature map and output a target mask.
  3. 根据权利要求2所述的目标属性识别方法,所述目标属性识别模型包括特征提取网络、第一特征图金字塔网络和区域生成网络;The target attribute identification method according to claim 2, the target attribute identification model includes a feature extraction network, a first feature map pyramid network and a region generation network;
    所述使用所述目标属性识别模型对所述待识别图像进行特征提取并输出第一特征图进一步包括:The use of the target attribute recognition model to extract features from the image to be recognized and output the first feature map further includes:
    使用所述特征提取网络对所述待识别图像进行特征提取并输出多层特征原图;Use the feature extraction network to perform feature extraction on the image to be recognized and output a multi-layer feature original image;
    使用所述第一特征图金字塔网络,根据至少一层所述特征原图输出所述第一特征图; Use the first feature map pyramid network to output the first feature map according to at least one layer of the original feature map;
    所述使用所述目标属性识别模型对所述第一特征图进行区域检测并输出多个区域筛选框进一步包括：根据预设置的锚框，使用所述区域生成网络对所述第一特征图进行区域检测并输出多个区域筛选框。Using the target attribute recognition model to perform region detection on the first feature map and output a plurality of region filtering frames further includes: according to a preset anchor frame, using the region generation network to perform region detection on the first feature map and output a plurality of region filtering frames.
  4. 根据权利要求2所述的目标属性识别方法,所述目标属性识别模型包括掩码预测分支、回归预测分支和分类预测分支;The target attribute identification method according to claim 2, the target attribute identification model includes a mask prediction branch, a regression prediction branch and a classification prediction branch;
    使用所述目标属性识别模型对所述第二特征图进行区域检测并输出目标掩码进一步包括:Using the target attribute recognition model to perform region detection on the second feature map and outputting a target mask further includes:
    使用所述掩码预测分支对所述第二特征图进行区域预测并输出目标掩码;Use the mask prediction branch to perform region prediction on the second feature map and output a target mask;
    使用所述回归预测分支对所述第二特征图进行区域预测并输出目标框;Use the regression prediction branch to perform regional prediction on the second feature map and output a target frame;
    使用所述分类预测分支对所述第二特征图进行分类预测并输出目标分类。Use the classification prediction branch to perform classification prediction on the second feature map and output a target classification.
  5. 根据权利要求4所述的目标属性识别方法,The target attribute identification method according to claim 4,
    所述根据所述目标掩码，使用所述目标属性识别模型对所述待识别图像进行掩码操作并获取目标掩码图像进一步包括：将所述目标掩码与所述待识别图像进行乘法操作并获取目标掩码图像；According to the target mask, using the target attribute recognition model to perform a mask operation on the image to be recognized and obtain a target mask image further includes: performing a multiplication operation on the target mask and the image to be recognized to obtain the target mask image;
    所述根据所述目标掩码图像,使用所述目标属性识别模型进行目标属性识别,并输出待识别图像的目标的属性进一步包括:根据所述目标分类使用所述目标属性识别模型中对应的属性识别模型对所述目标掩码图像进行目标属性识别,并输出待识别图像的目标的属性,所述属性识别模型为多任务多标签分类模型。The step of using the target attribute recognition model to perform target attribute recognition according to the target mask image, and outputting the attributes of the target in the image to be recognized further includes: using the corresponding attributes in the target attribute recognition model according to the target classification. The recognition model performs target attribute recognition on the target mask image and outputs the attributes of the target in the image to be recognized. The attribute recognition model is a multi-task multi-label classification model.
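As a minimal sketch (not part of the claims), the multiplication-based masking operation of claim 5 is a pixel-wise product that zeroes out background pixels; the image and mask contents below are made up for the example:

```python
import numpy as np

# toy 2-channel image and a binary target mask from the mask branch
image = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0  # pixels belonging to the target

# masking operation: element-wise multiplication, broadcast over channels;
# background pixels become zero, target pixels keep their values
masked = image * mask
```

The resulting target mask image is what the downstream attribute recognition model consumes, so it sees only target pixels.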
  6. The target attribute recognition method according to claim 4, wherein
    the performing, according to the target mask, a masking operation on the image to be recognized by using the target attribute recognition model and obtaining a target mask image further comprises: multiplying the output target box by the image to be recognized to obtain a target box mask image, and multiplying the target mask by the target box mask image to obtain the target mask image;
    the performing, according to the target mask image, target attribute recognition by using the target attribute recognition model and outputting the attributes of the target in the image to be recognized further comprises: performing, according to the target classification, target attribute recognition on the target mask image by using the corresponding attribute recognition model in the target attribute recognition model, and outputting the attributes of the target in the image to be recognized, wherein the attribute recognition model is a multi-task multi-label classification model.
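For illustration only (not part of the claims), claim 6 differs from claim 5 in applying two multiplications in sequence: first the coarse target box, then the pixel-aligned target mask. A toy sketch with made-up box and mask coordinates:

```python
import numpy as np

image = np.ones((4, 4))  # toy single-channel image

# step 1: turn the predicted target box into a rectangular mask and apply it
y1, y2, x1, x2 = 1, 3, 1, 3        # assumed box coordinates
box_mask = np.zeros_like(image)
box_mask[y1:y2, x1:x2] = 1.0
box_masked = image * box_mask      # target box mask image

# step 2: refine with the finer, pixel-aligned segmentation mask
target_mask = np.zeros_like(image)
target_mask[1:3, 2:3] = 1.0        # assumed mask-branch output
final = box_masked * target_mask   # target mask image
```

Only pixels inside both the box and the segmentation mask survive, which is the stated effect of the two multiplications.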
  7. The target attribute recognition method according to claim 3, wherein the feature extraction network is one of a VGG network, a GoogLeNet network, a ResNet network and a ResNeXt network.
  8. A pedestrian attribute recognition method based on a segmentation algorithm, the method comprising:
    performing pedestrian recognition on a received image to be recognized by using a preset pedestrian attribute recognition model, and outputting a pedestrian mask, wherein the pedestrian mask is obtained through pixel-space alignment based on a segmentation algorithm;
    performing, according to the pedestrian mask, a masking operation on the image to be recognized by using the pedestrian attribute recognition model and obtaining a pedestrian mask image;
    performing, according to the pedestrian mask image, pedestrian attribute recognition by using the pedestrian attribute recognition model, and outputting the attributes of the pedestrian in the image to be recognized, wherein the attributes include multi-label attributes of the pedestrian.
  9. The pedestrian attribute recognition method according to claim 8, wherein the multi-label attributes include at least three of a gender attribute, a headwear attribute, a hairstyle attribute, a clothing attribute, a clothing color attribute, an accessory attribute, an occlusion attribute, a truncation attribute and an orientation attribute.
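As an illustrative aside (not part of the claims), "multi-label" in claims 8-9 means each attribute is predicted independently, so one pedestrian can carry several attributes at once. A sketch with hypothetical label names and made-up logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical attribute labels for a pedestrian head; not from the application
labels = ["male", "hat", "long_hair", "backpack", "occluded"]
logits = np.array([2.0, -1.5, 0.3, 1.2, -3.0])  # made-up model outputs

# multi-label decision: an independent sigmoid per attribute, thresholded
# at 0.5 — unlike softmax, several labels can be active simultaneously
probs = sigmoid(logits)
predicted = [label for label, p in zip(labels, probs) if p > 0.5]
print(predicted)
```

Here three attributes fire at once, which a single-label (softmax) classifier could not express.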
  10. A model training method, the method comprising:
    obtaining a plurality of sample recognition images, and annotating the target in each sample recognition image in a pixel-space-aligned manner;
    performing target recognition training on a target attribute recognition model by using the annotated plurality of sample recognition images.
  11. The model training method according to claim 10, wherein the target attribute recognition model comprises a mask prediction branch, a regression prediction branch and a classification prediction branch, as well as a multi-label classification loss function, and
    the performing target recognition training on the target attribute recognition model by using the annotated plurality of sample recognition images further comprises:
    according to a preset first accuracy threshold, computing with respective preset loss functions for the mask prediction branch, the regression prediction branch and the classification prediction branch, and adjusting the model parameters;
    according to a preset second accuracy threshold, adjusting the model parameters of the target attribute recognition model through the multi-label classification loss function.
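For illustration only (not part of the claims), a common choice for the multi-label classification loss named in claim 11 is per-label binary cross-entropy; the application does not specify the exact loss, so the formula and values below are assumptions:

```python
import numpy as np

def multilabel_bce(logits, targets):
    """Binary cross-entropy averaged over labels: each attribute is treated
    as an independent binary classification problem."""
    p = 1.0 / (1.0 + np.exp(-logits))       # per-label probabilities
    eps = 1e-9                              # guard against log(0)
    return -np.mean(targets * np.log(p + eps)
                    + (1.0 - targets) * np.log(1.0 - p + eps))

logits = np.array([3.0, -2.0, 0.5])   # made-up predictions for 3 attributes
targets = np.array([1.0, 0.0, 1.0])   # ground-truth multi-label annotation
loss = multilabel_bce(logits, targets)
```

The loss decreases as each label's probability moves toward its 0/1 target, which is the gradient signal used to adjust the model parameters.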
  12. A computer-readable storage medium having a computer program stored thereon, wherein
    when the program is executed by a processor, the method according to any one of claims 1-7 is implemented;
    or
    when the program is executed by a processor, the method according to any one of claims 8-9 is implemented;
    or
    when the program is executed by a processor, the method according to claim 10 is implemented.
  13. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein
    when the processor executes the program, the method according to any one of claims 1-7 is implemented;
    or
    when the processor executes the program, the method according to any one of claims 8-9 is implemented;
    or
    when the processor executes the program, the method according to claim 10 is implemented.
PCT/CN2023/101952 2022-06-23 2023-06-21 Target attribute recognition method and apparatus, and model training method and apparatus WO2023246921A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210714705.4 2022-06-23
CN202210714705.4A CN115100469A (en) 2022-06-23 2022-06-23 Target attribute identification method, training method and device based on segmentation algorithm

Publications (1)

Publication Number Publication Date
WO2023246921A1 true WO2023246921A1 (en) 2023-12-28

Family

ID=83292086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101952 WO2023246921A1 (en) 2022-06-23 2023-06-21 Target attribute recognition method and apparatus, and model training method and apparatus

Country Status (2)

Country Link
CN (1) CN115100469A (en)
WO (1) WO2023246921A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100469A (en) * 2022-06-23 2022-09-23 京东方科技集团股份有限公司 Target attribute identification method, training method and device based on segmentation algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360633A (en) * 2018-09-04 2019-02-19 北京市商汤科技开发有限公司 Medical imaging processing method and processing device, processing equipment and storage medium
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111950346A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Pedestrian detection data expansion method based on generation type countermeasure network
CN114332586A (en) * 2021-12-23 2022-04-12 广州华多网络科技有限公司 Small target detection method and device, equipment, medium and product thereof
CN115100469A (en) * 2022-06-23 2022-09-23 京东方科技集团股份有限公司 Target attribute identification method, training method and device based on segmentation algorithm

Also Published As

Publication number Publication date
CN115100469A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN109344701B (en) Kinect-based dynamic gesture recognition method
Li et al. Scale-aware fast R-CNN for pedestrian detection
Liao et al. Rotation-sensitive regression for oriented scene text detection
EP3961485A1 (en) Image processing method, apparatus and device, and storage medium
Lu et al. Gated and axis-concentrated localization network for remote sensing object detection
US20220051405A1 (en) Image processing method and apparatus, server, medical image processing device and storage medium
Chen et al. Adversarial occlusion-aware face detection
Wang et al. Small-object detection based on yolo and dense block via image super-resolution
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
WO2021103868A1 (en) Method for structuring pedestrian information, device, apparatus and storage medium
CN111563550B (en) Sperm morphology detection method and device based on image technology
CN110188766B (en) Image main target detection method and device based on convolutional neural network
Wang et al. S 3 d: scalable pedestrian detection via score scale surface discrimination
CN111985367A (en) Pedestrian re-recognition feature extraction method based on multi-scale feature fusion
WO2023246921A1 (en) Target attribute recognition method and apparatus, and model training method and apparatus
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
Li et al. Lcnn: Low-level feature embedded cnn for salient object detection
Cheng et al. A direct regression scene text detector with position-sensitive segmentation
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
Liu et al. Multi-component fusion network for small object detection in remote sensing images
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
Chen et al. Learning to locate for fine-grained image recognition
Yu et al. SignHRNet: Street-level traffic signs recognition with an attentive semi-anchoring guided high-resolution network
CN110020688B (en) Shielded pedestrian detection method based on deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23826566

Country of ref document: EP

Kind code of ref document: A1