CN114359787A - Target attribute identification method and device, computer equipment and storage medium

Info

Publication number: CN114359787A
Application number: CN202111496093.8A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张洪, 肖嵘, 王孝宇
Applicant and current assignee: Shenzhen Intellifusion Technologies Co Ltd
Priority: CN202111496093.8A
Prior art keywords: target, attribute, image, confidence, identification
Classification: Image Analysis (AREA)
Abstract

The invention relates to the technical field of artificial intelligence and discloses a target attribute identification method and device, computer equipment and a storage medium. The method comprises: acquiring a video to be recognized and performing target tracking processing on it to obtain the target identifier of each target and the image set associated with each target identifier; performing target attribute identification on each target image of each image set to obtain, for each image set, a corresponding target recognition result set of image attribute results, each image attribute result comprising branch attribute values and branch confidences; and performing confidence voting on all the branch attribute values and branch confidences in each target recognition result set to obtain each target's final target attribute values and target confidence results. The method thus automatically identifies, for every target in the video to be recognized, each final target attribute value and its corresponding target confidence result, and improves the accuracy and reliability of the confidence attached to target attribute identification.

Description

Target attribute identification method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a target attribute identification method and device, computer equipment and a storage medium.
Background
With the spread of surveillance video, accurately and effectively exploiting the information about target objects in video and mining the targets' related attributes is of great value, for example pedestrian information in pedestrian surveillance video or vehicle information in vehicle surveillance video. Existing target identification methods generally build a deep network based on an attention mechanism, which learns different attributes according to an attention allocation rule over the image and outputs an attribute result for each target in a single image.
Disclosure of Invention
The invention provides a target attribute identification method and device, computer equipment and a storage medium, which automatically identify the target identifiers and image sets in a video to be recognized, accurately identify each target's final target attributes and target confidence results, and improve the accuracy and reliability of target attribute identification.
A target attribute identification method is characterized by comprising the following steps:
acquiring a video to be recognized, and performing target tracking processing on the video to be recognized to obtain the target identifier of each target and the image set associated with each target identifier;
performing target attribute identification on each target image of each image set to obtain a target recognition result set in one-to-one correspondence with each image set, the target recognition result set comprising a plurality of image attribute results, each image attribute result comprising branch attribute values of a plurality of target attributes and branch confidences in one-to-one correspondence with the branch attribute values;
and performing confidence voting on all the branch attribute values and branch confidences in each target recognition result set to obtain each final target attribute value of each target and the target confidence result in one-to-one correspondence with each final target attribute value.
A target attribute identification apparatus, comprising:
an acquisition module, configured to acquire a video to be recognized and perform target tracking processing on the video to be recognized to obtain the target identifier of each target and the image set associated with each target identifier;
an identification module, configured to perform target attribute identification on each target image of each image set to obtain a target recognition result set in one-to-one correspondence with each image set, the target recognition result set comprising a plurality of image attribute results, each image attribute result comprising branch attribute values of a plurality of target attributes and branch confidences in one-to-one correspondence with the branch attribute values;
and a voting module, configured to perform confidence voting on all the branch attribute values and branch confidences in each target recognition result set to obtain each final target attribute value of each target and the target confidence result in one-to-one correspondence with each final target attribute value.
A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above target attribute identification method when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above target attribute identification method.
According to the target attribute identification method and device, computer equipment and storage medium, the video to be recognized is acquired and target tracking processing is performed on it to obtain the target identifier of each target and the image set associated with each target identifier; target attribute identification is performed on each target image of each image set to obtain a target recognition result set in one-to-one correspondence with each image set, the result set comprising a plurality of image attribute results, each comprising branch attribute values of a plurality of target attributes and branch confidences in one-to-one correspondence with those values; and confidence voting is performed on all the branch attribute values and branch confidences in each target recognition result set to obtain each target's final target attribute values and the target confidence results in one-to-one correspondence with them. Automatic recognition of each target's identifier and image set in the video to be recognized is thus achieved, and each target's final target attribute values and corresponding target confidence results are recognized automatically and accurately through target attribute identification and confidence voting. A large number of target images therefore need not be extracted from the video manually, avoiding human missed or false recognition; target attribute identification and confidence voting over each target's image set proceed objectively and scientifically, so each target's final target attributes and target confidence results are recognized accurately, labor costs are greatly reduced, and the accuracy and correctness of target attribute identification are improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of a target attribute identification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a target attribute identification method in an embodiment of the invention;
FIG. 3 is a flowchart of step S10 of the target attribute identification method in one embodiment of the invention;
FIG. 4 is a flowchart of step S20 of the target attribute identification method in an embodiment of the present invention;
FIG. 5 is a functional block diagram of a target attribute identification device in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of a computer device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The target attribute identification method provided by the invention can be applied to the application environment shown in FIG. 1, in which a client (computer device or terminal) communicates with a server over a network. The client includes, but is not limited to, personal computers, notebook computers, smartphones, tablet computers and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in FIG. 2, a target attribute identification method is provided, which mainly includes the following steps S10-S30:
S10, obtaining a video to be recognized, and performing target tracking processing on the video to be recognized to obtain the target identifier of each target and the image set associated with each target identifier.
Understandably, the video to be recognized is a video that contains the targets to be recognized and from which target-related attributes need to be identified. It is captured by a single camera and satisfies same-shot tracking, so the attribute values exhibited by a target within the same shot remain essentially unchanged. The target can be set as required, for example a pedestrian, an animal, a vehicle or a human face. The video to be recognized may be obtained by a user uploading it to a server or database and then downloading it from that server or database. The target tracking processing comprises multi-target detection processing and multi-target tracking processing, and proceeds as follows: first, framing processing is performed on the video to be recognized, splitting out the video frame image of each frame; second, multi-target detection processing is performed on each video frame image to obtain the detection result of each video frame image; third, multi-target tracking processing is performed on all the detection results to obtain each target identifier in the video to be recognized; finally, target images containing the different targets are extracted from each video frame image, and the target images labeled with the same target identifier are taken as the image set associated with that target identifier.
The multi-target detection processing may be implemented by a trained target detection model, which identifies the coordinate region of each target in each video frame image of the video to be recognized. Its network structure can be chosen as required, for example Faster R-CNN, SSD or YOLO; preferably, it is a CenterNet network structure. The model first pre-processes the input video frame image and scales it to a preset size: using an image scaling technique, the long edge is scaled to the preset size and the short edge is zero-padded. Image channel separation is then applied to the scaled video frame image, splitting it into its red, green and blue channels to obtain a three-channel image to be processed. The image to be processed is input into the target detection model, and target features are extracted through the ResNet50 backbone of the CenterNet-based model; the model detects each target as a point, that is, a target is represented by the center point of its region, and the center point offset and the width and height (size) are predicted to recover the actual object region. The target features are the characteristics specific to the targets to be identified, such as human body features, animal features or vehicle features. The extracted target features are then up-sampled by deconvolution, three times in total, giving a feature map to be predicted. Finally, the feature map to be predicted is fed to three branch networks: a heatmap prediction network, a width-height prediction network and a region-center-offset prediction network. The heatmap branch classifies the targets, predicts the center point of each target region and computes the radius of a Gaussian circle, decreasing outwards from the center point along the computed radius according to a Gaussian function, which yields the heatmap corresponding to the feature map. The width-height branch predicts the width and height regions of the multiple targets, yielding the size map of the targets, and the offset branch predicts the deviation values of the multiple targets, yielding the center deviation values. From the heatmap, the size map and the center deviation values, the region of each target, namely the detection result of the video frame image, is determined, so the detection result of every video frame image is obtained. The target detection model is trained on collected sample images associated with target labels: heatmap prediction, width-height prediction and center-deviation prediction are performed on the sample images to obtain a prediction result, and the loss values between the prediction result and the target labels are computed, comprising a heatmap loss, a center-deviation loss and a width-height loss. While the loss values have not reached the convergence condition, the parameters of the target detection model are iteratively updated and the predictions are repeated; when the loss values reach the convergence condition, training stops, and the trained target detection model is obtained.
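As an illustration of how such branch outputs can be turned into detection results, the following is a minimal decoding sketch in Python. It is not the patent's implementation: the array shapes, the feature-map stride of 4 and the score threshold are assumptions, and the 3 × 3 local-maximum test stands in for the heatmap peak extraction described above.

```python
import numpy as np

def decode_centernet(heatmap, wh, offset, score_thresh=0.3, stride=4):
    """Decode CenterNet-style branch outputs into bounding boxes.
    heatmap: (C, H, W) per-class center-point scores in [0, 1]
    wh:      (2, H, W) predicted box width/height at each location
    offset:  (2, H, W) sub-pixel center deviation values
    Returns a list of (x1, y1, x2, y2, class_id, score) in input-image coords."""
    boxes = []
    num_classes, H, W = heatmap.shape
    for c in range(num_classes):
        hm = heatmap[c]
        for y in range(H):
            for x in range(W):
                s = hm[y, x]
                if s < score_thresh:
                    continue
                # A peak is a local maximum over its 3x3 neighbourhood.
                y0, y1 = max(0, y - 1), min(H, y + 2)
                x0, x1 = max(0, x - 1), min(W, x + 2)
                if s < hm[y0:y1, x0:x1].max():
                    continue
                # Refine the center with the offset branch, then expand by w/h.
                cx = (x + offset[0, y, x]) * stride
                cy = (y + offset[1, y, x]) * stride
                w, h = wh[0, y, x] * stride, wh[1, y, x] * stride
                boxes.append((cx - w / 2, cy - h / 2,
                              cx + w / 2, cy + h / 2, c, float(s)))
    return boxes
```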
The multi-target tracking processing over all the detection results may be implemented by a trained target tracking model, which predicts the motion trajectory of each target. Its network structure can be chosen as required, for example DeepSORT or SiamMask; preferably it is a DeepSORT structure. DeepSORT is a multi-target tracking algorithm that integrates appearance information and favors a simple, efficient design. One way for the target tracking model to track over the detection results is IOU matching: the IOU of the target regions of two adjacent video frame images is computed with an IOU matching algorithm, and when the IOU value exceeds a preset threshold, the two target regions are determined to be associated with the same target and are recorded under the same target identifier. Another way is as follows. First, target prediction is performed for the video frame image corresponding to the detection result: the prediction box of the next frame is predicted by Kalman filtering (including confirmed and unconfirmed prediction boxes), and each prediction box is judged against the appearance information of the target in the current video frame image to determine whether it is a confirmed or an unconfirmed prediction box. Second, observation is performed: the detector detects the next video frame image in the time order of the current one, its detection result determines the detection boxes (also called target coordinate regions), and the IOU matching algorithm, which computes the overlap ratio (intersection over union) of two region boxes, matches every confirmed prediction box against every detection box. If the IOU value of a pair reaches the preset threshold, the matched confirmed prediction box is associated with the detection box; if it does not, IOU matching is attempted again, and if it fails three times, the pair is deemed not to lie on the target's motion trajectory and is not associated. Finally, the error between the prediction box and the detection box is computed from the IOU values, the parameters of the Kalman filter prediction are updated from that error, the next video frame image becomes the current one, and Kalman prediction continues in a loop. After the target tracking model has tracked over every detection result, the detection boxes of each target across the video frame images are associated, so the tracking chain through the mutually associated detection boxes traces the trajectory of each target. Each target is given a unique target identifier, the identifier is marked in the associated video frame images, and the image of the region marked with the identifier is extracted from each video frame image and recorded as a target image. Thus the target identifier of each target and the target images containing that target are obtained, and all target images labeled with the same target identifier are recorded as the image set associated with that identifier; the image set is the set of target images sharing one target identifier, and the target images are the region images, extracted from the video frame images, that are labeled with the target identifier and contain the target.
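The IOU-based association described first can be sketched as follows. The greedy matching below is a simplification (DeepSORT itself combines Hungarian matching with appearance features), and the function names and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(prev_tracks, detections, iou_thresh=0.5):
    """Greedily link detections in the current frame to existing tracks.
    prev_tracks: dict target_id -> last box; detections: list of boxes.
    Unmatched detections can then spawn new target identifiers."""
    assignments, used = {}, set()
    for tid, tbox in prev_tracks.items():
        best, best_iou = None, iou_thresh
        for j, dbox in enumerate(detections):
            if j in used:
                continue
            v = iou(tbox, dbox)
            if v > best_iou:
                best, best_iou = j, v
        if best is not None:
            assignments[tid] = best
            used.add(best)
    return assignments
```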
In an embodiment, as shown in FIG. 3, the step S10 of performing target tracking processing on the video to be recognized to obtain the target identifier of each target and the image set associated with the target identifier includes:
s101, performing framing processing on the video to be identified to obtain a plurality of video frame images.
Understandably, the video to be recognized is a set of single-frame images played in time sequence. The framing processing is the operation of splitting the input video into its individual frames, or extracting one frame image at a time; performing it on the video to be recognized yields a plurality of video frame images, each video frame image being one frame of the video to be recognized.
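A common way to implement this framing step, assuming OpenCV as the video library (the patent does not prescribe one):

```python
import cv2

def split_frames(video_path):
    """Split the video to be recognized into per-frame images (step S101)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()  # one BGR frame, in time order
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```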
S102, performing multi-target detection processing on each video frame image to obtain the detection result of each video frame image.
Understandably, the multi-target detection processing of each video frame image may be implemented by the trained target detection model. The video frame image is input into the model, which applies data transformation and convolution to extract target features, that is, the characteristics specific to the targets to be identified, such as human body features (head, hair, hands, face, trunk, clothes, legs, feet and other features that characterize a human body), animal features, or vehicle features (head, body, tail and other features that characterize a vehicle). The network structure of the target detection model is CenterNet. Its processing is as follows: the input video frame image is pre-processed and scaled to a preset size (for example, 512 × 512), the preprocessing comprising image scaling, namely scaling the long edge to the preset size and zero-padding the short edge; image channel separation is applied to the scaled video frame image, splitting out the red, green and blue channels to give a three-channel image to be processed; the image to be processed is input into the target detection model, and target features are extracted through the ResNet50 backbone of the CenterNet-based model, which detects each target as a point, representing a target by the center point of its region and predicting the center point offset and the size to recover the actual object region. The extracted target features are up-sampled three times by a deconvolution module (deconvolution being the inverse of convolution), producing a feature map to be predicted. The feature map is then fed to the three branch networks, namely the heatmap prediction network, the width-height prediction network and the region-center-offset prediction network. The heatmap branch classifies the targets, predicts the center point of each target region, computes the radius of a Gaussian circle and decreases outwards from the center point along that radius according to a Gaussian function, yielding the heatmap corresponding to the feature map; the width-height branch predicts the width-height regions of the multiple targets, yielding the size map of the targets; and the offset branch predicts the deviation values of the multiple targets, yielding the center deviation values. From the heatmap, the size map and the center deviation values, the region of each target, namely the detection result of the video frame image, is determined, so the detection result of every video frame image is obtained. In this way, the target detection model maps higher-dimensional features into a lower-dimensional feature space and classifies on the lower-dimensional features to identify the region ranges bearing target features, thereby identifying the coordinate positions (region ranges) of the targets in the video frame image. The detection result represents the coordinate positions (region ranges) bearing target features in the video frame image, and the detection result of one video frame image may cover the coordinate positions of several targets.
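The "long-edge scaling, short-edge zero padding" preprocessing and the channel separation described above might look like this; the 512 preset size follows the example in the text, and everything else is an illustrative assumption:

```python
import numpy as np
import cv2

def letterbox(image, size=512):
    """Scale the long edge to `size`, zero-pad the short edge, and split
    the result into its three colour planes (CHW layout). OpenCV images
    arrive as HWC BGR arrays."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.zeros((size, size, 3), dtype=resized.dtype)  # zero padding
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    chw = canvas.transpose(2, 0, 1)  # channel separation: (3, size, size)
    return chw, scale
```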
S103, performing multi-target tracking processing on all the detection results to obtain the target identifier of each target.
Understandably, the multi-target tracking processing over all the detection results produced by the multi-target detection processing may be implemented by the trained target tracking model, whose network structure is preferably DeepSORT, a deep-learning multi-target tracking algorithm that integrates appearance information and favors a simple, efficient design. The model may track over each detection result as follows. First, target prediction is performed for the video frame image corresponding to the detection result: the prediction box of the next frame is predicted by Kalman filtering (including confirmed and unconfirmed prediction boxes), and each prediction box is judged against the appearance information of the target in the current video frame image to determine whether it is confirmed or unconfirmed. The Kalman filtering prediction removes (Gaussian) noise and linearly predicts the region of the de-noised detection result, that is, it predicts the prediction box from the filter's estimated parameters, for example by translation prediction with a first-order linear function. Second, observation is performed on the current video frame image: the detector detects the next video frame image in time order, its detection result determines the detection boxes, and the IOU matching algorithm, which computes the overlap ratio of two region boxes, matches every confirmed prediction box against every detection box, measuring the degree of match by the computed IOU value. If the IOU value of a pair reaches the preset threshold, the matched confirmed prediction box and detection box are associated; otherwise IOU matching is retried, and after three failures the pair is deemed not to be the target's motion trajectory and is not associated. Finally, the error between the prediction box and the detection box is computed from the IOU value, the parameters of the Kalman filter prediction are updated from that error, the next video frame image becomes the current one, and the Kalman prediction loop continues. After the target tracking model has processed every detection result, the detection boxes of each target are associated across the video frame images, the tracking chain through the mutually associated detection boxes is traced to form each target's trajectory, each target is given a unique target identifier, and the identifier is marked in the associated video frame images, yielding the target identifier of each target and the video frame images associated with that identifier and containing it.
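As a stand-in for the Kalman prediction and update cycle described above, here is a deliberately simplified constant-velocity sketch: the predicted box is the last box translated by the last observed center displacement, and "updating the filter parameters from the error" reduces to re-estimating that velocity. A real implementation would keep a full Kalman filter state; this is an assumption-laden illustration, not the patent's method.

```python
class SimpleTrack:
    """Constant-velocity stand-in for the Kalman prediction step."""
    def __init__(self, tid, box):
        self.tid, self.box, self.vel = tid, box, (0.0, 0.0)

    def predict(self):
        # Predict the next-frame box by translating the last box.
        dx, dy = self.vel
        x1, y1, x2, y2 = self.box
        return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

    def update(self, box):
        # Re-estimate velocity from the box-center displacement; this
        # plays the role of the parameter update driven by the
        # prediction/detection error.
        cx_old = (self.box[0] + self.box[2]) / 2
        cy_old = (self.box[1] + self.box[3]) / 2
        cx_new = (box[0] + box[2]) / 2
        cy_new = (box[1] + box[3]) / 2
        self.vel = (cx_new - cx_old, cy_new - cy_old)
        self.box = box

t = SimpleTrack("obj_1", (100, 100, 150, 200))
t.update((110, 105, 160, 205))   # matched detection in the next frame
print(t.predict())               # -> (120.0, 110.0, 170.0, 210.0)
```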
In an embodiment, the step S103 of performing multi-target tracking processing on all the detection results to obtain each target identifier includes:
and carrying out movement tracking analysis on each target coordinate area in the detection result of the adjacent video frame images to obtain a plurality of target tracks.
Understandably, the detection result includes the target coordinate regions, each being the region range of an identified target in the video frame image. The movement tracking analysis predicts target trajectories, removes noise and performs matching comparison over the detection results of two video frame images adjacent in time. It proceeds as described above: a prediction box for the next frame is predicted by Kalman filtering (the filter removes Gaussian noise and linearly predicts the de-noised region, for example by translation prediction with a first-order linear function) and is judged as confirmed or unconfirmed against the target's appearance information; the detector then detects the next video frame image in time order and its detection result determines the target coordinate regions; the IOU matching algorithm matches every confirmed prediction box against every target coordinate region, measuring the match by the computed IOU value, associating a pair whose IOU value reaches the preset threshold, retrying otherwise, and abandoning the pair as not being the target's motion trajectory after three failures; finally, the error between the prediction box and the target coordinate region is computed from the IOU value, the Kalman filter parameters are updated from it, and the loop continues with the next video frame image as the current one. After the multi-target tracking processing has been performed on all the detection results by the target tracking model, the target coordinate regions of all targets across all video frame images are associated, and the tracking chain through the mutually associated target coordinate regions traces the trajectory of each target, giving the target trajectory of every target.
The target trajectory represents the record of one target's path of motion through the video to be recognized.
Coding identification is then performed on each target trajectory to obtain each target identifier.
Understandably, coding identification assigns a unique code to the target of each trajectory, so a target identifier in one-to-one correspondence with each target trajectory is obtained.
The invention thus performs movement tracking analysis on each target coordinate region in the detection results of adjacent video frame images to obtain a plurality of target trajectories, and performs coding identification on each target trajectory to obtain each target identifier. The detection results of the video frame images can therefore be analyzed for movement tracking and the motion trajectory of the same target derived automatically, so that targets are identified accurately and quickly and target identifiers are generated; the motion trajectory of the same target in the video to be recognized is found accurately, providing an accurate data basis for the subsequent target attribute identification.
S104, performing association and target extraction by the same target identifier on each video frame image to obtain each image set associated with each target identifier.
Understandably, the video frame images related to the same target identifier are determined from the target trajectory corresponding to each target identifier, and those video frame images are associated with that identifier. Target extraction is then performed on the associated video frame images: according to the target region in the detection result, the image containing the target is extracted from each video frame image related to the target identifier and recorded as a target image, and all the target images extracted from all the video frame images related to the identifier are recorded as the image set associated with it, so each image set associated with each target identifier is obtained.
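A sketch of this target extraction step, assuming frames are numpy arrays (as produced by OpenCV) and that per-frame boxes keyed by target identifier are already available; all names are illustrative:

```python
from collections import defaultdict

def build_image_sets(frames, tracks_per_frame):
    """Group per-target crops into image sets keyed by target identifier.
    tracks_per_frame: list aligned with frames, each a dict
    target_id -> (x1, y1, x2, y2) target coordinate region in that frame."""
    image_sets = defaultdict(list)
    for frame, tracks in zip(frames, tracks_per_frame):
        for tid, (x1, y1, x2, y2) in tracks.items():
            crop = frame[int(y1):int(y2), int(x1):int(x2)]  # target image
            if crop.size:
                image_sets[tid].append(crop)
    return image_sets
```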
The invention thus performs framing processing on the video to be recognized to obtain a plurality of video frame images; performs multi-target detection processing on each video frame image to obtain its detection result; performs multi-target tracking processing on all the detection results to obtain the target identifier of each target; and performs same-identifier association and target extraction on each video frame image to obtain each image set associated with each target identifier. Framing, multi-target detection and multi-target tracking thereby automatically associate each target's identifier with its image set, so that the image sets of multiple targets in the video to be recognized are identified accurately and the video frame images related to each target are distinguished accurately, scientifically and objectively, improving the accuracy and reliability of the subsequent target attribute identification.
S20, performing target attribute identification on each target image of each image set to obtain a target recognition result set in one-to-one correspondence with each image set, the target recognition result set comprising a plurality of image attribute results, each image attribute result comprising branch attribute values of a plurality of target attributes and branch confidences in one-to-one correspondence with the branch attribute values.
Understandably, before target attribute identification is performed on the target images of each image set, image preprocessing may be applied to each image set. The preprocessing may be image enhancement, whose method can be set as required, for example noise removal, contrast enhancement or color enhancement; before the enhancement, image completion may also be applied to the image to be detected, supplementing truncated or occluded portions so that the image contains a complete target such as a human body or a vehicle. The target attribute identification of a target image may be implemented by a trained attribute recognition model that identifies target attributes based on deep learning. The attribute recognition model may be a deep neural network based on multi-task learning, that is, the categories of the target-related attributes are learned through several branch tasks; for example, at least one branch task for human body attributes is realized by independent task branches that each learn one category of human body attribute. The branches may share one backbone network that learns and extracts the shared features of the input images; the shared feature vectors are provided jointly to the branch tasks corresponding to the target-related attribute categories to be identified, which reduces the processing load, since the multiple tasks do not each extract features in their own directions. The extracted shared feature map is pooled to give a pooled shared feature map, and the branch network of each target-related attribute is a fully connected layer that receives the pooled shared feature map and identifies one attribute category of the target corresponding to the target identifier of the input target image. The results output by the branch networks of all the target-related attributes are aggregated to obtain the target recognition result set, which represents the array of the categories of every attribute of interest for every target image in the image set associated with the target identifier.
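A minimal sketch of such a shared-backbone, multi-branch model in PyTorch. ResNet18 matches the preferred backbone named in step S201 below; the two example branches and all identifiers are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AttributeRecognizer(nn.Module):
    """One shared backbone feeding a fully connected head per attribute."""
    def __init__(self, branches):
        super().__init__()
        net = resnet18(weights=None)
        # Everything up to the final pooling/classifier is shared.
        self.backbone = nn.Sequential(*list(net.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.heads = nn.ModuleDict(
            {name: nn.Linear(512, n) for name, n in branches.items()})

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)  # (B, 512)
        return {name: head(feat) for name, head in self.heads.items()}

model = AttributeRecognizer({"gender": 2, "glasses": 2})
logits = model(torch.randn(1, 3, 256, 128))  # 256x128 input as in step S201
probs = {k: v.softmax(dim=1) for k, v in logits.items()}  # branch confidences
```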
The target attribute identification here is the identification of one or more human body attributes.
In an embodiment, as shown in FIG. 4, the step S20 of performing target attribute identification on the target images of each image set to obtain a target recognition result set in one-to-one correspondence with each image set includes:
s201, carrying out shared feature extraction on each target image in the image set through a backbone network in a pre-trained attribute recognition model to obtain a shared feature map of each image.
Understandably, the attribute recognition model comprises a backbone network, a pooling layer and a fully connected layer. The backbone network is a trained deep-learning neural network whose structure may be chosen from, for example, the MobileNet or ResNet series; preferably, it is ResNet18. The shared features are the latent features, shared between target attributes, obtained by learning and convolution over the input image; shared feature extraction is performed on each channel of the target image through the backbone network to obtain the shared feature map. For example, for input target images of size 256 × 128 containing 3 channels, backbone extraction yields a shared feature map of shape (512, 8, 4), that is, feature vectors of size 8 × 4 over 512 channels. The attribute recognition model is trained as follows: image samples labeled with target-related attributes are collected; initial parameters of the model are preset; shared features of each image sample are extracted with the backbone network to obtain a shared feature map; the pooling layer pools the shared feature map to obtain a pooling result; each branch network performs attribute identification on the pooling result, identifying image attribute results in one-to-one correspondence with the labeled target-related attributes; attribute loss values between the labeled target-related attributes and the corresponding attribute categories in the image attribute results are computed with a cross-entropy loss function; while the attribute loss has not reached the convergence condition, the initial parameters of the model are iteratively updated and the shared feature extraction step is re-executed; when the attribute loss reaches the convergence condition, training stops and the converged model is recorded as the pre-trained attribute recognition model.
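Under the training procedure just described, the per-branch cross-entropy terms might be combined as below; this is a sketch under the assumption that the heads output raw logits during training:

```python
import torch
import torch.nn as nn

def attribute_loss(branch_logits, labels):
    """Sum of per-branch cross-entropy losses, one term per attribute.
    branch_logits: dict name -> (B, n_classes) raw scores
    labels:        dict name -> (B,) ground-truth class indices"""
    ce = nn.CrossEntropyLoss()
    return sum(ce(branch_logits[name], labels[name]) for name in branch_logits)

logits = {"gender": torch.randn(4, 2), "glasses": torch.randn(4, 2)}
labels = {"gender": torch.randint(0, 2, (4,)),
          "glasses": torch.randint(0, 2, (4,))}
print(attribute_loss(logits, labels))  # scalar loss to backpropagate
```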
The pooling layer is the stage that performs the pooling operation on the extracted shared feature map, and the fully connected layer is the stage that receives the pooled shared feature map output by the pooling layer and identifies the target's attribute categories from it.
S202, performing global average pooling on the shared feature map through the pooling layer of the attribute recognition model to obtain a pooled shared feature map.
Understandably, global average pooling adds up all the pixel values of each channel of the shared feature map and takes their mean, so that one number represents each channel's feature map. Structurally, this regularizes the whole network against overfitting, removes the black-box character of a fully connected stage, and gives each channel a direct target-attribute-category meaning, yielding the pooled shared feature map. For example, after the shared feature map of shape (512, 8, 4) passes the pooling layer's global average pooling, the pooled shared feature map has shape (512, 1, 1).
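The shape change in the example can be checked directly; a global average is just the per-channel mean over the spatial dimensions:

```python
import torch

feat = torch.randn(1, 512, 8, 4)               # shared feature map (512, 8, 4)
pooled = feat.mean(dim=(2, 3), keepdim=True)   # global average pooling
print(pooled.shape)                            # torch.Size([1, 512, 1, 1])
```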
S203, performing attribute identification on the pooled shared feature map through the fully connected layer of the attribute recognition model to obtain the image attribute result corresponding to the target image.
Understandably, in the attribute identification, each branch network of the attribute recognition model extracts the features of the categories of one target attribute. Each branch is trained on varied samples containing different target attributes, for example with a VGG16 network structure, learning the common features of the categories of different target attributes and classifying in its fully connected layer according to those learned common features, so that the target attribute category of each sample is identified; after training, the branch tasks for identifying the various target attributes are obtained. The shared feature map is input into the branch networks for the different attributes, each branch extracts its corresponding features one by one and classifies them, and the result of the corresponding category of each target attribute is identified from the classification result, for example human body attributes identified through the branch networks for human body attributes, such as age, gender, body orientation, hair color, hair length, headwear type, wearing glasses, shoulder bag or backpack, pushing a cart or carrying an infant, carrying a handbag, wearing an overcoat, riding, pulling a trolley case, answering a phone, upper-garment color, upper-garment texture, upper-garment type, lower-garment color, lower-garment texture, lower-garment type and other categories reflecting the structured attributes of a human body. The image attribute result reflects the target-related attributes in the target image; it contains at least one category of attribute of interest, it is an array of several attribute categories, and one target image corresponds to one image attribute result.
In an embodiment, the step S203 of performing attribute identification on the pooled shared feature map through the fully connected layer of the attribute recognition model to obtain the image attribute result corresponding to the target image includes:
and performing attribute identification of each branch task on the pooled shared characteristic graph through each branch network in the full connection layer to obtain attribute branch results corresponding to each branch network one by one, and merging all the attribute branch results into the image attribute result corresponding to the target image.
Understandably, the attribute identification of each branch task is the identification, by a branch network, of the categories of one target attribute; preferably the branch tasks are categories of human body attributes. Different human-body-attribute branch networks learn different categories of human body attributes and learn their own features for those categories, and each performs the identification of its corresponding attribute, so the attribute branch results in one-to-one correspondence with the human-body-attribute branch networks are identified. Because identification works on one shared feature map, the categories of the target attributes are correlated to a degree, and the learned correlations (linear correlations between parameters) among the attribute branch results can be exploited when outputting the image attribute result. Each attribute branch result embodies the category identified by its branch network together with the confidence of that category; finally all the attribute branch results are merged into one result, recorded as the image attribute result, which comprises branch attribute values and branch confidences.
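One plausible way to merge the branch outputs into an image attribute result, keeping every category's confidence as in the worked example of step S204 below; all names are illustrative:

```python
def to_image_attribute_result(branch_probs, class_names):
    """Pair every class label with its probability for each branch, then
    merge the branches into one image attribute result.
    branch_probs: dict name -> list of class probabilities
    class_names:  dict name -> list of class labels, same order"""
    return {name: dict(zip(class_names[name], probs))
            for name, probs in branch_probs.items()}

print(to_image_attribute_result(
    {"gender": [0.8, 0.2], "glasses": [0.7, 0.3]},
    {"gender": ["male", "female"], "glasses": ["worn", "not worn"]}))
# {'gender': {'male': 0.8, 'female': 0.2},
#  'glasses': {'worn': 0.7, 'not worn': 0.3}}
```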
S204, aggregating the image attribute results corresponding to all the target images in the image set to obtain the target recognition result set corresponding to the image set.
Understandably, the target recognition result set is the set of the image attribute results sharing the same target identifier. For example, if the image attribute results of obj_1 are (male, 0.8, female, 0.2; glasses worn, 0.7, glasses not worn, 0.3), (male, 0.7, female, 0.3; glasses worn, 0.7, glasses not worn, 0.3), (male, 0.9, female, 0.1; glasses worn, 0.8, glasses not worn, 0.2), (male, 0.4, female, 0.6; glasses worn, 0.8, glasses not worn, 0.2) and (male, 0.3, female, 0.7; glasses worn, 0.9, glasses not worn, 0.1), then the target recognition result set of obj_1 is the list of those five results. In this way, the target recognition result sets in one-to-one correspondence with all the image sets are aggregated.
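The obj_1 example can be written as the following data structure, which the voting sketches in step S30 below reuse:

```python
# Target recognition result set for obj_1: one image attribute result
# (branch attribute value -> branch confidence) per target image.
result_set_obj_1 = [
    {"gender": {"male": 0.8, "female": 0.2}, "glasses": {"worn": 0.7, "not worn": 0.3}},
    {"gender": {"male": 0.7, "female": 0.3}, "glasses": {"worn": 0.7, "not worn": 0.3}},
    {"gender": {"male": 0.9, "female": 0.1}, "glasses": {"worn": 0.8, "not worn": 0.2}},
    {"gender": {"male": 0.4, "female": 0.6}, "glasses": {"worn": 0.8, "not worn": 0.2}},
    {"gender": {"male": 0.3, "female": 0.7}, "glasses": {"worn": 0.9, "not worn": 0.1}},
]
```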
The method thus extracts shared features of the target images in the image set through the backbone network of the attribute recognition model to obtain each target image's shared feature map; performs global average pooling on the shared feature map through the pooling layer to obtain the pooled shared feature map; performs attribute identification on the pooled shared feature map through the fully connected layer to obtain the image attribute result corresponding to the target image; and aggregates the image attribute results of all the target images in the image set into the target recognition result set corresponding to the image set. Shared feature extraction through one common backbone network is thereby achieved, and the target recognition result set of the attribute categories of interest is identified automatically from the shared features; through the correlations among the target attributes within the shared features, the category of each target attribute can be identified accurately, providing a data basis for the subsequent attribute voting and accurate attribute categories for the subsequent screening of training samples, which ensures the accuracy and correctness of the screening.
S30, performing confidence voting on all the branch attribute values and branch confidences in each target recognition result set to obtain each final target attribute value of each target and the target confidence result in one-to-one correspondence with each final target attribute value.
Understandably, confidence voting is performed on each target recognition result set with a comprehensive voting mechanism, that is, a mechanism that outputs confidences by combining different voting results; the confidence voting computes, with this mechanism, the result of the attributes of interest in each target recognition result set, and results for preset attributes of interest can be filtered out of the result sets. The process may be as follows: a majority voting mechanism is applied to each target recognition result set to obtain the first attribute value of each target attribute corresponding to each target identifier and the first confidence corresponding to that first attribute value; a mean voting mechanism is applied to each target recognition result set to obtain the second attribute value of each target attribute corresponding to each target identifier and the second confidence corresponding to that second attribute value; when the first attribute value and the second attribute value of the same target attribute of the same target are detected to be identical, the first attribute value or the second attribute value is taken as the final target attribute value of the corresponding target attribute for the target recognition result set; and the first confidence is multiplied by the second confidence to obtain the target confidence result corresponding to each final target attribute value. The target confidence result represents the confidence that a target attribute identified from a target image in the target recognition result set is close to the true attribute; high-quality images can subsequently be screened out by the target confidence result, which aids later model training and raises training efficiency.
The method and the device thus acquire the video to be recognized and perform target tracking processing on it to obtain the target identifier of each target and the image set associated with each target identifier; perform target attribute identification on each target image of each image set to obtain a target recognition result set in one-to-one correspondence with each image set, comprising a plurality of image attribute results, each with branch attribute values of a plurality of target attributes and branch confidences in one-to-one correspondence with those values; and perform confidence voting on all the branch attribute values and branch confidences in each target recognition result set to obtain each target's final target attribute values and the target confidence results in one-to-one correspondence with them. The identifier and image set of every target in the video to be recognized are recognized automatically, and each target's final target attribute values and corresponding target confidence results in the image set are recognized automatically and accurately through target attribute identification and confidence voting. Large numbers of target images therefore need not be extracted from the video manually, human missed or false recognition is avoided, each target's final target attributes and target confidence results are recognized accurately by objective, scientific target attribute identification and confidence voting over each image set, labor costs are greatly reduced, and the accuracy and correctness of target attribute identification are improved.
In an embodiment, in the step S30, that is, performing confidence voting on all the branch attribute values and the branch confidence levels in each target recognition result set to obtain each final target attribute value of each target and a target confidence level result corresponding to each final target attribute value in a one-to-one manner, the method includes:
Performing a majority voting mechanism on all the branch attribute values and the branch confidences of each target attribute in each target recognition result set to obtain a first attribute value of each target attribute of each target and a first confidence corresponding to the first attribute value.
Understandably, the majority voting mechanism takes the branch attribute value that occurs most often under the same target attribute in the target recognition result set as the first attribute value of that result set; that is, it follows the principle that the minority obeys the majority. All branch attribute values under each target attribute are counted, and the most frequent branch attribute value determines the first attribute value of that target attribute of the target. The first attribute value is the attribute category obtained through the majority voting mechanism, and the first confidences correspond one to one with the first attribute values. The first confidence may be computed as the average of all branch confidences, under the same target attribute of one target, whose branch attribute value equals the first attribute value; alternatively a weighted average may be used: for each image attribute result in the target recognition result set, the weight is increased by one if its branch attribute value equals the first attribute value and decreased by one otherwise, and the weighted mean is then taken.
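As a minimal illustrative sketch of the majority voting mechanism just described (not the patent's own implementation), suppose each target recognition result set is a list of image attribute results of the form {target_attribute: (branch_attribute_value, branch_confidence)}; all names below are assumptions for illustration:

    from collections import Counter

    def majority_vote(result_set, attribute):
        # First attribute value: the most frequent branch attribute value
        # under this target attribute (minority obeys majority).
        values = [img[attribute][0] for img in result_set]
        first_value = Counter(values).most_common(1)[0][0]
        # First confidence (simple averaging variant): mean of the branch
        # confidences whose branch attribute value equals the first attribute value.
        matching = [img[attribute][1] for img in result_set
                    if img[attribute][0] == first_value]
        return first_value, sum(matching) / len(matching)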
And performing a mean value voting mechanism on all the branch attribute values and the branch confidence degrees of each target attribute in each target recognition result set to obtain a second attribute value of each target attribute of each target and a second confidence degree corresponding to the second attribute value.
Understandably, the mean voting mechanism takes the mean of the branch confidences corresponding to the same branch attribute value under the same target attribute in the target recognition result set, takes the branch attribute value with the maximum mean under that target attribute as the second attribute value of that target attribute of the target, and takes that maximum mean as the second confidence corresponding to the second attribute value.
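A corresponding sketch of the mean voting mechanism, under the same assumed data layout as above: branch confidences are grouped per branch attribute value, the per-value means are compared, and the value with the largest mean wins:

    from collections import defaultdict

    def mean_vote(result_set, attribute):
        # Group branch confidences by branch attribute value.
        grouped = defaultdict(list)
        for img in result_set:
            value, conf = img[attribute]
            grouped[value].append(conf)
        # Second attribute value: the value with the largest mean confidence;
        # that maximum mean is the second confidence.
        means = {value: sum(confs) / len(confs) for value, confs in grouped.items()}
        second_value = max(means, key=means.get)
        return second_value, means[second_value]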
And when the first attribute value and the second attribute value of the same target attribute of the same target are the same, taking the first attribute value or the second attribute value as the final target attribute value of the target attribute of the target.
Understandably, it is judged whether the first attribute value and the second attribute value of the same target attribute of the same target are the same; if they are detected to be the same for a target in the image set, the recognition accuracy of that target attribute is high, and the first attribute value or the second attribute value is taken as the final target attribute value of the target attribute of the target.
And multiplying the first confidence coefficient and the second confidence coefficient corresponding to each final target attribute value to obtain the target confidence coefficient result corresponding to each final target attribute value.
Understandably, after the final target attribute value of a target attribute of a target is confirmed, the first confidence and the second confidence corresponding to that target attribute are multiplied to calculate the target confidence result corresponding to the final target attribute value. The target confidence result represents the confidence, or degree, to which a target attribute identified from a target image in the target recognition result set is close to the real attribute.
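Building on the two sketches above, the fusion of the two votes can be expressed as follows; returning None for the disagreement case anticipates the embodiment below in which the target attribute is removed:

    def fuse_votes(result_set, attribute):
        first_value, first_conf = majority_vote(result_set, attribute)
        second_value, second_conf = mean_vote(result_set, attribute)
        if first_value != second_value:
            return None  # inconsistent votes: the target attribute is discarded
        # Final target attribute value plus target confidence result
        # (product of the first and second confidences).
        return first_value, first_conf * second_conf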
The invention thus performs a majority voting mechanism on all branch attribute values and branch confidences of each target attribute in each target recognition result set to obtain the first attribute value of each target attribute of each target and its first confidence; performs a mean voting mechanism on the same inputs to obtain the second attribute value and its second confidence; takes the first (or second) attribute value as the final target attribute value when the two agree for the same target attribute of the same target; and multiplies the corresponding first and second confidences to obtain the target confidence result for each final target attribute value. By integrating a majority voting mechanism with a mean voting mechanism, the final target attribute values corresponding to the target identifier are voted and output automatically, and the target confidence results of each target attribute of the target identifier are voted and output accurately and scientifically, providing a basis of accurate, high-confidence sample data for subsequent model training and improving the accuracy and correctness of the confidence of target attribute recognition.
In an embodiment, after obtaining the second attribute value of each target attribute of each target and the second confidence corresponding to the second attribute value, the method further includes:
When the first attribute value and the second attribute value of the same target attribute of the same target are detected to be different, removing the target attribute under the target identifier.
Understandably, it is judged whether the first attribute value and the second attribute value of the same target attribute of the same target are the same. When the first attribute value and the second attribute value of a target attribute of a target are detected to be different, the recognition accuracy of that target attribute is not high: the votes are inconsistent and the target attribute may have been misidentified. All branch attributes and branch confidences belonging to that target attribute in the target recognition result set therefore need to be discarded, which improves the accuracy of target attribute recognition.
In an embodiment, after obtaining the target confidence result corresponding to each final target attribute value, the method further includes:
Performing variance voting mechanism processing on all the branch confidences corresponding to each final target attribute value to obtain a third confidence corresponding to each final target attribute value.
Understandably, the variance voting mechanism processing includes variance computation and confidence adjustment. For the variance computation, all branch confidences corresponding to each final target attribute value are averaged to obtain an average confidence in one-to-one correspondence with each final target attribute value, and the variance of each final target attribute value is then calculated from all of its branch confidences and the average confidence, using the variance formula:

σ² = Σ(X − μ)² / N

where σ² is the variance corresponding to a final target attribute value; X is a branch confidence corresponding to the final target attribute value; μ is the average confidence corresponding to the final target attribute value; and N is the number of branch confidences corresponding to the final target attribute value. The variance measures the degree to which the branch confidences deviate from their mean, yielding a variance in one-to-one correspondence with each final target attribute value. The confidence adjustment then adds to or subtracts from the corresponding average confidence, based on the variance of the final target attribute, to obtain the third confidence corresponding to the final target attribute value. The adjustment may be set according to requirements: for example, the ratio of the variance to the average confidence is computed, the range into which the ratio falls is mapped to a corresponding supplement confidence to be added or subtracted, and the average confidence is increased or decreased by that supplement to obtain the third confidence.
Multiplying the target confidence result corresponding to each of the final target attribute values by the third confidence to correct each of the target confidence results.
Understandably, the target confidence result corresponding to a final target attribute value is multiplied by the corresponding third confidence, and the product is taken as the revised result, that is, the new target confidence result corresponding to that final target attribute value.
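A sketch of the variance voting mechanism processing and the subsequent correction; the mapping from the variance-to-mean ratio to a supplement confidence is left to requirements in the description above, so the cut-off and supplement values below are assumptions:

    def variance_vote(branch_confs):
        # Variance obtaining: sigma^2 = sum((X - mu)^2) / N over all branch
        # confidences of one final target attribute value.
        n = len(branch_confs)
        mu = sum(branch_confs) / n
        variance = sum((x - mu) ** 2 for x in branch_confs) / n
        # Confidence adjustment: map the variance-to-mean ratio to a supplement
        # confidence and add it to or subtract it from the average confidence.
        ratio = variance / mu if mu > 0 else 0.0
        supplement = 0.05 if ratio < 0.01 else -0.05  # assumed mapping
        return min(max(mu + supplement, 0.0), 1.0)    # third confidence

    def correct_target_confidence(target_conf, branch_confs):
        # Revised target confidence result: product with the third confidence.
        return target_conf * variance_vote(branch_confs)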
The invention thus applies the majority voting mechanism and the mean voting mechanism as above to obtain the final target attribute values and their target confidence results; applies variance voting mechanism processing to all branch confidences corresponding to each final target attribute value to obtain a third confidence for each final target attribute value; and multiplies each target confidence result by its third confidence to correct it. Through the comprehensive processing of the majority voting mechanism, the mean voting mechanism, and the variance voting mechanism, the target confidence result of each target attribute is corrected automatically and the target confidences under the target identifier are voted and output more accurately and scientifically, providing an accurate data basis for subsequent target attribute recognition and improving the accuracy and correctness of the confidence of target attribute recognition.
In an embodiment, after obtaining each final target attribute value of each target and a target confidence result corresponding to each final target attribute value one to one, the method includes:
Performing attribute correction on the corresponding branch attributes in each target recognition result set according to each final target attribute value to obtain a new target recognition result set for each target.
Understandably, a target recognition result set may contain different branch attribute values for the same target attribute of one target; since the video to be recognized comes from a single camera shot, the attribute values should hardly change, so such differing branch attribute values are recognition errors that need to be corrected. Attribute correction is therefore performed on the corresponding branch attributes in the target recognition result set according to the final target attribute value: any branch attribute value that differs from the final target attribute value is corrected to the final target attribute value, yielding the new target recognition result set of the target. Erroneous branch attribute values in the target recognition result set are thus corrected automatically, producing an accurate new target recognition result set for each target and providing accurate attribute values for subsequent model training.
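A sketch of this attribute correction step, under the same assumed data layout as the earlier sketches: every branch attribute value that differs from the final target attribute value is rewritten to the final value, yielding the new target recognition result set:

    def correct_attributes(result_set, attribute, final_value):
        corrected = []
        for img in result_set:
            value, conf = img[attribute]
            entry = dict(img)
            if value != final_value:
                # Within one camera shot the attribute should not change,
                # so a differing branch value is treated as a recognition error.
                entry[attribute] = (final_value, conf)
            corrected.append(entry)
        return corrected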
In one embodiment, the target attribute identification method further includes:
Performing threshold screening on all the target confidence results of each target attribute, labeling the corresponding image set according to the threshold-screened target confidence results, and recording the labeled image set as a training sample.
Understandably, threshold screening is performed on all target confidence results of the same target attribute across different targets. The threshold screening may use a preset screening threshold, which can be set according to requirements; for example, with a preset screening threshold of 85%, the target confidence results greater than or equal to the threshold are kept as the threshold-screened results. Alternatively, the threshold screening may use sample threshold recognition performed by a threshold recognition model: the model identifies the corresponding sample threshold, and screening is performed against it. Sample threshold recognition is the process of identifying, via the threshold recognition model, the sample threshold among all target confidence results: the model extracts a threshold feature and identifies the corresponding sample threshold based on it. The threshold feature is an adjacent jump feature among all final confidences; typically the higher confidences are in a stable state and a momentary large jump occurs as the confidences decrease, so the final confidence in the neighborhood just before the jump can be determined as the sample threshold, and the target confidence results greater than or equal to the sample threshold are kept as the threshold-screened results.
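The preset-threshold path is the simpler of the two and reduces to a filter; a minimal sketch with the 85% example value above:

    def screen_by_preset_threshold(target_conf_results, threshold=0.85):
        # Keep only target confidence results at or above the preset threshold.
        return [r for r in target_conf_results if r >= threshold]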
The labeling process associates the branch attribute values in the corrected target recognition result set, corresponding to the threshold-screened target confidence results, with the target images in the corresponding image set, thereby producing the training samples for subsequent training: image samples that meet the threshold requirement and are fully labeled.
Therefore, the method and the device screen out training samples meeting the requirements accurately and automatically through sample threshold recognition and threshold screening, so that there is no need to manually extract large numbers of images from the video or manually search for confidence thresholds to screen training samples. Labor cost is greatly reduced, invalid samples in the target recognition result set are removed, the accuracy and correctness of target attribute recognition are improved, and efficiency and model performance are improved for subsequent model training.
In an embodiment, the threshold screening is performed on all the target confidence level results of each target attribute, and according to the target confidence level results after the threshold screening, the corresponding image set is labeled, and the labeled image set is recorded as a training sample, including:
Sorting all the target confidence results of the same target attribute to obtain a sorting result.
Understandably, the manner of sorting all the target confidence results of the same target attribute may be set according to requirements, for example in descending or ascending order, and all the sorted target confidence results are recorded as the sorting result.
And extracting threshold characteristics of the sorting result through a pre-trained threshold identification model, and identifying a sample threshold of the sorting result according to the extracted threshold characteristics.
Understandably, the threshold recognition model is a trained model for recognizing the sample threshold in an input sorting result. It can be trained by inputting a series of sorting samples labeled with thresholds and performing convolutional learning on the threshold features in those samples, continuously learning the threshold features until training is complete. The threshold feature is an adjacent jump feature among all final confidences: typically the higher confidences are stable and a momentary large jump occurs as the confidences decrease, so the adjacent final confidence before the jump can be determined as the sample threshold.
The training process of the threshold recognition model is as follows: initial model parameters are set for the model, whose network structure may be that of a convolutional neural network (CNN), using historically collected sample arrays containing threshold labels; the model extracts threshold features from a sample array, that is, features with large instantaneous jump spans, and convolves the extracted features to output a boundary value, which reflects where the jump starts in the sample array; a threshold loss value between the boundary value and the threshold label is calculated with a cross-entropy loss function; when the threshold loss value is detected not to have reached the preset threshold convergence condition, the initial model parameters of the threshold recognition model are updated iteratively, threshold features are extracted from the sample array again, and the threshold loss value is recalculated; training stops when the threshold loss value is detected to reach the preset threshold convergence condition, and the converged model is recorded as the trained threshold recognition model.
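The patent identifies the sample threshold with a trained CNN; the sketch below is only a stand-in heuristic that illustrates the jump feature that model is said to learn: confidences sorted in descending order stay stable and then drop sharply, and the confidence adjacent to the largest jump is taken as the sample threshold:

    def detect_sample_threshold(sorted_confs):
        # sorted_confs: target confidence results in descending order.
        if len(sorted_confs) < 2:
            return sorted_confs[0] if sorted_confs else 0.0
        jumps = [sorted_confs[i] - sorted_confs[i + 1]
                 for i in range(len(sorted_confs) - 1)]
        i = max(range(len(jumps)), key=jumps.__getitem__)  # largest adjacent jump
        return sorted_confs[i]  # neighbouring confidence just before the jump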
And screening the sorting result by using a threshold value, screening out the target confidence coefficient result corresponding to the sample threshold value or more, and recording the result as a final screening result.
Understandably, the threshold screening here is the process of screening out target confidence results greater than or equal to the sample threshold; the sorting results remaining after threshold screening are recorded as the final screening results.
And labeling the corresponding image set according to the new target identification result set corresponding to all the final screening results to obtain the training sample.
Understandably, labeling the corresponding image set is the process of associating the branch attribute values and/or branch confidences in the new target recognition result set, corresponding to the threshold-screened target confidence results, one by one with the corresponding target images in the image set, thereby producing the training samples for subsequent training, which are high-quality images for training.
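A sketch of this labeling step, assuming one image attribute result per target image so that images and results can be paired positionally; the field names are illustrative:

    def label_image_set(image_set, new_result_set):
        # Associate each target image with its attribute values and confidences
        # from the new target recognition result set to form training samples.
        return [{"image": image, "labels": labels}
                for image, labels in zip(image_set, new_result_set)]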
The invention thus sorts all target confidence results of the same target attribute to obtain a sorting result; extracts threshold features from the sorting result through a pre-trained threshold recognition model and recognizes the sample threshold of the sorting result from the extracted features; screens the sorting result against the threshold, keeping the target confidence results at or above the sample threshold as the final screening results; and labels the corresponding image sets according to the new target recognition result sets corresponding to all final screening results to obtain the training samples. The sample threshold among all target confidence results of the same target attribute is thus recognized automatically, and the training samples are screened out automatically through threshold screening and labeling, so there is no need to manually search for the sample threshold corresponding to a specified accuracy. The workload of manual recognition is greatly reduced, the accuracy and correctness of target attribute recognition are improved, and efficiency and model performance are improved for subsequent model training.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a target attribute identification device is provided, and the target attribute identification device corresponds to the target attribute identification method in the above embodiment one to one. As shown in fig. 5, the object attribute identifying apparatus includes an acquiring module 11, an identifying module 12, and a voting module 13. The functional modules are explained in detail as follows:
the acquisition module 11 is configured to acquire a video to be identified, and perform target tracking processing on the video to be identified to obtain a target identifier of each target and an image set associated with the target identifier;
the identification module 12 is configured to perform target attribute identification on each target image of each image set to obtain a target identification result set corresponding to each image set one to one, where the target identification result set includes a plurality of image attribute results, and the image attribute results include branch attribute values of a plurality of target attributes and branch confidence coefficients corresponding to the branch attribute values one to one;
a voting module 13, configured to perform confidence voting on all the branch attribute values and the branch confidence levels in each target identification result set to obtain each final target attribute value of each target and a target confidence level result corresponding to each final target attribute value one to one.
For the specific definition of the target attribute identification device, reference may be made to the definition of the target attribute identification method above, which is not repeated here. The modules in the target attribute identification device may be implemented in whole or in part by software, by hardware, or by a combination of the two. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a client or a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a readable storage medium and an internal memory. The readable storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the readable storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a target property identification method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the target attribute identification method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the target property identification method in the above-described embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (11)

1. A target attribute identification method is characterized by comprising the following steps:
acquiring videos to be recognized, and performing target tracking processing on the videos to be recognized to obtain target identifications of all targets and image sets associated with the target identifications;
performing target attribute identification on each target image of each image set to obtain a target identification result set corresponding to each image set one by one, wherein the target identification result set comprises a plurality of image attribute results, and the image attribute results comprise branch attribute values of a plurality of target attributes and branch confidence degrees corresponding to the branch attribute values one by one;
and performing confidence voting on all the branch attribute values and the branch confidence degrees in each target recognition result set to obtain each final target attribute value of each target and a target confidence degree result corresponding to each final target attribute value one by one.
2. The method for identifying object attributes according to claim 1, wherein the performing the object tracking process on the video to be identified to obtain the object identifier of each object and the image set associated with the object identifier includes:
performing frame division processing on the video to be identified to obtain a plurality of video frame images;
performing multi-target detection processing on each video frame image to obtain a detection result of each video frame image;
performing multi-target tracking processing on all the detection results to obtain the target identification of each target;
and carrying out association and object extraction of the same object identifier on each video frame image to obtain each image set associated with each object identifier.
3. The method of object attribute recognition according to claim 1, wherein said performing object attribute recognition on each object image of each image set to obtain an object recognition result set corresponding to each image set one by one comprises:
carrying out shared feature extraction on each target image in the image set through a backbone network in a pre-trained attribute recognition model to obtain a shared feature map of each target image;
performing global average pooling on the shared feature map through a pooling layer in the attribute identification model to obtain a pooled shared feature map;
performing attribute identification on the pooled shared characteristic graph through a full connection layer in the attribute identification model to obtain an image attribute result corresponding to the target image;
summarizing the image attribute results corresponding to all the target images in the image set to obtain the target identification result set corresponding to the image set.
4. The method for identifying object attributes according to claim 1, wherein the performing confidence voting on all the branch attribute values and the branch confidence degrees in each of the object identification result sets to obtain respective final object attribute values of each object and object confidence degree results corresponding to the respective final object attribute values one to one includes:
performing a majority voting mechanism on all the branch attribute values and the branch confidence degrees of each target attribute in each target recognition result set to obtain a first attribute value of each target attribute of each target and a first confidence degree corresponding to the first attribute value;
carrying out a mean value voting mechanism on all the branch attribute values and the branch confidence degrees of each target attribute in each target recognition result set to obtain a second attribute value of each target attribute of each target and a second confidence degree corresponding to the second attribute value;
when the first attribute value and the second attribute value of the same target attribute of the same target are the same, taking the first attribute value or the second attribute value as a final target attribute value of the target attribute of the target;
and multiplying the first confidence coefficient and the second confidence coefficient corresponding to each final target attribute value to obtain the target confidence coefficient result corresponding to each final target attribute value.
5. The method of object attribute identification of claim 4 further comprising, after obtaining the object confidence results corresponding to each final object attribute value:
performing variance voting mechanism processing on all the branch confidence degrees corresponding to each final target attribute value to obtain a third confidence degree corresponding to each final target attribute value;
multiplying the target confidence result corresponding to each of the final target attribute values by the third confidence to correct each of the target confidence results.
6. The object attribute identification method according to any one of claims 1 to 5, comprising, after obtaining respective final object attribute values of each object and object confidence results corresponding to the respective final object attribute values one to one:
and performing attribute correction on corresponding branch attributes in each target recognition result set according to each final target attribute value to obtain a new target recognition result set of each target.
7. The object property identification method of claim 6, wherein the method further comprises:
and performing threshold value screening on all the target confidence coefficient results of all the target attributes, labeling the corresponding image set according to the target confidence coefficient results after threshold value screening, and recording the labeled image set as a training sample.
8. The method of claim 7, wherein the threshold-screening all the target confidence results of each target attribute, labeling the corresponding image set according to the threshold-screened target confidence results, and recording the labeled image set as a training sample, comprises:
sequencing all the target confidence results of the same target attribute to obtain a sequencing result;
extracting threshold features of the sorting results through a pre-trained threshold recognition model, and recognizing sample thresholds of the sorting results according to the extracted threshold features;
performing threshold screening on the sorting result, screening out the target confidence degree result corresponding to the sample threshold value or more, and recording the result as a final screening result;
and labeling the corresponding image set according to the new target identification result set corresponding to all the final screening results to obtain the training sample.
9. An object attribute identification device, comprising:
the acquisition module is used for acquiring a video to be recognized and carrying out target tracking processing on the video to be recognized to obtain target identifications of all targets and an image set associated with the target identifications;
the identification module is used for carrying out target attribute identification on each target image of each image set to obtain a target identification result set corresponding to each image set one by one, wherein the target identification result set comprises a plurality of image attribute results, and the image attribute results comprise branch attribute values of a plurality of target attributes and branch confidence coefficients corresponding to the branch attribute values one by one;
and the voting module is used for performing confidence voting on all the branch attribute values and the branch confidence degrees in each target recognition result set to obtain each final target attribute value of each target and a target confidence degree result corresponding to each final target attribute value one to one.
10. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the target property identification method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the object property identification method according to any one of claims 1 to 8.