CN110837580A - Pedestrian picture marking method and device, storage medium and intelligent device - Google Patents


Info

Publication number
CN110837580A
CN110837580A (application CN201911042640.8A)
Authority
CN
China
Prior art keywords
picture
pedestrian
target video
video
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911042640.8A
Other languages
Chinese (zh)
Inventor
张国辉
康振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201911042640.8A
Publication of CN110837580A
Priority to PCT/CN2020/111759 (WO2021082692A1)
Pending legal-status Critical Current

Classifications

    • G06F16/7867 — Retrieval of video data using manually generated metadata, e.g. tags, keywords, comments, title and artist information
    • G06F16/783 — Retrieval of video data using metadata automatically derived from the content
    • G06V40/10 — Recognition of human or animal bodies, e.g. pedestrians, in image or video data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of information processing and provides a pedestrian picture labeling method and device, a storage medium, and an intelligent device. The method comprises the following steps: playing a video to be labeled, and performing pedestrian detection on each frame of video picture contained in the video to be labeled; acquiring a target video picture in which a pedestrian is detected, and extracting pedestrian feature information from the target video picture; acquiring picture feature information of the target video picture; and labeling the target video picture according to the picture feature information and the pedestrian feature information. The method improves both the efficiency and the accuracy of labeling pedestrian pictures in massive videos, so that the labeled pedestrian pictures have wider applicability.

Description

Pedestrian picture marking method and device, storage medium and intelligent device
Technical Field
The present application belongs to the technical field of information processing, and in particular relates to a pedestrian picture labeling method and device, a storage medium, and an intelligent device.
Background
Pedestrian detection is widely applied in fields such as driver assistance, human motion analysis, and intelligent video surveillance. Pedestrian detection determines whether a pedestrian exists in a given image or video; if so, the pedestrian is labeled, and the labeled pedestrian pictures can be used in research experiments such as model training.
In the conventional pedestrian picture labeling method, only the pedestrian feature information in the picture is labeled; the labeling information is therefore not accurate enough, and the labeled pedestrian pictures have low applicability.
Disclosure of Invention
In view of this, embodiments of the present application provide a pedestrian picture labeling method and device, a storage medium, and an intelligent device, to solve the problem in the prior art that conventional pedestrian picture labeling only labels the pedestrian feature information in a picture, so that the labeling information is not accurate enough and the labeled pedestrian pictures have low applicability.
In a first aspect, an embodiment of the present application provides a pedestrian picture labeling method, including:
playing a video to be marked, and detecting pedestrians on the basis of each frame of video picture contained in the video to be marked;
acquiring a target video picture of a detected pedestrian, and extracting pedestrian characteristic information in the target video picture;
acquiring picture characteristic information of the target video picture;
and marking the target video picture according to the picture characteristic information and the pedestrian characteristic information.
Further, the pedestrian feature information includes clothing feature information, and the extracting pedestrian feature information in the target video picture includes:
acquiring an image of a human body region in the target video picture;
carrying out image preprocessing on the image of the human body region;
and applying the trained deep learning model for dressing recognition to perform dressing recognition on the image of the human body region subjected to image preprocessing to obtain dressing characteristic information of the human body region.
Further, the acquiring the picture feature information of the target video picture includes:
constructing an initial picture sequence according to each frame of video picture of the video to be marked and the playing sequence of each frame of video picture;
labeling each frame of video picture in the initial picture sequence in sequence to obtain a sequence number of each frame of video picture, wherein the sequence number is used for identifying the position of the video picture in the video to be labeled;
sequentially extracting target video pictures of detected pedestrians and serial numbers thereof from the initial picture sequence;
and respectively determining the position information of the target video picture in the video to be marked according to the serial number of the target video picture.
Further, after the step of labeling the target video picture according to the picture feature information and the pedestrian feature information, the method further includes:
according to the pedestrian characteristic information, a personal picture set used for storing a target video picture of a pedestrian corresponding to the pedestrian characteristic information is constructed, and the target video picture in the personal picture set is subjected to association marking;
according to the association mark, carrying out statistical analysis on the target video pictures in the personal picture set according to a specified statistical algorithm;
and taking the result of the statistical analysis as new labeling information, and carrying out secondary labeling on the labeled target video picture.
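The grouping, association labeling, and secondary labeling described above can be sketched as follows. This is a minimal, non-authoritative illustration: the field names (`pedestrian_id`, `association`, `secondary_label`) and the appearance-count statistic are hypothetical stand-ins for the pedestrian feature information and the "specified statistical algorithm" the text leaves unspecified.

```python
from collections import defaultdict

def build_personal_sets(labeled_pictures):
    """Group labeled target video pictures by pedestrian identity.
    `labeled_pictures` is a list of dicts with a 'pedestrian_id' key
    (a hypothetical stand-in for the pedestrian feature information)."""
    personal_sets = defaultdict(list)
    for pic in labeled_pictures:
        personal_sets[pic["pedestrian_id"]].append(pic)
    # Association labeling: record on each picture which personal set it belongs to.
    for pid, pics in personal_sets.items():
        for pic in pics:
            pic["association"] = pid
    return dict(personal_sets)

def secondary_label(personal_sets):
    """Illustrative statistical analysis: count each pedestrian's appearances
    and write the result back onto every picture as new labeling information."""
    for pid, pics in personal_sets.items():
        stats = {"appearance_count": len(pics)}
        for pic in pics:
            pic["secondary_label"] = stats
    return personal_sets

pictures = [
    {"frame": 1, "pedestrian_id": "p1"},
    {"frame": 2, "pedestrian_id": "p1"},
    {"frame": 6, "pedestrian_id": "p2"},
]
sets_ = secondary_label(build_personal_sets(pictures))
print(sets_["p1"][0]["secondary_label"])
```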
Further, the pedestrian picture labeling method further comprises the following steps:
storing the marked target video picture into a temporary picture set, and performing marking verification on the target video picture in the temporary picture set;
and storing the temporary picture set with the verified mark into a mark folder corresponding to the video to be marked.
Further, the performing annotation verification on the target video picture in the temporary picture set includes:
extracting a specified number of target video pictures from the temporary picture set, and acquiring the labeling information of the specified number of target video pictures;
taking the specified number of target video pictures as verification sample pictures, and sending the verification sample pictures to a labeling platform for manual labeling to obtain manual labeling information;
calculating the text similarity between the marking information of the verification sample picture and the artificial marking information;
and if the text similarity reaches a preset similarity threshold, the target video picture in the temporary picture set passes verification.
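The verification step above can be sketched with a standard-library similarity measure. This is an illustrative sketch, not the patent's prescribed algorithm: the text does not name a similarity metric, so `difflib.SequenceMatcher` and the 0.8 threshold are assumptions.

```python
import difflib

def verify_labels(auto_labels, manual_labels, threshold=0.8):
    """Compare the automatic labeling text of each verification sample picture
    against the manual labeling text; the temporary set passes only if every
    pair reaches the preset similarity threshold (0.8 here is illustrative)."""
    for auto, manual in zip(auto_labels, manual_labels):
        similarity = difflib.SequenceMatcher(None, auto, manual).ratio()
        if similarity < threshold:
            return False
    return True

auto = ["red coat, backpack, frame 6"]
manual = ["red coat, backpack, frame 6"]
print(verify_labels(auto, manual))
```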
In a second aspect, an embodiment of the present application provides a pedestrian picture annotation device, including:
the pedestrian detection unit is used for playing a video to be marked and carrying out pedestrian detection on the basis of each frame of video picture contained in the video to be marked;
the pedestrian characteristic information acquisition unit is used for acquiring a target video picture of a detected pedestrian and extracting pedestrian characteristic information in the target video picture;
the picture characteristic information acquisition unit is used for acquiring the picture characteristic information of the target video picture;
and the pedestrian picture marking unit is used for marking the target video picture according to the picture characteristic information and the pedestrian characteristic information.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for annotating a pedestrian picture as set forth in the first aspect of the embodiment of the present application is implemented.
In a fourth aspect, an embodiment of the present application provides an intelligent device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the pedestrian picture annotation method as set forth in the first aspect of the embodiment of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, which when running on a terminal device, causes the terminal device to execute the method for labeling a pedestrian picture according to the first aspect.
In the embodiments of the present application, the video to be labeled is played, pedestrian detection is performed on each frame of video picture contained in the video to be labeled, a target video picture in which a pedestrian is detected is automatically acquired, pedestrian feature information is extracted from the target video picture, picture feature information of the target video picture is acquired, and the target video picture is automatically labeled according to the picture feature information and the pedestrian feature information. This improves the efficiency of labeling pedestrian pictures in massive videos; moreover, combining pedestrian feature information with picture feature information when labeling the pedestrian pictures in a video improves the accuracy and validity of the labels, so that the labeled pedestrian pictures have wider applicability.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a flowchart of an implementation of a pedestrian picture annotation method according to an embodiment of the present invention;
fig. 2 is a flowchart of a specific implementation of extracting pedestrian feature information by the pedestrian picture labeling method according to the embodiment of the present invention;
fig. 3 is a flowchart of a specific implementation of the pedestrian picture labeling method S103 according to the embodiment of the present invention;
fig. 4 is a flowchart of a specific implementation of the pedestrian picture labeling method S104 according to the embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of a method for labeling a pedestrian picture according to another embodiment of the present invention;
fig. 6 is a flowchart illustrating an implementation of the pedestrian picture annotation method S505 according to another embodiment of the present invention;
fig. 7 is a block diagram of a pedestrian picture labeling apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of an intelligent device provided in an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The pedestrian picture labeling method provided by the present application can handle pedestrian picture labeling for massive videos, effectively reducing the time and labor required to label the pedestrian pictures in a video and improving labeling efficiency.
Fig. 1 shows the implementation flow of the pedestrian picture labeling method provided by an embodiment of the present application; the method includes steps S101 to S104. The specific implementation principle of each step is as follows:
s101: and playing the video to be marked, and detecting the pedestrian based on each frame of video picture contained in the video to be marked.
Specifically, in the embodiment of the present application, pedestrian detection is performed during playback of the video to be labeled; that is, the video is detected while it is being played. The video to be labeled refers to the video on which pedestrian picture labeling is to be performed. By playing the video to be labeled, pedestrian detection is carried out on each frame of video picture contained in it. Each frame of video picture contained in the video to be labeled refers to each frame displayed during video playback (i.e., a video picture). Pedestrian detection means detecting whether a pedestrian exists in a video picture. It includes face detection, which detects whether a face exists in the video picture through face recognition technology, and human body detection, which detects whether a human body exists in the video picture through human body recognition technology.
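The per-frame detection loop of S101 can be sketched as below. This is a schematic sketch only: `detect_pedestrian` is a hypothetical placeholder standing in for the real face/body recognition models, and frames are modeled as dicts rather than decoded video pictures.

```python
def detect_pedestrian(frame):
    """Placeholder for a real face/body detector; here a frame is a dict
    carrying precomputed detection flags for illustration."""
    return bool(frame.get("has_face") or frame.get("has_body"))

def play_and_detect(frames):
    """Iterate over the frames of the video to be labeled in playback order
    and collect the target video pictures in which a pedestrian is detected."""
    targets = []
    for frame in frames:
        if detect_pedestrian(frame):
            targets.append(frame)
    return targets

video = [{"id": 1, "has_face": True}, {"id": 2}, {"id": 3, "has_body": True}]
print([f["id"] for f in play_and_detect(video)])  # [1, 3]
```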
Optionally, in the embodiment of the present application, during playback of the video to be labeled, a video pause instruction from the user is monitored; the instruction pauses the video, and pedestrian detection is performed on the video picture corresponding to the frame displayed when the video is paused. Alternatively, to save labor, pedestrian detection may be performed automatically on each frame of video picture constituting the video. That is, pedestrian detection may run automatically on every frame of the video to be labeled, or it may be triggered on the paused frame by monitoring the user's pause instruction. The video pause instruction includes a key pause instruction, a touch pause instruction, and a voice pause instruction: the user can pause playback by pressing a pause key, by touching the screen, or by speaking a phrase containing specified keywords.
Optionally, in the embodiment of the present application, the playback speed of the video to be labeled is adjustable, i.e., playback can be sped up or slowed down. A speed control instruction input by the user is acquired, and the video to be labeled is played at the speed carried by the instruction. The video to be labeled may be an original video shot by a camera device; it may also be videos shot by camera devices arranged at different positions in the same time period, videos shot by the same camera device in different time periods, or videos shot by camera devices at different positions in different time periods.
Optionally, in the embodiment of the present application, two or more video channels are constructed, and two or more videos to be labeled are played simultaneously based on these channels. Specifically, if a multi-video playing instruction of the user is detected, two or more video playing windows are opened to simultaneously play different videos to be labeled; the number of videos played simultaneously corresponds to the multi-video playing instruction. This further improves labeling efficiency.
S102: the method comprises the steps of obtaining a target video picture of a detected pedestrian and extracting pedestrian characteristic information in the target video picture.
In the embodiment of the present application, after pedestrian detection is performed in step S101, if a pedestrian is detected in a video picture of the video to be labeled, the currently playing picture is automatically captured, yielding a target video picture in which a pedestrian is detected, and the pedestrian feature information of the pedestrian in the target video picture is extracted. Optionally, after the target video picture is obtained and before the pedestrian feature information is extracted, denoising is performed on the target video picture to improve its definition, so that the extracted pedestrian feature information is more accurate.
In the embodiment of the present application, the pedestrian feature information includes one or more of face feature information, human body feature information, and clothing feature information. When a face and/or a human body is detected in the playing picture of the video to be labeled, the target video picture containing the detected face and/or body is captured, and pedestrian feature information is extracted from it. Specifically, the human body features may include the outline of a pedestrian. Across different cameras, changes in scale, illumination, and angle can alter the appearance of the same pedestrian in different pictures to some extent, so the target video picture in which a pedestrian is detected does not necessarily distinguish individual pedestrians; it only records whether pedestrian feature information is present. When human body feature information is detected, face feature information and clothing feature information are further extracted, which improves labeling accuracy. The clothing feature information includes the type of clothing (e.g., coat or single layer), the color of clothing (e.g., white or red), and whether a backpack or hat is worn. The face feature information includes the face shape and facial features, such as eye size and nose bridge height; detecting both face feature information and clothing feature information can improve the accuracy of pedestrian re-identification.
Optionally, as an embodiment of the present application, the pedestrian feature information includes face feature information. The face feature information of a pedestrian is extracted from the target video picture by detecting face key points, which are pre-specified facial feature points; the face region in the target video picture is obtained from the face key points, and feature extraction is performed on that region to obtain the face feature information of the pedestrian in the target video picture. Specifically, MTCNN is applied to the target video picture to detect face key points, which in this embodiment include the left and right eyes, the nose tip, and the left and right mouth corners. Further, the faces detected in the target video picture are not necessarily frontal, and faces at various angles complicate face feature extraction. In the embodiment of the present application, the face image in the target video picture is corrected by affine transformation according to the face key points and a unified rule; this correction reduces the influence of diverse face pose angles, making the extraction of face feature information more accurate.
Optionally, as an embodiment of the present application, the pedestrian feature information includes human body feature information. The human body feature information of the pedestrian in the target video picture is extracted by detecting human body key points, which are pre-specified human body feature points; the human body region in the target video picture is obtained from the detected key points, and feature extraction is performed on that region to obtain the human body feature information. Specifically, OpenPose is used to detect a specified number of human body key points, for example 18, and the human body region in the target video picture is obtained from them. Human body key points generally correspond to joints with a certain degree of freedom on the human body, such as the neck, shoulders, elbows, wrists, waist, knees, and ankles. In the embodiment of the present application, after the specified number of human body key points are detected, they are correctly associated according to the preset positional relationships among the key points, and the human body region is determined from that association.
Optionally, as an embodiment of the present application, the pedestrian feature information includes clothing feature information, and the clothing feature information includes one or more of a clothing type, a clothing color, a clothing style, and an accessory, and fig. 2 shows a specific implementation process for extracting pedestrian feature information in the target video picture, which is provided by an embodiment of the present application and is detailed as follows:
a1: and acquiring an image of the human body region in the target video picture. Specifically, image segmentation is performed on a target video picture of a detected pedestrian, an image of a human body region in the target video picture is obtained, or the image of the human body region in the target video picture is obtained according to the extraction of the human body feature information. In this embodiment, the human body region includes at least one of a head region, an upper body region and a lower body region, and may be at least one of other types of human body regions such as a head region, an upper limb region, a lower limb region, an upper torso region, and the like, which is not limited herein.
A2: and carrying out image preprocessing on the image of the human body region. The image preprocessing operation comprises one or more of cutting, rotating, turning, adjusting brightness and adjusting contrast. And the image which is clearer and easier to distinguish is obtained through image preprocessing, so that the accuracy of image identification is improved.
A3: and applying the trained deep learning model for dressing recognition to perform dressing recognition on the image of the human body region subjected to image preprocessing to obtain dressing characteristic information of the human body region.
Specifically, in the embodiment of the present application, training the deep learning model for clothing recognition includes:
a1. acquiring images of sample human body regions and the sample clothing feature information corresponding to each image;
a2. performing image preprocessing on the images of the sample human body regions;
a3. invoking a deep learning engine to train on the preprocessed images of the sample human body regions and the corresponding sample clothing feature information until the error is within an allowable range, obtaining the trained deep learning model.
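The "train until the error is within an allowable range" loop of step a3 can be illustrated with a deliberately tiny stand-in: instead of a deep learning engine, a single-weight linear model is fitted by gradient descent until the mean squared error falls below a tolerance. The model, data, and tolerance are all hypothetical; only the stopping criterion mirrors the text.

```python
def train_until_tolerance(samples, labels, lr=0.05, tol=1e-6, max_epochs=10000):
    """Schematic stand-in for step a3: fit a single weight w so that
    w * x approximates each label, iterating until the mean squared
    error falls within the allowable range `tol`."""
    w = 0.0
    mse = float("inf")
    for _ in range(max_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum((w * x - y) * x for x, y in zip(samples, labels)) / len(samples)
        w -= lr * grad
        mse = sum((w * x - y) ** 2 for x, y in zip(samples, labels)) / len(samples)
        if mse <= tol:  # error within the allowable range: stop training
            break
    return w, mse

w, err = train_until_tolerance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 2), err <= 1e-6)  # 2.0 True
```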
Furthermore, the dressing characteristic information can comprise one or more of clothing type, clothing color, clothing style and accessories, and the clothing type, clothing color, clothing style and accessories are also various, so that the training can be respectively carried out according to different sample dressing characteristic information corresponding to the images of the sample human body area to obtain a plurality of deep learning models, and the trained deep learning models are applied to carry out dressing identification on the images of the human body area subjected to image preprocessing, so that the precision of the dressing characteristic information identification is improved.
S103: and acquiring picture characteristic information of the target video picture.
Specifically, the picture feature information includes position information of the target video picture, where the position information refers to a position of the intercepted target video picture in the video to be labeled.
As an embodiment of the present application, the picture feature information includes picture position information, and fig. 3 shows a specific implementation flow of the pedestrian picture labeling method S103 provided in the embodiment of the present application, which is detailed as follows:
b1: and constructing an initial picture sequence according to each frame of video picture of the video to be marked and the playing sequence of each frame of video picture. Specifically, the video to be labeled is composed of one frame and one frame of video pictures, and the initial picture sequence includes all the video pictures constituting the video to be labeled.
B2: and labeling each frame of video picture in the initial picture sequence in sequence to obtain a sequence number of each frame of video picture, wherein the sequence number is used for identifying the position of the video picture in the video to be labeled. Specifically, the video pictures in the initial picture sequence are sequentially labeled according to the playing sequence of each frame of video picture, so as to obtain the serial numbers of the video pictures in each frame in the initial picture sequence. For example, the video to be labeled is composed of N frames of video pictures, the initial picture sequence includes N frames of video pictures, and the video pictures in the initial picture sequence are sequentially labeled according to the playing sequence of the N frames of video pictures, the sequence numbers of the video pictures in the initial picture sequence are 1 to N, and the sequence number 1 or the sequence number N represents the position of the video picture in the initial picture sequence, that is, the video to be labeled.
B3: and sequentially extracting the target video pictures of the detected pedestrians and the serial numbers thereof from the initial picture sequence. For example, if a pedestrian is detected in the 1 st frame of video picture, the 2 nd frame of video picture, the 6 th frame of picture, the 7 th frame of picture and the nth frame of picture in the initial picture sequence, the 1 st frame of video picture, the 2 nd frame of video picture, the 6 th frame of picture, the 7 th frame of picture and the nth frame of picture are target video pictures, the 1 st frame of video picture, the 2 nd frame of video picture, the 6 th frame of picture, the 7 th frame of picture and the nth frame of picture are sequentially extracted from the initial picture sequence, and the serial numbers corresponding to the extracted target video pictures are 1, 2, 6, 7 and N respectively.
B4: and respectively determining the position information of the target video picture in the video to be marked according to the serial number of the target video picture.
Specifically, the video to be labeled is composed of frames of video pictures. An initial picture sequence is constructed according to the playing order of the video to be labeled, and the frames in the sequence are labeled in order to obtain the sequence number of each frame; the position of an intercepted video picture in the video to be labeled can then be determined from these sequence numbers. The target video pictures in which pedestrians were detected, together with their sequence numbers, are extracted in playing order to construct a pedestrian picture sequence, which contains the target video pictures sorted by sequence number; the position of each target video picture in the initial picture sequence of the video to be labeled is found through its sequence number. Illustratively, if the initial picture sequence includes N frames numbered 1 to N, a target video picture with sequence number 1 is the 1st frame of the initial picture sequence of the video to be labeled, and a target video picture with sequence number N is the Nth frame.
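Steps B1 to B4 can be sketched directly in code. This is a minimal sketch under the assumption that frames are opaque objects and that a pedestrian-detection predicate is supplied from outside; the sequence number alone then recovers each target picture's position.

```python
def label_initial_sequence(frames):
    """Steps B1/B2: build the initial picture sequence in playback order and
    assign each frame a 1-based sequence number marking its position."""
    return [{"seq": i, "frame": f} for i, f in enumerate(frames, start=1)]

def extract_targets(sequence, pedestrian_detected):
    """Steps B3/B4: pull out the target video pictures in which a pedestrian
    was detected, together with their sequence numbers; the sequence number
    directly gives the picture's position in the video to be labeled."""
    return [(item["seq"], item["frame"]) for item in sequence
            if pedestrian_detected(item["frame"])]

frames = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
seq = label_initial_sequence(frames)
targets = extract_targets(seq, lambda f: f in {"f1", "f2", "f6", "f7"})
print([s for s, _ in targets])  # [1, 2, 6, 7]
```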
In the embodiment of the application, the position of the target video picture of the detected pedestrian in the video to be labeled can be determined through the position information, so that the target video picture of the pedestrian can be quickly searched from the video.
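Steps B1 to B4 above can be sketched in Python as follows. This is a minimal illustration only: `detect_pedestrian` is a hypothetical stand-in for any per-frame pedestrian detector, and the frame representation is invented for the example.

```python
# Sketch of steps B1-B4: build the initial picture sequence, number each
# frame in playing order, and extract the target video pictures in which
# a pedestrian is detected together with their sequence numbers.

def detect_pedestrian(frame):
    # Stand-in detector: a frame is "positive" if it carries a flag.
    return frame.get("has_pedestrian", False)

def extract_target_pictures(frames):
    # Number frames 1..N in playing order (step B2), then keep the frames
    # in which a pedestrian is detected with their numbers (step B3).
    initial_sequence = [(i, f) for i, f in enumerate(frames, start=1)]
    return [(seq, f) for seq, f in initial_sequence if detect_pedestrian(f)]

# Example: pedestrians detected in frames 1, 2, 6 and 7 of an 8-frame video.
frames = [{"has_pedestrian": i in (1, 2, 6, 7)} for i in range(1, 9)]
targets = extract_target_pictures(frames)
print([seq for seq, _ in targets])  # → [1, 2, 6, 7]
```

The sequence number kept alongside each extracted picture is exactly the position information used in step B4.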
Optionally, the picture feature information of the target video picture further includes shooting address information, which refers to the address of the scene in the target video picture, that is, the address where the video was shot. The shooting address information can be determined by querying a database that stores the correspondence between the ID of the video to be labeled and the video shooting address.
S104: and marking the target video picture according to the picture characteristic information and the pedestrian characteristic information.
Specifically, labeling the target video picture means adding information available for analysis to it, for example so that the labeled target video picture can be used as a sample picture for neural network model training. In the embodiment of the application, the picture characteristic information and the pedestrian characteristic information are used as labeling information, and the mapping relation between the labeling information and the target video picture is established and stored, so that the target video picture is labeled automatically; automatic pedestrian picture labeling is thereby realized and labor is saved.
Optionally, in this embodiment of the application, the face feature information further includes the size of the face region and the position of the face region in the target video picture; a closed curve, for example a rectangular frame, is generated according to the size and position of the face region, and the face region may be framed by the closed curve. The human body feature information likewise further comprises the size of the human body region and its position in the target video picture, and the body region may be framed by a closed curve such as a rectangular frame. If the same target video picture includes more than one pedestrian, that is, a plurality of faces or a plurality of human bodies are detected, a plurality of rectangular frames appear on the target video picture, and the rectangular frames are distinguished by numbers. Specifically, the correspondence between the picture feature information, the pedestrian feature information and each closed curve (such as a rectangular frame) on the target video picture is established and stored; it may be stored in table form, or stored after being processed according to a specified format rule.
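The table-form correspondence described above can be sketched as follows. The field names (`box_number`, `rect`, and so on) are illustrative, not a format prescribed by the embodiment.

```python
# Sketch of the optional bounding-box annotation: each detected face/body
# region is framed by a rectangle (x, y, width, height), multiple
# rectangles in one picture are distinguished by number, and the
# correspondence between picture feature info, pedestrian feature info and
# each rectangle is stored as one row of a table.

def annotate_regions(picture_info, regions):
    # `regions`: one dict per detected face/body region, holding the
    # region's position/size and the pedestrian feature info from it.
    table = []
    for number, region in enumerate(regions, start=1):
        table.append({
            "box_number": number,                # distinguishes multiple boxes
            "rect": (region["x"], region["y"],
                     region["w"], region["h"]),  # the closed-curve frame
            "pedestrian_features": region["features"],
            "picture_features": picture_info,    # e.g. sequence number, address
        })
    return table

rows = annotate_regions(
    {"sequence_number": 6, "address": "entrance"},
    [{"x": 10, "y": 20, "w": 40, "h": 80, "features": "person-A"},
     {"x": 90, "y": 15, "w": 38, "h": 85, "features": "person-B"}],
)
print(len(rows), rows[0]["box_number"], rows[1]["box_number"])  # → 2 1 2
```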
Optionally, as an embodiment of the present application, as shown in fig. 4, after the step S104, the method further includes:
C1: according to the pedestrian characteristic information, constructing a personal picture set used for storing the target video pictures of the pedestrian corresponding to that pedestrian characteristic information, and performing association marking on the target video pictures in the personal picture set. Specifically, identical pedestrian characteristic information corresponds to one pedestrian and different pedestrian characteristic information corresponds to different pedestrians, so identical pedestrian characteristic information is determined as the pedestrian characteristic information of a single pedestrian. As described above, the face or human body region in the target video picture is framed by a closed curve, and the closed curve corresponding to the pedestrian characteristic information is determined; that is, all target video pictures corresponding to the pedestrian characteristic information are determined according to the correspondence among the picture characteristic information, the pedestrian characteristic information and the closed curves on the target video pictures, and all target video pictures corresponding to the same pedestrian characteristic information are association-marked. For example, the pedestrian number corresponding to the pedestrian characteristic information is determined, and all target video pictures corresponding to the same pedestrian characteristic information are marked with that pedestrian number.
C2: and according to the association marks, carrying out statistical analysis on the target video pictures in the personal picture set according to a specified statistical algorithm. Specifically, the number of frames of the target video picture in the personal picture set is counted.
C3: and taking the result of the statistical analysis as new labeling information, and carrying out secondary labeling on the labeled target video picture.
Specifically, in the embodiment of the present application, the target video pictures containing the same pedestrian feature information are extracted from the pedestrian picture sequence, and a personal picture set dedicated to storing the target video pictures of the corresponding pedestrian is constructed. The target video pictures in the personal picture set may be association-marked, and the number of frames in the personal picture set is counted to determine the length of time for which the corresponding pedestrian appears in the video to be labeled; the number of times and the positions at which that pedestrian appears in the video to be labeled can further be determined from the picture feature information of the target video pictures in the personal picture set, so that the accuracy of labeling is further improved.
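Steps C1 to C3 can be sketched as follows, under the assumption that the specified statistical algorithm of step C2 is a simple frame count; the field names are illustrative.

```python
# Sketch of steps C1-C3: group labeled target video pictures by pedestrian
# feature information into personal picture sets (C1), count the frames in
# each set as the statistical analysis (C2), and attach the result to each
# picture as secondary labeling information (C3).
from collections import defaultdict

def build_personal_sets(labeled_pictures):
    sets_by_pedestrian = defaultdict(list)
    for pic in labeled_pictures:
        # Identical pedestrian feature info -> the same pedestrian (C1).
        sets_by_pedestrian[pic["pedestrian_features"]].append(pic)
    return sets_by_pedestrian

def secondary_label(personal_sets):
    for pedestrian_id, pics in personal_sets.items():
        frame_count = len(pics)            # C2: frame count per pedestrian
        for pic in pics:                   # C3: secondary labeling
            pic["pedestrian_number"] = pedestrian_id
            pic["appearance_count"] = frame_count

pictures = [
    {"sequence_number": 1, "pedestrian_features": "A"},
    {"sequence_number": 2, "pedestrian_features": "A"},
    {"sequence_number": 6, "pedestrian_features": "B"},
]
personal_sets = build_personal_sets(pictures)
secondary_label(personal_sets)
print(pictures[0]["appearance_count"], pictures[2]["appearance_count"])  # → 2 1
```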
In the embodiment of the application, the video to be labeled is played, pedestrian detection is performed on each frame of video picture of the video to be labeled, the target video picture is automatically acquired when a pedestrian is detected, the pedestrian characteristic information is extracted from the target video picture, the picture characteristic information of the target video picture is then acquired, and the target video picture is automatically labeled according to the picture characteristic information and the pedestrian characteristic information. This can improve the efficiency of pedestrian picture labeling in massive videos; moreover, labeling the pedestrian pictures in the video by combining the pedestrian characteristic information with the picture characteristic information can improve the accuracy and validity of the labeling, so that the labeled pedestrian pictures are more applicable.
Further, another embodiment of the present application is provided based on the pedestrian picture labeling method provided in the embodiment of fig. 1. In this embodiment of the application, on the basis of steps S101 to S104 shown in fig. 1, as shown in fig. 5, the pedestrian picture annotation method further includes:
S505: and storing the marked target video picture into a temporary picture set, and performing marking verification on the target video picture in the temporary picture set.
In the embodiment of the application, the marked target video picture is stored in a temporary picture set, and marking verification is performed on the target video picture in the temporary picture set. Specifically, the marked target video pictures are sequentially stored in a temporary picture set according to the position sequence of the target video pictures in the video to be marked, and the target video pictures in the temporary picture set are manually checked and verified.
As an embodiment of the present application, as shown in fig. 6, the step of performing annotation verification on the target video picture in the temporary picture set specifically includes:
D1: extracting a specified number of target video pictures from the temporary picture set, and acquiring the labeling information of those target video pictures. Specifically, a random algorithm is used to extract several target video pictures from the temporary picture set as verification sample pictures, and the number extracted is determined in proportion to the total number of target video pictures in the temporary picture set.
D2: taking the specified number of target video pictures as verification sample pictures, sending the verification sample pictures to a labeling platform for manual labeling, and acquiring the manual labeling information. Specifically, manual labeling means that the pedestrian characteristic information and the picture characteristic information of the verification sample picture are confirmed manually, and the manually obtained pedestrian characteristic information and picture characteristic information are used as the manual labeling information. Optionally, a picture sequence is constructed from the target video pictures in the temporary picture set according to their position order, target video pictures at specified intervals are extracted as verification sample pictures, and the verification sample pictures are sent to the labeling platform for manual labeling.
D3: calculating the text similarity between the labeling information of the verification sample picture and the manual labeling information.
D4: if the text similarity reaches a preset similarity threshold, the target video pictures in the temporary picture set pass verification; if the text similarity does not reach the preset similarity threshold, the target video pictures in the temporary picture set do not pass verification.
In the embodiment of the application, a certain number of labeled target video pictures are randomly extracted for manual labeling, and the labeling information of the automatically labeled target video pictures is compared with the manual labeling information for similarity, which verifies the accuracy of the automatic labeling and effectively guarantees the reliability of automatic pedestrian picture labeling.
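Steps D1 to D4 can be sketched as follows. `difflib.SequenceMatcher` stands in for the unspecified text-similarity measure, and the 10% sampling ratio and 0.9 threshold are illustrative values, not taken from the embodiment.

```python
# Sketch of steps D1-D4: randomly sample a proportion of labeled pictures
# as verification samples, compare the automatic labeling information with
# the manual labeling information by text similarity, and pass verification
# only if every sampled picture reaches the threshold.
import random
from difflib import SequenceMatcher

def verify_annotations(temporary_set, manual_label_of, ratio=0.1, threshold=0.9):
    sample_size = max(1, int(len(temporary_set) * ratio))    # D1: proportional sampling
    samples = random.sample(temporary_set, sample_size)
    for pic in samples:
        auto = pic["label"]
        manual = manual_label_of(pic)                        # D2: manual labeling
        similarity = SequenceMatcher(None, auto, manual).ratio()  # D3
        if similarity < threshold:                           # D4
            return False
    return True

temporary_set = [{"label": f"pedestrian A at frame {i}"} for i in range(20)]
# A simulated labeling platform that happens to agree with every label.
ok = verify_annotations(temporary_set, lambda p: p["label"])
print(ok)  # → True
```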
S506: and storing the temporary picture set with the verified mark into a mark folder corresponding to the video to be marked.
In the embodiment of the application, the video to be labeled is played, pedestrian detection is performed on each frame of video picture contained in the video to be labeled, the target video picture is automatically acquired when a pedestrian is detected, the pedestrian characteristic information is extracted from the target video picture, the picture characteristic information of the target video picture is then acquired, and the target video picture is automatically labeled according to the picture characteristic information and the pedestrian characteristic information. This can improve the efficiency of pedestrian picture labeling in massive videos; moreover, labeling the pedestrian pictures in the video by combining the pedestrian characteristic information with the picture characteristic information can improve the accuracy and validity of the labeling, so that the labeled pedestrian pictures are more applicable. Finally, the temporary picture set whose labels pass verification is stored in the labeling folder corresponding to the video to be labeled, which further improves the validity of the labeling.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 shows a structural block diagram of a pedestrian image annotation device provided in the embodiment of the present application, which corresponds to the pedestrian image annotation method described in the foregoing embodiment, and only shows parts related to the embodiment of the present application for convenience of description.
Referring to fig. 7, the pedestrian picture labeling apparatus includes: a pedestrian detection unit 71, a pedestrian characteristic information acquisition unit 72, a picture characteristic information acquisition unit 73, a pedestrian picture labeling unit 74, wherein:
the pedestrian detection unit 71 is configured to play a video to be labeled, and perform pedestrian detection based on each frame of video picture included in the video to be labeled;
a pedestrian characteristic information acquiring unit 72, configured to acquire a target video picture in which a pedestrian is detected, and extract pedestrian characteristic information in the target video picture;
a picture feature information obtaining unit 73, configured to obtain picture feature information of the target video picture;
and a pedestrian picture labeling unit 74, configured to label the target video picture according to the picture feature information and the pedestrian feature information.
Optionally, the pedestrian feature information includes dress feature information, and the pedestrian feature information acquiring unit 72 includes:
the region image acquisition module is used for acquiring an image of a human body region in the target video picture;
the image preprocessing module is used for preprocessing the image of the human body region;
and the dressing information acquisition module is used for applying the trained deep learning model for dressing identification to perform dressing identification on the image of the human body area subjected to image preprocessing to obtain the dressing characteristic information of the human body area.
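The dressing-recognition pipeline above can be sketched as follows. The embodiment does not specify the model or the preprocessing, so `dressing_model` below is a trivial stand-in (not a real deep learning classifier) and the normalization step is an assumed preprocessing choice.

```python
# Sketch of the dressing-recognition pipeline: take the image of the human
# body region, preprocess it (here: normalize 0-255 grayscale values to
# [0, 1]), and run a dressing-recognition model on the preprocessed image.

def preprocess(region_pixels):
    # Assumed preprocessing: scale pixel values into [0, 1].
    return [[p / 255.0 for p in row] for row in region_pixels]

def dressing_model(pixels):
    # Stand-in "model": classifies by mean brightness only, purely to
    # illustrate where a trained deep learning model would plug in.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "light clothing" if mean > 0.5 else "dark clothing"

region = [[200, 220], [210, 230]]   # a tiny "human body region" crop
features = dressing_model(preprocess(region))
print(features)  # → light clothing
```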
Optionally, the picture feature information includes picture position information, and the picture feature information obtaining unit 73 includes:
the initial sequence construction module is used for constructing an initial picture sequence according to each frame of video picture of the video to be marked and the playing sequence of each frame of video picture;
a picture labeling module, configured to label, in sequence, each frame of video picture in the initial picture sequence, to obtain a sequence number of each frame of video picture, where the sequence number is used to identify a position of the video picture in the video to be labeled;
the serial number extraction module is used for sequentially extracting a target video picture of the detected pedestrian and a serial number thereof from the initial picture sequence;
and the position information determining module is used for respectively determining the position information of the target video picture in the video to be marked according to the serial number of the target video picture.
Optionally, the pedestrian picture labeling device further includes:
the association marking unit is used for constructing a personal picture set used for storing a target video picture of a pedestrian corresponding to the pedestrian characteristic information according to the pedestrian characteristic information and carrying out association marking on the target video picture in the personal picture set;
the information analysis unit is used for carrying out statistical analysis on the target video pictures in the personal picture set according to the association marks and a specified statistical algorithm;
and the picture labeling unit is used for performing secondary labeling on the labeled target video picture by taking the result of the statistical analysis as new labeling information.
Optionally, the pedestrian picture labeling device further includes:
the annotation verification unit is used for storing the annotated target video picture into a temporary picture set and performing annotation verification on the target video picture in the temporary picture set;
and the information storage unit is used for storing the temporary picture set with the verified annotation into an annotation folder corresponding to the video to be annotated.
Optionally, the annotation verification unit includes:
the information extraction module is used for extracting a specified number of target video pictures from the temporary picture set and acquiring the labeling information of the specified number of target video pictures;
the manual labeling module is used for sending the specified number of target video pictures as verification sample pictures to a labeling platform for manual labeling to obtain manual labeling information;
the similarity calculation module is used for calculating the text similarity between the labeling information of the verification sample picture and the manual labeling information; if the text similarity reaches a preset similarity threshold, the target video pictures in the temporary picture set pass verification.
In the embodiment of the application, the video to be labeled is played, pedestrian detection is performed on each frame of video picture contained in the video to be labeled, the target video picture is automatically acquired when a pedestrian is detected, the pedestrian characteristic information is extracted from the target video picture, the picture characteristic information of the target video picture is then acquired, and the target video picture is automatically labeled according to the picture characteristic information and the pedestrian characteristic information. This can improve the efficiency of pedestrian picture labeling in massive videos; moreover, labeling the pedestrian pictures in the video by combining the pedestrian characteristic information with the picture characteristic information can improve the accuracy and validity of the labeling, so that the labeled pedestrian pictures are more applicable. Finally, the temporary picture set whose labels pass verification is stored in the labeling folder corresponding to the video to be labeled, which further improves the validity of the labeling.
An embodiment of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored, and when executed by a processor, the computer-readable instructions implement the steps of any one of the pedestrian picture labeling methods shown in fig. 1 to 6.
The embodiment of the present application further provides a computer program product, which when running on an intelligent device, causes the intelligent device to execute the steps of implementing any one of the pedestrian picture labeling methods shown in fig. 1 to fig. 6.
The embodiment of the present application further provides an intelligent device, which includes a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor executes the computer readable instructions to implement the steps of any one of the pedestrian picture labeling methods shown in fig. 1 to 6.
Fig. 8 is a schematic diagram of an intelligent device provided in an embodiment of the present application. As shown in fig. 8, the intelligent device 8 of this embodiment includes: a processor 80, a memory 81, and computer readable instructions 82 stored in the memory 81 and executable on the processor 80. The processor 80, when executing the computer readable instructions 82, implements the steps in the above embodiments of the pedestrian picture labeling method, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 80, when executing the computer readable instructions 82, implements the functions of the modules/units in the device embodiments described above, such as the functions of the units 71 to 74 shown in fig. 7.
Illustratively, the computer readable instructions 82 may be divided into one or more modules/units, which are stored in the memory 81 and executed by the processor 80 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the segments are used to describe the execution of the computer readable instructions 82 in the intelligent device 8.
The intelligent device 8 may be a server, a notebook computer, a palmtop computer, a cloud intelligent device, or another computing device. The intelligent device 8 may include, but is not limited to, a processor 80 and a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of the intelligent device 8 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine certain components, or use different components. For example, the intelligent device 8 may also include input-output devices, network access devices, buses, etc.
The Processor 80 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 81 may be an internal storage unit of the intelligent device 8, such as a hard disk or a memory of the intelligent device 8. The memory 81 may also be an external storage device of the intelligent device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the intelligent device 8. Further, the memory 81 may include both an internal storage unit and an external storage device of the intelligent device 8. The memory 81 is used to store the computer readable instructions and the other programs and data required by the intelligent device, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the embodiments of the methods described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals, in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A pedestrian picture labeling method is characterized by comprising the following steps:
playing a video to be marked, and detecting pedestrians on the basis of each frame of video picture contained in the video to be marked;
acquiring a target video picture of a detected pedestrian, and extracting pedestrian characteristic information in the target video picture;
acquiring picture characteristic information of the target video picture;
and marking the target video picture according to the picture characteristic information and the pedestrian characteristic information.
2. The method for labeling pedestrian pictures according to claim 1, wherein the pedestrian feature information includes clothing feature information, and the extracting pedestrian feature information from the target video picture includes:
acquiring an image of a human body region in the target video picture;
carrying out image preprocessing on the image of the human body region;
and applying the trained deep learning model for dressing recognition to perform dressing recognition on the image of the human body region subjected to image preprocessing to obtain dressing characteristic information of the human body region.
3. The pedestrian picture labeling method according to claim 1, wherein the picture feature information includes picture position information, and the acquiring the picture feature information of the target video picture includes:
constructing an initial picture sequence according to each frame of video picture of the video to be marked and the playing sequence of each frame of video picture;
labeling each frame of video picture in the initial picture sequence in sequence to obtain a sequence number of each frame of video picture, wherein the sequence number is used for identifying the position of the video picture in the video to be labeled;
sequentially extracting target video pictures of detected pedestrians and serial numbers thereof from the initial picture sequence;
and respectively determining the position information of the target video picture in the video to be marked according to the serial number of the target video picture.
4. The method for labeling pedestrian pictures according to claim 1, wherein after labeling the target video picture according to the picture characteristic information and the pedestrian characteristic information, the method comprises:
according to the pedestrian characteristic information, a personal picture set used for storing a target video picture of a pedestrian corresponding to the pedestrian characteristic information is constructed, and the target video picture in the personal picture set is subjected to association marking;
according to the association mark, carrying out statistical analysis on the target video pictures in the personal picture set according to a specified statistical algorithm;
and taking the result of the statistical analysis as new labeling information, and carrying out secondary labeling on the labeled target video picture.
5. The pedestrian picture labeling method according to claim 1, further comprising:
storing the marked target video picture into a temporary picture set, and performing marking verification on the target video picture in the temporary picture set;
and storing the temporary picture set with the verified mark into a mark folder corresponding to the video to be marked.
6. The pedestrian picture labeling method of claim 5, wherein the performing labeling verification on the target video picture in the temporary picture set comprises:
extracting a specified number of target video pictures from the temporary picture set, and acquiring the labeling information of the specified number of target video pictures;
taking the specified number of target video pictures as verification sample pictures, and sending the verification sample pictures to a labeling platform for manual labeling to obtain manual labeling information;
calculating the text similarity between the labeling information of the verification sample picture and the manual labeling information;
and if the text similarity reaches a preset similarity threshold, the target video picture in the temporary picture set passes verification.
7. A pedestrian picture labeling device is characterized by comprising:
the pedestrian detection unit is used for playing a video to be marked and carrying out pedestrian detection on the basis of each frame of video picture contained in the video to be marked;
the pedestrian characteristic information acquisition unit is used for acquiring a target video picture of a detected pedestrian and extracting pedestrian characteristic information in the target video picture;
the picture characteristic information acquisition unit is used for acquiring the picture characteristic information of the target video picture;
and the pedestrian picture marking unit is used for marking the target video picture according to the picture characteristic information and the pedestrian characteristic information.
8. The pedestrian picture labeling apparatus of claim 7, wherein said pedestrian picture labeling apparatus further comprises:
the annotation verification unit is used for storing the annotated target video picture into a temporary picture set and performing annotation verification on the target video picture in the temporary picture set;
and the information storage unit is used for storing the temporary picture set with the verified annotation into an annotation folder corresponding to the video to be annotated.
9. A computer-readable storage medium, in which a computer program is stored, and the computer program is executed by a processor to implement the pedestrian picture labeling method according to any one of claims 1 to 6.
10. An intelligent device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the pedestrian picture annotation method according to any one of claims 1 to 6 when executing the computer program.
CN201911042640.8A 2019-10-30 2019-10-30 Pedestrian picture marking method and device, storage medium and intelligent device Pending CN110837580A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911042640.8A CN110837580A (en) 2019-10-30 2019-10-30 Pedestrian picture marking method and device, storage medium and intelligent device
PCT/CN2020/111759 WO2021082692A1 (en) 2019-10-30 2020-08-27 Pedestrian picture labeling method and device, storage medium, and intelligent apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911042640.8A CN110837580A (en) 2019-10-30 2019-10-30 Pedestrian picture marking method and device, storage medium and intelligent device

Publications (1)

Publication Number Publication Date
CN110837580A true CN110837580A (en) 2020-02-25

Family

ID=69575981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911042640.8A Pending CN110837580A (en) 2019-10-30 2019-10-30 Pedestrian picture marking method and device, storage medium and intelligent device

Country Status (2)

Country Link
CN (1) CN110837580A (en)
WO (1) WO2021082692A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460182A (en) * 2020-04-17 2020-07-28 支付宝(杭州)信息技术有限公司 Multimedia evaluation method and device
WO2021082692A1 (en) * 2019-10-30 2021-05-06 平安科技(深圳)有限公司 Pedestrian picture labeling method and device, storage medium, and intelligent apparatus
CN113343857A (en) * 2021-06-09 2021-09-03 浙江大华技术股份有限公司 Labeling method, labeling device, storage medium and electronic device
CN116092280A (en) * 2023-02-07 2023-05-09 深圳市冠标科技发展有限公司 Supervision method and device based on remote communication

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN113378958A (en) * 2021-06-24 2021-09-10 北京百度网讯科技有限公司 Automatic labeling method, device, equipment, storage medium and computer program product
CN113822137A (en) * 2021-07-23 2021-12-21 腾讯科技(深圳)有限公司 Data annotation method, device and equipment and computer readable storage medium
CN114245210B (en) * 2021-09-22 2024-01-09 北京字节跳动网络技术有限公司 Video playing method, device, equipment and storage medium
CN115082848A (en) * 2022-05-18 2022-09-20 贝壳找房(北京)科技有限公司 Method, device, equipment, medium and program product for managing wearing of industrial and ground work clothes

Citations (5)

Publication number Priority date Publication date Assignee Title
US20110183711A1 (en) * 2010-01-26 2011-07-28 Melzer Roy S Method and system of creating a video sequence
CN104133875A (en) * 2014-07-24 2014-11-05 北京中视广信科技有限公司 Face-based video labeling method and face-based video retrieving method
CN107153817A (en) * 2017-04-29 2017-09-12 深圳市深网视界科技有限公司 Pedestrian's weight identification data mask method and device
CN108174270A (en) * 2017-12-28 2018-06-15 广东欧珀移动通信有限公司 Data processing method, device, storage medium and electronic equipment
CN109214247A (en) * 2017-07-04 2019-01-15 腾讯科技(深圳)有限公司 Face identification method and device based on video

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN103970906B * 2014-05-27 2017-07-04 百度在线网络技术(北京)有限公司 Video tag establishment method and device, and video content display method and device
CN106851395B * 2015-12-04 2020-06-02 中国电信股份有限公司 Video playing method and player
CN110309362A * 2019-07-05 2019-10-08 深圳中科云海科技有限公司 Video retrieval method and system
CN110837580A (en) * 2019-10-30 2020-02-25 平安科技(深圳)有限公司 Pedestrian picture marking method and device, storage medium and intelligent device

Also Published As

Publication number Publication date
WO2021082692A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
CN110837580A (en) Pedestrian picture marking method and device, storage medium and intelligent device
CN110147717B (en) Human body action recognition method and device
CN105809144B (en) A kind of gesture recognition system and method using movement cutting
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
CN112184705B (en) Human body acupuncture point identification, positioning and application system based on computer vision technology
WO2019071664A1 (en) Human face recognition method and apparatus combined with depth information, and storage medium
CN110135246A (en) A kind of recognition methods and equipment of human action
WO2015165365A1 (en) Facial recognition method and system
CN109657533A (en) Pedestrian recognition methods and Related product again
WO2019033571A1 (en) Facial feature point detection method, apparatus and storage medium
WO2022174605A1 (en) Gesture recognition method, gesture recognition apparatus, and smart device
CN109325456A (en) Target identification method, device, target identification equipment and storage medium
CN109299658B (en) Face detection method, face image rendering device and storage medium
CN114937232B (en) Wearing detection method, system and equipment for medical waste treatment personnel protective appliance
CN108921204A (en) Electronic device, picture sample set creation method and computer readable storage medium
US20220207266A1 (en) Methods, devices, electronic apparatuses and storage media of image processing
CN111160134A (en) Human-subject video scene analysis method and device
CN111597910A (en) Face recognition method, face recognition device, terminal equipment and medium
CN113011403B (en) Gesture recognition method, system, medium and device
CN106033539A (en) Meeting guiding method and system based on video face recognition
CN109902550A (en) The recognition methods of pedestrian's attribute and device
CN108062511A (en) A kind of trans-regional multi-cam target identification association tracking and computer equipment
WO2015131571A1 (en) Method and terminal for implementing image sequencing
Day Exploiting facial landmarks for emotion recognition in the wild
CN205835343U (en) A kind of robot with invigilator's function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination