WO2017000115A1 - Pedestrian re-identification method and device - Google Patents

Pedestrian re-identification method and device

Info

Publication number
WO2017000115A1
Authority
WO
WIPO (PCT)
Prior art keywords
pedestrian
depth image
frame
depth
skeleton joint
Prior art date
Application number
PCT/CN2015/082639
Other languages
English (en)
French (fr)
Inventor
俞刚
李超
尚泽远
何奇正
Original Assignee
北京旷视科技有限公司
北京小孔科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京旷视科技有限公司, 北京小孔科技有限公司
Priority to CN201580000333.7A (patent CN105518744B)
Priority to PCT/CN2015/082639 (WO2017000115A1)
Publication of WO2017000115A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present disclosure relates to image processing, and in particular to a pedestrian re-identification method, apparatus, and computer program product.
  • Person re-identification refers to the identification of a target pedestrian from a pedestrian image library or video stream originating from a plurality of non-overlapping camera fields of view.
  • Unlike ordinary pedestrian tracking under a single camera, pedestrian re-identification enables long-term tracking and monitoring of specific pedestrians across different background environments and multi-camera settings, and therefore has very broad application prospects in the field of surveillance.
  • For example, re-identification of a shopper in a mall makes it possible to track the pedestrian's trajectory across multiple cameras, and in turn to analyze and gather statistics on possible purchasing behaviors.
  • As another example, in an intelligent video surveillance system, pedestrian re-identification technology can automatically identify the target pedestrian and report to the operator of the surveillance system, so that the operator does not need to perform time-consuming and laborious manual observation and recognition.
  • At present, pedestrian re-identification is usually performed based on low-level information such as the color and texture of pedestrians in images or videos, and its performance is often unsatisfactory.
  • The main reasons are: pedestrians may appear at very different viewing angles under different cameras; the areas covered by different cameras often do not overlap; lighting conditions may differ between camera locations, so the appearance of the same object may vary greatly across cameras; and a pedestrian may walk with his or her back or side toward the camera, so that face information cannot be captured, or, even when it can be captured, the face cannot be seen clearly because surveillance cameras usually have low resolution.
  • According to one aspect of the present disclosure, a pedestrian re-identification method is provided, including: detecting pedestrians in each frame depth image of a depth video; performing skeleton joint point extraction for each pedestrian in each frame depth image; normalizing, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extracting, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identifying a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
  • According to another aspect of the present disclosure, a pedestrian re-identification device is provided, including: a processor; a memory; and computer program instructions stored in the memory. The computer program instructions, when executed by the processor, perform the steps of: detecting pedestrians in each frame depth image of a depth video; performing skeleton joint point extraction for each pedestrian in each frame depth image; normalizing, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extracting, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identifying a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
  • According to another aspect of the present disclosure, a computer program product for pedestrian re-identification is provided, including a computer readable storage medium on which computer program instructions are stored, the computer program instructions being executable by a processor to cause the processor to: detect pedestrians in each frame depth image of a depth video; perform skeleton joint point extraction for each pedestrian in each frame depth image; normalize, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extract, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identify a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
  • According to another aspect of the present disclosure, a pedestrian re-identification device is provided, including: a detection device configured to detect pedestrians in each frame depth image of a depth video; a skeleton extraction device configured to perform skeleton joint point extraction for each pedestrian in each frame depth image; a normalization device configured to normalize, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; a feature extraction device configured to extract, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and an identification device configured to identify a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
  • The method, device, and computer program product according to the above aspects of the present disclosure effectively utilize the depth information of pedestrians in images and videos, greatly improving the accuracy of pedestrian re-identification across different background environments and multi-camera settings.
  • FIG. 1 shows a schematic flow chart of a pedestrian re-identification method according to an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary sub-image area obtained after segmentation of the foreground area.
  • Figure 3 shows a schematic skeleton joint point distribution for a pedestrian.
  • FIG. 4 shows a process performed for each pixel in a sub-image area corresponding to the pedestrian in the frame depth image when the skeleton joint point extraction processing is performed for a certain pedestrian in a certain frame depth image.
  • FIG. 5 illustrates an exemplary schematic diagram of a predetermined shooting viewing angle.
  • FIG. 6 shows an exemplary structural block diagram of a pedestrian re-identification device in accordance with an embodiment of the present disclosure.
  • FIG. 7 shows a block diagram of an exemplary computing device for implementing an embodiment of the present disclosure.
  • As mentioned above, depth images are used in the present disclosure for pedestrian re-identification. It is well known in the art that a depth image is an image in which the value of each pixel represents the distance between a point in the scene and the camera. Compared to grayscale (color) images, depth images carry the depth (distance) information of objects and are unaffected by lighting conditions, and are therefore suitable for various applications requiring stereo information or scene changes.
  • In step S110, pedestrians are detected in each frame depth image of the depth video.
  • As mentioned above, unlike ordinary pedestrian tracking under a single camera, the pedestrian re-identification technique according to the present disclosure can be applied where background environments differ and multiple cameras are used for shooting. More specifically, according to the pedestrian re-identification technique of the present disclosure, the target depth video containing the target pedestrian to be recognized and the depth video to be analyzed, from which the target pedestrian needs to be identified, may be shot by different cameras, or by a single camera at different times (against different backgrounds).
  • The depth video described in this step is the depth video to be analyzed, from which the target pedestrian needs to be identified; it is shot at a certain moment by a single depth camera different from the depth camera that captured the target pedestrian.
  • Optionally, the depth camera that captures the depth video to be analyzed is configured in the same manner as the depth camera that captured the target pedestrian; for example, the depth cameras are all mounted at heights greater than 2 meters and shoot at a downward-looking angle.
  • In this step, pedestrians can be detected in each frame depth image of the depth video to be analyzed by any suitable image detection technique in the art, which the present disclosure does not limit. Below, merely for completeness of description, one possible detection method is briefly outlined.
  • the foreground region therein is first determined according to the value of each pixel in the image.
  • the foreground area is the area where the depth is different from the depth of the scene obtained by background modeling.
  • the process of acquiring the foreground area is well known in the art, and a detailed description thereof is omitted here.
  • the foreground region is segmented based on the depth information to obtain a plurality of sub-image regions.
  • Common methods in the art, such as connected component analysis (CCA) and pedestrian body detection methods (e.g., "Integral Channel Features" published by P. Dollar, Z. Tu, P. Perona, and S. Belongie at BMVC 2009), may be used to segment the foreground region so that each resulting sub-image region contains one pedestrian, thereby determining the specific position of each pedestrian in the current frame depth image.
  • FIG. 2 illustrates an exemplary sub-image area obtained after segmentation of the foreground area. As illustrated in Fig. 2, the sub-image area is represented by a rectangular frame circumscribing the body contour of the detected pedestrian.
  • Optionally, each pedestrian detected in each frame depth image may be tracked to determine in which other frames of the depth video to be analyzed the pedestrian appears, and to determine the pedestrian's position in those frames.
  • As noted above, the depth video to be analyzed is shot by a single depth camera at a certain moment, so the tracking here is tracking under a single camera. It may be performed by various commonly used methods in the field, such as the Hungarian algorithm or the method of "Continuous energy minimization for multi-target tracking" published by A. Milan, S. Roth, and K. Schindler in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, to obtain a tracking segment for each pedestrian.
  • The tracking segment includes at least data describing in which frame depth images of the depth video to be analyzed the pedestrian appears, and the pedestrian's position in each such frame depth image.
  • Returning to FIG. 1, in step S120, skeleton joint point extraction is performed for each pedestrian in each frame depth image.
  • The skeleton joint points can describe the pedestrian's posture well, and their specific number can be set as needed; for example, it can be set to the 20 defined in Microsoft Kinect, or the 15 defined in OpenNI, and so on.
  • Here, for simplicity, as shown in FIG. 3, the skeleton joint points are set to six, representing the head, the left hand, the right hand, the chest center, the left foot, and the right foot.
  • Next, the skeleton joint point extraction processing of step S120 will be described in detail with reference to FIG. 4. FIG. 4 shows the processing performed on each pixel in the sub-image area corresponding to a certain pedestrian (for example, pedestrian A) in a certain frame depth image (for example, the Nth frame) when skeleton joint point extraction is performed for that pedestrian in that frame.
  • As shown in FIG. 4, in step S1201, a matching pixel in a pre-established training set that matches the current pixel (for example, pixel a) is determined; the training set contains a plurality of pedestrian depth images, and in each pedestrian depth image the skeleton joint points of the pedestrian are marked in advance.
  • The matching pixel may be determined based on the pixel's feature description and the pixel's relative position in the sub-image area. Specifically, various conventional methods in the art, such as a random forest algorithm or a hashing algorithm, may be used to compare the feature description of pixel a and its position in the sub-image area with the corresponding features of the pixels in the training set, thereby finding the matching pixel in the training set.
  • The feature description may be any suitable feature for describing a pixel. For example, the depth value of each neighboring pixel within the 3×3 range around pixel a may be compared with the depth value of pixel a: if it is greater, the neighboring pixel is assigned the value 1, otherwise 0; the vector formed by combining the values assigned to the neighboring pixels within the 3×3 range then serves as the feature description of pixel a. As another example, the depth value of pixel a may simply be used as its feature description.
  • In step S1202, the marker data of the matching pixel is extracted; the marker data includes the offsets of the matching pixel relative to the skeleton joint points of the pedestrian in the pedestrian depth image in which it is located.
  • The marker data is assigned in advance when the training set is established; the offset may be a three-dimensional position offset in space, and a corresponding offset is included for each skeleton joint point of the pedestrian.
  • In step S1203, each skeleton joint point of the pedestrian is voted for based on the marker data and the relative position of the pixel in the sub-image area.
  • Specifically, in this step the marker data of the matching pixel is used as the marker data of pixel a. Since the marker data contains the offsets of the pixel relative to the pedestrian's skeleton joint points, the position of each skeleton joint point of pedestrian A can be estimated based on the relative position of pixel a in the sub-image area and the marker data.
  • This process is in fact a voting process. Voting is a common method in the field of image processing (for example, voting is used in the classic Hough transform) and will not be described in detail here.
  • It should be noted that multiple matching pixels may be determined in step S1201. In that case, the pedestrian's skeleton joint points may be voted for based on the marker data of the multiple matching pixels and the relative position of pixel a in the sub-image area. More specifically, for example, the average of the marker data of the multiple matching pixels may be used as the marker data of pixel a to estimate the position of each skeleton joint point of pedestrian A.
  • The above, with reference to FIG. 4, describes the processing performed on, for example, pixel a in the sub-image area corresponding to pedestrian A when skeleton joint point extraction is performed for pedestrian A in the Nth frame depth image.
  • After the same processing has been performed on every pixel in the sub-image area, for each skeleton joint point of pedestrian A to be extracted, the votes of all pixels may be accumulated, and the point with the most votes is determined as that skeleton joint point by an algorithm such as mean shift. In this way, each skeleton joint point of pedestrian A can be extracted.
  • the above describes the extraction processing of the pedestrian skeleton joint point by taking the pedestrian A in the N-th depth image as an example.
  • the above-described processing is performed for each pedestrian in each frame depth image to extract its skeleton joint point.
  • the skeleton joint points extracted as described above can be optimized to eliminate the effects of errors that may exist during the voting process.
  • For example, for each pedestrian in each frame depth image, the extracted skeleton joint points may be optimized by a smoothing operation. Still taking pedestrian A in the Nth frame depth image as an example: after the skeleton joint points are extracted as above, the m preceding frames and n following frames of the Nth frame that contain pedestrian A are determined based on pedestrian A's tracking segment, and the skeleton joint points of pedestrian A in the Nth frame depth image are then optimized by, for example, a smoothing operation over pedestrian A's skeleton joint points in those preceding m and following n frame depth images.
  • In step S130, the posture of each pedestrian in each frame depth image is normalized to a posture at a predetermined viewing angle according to the extracted skeleton joint points.
  • As mentioned above, the viewing angles of pedestrians under different cameras may vary greatly, and at different times a pedestrian may face toward, away from, or sideways to the camera. On the one hand, these differences in viewing angle and posture reduce the comparability of images; on the other hand, they may make useful pedestrian attribute information unobtainable, thus affecting the accuracy of re-identification. Therefore, in this step, the extracted skeleton joint points are used to normalize the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle, thereby enhancing the comparability between images and increasing the useful attribute information that can be obtained, and in turn improving the accuracy of re-identification.
  • the posture of the pedestrian A can be normalized to the posture at the predetermined angle of view by the following processes (S1) and (S2):
  • (S1) Determine the movement direction of the pedestrian as his or her orientation. The movement direction of pedestrian A may be determined by calculating the difference between the position of each skeleton joint point of pedestrian A in the previous frame and the corresponding position in the current frame, and this movement direction is taken as the orientation of pedestrian A.
  • (S2) According to the orientation, normalize the pedestrian's posture to a posture at a predetermined viewing angle by applying a spatial coordinate transformation to the position coordinates of the pedestrian's skeleton joint points to obtain the normalized position coordinates of the skeleton joint points.
  • the predetermined viewing angle may be preset according to specific needs.
  • For example, in this embodiment, the predetermined viewing angle includes a first viewing angle and a second viewing angle: at the first viewing angle, the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian; at the second viewing angle, the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian.
  • FIG. 5 illustrates an exemplary schematic diagram of the first angle of view. As shown in Figure 5, the camera is perpendicular to the plane in which the pedestrian is located, i.e., the front of the pedestrian is facing the camera, and the camera is horizontally aligned with the tip of the nose of the pedestrian.
  • In this processing, it is determined from the orientation obtained in (S1) to which predetermined viewing angle the pedestrian's posture should be normalized. Specifically, if the pedestrian's orientation is within the range from 90° left of directly facing the camera to 90° right of directly facing the camera, the posture should be normalized to the posture at the first viewing angle; if the orientation is within the range from 90° left of directly facing away from the camera to 90° right of directly facing away, the posture should be normalized to the posture at the second viewing angle.
  • The above posture normalization can be realized by applying a spatial coordinate transformation to the position coordinates of the pedestrian's skeleton joint points. Specifically, in this processing, the position coordinates of the pedestrian's skeleton joint points are first transformed from the image coordinate system to the world coordinate system; the coordinate positions in the world coordinate system are then normalized; and finally the normalized coordinate positions in the world coordinate system are transformed back to the image coordinate system.
  • The above spatial coordinate transformation process may be implemented in any suitable manner in the art, and the present disclosure does not limit this. Below, merely for completeness of description, one possible spatial coordinate transformation process is outlined.
  • Transforming the position coordinates of the pedestrian's skeleton joint points from the image coordinate system to the world coordinate system can be realized by calibrating the intrinsic and extrinsic parameters of the camera to obtain the rotation and translation matrices for the coordinate transformation; this is a well-known technique in the art, and a detailed description is omitted here.
  • The normalization of the coordinate positions in the world coordinate system can be realized by constructing a normalization transformation matrix via the least-squares method. Taking the six skeleton joint points shown in FIG. 3 as an example, the chest-center joint point is used as the normalization reference point (other joint points could of course be chosen), and the coordinates of the chest-center joint before and after normalization are denoted x_2 and y_2, respectively. From the positional relationships between the skeleton joint points shown in FIG. 3, the normalized coordinates of the head, hand, and foot joints can each be inferred from y_2 together with parameters α_1, α_2, α_3, β_1, and β_2 that are preset based on human body proportions (the exact per-joint expressions appear as equation images in the original publication). An approximate solution for the normalization transformation matrix is then obtained by solving, via the least-squares method, the objective shown in expression (1), which, from the definitions below, can be written as

    min_A Σ_i ‖y_i − A·x_i‖²    (1)

where A is a 3×3 normalization transformation matrix, and x_i and y_i denote the coordinates of each skeleton joint point before and after normalization, respectively, both being three-dimensional vectors.
  • After the normalization transformation matrix A is constructed, the normalized coordinate position of each skeleton joint point in the world coordinate system is obtained by applying A to its coordinate position in the world coordinate system.
  • For example, the predetermined viewing angles may be set to include four viewing angles: in addition to the aforementioned first and second viewing angles, a third viewing angle with the right side of the body directly facing the camera and a fourth viewing angle with the left side directly facing the camera.
  • As another example, the predetermined viewing angles may be set to include six viewing angles, additionally including a fifth viewing angle facing the camera at 45° and a sixth viewing angle facing away from the camera at 45°.
  • In step S140, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization are extracted.
  • As is well known in the art, the semantics of an image is hierarchical and can be divided into low-level semantics, middle-level semantics, and high-level semantics.
  • Low-level semantics describe the visual features of images, such as color, texture, and shape; they are objective and can be obtained directly from the image without any external knowledge.
  • High-level semantics are abstract semantics obtained from high-level human cognition of the image, including scene semantics, behavioral semantics, and emotional semantics.
  • Middle-level semantic features were proposed to reduce the semantic gap between low-level and high-level semantic features; they can usually be generated on the basis of low-level semantic feature analysis, and correspond to visual bags-of-words and semantic topics.
  • In this step, optionally, various middle-level semantic attribute features of the pedestrian after posture normalization may be extracted, including at least the pedestrian's height in the real world.
  • Optionally, one or more of the pedestrian's low-level semantic features, face features, and motion features may also be extracted. The low-level semantic features may include color features, texture features, gradient features, and the like, as described above.
  • In this embodiment, as an example, the color features use three different color spaces, RGB, LUV, and YCbCr, and are represented as histograms; the texture features use the local binary pattern and are likewise represented as histograms; and the gradient features are obtained by applying the Sobel operator to the image and are also represented in histogram form.
  • The face features are used only when the pedestrian is normalized to the posture at the first viewing angle (i.e., the front of the pedestrian faces the camera); various face detection algorithms may be used to determine the specific location of the face and to locate the facial landmark points.
  • The motion features may be expressed by the change between the position coordinates of the pedestrian's posture-normalized skeleton joint points in the current frame depth image and those in the preceding several frames (for example, the preceding 10 frames).
  • In step S150, the target pedestrian is identified from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
  • In the preceding steps, the posture-normalized attribute features of each pedestrian in each frame depth image have been extracted; in this step, the target pedestrian can therefore be identified by comparing the attribute features of each pedestrian with the corresponding attribute features of the target pedestrian.
  • It should be noted that the corresponding attribute features of the target pedestrian are those extracted after the skeleton joint point extraction and posture normalization described above have been applied to the target pedestrian.
  • As noted above, a tracking segment includes at least data describing in which frame depth images of the depth video to be analyzed a pedestrian appears and the pedestrian's position in each such frame; therefore, in this step, all the distinct pedestrians appearing in the depth video can be determined from the tracking segment of each pedestrian in each frame depth image.
  • After all the distinct pedestrians appearing in the depth video have been determined, it can be decided whether the target pedestrian is among them. Specifically, for a pedestrian appearing in the depth video (possibly in multiple frame depth images of the depth video), if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least T frame depth images containing the pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold, the pedestrian is determined to be the target pedestrian.
  • The value of T can be set according to specific needs. For example, to reduce the amount of similarity computation and quickly determine whether the video contains the target pedestrian, T may be set to 1; then, for a given pedestrian, as soon as the attribute features extracted from one frame depth image containing the pedestrian have a similarity to the corresponding attribute features of the target pedestrian greater than the predetermined threshold, the pedestrian can be determined to be the target pedestrian, and no further similarity comparison with the target pedestrian is needed for the other depth images containing the pedestrian. Conversely, if the accuracy of pedestrian re-identification matters more than reducing the computation of similarity comparison, the value of T can be increased accordingly.
  • Optionally, when comparing corresponding attribute features, only pedestrians whose normalized posture is the same as that of the target pedestrian may be compared with the target pedestrian. Specifically, if the normalized posture of the target pedestrian is the posture at the first viewing angle, only those pedestrians in the depth video to be analyzed whose normalized posture is also the posture at the first viewing angle are compared with the target pedestrian, which reduces the amount of similarity computation.
  • As noted above, multiple attribute features of a pedestrian may be extracted from the depth images; therefore, when comparing similarity with the corresponding attribute features of the target pedestrian, each attribute feature of the pedestrian may be compared separately with the corresponding feature of the target pedestrian to obtain per-feature similarities, and the overall similarity is then determined by, for example, computing a weighted average.
  • The weight of each feature may be set according to the specific situation; for example, optionally, the face features may be given the largest weight, followed by the low-level semantic features, then the middle-level semantic features, with the pedestrian motion features weighted least.
  • After a pedestrian has been determined to be the target pedestrian as above, the frame depth images of the depth video to be analyzed that contain that pedestrian can be determined based on the pedestrian's tracking segment, thereby realizing re-identification of the target pedestrian.
  • Optionally, after the depth video to be analyzed is determined to contain the target pedestrian and the target pedestrian has been identified, continuity verification in the spatio-temporal domain may further be performed to verify the re-identification result.
  • Various appropriate verification methods may be used for the spatio-temporal continuity verification. For example, the features of a pedestrian should generally be similar between adjacent frames; if the features of the pedestrian differ too much between adjacent frame depth images finally determined to contain the target pedestrian, the re-identification result may be considered problematic, and re-identification may need to be performed again.
  • the pedestrian re-identification method according to an embodiment of the present disclosure is described above with reference to the accompanying drawings, by which a target pedestrian can be identified from a depth video to be analyzed from a certain camera.
  • When there are many depth videos to be analyzed from multiple different cameras, the target pedestrian can be identified from them by performing the re-identification method on each depth video to be analyzed.
  • Optionally, when there are many depth videos to be analyzed from multiple different cameras, spatio-temporal domain analysis may be performed in advance to reduce the computation of pedestrian re-identification, so that the target pedestrian can be quickly located across the multiple videos.
  • The spatio-temporal domain analysis can be performed in a variety of suitable ways. For example, if the target pedestrian is determined to be present in a depth video to be analyzed from a certain camera, spatio-temporal continuity implies that the target pedestrian should next appear in the area near that camera, so subsequent re-identification of the pedestrian may be performed only on the depth videos to be analyzed from cameras near that camera.
  • As described above, the pedestrian re-identification method according to the embodiments of the present disclosure uses depth video for target pedestrian identification, which effectively utilizes the depth information of pedestrians in images and videos to reduce the influence of lighting conditions; moreover, normalizing the pedestrians' postures reduces the influence of differing viewing angles across cameras and of the incomplete information caused by pedestrians facing away from or sideways toward the camera, thereby improving the accuracy of pedestrian re-identification.
  • FIG. 6 shows an exemplary structural block diagram of a pedestrian re-identification device 600 in accordance with an embodiment of the present disclosure.
  • As shown in FIG. 6, the pedestrian re-identification device 600 may include a detection device 610, a skeleton extraction device 620, a normalization device 630, a feature extraction device 640, and an identification device 650, which may respectively perform the corresponding steps/functions of the pedestrian re-identification method described above in connection with FIG. 1. Only the main functions of each component of the pedestrian re-identification device 600 are described below; details already described above are omitted.
  • Detection device 610 can detect pedestrians in each frame depth image of the depth video.
  • the depth video is the depth video to be analyzed from which the target pedestrian needs to be identified, which is taken at a certain moment by a single depth camera different from the depth camera of the target pedestrian.
  • the detecting device 610 can detect pedestrians from each frame depth image of the depth video to be analyzed by any suitable image detecting technology in the art, which is not limited in the present disclosure.
  • Optionally, the detection device 610 may track each pedestrian detected in each frame depth image to determine in which other frames of the depth video to be analyzed the pedestrian appears, and to determine the pedestrian's position in those frames.
  • the skeleton extraction device 620 can perform skeleton joint point extraction for each pedestrian in each frame depth image.
  • the skeleton joint points can well describe the posture of the pedestrian, and the specific number can be set as needed. As mentioned above, there are six skeleton joint points, which represent the head, left hand, right hand, chest center, left foot and right foot.
  • the skeleton extraction device 620 may further include a matching unit, a marker extraction unit, a voting unit, and a joint point extraction unit.
  • Taking pedestrian A in the Nth frame depth image as an example: for each pixel in the sub-image area corresponding to pedestrian A, the matching unit determines a matching pixel in a pre-established training set; the training set includes a plurality of pedestrian depth images, in each of which the skeleton joint points of the pedestrian are marked in advance. The matching pixel may be determined based on the pixel's feature description and the pixel's relative position in the sub-image area, where the feature description may be any suitable feature for describing a pixel.
  • the marker extracting unit extracts, for each of the pixels, marker data of the matching pixel that matches the pixel, the marker data including an offset of the matched pixel relative to the skeleton joint point of the pedestrian in the pedestrian depth image in which it is located.
  • the marker data is pre-assigned when the training set is established, wherein the offset may be a three-dimensional position offset in space and includes a corresponding offset for each skeleton joint point of the pedestrian.
  • The voting unit votes for each of the pixels. Specifically, taking the voting for pixel a as an example, the voting unit votes for the pedestrian's skeleton joint points based on the marker data of the matching pixel corresponding to pixel a and the relative position of pixel a in the sub-image area. More specifically, the voting unit uses the marker data of the matching pixel as the marker data of pixel a; since the marker data contains the offsets of the pixel relative to the pedestrian's skeleton joint points, the position of each skeleton joint point of pedestrian A can be estimated based on the relative position of pixel a in the sub-image area and the marker data. This process is in fact a voting process.
  • It should be noted that there may be multiple matching pixels; in that case, the voting unit may use, for example, the average of the marker data of the multiple matching pixels as the marker data of pixel a, and thereby estimate the position of each skeleton joint point of pedestrian A.
  • For each skeleton joint point of pedestrian A to be extracted, the joint point extraction unit may accumulate the votes cast by the voting unit for each pixel and determine the point with the most votes as that skeleton joint point. In this way, each skeleton joint point of pedestrian A can be extracted.
  • The extraction of pedestrian skeleton joint points is described above taking pedestrian A in the Nth frame depth image as an example; the skeleton extraction device 620 performs the above operations for each pedestrian in each frame depth image to extract its skeleton joint points.
  • Optionally, the skeleton extraction device 620 may further include a smoothing unit for performing a smoothing operation on the extracted skeleton joint points of each pedestrian in each frame depth image, so as to eliminate the effects of errors that may exist in the voting process.
  • the normalization means 630 may normalize the pose of each pedestrian in each frame depth image to the attitude at a predetermined angle of view based on the extracted skeleton joint points. Specifically, the normalization device 630 may further include an orientation determining unit and a normalization unit. Next, the processing performed by the normalization device 630 will be described by taking the pedestrian A in the Nth frame depth image as an example.
  • The orientation determining unit determines the movement direction of pedestrian A as the pedestrian's orientation. Specifically, the orientation determining unit may determine the movement direction of pedestrian A by calculating the difference between the position of each skeleton joint point of pedestrian A in the previous frame and the corresponding position in the current frame, and take that movement direction as the orientation of pedestrian A.
  • According to the orientation determined by the orientation determining unit, the normalization unit normalizes the pedestrian's posture to a posture at a predetermined viewing angle by applying a spatial coordinate transformation to the position coordinates of pedestrian A's skeleton joint points to obtain the normalized position coordinates of the skeleton joint points.
  • the predetermined viewing angle may be preset according to specific needs.
  • For example, in this embodiment, the predetermined viewing angle includes a first viewing angle and a second viewing angle: at the first viewing angle, the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian; at the second viewing angle, the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian.
  • The normalization unit determines, based on the orientation determined by the orientation determining unit, to which posture at a predetermined viewing angle the pedestrian's posture should be normalized.
  • Specifically, if the orientation determining unit determines that the pedestrian's orientation is within the range from 90° left of directly facing the camera to 90° right of directly facing the camera, the pedestrian's posture should be normalized to the posture at the first viewing angle; if it determines that the orientation is within the range from 90° left of directly facing away from the camera to 90° right of directly facing away, the posture should be normalized to the posture at the second viewing angle.
  • the normalization of the above posture can be realized by spatial coordinate transformation of the position coordinates of the skeleton joint points of the pedestrian. Specifically, the normalization unit first transforms the position coordinates of the skeleton joint point of the pedestrian from the image coordinate system to the world coordinate system, and then normalizes the coordinate position in the world coordinate system, and finally normalizes the world coordinates. The coordinate position in the system is transformed back to the image coordinate system.
  • the above spatial coordinate transformation process can be implemented in any suitable manner in the art, and will not be described in detail herein.
  • the feature extraction means 640 may extract, for each pedestrian in each frame of the depth image, the attribute of the pedestrian after the gesture is normalized.
  • For example, the feature extraction device 640 may extract various middle-level semantic attribute features of the pedestrian after posture normalization, including at least the pedestrian's height in the real world. Optionally, the feature extraction device 640 may also extract one or more of the pedestrian's low-level semantic features, face features, and motion features.
  • The identification device 650 can identify the target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian. Since the feature extraction device 640 has extracted the posture-normalized attribute features of each pedestrian in each frame depth image, the identification device 650 can identify the target pedestrian by comparing the attribute features of each pedestrian with the corresponding attribute features of the target pedestrian. It should be noted that the corresponding attribute features of the target pedestrian are those extracted after the skeleton joint point extraction and posture normalization described above have been applied to the target pedestrian.
  • It can be understood that the same pedestrian may appear in multiple frame depth images of the depth video to be analyzed; the identification device 650 therefore does not need to compare the attribute features of every pedestrian in every frame with the corresponding attribute features of the target pedestrian, but only needs to compare the attribute features of each distinct pedestrian in the depth video with those of the target pedestrian.
  • Specifically, the identification device 650 can determine all the distinct pedestrians appearing in the depth video according to the tracking segment of each pedestrian in each frame depth image.
  • After the distinct pedestrians are determined, the identification device 650 decides whether the target pedestrian is among them. Specifically, for a pedestrian appearing in the depth video (possibly in multiple frame depth images of the depth video), if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least T frame depth images containing the pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold, the identification device 650 determines that the pedestrian is the target pedestrian.
  • the value of T can be set according to specific needs.
  • the identifying device 650 may only compare the similarity between the pedestrian and the target pedestrian having the same normalized posture as the target pedestrian. Specifically, if the posture after the normalization of the target pedestrian is the posture at the first perspective, only the pedestrian after the normalization in the depth video to be analyzed is similar to the pedestrian at the first perspective is similar to the target pedestrian. The degree of comparison makes it possible to reduce the amount of calculation of the similarity comparison.
  • As noted above, multiple attribute features of a pedestrian may be extracted from the depth images; therefore, when performing similarity comparison with the corresponding attribute features of the target pedestrian, the identification device 650 may compare each attribute feature of the pedestrian separately with the corresponding feature of the target pedestrian to obtain per-feature similarities, and then determine the overall similarity by, for example, computing a weighted average.
  • the weight of each feature can be set according to the specific situation.
  • After a pedestrian is determined to be the target pedestrian, the identification device 650 can determine, based on that pedestrian's tracking segment, the frame depth images of the depth video to be analyzed that contain the pedestrian, thereby realizing re-identification of the target pedestrian.
  • the pedestrian re-identification device 600 according to an embodiment of the present disclosure is described above with reference to the accompanying drawings, by which a target pedestrian can be identified from a depth video to be analyzed from a certain camera.
  • When there are many depth videos to be analyzed from multiple different cameras, the target pedestrian can be identified from them by applying the pedestrian re-identification device 600 to re-identification on each depth video to be analyzed.
  • Optionally, when there are many depth videos to be analyzed from multiple different cameras, the pedestrian re-identification device 600 may perform spatio-temporal domain analysis in advance to reduce the computation of pedestrian re-identification, thereby quickly locating the target pedestrian across the multiple videos.
  • As described above, the pedestrian re-identification device 600 uses depth video for target pedestrian identification, which effectively utilizes the depth information of pedestrians in images and videos to reduce the influence of lighting conditions; moreover, normalizing the pedestrians' postures reduces the influence of differing camera viewing angles and of the incomplete information caused by pedestrians facing away from or sideways toward the camera, thereby improving the accuracy of pedestrian re-identification.
  • FIG. 7 shows a computing device that can be used to implement embodiments of the present disclosure; the computing device may be a computer or a server equipped with a depth camera.
  • As shown in FIG. 7, computing device 700 includes one or more processors 702, a storage device 704, a depth camera 706, and an output device 708, interconnected by a bus system 710 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of computing device 700 shown in FIG. 7 are merely exemplary and not limiting, and computing device 700 may have other components and structures as desired.
  • Processor 702 can be a central processing unit (CPU) or other form of processing unit with data processing capabilities and/or instruction execution capabilities, and can control other components in computing device 700 to perform the desired functions.
  • Storage device 704 can include one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include, for example, a random access memory (RAM) and/or a cache or the like.
  • the nonvolatile memory may include, for example, a read only memory (ROM), a hard disk, a flash memory, or the like.
  • One or more computer program instructions may be stored on the computer readable storage medium, and the processor 702 may execute the program instructions to implement the functions of the embodiments of the present disclosure described above and/or other desired functions.
  • Various applications and various data may also be stored in the computer readable storage medium, such as the depth videos, the position of each pedestrian detected in each frame depth image, the tracking segments of pedestrians, the skeleton joint points extracted for each pedestrian in each frame depth image, the matching pixel of each pixel, the pre-established training set, the voting result of each pixel, the orientation of each pedestrian in each frame depth image, the normalized position coordinates of the skeleton joint points, the attribute features extracted for each pedestrian in each frame depth image, the skeleton joint points of the target pedestrian, the attribute features of the target pedestrian, and the like.
  • the depth camera 706 is used to capture the depth video to be analyzed, and the captured depth video is stored in the storage device 704 for use by other components.
  • The depth video may also be captured using other photographing devices, and the captured depth video may then be transmitted to the computing device 700; in this case, the depth camera 706 can be omitted.
  • the output device 708 can output various information such as image information, sound information, pedestrian recognition results to the outside (eg, a user), and can include one or more of a display, a speaker, and the like.
  • In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product for performing pedestrian re-identification.
  • The computer program product comprises a computer readable storage medium on which computer program instructions are stored; the computer program instructions are executable by a processor to cause the processor to: detect pedestrians in each frame depth image of a depth video; perform skeleton joint point extraction for each pedestrian in each frame depth image; normalize, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extract, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identify the target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a pedestrian re-identification method, device, and computer program product. The method includes: detecting pedestrians in each frame depth image of a depth video; performing skeleton joint point extraction for each pedestrian in each frame depth image; normalizing, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extracting, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identifying a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian. The method, device, and computer program product improve the accuracy of pedestrian re-identification across different background environments and multi-camera settings.

Description

Pedestrian re-identification method and device
Technical Field
The present disclosure relates to image processing, and in particular to a pedestrian re-identification method, device, and computer program product.
Background
Person re-identification refers to identifying a target pedestrian from a pedestrian image library or video streams originating from multiple non-overlapping camera fields of view. Unlike ordinary pedestrian tracking under a single camera, pedestrian re-identification enables long-term tracking and monitoring of a specific pedestrian across different background environments and multi-camera settings, and therefore has very broad application prospects in the field of surveillance. For example, re-identification of a shopper in a mall makes it possible to track the shopper's trajectory across multiple cameras, and in turn to analyze and gather statistics on his or her possible purchasing behavior. As another example, in an intelligent video surveillance system, pedestrian re-identification technology can automatically identify the target pedestrian and report to the operator of the surveillance system, so that the operator does not need to perform time-consuming and laborious manual observation and recognition.
At present, pedestrian re-identification is usually performed based on low-level information such as the color and texture of pedestrians in images or videos, and its performance is often unsatisfactory. The main reasons are: pedestrians may appear at very different viewing angles under different cameras; the areas covered by different cameras often do not overlap; lighting conditions may differ between camera locations, so the appearance of the same object may vary greatly across cameras; and a pedestrian may walk with his or her back or side toward the camera, so that face information cannot be captured, or, even when it can be captured, the face cannot be seen clearly because surveillance cameras usually have low resolution.
Summary
According to one aspect of the present disclosure, a pedestrian re-identification method is provided, including: detecting pedestrians in each frame depth image of a depth video; performing skeleton joint point extraction for each pedestrian in each frame depth image; normalizing, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extracting, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identifying a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
According to another aspect of the present disclosure, a pedestrian re-identification device is provided, including: a processor; a memory; and computer program instructions stored in the memory. The computer program instructions, when executed by the processor, perform the steps of: detecting pedestrians in each frame depth image of a depth video; performing skeleton joint point extraction for each pedestrian in each frame depth image; normalizing, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extracting, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identifying a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
According to another aspect of the present disclosure, a computer program product for pedestrian re-identification is provided, including a computer readable storage medium on which computer program instructions are stored, the computer program instructions being executable by a processor to cause the processor to: detect pedestrians in each frame depth image of a depth video; perform skeleton joint point extraction for each pedestrian in each frame depth image; normalize, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; extract, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and identify a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
According to another aspect of the present disclosure, a pedestrian re-identification device is provided, including: a detection device configured to detect pedestrians in each frame depth image of a depth video; a skeleton extraction device configured to perform skeleton joint point extraction for each pedestrian in each frame depth image; a normalization device configured to normalize, according to the extracted skeleton joint points, the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle; a feature extraction device configured to extract, for each pedestrian in each frame depth image, the attribute features of the pedestrian after posture normalization; and an identification device configured to identify a target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
The method, device, and computer program product according to the above aspects of the present disclosure effectively utilize the depth information of pedestrians in images and videos, greatly improving the accuracy of pedestrian re-identification across different background environments and multi-camera settings.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following more detailed description of embodiments of the present disclosure taken in conjunction with the accompanying drawings. The drawings provide further understanding of the embodiments of the present disclosure, constitute part of the specification, serve together with the embodiments to explain the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.
FIG. 1 shows a schematic flowchart of a pedestrian re-identification method according to an embodiment of the present disclosure.
FIG. 2 illustrates an exemplary sub-image region obtained after segmenting the foreground region.
FIG. 3 shows a schematic skeleton joint point distribution of a pedestrian.
FIG. 4 shows the processing performed on each pixel in the sub-image region corresponding to a certain pedestrian in a certain frame depth image when skeleton joint point extraction is performed for that pedestrian.
FIG. 5 illustrates an exemplary schematic diagram of a predetermined shooting viewing angle.
FIG. 6 shows an exemplary structural block diagram of a pedestrian re-identification device according to an embodiment of the present disclosure.
FIG. 7 shows a block diagram of an exemplary computing device for implementing embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
As mentioned above, pedestrian re-identification based on low-level information such as the color and texture of pedestrians in images or videos is often unsatisfactory. In view of this, the present disclosure makes effective use of the depth information of pedestrians in images or videos for pedestrian re-identification. More specifically, depth images are used for pedestrian re-identification in the present disclosure. As is well known in the art, a depth image is an image in which the value of each pixel represents the distance between a point in the scene and the camera. Compared to grayscale (color) images, depth images carry the depth (distance) information of objects and are unaffected by lighting conditions, and are therefore suitable for various applications requiring stereo information or scene changes.
Next, a pedestrian re-identification method according to an embodiment of the present disclosure is described with reference to FIG. 1.
As shown in FIG. 1, in step S110, pedestrians are detected in each frame depth image of the depth video.
As mentioned above, unlike ordinary pedestrian tracking and recognition under a single camera, the pedestrian re-identification technique according to the present disclosure can be applied where background environments differ and multiple cameras are used for shooting. More specifically, according to the pedestrian re-identification technique of the present disclosure, the target depth video containing the target pedestrian to be recognized and the depth video to be analyzed, from which the target pedestrian needs to be identified, may be shot by different cameras, or by a single camera at different times (against different backgrounds).
The depth video described in this step is the depth video to be analyzed, from which the target pedestrian needs to be identified; it is shot at a certain moment by a single depth camera different from the depth camera that captured the target pedestrian. Optionally, the depth camera that captures the depth video to be analyzed is configured in the same manner as the depth camera that captured the target pedestrian; for example, the depth cameras are all mounted at heights greater than 2 meters and shoot at a downward-looking angle.
In this step, pedestrians can be detected in each frame depth image of the depth video to be analyzed by any suitable image detection technique in the art, which the present disclosure does not limit. Below, merely for completeness of description, one possible detection method is briefly outlined.
Specifically, in this step, for each frame depth image, the foreground region is first determined according to the value of each pixel in the image. The foreground region is the region whose depth differs from the scene depth obtained by background modeling. The process of obtaining the foreground region is well known in the art, and a detailed description is omitted here. The foreground region is then segmented based on the depth information to obtain multiple sub-image regions. Here, common methods in the art, such as connected component analysis (CCA) and pedestrian body detection methods (e.g., "Integral Channel Features" published by P. Dollar, Z. Tu, P. Perona, and S. Belongie at BMVC 2009), may be used to segment the foreground region so that each resulting sub-image region contains one pedestrian, thereby determining the specific position of each pedestrian in the current frame depth image. FIG. 2 illustrates an exemplary sub-image region obtained after segmenting the foreground region. As illustrated in FIG. 2, the sub-image region is represented by a rectangular box circumscribing the body contour of the detected pedestrian.
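As an illustration of this detection step, the following is a minimal sketch, assuming OpenCV and NumPy, a 16-bit depth frame, and a background depth map already obtained by background modeling; the function name, the depth-difference threshold, and the minimum blob area are illustrative choices, not values from the disclosure.

```python
import cv2
import numpy as np

def detect_pedestrians(depth_frame, background_depth, diff_thresh=200, min_area=1500):
    """Segment foreground blobs from a depth frame as pedestrian candidates.

    A pixel is foreground when its depth differs from the modeled scene
    depth by more than diff_thresh (depth units, e.g. millimeters); the
    foreground is then split into connected components, and components of
    sufficient area are returned as circumscribing rectangles.
    """
    diff = np.abs(depth_frame.astype(np.int32) - background_depth.astype(np.int32))
    fg_mask = (diff > diff_thresh).astype(np.uint8)

    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(fg_mask)
    boxes = []
    for label in range(1, num_labels):          # label 0 is the background
        x, y, w, h, area = stats[label]
        if area >= min_area:                    # discard small noise blobs
            boxes.append((x, y, w, h))
    return boxes
```

In practice each returned rectangle would still be confirmed by a pedestrian body detector, as noted above.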
Optionally, each pedestrian detected in each frame depth image may be tracked to determine in which other frames of the depth video to be analyzed the pedestrian appears, and to determine the pedestrian's position in those frames. As mentioned above, the depth video to be analyzed is shot by a single depth camera at a certain moment, so the tracking here is tracking under a single camera; it may be performed by various commonly used methods in the field, such as the Hungarian algorithm or the method of "Continuous energy minimization for multi-target tracking" published by A. Milan, S. Roth, and K. Schindler in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, to obtain a tracking segment for each pedestrian. The tracking segment includes at least data describing in which frame depth images of the depth video to be analyzed the pedestrian appears and the pedestrian's position in each such frame depth image.
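A tracking segment can be built by associating detections frame to frame. The sketch below uses the Hungarian algorithm via SciPy's linear_sum_assignment on box-center distances; the cost definition and the distance gate are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev_centers, curr_centers, max_dist=80.0):
    """Associate detections in consecutive frames (Hungarian algorithm).

    Inputs are arrays of (x, y) box centers. Returns (prev_idx, curr_idx)
    pairs; assignments farther apart than max_dist pixels are rejected.
    """
    prev = np.asarray(prev_centers, dtype=float)
    curr = np.asarray(curr_centers, dtype=float)
    if len(prev) == 0 or len(curr) == 0:
        return []
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```

Chaining such matches over frames yields, for each pedestrian, a tracking segment of (frame index, position) pairs.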
Returning to FIG. 1, in step S120, skeleton joint point extraction is performed for each pedestrian in each frame depth image.
The skeleton joint points can describe the pedestrian's posture well, and their number can be set as needed; for example, it can be set to the 20 defined in Microsoft Kinect, or the 15 defined in OpenNI, and so on. Here, for simplicity, as shown in FIG. 3, six skeleton joint points are used, representing the head, the left hand, the right hand, the chest center, the left foot, and the right foot.
Next, the skeleton joint point extraction processing in step S120 is described in detail with reference to FIG. 4. FIG. 4 shows the processing performed on each pixel in the sub-image region corresponding to a certain pedestrian (for example, pedestrian A) in a certain frame depth image (for example, the Nth frame) when skeleton joint point extraction is performed for that pedestrian in that frame.
As shown in FIG. 4, in step S1201, a matching pixel in a pre-established training set that matches the current pixel (for example, pixel a) is determined; the training set contains multiple pedestrian depth images, and the skeleton joint points of the pedestrian are marked in advance in each pedestrian depth image.
The matching pixel may be determined based on the pixel's feature description and the pixel's relative position in the sub-image region. Specifically, various conventional methods in the art, such as a random forest algorithm or a hashing algorithm, may be used to compare the feature description of pixel a and its position in the sub-image region with the corresponding features of the pixels in the training set, thereby finding the matching pixel in the training set.
The feature description may be any suitable feature for describing a pixel. For example, the depth value of each neighboring pixel within the 3×3 range around pixel a may be compared with the depth value of pixel a: if it is greater, the neighboring pixel is assigned the value 1, otherwise 0; the vector formed by combining the values assigned to the neighboring pixels within the 3×3 range then serves as the feature description of pixel a. As another example, the depth value of pixel a may simply be used as its feature description.
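The 3×3 comparison feature described above can be computed as follows; this is a direct sketch of that scheme (border pixels are assumed to be handled by the caller).

```python
import numpy as np

def pixel_feature(depth, r, c):
    """8-bit binary descriptor of the pixel at (r, c).

    Each of the 8 neighbors in the surrounding 3x3 window is compared with
    the center depth value: 1 if the neighbor's depth is greater, else 0.
    """
    center = depth[r, c]
    bits = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                      # skip the center pixel itself
            bits.append(1 if depth[r + dr, c + dc] > center else 0)
    return np.array(bits, dtype=np.uint8)
```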
In step S1202, the marker data of the matching pixel is extracted; the marker data includes the offsets of the matching pixel relative to the skeleton joint points of the pedestrian in the pedestrian depth image in which it is located.
The marker data is assigned in advance when the training set is established; the offset may be a three-dimensional position offset in space, and a corresponding offset is included for each skeleton joint point of the pedestrian.
In step S1203, each skeleton joint point of the pedestrian is voted for based on the marker data and the relative position of the pixel in the sub-image region.
Specifically, in this step the marker data of the matching pixel is used as the marker data of pixel a. Since the marker data contains the offsets of the pixel relative to the pedestrian's skeleton joint points, the position of each skeleton joint point of pedestrian A can be estimated based on the relative position of pixel a in the sub-image region and the marker data. This is in fact a voting process; voting is a common method in the field of image processing (for example, voting is used in the classic Hough transform) and is not described in detail here.
It should be noted that multiple matching pixels may be determined in step S1201. In that case, the pedestrian's skeleton joint points may be voted for based on the marker data of the multiple matching pixels and the relative position of pixel a in the sub-image region; more specifically, for example, the average of the marker data of the multiple matching pixels may be used as the marker data of pixel a to estimate the position of each skeleton joint point of pedestrian A.
The above describes, with reference to FIG. 4, the processing performed on, for example, pixel a in the sub-image region corresponding to pedestrian A when skeleton joint point extraction is performed for pedestrian A in, for example, the Nth frame depth image. After the same processing has been performed on every pixel in the sub-image region as described above, for each skeleton joint point of pedestrian A to be extracted, the votes of all pixels may be accumulated, and the point with the most votes is determined as that skeleton joint point by an algorithm such as mean shift. In this way, each skeleton joint point of pedestrian A can be extracted.
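The vote accumulation can be summarized as follows: each pixel contributes one estimated position per joint (its own position plus the matched offset), and a mean-shift-style iteration locates the densest cluster of votes. The bandwidth value and the convergence scheme below are illustrative assumptions.

```python
import numpy as np

def locate_joint(votes, bandwidth=0.1, n_iter=20):
    """Find the most-voted location for one joint from an (N, 3) vote array.

    Simple mean shift: start at the mean of all votes, then repeatedly
    re-center on the mean of the votes lying within `bandwidth` of the
    current center, converging toward the densest vote cluster.
    """
    votes = np.asarray(votes, dtype=float)
    center = votes.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(votes - center, axis=1)
        inside = votes[dist < bandwidth]
        if len(inside) == 0:
            break
        new_center = inside.mean(axis=0)
        if np.allclose(new_center, center):
            break                             # converged on the mode
        center = new_center
    return center
```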
The above describes the extraction of pedestrian skeleton joint points taking pedestrian A in the Nth frame depth image as an example. In step S120, the above processing is performed for each pedestrian in each frame depth image to extract its skeleton joint points.
Optionally, the skeleton joint points extracted as above may be optimized to eliminate the effects of errors that may arise in the voting process. For example, for each pedestrian in each frame depth image, the extracted skeleton joint points may be optimized by a smoothing operation. Still taking pedestrian A in the Nth frame depth image as an example: after its skeleton joint points are extracted as above, the m preceding frames and n following frames of the Nth frame that contain pedestrian A can be determined based on pedestrian A's tracking segment, and the skeleton joint points of pedestrian A in the Nth frame depth image are then optimized by, for example, a smoothing operation over pedestrian A's skeleton joint points in those preceding m and following n frame depth images.
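The smoothing over the preceding m and following n frames could look like the following moving-average sketch (any smoothing filter could be substituted; the window shape is the only assumption made here).

```python
import numpy as np

def smooth_joints(joint_track, m=2, n=2):
    """Temporally smooth a (T, J, 3) array of per-frame joint positions.

    Each frame's joints are replaced by the average over the m preceding
    and n following frames of the same tracked pedestrian.
    """
    track = np.asarray(joint_track, dtype=float)
    out = np.empty_like(track)
    T = len(track)
    for t in range(T):
        lo, hi = max(0, t - m), min(T, t + n + 1)
        out[t] = track[lo:hi].mean(axis=0)    # window mean around frame t
    return out
```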
Returning to FIG. 1, in step S130, the posture of each pedestrian in each frame depth image is normalized to a posture at a predetermined viewing angle according to the extracted skeleton joint points.
As mentioned above, in a multi-camera setting the viewing angles of pedestrians under different cameras may differ greatly, and at different times a pedestrian may face toward, away from, or sideways to the camera. On the one hand, these differences in viewing angle and posture reduce the comparability of images; on the other hand, they may make useful pedestrian attribute information unobtainable, thus affecting the accuracy of re-identification. Therefore, in this step, the extracted skeleton joint points are used to normalize the posture of each pedestrian in each frame depth image to a posture at a predetermined viewing angle, thereby enhancing the comparability between images and increasing the useful attribute information that can be obtained, and in turn improving the accuracy of re-identification.
Still taking pedestrian A in the Nth frame depth image as an example, in this step the posture of pedestrian A can be normalized to a posture at a predetermined viewing angle through the following processing (S1) and (S2):
(S1) Determine the movement direction of the pedestrian as his or her orientation.
In this processing, the movement direction of pedestrian A may be determined by calculating the difference between the position of each skeleton joint point of pedestrian A in the previous frame and the corresponding position in the current frame, and this movement direction is taken as the orientation of pedestrian A (a minimal sketch of this computation is given below).
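A sketch of this orientation estimate follows; which axes span the ground plane depends on the world coordinate convention, so the (x, z) choice below is an assumption.

```python
import numpy as np

def pedestrian_orientation(prev_joints, curr_joints):
    """Movement direction of a pedestrian as a unit vector in the ground plane.

    Inputs are (J, 3) joint-position arrays from two consecutive frames;
    per-joint displacements are averaged and the vertical axis dropped.
    """
    disp = np.asarray(curr_joints, float) - np.asarray(prev_joints, float)
    mean_disp = disp.mean(axis=0)
    heading = np.array([mean_disp[0], mean_disp[2]])  # assume y is vertical
    norm = np.linalg.norm(heading)
    return heading / norm if norm > 1e-6 else heading
```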
(S2) According to the orientation, normalize the pedestrian's posture to a posture at a predetermined viewing angle by applying a spatial coordinate transformation to the position coordinates of the pedestrian's skeleton joint points to obtain the normalized position coordinates of the skeleton joint points.
The predetermined viewing angle may be preset according to specific needs. For example, in this embodiment, the predetermined viewing angle includes a first viewing angle and a second viewing angle: at the first viewing angle, the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian; at the second viewing angle, the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian. FIG. 5 illustrates an exemplary schematic diagram of the first viewing angle. As shown in FIG. 5, the camera is perpendicular to the plane in which the pedestrian is located, i.e., the front of the pedestrian directly faces the camera, and the camera is horizontally aligned with the tip of the pedestrian's nose.
In this processing, it is determined from the orientation obtained in processing (S1) to which predetermined viewing angle the pedestrian's posture should be normalized. Specifically, if the orientation determined in (S1) is within the range from 90° left of directly facing the camera to 90° right of directly facing the camera, the posture should be normalized to the posture at the first viewing angle; if the orientation is within the range from 90° left of directly facing away from the camera to 90° right of directly facing away, the posture should be normalized to the posture at the second viewing angle.
The above posture normalization can be realized by applying a spatial coordinate transformation to the position coordinates of the pedestrian's skeleton joint points. Specifically, in this processing, the position coordinates of the pedestrian's skeleton joint points are first transformed from the image coordinate system to the world coordinate system; the coordinate positions in the world coordinate system are then normalized; and finally the normalized coordinate positions in the world coordinate system are transformed back to the image coordinate system. The spatial coordinate transformation may be implemented in any suitable manner in the art, and the present disclosure does not limit this. Below, merely for completeness of description, one possible spatial coordinate transformation process is outlined.
Transforming the position coordinates of the pedestrian's skeleton joint points from the image coordinate system to the world coordinate system can be realized by calibrating the intrinsic and extrinsic parameters of the camera to obtain the rotation and translation matrices for the coordinate transformation; this is a well-known technique in the art, and a detailed description is omitted here.
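For concreteness, one common form of this transformation is sketched below for a pinhole model with intrinsics K and extrinsics (R, t) mapping world to camera coordinates; the exact convention used in a given calibration toolchain may differ.

```python
import numpy as np

def image_to_world(u, v, depth, K, R, t):
    """Back-project an image point with measured depth to world coordinates.

    K: 3x3 intrinsic matrix; R, t: extrinsics with X_cam = R @ X_world + t.
    The depth is taken as the z coordinate in the camera frame.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # normalized camera ray
    x_cam = depth * ray                             # scale so z == depth
    return np.linalg.inv(R) @ (x_cam - t)           # invert rigid transform
```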
Normalizing the coordinate positions in the world coordinate system may be achieved by constructing a normalization transformation matrix using the least squares method. Taking the six skeleton joints shown in Fig. 3 as an example, the chest-center joint is taken as the normalization reference point (other joints may of course be chosen), and the coordinates of the chest-center joint before and after normalization are denoted by x_2 and y_2, respectively; y_2 is fixed by the normalization [the original equation image defining y_2 is not reproduced here]. From the positional relations between the skeleton joints shown in Fig. 3, the normalized coordinates of the other joints can then be inferred: the normalized coordinate y_1 of the head joint, the normalized coordinate y_3 of the left-hand joint, the normalized coordinate y_4 of the right-hand joint, and the normalized coordinates y_5 and y_6 of the left-foot and right-foot joints are each obtained from y_2 by a fixed offset determined by the body-proportion parameters α_1, α_2, α_3, β_1 and β_2 [the original equation images for y_1 and y_3 through y_6 are not reproduced here]. A plausible form, assuming an upright body with the offsets expressed in the normalized frame, is

  y_1 = y_2 + (0, α_1, 0)^T,  y_3 = y_2 + (−β_1, −α_2, 0)^T,  y_4 = y_2 + (β_1, −α_2, 0)^T,
  y_5 = y_2 + (−β_2, −α_3, 0)^T,  y_6 = y_2 + (β_2, −α_3, 0)^T,
where α_1, α_2, α_3, β_1 and β_2 are parameters set in advance based on human body proportions. In this way, an approximate solution of the normalization transformation matrix can be obtained by solving the objective equation shown in expression (1) by the least squares method:

  A* = argmin_A Σ_{i=1..6} ‖A x_i − y_i‖²    (1)

where A is the 3×3 normalization transformation matrix, and x_i and y_i denote the coordinates of each skeleton joint before and after normalization, respectively, x_i and y_i both being three-dimensional vectors.
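With the joint coordinates before and after normalization stacked as rows, the matrix A of expression (1) can be approximated by an ordinary least squares solve, for instance:

    import numpy as np

    def solve_normalization_matrix(X, Y):
        """X, Y: arrays of shape (6, 3) holding each skeleton joint's world
        coordinates before (x_i) and after (y_i) normalization. Solves
        min_A sum_i ||A x_i - y_i||^2, i.e. X @ A.T ~ Y in least squares."""
        A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ A_T = Y
        return A_T.T                                 # 3x3 normalization matrix A

    # The normalized joints are then obtained at once as Y_hat = X @ A.T.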
After the normalization transformation matrix A has been constructed, the normalized coordinate positions in the world coordinate system can be obtained by applying the transformation matrix A to the coordinate positions of the skeleton joints in the world coordinate system.
Thereafter, transforming the normalized coordinate positions of the skeleton joints in the world coordinate system back to the image coordinate system can likewise be achieved by means of the rotation matrix and translation matrix mentioned above; this is also a well-known technique in the art, and a detailed description thereof is omitted here.
Thus, the spatial coordinate transformation of the position coordinates of the skeleton joints of the pedestrian is completed, the normalized position coordinates of the skeleton joints are obtained, and the normalization of the posture of the pedestrian is achieved.
It should be noted that, although the normalization of the posture of the pedestrian is achieved above through the spatial coordinate transformation of the position coordinates of the skeleton joints, the posture to which the pedestrian has been normalized cannot in fact be determined from the normalized skeleton joint coordinates alone; instead, the normalized posture must be determined in combination with the orientation of the pedestrian determined in process (S1).
It can be understood that, although the description above takes as an example the case where the predetermined viewing angles include the first viewing angle and the second viewing angle, this is merely an example rather than a limitation of the present disclosure, and those skilled in the art may set different predetermined viewing angles according to specific situations. For example, the predetermined viewing angles may be set to include four viewing angles: in addition to the aforementioned first and second viewing angles, a third viewing angle in which the right side of the pedestrian directly faces the camera and a fourth viewing angle in which the left side of the pedestrian directly faces the camera. As another example, the predetermined viewing angles may be set to include six viewing angles: in addition to the aforementioned first to fourth viewing angles, a fifth viewing angle in which the pedestrian faces the camera at 45° and a sixth viewing angle in which the pedestrian faces away from the camera at 45°.
Returning to Fig. 1, in step S140, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization are extracted.
As is well known in the art, the semantics of an image are hierarchical and can be divided into low-level, mid-level and high-level semantics. Low-level semantics describe the visual features of an image, such as color, texture and shape; they are objective and can be obtained directly from the image without any external knowledge. High-level semantics are obtained by high-level abstraction of the image in the manner of human cognition, and include scene semantics, behavior semantics and emotion semantics. Mid-level semantic features were proposed to reduce the semantic gap between low-level and high-level semantic features; they can usually be generated on the basis of low-level semantic feature analysis, and correspond to visual bags of words and semantic topics.
In this step, optionally, for each pedestrian in each frame of depth image, various mid-level semantic attribute features of the pedestrian after posture normalization may be extracted, including at least the real-world height of the pedestrian.
In addition, optionally, one or more of low-level semantic features, face features and motion features of the pedestrian may also be extracted in this step. As described above, the low-level semantic features may include color features, texture features, gradient features and the like. In this embodiment, as an example, the color features use three different color channels, RGB, LUV and YCbCr, and are represented in the form of histograms; the texture features use local binary patterns and are also represented in the form of histograms; and the gradient features are obtained by applying the Sobel operator to the image to compute gradients, likewise represented in the form of histograms. The face features are used only when the pedestrian has been normalized to the posture under the first viewing angle (i.e., the front of the pedestrian directly faces the camera); various face detection algorithms may be used to determine the specific position of the face and to locate the landmark points of the face. The motion features may be represented by the change between the normalized position coordinates of the skeleton joints of the pedestrian in the current frame of depth image and those in several preceding frames (e.g., the preceding 10 frames) of depth images.
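As one possible illustration of the histogram-form color features (the bin count is an assumption, and OpenCV is used only for the color-space conversions; the LBP and Sobel histograms would follow the same pattern):

    import cv2
    import numpy as np

    def color_histograms(bgr_patch, bins=16):
        """Concatenated per-channel histograms of a pedestrian image patch in
        the RGB, LUV and YCbCr color spaces, L1-normalized."""
        spaces = [
            cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2RGB),
            cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2LUV),
            cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2YCrCb),
        ]
        feats = []
        for img in spaces:
            for c in range(3):
                hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
                feats.append(hist.astype(np.float32))
        feat = np.concatenate(feats)
        return feat / max(feat.sum(), 1.0)        # L1 normalization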
In step S150, the target pedestrian is identified from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian.
In the preceding steps, the posture-normalized attribute features of each pedestrian in each frame of depth image have been extracted; therefore, in this step the target pedestrian can be identified by comparing the attribute features of each pedestrian with the corresponding attribute features of the target pedestrian. It should be noted that the corresponding attribute features of the target pedestrian refer to the attribute features of the target pedestrian extracted after the above skeleton joint extraction and posture normalization processing have been performed on the target pedestrian.
It can be understood that, in a depth video to be analyzed, the same pedestrian may appear in multiple frames of depth images of the video. Therefore, in this step it is not necessary to compare the attribute features of every pedestrian in every frame with the corresponding attribute features of the target pedestrian; it suffices to compare the attribute features of each distinct pedestrian in the depth video with the corresponding attribute features of the target pedestrian. Specifically, as described above, a tracking segment includes at least data describing in which frames of depth images of the depth video to be analyzed a pedestrian appears and the position of the pedestrian in each of those frames; therefore, in this step, all distinct pedestrians appearing in the depth video can be determined from the tracking segment of each pedestrian in each frame of depth image.
After all distinct pedestrians appearing in the depth video have been determined, it can be judged whether the target pedestrian is among them. Specifically, for a given pedestrian appearing in the depth video (who may appear in multiple frames of depth images of the depth video), if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least T frames of depth images containing the pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold, the pedestrian is determined to be the target pedestrian. The value of T may be set according to specific needs. For example, if it is desired to reduce the computational cost of the similarity comparison so as to quickly determine whether the video contains the target pedestrian, T may be set to 1; in that case, as long as the attribute features extracted from one frame of depth image containing a given pedestrian have a similarity with the corresponding attribute features of the target pedestrian greater than the predetermined threshold, the pedestrian can be determined to be the target pedestrian, and no further similarity comparison between the target pedestrian and the other depth images containing that pedestrian is needed. Of course, if the accuracy of pedestrian re-identification matters more than reducing the computational cost of the similarity comparison, the value of T may be increased accordingly.
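Expressed as a sketch, the at-least-T-frames rule could look as follows (the similarity function is left abstract):

    def is_target(per_frame_features, target_features, similarity, thresh, T=1):
        """per_frame_features: one attribute-feature entry per frame in which
        a tracked pedestrian appears. Accept the pedestrian as the target
        once at least T frames exceed the similarity threshold."""
        hits = 0
        for feat in per_frame_features:
            if similarity(feat, target_features) > thresh:
                hits += 1
                if hits >= T:
                    return True   # enough supporting frames; stop early
        return False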
Optionally, when comparing the similarity of the corresponding attribute features, only pedestrians having the same normalized posture as the target pedestrian may be compared with the target pedestrian. Specifically, if the normalized posture of the target pedestrian is the posture under the first viewing angle, only pedestrians in the depth video to be analyzed whose normalized posture is likewise the posture under the first viewing angle may be compared with the target pedestrian for similarity, thereby reducing the computational cost of the similarity comparison.
As described above, multiple attribute features of a pedestrian may be extracted from the depth images. Therefore, when comparing similarity with the corresponding attribute features of the target pedestrian, each attribute feature of the pedestrian may be compared with the corresponding attribute feature of the target pedestrian to obtain a corresponding similarity, and the overall similarity may then be determined by, for example, computing a weighted average. The weight of each feature may be set according to the specific situation; for example, optionally, the face features may be given the largest weight, followed by the low-level semantic features, then the mid-level semantic features, with the pedestrian motion features given the smallest weight.
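The weighted fusion of the per-feature similarities might be sketched as follows; the example weights mirror the ordering suggested above (face > low-level > mid-level > motion) and are purely illustrative:

    def overall_similarity(sims, weights=None):
        """sims: dict mapping feature name -> similarity in [0, 1]. Returns
        the weighted average; features absent for this pedestrian (e.g. no
        face features under the second viewing angle) are simply skipped."""
        if weights is None:
            weights = {'face': 0.4, 'low_level': 0.3, 'mid_level': 0.2, 'motion': 0.1}
        used = {k: w for k, w in weights.items() if k in sims}
        total = sum(used.values())
        return sum(sims[k] * used[k] for k in used) / total if total else 0.0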
After a given pedestrian has been determined to be the target pedestrian as above, the frames of depth images of the depth video to be analyzed that contain that pedestrian can be determined based on the tracking segment of the pedestrian, thereby achieving re-identification of the target pedestrian.
Optionally, after it has been determined that the depth video to be analyzed contains the target pedestrian and the target pedestrian has been identified therefrom, continuity verification in the spatio-temporal domain may further be performed to verify the re-identification result. The continuity verification in the spatio-temporal domain may adopt various appropriate checking methods. For example, the features of a pedestrian should usually be similar between two adjacent frames; if the features of the pedestrian differ too much between adjacent frames of depth images finally determined to contain the target pedestrian, the re-identification result is considered potentially problematic, and the re-identification processing may need to be performed again.
The pedestrian re-identification method according to the embodiment of the present disclosure has been described above with reference to the accompanying drawings; by this method, a target pedestrian can be identified from one depth video to be analyzed originating from a certain camera. When there are a large number of depth videos to be analyzed from multiple different cameras, the target pedestrian can be identified from the large number of depth videos to be analyzed by performing the re-identification method for each depth video to be analyzed.
Optionally, when there are a large number of depth videos to be analyzed from multiple different cameras, spatio-temporal domain analysis may be performed in advance to reduce the computational cost of pedestrian re-identification, so as to quickly locate the target pedestrian in the multiple videos. Various appropriate methods may be adopted for the spatio-temporal domain analysis. For example, if it is determined that the target pedestrian is present in a depth video to be analyzed from a certain camera, it follows from spatio-temporal continuity that the target pedestrian should next appear in the area near that camera; therefore, the subsequent re-identification of the target pedestrian may be performed only in the depth videos to be analyzed from cameras near that camera.
As described above, the pedestrian re-identification method according to the embodiment of the present disclosure uses depth video to identify the target pedestrian. It makes effective use of the depth information of pedestrians in images and videos to reduce the influence of illumination conditions, and, by normalizing the postures of the pedestrians, reduces the influence of differing viewing angles of different cameras and of incomplete information caused by pedestrians facing away from or sideways to the camera, thereby improving the accuracy of pedestrian re-identification.
Next, a block diagram of a pedestrian re-identification device according to an embodiment of the present disclosure will be described with reference to Fig. 6. Fig. 6 shows an exemplary structural block diagram of a pedestrian re-identification device 600 according to an embodiment of the present disclosure. As shown in Fig. 6, the pedestrian re-identification device may include a detection means 610, a skeleton extraction means 620, a normalization means 630, a feature extraction means 640 and an identification means 650, each of which may perform the corresponding steps/functions of the pedestrian re-identification method described above in connection with Fig. 1. Only the main functions of the means of the pedestrian re-identification device 600 are described below, and the details already described above are omitted.
The detection means 610 may detect pedestrians in each frame of depth image of a depth video. The depth video is the depth video to be analyzed from which the target pedestrian is to be identified; it is captured at a given time by a single depth camera different from the depth camera that captured the target pedestrian. The detection means 610 may use any appropriate image detection technique in the art to detect pedestrians from the frames of depth images of the depth video to be analyzed, and the present disclosure places no limitation on this.
Optionally, the detection means 610 may track each pedestrian detected in each frame of depth image to determine in which other frames of the depth video to be analyzed the pedestrian appears, and to determine the position of the pedestrian in those frames.
The skeleton extraction means 620 may perform skeleton joint extraction for each pedestrian in each frame of depth image. Skeleton joints describe the posture of a pedestrian well, and their number may be set as needed. As described above, six skeleton joints are set here, representing the head, left hand, right hand, chest center, left foot and right foot, respectively.
Specifically, the skeleton extraction means 620 may further include a matching unit, a label extraction unit, a voting unit and a joint extraction unit. Below, the operations performed by the skeleton extraction means 620 are described by taking skeleton joint extraction for pedestrian A in the N-th frame of the depth video as an example.
For each pixel in the sub-image region corresponding to pedestrian A in the N-th frame, the matching unit determines a matching pixel in a pre-established training set that matches the pixel; the training set contains a plurality of pedestrian depth images, in each of which the skeleton joints of the pedestrian are marked in advance. The matching pixel may be determined based on the feature description of the pixel and the relative position of the pixel in the sub-image region, where the feature description may be any appropriate feature for describing a pixel.
For each pixel, the label extraction unit extracts the label data of the matching pixel matched thereto, the label data including offsets of the matching pixel relative to the skeleton joints of the pedestrian in the pedestrian depth image where the matching pixel is located. The label data is marked in advance when the training set is established; the offsets therein may be three-dimensional position offsets in space, and a corresponding offset is included for each skeleton joint of the pedestrian.
The voting unit casts votes for each pixel. Specifically, taking voting for pixel a as an example, the voting unit casts votes for the skeleton joints of the pedestrian based on the label data of the matching pixel corresponding to pixel a and the relative position of pixel a in the sub-image region. More specifically, the voting unit takes the label data of the matching pixel as the label data of pixel a; since the label data contains the offsets of the pixel relative to the skeleton joints of the pedestrian, the positions of the skeleton joints of pedestrian A can be inferred based on the relative position of pixel a in the sub-image region and the label data. This process is in fact a voting process. It should be noted that there may be multiple matching pixels determined by the matching unit; in this case, the voting unit may take, for example, the average of the label data of the multiple matching pixels as the label data of pixel a, from which the positions of the skeleton joints of pedestrian A are then inferred.
For each skeleton joint of pedestrian A to be extracted, the joint extraction unit may accumulate the votes cast by the voting unit for the individual pixels, and determine the point receiving the most votes as that skeleton joint. In this way, the skeleton joints of pedestrian A can be extracted.
The extraction of pedestrian skeleton joints has been described above by taking pedestrian A in the N-th frame of depth image as an example; the skeleton extraction means 620 performs the above operations for every pedestrian in every frame of depth image to extract its skeleton joints.
Optionally, the skeleton extraction means 620 may further include a smoothing unit for performing a smoothing operation on the extracted skeleton joints of each pedestrian in each frame of depth image, so as to eliminate the influence of errors that may arise in the voting process.
The normalization means 630 may normalize the posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle according to the extracted skeleton joints. Specifically, the normalization means 630 may further include an orientation determination unit and a normalization unit. Below, still taking pedestrian A in the N-th frame of depth image as an example, the processing performed by the normalization means 630 is described.
The orientation determination unit determines the moving direction of pedestrian A as the orientation of the pedestrian. Specifically, the orientation determination unit may determine the moving direction of pedestrian A by computing the difference between the position of each skeleton joint of pedestrian A in the previous frame and its position in the current frame, and take the moving direction as the orientation of pedestrian A.
According to the orientation determined by the orientation determination unit, the normalization unit normalizes the posture of the pedestrian to a posture under a predetermined viewing angle by performing a spatial coordinate transformation on the position coordinates of the skeleton joints of pedestrian A to obtain the normalized position coordinates of the skeleton joints.
The predetermined viewing angle may be set in advance as needed. For example, in this embodiment the predetermined viewing angles include a first viewing angle and a second viewing angle, where in the first viewing angle the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian, and in the second viewing angle the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian. The normalization unit determines, according to the orientation determined by the orientation determination unit, the predetermined viewing angle to which the posture of the pedestrian should be normalized. Specifically, if the orientation determination unit determines that the orientation of the pedestrian lies within the range from 90° to the left of directly facing the camera to 90° to the right of directly facing the camera, the posture should be normalized to the posture under the first viewing angle; if the orientation determination unit determines that the orientation lies within the range from 90° to the left of directly facing away from the camera to 90° to the right of directly facing away from the camera, the posture should be normalized to the posture under the second viewing angle.
The above posture normalization may be achieved by performing a spatial coordinate transformation on the position coordinates of the skeleton joints of the pedestrian. Specifically, the normalization unit first transforms the position coordinates of the skeleton joints from the image coordinate system to the world coordinate system, then normalizes the coordinate positions in the world coordinate system, and finally transforms the normalized coordinate positions in the world coordinate system back to the image coordinate system. The spatial coordinate transformation may be implemented in any appropriate manner in the art and will not be described in detail again here.
It should be noted that, although the normalization of the posture of the pedestrian is achieved above through the spatial coordinate transformation of the position coordinates of the skeleton joints, the posture to which the pedestrian has been normalized cannot in fact be determined from the normalized skeleton joint coordinates alone; instead, the normalized posture must be determined in combination with the orientation of the pedestrian determined by the orientation determination unit.
It can be understood that, although the description above takes as an example the case where the predetermined viewing angles include the first viewing angle and the second viewing angle, this is merely an example rather than a limitation of the present disclosure, and those skilled in the art may set different predetermined viewing angles according to specific situations.
The feature extraction means 640 may, for each pedestrian in each frame of depth image, extract attribute features of the pedestrian after posture normalization. Optionally, for each pedestrian in each frame of depth image, the feature extraction means 640 may extract various mid-level semantic attribute features of the pedestrian after posture normalization, including at least the real-world height of the pedestrian. Optionally, the feature extraction means 640 may also extract one or more of low-level semantic features, face features and motion features of the pedestrian.
The identification means 650 may identify the target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian. Since the feature extraction means 640 has extracted, for each pedestrian in each frame of depth image, the posture-normalized attribute features of the pedestrian, the identification means 650 can identify the target pedestrian by comparing the attribute features of each pedestrian with the corresponding attribute features of the target pedestrian. It should be noted that the corresponding attribute features of the target pedestrian refer to the attribute features of the target pedestrian extracted after the above skeleton joint extraction and posture normalization processing have been performed on the target pedestrian.
It can be understood that, in a depth video to be analyzed, the same pedestrian may appear in multiple frames of depth images of the video; therefore, the identification means 650 does not need to compare the attribute features of every pedestrian in every frame with the corresponding attribute features of the target pedestrian, but only needs to compare the attribute features of each distinct pedestrian in the depth video with the corresponding attribute features of the target pedestrian. Specifically, the identification means 650 may determine all distinct pedestrians appearing in the depth video from the tracking segment of each pedestrian in each frame of depth image.
After all distinct pedestrians appearing in the depth video have been determined, the identification means 650 judges whether the target pedestrian is among them. Specifically, for a given pedestrian appearing in the depth video (who may appear in multiple frames of depth images of the depth video), if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least T frames of depth images containing the pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold, the identification means 650 determines the pedestrian to be the target pedestrian. The value of T may be set according to specific needs.
Optionally, when comparing the similarity of the corresponding attribute features, the identification means 650 may compare with the target pedestrian only those pedestrians having the same normalized posture as the target pedestrian. Specifically, if the normalized posture of the target pedestrian is the posture under the first viewing angle, only pedestrians in the depth video to be analyzed whose normalized posture is likewise the posture under the first viewing angle may be compared with the target pedestrian for similarity, thereby reducing the computational cost of the similarity comparison.
As described above, multiple attribute features of a pedestrian may be extracted from the depth images; therefore, when comparing similarity with the corresponding attribute features of the target pedestrian, the identification means 650 may compare each attribute feature of the pedestrian with the corresponding attribute feature of the target pedestrian to obtain a corresponding similarity, and then determine the overall similarity by, for example, computing a weighted average. The weight of each feature may be set according to the specific situation.
After a given pedestrian has been determined to be the target pedestrian as above, the identification means 650 may determine, based on the tracking segment of the pedestrian, the frames of depth images of the depth video to be analyzed that contain that pedestrian, thereby achieving re-identification of the target pedestrian.
The pedestrian re-identification device 600 according to the embodiment of the present disclosure has been described above with reference to the accompanying drawings; with this device, a target pedestrian can be identified from one depth video to be analyzed originating from a certain camera. When there are a large number of depth videos to be analyzed from multiple different cameras, the target pedestrian can be identified from the large number of depth videos to be analyzed by applying the pedestrian re-identification device to perform re-identification for each depth video to be analyzed.
Optionally, when there are a large number of depth videos to be analyzed from multiple different cameras, the pedestrian re-identification device 600 may perform spatio-temporal domain analysis in advance to reduce the computational cost of pedestrian re-identification, so as to quickly locate the target pedestrian in the multiple videos.
As described above, the pedestrian re-identification device 600 according to the embodiment of the present disclosure uses depth video to identify the target pedestrian. It makes effective use of the depth information of pedestrians in images and videos to reduce the influence of illumination conditions, and, by normalizing the postures of the pedestrians, reduces the influence of differing viewing angles of different cameras and of incomplete information caused by pedestrians facing away from or sideways to the camera, thereby improving the accuracy of pedestrian re-identification.
Next, a block diagram of an exemplary computing device that can be used to implement embodiments of the present disclosure is described with reference to Fig. 7. The computing device may be a computer or server equipped with a depth camera.
As shown in Fig. 7, the computing device 700 includes one or more processors 702, a storage means 704, a depth camera 706 and an output means 708, which are interconnected by a bus system 710 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the computing device 700 shown in Fig. 7 are merely exemplary rather than limiting, and the computing device 700 may also have other components and structures as needed.
The processor 702 may be a central processing unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the computing device 700 to perform desired functions.
The storage means 704 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 702 may run the program instructions to implement the functions of the embodiments of the present disclosure described above and/or other desired functions. Various applications and various data may also be stored in the computer-readable storage medium, such as the depth video, the position information of each pedestrian detected in each frame of depth image, the tracking segments of the pedestrians, the skeleton joints extracted for each pedestrian in each frame of depth image, the matching pixels of the individual pixels, the pre-established training set, the voting result of each pixel, the orientation of each pedestrian in each frame of depth image, the normalized position coordinates of the skeleton joints, the attribute features extracted for each pedestrian in each frame of depth image, the skeleton joints of the target pedestrian, the attribute features of the target pedestrian, and so on.
The depth camera 706 is used to capture the depth video to be analyzed and to store the captured depth video in the storage means 704 for use by the other components. Of course, the depth video may also be captured by another capturing device and sent to the computing device 700; in that case, the depth camera 706 may be omitted.
The output means 708 may output various information to the outside (e.g., a user), such as image information, sound information and pedestrian re-identification results, and may include one or more of a display, a speaker, and the like.
In addition to the above methods and devices, an embodiment of the present disclosure may also be a computer program product for performing pedestrian re-identification. The computer program product includes a computer-readable storage medium on which computer program instructions are stored, the computer program instructions being executable by a processor to cause the processor to: detect pedestrians in each frame of depth image of a depth video; perform skeleton joint extraction for each pedestrian in each frame of depth image; normalize, according to the extracted skeleton joints, the posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle; extract, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization; and identify the target pedestrian from the depth video based on the similarity between the attribute features and corresponding attribute features of the target pedestrian.
The basic principles of the present disclosure have been described above in connection with specific embodiments. However, it should be pointed out that the advantages, benefits, effects and the like mentioned in the present disclosure are merely examples rather than limitations, and these advantages, benefits and effects should not be considered indispensable to the embodiments of the present disclosure. In addition, the specific details disclosed above are provided merely for the purposes of illustration and ease of understanding rather than limitation, and they do not limit the present disclosure to being implemented with those specific details.
The block diagrams of components, apparatuses, devices and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements and configurations must be made in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices and systems may be connected, arranged and configured in any manner. Words such as "comprise", "include" and "have" are open-ended terms meaning "including but not limited to" and may be used interchangeably therewith. The words "or" and "and" as used herein refer to "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The words "such as" as used herein refer to "such as but not limited to" and may be used interchangeably therewith.
It should also be pointed out that, in the devices and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (20)

  1. A pedestrian re-identification method, comprising:
    detecting a pedestrian in each frame of depth image of a depth video;
    performing skeleton joint extraction for each pedestrian in each frame of depth image;
    normalizing, according to the extracted skeleton joints, a posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle;
    extracting, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization; and
    identifying a target pedestrian from the depth video based on a similarity between the attribute features and corresponding attribute features of the target pedestrian.
  2. The pedestrian re-identification method of claim 1, wherein the target pedestrian is contained in a target depth video captured by a depth camera, and the target depth video and the depth video are captured by different depth cameras, or the target depth video and the depth video are captured by a single depth camera at different times.
  3. The pedestrian re-identification method of claim 1, further comprising:
    tracking each pedestrian detected in each frame of depth image to obtain a tracking segment of the pedestrian, the tracking segment including data describing in which frames of depth images of the depth video the pedestrian appears and the position of the pedestrian in each of those frames.
  4. The pedestrian re-identification method of claim 3, wherein performing skeleton joint extraction for each pedestrian in each frame of depth image comprises:
    for each pixel in the sub-image region corresponding to the pedestrian in the frame of depth image:
    determining a matching pixel that matches the pixel in a pre-established training set, the training set containing a plurality of pedestrian depth images, in each of which the skeleton joints of the pedestrian are marked in advance;
    extracting label data of the matching pixel, the label data including offsets of the matching pixel relative to the skeleton joints of the pedestrian in the pedestrian depth image where the matching pixel is located;
    casting votes for the skeleton joints of the pedestrian based on the label data and the relative position of the pixel in the sub-image region; and
    for each skeleton joint of the pedestrian to be extracted, determining the point receiving the most votes from the pixels in the sub-image region as that skeleton joint.
  5. The pedestrian re-identification method of claim 4, wherein determining, for each pixel in the sub-image region corresponding to the pedestrian in the frame of depth image, a matching pixel that matches the pixel in a pre-established training set comprises:
    for each pixel, determining the matching pixel based on the feature description of the pixel and the relative position of the pixel in the sub-image region.
  6. The pedestrian re-identification method of claim 4, wherein performing skeleton joint extraction for each pedestrian in each frame of depth image further comprises:
    determining, based on the tracking segment of the pedestrian, the preceding m frames of depth images containing the pedestrian and the following n frames of depth images containing the pedestrian relative to the frame of depth image; and
    refining the determined skeleton joints of the pedestrian in the frame of depth image based on the skeleton joints of the pedestrian in the preceding m frames and the following n frames of depth images.
  7. The pedestrian re-identification method of claim 1, wherein normalizing, according to the extracted skeleton joints, the posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle comprises:
    determining the moving direction of the pedestrian as the orientation of the pedestrian; and
    normalizing, according to the orientation, the posture of the pedestrian to a posture under a predetermined viewing angle by performing a spatial coordinate transformation on the position coordinates of the skeleton joints of the pedestrian to obtain normalized position coordinates of the skeleton joints.
  8. The pedestrian re-identification method of claim 7, wherein the predetermined viewing angles include a first viewing angle, in which the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian, and a second viewing angle, in which the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian.
  9. The pedestrian re-identification method of claim 7, wherein extracting, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization comprises: extracting mid-level semantic features of the pedestrian, the mid-level semantic features including at least the real-world height of the pedestrian.
  10. The pedestrian re-identification method of claim 9, wherein extracting, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization further comprises: extracting one or more of low-level semantic features, face features and motion features of the pedestrian.
  11. The pedestrian re-identification method of claim 10, wherein the motion features of the pedestrian are represented by the change between the normalized position coordinates of the skeleton joints of the pedestrian in the current frame of depth image and the normalized position coordinates of the skeleton joints of the pedestrian in several preceding frames of depth images.
  12. The pedestrian re-identification method of claim 3, wherein identifying the target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian comprises:
    determining all distinct pedestrians appearing in the depth video according to the tracking segment of each pedestrian in each frame of depth image;
    judging whether the pedestrians appearing in the depth video include the target pedestrian, wherein a given pedestrian appearing in the depth video is determined to be the target pedestrian if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least one frame of depth image containing the given pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold; and
    determining, based on the tracking segment of the given pedestrian, the frames of depth images of the video that contain the given pedestrian.
  13. A pedestrian re-identification device, comprising:
    a processor;
    a memory; and
    computer program instructions stored in the memory which, when run by the processor, perform the steps of:
    detecting a pedestrian in each frame of depth image of a depth video;
    performing skeleton joint extraction for each pedestrian in each frame of depth image;
    normalizing, according to the extracted skeleton joints, a posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle;
    extracting, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization; and
    identifying a target pedestrian from the depth video based on a similarity between the attribute features and corresponding attribute features of the target pedestrian.
  14. The pedestrian re-identification device of claim 13, further comprising:
    a depth camera configured to capture the depth video.
  15. The pedestrian re-identification device of claim 13, wherein the steps further comprise:
    tracking each pedestrian detected in each frame of depth image to obtain a tracking segment of the pedestrian, the tracking segment including data describing in which frames of depth images of the depth video the pedestrian appears and the position of the pedestrian in each of those frames.
  16. The pedestrian re-identification device of claim 13, wherein performing skeleton joint extraction for each pedestrian in each frame of depth image comprises:
    for each pixel in the sub-image region corresponding to the pedestrian in the frame of depth image:
    determining a matching pixel that matches the pixel in a pre-established training set, the training set containing a plurality of pedestrian depth images, in each of which the skeleton joints of the pedestrian are marked in advance;
    extracting label data of the matching pixel, the label data including offsets of the matching pixel relative to the skeleton joints of the pedestrian in the pedestrian depth image where the matching pixel is located;
    casting votes for the skeleton joints of the pedestrian based on the label data and the relative position of the pixel in the sub-image region; and
    for each skeleton joint of the pedestrian to be extracted, determining the point receiving the most votes from the pixels in the sub-image region as that skeleton joint.
  17. The pedestrian re-identification device of claim 13, wherein normalizing, according to the extracted skeleton joints, the posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle comprises:
    determining the moving direction of the pedestrian as the orientation of the pedestrian; and
    normalizing, based on the orientation, the posture of the pedestrian to a posture under a predetermined viewing angle by performing a spatial coordinate transformation on the position coordinates of the skeleton joints of the pedestrian to obtain normalized position coordinates of the skeleton joints.
  18. The pedestrian re-identification device of claim 17, wherein the predetermined viewing angles include a first viewing angle, in which the front of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the front of the pedestrian, and a second viewing angle, in which the back of the pedestrian directly faces the camera and the camera is horizontally aligned with a predetermined position on the back of the pedestrian.
  19. The pedestrian re-identification device of claim 15, wherein identifying the target pedestrian from the depth video based on the similarity between the attribute features and the corresponding attribute features of the target pedestrian comprises:
    determining all distinct pedestrians appearing in the depth video according to the tracking segment of each pedestrian in each frame of depth image;
    judging whether the pedestrians appearing in the depth video include the target pedestrian, wherein a given pedestrian appearing in the depth video is determined to be the target pedestrian if the similarity between the posture-normalized attribute features of the pedestrian extracted from at least one frame of depth image containing the given pedestrian and the corresponding attribute features of the target pedestrian is greater than a predetermined threshold; and
    determining, based on the tracking segment of the given pedestrian, the frames of depth images of the video that contain the given pedestrian.
  20. A computer program product for pedestrian re-identification, comprising a computer-readable storage medium on which computer program instructions are stored, the computer program instructions being executable by a processor to cause the processor to:
    detect a pedestrian in each frame of depth image of a depth video;
    perform skeleton joint extraction for each pedestrian in each frame of depth image;
    normalize, according to the extracted skeleton joints, a posture of each pedestrian in each frame of depth image to a posture under a predetermined viewing angle;
    extract, for each pedestrian in each frame of depth image, attribute features of the pedestrian after posture normalization; and
    identify a target pedestrian from the depth video based on a similarity between the attribute features and corresponding attribute features of the target pedestrian.