WO2022174523A1 - Method for extracting pedestrian gait features, gait recognition method and system - Google Patents

Method for extracting pedestrian gait features, gait recognition method and system

Info

Publication number
WO2022174523A1
WO2022174523A1 · PCT/CN2021/093484 · CN2021093484W
Authority
WO
WIPO (PCT)
Prior art keywords
pedestrian
gait
image
array
event data
Prior art date
Application number
PCT/CN2021/093484
Other languages
English (en)
French (fr)
Inventor
杨志尧
牟晓正
Original Assignee
豪威芯仑传感器(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 豪威芯仑传感器(上海)有限公司
Publication of WO2022174523A1 publication Critical patent/WO2022174523A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G06V40/25 - Recognition of walking or running movements, e.g. gait recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20024 - Filtering details
    • G06T2207/20032 - Median filtering

Definitions

  • The present invention relates to the technical field of data processing, and in particular to a method for extracting pedestrian gait features, a gait recognition method, and a system.
  • Gait recognition is an emerging biometric identification technology that identifies people by their walking posture. Unlike other biometric technologies, gait recognition is a passive technique with the advantages of being contactless, working at long range, and being difficult to disguise. Therefore, gait recognition has great advantages and broad prospects in the field of intelligent video surveillance.
  • Because gait recognition performs identification by extracting information about a person's posture while walking, it is necessary to extract the pedestrian's posture contour in the course of recognizing this posture information.
  • The most commonly used contour-extraction method is background subtraction: a background model is established for the video scene, a foreground image containing the pedestrian is obtained from the difference between the original image and the background model, and a series of image-preprocessing steps such as binarization and mathematical morphological analysis are then applied to the detected image to finally obtain the pedestrian's posture contour.
  • This contour-extraction technique involves many steps, is cumbersome and time-consuming, and its results in complex scenes are not ideal. For example, when the background is too complex, the extracted posture contours often have missing parts or attached fragments of the environmental background, which seriously affects the accuracy of gait recognition.
  • In view of this, the present invention provides a method for extracting pedestrian gait features, a gait recognition method, and a system, so as to solve, or at least alleviate, at least one of the above problems.
  • A method for extracting pedestrian gait features comprises the steps of: for a segment of the event data stream from a dynamic vision sensor, generating one frame of image containing pedestrians for every preset duration of event data, so as to generate an image sequence; extracting the pedestrian's posture contour from each frame of the image sequence and generating a posture contour map, so as to obtain a posture contour map sequence; and performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information.
  • The event data are triggered by the relative motion between an object in the field of view and the dynamic vision sensor; the object includes a pedestrian, and each event datum includes the coordinate position and timestamp of the triggered event.
  • the method according to the present invention further includes the step of: filtering each frame of image to obtain a filtered image.
  • The step of extracting the pedestrian's posture contour in each frame of image includes: initializing two arrays according to the width and height of the filtered image, respectively; mapping the pixel information of the filtered image into the two arrays in a predetermined manner; determining the longest continuous non-zero sub-array in each array; and extracting the pedestrian's posture contour based on the determined non-zero sub-arrays.
  • The step of initializing the two arrays includes: constructing a first array whose length equals the height of the filtered image and initializing it; and constructing a second array whose length equals the width of the filtered image and initializing it.
  • The step of mapping the pixel information of the filtered image into the arrays includes: for each row of pixels in the filtered image, obtaining the sum of the pixel values of the row by accumulation and storing each row sum at the corresponding position of the first array; and for each column of pixels in the filtered image, obtaining the sum of the pixel values of the column by accumulation and storing each column sum at the corresponding position of the second array.
  • The step of extracting the pedestrian's posture contour includes: determining the vertical boundaries of the pedestrian's posture contour based on the subscripts of the non-zero sub-array determined from the first array; determining the horizontal boundaries based on the subscripts of the non-zero sub-array determined from the second array; and extracting the pedestrian's posture contour according to these boundaries.
  • The step of extracting the pedestrian's posture contour in each frame of image further includes: inputting the filtered image into a detection network to determine the pedestrian's posture contour.
  • The step of performing feature extraction on the posture contour map sequence to obtain a feature vector representing pedestrian gait information includes: inputting the posture contour map sequence into a feature extraction model, which processes it and outputs the feature vector representing the pedestrian's gait information; the feature extraction model is a deep-learning-based convolutional neural network.
  • The step of generating one frame of image containing pedestrians for every preset duration of event data includes: constructing an initial image of a predetermined size and setting all its pixel values to zero, where the predetermined size is determined by the size of the pixel-unit array of the dynamic vision sensor; locating, in the initial image, the pixel corresponding to the coordinate position of each event datum within the preset duration; updating the value of each located pixel with the timestamp of the corresponding event datum to generate a single-channel image; and normalizing the pixel values of the single-channel image to obtain a grayscale image, which serves as the image containing pedestrians.
  • A gait recognition method comprises the steps of: extracting a feature vector representing the current pedestrian's gait information by executing the above method for extracting pedestrian gait features; matching, from a gait feature database, the gait feature vector with the highest similarity to the extracted feature vector, where each gait feature vector is stored in the database in association with a pedestrian identity; and determining the current pedestrian's identity based on the pedestrian identity associated with the matched gait feature vector.
  • A gait recognition system comprises: a dynamic vision sensor, adapted to trigger events based on the relative motion between objects in the field of view and the sensor, and to output an event data stream to a gait feature extraction device; the gait feature extraction device, adapted to extract the posture contours of pedestrians in the field of view based on the event data stream and to extract the pedestrians' gait features; and an identity recognition device, adapted to identify pedestrians based on their gait features.
  • A computing device comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
  • A computer-readable storage medium stores one or more programs, the one or more programs comprising instructions that, when executed by a computing device, cause the computing device to perform any of the methods described above.
  • a series of images including pedestrians are generated as an image sequence.
  • the pedestrian's pose contour can be segmented from it, and the pose contour map sequence can be formed.
  • the feature vector representing the gait information of the pedestrian is calculated by using the sequence of the pedestrian's pose contour map.
  • FIG. 1 shows a schematic diagram of a gait recognition system 100 according to some embodiments of the present invention
  • FIG. 2 shows a schematic diagram of a computing device 200 according to some embodiments of the present invention
  • FIG. 3 shows a flowchart of a method 300 for extracting gait features of a pedestrian according to an embodiment of the present invention
  • FIG. 4 shows a schematic flowchart of a gait recognition method 400 according to an embodiment of the present invention.
  • DVS: Dynamic Vision Sensor
  • The sensor has a pixel-unit array composed of multiple pixel units, and each pixel unit responds to and records areas of rapid light-intensity change only when it senses a change in light intensity. That is, each pixel unit in the DVS can independently respond to and record areas where the light intensity changes rapidly.
  • DVS adopts an event-triggered processing mechanism, the pixel unit will be triggered only when the object in the field of view moves relative to the dynamic vision sensor, and event data will be generated, so its output is an asynchronous event data stream instead of an image frame.
  • the data stream is, for example, the light intensity change information (eg, the time stamp of the light intensity change and the light intensity threshold) and the coordinate position of the triggered pixel unit in the pixel unit array.
  • Compared with a traditional vision sensor, a DVS has the following advantages: 1) its response speed is no longer limited by conventional exposure time and frame rate, and it can detect high-speed objects moving at rates equivalent to up to 10,000 frames per second; 2) it has a larger dynamic range, and can accurately sense and output scene changes in low-light or high-exposure environments; 3) it consumes less power; 4) since each pixel unit responds to light-intensity changes independently, a DVS is not affected by motion blur.
  • In view of this, a DVS-based gait recognition scheme is proposed. The scheme addresses the problems of existing gait recognition schemes, in which extraction of the pedestrian's posture contour is time-consuming and severely interfered with by the background, and achieves fast and complete extraction of the posture contour.
  • FIG. 1 shows a schematic diagram of a gait recognition system 100 according to an embodiment of the present invention.
  • the system 100 includes a dynamic vision sensor (DVS) 110 , a gait feature extraction device 120 and an identification device 130 .
  • the gait feature extraction device 120 is coupled to the dynamic vision sensor 110 and the identity recognition device 130, respectively.
  • FIG. 1 is only an example, and the embodiment of the present invention does not limit the number of each part in the system 100 .
  • The dynamic vision sensor 110 monitors the movement of objects in the field of view in real time; once it detects an object moving (relative to the dynamic vision sensor 110), i.e., the light in the field of view changes, a pixel event (or simply an "event") is triggered, and event data for the dynamic pixels (i.e., the pixel units whose brightness changed) are output.
  • event data output within a period of time constitute the event data stream.
  • Each event data in the event data stream includes at least the coordinate position of the triggered event (ie, the pixel unit whose brightness changes) and the timestamp information of the triggered time.
  • the specific composition of the dynamic vision sensor 110 will not be elaborated here.
  • the gait feature extraction device 120 receives the event data streams from the dynamic vision sensor 110 and processes these event data streams to extract the posture contours of pedestrians in the field of view.
  • the gait feature extraction device 120 uses the event data stream generated by the DVS to construct frames, generates images without complex backgrounds, and then extracts the pedestrian's posture contours from these images.
  • the gait feature extraction device 120 also calculates the gait feature of the pedestrian according to the pedestrian's posture profile.
  • the pedestrian's gait feature is represented by a feature vector containing the pedestrian's gait information. After that, the gait feature extraction device 120 sends the pedestrian's gait feature to the identification device 130 .
  • A gait feature database is pre-stored in the identity recognition device 130; in this database, each gait feature vector is stored in association with the corresponding pedestrian's identity identifier. Based on the pedestrian's gait feature, the identification device 130 matches the gait feature vector with the highest similarity from the gait feature database, and then determines the pedestrian's identity according to the identity identifier associated with that gait feature vector.
  • the gait feature database can also be a third-party feature database, and the identity recognition device 130 can be connected to the third-party gait feature database to match the gait feature vector with the highest similarity.
  • the embodiments of the present invention do not limit this too much.
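The matching step described above (retrieving the stored gait feature vector with the highest similarity and reading off its associated identity) can be sketched in plain Python. Cosine similarity and the dictionary-based database layout are illustrative assumptions; the patent does not prescribe a particular similarity measure or storage format.

```python
import math

def cosine_similarity(a, b):
    # One common similarity measure for feature vectors (an assumption here).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(query, database):
    """database: dict mapping pedestrian identity -> stored gait feature vector."""
    # Return the identity whose stored vector is most similar to the query.
    return max(database, key=lambda pid: cosine_similarity(query, database[pid]))
```

In practice a threshold would typically be applied as well, so that a query unlike every stored vector is rejected rather than mapped to the nearest identity.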
  • In the gait recognition system 100 of the present invention, by processing the event data stream from the dynamic vision sensor 110, the posture contour of a pedestrian in the field of view can be extracted quickly. The pedestrian's gait feature is then computed from the posture contour, and the pedestrian is identified according to the gait feature.
  • the system 100 does not need to perform complex and tedious processing of images, and can greatly improve the speed of gait recognition.
  • Moreover, the image generated by the system 100 from the event data stream contains only the contour information of moving objects and no other background information; the pedestrian's posture contour segmented from such an image is clear and complete, free of useless background content, which greatly helps ensure the accuracy of gait recognition.
  • FIG. 2 shows a schematic block diagram of a computing device 200 according to an embodiment of the present invention.
  • computing device 200 typically includes system memory 206 and one or more processors 204 .
  • Memory bus 208 may be used for communication between processor 204 and system memory 206 .
  • The processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination of these.
  • Processor 204 may include one or more levels of cache, such as L1 cache 210 and L2 cache 212 , processor core 214 , and registers 216 .
  • Exemplary processor cores 214 may include arithmetic logic units (ALUs), floating point units (FPUs), digital signal processing cores (DSP cores), or any combination thereof.
  • the example memory controller 218 may be used with the processor 204 , or in some implementations, the memory controller 218 may be an internal part of the processor 204 .
  • system memory 206 may be any type of memory including, but not limited to, volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof.
  • System memory 206 may include operating system 220 , one or more applications 222 , and program data 224 .
  • Applications 222 may be arranged to be executed by the one or more processors 204 on the operating system using program data 224.
  • Computing device 200 also includes storage device 232 including removable storage 236 and non-removable storage 238, both of which are connected to storage interface bus 234.
  • Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (eg, output device 242 , peripheral interface 244 , and communication device 246 ) to base configuration 202 via bus/interface controller 230 .
  • Example output devices 242 include graphics processing unit 248 and audio processing unit 250 . They may be configured to facilitate communication via one or more A/V ports 252 with various external devices such as displays or speakers.
  • Example peripheral interfaces 244 may include serial interface controller 254 and parallel interface controller 256, which may be configured to facilitate communication via one or more I/O ports 258 and input devices such as keyboard, mouse, pen, etc.
  • the example communication device 246 may include a network controller 260 that may be arranged to facilitate communication via one or more communication ports 264 with one or more other computing devices 262 over a network communication link.
  • a network communication link may be one example of a communication medium.
  • Communication media may typically embody computer readable instructions, data structures, program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal.
  • communication media may include wired media, such as wired or leased line networks, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media.
  • the term computer readable medium as used herein may include both storage media and communication media.
  • Computing device 200 may be implemented as part of a small form factor portable (or mobile) electronic device such as a cellular telephone, digital camera, personal digital assistant (PDA), personal media player, wireless web-browsing device, personal headset, application-specific device, or hybrid device including any of the above.
  • computing device 200 may be implemented as a micro-computing module or the like. The embodiments of the present invention do not limit this.
  • computing device 200 is configured to perform a gait recognition scheme in accordance with the present invention.
  • the application 222 of the computing device 200 includes a plurality of program instructions for executing the method 300 for extracting gait features of a pedestrian and the method 400 for gait recognition according to the present invention.
  • the computing device 200 can also be used as a part of the dynamic vision sensor 110 to process the event data stream to realize moving object detection.
  • FIG. 3 shows a flowchart of a method 300 for extracting gait features of a pedestrian according to an embodiment of the present invention.
  • the method 300 is performed in the gait feature extraction device 120 . It should be noted that, due to space limitations, the descriptions about the method 300 and the system 100 are complementary to each other, and repeated parts will not be repeated.
  • the method 300 begins at step S310.
  • In step S310, for a segment of the event data stream from the dynamic vision sensor 110, one frame of image containing pedestrians is generated for every preset duration of event data, so as to generate an image sequence.
  • The gait feature extraction device 120 receives and processes, continuously or by sampling, the event data stream output by the DVS.
  • The event data are triggered by the relative motion between objects in the field of view (including pedestrians) and the dynamic vision sensor 110.
  • each event data e(x, y, t) includes the coordinate position (x, y) of the corresponding triggered event and the timestamp t of the triggered time.
  • While acquiring the event data stream, the gait feature extraction device 120 performs frame building for every preset duration of event data, that is, it generates one frame of image containing pedestrians.
  • Suppose the timestamp of the first event datum received in this period is t0.
  • When the timestamp t of a subsequently received event datum satisfies t - t0 > T, where T is the preset duration, collection of event data for the current frame stops.
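The stopping rule above (collection for the current frame ends once an event's timestamp t satisfies t - t0 > T) can be sketched as follows. The (x, y, t) event layout follows the description above; the function name and the list-based stream representation are illustrative assumptions.

```python
def slice_events(events, T):
    """Split a time-ordered list of (x, y, t) events into windows of duration T."""
    windows = []
    current = []
    t0 = None  # timestamp of the first event in the current window
    for x, y, t in events:
        if t0 is None:
            t0 = t
        if t - t0 > T:
            # Stopping rule triggered: close this window, start a new one.
            windows.append(current)
            current = []
            t0 = t
        current.append((x, y, t))
    if current:
        windows.append(current)
    return windows
```

Each returned window is then built into one frame, and N consecutive windows yield the N-frame image sequence described later.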
  • the frame building process using event data includes the following four steps.
  • an initial image of a predetermined size is constructed and the pixel values of the initial image are all set to zero.
  • the predetermined size is determined according to the size of the pixel unit array of the dynamic vision sensor 110 . For example, if the pixel cell array is 20x30 in size, then the size of the constructed initial image is also 20x30. In other words, the pixels in the initial image correspond one-to-one with the pixel units in the pixel unit array.
  • Second, based on the coordinate position of each event datum within the preset duration, the corresponding pixel is located in the initial image.
  • Third, the pixel value of each located pixel (i.e., the pixel corresponding to the coordinate position of an event datum) is updated with the timestamp of that event datum to generate a single-channel image.
  • In the single-channel image, denoted I_T, (x, y) represents the coordinates of a pixel and I_T(x, y) the pixel value at (x, y); the update sets I_T(x, y) = t, where t is the timestamp of the event datum e(x, y, t) at that coordinate position.
  • If multiple event data are triggered at the same coordinate position, the timestamp closest to the current time is taken as the pixel value of that pixel.
  • Fourth, the pixel values of the single-channel image are normalized to obtain a grayscale image, which is used as the image containing pedestrians.
  • In this way, a grayscale image similar to a traditional image, denoted I_G, can be obtained by normalizing I_T with the following formula:
  • I_G(x, y) = [255 × (I_T(x, y) - t_min) / (t_max - t_min)]
  • where I_T(x, y) represents the pixel value of the image I_T at pixel (x, y), t_max and t_min represent the maximum and minimum pixel values in the image I_T, respectively, and [·] represents the rounding function. The final image I_G is the image containing pedestrians.
  • the pixel values are normalized to [0, 255] so that the resulting image is a grayscale image.
  • the embodiment of the present invention does not limit the specific interval of normalization, which may also be [0, 1], or [0, 1023], and so on.
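The four frame-building steps above can be sketched in plain Python, with nested lists standing in for the sensor's pixel-unit array. The [0, 255] normalization interval is used here as in the description; the function and variable names are illustrative assumptions.

```python
def build_frame(events, width, height):
    """Build one grayscale frame from (x, y, t) events, as described above."""
    # Step 1: initial image of the predetermined size, all pixels zero.
    img = [[0] * width for _ in range(height)]
    # Steps 2-3: write each event's timestamp at its coordinate position;
    # for repeated coordinates, the timestamp closest to the current time wins.
    for x, y, t in events:
        img[y][x] = max(img[y][x], t)
    # Step 4: normalize the timestamp image to [0, 255] to get a grayscale image.
    flat = [v for row in img for v in row]
    t_min, t_max = min(flat), max(flat)
    if t_max == t_min:
        return img  # degenerate case: nothing to normalize
    return [[round(255 * (v - t_min) / (t_max - t_min)) for v in row]
            for row in img]
```

Note that Python's `round` uses banker's rounding for exact .5 values; the patent only specifies "a rounding function", so any consistent rounding would fit.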
  • Since a gait is composed of a series of continuous actions, it is necessary to acquire N consecutive segments of event data, each of the preset duration, and build frames to obtain the corresponding N frames of images as the image sequence.
  • the value of N can be set according to actual requirements. In some embodiments of the present invention, the value range of N is generally between 40 and 80, but is not limited thereto.
  • step S320 from the image sequence, the posture contours of the pedestrians in each frame of images are respectively extracted and the posture contour map is generated.
  • The N posture contour maps generated from the N frames of images constitute the posture contour map sequence.
  • The following takes one frame of image as an example to describe the process of extracting the pedestrian's posture contour in detail.
  • the method before the step of extracting the pedestrian's posture contour in each frame of image, the method further includes the step of: filtering each frame of image to remove noise in the image to obtain a filtered image.
  • Median filtering is adopted; that is, each pixel is replaced by the median value of its neighborhood.
  • The median filter has a significant denoising effect on salt-and-pepper noise, and can effectively remove the noise in the input image I_G, thereby obtaining an output image with a clean background, denoted I_D.
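As a rough illustration of the median filtering described above, the sketch below applies a 3x3 neighborhood in pure Python; edge pixels simply use a truncated neighborhood. A production system would typically use an optimized routine (e.g. OpenCV's `medianBlur`) instead; kernel size and border handling here are assumptions.

```python
from statistics import median

def median_filter(img):
    """Replace each pixel by the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the neighborhood at image borders (truncated at edges).
            neighborhood = [img[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(neighborhood)
    return out
```

A single bright outlier surrounded by zeros (classic salt noise) is removed entirely, which is exactly the behavior the description relies on.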
  • In a static scene, only pedestrians are moving in the field of view, so the generated image contains only the pedestrians' contour information and no other background information.
  • In this case, the entire posture contour of the pedestrian can be segmented without performing detection on the image.
  • First, two arrays are initialized according to the width and height of the filtered image, respectively.
  • Let the width of the filtered image be W and its height be H; that is, the size of the filtered image is W × H (it should be understood that W and H represent the numbers of pixels in the horizontal and vertical directions of the image, respectively).
  • Both arrays are initialized so that all elements are 0: the initial first array A_x contains H zeros, and the initial second array A_y contains W zeros.
  • the pixel information of the filtered image is respectively mapped to the two arrays.
  • The predetermined manner means that the pixels in the filtered image are mapped row by row onto the vertical direction (i.e., the Y-axis of the image), and at the same time mapped column by column onto the horizontal direction (i.e., the X-axis of the image).
  • For each row of pixels in the filtered image, the sum of the pixel values of the row is obtained by accumulation, and each row sum is stored at the corresponding position of the first array A_x; for each column of pixels in the filtered image, the sum of the pixel values of the column is obtained by accumulation, and each column sum is stored at the corresponding position of the second array A_y.
  • Specifically, the first array A_x and the second array A_y can be expressed as:
  • A_x[i] = Σ_{x=0}^{W-1} I_D(x, i), for 0 ≤ i < H
  • A_y[j] = Σ_{y=0}^{H-1} I_D(j, y), for 0 ≤ j < W
  • where A_x[i] represents the element with subscript i in the first array, A_y[j] represents the element with subscript j in the second array, I_D(x, y) represents the pixel value at point (x, y) of the filtered image, H represents the height of the filtered image, and W represents its width.
  • In this way, the pixel information of the pedestrian in the filtered image corresponds to the longest continuous non-zero sub-array in the first array A_x and the longest continuous non-zero sub-array in the second array A_y.
  • the longest continuous non-zero sub-array is determined from the first array A x , and the longest continuous non-zero sub-array is also determined from the second array A y .
  • the non-zero sub-array here means that all elements in the entire sub-array are non-zero values.
  • Finally, based on the determined non-zero sub-arrays, the pedestrian's posture contour is extracted.
  • Based on the subscripts of the non-zero sub-array determined from the first array A_x, the boundaries of the pedestrian's posture contour in the vertical direction (Y-axis) are determined: the starting and ending subscripts of the sub-array are its upper and lower boundaries, respectively.
  • Likewise, the starting and ending subscripts of the non-zero sub-array determined from the second array A_y are the two boundaries of the posture contour in the horizontal direction (X-axis). Then, based on this boundary information (two vertical boundaries and two horizontal boundaries), the pedestrian's posture contour can be segmented from the filtered image as the posture contour map.
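The static-scene segmentation just described can be sketched end to end: row sums fill A_x (length H), column sums fill A_y (length W), the longest continuous run of non-zero elements in each array gives the vertical and horizontal boundaries, and the contour map is the crop between them. Helper names are illustrative, and the linear run-scan is one straightforward way to find the longest continuous non-zero sub-array.

```python
def longest_nonzero_run(arr):
    """Return (start, end) of the longest run of non-zero elements (end inclusive)."""
    best = (0, -1)   # empty run by default
    start = None
    for i, v in enumerate(arr + [0]):  # trailing zero sentinel closes a final run
        if v != 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            if i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best

def segment_contour(img):
    """Crop the pedestrian's posture contour from a filtered image I_D."""
    h, w = len(img), len(img[0])
    a_x = [sum(img[y]) for y in range(h)]                        # row sums, length H
    a_y = [sum(img[y][x] for y in range(h)) for x in range(w)]   # column sums, length W
    top, bottom = longest_nonzero_run(a_x)   # vertical (Y-axis) boundaries
    left, right = longest_nonzero_run(a_y)   # horizontal (X-axis) boundaries
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

Because the event-based image has no static background, the non-zero rows and columns come almost entirely from the moving pedestrian, which is why this simple projection suffices in static scenes.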
  • In a dynamic scene, in addition to pedestrians, there are other moving objects in the field of view, such as animals and vehicles.
  • Usually, such moving objects do not severely occlude or overlap the target pedestrian, but their presence means the framed images still contain a certain degree of background interference. Therefore, in dynamic scenes, object detection is used to extract the pedestrian's posture contour.
  • Specifically, the filtered image I_D is input into a detection network to determine the pedestrian's posture contour.
  • the detection network may be a target detection network such as YOLO, SSD, MobileNet, ShuffleNet, etc., which is not limited in the embodiment of the present invention.
  • The filtered image I_D serves as the input image to the detection network; after a series of operations such as convolution and pooling, a detection box containing the pedestrian is obtained, and the image region indicated by the detection box is segmented from the filtered image as the posture contour map.
  • Unlike a traditional image, the input filtered image I_D does not contain all the scene information: only the pixel information of the target pedestrian and other moving objects is present, which largely avoids interference from redundant information such as the background, so both detection speed and accuracy are improved to a certain extent.
  • an image is generated based on the event data stream of the DVS.
  • in static scenes, the pedestrian's posture contour can be extracted by mapping the image's pixel information onto the X-axis and Y-axis directions, which takes almost no time; in dynamic scenes, the pedestrian's posture contour can be segmented from the image by directly performing target detection, without complex image preprocessing, while still ensuring a good segmentation result.
  • in step S330, feature extraction is performed on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information.
  • the posture contour map sequence is input into the feature extraction model and processed by it (the processing includes, but is not limited to, convolution, max pooling, horizontal pyramid pooling, activation, etc.); the gait information is extracted, compressed into a feature vector, and output.
  • the feature vector is a lower-dimensional representation of the main feature information in the pedestrian's posture contour map sequence, and it represents the pedestrian's gait information.
  • the feature extraction model is a deep learning based convolutional neural network. The present invention does not limit the specific neural network used to realize the feature extraction model.
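The patent leaves the feature extraction network open; as one hedged illustration of the horizontal pyramid pooling operation it mentions, the sketch below pools horizontal strips of a C×H×W feature map at several scales and concatenates the results into a single vector. This is a NumPy stand-in for what a convolutional backbone would feed such a layer; the scale choice (1, 2, 4) and the max-plus-mean pooling are assumptions, not taken from the patent:

```python
import numpy as np

def horizontal_pyramid_pooling(feature_map, scales=(1, 2, 4)):
    """Compress a C x H x W feature map into a 1-D vector by pooling
    horizontal strips at several scales (max + mean per strip)."""
    c, h, w = feature_map.shape
    parts = []
    for s in scales:
        strip_h = h // s
        for k in range(s):
            strip = feature_map[:, k * strip_h:(k + 1) * strip_h, :]
            # one C-dimensional descriptor per strip: global max plus global mean
            parts.append(strip.max(axis=(1, 2)) + strip.mean(axis=(1, 2)))
    return np.concatenate(parts)  # length C * (1 + 2 + ... ) over the scales
```

In a trained model, a learned backbone would produce the feature map and the pooled vector would be the gait feature vector described above.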
  • the solution for extracting gait features of pedestrians according to the present invention has two advantages.
  • on the one hand, the image generated from the event data stream output by the DVS makes it easier to segment the pedestrian's posture contour, and in static scenes this takes almost no time.
  • in dynamic scenes, the pedestrian's posture contour is segmented from the image simultaneously while the target is detected. Therefore, this scheme requires neither an additional segmentation algorithm for contour extraction nor complex image preprocessing, thus greatly shortening the time required for the entire gait recognition process.
  • FIG. 4 shows a schematic flowchart of a gait recognition method 400 according to an embodiment of the present invention.
  • the method 400 may be performed in the identification device 130 .
  • in step S410, a feature vector representing the gait information of the current pedestrian is extracted by executing the above-described method 300 for extracting gait features of a pedestrian.
  • in step S420, a gait feature vector with the highest similarity to the feature vector is matched from the gait feature library.
  • the gait feature vector and the pedestrian's identity are associated and stored in the gait feature database.
  • the gait feature vectors in the gait feature library are all one-dimensional feature vectors.
  • by computing the similarity between the target pedestrian's feature vector and each gait feature vector in the gait feature library, the gait feature vector with the highest similarity is found as the matching result.
  • the feature vector of the target pedestrian is first transformed into a one-dimensional feature vector; then the Euclidean distance is used to calculate the similarity between the transformed one-dimensional feature vector and each gait feature vector in the gait feature library.
  • Euclidean distance is the most common distance measure; it measures the absolute distance between points in a multidimensional space. In general, the greater the distance between the two, the lower the similarity; conversely, the smaller the Euclidean distance, the higher the similarity.
  • the calculation formula is as follows:

    d(X, Y_j) = sqrt( Σ_{i=1}^{n} (x_i − y_{j,i})² )
  • X represents the one-dimensional feature vector transformed from the feature vector of the target pedestrian, the length of the one-dimensional feature vector is n, and Y j represents a gait feature vector to be matched in the gait feature library.
  • cosine distance is also a commonly used similarity measure.
  • Cosine similarity uses the cosine value of the angle between two vectors in the vector space as a measure of the difference between two individuals. Compared with Euclidean distance, cosine similarity pays more attention to the difference in direction of two vectors.
  • the value range of the cosine similarity is [-1, 1], and the closer the cosine value is to 1, the higher the similarity.
  • the formula for calculating cosine similarity is as follows:

    cos(θ) = ( Σ_{i=1}^{n} x_i·y_i ) / ( sqrt(Σ_{i=1}^{n} x_i²) · sqrt(Σ_{i=1}^{n} y_i²) )
  • x i and y i are the elements in the two one-dimensional feature vectors X and Y, respectively, and n represents the length of the feature vectors X and Y.
  • the method for calculating the similarity of feature vectors based on the Euclidean distance or the cosine similarity is shown here only as an example; the embodiment of the present invention does not limit the method used to measure similarity in order to match the target pedestrian's feature vector to the gait feature vector with the highest similarity.
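A minimal sketch of this matching step, assuming a library that maps identity identifiers to one-dimensional gait feature vectors (the dictionary layout and function names are illustrative; the patent does not prescribe a data structure):

```python
import numpy as np

def euclidean_distance(x, y):
    """Absolute distance between two 1-D feature vectors; smaller means more similar."""
    return np.sqrt(np.sum((x - y) ** 2))

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def match_identity(query, library, metric="euclidean"):
    """Return the identity whose stored gait vector best matches `query`.

    `library` maps identity identifier -> 1-D gait feature vector.
    """
    if metric == "euclidean":
        return min(library, key=lambda pid: euclidean_distance(query, library[pid]))
    return max(library, key=lambda pid: cosine_similarity(query, library[pid]))
```

With Euclidean distance the smallest distance wins; with cosine similarity the largest value wins, matching the conventions stated above.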
  • in step S430, the identity of the current pedestrian is determined based on the pedestrian identifier associated with the matched gait feature vector.
  • in the gait recognition scheme of the present invention, by building frames from the DVS data, an image containing only motion information is obtained, which enables fast and complete segmentation of the pedestrian's posture contour, and the segmented posture contour is clear.
  • performing gait recognition based on the clear segmented pedestrian posture contour can effectively improve the accuracy or precision of gait recognition.
  • in addition, since no complex calculation methods are used in the stages of segmenting the pedestrian's posture contour and extracting gait features, and no complex image preprocessing is performed, the time required for the entire gait recognition process is greatly shortened.
  • the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example.
  • the modules in the preceding examples may be combined into one module or may furthermore be divided into multiple sub-modules.
  • the modules in the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment.
  • the modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore they may be divided into multiple sub-modules or sub-units or sub-assemblies. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a method for extracting gait features of a pedestrian, a gait recognition method, and a system. The method for extracting gait features of a pedestrian includes the steps of: for a segment of an event data stream from a dynamic vision sensor, generating one frame of an image containing a pedestrian for each preset duration of event data, so as to generate an image sequence; from the image sequence, extracting the pedestrian's posture contour in each frame of the image and generating a posture contour map, so as to obtain a posture contour map sequence; and performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information. The present invention also discloses a corresponding computing device.

Description

Method for extracting gait features of a pedestrian, gait recognition method and system — Technical Field

The present invention relates to the technical field of data processing, and in particular to a method for extracting gait features of a pedestrian and a gait recognition method.

Background Art

Gait recognition is an emerging biometric identification technology that identifies a person mainly by his or her walking posture. Unlike other biometric identification technologies, gait recognition is a passive identification technology with the advantages of being contactless, working at long range, and being difficult to disguise. Therefore, gait recognition has great advantages and broad prospects in the field of intelligent video surveillance.

Since gait recognition technology performs identification by extracting posture information of a walking person, the pedestrian's posture contour needs to be extracted in the course of recognizing the posture information. The most commonly used contour extraction method at present is background subtraction, i.e., a background model is built for the video scene, a foreground image containing the pedestrian is obtained by differencing the original image and the background model, and the detected image then undergoes a series of image preprocessing steps such as binarization and mathematical morphological analysis before the pedestrian's posture contour is finally obtained. This contour extraction technique not only involves many steps and a cumbersome, time-consuming process, but also performs poorly in complex scenes. For example, when the background is too complex, the extracted pedestrian posture contour is often partially missing or carries parts of the environmental background, seriously affecting the accuracy of gait recognition.

In view of the above problems, a new gait recognition scheme is needed.
Summary of the Invention

The present invention provides a method for extracting gait features of a pedestrian, a gait recognition method, and a system, in an effort to solve or at least alleviate at least one of the problems above.

According to one aspect of the present invention, a method for extracting gait features of a pedestrian is provided, including the steps of: for a segment of an event data stream from a dynamic vision sensor, generating one frame of an image containing a pedestrian for each preset duration of event data, so as to generate an image sequence; from the image sequence, extracting the pedestrian's posture contour in each frame of the image and generating a posture contour map, so as to obtain a posture contour map sequence; and performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information.

Optionally, in the method according to the present invention, the event data are triggered by relative motion between an object in the field of view and the dynamic vision sensor, the object includes a pedestrian, and the event data contain the coordinate position and timestamp of the triggered event.

Optionally, the method according to the present invention further includes the step of: filtering each frame of the image to obtain a filtered image.

Optionally, in the method according to the present invention, the step of extracting the pedestrian's posture contour in each frame of the image includes: initializing two arrays according to the width and the height of the filtered image, respectively; mapping the pixel information of the filtered image to the arrays in a predetermined manner; determining the longest contiguous non-zero sub-array from each of the arrays; and extracting the pedestrian's posture contour based on the determined non-zero sub-arrays.
Optionally, in the method according to the present invention, the step of initializing two arrays according to the width and the height of the filtered image includes: constructing a first array whose length is the height of the filtered image and initializing the first array; and constructing a second array whose length is the width of the filtered image and initializing the second array.

Optionally, in the method according to the present invention, the step of mapping the pixel information of the filtered image to the arrays in a predetermined manner includes: for each row of pixels in the filtered image, obtaining the sum of the pixel values of that row by accumulation and correspondingly storing each row sum in the first array; and for each column of pixels in the filtered image, obtaining the sum of the pixel values of that column by accumulation and correspondingly storing each column sum in the second array.

Optionally, in the method according to the present invention, the step of extracting the pedestrian's posture contour based on the determined non-zero sub-arrays includes: determining the boundaries of the pedestrian's posture contour in the vertical direction based on the subscripts of the non-zero sub-array determined from the first array; determining the boundaries of the pedestrian's posture contour in the horizontal direction based on the subscripts of the non-zero sub-array determined from the second array; and extracting the pedestrian's posture contour based on the determined vertical and horizontal boundaries.

Optionally, in the method according to the present invention, the step of extracting the pedestrian's posture contour in each frame of the image further includes: inputting the filtered image into a detection network to determine the pedestrian's posture contour.

Optionally, in the method according to the present invention, the step of performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information includes: inputting the posture contour map sequence into a feature extraction model which, after processing, outputs the feature vector representing the pedestrian's gait information, where the feature extraction model is a deep-learning-based convolutional neural network.

Optionally, in the method according to the present invention, the step of generating one frame of an image containing a pedestrian for each preset duration of event data includes: constructing an initial image of a predetermined size and setting the pixel values of the initial image to zero, where the predetermined size is determined according to the size of the pixel unit array of the dynamic vision sensor; based on the coordinate position of each event datum within the preset duration, looking up its corresponding pixel in the initial image; updating the pixel value of each found pixel with the timestamp of the corresponding event datum to generate a single-channel image; and normalizing the pixel values of the single-channel image to obtain a grayscale image as the image containing the pedestrian.
According to another aspect of the present invention, a gait recognition method is provided, including the steps of: extracting a feature vector representing the gait information of the current pedestrian by executing the method for extracting gait features of a pedestrian; matching, from a gait feature library, a gait feature vector with the highest similarity to the feature vector, where gait feature vectors and pedestrians' identity identifiers are stored in association in the gait feature library; and determining the identity of the current pedestrian based on the pedestrian identifier associated with the matched gait feature vector.

According to another aspect of the present invention, a gait recognition system is provided, including: a dynamic vision sensor, adapted to trigger events based on relative motion between an object in the field of view and the dynamic vision sensor and to output an event data stream to a gait feature extraction apparatus; the gait feature extraction apparatus, adapted to extract the posture contour of a pedestrian in the field of view based on the event data stream and to extract the pedestrian's gait features; and an identity recognition apparatus, adapted to recognize the pedestrian's identity based on the pedestrian's gait features.

According to another aspect of the present invention, a computing device is provided, including: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing any one of the methods described above.

According to yet another aspect of the present invention, a computer-readable storage medium storing one or more programs is provided, where the one or more programs include instructions which, when executed by a computing device, cause the computing device to execute any one of the methods described above.

In summary, according to the scheme of the present invention, a series of images containing a pedestrian is generated as an image sequence based on the event data stream output by the dynamic vision sensor. With simple processing of the image sequence, the pedestrian's posture contour can be segmented from it to form a posture contour map sequence. Then, a feature vector representing the pedestrian's gait information is calculated from the posture contour map sequence. The entire process is simple and fast, involves no cumbersome image processing steps and takes almost no time, while still ensuring a good extraction result.
Brief Description of the Drawings

To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent by reading the following detailed description in conjunction with the drawings. Throughout the present disclosure, the same reference numerals generally refer to the same parts or elements.

FIG. 1 shows a schematic diagram of a gait recognition system 100 according to some embodiments of the present invention;

FIG. 2 shows a schematic diagram of a computing device 200 according to some embodiments of the present invention;

FIG. 3 shows a flowchart of a method 300 for extracting gait features of a pedestrian according to an embodiment of the present invention;

FIG. 4 shows a schematic flowchart of a gait recognition method 400 according to an embodiment of the present invention.
Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

In recent years, the dynamic vision sensor (DVS) has received more and more attention and application in the field of computer vision. A DVS is a biomimetic vision sensor that simulates the human retina based on pulse-triggered neurons. The sensor contains a pixel unit array composed of multiple pixel units, where each pixel unit responds to and records a region of rapid light intensity change only when it senses a change in light intensity. That is, each pixel unit inside the DVS can independently and autonomously respond to and record regions of rapid light intensity change. Since the DVS adopts an event-triggered processing mechanism, pixel units are triggered and generate event data only when an object in the field of view moves relative to the dynamic vision sensor, so its output is an asynchronous event data stream rather than image frames. The event data stream is, for example, light intensity change information (e.g., the timestamp of the light intensity change and the light intensity threshold) and the coordinate position of the triggered pixel unit in the pixel unit array.

Based on the above operating principle, the advantages of the dynamic vision sensor over traditional vision sensors can be summarized as follows: 1) the response speed of the DVS is no longer limited by traditional exposure time and frame rate, and it can detect high-speed objects moving at rates of up to ten thousand frames per second; 2) the DVS has a larger dynamic range and can accurately sense and output scene changes in low-light or highly exposed environments; 3) the DVS has lower power consumption; 4) since each pixel unit of the DVS responds to light intensity changes independently, the DVS is not affected by motion blur.
According to an embodiment of the present invention, a DVS-based gait recognition scheme is proposed. Considering the problems in the pedestrian posture contour extraction part of existing gait recognition schemes, such as long processing time and severe background interference, the scheme exploits the characteristics of DVS data and processes the output event data stream with certain algorithms to achieve fast and complete extraction of the pedestrian's posture contour.

FIG. 1 shows a schematic diagram of a gait recognition system 100 according to an embodiment of the present invention. As shown in FIG. 1, the system 100 includes a dynamic vision sensor (DVS) 110, a gait feature extraction apparatus 120, and an identity recognition apparatus 130. The gait feature extraction apparatus 120 is coupled to the dynamic vision sensor 110 and the identity recognition apparatus 130, respectively. It should be understood that FIG. 1 is only an example, and the embodiments of the present invention do not limit the number of each part in the system 100.

The dynamic vision sensor 110 monitors motion changes of objects in the field of view in real time. Once it detects that an object in the field of view moves (relative to the dynamic vision sensor 110), i.e., the light in the field of view changes, it triggers a pixel event (or simply an "event") and outputs event data of dynamic pixels (i.e., pixel units whose brightness changes). Several event data output over a period of time constitute an event data stream. Each event datum in the event data stream includes at least the coordinate position of the triggered event (i.e., the pixel unit whose brightness changes) and the timestamp of the triggering moment. The specific composition of the dynamic vision sensor 110 is not elaborated here.
The gait feature extraction apparatus 120 receives the event data stream from the dynamic vision sensor 110 and processes it to extract the posture contour of the pedestrian in the field of view. In one embodiment, the gait feature extraction apparatus 120 uses the event data stream generated by the DVS to build frames, generating images without a complex background, and then extracts the pedestrian's posture contour from these images.

Furthermore, the gait feature extraction apparatus 120 calculates the pedestrian's gait features from the pedestrian's posture contour. In one embodiment, the pedestrian's gait features are represented by a feature vector containing the pedestrian's gait information. The gait feature extraction apparatus 120 then sends the pedestrian's gait features to the identity recognition apparatus 130.

A gait feature library is prestored in the identity recognition apparatus 130; in the gait feature library, the identity identifier of the pedestrian corresponding to each gait feature vector is stored in association. Based on the pedestrian's gait features, the identity recognition apparatus 130 matches, from the gait feature library, the gait feature vector with the highest similarity, and then determines the pedestrian's identity according to the identity identifier associated with that gait feature vector.

Of course, the gait feature library may also be a third-party feature library, and the identity recognition apparatus 130 may connect to a third-party gait feature library to match the gait feature vector with the highest similarity. The embodiments of the present invention do not impose many restrictions on this.

According to the gait recognition system 100 of the present invention, the posture contour of the pedestrian in the field of view is quickly extracted by processing the event data stream from the dynamic vision sensor 110. Then, the pedestrian's gait features are calculated from the posture contour, and the pedestrian is identified according to these gait features. The system 100 does not need to perform complex and cumbersome image processing, which can greatly increase the speed of gait recognition.

Furthermore, the image generated by the system 100 from the event data stream contains only the contour information of moving objects and no other background information. The pedestrian posture contour segmented from this image is clear and complete and carries no useless information such as environmental background, which greatly guarantees the precision of gait recognition.
According to an embodiment of the present invention, each part of the gait recognition system 100 may be implemented by a computing device. FIG. 2 shows a schematic block diagram of a computing device 200 according to an embodiment of the present invention.

As shown in FIG. 2, in the basic configuration 202, the computing device 200 typically includes a system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processors 204 and the system memory 206.

Depending on the desired configuration, the processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 204 may include one or more levels of cache, such as a level-1 cache 210 and a level-2 cache 212, a processor core 214, and registers 216. An example processor core 214 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 218 may be used with the processor 204, or in some implementations the memory controller 218 may be an internal part of the processor 204.

Depending on the desired configuration, the system memory 206 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 206 may include an operating system 220, one or more applications 222, and program data 224. In some implementations, the applications 222 may be arranged to execute instructions on the operating system by the one or more processors 204 using the program data 224.
The computing device 200 also includes a storage device 232, which includes removable storage 236 and non-removable storage 238, both of which are connected to a storage interface bus 234.

The computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to the basic configuration 202 via a bus/interface controller 230. Example output devices 242 include a graphics processing unit 248 and an audio processing unit 250, which may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 252. Example peripheral interfaces 244 may include a serial interface controller 254 and a parallel interface controller 256, which may be configured to facilitate communication via one or more I/O ports 258 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printers, scanners, etc.). An example communication device 246 may include a network controller 260, which may be arranged to facilitate communication with one or more other computing devices 262 over a network communication link via one or more communication ports 264.

A network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. As non-limiting examples, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media.
Generally, the computing device 200 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device including any of the above functions. In an implementation according to the present invention, the computing device 200 may be implemented as a micro computing module or the like. The embodiments of the present invention are not limited in this regard.

In an embodiment according to the present invention, the computing device 200 is configured to execute the gait recognition scheme according to the present invention. The applications 222 of the computing device 200 contain a plurality of program instructions for executing the method 300 for extracting gait features of a pedestrian and the gait recognition method 400 according to the present invention.

It should be understood that, provided the dynamic vision sensor 110 has sufficient storage space and computing power, the computing device 200 may also serve as a part of the dynamic vision sensor 110 to process the event data stream and implement moving object detection.
FIG. 3 shows a flowchart of a method 300 for extracting gait features of a pedestrian according to an embodiment of the present invention. The method 300 is executed in the gait feature extraction apparatus 120. It should be noted that, due to space limitations, the descriptions of the method 300 and the system 100 complement each other, and repeated parts are not elaborated.

As shown in FIG. 3, the method 300 starts at step S310.

In step S310, for a segment of an event data stream from the dynamic vision sensor 110, one frame of an image containing a pedestrian is generated for each preset duration of event data, so as to generate an image sequence.

As described above, the gait feature extraction apparatus 120 receives and processes the event data stream output by the DVS, either continuously or by sampling. The event data are triggered by relative motion between objects in the field of view and the dynamic vision sensor 110, where the objects include pedestrians. Moreover, each event datum e(x, y, t) contains the coordinate position (x, y) of its corresponding triggered event and the timestamp t of the triggering moment.
According to an embodiment of the present invention, when acquiring the event data stream, the gait feature extraction apparatus 120 builds one frame, i.e., generates one frame of an image containing a pedestrian, for each preset duration of event data. Denote the timestamp of the first event datum received within this period as t_0; when the timestamp t of a subsequently received event datum satisfies t − t_0 > T, reception of event data stops, where T is the preset duration. Specifically, the process of building a frame from event data includes the following four steps.

In the first step, an initial image of a predetermined size is constructed and all pixel values of the initial image are set to zero, where the predetermined size is determined according to the size of the pixel unit array of the dynamic vision sensor 110. For example, if the pixel unit array is of size 20×30, the constructed initial image is also of size 20×30. In other words, the pixels in the initial image correspond one-to-one to the pixel units in the pixel unit array.

In the second step, based on the coordinate position of each event datum within the preset duration, its corresponding pixel is looked up in the initial image.
In the third step, the pixel value of each found pixel (i.e., the pixel corresponding to the coordinate position of the event datum) is updated with the timestamp of the event datum, generating a single-channel image. Denoting the single-channel image as I_T, it can be expressed as:

    I_T(x, y) = t

where (x, y) denotes the coordinates of a pixel, I_T(x, y) denotes the pixel value at (x, y), and t denotes the timestamp of the event datum e(x, y, t) corresponding to that coordinate position.

Optionally, if, within this segment of the event data stream, the same pixel coordinate corresponds to event data of multiple triggered events, the timestamp closest to the current time is taken as the pixel value of that pixel.
In the fourth step, the pixel values of the single-channel image are normalized to obtain a grayscale image as the image containing the pedestrian. In one embodiment, mapping the pixel values in the single-channel image I_T to [0, 255] yields a grayscale image similar to a traditional image, denoted I_G. The following formula may be used for the normalization to obtain I_G:

    I_G(x, y) = [ (t − t_min) / (t_max − t_min) × 255 ]

where t denotes the pixel value of image I_T at pixel (x, y), t_max and t_min denote the maximum and minimum pixel values in image I_T, respectively, and [·] denotes the rounding function. The resulting image I_G is the image containing the pedestrian.

It should be understood that mapping the pixel values to [0, 255] so that the generated image is a grayscale image is only an example. The embodiments of the present invention do not limit the specific normalization interval, which may also be [0, 1], [0, 1023], and so on.
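The four framing steps can be condensed into a short NumPy sketch. This is an interpretation under stated assumptions (events arrive in timestamp order, normalization target [0, 255], floor rounding); the function name `frame_events` is ours, not from the patent:

```python
import numpy as np

def frame_events(events, width, height):
    """Build one grayscale frame from the DVS events of one preset window.

    `events` is an iterable of (x, y, t) tuples, assumed to be in timestamp
    order, so a later event at the same pixel overwrites the earlier one.
    The timestamp image I_T is then normalized to [0, 255].
    """
    i_t = np.zeros((height, width), dtype=np.float64)
    for x, y, t in events:
        i_t[y, x] = t  # later events overwrite earlier timestamps
    t_min, t_max = i_t.min(), i_t.max()
    if t_max == t_min:  # no events, or a single timestamp: return a blank frame
        return np.zeros((height, width), dtype=np.uint8)
    i_g = np.floor((i_t - t_min) / (t_max - t_min) * 255)
    return i_g.astype(np.uint8)
```

Repeating this for N consecutive windows yields the image sequence described in the next paragraph.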
Since a gait is composed of a series of continuous movements, event data within N consecutive preset durations need to be acquired and built into the corresponding N frames of images as the image sequence. The value of N can be set according to actual requirements; in some embodiments of the present invention, N generally ranges between 40 and 80, but is not limited thereto.

Subsequently, in step S320, the pedestrian's posture contour in each frame of the image is extracted from the image sequence and a posture contour map is generated. As described above, the N posture contour maps generated from the N frames of images constitute the posture contour map sequence.

The process of extracting the pedestrian's posture contour is described in detail below, taking extraction from one frame of an image as an example.
According to one embodiment, before the step of extracting the pedestrian's posture contour in each frame of the image, the method further includes the step of: filtering each frame of the image to remove noise and obtain a filtered image. In one embodiment, median filtering is used, i.e., for each pixel, the median of its neighborhood replaces the original value of that pixel. Median filtering has a notable denoising effect on salt-and-pepper noise and can effectively remove noise from the input image I_G, thereby obtaining an output image with a cleaner background, denoted I_D.
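A plain 3×3 median filter of the kind described here can be written directly in NumPy. This is a naive illustrative implementation; a real system would likely use an optimized library routine, and the edge-replication padding is our choice, not specified by the patent:

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighborhood
    (edge-replicated at the border) to suppress salt-and-pepper noise."""
    padded = np.pad(img, 1, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out
```

An isolated bright noise pixel is outvoted by its eight neighbors and removed, while uniform regions pass through unchanged.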
According to embodiments of the present invention, different pedestrian posture contour extraction methods are adopted for different application scenarios.
According to one embodiment, in a static scene, only the pedestrian is moving in the field of view; therefore, the generated image contains only the pedestrian's contour information and no other background information. In this way, the entire pedestrian posture contour can be segmented out according to the distribution positions of the pixels in the image, without performing detection on the image. The specific implementation steps are as follows.

First, two arrays are initialized according to the width and the height of the filtered image, respectively. Suppose the width of the filtered image is W and the height is H, i.e., the size of the filtered image is W×H (it should be understood that W and H here respectively represent the numbers of pixels of the image in the horizontal and vertical directions). A first array of length H is constructed and initialized (denoted A_x), and a second array of length W is constructed and initialized (denoted A_y). In the arrays A_x and A_y, the initial values of all elements are 0. In other words, the initial first array A_x contains H zeros, and the initial second array A_y contains W zeros.

Then, the pixel information of the filtered image is mapped to these two arrays in a predetermined manner. In one embodiment, the predetermined manner means mapping the pixels in the filtered image by row onto the vertical direction (i.e., the Y-axis of the image), and at the same time mapping the pixels in the filtered image by column onto the horizontal direction (i.e., the X-axis of the image). For example, for each row of pixels in the filtered image, the sum of the pixel values of that row is obtained by accumulation, and each row sum is correspondingly stored in the first array A_x; for each column of pixels in the filtered image, the sum of the pixel values of that column is obtained by accumulation, and each column sum is correspondingly stored in the second array A_y.
Specifically, the first array A_x and the second array A_y can be expressed as follows:

    A_x[i] = Σ_{x=0}^{W−1} I_D(x, i),  i = 0, 1, …, H−1

    A_y[j] = Σ_{y=0}^{H−1} I_D(j, y),  j = 0, 1, …, W−1

where A_x[i] denotes the element corresponding to subscript i in the first array, A_y[j] denotes the element corresponding to subscript j in the second array, I_D(x, y) denotes the pixel value of pixel (x, y) in the filtered image, H denotes the height of the filtered image, and W denotes the width of the filtered image. For example, for an array A = {1, 3, 5, 7} of length 4 with subscripts 0, 1, 2, 3, we have A[0] = 1, A[1] = 3, A[2] = 5, A[3] = 7.
In this way, the pixel information of the pedestrian in the filtered image forms one longest contiguous non-zero sub-array in the first array A_x, and likewise one longest contiguous non-zero sub-array in the second array A_y.

Therefore, the longest contiguous non-zero sub-array is next determined from each of the above two arrays. That is, the longest contiguous non-zero sub-array is determined from the first array A_x, and the longest contiguous non-zero sub-array is also determined from the second array A_y. A non-zero sub-array here means that all elements in the entire sub-array are non-zero values.

Finally, the pedestrian's posture contour is extracted based on the determined non-zero sub-arrays.

Based on the subscripts of the non-zero sub-array determined from the first array A_x, the boundaries of the pedestrian's posture contour in the vertical direction (Y-axis direction) are determined. The starting and ending subscripts of that non-zero sub-array are the upper and lower boundaries of the pedestrian's posture contour in the Y-axis direction. In the same way, the subscripts of the non-zero sub-array determined from the second array A_y are the two boundaries of the pedestrian's posture contour in the X-axis direction; therefore, based on these subscripts, the boundaries of the pedestrian's posture contour in the horizontal direction (X-axis direction) can be determined. Then, based on the boundary information determined above (the two boundaries in the vertical direction and the two boundaries in the horizontal direction), the pedestrian's posture contour can be segmented from the filtered image as a posture contour map.
According to another embodiment, in a dynamic scene, in addition to pedestrians, there are other moving objects in the field of view, such as animals and vehicles. Such moving objects do not cause serious occlusion of, or overlap with, the target pedestrian, but their presence introduces a certain degree of background interference into the framed images. Therefore, in dynamic scenes, object detection is used to extract the pedestrian's posture contour.

In one embodiment, the filtered image I_D is input into a detection network to determine the pedestrian's posture contour. Specifically, the detection network may be a target detection network such as YOLO, SSD, MobileNet, or ShuffleNet, which is not limited in the embodiments of the present invention. The filtered image I_D is used as the input image and fed into the detection network; after a series of operations such as convolution and pooling, a detection frame containing the pedestrian is obtained. The image indicated by the detection frame is segmented from the filtered image; this is the posture contour map.

Since the input filtered image I_D, unlike a traditional image, does not contain all the scene information, and only the pixel information of the target pedestrian and other moving objects is present, interference from redundant information such as the background is largely avoided, so both detection speed and accuracy are improved to a certain extent.

According to embodiments of the present invention, images are generated based on the DVS event data stream. In static scenes, the pedestrian's posture contour can be extracted by mapping the image's pixel information onto the X-axis and Y-axis directions of the image, which takes almost no time; in dynamic scenes, the pedestrian's posture contour can be segmented from the image by directly performing target detection, without complex image preprocessing, while still ensuring a good segmentation result.
Subsequently, in step S330, feature extraction is performed on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information.

According to one embodiment, the posture contour map sequence is input into a feature extraction model and processed by it (the processing includes, but is not limited to, convolution, max pooling, horizontal pyramid pooling, activation, etc.); the gait information is extracted, compressed into a feature vector, and output. This feature vector is a lower-dimensional representation of the main feature information in the pedestrian's posture contour map sequence, and it represents the pedestrian's gait information. In one embodiment, the feature extraction model is a deep-learning-based convolutional neural network. The present invention does not limit the specific neural network used to implement the feature extraction model.
Compared with traditional schemes, the scheme for extracting gait features of a pedestrian according to the present invention has two major advantages.

On the one hand, the image generated from the event data stream output by the DVS makes it easier to segment the pedestrian's posture contour, and in static scenes this takes almost no time. In dynamic scenes, the pedestrian's posture contour is segmented from the image at the same time as target detection is performed. Therefore, this scheme requires neither an additional segmentation algorithm for contour extraction nor complex image preprocessing, thus greatly shortening the time required for the entire gait recognition process.

On the other hand, with traditional contour extraction methods, an overly complex background often causes various problems in the segmented pedestrian posture contour, e.g., partial loss or residual, incompletely removed background, seriously affecting the precision of gait recognition. The pedestrian posture contour segmented by this scheme is complete and clear, which effectively improves the accuracy of subsequent gait recognition.
After the pedestrian's gait information is obtained, according to embodiments of the present invention, the pedestrian can be identified based on the gait information. FIG. 4 shows a schematic flowchart of a gait recognition method 400 according to an embodiment of the present invention. The method 400 may be executed in the identity recognition apparatus 130.

As shown in FIG. 4, the method 400 starts at step S410. In step S410, a feature vector representing the gait information of the current pedestrian is extracted by executing the above-described method 300 for extracting gait features of a pedestrian. For the process of extracting the feature vector of the gait information, reference may be made to the relevant description of the method 300 above, which is not repeated here.

Subsequently, in step S420, a gait feature vector with the highest similarity to the feature vector is matched from the gait feature library.

Gait feature vectors and pedestrians' identity identifiers are stored in association in the gait feature library. Optionally, the gait feature vectors in the gait feature library are all one-dimensional feature vectors.

According to an embodiment of the present invention, the similarity between the target pedestrian's feature vector (i.e., the feature vector extracted in step S410) and each gait feature vector in the gait feature library is calculated, and the gait feature vector with the highest similarity is found as the matching result.
In one embodiment, the target pedestrian's feature vector is first transformed into a one-dimensional feature vector; then the Euclidean distance is used to calculate the similarity between the transformed one-dimensional feature vector and each gait feature vector in the gait feature library. Euclidean distance is the most common distance measure and can measure the absolute distance between points in a multidimensional space. In general, the greater the distance between the two, the lower the similarity; conversely, the smaller the Euclidean distance, the higher the similarity. The calculation formula is as follows:

    d(X, Y_j) = sqrt( Σ_{i=1}^{n} (x_i − y_{j,i})² )

where X denotes the one-dimensional feature vector transformed from the target pedestrian's feature vector, the length of the one-dimensional feature vector is n, and Y_j denotes a gait feature vector to be matched in the gait feature library.
Besides the Euclidean distance, the cosine distance is also a commonly used similarity measure. Cosine similarity uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals; compared with the Euclidean distance, cosine similarity pays more attention to the difference in direction between the two vectors. In general, the value range of cosine similarity is [−1, 1], and the closer the cosine value is to 1, the higher the similarity. The formula for calculating cosine similarity is as follows:

    cos(θ) = ( Σ_{i=1}^{n} x_i·y_i ) / ( sqrt(Σ_{i=1}^{n} x_i²) · sqrt(Σ_{i=1}^{n} y_i²) )

where x_i and y_i are the elements of the two one-dimensional feature vectors X and Y, respectively, and n denotes the length of the feature vectors X and Y.
It should be understood that the methods for calculating feature vector similarity based on the Euclidean distance or cosine similarity are shown here only as examples; the embodiments of the present invention do not limit the method used to measure similarity in order to match the target pedestrian's feature vector to the gait feature vector with the highest similarity.

Subsequently, in step S430, the identity of the current pedestrian is determined based on the pedestrian identifier associated with the matched gait feature vector.

According to the gait recognition scheme of the present invention, by building frames from the DVS data, an image containing only motion information is obtained, which enables fast and complete segmentation of the pedestrian's posture contour, and the segmented posture contour is clear. Performing gait recognition based on the clear segmented pedestrian posture contour can effectively improve the accuracy or precision of gait recognition. In addition, since no complex calculation methods are used in the stages of segmenting the pedestrian's posture contour and extracting gait features, and no complex image preprocessing is performed, the time required for the entire gait recognition process is greatly shortened.
Numerous specific details are set forth in the description provided herein. However, it is understood that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the present disclosure and aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention, the various features of the present invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting the intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art should understand that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple sub-modules.

Those skilled in the art can understand that the modules in the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in an embodiment may be combined into one module or unit or component, and furthermore they may be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.

Furthermore, some of the embodiments are described herein as methods or combinations of method elements that can be implemented by a processor of a computer system or by other means of performing the function. Thus, a processor with the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, an element described herein of a device embodiment is an example of a means for performing the function performed by that element for the purpose of implementing the invention.

As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Furthermore, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is illustrative rather than restrictive with respect to the scope of the invention, which is defined by the appended claims.

Claims (14)

  1. A method for extracting gait features of a pedestrian, comprising the steps of:
    for a segment of an event data stream from a dynamic vision sensor, generating one frame of an image containing a pedestrian for each preset duration of event data, so as to generate an image sequence;
    from the image sequence, extracting the pedestrian's posture contour in each frame of the image and generating a posture contour map, so as to obtain a posture contour map sequence;
    performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information.
  2. The method of claim 1, wherein,
    the event data are triggered by relative motion between an object in the field of view and the dynamic vision sensor, the object comprises a pedestrian, and the event data contain the coordinate position and timestamp of the triggered event.
  3. The method of claim 1 or 2, wherein, before the step of extracting the pedestrian's posture contour in each frame of the image, the method further comprises the step of:
    filtering each frame of the image to obtain a filtered image.
  4. The method of claim 3, wherein the step of extracting the pedestrian's posture contour in each frame of the image comprises:
    initializing two arrays according to the width and the height of the filtered image, respectively;
    mapping the pixel information of the filtered image to the arrays in a predetermined manner;
    determining the longest contiguous non-zero sub-array from each of the arrays;
    extracting the pedestrian's posture contour based on the determined non-zero sub-arrays.
  5. The method of claim 4, wherein the step of initializing two arrays according to the width and the height of the filtered image comprises:
    constructing a first array whose length is the height of the filtered image, and initializing the first array;
    constructing a second array whose length is the width of the filtered image, and initializing the second array.
  6. The method of claim 4 or 5, wherein the step of mapping the pixel information of the filtered image to the arrays in a predetermined manner comprises:
    for each row of pixels in the filtered image, obtaining the sum of the pixel values of each row by accumulation, and correspondingly storing the sum of the pixel values of each row in the first array;
    for each column of pixels in the filtered image, obtaining the sum of the pixel values of each column by accumulation, and correspondingly storing the sum of the pixel values of each column in the second array.
  7. The method of any one of claims 4-6, wherein the step of extracting the pedestrian's posture contour based on the determined non-zero sub-arrays comprises:
    determining the boundaries of the pedestrian's posture contour in the vertical direction based on the subscripts of the non-zero sub-array determined from the first array;
    determining the boundaries of the pedestrian's posture contour in the horizontal direction based on the subscripts of the non-zero sub-array determined from the second array;
    extracting the pedestrian's posture contour based on the determined boundaries in the vertical direction and the boundaries in the horizontal direction.
  8. The method of claim 3, wherein the step of extracting the pedestrian's posture contour in each frame of the image further comprises:
    inputting the filtered image into a detection network to determine the pedestrian's posture contour.
  9. The method of any one of claims 1-8, wherein the step of performing feature extraction on the posture contour map sequence to obtain a feature vector representing the pedestrian's gait information comprises:
    inputting the posture contour map sequence into a feature extraction model which, after processing by the feature extraction model, outputs the feature vector representing the pedestrian's gait information,
    wherein the feature extraction model is a deep-learning-based convolutional neural network.
  10. The method of any one of claims 2-9, wherein the step of generating one frame of an image containing a pedestrian for each preset duration of event data comprises:
    constructing an initial image of a predetermined size and setting the pixel values of the initial image to zero, wherein the predetermined size is determined according to the size of the pixel unit array of the dynamic vision sensor;
    based on the coordinate position of each event datum within the preset duration, looking up its corresponding pixel in the initial image;
    updating the pixel value of each found pixel with the timestamp of the corresponding event datum, to generate a single-channel image; and
    normalizing the pixel values of the single-channel image to obtain a grayscale image as the image containing the pedestrian.
  11. A gait recognition method, comprising the steps of:
    extracting a feature vector representing the gait information of the current pedestrian by executing the method for extracting gait features of a pedestrian of any one of claims 1-10;
    matching, from a gait feature library, a gait feature vector with the highest similarity to the feature vector, wherein gait feature vectors and pedestrians' identity identifiers are stored in association in the gait feature library;
    determining the identity of the current pedestrian based on the pedestrian identifier associated with the matched gait feature vector.
  12. A gait recognition system, comprising:
    a dynamic vision sensor, adapted to trigger events based on relative motion between an object in the field of view and the dynamic vision sensor, and to output an event data stream to a gait feature extraction apparatus;
    the gait feature extraction apparatus, adapted to extract the posture contour of a pedestrian in the field of view based on the event data stream, and to extract the gait features of the pedestrian;
    an identity recognition apparatus, adapted to recognize the identity of the pedestrian based on the gait features of the pedestrian.
  13. A computing device, comprising:
    one or more processors; and
    a memory;
    one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for executing any one of the methods of claims 1-10, and/or instructions for executing the method of claim 11.
  14. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to execute any one of the methods of claims 1-12, and/or to execute the method of claim 11.
PCT/CN2021/093484 2021-02-22 2021-05-13 一种提取行人的步态特征的方法、步态识别方法及系统 WO2022174523A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110198651.6A CN112949440A (zh) 2021-02-22 2021-02-22 一种提取行人的步态特征的方法、步态识别方法及系统
CN202110198651.6 2021-02-22

Publications (1)

Publication Number Publication Date
WO2022174523A1 true WO2022174523A1 (zh) 2022-08-25

Family

ID=76245323

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093484 WO2022174523A1 (zh) 2021-02-22 2021-05-13 一种提取行人的步态特征的方法、步态识别方法及系统

Country Status (2)

Country Link
CN (1) CN112949440A (zh)
WO (1) WO2022174523A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242076B (zh) * 2020-01-20 2023-07-28 江铃汽车股份有限公司 行人检测方法及系统
CN113660455B (zh) * 2021-07-08 2023-04-07 深圳宇晰科技有限公司 一种基于dvs数据的跌倒检测方法、系统、终端
CN113903051B (zh) * 2021-07-23 2022-12-27 南方科技大学 一种基于dvs相机数据的人体姿态检测方法及终端设备
CN114612712A (zh) * 2022-03-03 2022-06-10 北京百度网讯科技有限公司 目标分类方法、装置、设备以及存储介质
CN115617217B (zh) * 2022-11-23 2023-03-21 中国科学院心理研究所 一种车辆状态的显示方法、装置、设备及可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403154A (zh) * 2017-07-20 2017-11-28 四川大学 一种基于动态视觉传感器的步态识别方法
CN110633692A (zh) * 2019-09-26 2019-12-31 广东工业大学 一种用于无人机航拍的行人识别方法和相关装置
CN110969087A (zh) * 2019-10-31 2020-04-07 浙江省北大信息技术高等研究院 一种步态识别方法及系统
CN111144165A (zh) * 2018-11-02 2020-05-12 银河水滴科技(北京)有限公司 一种步态信息识别方法、系统及存储介质
CN111428658A (zh) * 2020-03-27 2020-07-17 大连海事大学 一种基于模态融合的步态识别方法
CN111950321A (zh) * 2019-05-14 2020-11-17 杭州海康威视数字技术股份有限公司 步态识别方法、装置、计算机设备及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112368756B (zh) * 2018-07-16 2022-11-11 豪威芯仑传感器(上海)有限公司 计算对象和车辆碰撞时间的方法、计算设备及车辆
CN109544590B (zh) * 2018-11-27 2020-05-15 上海芯仑光电科技有限公司 一种目标跟踪方法及计算设备
US20200275861A1 (en) * 2019-03-01 2020-09-03 Wiivv Wearables Inc. Biometric evaluation of body part images to generate an orthotic
CN111984347A (zh) * 2019-05-21 2020-11-24 北京小米移动软件有限公司 交互处理方法、装置、设备及存储介质
CN110796100B (zh) * 2019-10-31 2022-06-07 浙江大华技术股份有限公司 步态识别方法、装置、终端及存储装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403154A (zh) * 2017-07-20 2017-11-28 四川大学 一种基于动态视觉传感器的步态识别方法
CN111144165A (zh) * 2018-11-02 2020-05-12 银河水滴科技(北京)有限公司 一种步态信息识别方法、系统及存储介质
CN111950321A (zh) * 2019-05-14 2020-11-17 杭州海康威视数字技术股份有限公司 步态识别方法、装置、计算机设备及存储介质
CN110633692A (zh) * 2019-09-26 2019-12-31 广东工业大学 一种用于无人机航拍的行人识别方法和相关装置
CN110969087A (zh) * 2019-10-31 2020-04-07 浙江省北大信息技术高等研究院 一种步态识别方法及系统
CN111428658A (zh) * 2020-03-27 2020-07-17 大连海事大学 一种基于模态融合的步态识别方法

Also Published As

Publication number Publication date
CN112949440A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2022174523A1 (zh) 一种提取行人的步态特征的方法、步态识别方法及系统
US10198823B1 (en) Segmentation of object image data from background image data
US9965865B1 (en) Image data segmentation using depth data
CN105335722B (zh) 一种基于深度图像信息的检测系统及方法
WO2019128508A1 (zh) 图像处理方法、装置、存储介质及电子设备
WO2020042419A1 (zh) 基于步态的身份识别方法、装置、电子设备
CN109815843B (zh) 图像处理方法及相关产品
CN109903331B (zh) 一种基于rgb-d相机的卷积神经网络目标检测方法
US20170045950A1 (en) Gesture Recognition Systems
CN111989689A (zh) 用于识别图像内目标的方法和用于执行该方法的移动装置
CN110334762B (zh) 一种基于四叉树结合orb和sift的特征匹配方法
CN111797709B (zh) 一种基于回归检测的实时动态手势轨迹识别方法
JP2006011978A (ja) 画像処理方法、画像処理装置
CN111008935B (zh) 一种人脸图像增强方法、装置、系统及存储介质
CN111160291B (zh) 基于深度信息与cnn的人眼检测方法
WO2018082308A1 (zh) 一种图像处理方法及终端
CN111723687A (zh) 基于神经网路的人体动作识别方法和装置
JP2011113313A (ja) 姿勢推定装置
CN116645697A (zh) 一种多视角步态识别方法、装置、电子设备及存储介质
CN111291612A (zh) 一种基于多人多摄像头跟踪的行人重识别方法及装置
Lin et al. Moving object detection through image bit-planes representation without thresholding
CN111209873A (zh) 一种基于深度学习的高精度人脸关键点定位方法及系统
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
CN112396036A (zh) 一种结合空间变换网络和多尺度特征提取的遮挡行人重识别方法
CN108875501B (zh) 人体属性识别方法、装置、系统及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21926240

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21926240

Country of ref document: EP

Kind code of ref document: A1