WO2020224221A1 - Tracking method, apparatus, electronic device and storage medium - Google Patents

Tracking method, apparatus, electronic device and storage medium

Info

Publication number
WO2020224221A1
WO2020224221A1 PCT/CN2019/118008
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognized
person
electronic device
key point
Application number
PCT/CN2019/118008
Other languages
English (en)
French (fr)
Inventor
车宏伟
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020224221A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Definitions

  • This application relates to the field of image processing technology, and in particular to a tracking method, device, electronic equipment and storage medium.
  • In most of today's intelligent video surveillance systems, targets such as people or vehicles appearing in the scene are the focus of attention, and pedestrian targets, as the most active and important elements in the surveillance scene, naturally need to be identified more accurately.
  • A tracking method comprises: acquiring an image containing a human body when a tracking instruction is received; preprocessing the image to obtain an image to be recognized; inputting the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks; using a context-aware salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image; extracting a feature vector of the person image; and processing the feature vector with a support vector machine learning algorithm to identify the target person in the person image.
  • An electronic device comprising: a memory that stores at least one instruction; and a processor that executes the instructions stored in the memory to implement the tracking method.
  • A computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the tracking method.
  • With this solution, this application can acquire an image containing a human body when a tracking instruction is received, preprocess the image to obtain the image to be recognized, and input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks, which makes image recognition more accurate; a context-aware salient region detection algorithm then segments the marked image based on the key-point marks to obtain a person image, the feature vector of the person image is extracted, and the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
  • Fig. 1 is a flowchart of a preferred embodiment of the tracking method of the present application.
  • Fig. 2 is a functional module diagram of a preferred embodiment of the tracking device of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the tracking method of the present application.
  • FIG. 1 is a flowchart of a preferred embodiment of the tracking method of the present application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the tracking method is applied to one or more electronic devices.
  • The electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • The electronic device can be any electronic product capable of human-machine interaction with a user, such as a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet protocol television (IPTV), a smart wearable device, and the like.
  • the electronic device may also include a network device and/or user equipment.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on Cloud Computing.
  • the network where the electronic device is located includes but is not limited to the Internet, wide area network, metropolitan area network, local area network, virtual private network (Virtual Private Network, VPN), etc.
  • the tracking instruction can be triggered by anyone, and this application is not limited.
  • the tracking instruction may be triggered by a police officer or the like.
  • the image containing the human body can be captured by a camera device that communicates with the electronic device, and the camera device includes, but is not limited to, a camera on the road.
  • the preprocessing of the image by the electronic device to obtain the image to be recognized includes:
  • The electronic device performs grayscale processing on the image to obtain a grayscale image, performs binarization on the grayscale image to obtain a black-and-white image, and further performs noise reduction on the black-and-white image to obtain the image to be recognized.
  • The electronic device converts the color image to a grayscale image using a weighted-proportion method: with the three components of the current pixel denoted R, G and B, the converted pixel value is obtained by the formula 0.30*R+0.59*G+0.11*B.
  • the electronic device performs a binarization operation on the image.
  • the image binarization process is to set the pixels on the image to 0 or 255, that is, to make the entire image present an obvious black and white effect.
  • The electronic device reduces the noise of the black-and-white image by designing an adaptive image noise-reduction filter, which can filter out "salt-and-pepper" noise very well and can preserve the details of the image to a large extent.
  • Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise-reduction filter is a signal extractor whose function is to extract the original signal from a signal contaminated by noise.
  • In the filter f'(x, y) = g(x, y) - (σ_η^2/σ_L^2)·[g(x, y) - m_L], σ_η^2 is the noise variance of the entire image, m_L is the average gray value of the pixels in a window near the point (x, y), and σ_L^2 is the variance of the pixel gray values in that window.
  • The collected images contain many invalid and interfering features; in addition, differences in pedestrians' builds and clothing cause large differences in pedestrian appearance, which seriously affects recognition accuracy, and a contaminated image has unpredictable effects on subsequent image analysis and processing. The adaptive image noise-reduction filter can reduce the impact of noise on the input image.
  • Before the image is preprocessed to obtain the image to be recognized, the method further includes:
  • the electronic device performs dimensionality reduction processing on the image.
  • Because the dimensionality of the acquired data is too high and processing such data is too time-consuming, the high-dimensional data is first reduced in dimensionality.
  • the electronic device uses a principal component analysis algorithm to perform dimensionality reduction processing on the image.
  • The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
  • S12 Input the image to be recognized into a pre-trained neural network model to obtain a marked image with key point marks.
  • the electronic device inputting the image to be recognized into a pre-trained neural network model to obtain a marked image with key point marks includes:
  • The electronic device sequentially inputs the image to be recognized into a 7*7 convolutional layer, a 3*3 max-pooling layer, and 4 convolution modules to obtain the marked image with key-point marks.
  • A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of the coverage range. Its basic structure includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal.
  • the feature mapping structure uses the sigmoid function as the activation function of the convolutional network, so that the feature mapping has displacement invariance.
  • Each convolutional layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction, and this distinctive two-stage feature extraction structure reduces the feature resolution.
  • the method further includes:
  • At the max-pooling layer, the electronic device down-samples the image to be recognized in the spatial dimension, so that the length and width of the input image become half of their original values.
  • Each convolution module starts with a building block with linear projection, followed by a varying number of building blocks with identity mapping, and finally outputs the marked image.
  • The multi-layer structure of the convolutional neural network can automatically extract deep features of the input data, and different levels of the network can learn features at different levels, which greatly improves the accuracy of image processing. Through local perception and weight sharing, the network retains the associated information between images and greatly reduces the number of required parameters. The max-pooling technique further reduces the number of network parameters and improves the robustness of the model, allowing the model to keep extending in depth and adding hidden layers so as to process images more efficiently.
  • Detected pedestrians are usually marked with a rectangular frame that contains part of the background noise area, while the later matching algorithm requires an accurate target; the quality of the segmentation therefore directly affects the later recognition effect.
  • The marked image is segmented using a context-aware salient region detection algorithm, which takes the surrounding environment into account and segments out the points that attract human visual attention.
  • A salient region always differs clearly from its surroundings in color, brightness and other characteristics. Because of uncertain factors such as the location and size of the salient region, its overall position cannot be determined either locally or globally, and it can only be considered piece by piece. Therefore, in this embodiment, the image is divided into many small blocks and the similarity between every two blocks is computed. Since a salient region has a certain degree of spatial aggregation, blocks belonging to the same salient region exhibit both feature similarity and spatial aggregation; that is, the salient region is determined by how widely the feature-similar blocks are spread across the image. The specific process is as follows:
  • The image I is divided into n small blocks of equal size, with p_i and p_j denoting the blocks centered at the i-th and j-th pixel positions respectively. The local features of each block are extracted in the L*a*b color space, to which human vision is most sensitive, and the distance d_color(p_i, p_j) between every two blocks p_i and p_j is computed as the measure of block similarity and normalized accordingly. If the distance d_color(p_i, p_j) between pixel i and every other pixel j in the image is large, then i is a salient point.
  • d_position(p_i, p_j) denotes the spatial Euclidean distance between two blocks. Combining the feature distance and the spatial distance, d(p_i, p_j) = d_color(p_i, p_j) / (1 + c · d_position(p_i, p_j)) is used to measure the similarity between two blocks, where c is a parameter.
  • When computing the saliency of a block, only the K blocks most similar to it need to be considered, and the saliency of the current pixel i at the current scale is computed as S_i = 1 - exp(-(1/K) · Σ_{k=1..K} d(p_i, p_k)).
  • A salient region always has one or several cluster centers, so the initial saliency matrix can undergo a center-aggregation operation. Assuming the cluster centers of the salient region are known, regions closer to a cluster center are more salient, and regions farther from a cluster center are less salient.
  • In the normalized saliency matrix obtained from the above formula, the pixels whose saliency values exceed a given threshold are regarded as the cluster centers of the salient regions in the image. Based on these cluster centers, the saliency values of the non-center points are updated as S'_i = S_i · (1 - d_foci(i)), where d_foci(i) is the normalized spatial distance from pixel i to the nearest cluster center.
  • the electronic device extracting the feature vector of the person image includes:
  • The electronic device uses a scale-invariant feature transform algorithm to extract the histogram of oriented gradients (HOG) feature of the person image.
  • The histogram of oriented gradients is a feature descriptor used for object detection in computer vision and image processing. A HOG feature is constructed by computing and counting the gradient orientation histograms of local regions of the image, and the HOG extraction process is as follows:
  • For each person image, the gradient magnitude and gradient direction of every pixel are computed to form the image's gradient matrix. Each element of the gradient matrix is a vector: the first component is the gradient magnitude, and the second and third components together indicate the gradient direction.
  • The image matrix is divided into small cell units of 4*4 pixels each; every 2*2 cell units constitute a block, and the angles from 0° to 180° are divided evenly into 9 channels. The gradient magnitude and direction of each pixel in a cell unit are computed, and a vote is taken to build the histogram of gradient directions. The gradient direction histogram has 9 direction channels; each channel accumulates the sum of the gradient magnitudes of its pixels, finally yielding a vector composed of the accumulated gradient sums of all channels. The cell units are grouped into blocks, the feature vector is normalized within each block, and all normalized feature vectors are concatenated to form the HOG feature of the detection image.
  • The scale-invariant feature transform algorithm performs feature detection in scale space, determines the position and scale of each key point, and then uses the dominant direction of the gradient in the key point's neighborhood as that point's feature, so that the algorithm is independent of direction and scale.
  • The steps of the scale-invariant feature transform algorithm are scale-space extremum detection, key point localization, key point orientation determination, and feature vector generation. Since the key points have already been confirmed in this embodiment, only the principal component analysis algorithm needs to be applied to reduce the dimensionality of the image to obtain a stable scale-invariant feature transform. The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
  • S15 Use a support vector machine learning algorithm to process the feature vector, and identify the target person in the person image.
  • The electronic device sets different weights according to the different proportions of each feature in the actual detection process, and classifies the feature vectors with a support vector machine learning algorithm. Suppose the training sample data set is {(x_i, y_i) | x_i ∈ R^n, y_i ∈ R}, where the sample data x_i are vectors in n-dimensional space that describe the features of the data to be classified and are called feature vectors, and y_i denotes the class of the sample data; the samples are divided into positive and negative samples according to the sign of y_i. The feature vector of each sample can be regarded as a point separating the positive and negative samples, and a hyperplane ⟨w, x⟩ + b = 0 is assumed to exist in this space, where the symbol ⟨⟩ is the vector inner-product operator, w is a known vector, and b is a known real number. The optimal classification function is therefore f(x) = sgn(⟨w, x⟩ + b), where sgn denotes the sign function: if the argument is less than zero the function value is -1, and if it is greater than or equal to zero the function value is 1.
  • the electronic device recognizes the target person in the person image.
  • the method further includes:
  • the electronic device obtains the position coordinates of the target person, and sends the image of the person and the position coordinates to the configuration server.
  • the configuration server can be any server, which is not limited in this application.
  • When the configuration server is a server of a public security organ, for example, it can assist police officers in searching for persons.
  • With this solution, this application can acquire an image containing a human body when a tracking instruction is received, preprocess the image to obtain the image to be recognized, and input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks, which makes image recognition more accurate; a context-aware salient region detection algorithm then segments the marked image based on the key-point marks to obtain a person image, the feature vector of the person image is extracted, and the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
  • the tracking device 11 includes an acquisition unit 110, a preprocessing unit 111, an input unit 112, a segmentation unit 113, an extraction unit 114, an identification unit 115, a dimensionality reduction unit 116, a down-sampling unit 117, and a sending unit 118.
  • The module/unit referred to in this application is a series of computer program segments that can be executed by the processor 13 to complete fixed functions, and that are stored in the memory 12. In this embodiment, the functions of each module/unit will be described in detail in subsequent embodiments.
  • the acquisition unit 110 acquires an image containing a human body.
  • the tracking instruction can be triggered by anyone, and this application is not limited.
  • the tracking instruction may be triggered by a police officer or the like.
  • the image containing the human body may be captured by a camera device that communicates with an electronic device.
  • the camera device includes, but is not limited to, a camera on a road.
  • the preprocessing unit 111 preprocesses the image to obtain the image to be recognized.
  • the preprocessing unit 111 preprocessing the image to obtain the image to be recognized includes:
  • The preprocessing unit 111 performs grayscale processing on the image to obtain a grayscale image, performs binarization on the grayscale image to obtain a black-and-white image, and further performs noise reduction on the black-and-white image to obtain the image to be recognized.
  • The preprocessing unit 111 converts the color image to a grayscale image using a weighted-proportion method: with the three components of the current pixel denoted R, G and B, the converted pixel value is obtained by the formula 0.30*R+0.59*G+0.11*B.
  • the preprocessing unit 111 performs a binarization operation on the image.
  • the image binarization process is to set the pixels on the image to 0 or 255, that is, to make the entire image present an obvious black and white effect.
  • The preprocessing unit 111 reduces the noise of the black-and-white image by designing an adaptive image noise-reduction filter, which can filter out "salt-and-pepper" noise very well and can preserve the details of the image to a large extent.
  • Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise-reduction filter is a signal extractor whose function is to extract the original signal from a signal contaminated by noise.
  • In the filter f'(x, y) = g(x, y) - (σ_η^2/σ_L^2)·[g(x, y) - m_L], σ_η^2 is the noise variance of the entire image, m_L is the average gray value of the pixels in a window near the point (x, y), and σ_L^2 is the variance of the pixel gray values in that window.
  • The collected images contain many invalid and interfering features; in addition, differences in pedestrians' builds and clothing cause large differences in pedestrian appearance, which seriously affects recognition accuracy, and a contaminated image has unpredictable effects on subsequent image analysis and processing. The adaptive image noise-reduction filter can reduce the impact of noise on the input image.
  • Before the image is preprocessed to obtain the image to be recognized, the method further includes:
  • the dimensionality reduction unit 116 performs dimensionality reduction processing on the image.
  • Because the dimensionality of the acquired data is too high and processing such data is too time-consuming, the high-dimensional data is first reduced in dimensionality.
  • the dimensionality reduction unit 116 uses a principal component analysis algorithm to perform dimensionality reduction processing on the image.
  • The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
  • the input unit 112 inputs the to-be-recognized image into a pre-trained neural network model to obtain a marked image with key point marks.
  • the input unit 112 inputting the image to be recognized into a pre-trained neural network model to obtain a marked image with key point marks includes:
  • The input unit 112 sequentially inputs the image to be recognized into a 7*7 convolutional layer, a 3*3 max-pooling layer, and 4 convolution modules to obtain the marked image with key-point marks.
  • A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of the coverage range. Its basic structure includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal.
  • the feature mapping structure uses the sigmoid function as the activation function of the convolutional network, so that the feature mapping has displacement invariance.
  • Each convolutional layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction, and this distinctive two-stage feature extraction structure reduces the feature resolution.
  • the method further includes:
  • At the max-pooling layer, the down-sampling unit 117 down-samples the image to be recognized in the spatial dimension, so that the length and width of the input image become half of their original values.
  • Each convolution module starts with a building block with linear projection, followed by a varying number of building blocks with identity mapping, and finally outputs the marked image.
  • The multi-layer structure of the convolutional neural network can automatically extract deep features of the input data, and different levels of the network can learn features at different levels, which greatly improves the accuracy of image processing. Through local perception and weight sharing, the network retains the associated information between images and greatly reduces the number of required parameters. The max-pooling technique further reduces the number of network parameters and improves the robustness of the model, allowing the model to keep extending in depth and adding hidden layers so as to process images more efficiently.
  • The segmentation unit 113 uses a context-aware (CA) salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image.
  • Detected pedestrians are usually marked with a rectangular frame that contains part of the background noise area, while the later matching algorithm requires an accurate target; the quality of the segmentation therefore directly affects the later recognition effect.
  • The marked image is segmented using a context-aware salient region detection algorithm, which takes the surrounding environment into account and segments out the points that attract human visual attention.
  • A salient region always differs clearly from its surroundings in color, brightness and other characteristics. Because of uncertain factors such as the location and size of the salient region, its overall position cannot be determined either locally or globally, and it can only be considered piece by piece. Therefore, in this embodiment, the image is divided into many small blocks and the similarity between every two blocks is computed. Since a salient region has a certain degree of spatial aggregation, blocks belonging to the same salient region exhibit both feature similarity and spatial aggregation; that is, the salient region is determined by how widely the feature-similar blocks are spread across the image. The specific process is as follows:
  • The image I is divided into n small blocks of equal size, with p_i and p_j denoting the blocks centered at the i-th and j-th pixel positions respectively. The local features of each block are extracted in the L*a*b color space, to which human vision is most sensitive, and the distance d_color(p_i, p_j) between every two blocks p_i and p_j is computed as the measure of block similarity and normalized accordingly. If the distance d_color(p_i, p_j) between pixel i and every other pixel j in the image is large, then i is a salient point.
  • d_position(p_i, p_j) denotes the spatial Euclidean distance between two blocks. Combining the feature distance and the spatial distance, d(p_i, p_j) = d_color(p_i, p_j) / (1 + c · d_position(p_i, p_j)) is used to measure the similarity between two blocks, where c is a parameter.
  • When computing the saliency of a block, only the K blocks most similar to it need to be considered, and the saliency of the current pixel i at the current scale is computed as S_i = 1 - exp(-(1/K) · Σ_{k=1..K} d(p_i, p_k)).
  • A salient region always has one or several cluster centers, so the initial saliency matrix can undergo a center-aggregation operation. Assuming the cluster centers of the salient region are known, regions closer to a cluster center are more salient, and regions farther from a cluster center are less salient.
  • In the normalized saliency matrix obtained from the above formula, the pixels whose saliency values exceed a given threshold are regarded as the cluster centers of the salient regions in the image. Based on these cluster centers, the saliency values of the non-center points are updated as S'_i = S_i · (1 - d_foci(i)), where d_foci(i) is the normalized spatial distance from pixel i to the nearest cluster center.
  • the extraction unit 114 extracts the feature vector of the person image.
  • the extraction unit 114 extracting the feature vector of the person image includes:
  • The extraction unit 114 uses a scale-invariant feature transform algorithm to extract the histogram of oriented gradients (HOG) feature of the person image.
  • The histogram of oriented gradients is a feature descriptor used for object detection in computer vision and image processing. A HOG feature is constructed by computing and counting the gradient orientation histograms of local regions of the image, and the HOG extraction process is as follows:
  • For each person image, the gradient magnitude and gradient direction of every pixel are computed to form the image's gradient matrix. Each element of the gradient matrix is a vector: the first component is the gradient magnitude, and the second and third components together indicate the gradient direction.
  • The image matrix is divided into small cell units of 4*4 pixels each; every 2*2 cell units constitute a block, and the angles from 0° to 180° are divided evenly into 9 channels. The gradient magnitude and direction of each pixel in a cell unit are computed, and a vote is taken to build the histogram of gradient directions. The gradient direction histogram has 9 direction channels; each channel accumulates the sum of the gradient magnitudes of its pixels, finally yielding a vector composed of the accumulated gradient sums of all channels. The cell units are grouped into blocks, the feature vector is normalized within each block, and all normalized feature vectors are concatenated to form the HOG feature of the detection image.
  • The scale-invariant feature transform algorithm performs feature detection in scale space, determines the position and scale of each key point, and then uses the dominant direction of the gradient in the key point's neighborhood as that point's feature, so that the algorithm is independent of direction and scale.
  • The steps of the scale-invariant feature transform algorithm are scale-space extremum detection, key point localization, key point orientation determination, and feature vector generation. Since the key points have already been confirmed in this embodiment, only the principal component analysis algorithm needs to be applied to reduce the dimensionality of the image to obtain a stable scale-invariant feature transform. The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
  • the recognition unit 115 uses a support vector machine learning algorithm to process the feature vector, and recognize the target person in the person image.
  • The recognition unit 115 sets different weights according to the different proportions of each feature in the actual detection process, and classifies the feature vectors with a support vector machine learning algorithm. Suppose the training sample data set is {(x_i, y_i) | x_i ∈ R^n, y_i ∈ R}, where the sample data x_i are vectors in n-dimensional space that describe the features of the data to be classified and are called feature vectors, and y_i denotes the class of the sample data; the samples are divided into positive and negative samples according to the sign of y_i. The feature vector of each sample can be regarded as a point separating the positive and negative samples, and a hyperplane ⟨w, x⟩ + b = 0 is assumed to exist in this space, where the symbol ⟨⟩ is the vector inner-product operator, w is a known vector, and b is a known real number. The optimal classification function is therefore f(x) = sgn(⟨w, x⟩ + b), where sgn denotes the sign function: if the argument is less than zero the function value is -1, and if it is greater than or equal to zero the function value is 1.
  • the recognition unit 115 recognizes the target person in the person image.
  • the method further includes:
  • the acquiring unit 110 acquires the position coordinates of the target person, and the sending unit 118 sends the image of the person and the position coordinates to the configuration server.
  • the configuration server can be any server, which is not limited in this application.
  • When the configuration server is a server of a public security organ, for example, it can assist police officers in searching for persons.
  • With this solution, this application can acquire an image containing a human body when a tracking instruction is received, preprocess the image to obtain the image to be recognized, and input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks, which makes image recognition more accurate; a context-aware salient region detection algorithm then segments the marked image based on the key-point marks to obtain a person image, the feature vector of the person image is extracted, and the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
  • FIG. 3 it is a schematic structural diagram of an electronic device implementing a preferred embodiment of the tracking method of the present application.
  • The electronic device 1 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • The electronic device 1 can also be, but is not limited to, any electronic product that can perform human-machine interaction with a user through a keyboard, a mouse, a remote control, a touch panel, or a voice-control device, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet protocol television (IPTV), a smart wearable device, and the like.
  • The electronic device 1 may also be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • the network where the electronic device 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), etc.
  • The electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and a computer program stored in the memory 12 and executable on the processor 13, such as a tracking program.
  • The schematic diagram is only an example of the electronic device 1 and does not constitute a limitation on the electronic device 1; it may include more or fewer components than shown, combine certain components, or arrange components differently. For example, the electronic device 1 may also include input/output devices, network access devices, buses, and the like.
  • The processor 13 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • The processor 13 is the computing core and control center of the electronic device 1; it connects all parts of the electronic device 1 through various interfaces and lines, and executes the operating system of the electronic device 1 as well as the various installed applications, program codes, and so on.
  • the processor 13 executes the application program to implement the steps in the foregoing tracking method embodiments, such as steps S10, S11, S12, S13, S14, and S15 shown in FIG. 1.
  • When the computer program is executed, the functions of the modules/units in the above device embodiments are realized, for example: when a tracking instruction is received, an image containing a human body is acquired; the image is preprocessed to obtain an image to be recognized; the image to be recognized is input into a pre-trained neural network model to obtain a marked image with key-point marks; a context-aware salient region detection algorithm is used to segment the marked image based on the key-point marks to obtain a person image; a feature vector of the person image is extracted; and a support vector machine learning algorithm is used to process the feature vector to identify the target person in the person image.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete this Application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the electronic device 1.
  • the computer program can be divided into an acquisition unit 110, a preprocessing unit 111, an input unit 112, a segmentation unit 113, an extraction unit 114, an identification unit 115, a dimensionality reduction unit 116, a downsampling unit 117, and a sending unit 118.
  • the memory 12 may be used to store the computer program and/or module, and the processor 13 runs or executes the computer program and/or module stored in the memory 12 and calls the data stored in the memory 12, Various functions of the electronic device 1 are realized.
  • the memory 12 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may Store data (such as audio data, phone book, etc.) created based on the use of mobile phones.
  • The memory 12 may include a high-speed random access memory and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a circuit with a storage function that has no physical form in an integrated circuit, such as a RAM (random-access memory) or a FIFO (first-in-first-out) buffer; alternatively, the memory 12 may be a memory with a physical form, such as a memory stick or a TF card (TransFlash card).
  • If the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes in the above method embodiments of this application can also be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the program is executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • The memory 12 in the electronic device 1 stores multiple instructions to implement a tracking method, and the processor 13 can execute the multiple instructions so as to: acquire an image containing a human body when a tracking instruction is received; preprocess the image to obtain an image to be recognized; input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks; use a context-aware salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image; extract the feature vector of the person image; and use a support vector machine learning algorithm to process the feature vector to identify the target person in the person image.
  • Specifically, the execution of the multiple instructions by the processor 13 further includes: sequentially inputting the image to be recognized into a 7*7 convolutional layer, a 3*3 max-pooling layer, and 4 convolution modules to obtain the marked image with key-point marks; performing down-sampling on the spatial dimension of the image to be recognized; and using the scale-invariant feature transform algorithm to extract the histogram-of-oriented-gradients feature of the person image.
  • Modules described as separate components may or may not be physically separated, and components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A tracking method, apparatus, electronic device and storage medium. The tracking method can acquire an image containing a human body when a tracking instruction is received (S10); preprocess the image to obtain an image to be recognized (S11); input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks (S12); use a context-aware salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image (S13); extract a feature vector of the person image (S14); and process the feature vector with a support vector machine learning algorithm to identify the target person in the person image (S15), thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.

Description

Tracking Method, Apparatus, Electronic Device and Storage Medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 6, 2019 with application number 201910370526.1 and invention title "Tracking Method, Apparatus, Electronic Device and Storage Medium", the entire contents of which are incorporated in this application by reference.
Technical Field
This application relates to the field of image processing technology, and in particular to a tracking method, apparatus, electronic device and storage medium.
Background
For most of today's intelligent video surveillance systems, targets such as people or vehicles appearing in the scene are the focus of attention, and pedestrian targets, as the most active and important elements in a surveillance scene, naturally need to be identified more accurately.
However, many problems remain in the accurate identification of pedestrians. Most application scenes are rather complex: local dynamic changes in the background, target shadows caused by uneven illumination, and severe weather such as strong wind all increase the difficulty of identification. In addition, pedestrians are non-rigid targets with rich pose characteristics, and the same pedestrian in different poses often looks very different during detection and identification.
Summary
In view of the above, it is necessary to provide a tracking method, apparatus, electronic device and storage medium that can accurately track a person based on image processing technology while effectively avoiding environmental interference.
A tracking method, the method comprising: when a tracking instruction is received, acquiring an image containing a human body; preprocessing the image to obtain an image to be recognized; inputting the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks; using a context-aware salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image; extracting a feature vector of the person image; and processing the feature vector with a support vector machine learning algorithm to identify the target person in the person image.
An electronic device, the electronic device comprising: a memory storing at least one instruction; and a processor executing the instruction stored in the memory to implement the tracking method.
A computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the tracking method.
It can be seen from the above technical solutions that this application can acquire an image containing a human body when a tracking instruction is received, preprocess the image to obtain an image to be recognized, and input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks, making image recognition more accurate; a context-aware salient region detection algorithm is used to segment the marked image based on the key-point marks to obtain a person image, the feature vector of the person image is extracted, and the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
Brief Description of the Drawings
FIG. 1 is a flowchart of a preferred embodiment of the tracking method of this application.
FIG. 2 is a functional module diagram of a preferred embodiment of the tracking apparatus of this application.
FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the tracking method of this application.
Description of the main reference numerals
Electronic device 1
Memory 12
Processor 13
Tracking apparatus 11
Acquisition unit 110
Preprocessing unit 111
Input unit 112
Segmentation unit 113
Extraction unit 114
Recognition unit 115
Dimensionality reduction unit 116
Down-sampling unit 117
Sending unit 118
Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, this application is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, it is a flowchart of a preferred embodiment of the tracking method of this application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
The tracking method is applied to one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The electronic device can be any electronic product capable of human-machine interaction with a user, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet protocol television (IPTV), a smart wearable device, and the like.
The electronic device may also include a network device and/or user equipment. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
S10: When a tracking instruction is received, acquire an image containing a human body.
In at least one embodiment of this application, the tracking instruction can be triggered by anyone, which is not limited in this application.
In some specific application scenarios, the tracking instruction may be triggered by a police officer or the like.
In at least one embodiment of this application, the image containing the human body may be captured by a camera apparatus in communication with the electronic device, and the camera apparatus includes, but is not limited to, a camera on a road and the like.
S11: Preprocess the image to obtain an image to be recognized.
In at least one embodiment of this application, the electronic device preprocessing the image to obtain the image to be recognized includes:
The electronic device performs grayscale processing on the image to obtain a grayscale image, performs binarization on the grayscale image to obtain a black-and-white image, and further performs noise reduction on the black-and-white image to obtain the image to be recognized.
Specifically, the electronic device converts the color image to a grayscale image using a weighted-proportion method: with the three components of the current pixel denoted R, G and B, the converted pixel value is obtained by the formula 0.30*R+0.59*G+0.11*B.
Further, the electronic device performs a binarization operation on the image. Image binarization sets every pixel of the image to 0 or 255, so that the entire image presents an obvious black-and-white effect.
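By way of illustration only, the following Python/NumPy sketch performs the grayscale conversion and binarization described above; the fixed threshold of 127 is an assumption, since this embodiment does not specify how the binarization threshold is chosen.

```python
import numpy as np

def preprocess(rgb: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Grayscale an RGB image with the 0.30/0.59/0.11 weights, then binarize to 0/255."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    gray = 0.30 * r + 0.59 * g + 0.11 * b        # weighted-proportion grayscale conversion
    binary = np.where(gray > threshold, 255, 0)  # every pixel becomes 0 or 255
    return binary.astype(np.uint8)
```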
Furthermore, the electronic device reduces the noise of the black-and-white image by designing an adaptive image noise-reduction filter, which can filter out "salt-and-pepper" noise very well and can preserve the details of the image to a large extent.
Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise-reduction filter is a signal extractor whose function is to extract the original signal from a signal contaminated by noise.
Specifically, suppose the input image to be processed is f(x,y). Under the action of the degradation function H and affected by the noise η(x,y), a degraded image g(x,y) is finally obtained, which gives the image degradation formula g(x,y) = η(x,y) + f(x,y). The image is then denoised with the adaptive filter method, whose core idea is:
f'(x,y) = g(x,y) - (σ_η^2 / σ_L^2) · [g(x,y) - m_L]
where σ_η^2 is the noise variance of the entire image, m_L is the mean gray value of the pixels in a window near the point (x,y), and σ_L^2 is the variance of the pixel gray values in a window near the point (x,y).
It can be understood that the collected images contain many invalid and interfering features; in addition, differences in pedestrians' builds and clothing also cause large differences in pedestrian appearance, which seriously affects recognition accuracy, and a contaminated image has unpredictable effects on subsequent image analysis and processing. The adaptive image noise-reduction filter can reduce the impact of noise on the input image.
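The sketch below implements the adaptive local noise-reduction filter in the form given above, f'(x,y) = g(x,y) - (σ_η^2/σ_L^2)·[g(x,y) - m_L]; the 7*7 window size and the clipping of the local variance (so the correction never overshoots where σ_L^2 < σ_η^2) are assumptions, as this embodiment fixes neither.

```python
import numpy as np

def adaptive_noise_filter(g: np.ndarray, noise_var: float, win: int = 7) -> np.ndarray:
    """Adaptive local noise filter: f = g - (noise_var / local_var) * (g - local_mean)."""
    g = g.astype(float)
    pad = win // 2
    padded = np.pad(g, pad, mode="reflect")
    out = np.empty_like(g)
    for y in range(g.shape[0]):
        for x in range(g.shape[1]):
            window = padded[y:y + win, x:x + win]
            m_L = window.mean()                   # local mean gray value around (x, y)
            var_L = max(window.var(), noise_var)  # local variance, clipped so the ratio <= 1
            out[y, x] = g[y, x] - (noise_var / var_L) * (g[y, x] - m_L)
    return out
```

Where the local variance is close to the global noise variance the filter returns roughly the local mean, and where the local variance is much higher (likely real detail such as an edge) it leaves the pixel nearly unchanged, which is how the filter suppresses salt-and-pepper noise while protecting detail.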
In at least one embodiment of this application, before the image is preprocessed to obtain the image to be recognized, the method further includes:
The electronic device performs dimensionality reduction on the image.
It can be understood that, because the dimensionality of the acquired data is too high and processing such data is too time-consuming, the high-dimensional data is first reduced in dimensionality.
Specifically, the electronic device uses the principal component analysis algorithm to perform dimensionality reduction on the image.
The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
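As a minimal sketch of this dimensionality-reduction step with scikit-learn's PCA, treating each image as a flattened row vector; the choice of 50 retained components and the random stand-in data are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 100 images flattened to 64*64-dimensional row vectors.
images = np.random.rand(100, 64 * 64)
pca = PCA(n_components=50)                 # orthogonal transform to 50 uncorrelated components
reduced = pca.fit_transform(images)        # (100, 50) low-dimensional representation
restored = pca.inverse_transform(reduced)  # approximate reconstruction, if needed
```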
S12: Input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks.
In at least one embodiment of this application, the electronic device inputting the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks includes:
The electronic device sequentially inputs the image to be recognized into a 7*7 convolutional layer, a 3*3 max-pooling layer, and 4 convolution modules to obtain the marked image with key-point marks.
Specifically, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of the coverage range. Its basic structure includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure uses the sigmoid function as the activation function of the convolutional network, so that the feature maps have displacement invariance. In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction, and this distinctive two-stage feature extraction structure reduces the feature resolution.
Specifically, the method further includes:
At the max-pooling layer, the electronic device down-samples the image to be recognized in the spatial dimension.
The electronic device performs the down-sampling operation on the spatial dimension of the image to be recognized, so that the length and width of the input image to be recognized become half of their original values.
Further, each convolution module starts with a building block with linear projection, followed by a varying number of building blocks with identity mapping, and finally outputs the marked image.
Through the above implementation, the multi-layer structure of the convolutional neural network can automatically extract deep features of the input data, and different levels of the network can learn features at different levels, which greatly improves the accuracy of image processing. Moreover, through local perception and weight sharing, the convolutional neural network retains the associated information between images and greatly reduces the number of required parameters. The max-pooling technique further reduces the number of network parameters and improves the robustness of the model, allowing the model to keep extending in depth and adding hidden layers so as to process images more efficiently.
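The stem described above (a 7*7 convolution, a 3*3 max-pooling layer, then 4 convolution modules that each open with a linearly projected building block followed by identity-mapping blocks) reads like a ResNet-style backbone. The PyTorch sketch below is one plausible reading rather than the trained model of this embodiment: the channel widths, block counts, and use of torchvision's BasicBlock are assumptions, and the head that would output the key-point marks is omitted.

```python
import torch.nn as nn
from torchvision.models.resnet import BasicBlock

def make_module(in_ch: int, out_ch: int, blocks: int, stride: int) -> nn.Sequential:
    """One convolution module: a linear-projection block, then identity-mapping blocks."""
    projection = nn.Sequential(  # 1x1 convolution = linear projection on the shortcut path
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_ch),
    )
    layers = [BasicBlock(in_ch, out_ch, stride, projection)]
    layers += [BasicBlock(out_ch, out_ch) for _ in range(blocks - 1)]  # identity mapping
    return nn.Sequential(*layers)

backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),  # the 7*7 convolutional layer
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # the 3*3 max-pooling layer halves H and W
    make_module(64, 64, blocks=2, stride=1),           # the 4 convolution modules
    make_module(64, 128, blocks=2, stride=2),
    make_module(128, 256, blocks=2, stride=2),
    make_module(256, 512, blocks=2, stride=2),
)
```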
S13: Use a context-aware (CA) salient region detection algorithm to segment the marked image based on the key-point marks to obtain a person image.
It can be understood that detected pedestrians are usually marked with a rectangular frame that contains part of the background noise area, while the later matching algorithm requires an accurate target; the quality of the segmentation therefore directly affects the later recognition effect.
In this embodiment, the marked image is segmented with the context-aware salient region detection algorithm, which takes the surrounding environment into account and segments out the points that attract human visual attention. A salient region always differs clearly from its surroundings in color, brightness and other characteristics. Because of uncertain factors such as the location and size of the salient region, its overall position cannot be determined either locally or globally, and it can only be considered piece by piece. Therefore, in this embodiment, the image is divided into many small blocks and the similarity between every two blocks is computed. Since a salient region has a certain degree of spatial aggregation, blocks belonging to the same salient region exhibit both feature similarity and spatial aggregation; that is, the salient region is determined by how widely the feature-similar blocks are spread across the image. The specific process is as follows:
(1) Single-scale saliency computation.
Specifically, the image I is divided into n small blocks of equal size, with p_i and p_j denoting the blocks centered at the i-th and j-th pixel positions respectively. The local features of each block are then extracted in the L*a*b color space, to which human vision is most sensitive, and the distance d_color(p_i, p_j) between every two blocks p_i and p_j is computed as the measure of block similarity and normalized accordingly. If the distance d_color(p_i, p_j) between pixel i and every other pixel j in the image is large, then i is a salient point. If the blocks similar to a given block are distributed near that block, the block is considered salient; conversely, if the similar blocks are scattered all over the image, the block is considered non-salient. d_position(p_i, p_j) denotes the spatial Euclidean distance between two blocks. Combining the feature distance and the spatial distance, d(p_i, p_j) is used to measure the similarity between two blocks (a code sketch of this computation follows step (3) below):
d(p_i, p_j) = d_color(p_i, p_j) / (1 + c · d_position(p_i, p_j))
where c is a parameter. When computing the saliency of a block, usually only the K blocks most similar to it need to be considered, and the saliency of the current pixel i at the current scale is computed as:
S_i = 1 - exp( -(1/K) · Σ_{k=1..K} d(p_i, p_k) )
(2) Context-aware saliency computation.
A salient region always has one or several cluster centers, so the initial saliency matrix can undergo a center-aggregation operation. Assuming the cluster centers of the salient region are known, regions closer to a cluster center are more salient, and regions farther from a cluster center are less salient. In the normalized saliency matrix obtained from the above formula, the pixels whose saliency values exceed a given threshold are regarded as the cluster centers of the salient regions in the image. Based on these cluster centers, the saliency values of the non-center points in the image are updated according to:
S'_i = S_i · (1 - d_foci(i))
where d_foci(i) is the normalized spatial distance from pixel i to the nearest cluster center.
(3) Based on the image key points, a binarization denoising method is used to perform pixel-level segmentation, thereby obtaining the segmented person image.
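By way of illustration, the NumPy/scikit-image sketch below computes the single-scale block saliency of step (1); the 7*7 block size, K = 64, and c = 3.0 are assumed values not fixed by this embodiment, and the multi-scale handling and the center-aggregation update of step (2) are omitted.

```python
import numpy as np
from skimage.color import rgb2lab

def block_saliency(rgb: np.ndarray, block: int = 7, K: int = 64, c: float = 3.0) -> np.ndarray:
    """Single-scale saliency: S_i = 1 - exp(-mean of d(p_i, p_k) over the K most similar blocks)."""
    lab = rgb2lab(rgb)                                # L*a*b space, most sensitive for human vision
    h, w = lab.shape[:2]
    ys, xs = np.mgrid[0:h - block:block, 0:w - block:block]
    centers = np.stack([ys.ravel(), xs.ravel()], axis=1)
    feats = np.stack([lab[y:y + block, x:x + block].ravel() for y, x in centers])
    d_color = np.linalg.norm(feats[:, None] - feats[None, :], axis=2)
    d_color /= d_color.max()                          # normalized color distances
    d_pos = np.linalg.norm((centers[:, None] - centers[None, :]).astype(float), axis=2)
    d_pos /= d_pos.max()                              # normalized spatial Euclidean distances
    d = d_color / (1.0 + c * d_pos)                   # dissimilarity damped by spatial distance
    d_sorted = np.sort(d, axis=1)[:, 1:K + 1]         # the K most similar blocks, excluding self
    return 1.0 - np.exp(-d_sorted.mean(axis=1))       # high when even the similar blocks differ
```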
S14: Extract a feature vector of the person image.
In at least one embodiment of this application, the electronic device extracting the feature vector of the person image includes:
The electronic device uses a scale-invariant feature transform algorithm to extract the histogram of oriented gradients (HOG) feature of the person image.
Specifically, the histogram of oriented gradients is a feature descriptor used for object detection in computer vision and image processing. A HOG feature is constructed by computing and counting the gradient orientation histograms of local regions of the image, and the HOG extraction process is as follows:
For each person image, the gradient magnitude G(x,y) and the gradient direction σ(x,y) of every pixel (x,y) are computed to form the image's gradient matrix. Each element of the gradient matrix is a vector: the first component is the gradient magnitude, and the second and third components together indicate the gradient direction. The image matrix is divided into small cell units of 4*4 pixels each; every 2*2 cell units constitute a block, and the angles from 0° to 180° are divided evenly into 9 channels. The gradient magnitude and direction of each pixel in a cell unit are computed, and a vote is taken to build the histogram of gradient directions. The gradient direction histogram has 9 direction channels; each channel accumulates the sum of the gradient magnitudes of its pixels, finally yielding a vector composed of the accumulated gradient sums of all channels. The cell units are grouped into blocks, the feature vector is normalized within each block, and all normalized feature vectors are concatenated to form the HOG feature of the detection image.
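A short scikit-image sketch of this HOG extraction, using the cell, block, and channel settings from the paragraph above (4*4-pixel cells, 2*2-cell blocks, 9 orientation channels over 0° to 180°); the L2-Hys block normalization and the input size are assumptions.

```python
import numpy as np
from skimage.feature import hog

# `person` stands in for a grayscale person image segmented in step S13.
person = np.random.rand(128, 64)
features = hog(
    person,
    orientations=9,           # 9 direction channels over 0 to 180 degrees
    pixels_per_cell=(4, 4),   # each cell unit is 4*4 pixels
    cells_per_block=(2, 2),   # every 2*2 cell units constitute a block
    block_norm="L2-Hys",      # per-block normalization of the feature vector (assumed choice)
)                             # `features` is the concatenated HOG feature vector
```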
Further, the scale-invariant feature transform algorithm performs feature detection in scale space, determines the position and scale of each key point, and then uses the dominant direction of the gradient in the key point's neighborhood as that point's feature, so that the algorithm is independent of direction and scale.
The steps of the scale-invariant feature transform algorithm are scale-space extremum detection, key point localization, key point orientation determination, and feature vector generation. Since the key points have already been confirmed in this embodiment, only the principal component analysis algorithm needs to be applied to reduce the dimensionality of the image to obtain a stable scale-invariant feature transform. The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
S15: Use a support vector machine learning algorithm to process the feature vector and identify the target person in the person image.
In at least one embodiment of this application, the electronic device sets different weights according to the different proportions of each feature in the actual detection process, and classifies the feature vectors with a support vector machine learning algorithm. Suppose the training sample data set is {(x_i, y_i) | x_i ∈ R^n, y_i ∈ R}, where the sample data x_i are vectors in n-dimensional space that describe the features of the data to be classified and are called feature vectors, and y_i denotes the class of the sample data; the samples are divided into positive and negative samples according to the sign of y_i. In this embodiment, the feature vector of each sample can be regarded as a point separating the positive and negative samples, and a hyperplane is assumed to exist in this space:
⟨w, x⟩ + b = 0
where the symbol ⟨⟩ is the vector inner-product operator, w is a known vector, and b is a known real number. The optimal classification function is therefore:
f(x) = sgn(⟨w, x⟩ + b)
where sgn denotes the sign function, which judges whether its argument is less than zero: if the argument is less than zero the function value is -1, and if it is greater than or equal to zero the function value is 1.
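A minimal scikit-learn sketch of this classification step; the linear kernel and the synthetic data are assumptions — in the method, the inputs would be the HOG feature vectors from S14 with labels y_i in {+1, -1}.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-ins for the training set {(x_i, y_i)}: x_i are feature vectors, y_i their classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 36))              # 200 samples of a 36-dimensional feature vector
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # positive and negative samples by sign

clf = LinearSVC()                           # learns the w and b of the separating hyperplane
clf.fit(X, y)
# Decision rule f(x) = sgn(<w, x> + b), evaluated here on one new sample:
pred = clf.predict(rng.normal(size=(1, 36)))  # returns +1 or -1
```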
Furthermore, the electronic device identifies the target person in the person image.
In at least one embodiment of this application, after the feature vector is processed with the support vector machine learning algorithm and the target person in the person image is identified, the method further includes:
The electronic device acquires the position coordinates of the target person and sends the person image and the position coordinates to a configuration server.
The configuration server can be any server, which is not limited in this application.
For example, when the configuration server is a server of a public security organ, it can assist police officers in searching for persons.
It can be seen from the above technical solutions that this application can acquire an image containing a human body when a tracking instruction is received, preprocess the image to obtain an image to be recognized, and input the image to be recognized into a pre-trained neural network model to obtain a marked image with key-point marks, making image recognition more accurate; a context-aware salient region detection algorithm is used to segment the marked image based on the key-point marks to obtain a person image, the feature vector of the person image is extracted, and the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
As shown in FIG. 2, it is a functional module diagram of a preferred embodiment of the tracking apparatus of this application. The tracking apparatus 11 includes an acquisition unit 110, a preprocessing unit 111, an input unit 112, a segmentation unit 113, an extraction unit 114, a recognition unit 115, a dimensionality reduction unit 116, a down-sampling unit 117 and a sending unit 118. The module/unit referred to in this application is a series of computer program segments that can be executed by the processor 13 to complete fixed functions, and that are stored in the memory 12. In this embodiment, the functions of each module/unit will be described in detail in the subsequent embodiments.
When a tracking instruction is received, the acquisition unit 110 acquires an image containing a human body.
In at least one embodiment of this application, the tracking instruction can be triggered by anyone, which is not limited in this application.
In some specific application scenarios, the tracking instruction may be triggered by a police officer or the like.
In at least one embodiment of this application, the image containing the human body may be captured by a camera apparatus in communication with the electronic device, and the camera apparatus includes, but is not limited to, a camera on a road and the like.
The preprocessing unit 111 preprocesses the image to obtain an image to be recognized.
In at least one embodiment of this application, the preprocessing unit 111 preprocessing the image to obtain the image to be recognized includes:
The preprocessing unit 111 performs grayscale processing on the image to obtain a grayscale image, performs binarization on the grayscale image to obtain a black-and-white image, and further performs noise reduction on the black-and-white image to obtain the image to be recognized.
Specifically, the preprocessing unit 111 converts the color image to a grayscale image using a weighted-proportion method: with the three components of the current pixel denoted R, G and B, the converted pixel value is obtained by the formula 0.30*R+0.59*G+0.11*B.
Further, the preprocessing unit 111 performs a binarization operation on the image. Image binarization sets every pixel of the image to 0 or 255, so that the entire image presents an obvious black-and-white effect.
Furthermore, the preprocessing unit 111 reduces the noise of the black-and-white image by designing an adaptive image noise-reduction filter, which can filter out "salt-and-pepper" noise very well and can preserve the details of the image to a large extent.
Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise-reduction filter is a signal extractor whose function is to extract the original signal from a signal contaminated by noise.
Specifically, suppose the input image to be processed is f(x,y). Under the action of the degradation function H and affected by the noise η(x,y), a degraded image g(x,y) is finally obtained, which gives the image degradation formula g(x,y) = η(x,y) + f(x,y). The image is then denoised with the adaptive filter method, whose core idea is:
f'(x,y) = g(x,y) - (σ_η^2 / σ_L^2) · [g(x,y) - m_L]
where σ_η^2 is the noise variance of the entire image, m_L is the mean gray value of the pixels in a window near the point (x,y), and σ_L^2 is the variance of the pixel gray values in a window near the point (x,y).
It can be understood that the collected images contain many invalid and interfering features; in addition, differences in pedestrians' builds and clothing also cause large differences in pedestrian appearance, which seriously affects recognition accuracy, and a contaminated image has unpredictable effects on subsequent image analysis and processing. The adaptive image noise-reduction filter can reduce the impact of noise on the input image.
In at least one embodiment of this application, before the image is preprocessed to obtain the image to be recognized, the method further includes:
The dimensionality reduction unit 116 performs dimensionality reduction on the image.
It can be understood that, because the dimensionality of the acquired data is too high and processing such data is too time-consuming, the high-dimensional data is first reduced in dimensionality.
Specifically, the dimensionality reduction unit 116 uses the principal component analysis algorithm to perform dimensionality reduction on the image.
The principal component analysis algorithm is a method of transforming a group of potentially correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation.
输入单元112将所述待识别图像输入到预先训练的神经网络模型中,得到带有关键点标记的标记图像。
在本申请的至少一个实施例中,所述输入单元112将所述待识别图像输入到预先训练的神经网络模型中,得到带有关键点标记的标记图像包括:
所述输入单元112将所述待识别图像依次输入一个7*7的卷积层、一个3*3的最大值池化层及4个卷积模块,得到所述带有关键点标记的标记图像。
具体地,卷积神经网络(Convolutional Neural Networks,CNN)是一种前馈神经网络,它的人工神经元可以响应一部分覆盖范围内的周围单元,其基本结构包括两层,其一为特征提取层,每个神经元的输入与前一层的局部接受域相连,并提取该局部的特征。一旦该局部特征被提取后,它与其它特征间的位置关系也随之确定下来;其二是特征映射层,网络的每个计算层由多个特征映射组成,每个特征映射是一个平面,平面上所有神经元的权值相等。特征映射结构采用sigmoid函数作为卷积网络的激活函数,使得特征映射具有位移不变性。此外,由于一个映射面上的神经元共享权值,因而减少了网络自由参数的个数。卷积神经网络中的每一个卷积层都紧跟着一个用来求局部平均与二次提取的计算层,这种特有的两次特征提取结构减小了特征分辨率。
Specifically, the method further includes:
In the max pooling layer, the downsampling unit 117 downsamples the image to be recognized in the spatial dimensions.
The downsampling unit 117 downsamples the image to be recognized in the spatial dimensions so that the length and width of the input image to be recognized are halved.
Further, each convolution module starts with a building block having a linear projection, followed by a varying number of building blocks with identity mappings, and finally the labeled image is output.
Through the above implementation, the multi-layer structure of the convolutional neural network can automatically extract deep features from the input data, and different layers of the network can learn features at different levels, greatly improving the accuracy of image processing. Moreover, through local perception and weight sharing, the convolutional neural network preserves the associative information between images while greatly reducing the number of required parameters. The max pooling technique further reduces the number of network parameters and improves the robustness of the model, allowing the model to keep extending its depth and adding hidden layers so as to process images more efficiently.
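The backbone just described can be sketched as follows: one 7*7 convolutional layer, one 3*3 max pooling layer that halves the spatial dimensions, then 4 convolution modules, each opening with a building block that uses a linear projection and continuing with blocks that use identity mappings. PyTorch, the channel widths and the block counts are assumptions of ours; the application fixes only the layer shapes named above:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Residual-style building block; with force_proj=True the shortcut is a
    1*1 linear projection, otherwise it is an identity mapping."""
    def __init__(self, c_in, c_out, stride=1, force_proj=False):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_out))
        self.shortcut = (nn.Conv2d(c_in, c_out, 1, stride, bias=False)
                         if force_proj or c_in != c_out or stride != 1
                         else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

def conv_module(c_in, c_out, n_blocks, stride):
    blocks = [Block(c_in, c_out, stride, force_proj=True)]        # linear projection
    blocks += [Block(c_out, c_out) for _ in range(n_blocks - 1)]  # identity mappings
    return nn.Sequential(*blocks)

class KeypointBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # 7*7 convolution
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))   # 3*3 max pool halves H and W
        self.stages = nn.Sequential(                # 4 convolution modules
            conv_module(64, 64, 2, 1),
            conv_module(64, 128, 2, 2),
            conv_module(128, 256, 2, 2),
            conv_module(256, 512, 2, 2))

    def forward(self, x):
        return self.stages(self.stem(x))

print(KeypointBackbone()(torch.randn(1, 3, 224, 224)).shape)  # [1, 512, 7, 7]
```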
The segmentation unit 113 uses a context-aware (CA) salient region detection algorithm to segment the labeled image based on the key point labels, obtaining a person image.
It can be understood that a detected pedestrian is usually marked with a rectangular box that contains part of the background noise region, whereas the later matching algorithm requires a precise target; the quality of the segmentation therefore directly affects the subsequent recognition performance.
In this embodiment, the context-aware salient region detection algorithm is used to segment the labeled image. The algorithm takes the surrounding context into account and segments out the points that attract human visual attention. A salient region always differs markedly from its surroundings in features such as color and brightness; because of uncertainty in the position and size of a salient region, its overall position cannot be determined either locally or globally, and it can only be considered piece by piece. Therefore, in this embodiment, the image is divided into many small blocks and the similarity between every pair of blocks is computed. Since salient regions aggregate spatially to some degree, blocks belonging to the same salient region exhibit both feature similarity and spatial clustering; that is, the salient region is determined by how widely the feature-similar blocks are scattered across the image. The specific procedure is as follows (a code sketch is given after step (3) below):
(1) Single-scale saliency computation.
Specifically, the image I is divided into n blocks of equal size, where p_i and p_j denote the blocks centered at the i-th and j-th pixel positions, respectively. The local features of each block are extracted in the L*a*b color space, the space most sensitive to human visual perception, and the distance d_color(p_i, p_j) between every pair of blocks p_i and p_j is computed as the criterion of block similarity, with corresponding normalization. If the distance d_color(p_i, p_j) between pixel i and every other pixel j in the image is large, then i is a salient point. If the blocks similar to a given block are distributed near it, the block is considered salient; conversely, if its similar blocks are scattered all over the image, the block is considered non-salient. d_position(p_i, p_j) denotes the spatial Euclidean distance between two blocks. Combining the feature distance and the spatial distance, d(p_i, p_j) is used to measure the dissimilarity between two blocks:
d(p_i, p_j) = d_color(p_i, p_j) / (1 + c · d_position(p_i, p_j))
where c is a parameter. When computing the saliency of a given block, it usually suffices to consider only the K blocks most similar to it; the saliency of the current pixel i at the current scale r is computed as:
S_i^r = 1 − exp{−(1/K) · Σ_{k=1..K} d(p_i^r, q_k^r)}, where q_k^r (k = 1, ..., K) are the K blocks most similar to p_i^r.
(2) Context-aware saliency computation over the image.
A salient region always has one or several cluster centers, so the initial saliency matrix can be subjected to a center-aggregation operation. Assuming the cluster centers of the salient region are known, regions closer to a cluster center are more salient and regions farther away are less salient. In the normalized saliency matrix obtained from the saliency formula above, the pixels whose saliency exceeds a preset threshold (S_i^r > T) are taken as the cluster centers of the salient regions in the image. Based on these cluster centers, the saliency values of the non-center pixels in the image are updated according to the following formula:
Ŝ_i = S_i · (1 − d_foci(i)), where d_foci(i) is the normalized spatial distance from pixel i to its nearest cluster center.
(3) Based on the image key points, pixel-level segmentation is performed with a binarization denoising method to obtain the segmented person image.
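A sketch of the single-scale computation in step (1), using the combined distance and saliency formulas above; the block size, K, the parameter c and the block-mean L*a*b descriptors are our assumptions:

```python
import numpy as np
from itertools import product

def single_scale_saliency(lab_img, block=8, K=64, c=3.0):
    """Context-aware saliency sketch: divide the image into equal blocks,
    combine color distance with spatial scatter, then apply
    S_i = 1 - exp(-(1/K) * sum of d(p_i, q_k) over the K most similar blocks).
    lab_img: H*W*3 array in L*a*b color space."""
    h, w, _ = lab_img.shape
    bh, bw = h // block, w // block
    feats, pos = [], []
    for i, j in product(range(bh), range(bw)):
        patch = lab_img[i*block:(i+1)*block, j*block:(j+1)*block]
        feats.append(patch.reshape(-1, 3).mean(axis=0))     # block descriptor
        pos.append(((i + 0.5) * block, (j + 0.5) * block))  # block center

    feats, pos = np.array(feats), np.array(pos)
    d_color = np.linalg.norm(feats[:, None] - feats[None], axis=-1)
    d_color /= d_color.max() + 1e-12                        # normalized d_color
    d_pos = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
    d_pos /= d_pos.max() + 1e-12                            # normalized d_position
    d = d_color / (1.0 + c * d_pos)                         # d(p_i, p_j)

    sal = np.empty(len(feats))
    for i in range(len(feats)):
        nearest = np.sort(d[i])[1:K + 1]                    # K most similar, skip self
        sal[i] = 1.0 - np.exp(-nearest.mean())
    return sal.reshape(bh, bw)
```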
The extraction unit 114 extracts the feature vector of the person image.
In at least one embodiment of the present application, the extraction unit 114 extracting the feature vector of the person image includes:
The extraction unit 114 applies the scale-invariant feature transform algorithm to extract the histogram of oriented gradient (HOG) features of the person image.
Specifically, the histogram of oriented gradients is a feature descriptor used for object detection in computer vision and image processing. HOG features are constructed by computing and accumulating histograms of gradient orientations over local regions of an image. The HOG extraction procedure is as follows:
The gradient magnitude G(x,y) and gradient orientation σ(x,y) of every pixel (x,y) of each person image are computed, forming the gradient matrix of the image; every element of the gradient matrix is a vector whose first component is the gradient magnitude and whose second and third components together represent the gradient orientation. The image matrix is divided into small cells of 4*4 pixels each, every 2*2 cells form a block, and the angular range from 0° to 180° is divided evenly into 9 channels. The gradient magnitude and orientation of each pixel in a cell are computed and cast as votes to build the histogram of oriented gradients. The histogram has 9 orientation channels, and each channel accumulates the sum of the gradient magnitudes of its pixels, finally yielding a vector composed of the per-channel accumulated gradient sums. The cells are grouped into blocks, the feature vector is normalized within each block, and all normalized feature vectors are concatenated to form the HOG feature of the detection image.
Further, the scale-invariant feature transform algorithm performs feature detection in scale space, determines the position and scale of each key point, and then uses the dominant gradient orientation of the key point's neighborhood as the feature of that point, so that the resulting features are invariant to orientation and scale.
The steps of the SIFT algorithm are scale-space extremum detection, key point localization, key point orientation assignment, and feature vector generation. Since the key points have already been determined in this embodiment, it suffices to apply the principal component analysis algorithm to reduce the dimensionality of the image, yielding stable scale-invariant features. PCA is a method that converts a set of possibly correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation.
The recognition unit 115 processes the feature vector with an SVM learning algorithm to identify the target person in the person image.
In at least one embodiment of the present application, the recognition unit 115 assigns different weights to the features according to the proportion each feature carries in the actual detection process, and classifies the feature vectors with an SVM learning algorithm. Suppose the training data set is {(x_i, y_i) | x_i ∈ R^n, y_i ∈ R}, where each sample x_i is a vector in n-dimensional space describing the features of the data to be classified (called the feature vector), and y_i denotes the class of the sample; samples are divided into positive and negative samples according to the sign of y_i. In this embodiment, the feature vector of each sample can be regarded as a point separating the positive and negative samples. Suppose there exists a hyperplane in this space:
<w, x> + b = 0
where the symbol <,> is the vector inner product operator, w is a known vector and b is a known real number; therefore, the optimal classification function is:
f(x) = sgn(<w, x> + b)
where sgn denotes the sign function, which tests whether its argument is less than zero: if the argument is less than zero the function value is -1, and if it is greater than or equal to zero the function value is 1.
Still further, the recognition unit 115 identifies the target person in the person image.
In at least one embodiment of the present application, after the feature vector is processed with the SVM learning algorithm and the target person in the person image is identified, the method further includes:
The acquisition unit 110 acquires the position coordinates of the target person, and the sending unit 118 sends the person image and the position coordinates to a configuration server.
The configuration server may be any server; the present application imposes no limitation.
For example, when the configuration server is a server of a public security organ, it can assist police officers in searching for persons.
It can be seen from the above technical solution that the present application can, upon receiving a tracking instruction, acquire an image containing a human body and preprocess the image to obtain an image to be recognized; the image to be recognized is then input into a pre-trained neural network model to obtain a labeled image with key point labels, making image recognition more accurate; a context-aware salient region detection algorithm is used to segment the labeled image based on the key point labels to obtain a person image, and the feature vector of the person image is extracted; the feature vector is further processed with a support vector machine learning algorithm to identify the target person in the person image, thereby achieving accurate tracking of a person based on image processing technology and effectively avoiding environmental interference.
As shown in FIG. 3, it is a schematic structural diagram of an electronic device in a preferred embodiment of the present application for implementing the tracking method.
The electronic device 1 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The electronic device 1 may also be, but is not limited to, any electronic product capable of human-computer interaction with a user via a keyboard, mouse, remote control, touch pad, voice-control device or the like, for example, a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game console, interactive internet protocol television (IPTV), smart wearable device, and the like.
The electronic device 1 may also be a computing device such as a desktop computer, notebook, palmtop computer or cloud server.
The network in which the electronic device 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
In one embodiment of the present application, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and a computer program stored in the memory 12 and executable on the processor 13, such as a tracking program.
Those skilled in the art will understand that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation on the electronic device 1; the electronic device 1 may include more or fewer components than shown, some components may be combined, or different components may be used. For example, the electronic device 1 may further include input/output devices, network access devices, buses, and the like.
The processor 13 may be a central processing unit (CPU), or another general-purpose processor, digital signal processor (DSP), application specific integrated circuit (ASIC), field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 13 is the computing core and control center of the electronic device 1; it connects the parts of the entire electronic device 1 through various interfaces and lines, and executes the operating system of the electronic device 1 as well as the installed applications, program code, and the like.
The processor 13 executes the operating system of the electronic device 1 and the installed applications. The processor 13 executes the applications to implement the steps in the tracking method embodiments described above, for example, steps S10, S11, S12, S13, S14 and S15 shown in FIG. 1.
Alternatively, when executing the computer program, the processor 13 implements the functions of the modules/units in the apparatus embodiments described above, for example: upon receiving a tracking instruction, acquiring an image containing a human body; preprocessing the image to obtain an image to be recognized; inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels; using a context-aware salient region detection algorithm to segment the labeled image based on the key point labels to obtain a person image; extracting the feature vector of the person image; and processing the feature vector with a support vector machine learning algorithm to identify the target person in the person image.
Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution process of the computer program in the electronic device 1. For example, the computer program may be divided into the acquisition unit 110, the preprocessing unit 111, the input unit 112, the segmentation unit 113, the extraction unit 114, the recognition unit 115, the dimensionality reduction unit 116, the downsampling unit 117 and the sending unit 118.
The memory 12 may be used to store the computer program and/or modules; the processor 13 implements the various functions of the electronic device 1 by running or executing the computer programs and/or modules stored in the memory 12 and invoking the data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the mobile phone (such as audio data or a phone book). In addition, the memory 12 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The memory 12 may be an external memory and/or internal memory of the electronic device 1. Further, the memory 12 may be a circuit with a storage function that has no physical form in an integrated circuit, such as RAM (random-access memory) or FIFO (first in, first out) memory. Alternatively, the memory 12 may be a memory with a physical form, such as a memory module or a TF card (trans-flash card).
If the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the method embodiments described above.
The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
With reference to FIG. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a tracking method, and the processor 13 can execute the plurality of instructions to implement: upon receiving a tracking instruction, acquiring an image containing a human body; preprocessing the image to obtain an image to be recognized; inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels; using a context-aware salient region detection algorithm to segment the labeled image based on the key point labels to obtain a person image; extracting the feature vector of the person image; and processing the feature vector with a support vector machine learning algorithm to identify the target person in the person image.
According to a preferred embodiment of the present application, the processor 13 executing the plurality of instructions includes:
performing grayscale conversion on the image to obtain a grayscale image;
binarizing the grayscale image to obtain a black-and-white image;
denoising the black-and-white image to obtain the image to be recognized.
According to a preferred embodiment of the present application, the processor 13 further executes instructions including:
performing dimensionality reduction on the image.
According to a preferred embodiment of the present application, the processor 13 further executes instructions including:
inputting the image to be recognized sequentially through one 7*7 convolutional layer, one 3*3 max pooling layer and 4 convolution modules to obtain the labeled image with key point labels.
According to a preferred embodiment of the present application, the processor 13 further executes instructions including:
in the max pooling layer, downsampling the image to be recognized in the spatial dimensions.
According to a preferred embodiment of the present application, the processor 13 further executes instructions including:
applying the scale-invariant feature transform algorithm to extract the histogram of oriented gradient features of the person image.
According to a preferred embodiment of the present application, the processor 13 further executes instructions including:
acquiring the position coordinates of the target person;
sending the person image and the position coordinates to the configuration server.
Specifically, for the specific implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the modules is only a division by logical function, and there may be other divisions in actual implementation.
The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be apparent to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.
Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of the present application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be embraced in the present application. No reference sign in the claims should be construed as limiting the claim concerned.
Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in a system claim may also be implemented by one unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present application. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present application can be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present application.

Claims (20)

  1. A tracking method, characterized in that the method comprises:
    upon receiving a tracking instruction, acquiring an image containing a human body;
    preprocessing the image to obtain an image to be recognized;
    inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels;
    segmenting the labeled image based on the key point labels using a context-aware salient region detection algorithm to obtain a person image;
    extracting a feature vector of the person image;
    processing the feature vector with a support vector machine learning algorithm to identify a target person in the person image.
  2. The tracking method according to claim 1, characterized in that said preprocessing the image to obtain an image to be recognized comprises:
    performing grayscale conversion on the image to obtain a grayscale image;
    binarizing the grayscale image to obtain a black-and-white image;
    denoising the black-and-white image to obtain the image to be recognized.
  3. The tracking method according to claim 2, characterized in that before the image is preprocessed to obtain the image to be recognized, the method further comprises:
    performing dimensionality reduction on the image.
  4. The tracking method according to claim 1, characterized in that said inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels comprises:
    inputting the image to be recognized sequentially through one 7*7 convolutional layer, one 3*3 max pooling layer and 4 convolution modules to obtain the labeled image with key point labels.
  5. The tracking method according to claim 4, characterized in that the method further comprises:
    in the max pooling layer, downsampling the image to be recognized in the spatial dimensions.
  6. The tracking method according to claim 1, characterized in that said extracting the feature vector of the person image comprises:
    applying a scale-invariant feature transform algorithm to extract histogram of oriented gradient features of the person image.
  7. The tracking method according to claim 1, characterized in that after the feature vector is processed with the support vector machine learning algorithm and the target person in the person image is identified, the method further comprises:
    acquiring position coordinates of the target person;
    sending the person image and the position coordinates to a configuration server.
  8. An electronic device, characterized in that the electronic device comprises:
    a memory storing at least one instruction; and a processor executing the instructions stored in the memory to implement the following steps:
    upon receiving a tracking instruction, acquiring an image containing a human body;
    preprocessing the image to obtain an image to be recognized;
    inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels;
    segmenting the labeled image based on the key point labels using a context-aware salient region detection algorithm to obtain a person image;
    extracting a feature vector of the person image;
    processing the feature vector with a support vector machine learning algorithm to identify a target person in the person image.
  9. The electronic device according to claim 8, characterized in that said preprocessing the image to obtain an image to be recognized comprises:
    performing grayscale conversion on the image to obtain a grayscale image;
    binarizing the grayscale image to obtain a black-and-white image;
    denoising the black-and-white image to obtain the image to be recognized.
  10. The electronic device according to claim 9, characterized in that before the image is preprocessed to obtain the image to be recognized, the steps further comprise:
    performing dimensionality reduction on the image.
  11. The electronic device according to claim 8, characterized in that said inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels comprises:
    inputting the image to be recognized sequentially through one 7*7 convolutional layer, one 3*3 max pooling layer and 4 convolution modules to obtain the labeled image with key point labels.
  12. The electronic device according to claim 11, characterized by further comprising:
    in the max pooling layer, downsampling the image to be recognized in the spatial dimensions.
  13. The electronic device according to claim 8, characterized in that said extracting the feature vector of the person image comprises:
    applying a scale-invariant feature transform algorithm to extract histogram of oriented gradient features of the person image.
  14. The electronic device according to claim 8, characterized in that after the feature vector is processed with the support vector machine learning algorithm and the target person in the person image is identified, the steps further comprise:
    acquiring position coordinates of the target person;
    sending the person image and the position coordinates to a configuration server.
  15. A computer-readable storage medium, characterized in that: at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement the following steps:
    upon receiving a tracking instruction, acquiring an image containing a human body;
    preprocessing the image to obtain an image to be recognized;
    inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels;
    segmenting the labeled image based on the key point labels using a context-aware salient region detection algorithm to obtain a person image;
    extracting a feature vector of the person image;
    processing the feature vector with a support vector machine learning algorithm to identify a target person in the person image.
  16. The computer-readable storage medium according to claim 15, characterized in that said preprocessing the image to obtain an image to be recognized comprises:
    performing grayscale conversion on the image to obtain a grayscale image;
    binarizing the grayscale image to obtain a black-and-white image;
    denoising the black-and-white image to obtain the image to be recognized.
  17. The computer-readable storage medium according to claim 16, characterized in that before the image is preprocessed to obtain the image to be recognized, the steps further comprise:
    performing dimensionality reduction on the image.
  18. The computer-readable storage medium according to claim 15, characterized in that said inputting the image to be recognized into a pre-trained neural network model to obtain a labeled image with key point labels comprises:
    inputting the image to be recognized sequentially through one 7*7 convolutional layer, one 3*3 max pooling layer and 4 convolution modules to obtain the labeled image with key point labels.
  19. The computer-readable storage medium according to claim 18, characterized by further comprising:
    in the max pooling layer, downsampling the image to be recognized in the spatial dimensions.
  20. The computer-readable storage medium according to claim 15, characterized in that said extracting the feature vector of the person image comprises:
    applying a scale-invariant feature transform algorithm to extract histogram of oriented gradient features of the person image.
PCT/CN2019/118008 2019-05-06 2019-11-13 Tracking method and apparatus, electronic device and storage medium WO2020224221A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910370526.1A CN110222572B (zh) 2019-05-06 2019-05-06 Tracking method and apparatus, electronic device and storage medium
CN201910370526.1 2019-05-06

Publications (1)

Publication Number Publication Date
WO2020224221A1 true WO2020224221A1 (zh) 2020-11-12

Family

ID=67820365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118008 WO2020224221A1 (zh) 2019-05-06 2019-11-13 Tracking method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN110222572B (zh)
WO (1) WO2020224221A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222572B (zh) 2019-05-06 2024-04-09 平安科技(深圳)有限公司 Tracking method and apparatus, electronic device and storage medium
CN111709874B (zh) * 2020-06-16 2023-09-08 北京百度网讯科技有限公司 Image adjustment method and apparatus, electronic device and storage medium
CN111754435A (zh) * 2020-06-24 2020-10-09 Oppo广东移动通信有限公司 Image processing method and apparatus, terminal device and computer-readable storage medium
CN112381092A (zh) * 2020-11-20 2021-02-19 深圳力维智联技术有限公司 Tracking method and apparatus, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055850B2 (en) * 2014-09-19 2018-08-21 Brain Corporation Salient features tracking apparatus and methods using visual initialization
CN109359538B (zh) * 2018-09-14 2020-07-28 广州杰赛科技股份有限公司 Convolutional neural network training method, gesture recognition method, apparatus and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358182A (zh) * 2017-06-29 2017-11-17 维拓智能科技(深圳)有限公司 Pedestrian detection method and terminal device
CN107633207A (zh) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 AU feature recognition method, apparatus and storage medium
CN109325412A (zh) * 2018-08-17 2019-02-12 平安科技(深圳)有限公司 Pedestrian recognition method and apparatus, computer device and storage medium
CN110222572A (zh) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Tracking method and apparatus, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, XINGBAO: "The research of pose varied pedestrian detection and recognition in complex scenes", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER’S THESES FULL-TEXT DATABASE, no. 10, 15 October 2012 (2012-10-15), pages 1 - 77, XP055751479, ISSN: 1674-0246 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723520A (zh) * 2021-08-31 2021-11-30 深圳市中博科创信息技术有限公司 Person trajectory tracking method, apparatus, device and medium based on feature updating
CN114741697A (zh) * 2022-04-22 2022-07-12 中国电信股份有限公司 Malicious code classification method and apparatus, electronic device and medium
CN114741697B (zh) * 2022-04-22 2023-10-13 中国电信股份有限公司 Malicious code classification method and apparatus, electronic device and medium
CN116106856A (zh) * 2023-04-13 2023-05-12 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for building a thunderstorm gale recognition model, recognition method and computing device
CN116106856B (zh) * 2023-04-13 2023-08-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for building a thunderstorm gale recognition model, recognition method and computing device

Also Published As

Publication number Publication date
CN110222572A (zh) 2019-09-10
CN110222572B (zh) 2024-04-09

Similar Documents

Publication Publication Date Title
WO2020224221A1 (zh) Tracking method and apparatus, electronic device and storage medium
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
Walia et al. Recent advances on multicue object tracking: a survey
US10445602B2 (en) Apparatus and method for recognizing traffic signs
US8750573B2 (en) Hand gesture detection
US8792722B2 (en) Hand gesture detection
CN110852311A (zh) Three-dimensional hand key point locating method and apparatus
CN108416780B (zh) Object detection and matching method based on a Siamese region-of-interest pooling model
WO2021203882A1 (zh) Posture detection and video processing method and apparatus, electronic device and storage medium
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN110163111A (zh) Number calling method and apparatus based on face recognition, electronic device and storage medium
Dantone et al. Augmented faces
WO2021238586A1 (zh) Training method and apparatus, device, and computer-readable storage medium
Gawande et al. SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection
WO2023165616A1 (zh) Method and system for detecting hidden backdoors in image models, storage medium and terminal
Li et al. Multi-view vehicle detection based on fusion part model with active learning
Li et al. Real-time tracking algorithm for aerial vehicles using improved convolutional neural network and transfer learning
Wang et al. Fusion of multiple channel features for person re-identification
CN113762009B (zh) Crowd counting method based on multi-scale feature fusion and a dual attention mechanism
CN115018886B (zh) Motion trajectory recognition method and apparatus, device and medium
Shafie et al. Smart objects identification system for robotic surveillance
CN116246298A (zh) Method for counting the number of people occupying a space, terminal device and storage medium
Kuang et al. Learner posture recognition via a fusing model based on improved SILTP and LDP
CN114283087A (zh) Image denoising method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928176

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928176

Country of ref document: EP

Kind code of ref document: A1
