WO2018130104A1 - Head detection method, electronic device, and storage medium - Google Patents

Head detection method, electronic device, and storage medium

Info

Publication number
WO2018130104A1
Authority
WO
WIPO (PCT)
Prior art keywords
head position
sub-image
detected
layer
Prior art date
Application number
PCT/CN2018/070008
Other languages
English (en)
French (fr)
Inventor
姜德强
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP18738888.9A (EP3570209A4)
Publication of WO2018130104A1
Priority to US16/351,093 (US20190206085A1)
Priority to US16/299,866 (US10796450B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07CTIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00Individual registration on entry or exit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Definitions

  • the present application relates to the field of image processing technologies, and in particular, to a human head detecting method, an electronic device, and a storage medium.
  • Head detection refers to detecting the head of a human body in an image, and the results of head detection are used in a variety of applications, for example in the security field.
  • At present, head detection is mainly implemented based on the shape and color of the human head.
  • The specific process is: first binarize the image, then perform edge detection to obtain roughly circular edges; then use circle detection to obtain the position and size of the circular edges; and finally perform grayscale and size determination on the corresponding circular regions in the original image to obtain the head detection result.
  • However, this approach relies on the assumption that the human head is circular. In fact, the human head is not a regular circle, and head shapes differ from person to person, so such head detection misses some heads, which makes the detection results less accurate.
  • A human head detection method, an electronic device, and a storage medium are provided.
  • A method for detecting a human head comprises:
  • the electronic device segments the image to be detected into more than one sub-image;
  • the electronic device inputs each of the sub-images into a convolutional neural network that has been trained on training images with calibrated head positions, and outputs, through a pre-layer of the convolutional neural network that includes a convolutional layer and a sub-sampling layer, a first feature corresponding to each of the sub-images;
  • the electronic device maps the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images through a convolutional layer after the pre-layer in the convolutional neural network;
  • the electronic device maps the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position through a regression layer of the convolutional neural network;
  • the electronic device filters the head positions corresponding to the sub-images by the corresponding confidences to obtain the detected head positions in the image to be detected.
  • An electronic device comprises a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the following steps:
  • input each of the sub-images into a convolutional neural network that has been trained on training images with calibrated head positions, and output, through a pre-layer of the convolutional neural network that includes a convolutional layer and a sub-sampling layer, a first feature corresponding to each of the sub-images;
  • map, through the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position;
  • filter the head positions corresponding to the sub-images by the corresponding confidences to obtain the detected head positions in the image to be detected.
  • One or more non-volatile storage media store computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • input each of the sub-images into a convolutional neural network that has been trained on training images with calibrated head positions, and output, through a pre-layer of the convolutional neural network that includes a convolutional layer and a sub-sampling layer, a first feature corresponding to each of the sub-images;
  • map, through the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position;
  • filter the head positions corresponding to the sub-images by the corresponding confidences to obtain the detected head positions in the image to be detected.
  • FIG. 1 is an application environment diagram of a human head detecting method in an embodiment
  • FIG. 2 is a schematic diagram showing the internal structure of an electronic device in an embodiment
  • FIG. 3 is a schematic flow chart of a method for detecting a human head in an embodiment
  • FIG. 4 is a schematic structural view of a convolutional neural network in one embodiment
  • FIG. 5 is a flow chart showing the steps of converting a convolutional neural network for classification into a convolutional neural network for human head detection and training in one embodiment
  • FIG. 6 is a flow chart showing the steps of filtering the head position corresponding to each sub-image according to the corresponding confidence level to obtain the detected head position in the image to be detected in one embodiment
  • FIG. 7 is a flow chart showing the steps of determining the position of a human head detected in an image to be detected according to the selected head position and the selected head position in an embodiment
  • FIG. 8 is a schematic flowchart of a step of tracking a human head by video frame and counting human traffic according to an embodiment
  • FIG. 9 is a flow chart showing the steps of detecting a head position and tracking in the vicinity of a head position tracked in a previous video frame when the tracking head position is interrupted in an embodiment
  • Figure 10 is a schematic view showing the position of a rectangular frame calibrated in a top view image in an embodiment
  • Figure 11 is a schematic diagram of population statistics using two parallel lines in one embodiment
  • Figure 12 is a block diagram showing the structure of a human head detecting device in an embodiment
  • Figure 13 is a block diagram showing the structure of a human head detecting device in another embodiment
  • FIG. 14 is a structural block diagram of a human head detection result determining module in an embodiment
  • Figure 15 is a block diagram showing the structure of a human head detecting device in still another embodiment.
  • Figure 16 is a block diagram showing the structure of a human head detecting device in still another embodiment.
  • FIG. 1 is an application environment diagram of a human head detecting method in an embodiment.
  • the human head detection method is applied to a human head detection system including an electronic device 110 and a top view camera 120 connected to the electronic device 110.
  • the overhead camera 120 is configured to capture an image to be detected and send the image to be detected to the electronic device 110.
  • the overhead camera can be mounted at the top of a building, on a wall above head height, or at a top corner of a building, so that it can capture top-view images.
  • the top view may be a direct top-down view or a view with an oblique angle.
  • the electronic device 110 can be configured to segment the image to be detected into more than one sub-image; input each sub-image into a convolutional neural network that has been trained on training images with calibrated head positions; output, through a pre-layer of the convolutional neural network that includes a convolutional layer and a sub-sampling layer, a first feature corresponding to each sub-image; map the first feature corresponding to each sub-image to a second feature corresponding to each sub-image through a convolutional layer after the pre-layer in the convolutional neural network; map, through a regression layer of the convolutional neural network, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position; and filter the head positions corresponding to the sub-images by the corresponding confidences to obtain the detected head positions in the image to be detected.
  • the electronic device includes a processor, a memory, and a network interface connected by a system bus.
  • the memory comprises a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device can store operating system and computer readable instructions.
  • the computer readable instructions when executed, may cause the processor to perform a human head detection method.
  • the processor of the electronic device can include a central processing unit and a graphics processor for providing computing and control capabilities to support operation of the electronic device.
  • the internal memory can store computer readable instructions that, when executed by the processor, cause the processor to perform a human head detection method.
  • the network interface of the electronic device is used to connect to the overhead camera.
  • the electronic device can be implemented by a separate electronic device or a cluster of multiple electronic devices.
  • the electronic device can be a personal computer, a server or a dedicated human head detection device.
  • FIG. 2 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the electronic device to which the solution is applied.
  • a specific electronic device may include more or fewer components than shown in the figures, combine some components, or have a different arrangement of components.
  • FIG. 3 is a schematic flow chart of a method for detecting a human head in an embodiment. This embodiment is mainly illustrated by the method applied to the electronic device 110 in FIGS. 1 and 2 described above. Referring to FIG. 3, the human head detecting method specifically includes the following steps:
  • the image to be detected refers to an image that needs to be detected by the human head.
  • the image to be detected may be a picture or a video frame in the video.
  • a sub-image is an image segmented from the image to be detected that is smaller than the image to be detected; all segmented sub-images can be the same size.
  • specifically, the electronic device can traverse a fixed-size window over the image to be detected according to a lateral step size and a vertical step size, segmenting, during the traversal, sub-images the same size as the window from the image to be detected.
  • the segmented sub-images can be combined into an image to be detected.
  • in one embodiment, step S302 includes segmenting the image to be detected into more than one sub-image of a fixed size, with overlapping portions between adjacent sub-images.
  • adjacent sub-images are sub-images whose positions in the image to be detected are adjacent; adjacent sub-images partially coincide.
  • specifically, the electronic device can traverse a fixed-size window over the image to be detected with a horizontal step size smaller than the window width and a vertical step size smaller than the window height, obtaining more than one sub-image of equal size with overlapping portions between adjacent sub-images, as in the sketch below.
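  • The following is a minimal sketch of this sliding-window segmentation, assuming a NumPy-style image array; the function and parameter names are illustrative, not from the patent.

```python
import numpy as np

def split_into_subimages(image, win_w, win_h, step_x, step_y):
    """Traverse a fixed-size window over the image. Step sizes smaller
    than the window dimensions yield overlapping, equally sized
    sub-images, as described above."""
    h, w = image.shape[:2]
    subimages = []
    for y in range(0, h - win_h + 1, step_y):
        for x in range(0, w - win_w + 1, step_x):
            subimages.append(image[y:y + win_h, x:x + win_w])
    return subimages

# Example: 64x64 windows with half-window overlap in both directions.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
tiles = split_into_subimages(frame, 64, 64, 32, 32)
```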
  • Convolutional Neural Network is an artificial neural network.
  • the convolutional neural network includes a Convolutional Layer and a Pooling Layer.
  • the convolutional neural network used in this embodiment can be constructed directly, or reconstructed from an existing convolutional neural network.
  • the computational task in the convolutional neural network can be realized by a central processing unit or a graphics processor.
  • head detection on a central processing unit takes on the order of seconds, while on a graphics processor the time can be reduced to about 100 milliseconds, enabling real-time head detection.
  • in a convolutional layer of a convolutional neural network there are a plurality of feature maps, each of which includes a plurality of neurons, and all neurons of the same feature map share a convolution kernel.
  • the convolution kernel is the weight of the corresponding neuron, and the convolution kernel represents a feature.
  • the convolution kernel is generally initialized in the form of a random fractional matrix, and a reasonable convolution kernel will be learned during the training process of the network. Convolution can reduce the connections between layers in a neural network while reducing the risk of overfitting.
  • Subsampling is also called Pooling. It usually has two forms: Mean Pooling and Max Pooling. Subsampling can be seen as a special convolution process. Convolution and subsampling greatly simplify the complexity of the neural network and reduce the parameters of the neural network.
  • a training image with a calibrated head position is a training image in which the head positions have been manually marked.
  • the training image that has been calibrated to the head position and the image to be detected may be images taken in the same scene, which can further improve the accuracy of the human head detection.
  • the training image that has been calibrated to the head position can be the same size as the image to be detected.
  • when training the convolutional neural network, a confidence may be assigned to each head position calibrated in the training image; the training image is segmented into more than one sub-image using the same segmentation method as for the image to be detected; the segmented sub-images are input into the convolutional neural network, which outputs head positions and confidences; the differences between the output head positions and the calibrated head positions, and between the corresponding confidences, are computed, and the parameters of the convolutional neural network are adjusted according to these two differences; training continues until a termination condition is reached.
  • the termination condition may be that the differences are less than a preset threshold, or that the number of iterations reaches a preset count.
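  • A hedged sketch of one such training step follows. The patent only says the parameters are adjusted according to the two differences, so the concrete losses (smooth L1 for positions, binary cross-entropy for confidences) and the PyTorch-style model interface are assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, subimages, target_boxes, target_conf):
    """One parameter update driven by the head-position gap and the
    confidence gap. `model` is assumed to return (predicted boxes,
    predicted confidence logits) for a batch of sub-images."""
    pred_boxes, pred_conf = model(subimages)               # (N, 4), (N, 1)
    loss_pos = F.smooth_l1_loss(pred_boxes, target_boxes)  # position gap
    loss_conf = F.binary_cross_entropy_with_logits(pred_conf, target_conf)
    loss = loss_pos + loss_conf
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```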
  • the pre-layer is a collective term for the layers of the convolutional neural network other than the regression layer and the convolutional layer before the regression layer.
  • the pre-layer includes a convolution layer and a sub-sampling layer.
  • the pre-layer may include parallel convolutional layers, and the data output by the parallel convolutional layer may be spliced and input to the next layer.
  • the last layer in the pre-layer can be a convolutional layer or a sub-sampling layer.
  • a convolutional neural network is generally used for classification; in a classification network, the pre-layer is followed by a Fully Connected Layer, which maps the first feature output by the pre-layer to probability data corresponding to each preset category, so that the regression layer outputs the category to which the input image belongs.
  • in this embodiment, the convolutional neural network is used for head detection: a convolutional layer replaces the fully connected layer and outputs second features that describe the sub-image.
  • the number of second features corresponding to each sub-image may be plural.
  • the position of the head can be represented by the position of a rectangular frame including the image of the human head.
  • the position of the rectangular box can be represented by a quad.
  • the quadruple may include the abscissa and ordinate of one vertex of the rectangular frame together with the width and height of the rectangular frame, or it may include the abscissas and ordinates of two diagonally opposite vertices of the rectangular frame.
  • the confidence output by the regression layer corresponds to the head position output by the regression layer and indicates the probability that the head position contains a head image.
  • the regression layer may employ a support vector machine (SVM, Support Vector Machine).
  • in one embodiment, step S308 includes mapping the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position through a convolutional layer in the regression layer of the convolutional neural network. Specifically, the electronic device may directly map the second feature corresponding to each sub-image to both the head position and the corresponding confidence through a single convolutional layer in the regression layer.
  • in another embodiment, step S308 includes mapping the second feature corresponding to each sub-image to a head position corresponding to each sub-image through a first convolutional layer in the regression layer of the convolutional neural network, and mapping the second feature corresponding to each sub-image to a confidence corresponding to the output head position through a second convolutional layer in the regression layer of the convolutional neural network.
  • in one example, a sub-image passes through the pre-layer of the convolutional neural network, which outputs 128 feature matrices of size M*N.
  • 128 is a preset number that can be set as needed; M and N are determined by the parameters of the pre-layer.
  • the 128 feature matrices of size M*N are input to the convolutional layer after the pre-layer and convolved with a parameter matrix of size 128*1024, outputting M*N feature vectors of length 1024.
  • the M*N feature vectors of length 1024 are input to the first convolutional layer in the regression layer and convolved with a parameter matrix of size 1024*4, outputting M*N quadruples representing head positions.
  • the M*N feature vectors of length 1024 are also input to the second convolutional layer in the regression layer and convolved with a parameter vector of size 1024*1, outputting M*N confidences corresponding to the head positions.
  • the correspondence between head positions and confidences is reflected in the order of the output M*N quadruples and single confidence values; a code sketch of these shapes follows.
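  • The following PyTorch-style sketch reproduces the shapes described above. Treating the 128→1024 "parameter matrix" and the two regression branches as 1x1 convolutions is an assumption consistent with the stated dimensions; all class and attribute names are illustrative.

```python
import torch
import torch.nn as nn

class HeadRegression(nn.Module):
    """Maps 128 pre-layer feature maps of size M*N to M*N head-position
    quadruples and M*N confidences, matching the dimensions above."""
    def __init__(self):
        super().__init__()
        # convolutional layer after the pre-layer: 128 -> 1024 per cell
        self.feature_conv = nn.Conv2d(128, 1024, kernel_size=1)
        # first convolutional layer in the regression layer: 1024 -> 4
        self.position_conv = nn.Conv2d(1024, 4, kernel_size=1)
        # second convolutional layer in the regression layer: 1024 -> 1
        self.confidence_conv = nn.Conv2d(1024, 1, kernel_size=1)

    def forward(self, pre_layer_out):                 # (B, 128, M, N)
        feat = self.feature_conv(pre_layer_out)       # (B, 1024, M, N)
        positions = self.position_conv(feat)          # (B, 4, M, N)
        confidences = self.confidence_conv(feat)      # (B, 1, M, N)
        return positions, confidences
```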
  • the electronic device can compare the confidence of each head position output by the convolutional neural network with a confidence threshold, and filter out the head positions whose confidence is less than the confidence threshold.
  • the electronic device may further remove, from the head positions that passed the confidence-threshold filter, those whose area is smaller than a preset area.
  • the electronic device may cluster the filtered head positions and either merge the head positions clustered into the same category into one head position, or select one head position from each category, as the head position in the image to be detected.
  • in this embodiment, the convolutional neural network is trained in advance on training images with calibrated head positions, so the convolutional neural network can automatically learn the characteristics of the human head.
  • the trained convolutional neural network can automatically extract appropriate features from the sub-images to output candidate head positions and corresponding confidences, which are then filtered by confidence to obtain the head positions in the image to be detected; it is not necessary to presuppose the shape of the human head, missed detections caused by assuming a head shape are avoided, and the accuracy of head detection is improved.
  • moreover, the first feature of each sub-image is output by the pre-layer including the convolutional layer and the sub-sampling layer, and the second feature is output by the convolutional layer after the pre-layer and before the regression layer, so that the characteristics of the human head in the sub-image are described accurately and the second feature can be mapped directly to a head position and confidence through the regression layer; this is a new application of a convolutional neural network with a new structure, and its accuracy is greatly improved compared with traditional head detection based on circle detection.
  • the human head detection method further includes the step of converting the convolutional neural network for classification into a convolutional neural network for human head detection and training.
  • the step of converting the convolutional neural network for classification into a convolutional neural network for human head detection and training comprises the following steps:
  • a convolutional neural network for classification is a trained convolutional neural network that can classify images input to it, such as GoogLeNet, VGGNet, or AlexNet.
  • the convolutional neural network used for classification includes a pre-layer, a fully-connected layer, and a regression layer.
  • the fully connected layer is for outputting a second feature corresponding to each preset category
  • each neuron of the fully connected layer is connected to all neurons of the upper layer.
  • both convolutional layers and fully connected layers obtain the input of the next layer by multiplying the output of the previous layer by a parameter matrix, so a fully connected layer can be converted into a convolutional layer by rearranging its parameters, as sketched below.
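  • A minimal sketch of that parameter rearrangement in PyTorch, assuming the fully connected layer originally consumed flattened C*H*W feature maps; the function and argument names are illustrative.

```python
import torch
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, channels: int, height: int, width: int) -> nn.Conv2d:
    """Build a convolutional layer equivalent to a fully connected layer
    by reshaping the FC weight matrix into convolution kernels."""
    conv = nn.Conv2d(channels, fc.out_features, kernel_size=(height, width))
    with torch.no_grad():
        conv.weight.copy_(fc.weight.view(fc.out_features, channels, height, width))
        conv.bias.copy_(fc.bias)
    return conv
```

Applied to a feature map larger than height x width, the converted layer slides the former fully connected computation across positions, which is what allows the network to output one second feature per sub-image location.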
  • in the classification network, the regression layer is configured to map the second feature of each preset category output by the fully connected layer to a probability corresponding to each preset category, and to determine from the mapped probabilities the preset category to which the input image belongs, for example by selecting the preset category with the maximum probability.
  • the regression layer is configured to map a preset number of second features output by the converted convolution layer to a head position and a confidence level corresponding to the head position.
  • the regression layer may employ a convolution layer from which the second feature is directly mapped to the head position and the confidence level corresponding to the head position.
  • the regression layer can also use two parallel convolutional layers, one of which maps the second feature to the head position, while the other maps the second feature to the confidence corresponding to the mapped head position.
  • S506: Train the convolutional neural network including the pre-layer, the converted convolutional layer, and the replacement regression layer using training images with calibrated head positions.
  • the convolutional neural network including the pre-layer, the converted convolutional layer and the replaced regression layer is reconstructed from the convolutional neural network used for classification, and the parameters of the pre-layer are already trained. Then the reconstructed convolutional neural network mainly needs to train the parameters in the converted convolutional layer and the replaced regression layer.
  • when training the reconstructed convolutional neural network, a confidence may be assigned to the head positions calibrated in the training image; the training image is segmented into more than one sub-image using the same segmentation method as for the image to be detected;
  • the segmented sub-images are input into the convolutional neural network and, after passing through the pre-layer, the converted convolutional layer after the pre-layer, and the regression layer, head positions and confidences are output; the differences between the output and the calibrated head positions, and between the corresponding confidences, are computed, and the network parameters are adjusted accordingly until a termination condition is reached.
  • the termination condition may be that the differences are less than a preset threshold, or that the number of iterations reaches a preset count.
  • in this embodiment, a convolutional neural network for head detection is obtained by retraining after the modification, which does not require building a convolutional neural network from scratch; this reduces training time and improves the efficiency of head detection.
  • step S310 specifically includes the following steps:
  • specifically, the electronic device may form a head position set from the head positions corresponding to the sub-images segmented from the image to be detected, traverse the head position set, compare each traversed head position's confidence with a confidence threshold, and remove from the set the head positions whose confidence is below the confidence threshold.
  • the head positions remaining in the set after the traversal are the filtered head positions whose confidence is higher than or equal to the confidence threshold.
  • the confidence threshold can be set as needed, for example a value from 0.5 to 0.99.
  • the intersection of the positions of the heads refers to the intersection of the closed areas indicated by the positions of the heads.
  • when a head position is represented by the position of a rectangular frame containing a head image, head positions intersect when the corresponding rectangular frames intersect.
  • the electronic device may select, from the set formed by all head positions corresponding to the sub-images segmented from the image to be detected, the head positions that intersect the filtered head positions.
  • the electronic device can also look for intersecting head positions among the filtered head positions only; see the intersection-test sketch below.
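  • An axis-aligned intersection test for head positions given as (x, y, w, h) quadruples, one of the two representations described earlier; a minimal sketch with illustrative names.

```python
def rects_intersect(a, b):
    """True when two rectangular frames (x, y, w, h) overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```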
  • S606. Determine a detected head position in the image to be detected according to the selected head position and the selected head position.
  • specifically, the electronic device may classify the filtered head positions and the selected head positions, each class including at least one filtered head position and the head positions intersecting it.
  • the electronic device may combine the head positions of each type into a head position as the detected head position, or select a head position from each type of head position as the detected head position.
  • using both the confidence and whether head positions intersect as the basis for determining the head positions in the image to be detected further improves the accuracy of head detection.
  • step S606 specifically includes the following steps:
  • the filtered head positions and the selected head positions are used as nodes in a bipartite graph.
  • in graph theory, a bipartite graph is a graph whose nodes can be divided into two groups such that every edge connects nodes in different groups.
  • edges between nodes are assigned a default positive weight, such as 1000; when the head positions represented by an edge's nodes intersect, the weight assigned to that edge is reduced.
  • S708: Find a maximum weight matching of the bipartite graph to obtain the detected head positions in the image to be detected.
  • a matching in a bipartite graph is a set of edges no two of which share a node; among all matchings of a bipartite graph, the one whose edge weights sum to the maximum is the maximum weight matching.
  • the electronic device can traverse all combinations of edges in the bipartite graph to find the maximum weight match.
  • the electronic device can also use the Kuhn-Munkres algorithm to obtain the maximum weight matching of the bipartite graph. After the maximum weight matching is obtained, the head position associated with the edge in the maximum weight matching can be used as the head position detected in the image to be detected.
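  • The patent names the Kuhn-Munkres algorithm; the sketch below uses SciPy's assignment solver as a stand-in, under the assumption that the filtered and selected head positions form the two node groups and that `weights[i][j]` holds the edge weight (default 1000, reduced when positions i and j intersect).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_weight_matching(weights):
    """Return the edge set of a maximum weight matching as (row, col)
    index pairs over the bipartite weight matrix."""
    w = np.asarray(weights, dtype=float)
    rows, cols = linear_sum_assignment(w, maximize=True)
    return list(zip(rows, cols))
```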
  • in this embodiment, the head positions output by the convolutional neural network mostly cluster near the actual head positions in the image to be detected; using the filtered and selected head positions as nodes in a bipartite graph, with small weights on the edges between intersecting head positions, the maximum weight matching tends not to retain intersecting positions together, so the detected head positions can be determined accurately.
  • in one embodiment, the image to be detected is a video frame in a video.
  • in this embodiment, the human head detection method further includes steps of tracking heads from video frame to video frame and counting people flow.
  • the steps of tracking heads by video frame and counting people flow include the following:
  • S802 Perform head tracking on a video frame according to the detected head position in the image to be detected.
  • specifically, after detecting the head positions in a video frame, the electronic device performs head tracking frame by frame using those head positions as starting points.
  • the electronic device may specifically adopt a MeanShift (mean shift) tracking algorithm, an optical flow tracking algorithm, or a TLD (Tracking-Learning-Detection) algorithm.
  • the specified area refers to the area specified in the video frame.
  • the direction of movement of the tracked head position relative to the designated area means movement toward or away from the designated area; the positional relationship of the tracked head position relative to the designated area means that the head position is within or outside the designated area.
  • in one embodiment, when the tracked head position crosses a line representing a boundary of the designated area in a direction toward the designated area, it is determined that the tracked head position enters the designated area; when it crosses that line in a direction away from the designated area, it is determined that the tracked head position leaves the designated area.
  • in one embodiment, when the tracked head position sequentially crosses a first line and a second line parallel to the first line, it is determined that the tracked head position enters the designated area; when the tracked head position sequentially crosses the second line and the first line, it is determined that the tracked head position leaves the designated area.
  • the parallel first line and the second line may be a straight line or a curved line.
  • the designated area may be an area of the two areas obtained by the second line segmentation in the image to be detected that does not include the first line.
  • determining the direction of movement and the positional relationship of the tracked head position relative to the designated area with two lines avoids judgment errors caused by a head position moving back and forth near the boundary of the designated area, thereby ensuring the correctness of the people count.
  • the people count may specifically be one or a combination of the cumulative number of people who have entered the designated area, the cumulative number of people who have left the designated area, and the dynamic number of people currently in the designated area.
  • specifically, when a tracked head position enters the designated area, the electronic device may increment the cumulative count of people who have entered the designated area by one, and/or increment the dynamic count of people in the designated area by one; when a tracked head position leaves the designated area, the electronic device may increment the cumulative count of people who have left the designated area by one, and/or decrement the dynamic count of people in the designated area by one. A sketch of the two-line rule follows.
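  • A sketch of the two-line rule under simplifying assumptions: both lines are horizontal, the track is a sequence of head-center y-coordinates, and all names are illustrative.

```python
def classify_track(track_ys, line1_y, line2_y):
    """Crossing line 1 then line 2 counts as entering the designated
    area; crossing line 2 then line 1 counts as leaving."""
    crossings = []
    for prev, cur in zip(track_ys, track_ys[1:]):
        lines = [("line1", line1_y), ("line2", line2_y)]
        # visit the lines in the direction of travel for this step
        for name, y in sorted(lines, key=lambda t: t[1], reverse=cur < prev):
            if min(prev, cur) < y <= max(prev, cur):
                crossings.append(name)
    if crossings[:2] == ["line1", "line2"]:
        return "entered"   # increment entered and dynamic counts
    if crossings[:2] == ["line2", "line1"]:
        return "left"      # increment left count, decrement dynamic count
    return "none"
```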
  • the human head detection is applied to the security field, and the number of people is counted according to the moving direction and the positional relationship of the tracked head position with respect to the designated area. Based on the accurate head detection, the accuracy of the number of people can be ensured.
  • in one embodiment, the human head detection method further includes a step of detecting head positions near the head position tracked in the previous video frame and resuming tracking when head tracking is interrupted.
  • the step specifically includes the following steps:
  • specifically, the electronic device performs head tracking starting from the head positions detected in the image to be detected, and records the tracked head positions frame by frame.
  • tracking of a head position may be interrupted; in that case, the head position tracked in the previous video frame, as recorded during tracking, is acquired.
  • the local area covering the acquired head position is smaller than the size of one video frame, and larger than the size of the area occupied by the head position tracked in the previous video frame.
  • the shape of the local area may be similar to the shape of the area occupied by the head position tracked in the previous video frame.
  • the center of the local area may overlap with the center of the area occupied by the head position tracked in the previous video frame.
  • the electronic device can detect the head position in the current video frame to find the head position belonging to the local area.
  • the electronic device can also detect the head position only in a partial area.
  • the electronic device may specifically adopt the steps of step S302 to step S310 described above to detect the position of the human head in the local area in the current video frame.
  • the detected head position may be partially or wholly located in the local area.
  • the electronic device can treat a head position whose center is within the local area as a head position detected in the local area; a head position whose center is outside the local area does not belong to the local area.
  • specifically, if the rectangular frame tracked in the previous video frame has width W and height H, the local area may be a rectangular area with width a*W and height b*H sharing its center with that rectangular frame.
  • if the center coordinates of the rectangular frame tracked in the previous video frame are (X1, Y1) and the center coordinates of a rectangular frame indicating a detected head position are (X2, Y2), then that head position is centered within the local area when |X1 - X2| ≤ a*W/2 and |Y1 - Y2| ≤ b*H/2.
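  • A sketch of the center-containment check implied above; the values of a and b are illustrative factors, not specified numerically in the text, and the function name is hypothetical.

```python
def center_in_local_area(prev_center, det_center, prev_w, prev_h, a=1.5, b=1.5):
    """True when a detected head position's center (X2, Y2) lies inside
    the local area: a rectangle of width a*W and height b*H centered on
    the rectangle tracked in the previous frame at (X1, Y1)."""
    x1, y1 = prev_center
    x2, y2 = det_center
    return abs(x1 - x2) <= a * prev_w / 2 and abs(y1 - y2) <= b * prev_h / 2
```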
  • in step S908, the process returns to step S902, continuing head tracking and recording from the head positions detected in the local area.
  • in this embodiment, when head tracking is interrupted, head positions can be detected near the head position tracked in the previous frame and the interrupted tracking resumed; combining head detection with head tracking ensures the continuity of tracking and, in turn, the accuracy of the people count.
  • the specific principle of the above-mentioned human head detection method will be described below using a specific application scenario.
  • a large number of top-view images of elevator entrance scenes are acquired in advance, and the head positions in these top-view images are calibrated, for example using a quadruple to indicate the position of the rectangular frame 1001 containing a head image in FIG. 10.
  • a convolutional neural network for classification is selected; the fully connected layer after the pre-layer and before the regression layer is converted into a convolutional layer, and the regression layer is replaced with a regression layer that maps the second features output by the converted convolutional layer to head positions and corresponding confidences; the convolutional neural network is then retrained using the calibrated top-view images.
  • a top view camera is placed above the gate, and the video is captured by the overhead camera and transmitted to the electronic device connected to the overhead camera.
  • the electronic device takes the image area between line 1101 and line 1104 in a video frame as the image to be detected and segments the image to be detected into more than one sub-image; each sub-image is input into the convolutional neural network trained on training images with calibrated head positions, which outputs the head position corresponding to each sub-image and the confidence corresponding to the head position; the head positions corresponding to the sub-images are then filtered by the corresponding confidences to obtain the head positions detected in the image to be detected.
  • the electronic device performs head tracking from the head positions detected in the image to be detected; when the tracked head position 1105 sequentially crosses the first line 1102 and the second line 1103 parallel to the first line 1102, it is determined that the tracked head position 1105 enters the designated area.
  • when the tracked head position 1106 sequentially crosses the second line 1103 and the first line 1102, it is determined that the tracked head position 1106 leaves the designated area.
  • the designated area in FIG. 11 may specifically be the area sandwiched by the second line 1103 and the line 1104.
  • in one embodiment, an electronic device is further provided, the internal structure of which can be as shown in FIG. 2; the electronic device includes a human head detecting device, which includes the modules described below, each of which may be implemented wholly or partly in software, hardware, or a combination thereof.
  • Figure 12 is a block diagram showing the structure of the human head detecting device 1200 in one embodiment.
  • the human head detecting apparatus 1200 includes a dividing module 1210, a convolutional neural network module 1220, and a human head detection result determining module 1230.
  • the segmentation module 1210 is configured to divide the image to be detected into more than one sub-image.
  • the convolutional neural network module 1220 is configured to input each sub-image into a convolutional neural network that has been trained on training images with calibrated head positions; to output, through a pre-layer of the convolutional neural network that includes a convolutional layer and a sub-sampling layer, a first feature corresponding to each sub-image; to map the first feature corresponding to each sub-image to a second feature corresponding to each sub-image through the convolutional layer after the pre-layer; and to map, through the regression layer of the convolutional neural network, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position.
  • the human head detection result determining module 1230 is configured to filter the head position corresponding to each sub-image according to the corresponding confidence level, and obtain the detected head position in the image to be detected.
  • in this embodiment, the convolutional neural network is trained in advance on training images with calibrated head positions, so the convolutional neural network can automatically learn the characteristics of the human head.
  • the trained convolutional neural network can automatically extract appropriate features from the sub-images to output candidate head positions and corresponding confidences, which are then filtered by confidence to obtain the head positions in the image to be detected; it is not necessary to presuppose the shape of the human head, missed detections caused by assuming a head shape are avoided, and the accuracy of head detection is improved.
  • moreover, the first feature of each sub-image is output by the pre-layer including the convolutional layer and the sub-sampling layer, and the second feature is output by the convolutional layer after the pre-layer and before the regression layer, so that the characteristics of the human head in the sub-image are described accurately and the second feature can be mapped directly to a head position and confidence through the regression layer; this is a new application of a convolutional neural network with a new structure, and its accuracy is greatly improved compared with traditional head detection based on circle detection.
  • in one embodiment, the segmentation module 1210 is further configured to segment the image to be detected into more than one sub-image of a fixed size, with overlapping portions between adjacent sub-images. In this embodiment, the overlap between adjacent sub-images ensures stronger correlation between them and improves the accuracy of detecting head positions in the image to be detected.
  • the human head detecting device 1200 further includes a convolutional neural network adjustment module 1240 and a training module 1250.
  • the convolutional neural network adjustment module 1240 is configured to convert the fully connected layer after the pre-layer and before the regression layer in a convolutional neural network for classification into a convolutional layer, and to replace the regression layer of the convolutional neural network for classification with a regression layer that maps the second features output by the converted convolutional layer to head positions and corresponding confidences.
  • the training module 1250 is configured to train the convolutional neural network including the pre-layer, the converted convolutional layer, and the replacement regression layer using training images with calibrated head positions.
  • in this embodiment, a convolutional neural network for head detection is obtained by retraining after the modification, which does not require building a convolutional neural network from scratch; this reduces training time and improves the efficiency of head detection.
  • in one embodiment, the convolutional neural network module 1220 is further configured to map the second feature corresponding to each sub-image to a head position corresponding to each sub-image through a first convolutional layer in the regression layer of the convolutional neural network, and to map the second feature corresponding to each sub-image to a confidence corresponding to the output head position through a second convolutional layer in the regression layer of the convolutional neural network.
  • the human head detection result determining module 1230 includes a filtering module 1231 and a head position determining module 1232.
  • the filtering module 1231 is configured to filter out, from the head positions corresponding to the sub-images, the head positions whose confidence is higher than or equal to a confidence threshold, and to select, from the head positions corresponding to the sub-images in the image to be detected, the head positions that intersect the filtered head positions.
  • the head position determining module 1232 is configured to determine the detected head position in the image to be detected according to the selected head position and the selected head position.
  • in this embodiment, using both the confidence and whether head positions intersect as the basis for determining the head positions in the image to be detected further improves the accuracy of head detection.
  • the head position determination module 1232 is further configured to use the filtered head position and the selected head position as nodes in the bipartite graph; assign default and positive weights to the edges between the nodes in the bipartite graph. When the head positions indicated by the nodes associated with the edges intersect, the corresponding assigned weights are reduced; the maximum weight matching of the bipartite graph is obtained, and the detected head position in the image to be detected is obtained.
  • in this embodiment, the head positions output by the convolutional neural network mostly cluster near the actual head positions in the image to be detected; using the filtered and selected head positions as nodes in a bipartite graph, with small weights on the edges between intersecting head positions, the maximum weight matching tends not to retain intersecting positions together, so the detected head positions can be determined accurately.
  • the image to be detected is a video frame in the video.
  • the human head detecting device 1200 further includes:
  • the tracking module 1260 is configured to perform head tracking on a video frame according to the detected head position in the image to be detected.
  • the statistical condition detecting module 1270 is configured to determine a moving direction and a positional relationship of the tracked head position relative to the designated area;
  • the number of people statistics module 1280 is configured to perform population statistics based on the determined direction of motion and positional relationship.
  • the human head detection is applied to the security field, and the number of people is counted according to the moving direction and the positional relationship of the tracked head position with respect to the designated area. Based on the accurate head detection, the accuracy of the number of people can be ensured.
  • in one embodiment, the statistical condition detecting module 1270 is further configured to determine that the tracked head position enters the designated area when the tracked head position sequentially crosses the first line and the second line parallel to the first line, and to determine that the tracked head position leaves the designated area when the tracked head position sequentially crosses the second line and the first line.
  • in this embodiment, the direction of movement and the positional relationship of the tracked head position relative to the designated area are determined with two lines, which avoids judgment errors caused by a head position moving back and forth near the boundary of the designated area, thereby ensuring the correctness of the people count.
  • in one embodiment, the human head detecting device 1200 further includes a head position acquisition module 1290.
  • Tracking module 1260 is also used to track and record the head position from video frames.
  • the head position obtaining module 1290 is configured to acquire the recorded head position tracked in the previous video frame if head tracking is interrupted in the current video frame.
  • the convolutional neural network module 1220 is also operative to detect a head position in a localized area of the acquired head position in the current video frame.
  • the tracking module 1260 is further configured to continue to perform the step of tracking the head position by video frame and recording from the position of the human head detected in the local area.
  • the head position when the tracking head position is interrupted, the head position can be detected from the vicinity of the head position detected in the previous frame, and the interrupted head tracking can be continued, and the head detection and the head tracking can be combined to ensure the continuity of the tracking. In turn, the accuracy of the number of people is guaranteed.
  • the steps in the embodiments of this application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

A human head detection method, comprising: segmenting an image to be detected into more than one sub-image; separately inputting each sub-image into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each sub-image through front layers of the convolutional neural network comprising convolutional layers and sub-sampling layers; mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each sub-image to a second feature corresponding to each sub-image; mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position; and filtering the head positions corresponding to each sub-image according to the corresponding confidences to obtain the head positions detected in the image to be detected.

Description

Human head detection method, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 2017100292446, entitled "Human head detection method and apparatus" and filed with the Chinese Patent Office on January 16, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image processing technologies, and in particular to a human head detection method, an electronic device, and a storage medium.
Background
Human head detection refers to detecting human heads in an image, and the result of head detection has many applications, for example in the security field. Current head detection is mainly implemented based on head shape and color. The specific process is as follows: the image is first binarized and edge detection is performed to obtain roughly circular edges; circle detection is then applied to obtain the positions and sizes of the circular edges; finally, grayscale and size judgments are made on the corresponding circular regions in the original image to obtain the head detection result.
However, current head detection relies on the assumption that a head is circular. In fact, head shapes are not regular circles and differ from person to person, so some heads are missed during detection, and the accuracy of the detection result is low.
Summary
According to various embodiments provided in this application, a human head detection method, an electronic device, and a storage medium are provided.
A human head detection method includes:
an electronic device segmenting an image to be detected into more than one sub-image;
the electronic device separately inputting each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network that include convolutional layers and sub-sampling layers;
the electronic device mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
the electronic device mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
the electronic device filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
An electronic device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
segmenting an image to be detected into more than one sub-image;
separately inputting each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network that include convolutional layers and sub-sampling layers;
mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
segmenting an image to be detected into more than one sub-image;
separately inputting each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network that include convolutional layers and sub-sampling layers;
mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objectives, and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may still derive other drawings from them without creative efforts.
FIG. 1 is a diagram of an application environment of a human head detection method in an embodiment;
FIG. 2 is a schematic diagram of the internal structure of an electronic device in an embodiment;
FIG. 3 is a schematic flowchart of a human head detection method in an embodiment;
FIG. 4 is a schematic structural diagram of a convolutional neural network in an embodiment;
FIG. 5 is a schematic flowchart of the steps of converting a convolutional neural network used for classification into a convolutional neural network used for human head detection and training it, in an embodiment;
FIG. 6 is a schematic flowchart of the steps of filtering the head positions corresponding to each sub-image by the corresponding confidences to obtain the head positions detected in the image to be detected, in an embodiment;
FIG. 7 is a schematic flowchart of the steps of determining the head positions detected in the image to be detected from the selected head positions and the picked head positions, in an embodiment;
FIG. 8 is a schematic flowchart of the steps of performing head tracking frame by frame and counting people flow, in an embodiment;
FIG. 9 is a schematic flowchart of the steps of detecting a head position near the head position tracked in the previous video frame and continuing tracking when head-position tracking is interrupted, in an embodiment;
FIG. 10 is a schematic diagram of the position of a labeled rectangular box in an overhead image in an embodiment;
FIG. 11 is a schematic diagram of people counting using two parallel lines in an embodiment;
FIG. 12 is a structural block diagram of a human head detection apparatus in an embodiment;
FIG. 13 is a structural block diagram of a human head detection apparatus in another embodiment;
FIG. 14 is a structural block diagram of a head detection result determination module in an embodiment;
FIG. 15 is a structural block diagram of a human head detection apparatus in still another embodiment; and
FIG. 16 is a structural block diagram of a human head detection apparatus in yet another embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain this application and are not intended to limit it.
FIG. 1 is a diagram of an application environment of a human head detection method in an embodiment. Referring to FIG. 1, the method is applied to a human head detection system that includes an electronic device 110 and an overhead camera 120 connected to the electronic device 110. The overhead camera 120 captures the image to be detected and sends it to the electronic device 110. The overhead camera may be mounted on top of a building, on a wall above head height, or at a top corner of a building, so that it can capture images from an overhead perspective. The overhead view may be straight down or tilted at an angle.
In an embodiment, the electronic device 110 may be configured to segment the image to be detected into more than one sub-image; separately input each sub-image into a convolutional neural network trained with training images in which head positions have been labeled, and output a first feature corresponding to each sub-image through front layers of the network that include convolutional layers and sub-sampling layers; map, through a convolutional layer after the front layers, the first feature corresponding to each sub-image to a second feature corresponding to each sub-image; map, through a regression layer of the network, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position; and filter the head positions corresponding to each sub-image according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
FIG. 2 is a schematic diagram of the internal structure of an electronic device in an embodiment. Referring to FIG. 2, the electronic device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device may store an operating system and computer-readable instructions; when executed, the computer-readable instructions may cause the processor to perform a human head detection method. The processor of the electronic device may include a central processing unit and a graphics processing unit, and is used to provide computing and control capabilities to support the operation of the electronic device. The internal memory may store computer-readable instructions that, when executed by the processor, may cause the processor to perform a human head detection method. The network interface of the electronic device is used to connect to the overhead camera. The electronic device may be implemented as a standalone device or as a cluster of multiple devices, and may be a personal computer, a server, or a dedicated head detection device. A person skilled in the art may understand that the structure shown in FIG. 2 is only a block diagram of a partial structure related to the solutions of this application and does not limit the electronic device to which the solutions are applied; a specific electronic device may include more or fewer components than shown, combine certain components, or have a different component arrangement.
FIG. 3 is a schematic flowchart of a human head detection method in an embodiment. This embodiment is mainly described with the method applied to the electronic device 110 in FIG. 1 and FIG. 2. Referring to FIG. 3, the method specifically includes the following steps:
S302: Segment the image to be detected into more than one sub-image.
The image to be detected is an image in which head detection needs to be performed. It may be a picture or a video frame in a video. A sub-image is an image segmented from the image to be detected whose size is smaller than that of the image to be detected. All segmented sub-images may have the same size.
Specifically, the electronic device may traverse the image to be detected with a window of fixed size according to a horizontal stride and a vertical stride, thereby segmenting, during the traversal, sub-images whose size equals the window size. The segmented sub-images can be combined to form the image to be detected.
In an embodiment, step S302 includes: segmenting the image to be detected into more than one sub-image of fixed size, with overlapping parts between adjacent segmented sub-images.
Sub-images being adjacent means that their positions in the image to be detected are adjacent, and adjacent sub-images partially overlap. Specifically, the electronic device may traverse the image to be detected with a fixed-size window using a horizontal stride smaller than the window width and a vertical stride smaller than the window height, obtaining more than one sub-image of equal size with overlapping parts between adjacent sub-images.
In this embodiment, the overlapping parts between adjacent segmented sub-images ensure stronger correlation between adjacent sub-images, which can improve the accuracy of detecting head positions from the image to be detected.
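By way of illustration (this sketch is not part of the original disclosure), the overlapping sliding-window segmentation of step S302 can be written as follows; the window size and strides are hypothetical values, and strides smaller than the window dimensions produce the overlap described above:

```python
import numpy as np

def split_into_subimages(image, win_h=224, win_w=224, stride_y=112, stride_x=112):
    """Cut `image` (H x W x C) into fixed-size windows; strides smaller than
    the window dimensions make adjacent sub-images overlap (step S302)."""
    h, w = image.shape[:2]
    subimages, origins = [], []
    for top in range(0, max(h - win_h, 0) + 1, stride_y):
        for left in range(0, max(w - win_w, 0) + 1, stride_x):
            subimages.append(image[top:top + win_h, left:left + win_w])
            origins.append((top, left))  # recorded so detections map back
    return subimages, origins

# Example: a 480x640 frame yields a grid of overlapping 224x224 tiles.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
tiles, origins = split_into_subimages(frame)
```

Recording each window's origin allows head positions detected within a sub-image to be mapped back to coordinates in the image to be detected.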
S304: Separately input each sub-image into a convolutional neural network trained with training images in which head positions have been labeled, and output a first feature corresponding to each sub-image through front layers of the network that include convolutional layers and sub-sampling layers.
A convolutional neural network (CNN) is a kind of artificial neural network. A CNN includes convolutional layers and sub-sampling layers (pooling layers). The CNN used in this embodiment may be built directly or obtained by modifying an existing CNN. Its computation may be carried out by a central processing unit or a graphics processing unit; with a CPU, head detection takes on the order of seconds, while with a GPU it can be reduced to the order of hundreds of milliseconds, essentially allowing real-time head detection.
A convolutional layer contains multiple feature maps, each comprising multiple neurons, and all neurons of one feature map share one convolution kernel. The kernel is the weight of the corresponding neurons and represents one feature. Kernels are generally initialized as small random matrices and learn reasonable values during training. Convolutional layers reduce the connections between layers of the network and also lower the risk of overfitting.
Sub-sampling, also called pooling, usually takes two forms: mean pooling and max pooling. Sub-sampling can be regarded as a special convolution process. Convolution and sub-sampling greatly simplify the complexity of the network and reduce its parameters.
A training image in which head positions have been labeled is a training image associated with manually marked head positions. Such training images and the image to be detected may be captured in the same scene, which can further improve detection accuracy. The training images may also have the same size as the image to be detected.
In an embodiment, when training the CNN, confidences may be assigned to the head positions labeled in the training images; each training image is segmented into more than one sub-image in the same way as the image to be detected; the segmented sub-images are separately input into the CNN, which outputs head positions and confidences; the gaps between the output head positions and the labeled ones, and between the corresponding confidences, are computed, and the parameters of the CNN are adjusted according to the two gaps; training continues until a termination condition is reached. The termination condition may be that the gap is smaller than a preset gap or that the number of iterations reaches a preset number.
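The patent does not commit to a specific loss function; the following is a minimal sketch of one plausible objective consistent with the training step above, combining a position gap over the quadruples with a confidence gap. The squared-error form and the balancing factor conf_weight are assumptions:

```python
import numpy as np

def detection_loss(pred_boxes, pred_conf, true_boxes, true_conf, conf_weight=1.0):
    """Combined gap between network outputs and labels: a position gap over
    the (N, 4) quadruples plus a confidence gap over the (N,) confidences.
    The squared-error form and conf_weight are assumptions, not from the patent."""
    position_gap = np.mean(np.sum((pred_boxes - true_boxes) ** 2, axis=1))
    confidence_gap = np.mean((pred_conf - true_conf) ** 2)
    return position_gap + conf_weight * confidence_gap
```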
The front layers are a collective term for all layers of the CNN other than the regression layer and the one convolutional layer before the regression layer; the front layers include convolutional layers and sub-sampling layers. The front layers may contain parallel convolutional layers whose outputs are concatenated before being fed to the next layer. The last of the front layers may be a convolutional layer or a sub-sampling layer.
S306: Map, through a convolutional layer after the front layers in the CNN, the first feature corresponding to each sub-image to a second feature corresponding to each sub-image.
CNNs are generally used for classification. In a classification CNN, the front layers are followed by a fully connected layer that maps the first features output by the front layers to probability data corresponding to each preset category, and the regression layer then outputs the category to which the input image belongs. In this embodiment, however, the CNN is used for head detection: a convolutional layer replaces the fully connected layer and outputs second features that describe the features of the sub-image. Each sub-image may correspond to multiple second features.
S308: Map, through the regression layer of the CNN, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position.
A head position may be represented by the position of a rectangular box enclosing the head image, and the position of the box may be expressed as a quadruple. The quadruple may include the horizontal and vertical coordinates of one vertex of the box together with the box's width and height, or the coordinates of two diagonally opposite vertices. The confidences output by the regression layer correspond one-to-one with the output head positions and represent the probability that the corresponding position contains a head image. The regression layer may use a support vector machine (SVM).
In an embodiment, step S308 includes: mapping, through a convolutional layer in the regression layer of the CNN, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and the corresponding confidence. Specifically, the electronic device may use the same convolutional layer in the regression layer to directly map the second feature of each sub-image to the head position and the corresponding confidence.
In an embodiment, step S308 includes: mapping, through a first convolutional layer in the regression layer of the CNN, the second feature corresponding to each sub-image to a head position corresponding to each sub-image; and mapping, through a second convolutional layer in the regression layer, the second feature corresponding to each sub-image to a confidence corresponding to the output head position.
For example, referring to FIG. 4, a sub-image passes through the front layers of the CNN, which output 128 feature matrices of size M*N, where 128 is a preset number that may be set as needed, and M and N are determined by the parameters of the front layers. The 128 M*N feature matrices are fed into the convolutional layer after the front layers and convolved with a 128*1024 parameter matrix, outputting M*N feature vectors of length 1024. These vectors are fed into the first convolutional layer of the regression layer and convolved with a 1024*4 parameter matrix to output M*N quadruples representing head positions; they are also fed into the second convolutional layer of the regression layer and convolved with a 1024*1 parameter vector to output M*N singletons representing the confidences of the head positions. The correspondence between head positions and confidences is reflected in the order of the output M*N quadruples and singletons.
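To make the shapes in this example concrete, the following NumPy sketch (not from the original disclosure) treats each 1x1 convolution as a matrix product over the channel dimension, turning the 128 M*N feature matrices into M*N vectors of length 1024 and then into quadruples and confidence singletons; the random matrices stand in for trained parameters, and squashing the confidences with a sigmoid is an assumption:

```python
import numpy as np

M, N = 7, 7                               # spatial size produced by the front layers
features = np.random.randn(M, N, 128)     # the 128 feature matrices of size M*N

W_mid  = np.random.randn(128, 1024)       # convolutional layer after the front layers
W_box  = np.random.randn(1024, 4)         # first regression conv: position quadruples
W_conf = np.random.randn(1024, 1)         # second regression conv: confidence singletons

second = features @ W_mid                 # (M, N, 1024): M*N vectors of length 1024
boxes  = second @ W_box                   # (M, N, 4): head-position quadruples
conf   = 1.0 / (1.0 + np.exp(-(second @ W_conf)))  # (M, N, 1), squashed to (0, 1)
```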
S310: Filter the head positions corresponding to each sub-image according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
Specifically, the electronic device may compare the confidence of each head position output by the CNN with a confidence threshold and filter out the positions whose confidence is below the threshold. It may further filter out, among the positions that passed the threshold, those whose area is smaller than a preset area. The electronic device may then cluster the filtered head positions, merging the positions clustered into the same category into one head position in the image to be detected, or selecting one position from each cluster as a head position in the image to be detected.
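A minimal sketch of this filtering step, assuming head positions are (x, y, w, h) quadruples; the threshold and minimum area below are hypothetical values:

```python
def filter_by_confidence(boxes, confidences, conf_threshold=0.7, min_area=100.0):
    """Keep (x, y, w, h) boxes whose confidence reaches the threshold and
    whose area is not below a preset minimum, per step S310."""
    kept = []
    for (x, y, w, h), c in zip(boxes, confidences):
        if c >= conf_threshold and w * h >= min_area:
            kept.append(((x, y, w, h), c))
    return kept
```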
In the above head detection method, the CNN is trained in advance with training images in which head positions have been labeled, so it can automatically learn head features. The trained CNN can automatically extract suitable features from the sub-images to output candidate head positions and corresponding confidences, and the head positions in the image to be detected are then obtained by confidence filtering. No assumption about head shape is required, which avoids missed detections caused by a preset head shape and improves detection accuracy. Moreover, inside the CNN, the front layers comprising convolutional and sub-sampling layers output the first features of a sub-image, and the convolutional layer after the front layers and before the regression layer outputs the second features, accurately describing the head features in the sub-image, so that the regression layer directly maps the second features to head positions and confidences. This is a new application of a CNN with a new structure, and its accuracy is greatly improved compared with traditional head detection based on circle detection.
In an embodiment, before step S302, the method further includes a step of converting a CNN used for classification into a CNN used for head detection and training it. Referring to FIG. 5, this step specifically includes the following steps:
S502: Convert the fully connected layer after the front layers and before the regression layer of the classification CNN into a convolutional layer.
A classification CNN is a trained CNN that can classify images input to it, such as GoogleNet, VGGNET, or AlexNet. A classification CNN includes front layers, a fully connected layer, and a regression layer. The fully connected layer outputs a second feature corresponding to each preset category.
Unlike the sparse connections and weight sharing of a convolutional layer, every neuron of a fully connected layer is connected to all neurons of the previous layer. Both convolutional and fully connected layers obtain the input of the next layer by multiplying the output of the previous layer by a parameter matrix, so a fully connected layer can be converted into a convolutional layer by rearranging its parameters.
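The equivalence can be checked directly for the simple case where the fully connected layer consumes the whole C x H x W feature volume: reshaping each row of the FC weight matrix into a C x H x W kernel yields a convolution that, evaluated at its single valid position, reproduces the FC output. The shapes below are hypothetical:

```python
import numpy as np

C, H, W, OUT = 8, 4, 4, 16                 # hypothetical shapes
x = np.random.randn(C, H, W)               # feature volume from the front layers
fc_weight = np.random.randn(OUT, C * H * W)

# Fully connected layer: flatten the volume, then multiply.
fc_out = fc_weight @ x.reshape(-1)

# Equivalent convolutional layer: each output neuron becomes one C x H x W
# kernel evaluated at the single valid position (kernel size == input size).
kernels = fc_weight.reshape(OUT, C, H, W)
conv_out = np.array([np.sum(k * x) for k in kernels])

assert np.allclose(fc_out, conv_out)       # same parameters, same result
```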
S504: Replace the regression layer of the classification CNN with a regression layer that maps the second features output by the converted convolutional layer to head positions and corresponding confidences.
In a classification CNN, the regression layer maps the second feature of each preset category output by the fully connected layer to a probability corresponding to each preset category and determines the category of the image from the mapped probabilities, for example by selecting the category with the largest probability as the category of the input image.
In the head detection CNN of this embodiment, the regression layer maps the preset number of second features output by the converted convolutional layer to a head position and a confidence corresponding to that head position. The regression layer may use one convolutional layer that directly maps the second features to the head position and its confidence, or two parallel convolutional layers, one mapping the second features to the head position and the other mapping them to the corresponding confidence.
S506: Train the CNN comprising the front layers, the converted convolutional layer, and the replaced regression layer, using training images in which head positions have been labeled.
The CNN comprising the front layers, the converted convolutional layer, and the replaced regression layer is obtained by modifying a classification CNN whose front-layer parameters have already been trained, so the modified CNN mainly needs the parameters of the converted convolutional layer and the replaced regression layer to be trained.
Specifically, when training the modified CNN, confidences may be assigned to the head positions labeled in the training images; each training image is segmented into more than one sub-image in the same way as the image to be detected; the segmented sub-images are separately input into the CNN and, after passing through the front layers, the convolutional layer after them, and the regression layer, head positions and confidences are output; the gaps between the output and labeled head positions and between the corresponding confidences are computed, and the parameters of the front layers, the convolutional layer after them, and the regression layer are adjusted according to the two gaps; training continues until a termination condition is reached, which may be that the gap is smaller than a preset gap or that the number of iterations reaches a preset number.
In this embodiment, a head detection CNN is obtained by modifying and retraining a classification CNN, so a CNN does not need to be built from scratch, which shortens training time and improves the efficiency of implementing head detection.
As shown in FIG. 6, in an embodiment, step S310 specifically includes the following steps:
S602: From the head positions corresponding to each sub-image, select those whose corresponding confidence is higher than or equal to a confidence threshold.
Specifically, the electronic device may form a head position set from the head positions corresponding to all sub-images segmented from the image to be detected, traverse the set, compare the confidence of each traversed position with the confidence threshold, and remove positions below the threshold from the set; the positions remaining after the traversal are the selected positions whose confidence is higher than or equal to the threshold. The threshold may be set as needed, for example a value in the range 0.5 to 0.99.
S604: From the head positions corresponding to each sub-image, pick those that intersect, in the image to be detected, with the selected head positions.
Head positions intersecting means that the closed regions they represent have a non-empty intersection. When head positions are represented by rectangular boxes enclosing head images, intersecting head positions means the corresponding rectangles intersect. Specifically, the electronic device may pick, from the head position set formed by the positions corresponding to all segmented sub-images, the positions that intersect the previously selected positions in the image to be detected; it may also look for intersecting positions only among the selected positions.
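A minimal sketch of the rectangle-intersection test implied here, assuming (x, y, w, h) quadruples with (x, y) as the top-left corner:

```python
def rects_intersect(a, b):
    """True if two (x, y, w, h) rectangles, with (x, y) the top-left corner,
    share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah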
S606: Determine the head positions detected in the image to be detected from the selected head positions and the picked head positions.
Specifically, the electronic device may classify the selected and picked head positions into classes, each class including at least one of the selected positions together with the positions intersecting that at least one position. The electronic device may merge the positions of each class into one head position as a detected head position, or select one position from each class as a detected head position.
In this embodiment, using the confidence and the intersections as the basis for determining the head positions in the image to be detected can further improve the accuracy of head detection.
As shown in FIG. 7, in an embodiment, step S606 specifically includes the following steps:
S702: Use the selected head positions and the picked head positions as nodes of a bipartite graph.
A bipartite graph is a graph in graph theory whose nodes can be divided into two groups such that all edges connecting nodes cross the boundary between the groups.
S704: Assign default, positive weights to the edges between the nodes of the bipartite graph.
For each selected head position, there is an edge between it and each of the picked positions that intersect it. The default positive weight is a positive value, for example 1000.
S706: When the head positions represented by the nodes associated with an edge intersect, reduce the weight assigned to that edge.
Specifically, when the head positions represented by an edge's nodes intersect, the electronic device may subtract from the assigned weight a positive value smaller than the default weight and divide the result by the default weight, obtaining the updated weight. For example, if the default weight is 1000 and the positive value smaller than the default weight is 100, the updated weight is (1000-100)/1000=0.9.
S708: Solve the maximum-weight matching of the bipartite graph to obtain the head positions detected in the image to be detected.
A matching in a bipartite graph is a set of edges with no common nodes. Among all matchings of a bipartite graph, the one whose edge weights sum to the maximum is the maximum-weight matching. The electronic device may traverse all combinations of edges in the bipartite graph to find it, or use the Kuhn-Munkres algorithm. Once the maximum-weight matching is found, the head positions associated with its edges can serve as the head positions detected in the image to be detected.
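As a sketch of step S708 (one possible implementation, not the only one contemplated), the Kuhn-Munkres step can be delegated to SciPy's linear_sum_assignment, which solves the assignment problem, i.e., maximum-weight matching on a complete bipartite graph. The weight matrix is assumed to have been filled in per steps S704 and S706, with 0 standing for absent edges, and the example values are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# weights[i][j]: edge weight between selected position i and picked position j,
# assigned per steps S704 and S706 (0 marks an absent edge).
weights = np.array([
    [0.9, 0.0, 1000.0],
    [0.0, 0.9, 0.0],
])

rows, cols = linear_sum_assignment(weights, maximize=True)
matching = [(i, j) for i, j in zip(rows, cols) if weights[i, j] > 0]
# The head positions associated with the matched edges become the detected positions.
```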
In this embodiment, since intersecting head positions are very likely to correspond to the same head, and most of the head positions output by the CNN cluster near the actual head positions in the image to be detected, the selected and picked head positions are used as nodes to build a bipartite graph in which the edges of intersecting head positions carry smaller weights; obtaining the detected head positions by solving the maximum-weight matching therefore makes head detection more accurate.
In an embodiment, the image to be detected is a video frame in a video, and the method further includes a step of performing head tracking frame by frame and counting people flow. Referring to FIG. 8, this step specifically includes the following steps:
S802: Perform head tracking frame by frame according to the head positions detected in the image to be detected.
Specifically, after detecting a head position in a video frame, the electronic device performs frame-by-frame head tracking starting from that position. It may specifically use the MeanShift tracking algorithm, an optical flow tracking algorithm, or the TLD (Tracking-Learning-Detection) algorithm.
S804: Determine the direction of motion and the positional relationship of the tracked head position relative to a designated area.
The designated area is an area designated in the video frames. The direction of motion of the tracked head position relative to the designated area means moving toward or away from it; the positional relationship means the head position being inside or outside the designated area.
In an embodiment, when the tracked head position crosses a line representing the boundary of the designated area in the direction toward the area, it is determined that the tracked head position enters the designated area; when it crosses the boundary line in the direction away from the area, it is determined that the tracked head position leaves the designated area.
In an embodiment, when the tracked head position sequentially crosses a first line and a second line parallel to the first line, it is determined that the tracked head position enters the designated area; when it sequentially crosses the second line and then the first line, it is determined that the tracked head position leaves the designated area.
The parallel first and second lines may be straight or curved. The designated area may be the one of the two regions of the image to be detected separated by the second line that does not contain the first line. In this embodiment, using two lines to judge the direction of motion and positional relationship of the tracked head position relative to the designated area prevents judgment errors caused by a head position moving back and forth near the boundary of the designated area, thereby ensuring the correctness of the people count.
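A minimal sketch of the two-line rule, assuming the two lines are horizontal at hypothetical y-coordinates y1 < y2 and that a tracked head position is reduced to the per-frame y-coordinate of its center:

```python
def count_crossings(track_ys, y1=200.0, y2=240.0):
    """Two-line rule: crossing the first line and then the second counts as
    entering; the second and then the first counts as leaving. `track_ys` is
    one tracked head's per-frame center y-coordinate; assumes y1 < y2."""
    entered = exited = 0
    pending = None  # which line was crossed first, awaiting the second
    for prev, cur in zip(track_ys, track_ys[1:]):
        crossed_first = (prev - y1) * (cur - y1) < 0
        crossed_second = (prev - y2) * (cur - y2) < 0
        if crossed_first and crossed_second:   # both lines within one step
            entered += cur > prev
            exited += cur < prev
            pending = None
        elif crossed_first:
            if pending == 'second':
                exited, pending = exited + 1, None
            else:
                pending = 'first'
        elif crossed_second:
            if pending == 'first':
                entered, pending = entered + 1, None
            else:
                pending = 'second'
    return entered, exited
```

Note that jitter back and forth across a single line only re-arms the same pending state and never increments a count, which is exactly the failure mode near the boundary that the two-line scheme is designed to avoid.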
S806: Count people according to the determined direction of motion and positional relationship.
People counting may specifically count one or a combination of the cumulative number of people entering the designated area, the cumulative number leaving it, and the dynamic number currently inside it. Specifically, when a tracked head position enters the designated area, the electronic device may add 1 to the cumulative count of people entering the area and/or add 1 to the dynamic count inside the area; when a tracked head position leaves the designated area, it may add 1 to the cumulative count of people leaving the area and/or subtract 1 from the dynamic count inside the area.
In this embodiment, head detection is applied to the security field, and people are counted according to the direction of motion and positional relationship of the tracked head positions relative to the designated area; based on accurate head detection, the accuracy of the people count can be ensured.
In an embodiment, the method further includes a step of, when head-position tracking is interrupted, detecting a head position near the position tracked in the previous video frame and continuing tracking. Referring to FIG. 9, this step specifically includes the following steps:
S902: Track the head position frame by frame and record it.
Specifically, the electronic device takes a head position detected in the image to be detected as a starting point, tracks that detected head position, and records the tracked positions.
S904: If head-position tracking is interrupted in the current video frame, acquire the recorded head position tracked in the previous video frame.
Specifically, fast movement of a person or lighting changes may interrupt head-position tracking; in that case, the head position tracked in the previous video frame, recorded during frame-by-frame tracking, is acquired.
S906: Detect a head position within a local region of the current video frame that covers the acquired head position.
The local region covering the acquired head position is smaller than a video frame and larger than the region occupied by the head position tracked in the previous frame. The shape of the local region may be similar to that of the region occupied by the head position tracked in the previous frame, and its center may coincide with the center of that region.
Specifically, the electronic device may detect head positions in the whole current video frame and then find those belonging to the local region, or it may detect head positions only within the local region. It may specifically use steps S302 to S310 above to detect head positions in the local region of the current video frame. A detected head position may lie partly or entirely within the local region; the electronic device may treat positions whose centers lie inside the local region as detected positions in the local region, while positions whose centers lie outside the local region do not belong to it.
For example, when a head position is represented by a rectangular box enclosing the head image, if the box tracked in the previous video frame has width W and height H, and a and b are both coefficients greater than 1, the local region may be a rectangular region of width a*W and height b*H sharing the box's center. If the center coordinates of the box tracked in the previous frame are (X1, Y1) and the center coordinates of another box representing a head position are (X2, Y2), then when |X1-X2|<W/2 and |Y1-Y2|<H/2, the box centered at (X2, Y2) is determined to be in the local region of the box centered at (X1, Y1).
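A minimal sketch of that membership test, following the center-distance criterion of the example above and representing boxes as (cx, cy, w, h) with (cx, cy) the center:

```python
def in_local_region(prev_box, candidate_box):
    """True if candidate_box's center lies in the local region around prev_box,
    per the |X1-X2| < W/2 and |Y1-Y2| < H/2 criterion; boxes are (cx, cy, w, h)
    with (cx, cy) the center."""
    x1, y1, w, h = prev_box
    x2, y2, _, _ = candidate_box
    return abs(x1 - x2) < w / 2 and abs(y1 - y2) < h / 2
```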
S908: Continue performing step S902 from the head position detected in the local region.
In this embodiment, when head-position tracking is interrupted, a head position can be detected near the position detected in the previous frame and the interrupted tracking can be resumed; combining head detection with head tracking ensures the continuity of tracking and, in turn, the accuracy of the people count.
A specific application scenario is used below to illustrate the principle of the above head detection method. A large number of overhead images of an elevator-entrance scene are obtained in advance, and the head positions in these overhead images are labeled, for example using a quadruple to represent the position of the rectangular box 1001 enclosing the head image in FIG. 10. A classification CNN is selected; the fully connected layer after its front layers and before its regression layer is converted into a convolutional layer, and its regression layer is replaced with a regression layer that maps the second features output by the converted convolutional layer to head positions and corresponding confidences; the CNN is then retrained using the labeled overhead images.
Referring to FIG. 11, in practical application, if the number of people passing through a gate needs to be counted, an overhead camera is installed above the gate; the camera captures video and transmits it to the electronic device connected to it. The electronic device takes the image region between line 1101 and line 1104 in one of the video frames as the image to be detected, segments it into more than one sub-image, and separately inputs each sub-image into the CNN trained with training images in which head positions have been labeled; the CNN outputs a head position and a corresponding confidence for each sub-image, and the head positions corresponding to each sub-image are filtered by the corresponding confidences to obtain the head positions detected in the image to be detected.
Further, the electronic device performs frame-by-frame head tracking according to the head positions detected in the image to be detected. When the tracked head position 1105 sequentially crosses the first line 1102 and the second line 1103 parallel to the first line 1102, it is determined that the tracked head position 1105 enters the designated area. When the tracked head position 1106 sequentially crosses the second line 1103 and then the first line 1102, it is determined that the tracked head position 1106 leaves the designated area. In FIG. 11, the designated area may specifically be the region between the second line 1103 and line 1104.
In an embodiment, an electronic device is further provided whose internal structure may be as shown in FIG. 2. The electronic device includes a human head detection apparatus, which includes various modules; each module may be implemented wholly or partly by software, hardware, or a combination thereof.
FIG. 12 is a structural block diagram of a human head detection apparatus 1200 in an embodiment. Referring to FIG. 12, the apparatus 1200 includes a segmentation module 1210, a convolutional neural network module 1220, and a head detection result determination module 1230.
The segmentation module 1210 is configured to segment the image to be detected into more than one sub-image.
The convolutional neural network module 1220 is configured to separately input each sub-image into a CNN trained with training images in which head positions have been labeled; output a first feature corresponding to each sub-image through front layers of the CNN that include convolutional and sub-sampling layers; map, through a convolutional layer after the front layers, the first feature corresponding to each sub-image to a second feature corresponding to each sub-image; and map, through the regression layer of the CNN, the second feature corresponding to each sub-image to a head position corresponding to each sub-image and a confidence corresponding to the head position.
The head detection result determination module 1230 is configured to filter the head positions corresponding to each sub-image according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
In the above apparatus 1200, the CNN is trained in advance with training images in which head positions have been labeled and can automatically learn head features. The trained CNN automatically extracts suitable features from the sub-images to output candidate head positions and corresponding confidences, and the head positions in the image to be detected are then obtained by confidence filtering. No assumption about head shape is required, which avoids missed detections caused by a preset head shape and improves detection accuracy. Moreover, inside the CNN, the front layers comprising convolutional and sub-sampling layers output the first features of a sub-image, and the convolutional layer after the front layers and before the regression layer outputs the second features, accurately describing the head features in the sub-image, so that the regression layer directly maps the second features to head positions and confidences. This is a new application of a CNN with a new structure, and its accuracy is greatly improved compared with traditional head detection based on circle detection.
In an embodiment, the segmentation module 1210 is further configured to segment the image to be detected into more than one sub-image of fixed size, with overlapping parts between adjacent segmented sub-images. In this embodiment, the overlap between adjacent segmented sub-images ensures stronger correlation between adjacent sub-images, which can improve the accuracy of detecting head positions from the image to be detected.
As shown in FIG. 13, in an embodiment, the human head detection apparatus 1200 further includes a convolutional neural network adjustment module 1240 and a training module 1250.
The convolutional neural network adjustment module 1240 is configured to convert the fully connected layer after the front layers and before the regression layer of a classification CNN into a convolutional layer, and to replace the regression layer of the classification CNN with a regression layer that maps the second features output by the converted convolutional layer to head positions and corresponding confidences.
The training module 1250 is configured to train the CNN comprising the front layers, the converted convolutional layer, and the replaced regression layer, using training images in which head positions have been labeled.
In this embodiment, a head detection CNN is obtained by modifying and retraining a classification CNN, so a CNN does not need to be rebuilt, which shortens training time and improves the efficiency of implementing head detection.
In an embodiment, the convolutional neural network module 1220 is further configured to map, through a first convolutional layer in the regression layer of the CNN, the second feature corresponding to each sub-image to a head position corresponding to each sub-image, and to map, through a second convolutional layer in the regression layer, the second feature corresponding to each sub-image to a confidence corresponding to the output head position.
As shown in FIG. 14, in an embodiment, the head detection result determination module 1230 includes a filtering module 1231 and a head position determination module 1232.
The filtering module 1231 is configured to select, from the head positions corresponding to each sub-image, the head positions whose corresponding confidence is higher than or equal to a confidence threshold, and to pick, from the head positions corresponding to each sub-image, the head positions that intersect the selected head positions in the image to be detected.
The head position determination module 1232 is configured to determine the head positions detected in the image to be detected from the selected head positions and the picked head positions.
In this embodiment, using the confidence and the intersections as the basis for determining the head positions in the image to be detected can further improve the accuracy of head detection.
In an embodiment, the head position determination module 1232 is further configured to use the selected head positions and the picked head positions as nodes of a bipartite graph; assign default, positive weights to the edges between the nodes of the bipartite graph; reduce the assigned weight of an edge when the head positions represented by its associated nodes intersect; and solve the maximum-weight matching of the bipartite graph to obtain the head positions detected in the image to be detected.
In this embodiment, since intersecting head positions are very likely to correspond to the same head, and most of the head positions output by the CNN cluster near the actual head positions in the image to be detected, the selected and picked head positions are used as nodes to build a bipartite graph in which the edges of intersecting head positions carry smaller weights; obtaining the detected head positions by solving the maximum-weight matching makes head detection more accurate.
As shown in FIG. 15, in an embodiment, the image to be detected is a video frame in a video, and the human head detection apparatus 1200 further includes:
a tracking module 1260, configured to perform head tracking frame by frame according to the head positions detected in the image to be detected;
a statistical condition detecting module 1270, configured to determine the direction of motion and the positional relationship of a tracked head position relative to a designated area; and
a people counting module 1280, configured to count people according to the determined direction of motion and positional relationship.
In this embodiment, head detection is applied to the security field, and people are counted according to the direction of motion and positional relationship of the tracked head positions relative to the designated area; based on accurate head detection, the accuracy of the people count can be ensured.
In an embodiment, the statistical condition detecting module 1270 is further configured to determine that the tracked head position enters the designated area when it sequentially crosses a first line and a second line parallel to the first line, and to determine that the tracked head position leaves the designated area when it sequentially crosses the second line and then the first line.
In this embodiment, using two lines to judge the direction of motion and positional relationship of the tracked head position relative to the designated area prevents judgment errors caused by a head position moving back and forth near the boundary of the designated area, thereby ensuring the correctness of the people count.
As shown in FIG. 16, in an embodiment, the human head detection apparatus 1200 further includes a head position acquisition module 1290.
The tracking module 1260 is further configured to track the head position frame by frame and record it.
The head position acquisition module 1290 is configured to acquire the recorded head position tracked in the previous video frame if head-position tracking is interrupted in the current video frame.
The convolutional neural network module 1220 is further configured to detect a head position within a local region of the current video frame that covers the acquired head position.
The tracking module 1260 is further configured to continue the frame-by-frame tracking-and-recording step from the head position detected in the local region.
In this embodiment, when head-position tracking is interrupted, a head position can be detected near the position detected in the previous frame and the interrupted tracking can be resumed; combining head detection with head tracking ensures the continuity of tracking and, in turn, the accuracy of the people count.
It should be understood that the steps in the embodiments of this application are not necessarily performed sequentially in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some steps in the embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
A person of ordinary skill in the art may understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (27)

  1. A human head detection method, comprising:
    segmenting, by an electronic device, an image to be detected into more than one sub-image;
    separately inputting, by the electronic device, each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network comprising convolutional layers and sub-sampling layers;
    mapping, by the electronic device through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
    mapping, by the electronic device through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
    filtering, by the electronic device, the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
  2. The method according to claim 1, wherein the segmenting, by the electronic device, an image to be detected into more than one sub-image comprises:
    segmenting, by the electronic device, the image to be detected into more than one sub-image of fixed size, with overlapping parts between adjacent segmented sub-images.
  3. The method according to claim 1, further comprising:
    converting, by the electronic device, a fully connected layer after front layers and before a regression layer of a convolutional neural network used for classification into a convolutional layer;
    replacing, by the electronic device, the regression layer of the convolutional neural network used for classification with a regression layer for mapping the second features output by the converted convolutional layer to head positions and corresponding confidences; and
    training, by the electronic device, the convolutional neural network comprising the front layers, the converted convolutional layer, and the replaced regression layer, using training images in which head positions have been labeled.
  4. The method according to claim 1, wherein the mapping, by the electronic device through the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position comprises:
    mapping, by the electronic device through a first convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images; and
    mapping, by the electronic device through a second convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a confidence corresponding to the output head position.
  5. The method according to claim 1, wherein the filtering, by the electronic device, the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected comprises:
    selecting, by the electronic device, from the head positions corresponding to each of the sub-images, the head positions whose corresponding confidence is higher than or equal to a confidence threshold;
    picking, by the electronic device, from the head positions corresponding to each of the sub-images, the head positions that intersect the selected head positions in the image to be detected; and
    determining, by the electronic device, the head positions detected in the image to be detected according to the selected head positions and the picked head positions.
  6. The method according to claim 5, wherein the determining, by the electronic device, the head positions detected in the image to be detected according to the selected head positions and the picked head positions comprises:
    using, by the electronic device, the selected head positions and the picked head positions as nodes of a bipartite graph;
    assigning, by the electronic device, default, positive weights to edges between the nodes of the bipartite graph;
    reducing, by the electronic device, the assigned weight of an edge when the head positions represented by the nodes associated with the edge intersect; and
    solving, by the electronic device, the maximum-weight matching of the bipartite graph, to obtain the head positions detected in the image to be detected.
  7. The method according to claim 1, wherein the image to be detected is a video frame in a video, and the method further comprises:
    performing, by the electronic device, head tracking frame by frame according to the head positions detected in the image to be detected;
    determining, by the electronic device, a direction of motion and a positional relationship of a tracked head position relative to a designated area; and
    counting people, by the electronic device, according to the determined direction of motion and positional relationship.
  8. The method according to claim 7, wherein the determining, by the electronic device, a direction of motion and a positional relationship of a tracked head position relative to a designated area comprises:
    determining, by the electronic device, that the tracked head position enters the designated area when the tracked head position sequentially crosses a first line and a second line parallel to the first line; and
    determining, by the electronic device, that the tracked head position leaves the designated area when the tracked head position sequentially crosses the second line and the first line.
  9. The method according to claim 7, further comprising:
    tracking, by the electronic device, the head position frame by frame and recording it;
    acquiring, by the electronic device, the recorded head position tracked in a previous video frame if tracking of the head position is interrupted in a current video frame;
    detecting, by the electronic device, a head position within a local region of the current video frame covering the acquired head position; and
    continuing to perform, by the electronic device, the step of tracking the head position frame by frame and recording it, from the head position detected in the local region.
  10. An electronic device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    segmenting an image to be detected into more than one sub-image;
    separately inputting each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network comprising convolutional layers and sub-sampling layers;
    mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
    mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
    filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
  11. The electronic device according to claim 10, wherein the segmenting an image to be detected into more than one sub-image comprises:
    segmenting the image to be detected into more than one sub-image of fixed size, with overlapping parts between adjacent segmented sub-images.
  12. The electronic device according to claim 10, wherein the computer-readable instructions further cause the processor to perform the following steps:
    converting a fully connected layer after front layers and before a regression layer of a convolutional neural network used for classification into a convolutional layer;
    replacing the regression layer of the convolutional neural network used for classification with a regression layer for mapping the second features output by the converted convolutional layer to head positions and corresponding confidences; and
    training the convolutional neural network comprising the front layers, the converted convolutional layer, and the replaced regression layer, using training images in which head positions have been labeled.
  13. The electronic device according to claim 10, wherein the mapping, through the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position comprises:
    mapping, through a first convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images; and
    mapping, through a second convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a confidence corresponding to the output head position.
  14. The electronic device according to claim 10, wherein the filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected comprises:
    selecting, from the head positions corresponding to each of the sub-images, the head positions whose corresponding confidence is higher than or equal to a confidence threshold;
    picking, from the head positions corresponding to each of the sub-images, the head positions that intersect the selected head positions in the image to be detected; and
    determining the head positions detected in the image to be detected according to the selected head positions and the picked head positions.
  15. The electronic device according to claim 14, wherein the determining the head positions detected in the image to be detected according to the selected head positions and the picked head positions comprises:
    using the selected head positions and the picked head positions as nodes of a bipartite graph;
    assigning default, positive weights to edges between the nodes of the bipartite graph;
    reducing the assigned weight of an edge when the head positions represented by the nodes associated with the edge intersect; and
    solving the maximum-weight matching of the bipartite graph, to obtain the head positions detected in the image to be detected.
  16. The electronic device according to claim 10, wherein the image to be detected is a video frame in a video, and the computer-readable instructions further cause the processor to perform the following steps:
    performing head tracking frame by frame according to the head positions detected in the image to be detected;
    determining a direction of motion and a positional relationship of a tracked head position relative to a designated area; and
    counting people according to the determined direction of motion and positional relationship.
  17. The electronic device according to claim 16, wherein the determining a direction of motion and a positional relationship of a tracked head position relative to a designated area comprises:
    determining that the tracked head position enters the designated area when the tracked head position sequentially crosses a first line and a second line parallel to the first line; and
    determining that the tracked head position leaves the designated area when the tracked head position sequentially crosses the second line and the first line.
  18. The electronic device according to claim 16, wherein the computer-readable instructions further cause the processor to perform the following steps:
    tracking the head position frame by frame and recording it;
    acquiring the recorded head position tracked in a previous video frame if tracking of the head position is interrupted in a current video frame;
    detecting a head position within a local region of the current video frame covering the acquired head position; and
    continuing to perform the step of tracking the head position frame by frame and recording it, from the head position detected in the local region.
  19. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    segmenting an image to be detected into more than one sub-image;
    separately inputting each of the sub-images into a convolutional neural network that has been trained with training images in which head positions have been labeled, and outputting a first feature corresponding to each of the sub-images through front layers of the convolutional neural network comprising convolutional layers and sub-sampling layers;
    mapping, through a convolutional layer after the front layers in the convolutional neural network, the first feature corresponding to each of the sub-images to a second feature corresponding to each of the sub-images;
    mapping, through a regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position; and
    filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected.
  20. The storage media according to claim 19, wherein the segmenting an image to be detected into more than one sub-image comprises:
    segmenting the image to be detected into more than one sub-image of fixed size, with overlapping parts between adjacent segmented sub-images.
  21. The storage media according to claim 19, wherein the computer-readable instructions further cause the processor to perform the following steps:
    converting a fully connected layer after front layers and before a regression layer of a convolutional neural network used for classification into a convolutional layer;
    replacing the regression layer of the convolutional neural network used for classification with a regression layer for mapping the second features output by the converted convolutional layer to head positions and corresponding confidences; and
    training the convolutional neural network comprising the front layers, the converted convolutional layer, and the replaced regression layer, using training images in which head positions have been labeled.
  22. The storage media according to claim 19, wherein the mapping, through the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images and a confidence corresponding to the head position comprises:
    mapping, through a first convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a head position corresponding to each of the sub-images; and
    mapping, through a second convolutional layer in the regression layer of the convolutional neural network, the second feature corresponding to each of the sub-images to a confidence corresponding to the output head position.
  23. The storage media according to claim 19, wherein the filtering the head positions corresponding to each of the sub-images according to the corresponding confidences, to obtain the head positions detected in the image to be detected comprises:
    selecting, from the head positions corresponding to each of the sub-images, the head positions whose corresponding confidence is higher than or equal to a confidence threshold;
    picking, from the head positions corresponding to each of the sub-images, the head positions that intersect the selected head positions in the image to be detected; and
    determining the head positions detected in the image to be detected according to the selected head positions and the picked head positions.
  24. The storage media according to claim 23, wherein the determining the head positions detected in the image to be detected according to the selected head positions and the picked head positions comprises:
    using the selected head positions and the picked head positions as nodes of a bipartite graph;
    assigning default, positive weights to edges between the nodes of the bipartite graph;
    reducing the assigned weight of an edge when the head positions represented by the nodes associated with the edge intersect; and
    solving the maximum-weight matching of the bipartite graph, to obtain the head positions detected in the image to be detected.
  25. The storage media according to claim 19, wherein the image to be detected is a video frame in a video, and the computer-readable instructions further cause the processor to perform the following steps:
    performing head tracking frame by frame according to the head positions detected in the image to be detected;
    determining a direction of motion and a positional relationship of a tracked head position relative to a designated area; and
    counting people according to the determined direction of motion and positional relationship.
  26. The storage media according to claim 25, wherein the determining a direction of motion and a positional relationship of a tracked head position relative to a designated area comprises:
    determining that the tracked head position enters the designated area when the tracked head position sequentially crosses a first line and a second line parallel to the first line; and
    determining that the tracked head position leaves the designated area when the tracked head position sequentially crosses the second line and the first line.
  27. The storage media according to claim 25, wherein the computer-readable instructions further cause the processor to perform the following steps:
    tracking the head position frame by frame and recording it;
    acquiring the recorded head position tracked in a previous video frame if tracking of the head position is interrupted in a current video frame;
    detecting a head position within a local region of the current video frame covering the acquired head position; and
    continuing to perform the step of tracking the head position frame by frame and recording it, from the head position detected in the local region.
PCT/CN2018/070008 2017-01-16 2018-01-02 Human head detection method, electronic device and storage medium WO2018130104A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18738888.9A EP3570209A4 (en) 2017-01-16 2018-01-02 HUMAN HEAD DETECTION PROCESS, ELECTRONIC DEVICE AND STORAGE MEDIA
US16/351,093 US20190206085A1 (en) 2017-01-16 2019-03-12 Human head detection method, eletronic device and storage medium
US16/299,866 US10796450B2 (en) 2017-01-16 2019-03-12 Human head detection method, eletronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710029244.6 2017-01-16
CN201710029244.6A CN106845383B (zh) 2017-01-16 2017-01-16 人头检测方法和装置

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US16/351,093 Continuation US20190206085A1 (en) 2017-01-16 2019-03-12 Human head detection method, eletronic device and storage medium
US16/299,866 Continuation US10796450B2 (en) 2017-01-16 2019-03-12 Human head detection method, eletronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2018130104A1 true WO2018130104A1 (zh) 2018-07-19

Family

ID=59123959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/070008 WO2018130104A1 (zh) Human head detection method, electronic device and storage medium

Country Status (4)

Country Link
US (2) US10796450B2 (zh)
EP (1) EP3570209A4 (zh)
CN (1) CN106845383B (zh)
WO (1) WO2018130104A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490110A (zh) * 2019-01-29 2019-11-22 Wang Xinyue Passenger flow counting apparatus and method based on ergonomic feature detection
CN111008631A (zh) * 2019-12-20 2020-04-14 Zhejiang Dahua Technology Co., Ltd. Image association method and apparatus, storage medium, and electronic apparatus
CN112364716A (zh) * 2020-10-23 2021-02-12 Lingdong Nuclear Power Co., Ltd. Nuclear power equipment abnormality information detection method and apparatus, and computer device

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845383B (zh) * 2017-01-16 2023-06-06 Tencent Technology (Shanghai) Co., Ltd. Human head detection method and apparatus
US10019654B1 (en) * 2017-06-28 2018-07-10 Accenture Global Solutions Limited Image object recognition
CN107886098A (zh) * 2017-10-25 2018-04-06 Kunming University of Science and Technology Deep learning-based method for identifying sunspots
US10824907B2 (en) 2017-12-07 2020-11-03 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for image processing
CN107832807B (zh) * 2017-12-07 2020-08-07 Shanghai United Imaging Healthcare Co., Ltd. Image processing method and system
CN108073898B (zh) * 2017-12-08 2022-11-18 Tencent Technology (Shenzhen) Co., Ltd. Human head region recognition method, apparatus, and device
CN108154110B (zh) * 2017-12-22 2022-01-11 Ren Junfen Dense pedestrian flow counting method based on deep-learning head detection
CN108090454A (zh) * 2017-12-26 2018-05-29 University of Shanghai for Science and Technology People flow counting system for campus public bathrooms
CN108345832A (zh) * 2017-12-28 2018-07-31 Xinzhi Digital Technology Co., Ltd. Face detection method, apparatus, and device
CN108198191B (zh) * 2018-01-02 2019-10-25 Wuhan Douyu Network Technology Co., Ltd. Image processing method and apparatus
CN108154196B (zh) * 2018-01-19 2019-10-22 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for outputting images
CN108881740B (zh) * 2018-06-28 2021-03-02 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Image method and apparatus, electronic device, and computer-readable storage medium
CN109241871A (zh) * 2018-08-16 2019-01-18 Beijing Cishicidi Information Technology Co., Ltd. Video-data-based public area people flow tracking method
TWI686748B (zh) * 2018-12-07 2020-03-01 National Chiao Tung University People flow analysis system and people flow analysis method
CN109816011B (zh) * 2019-01-21 2021-09-07 Xiamen Meituzhijia Technology Co., Ltd. Video key frame extraction method
US11048948B2 (en) * 2019-06-10 2021-06-29 City University Of Hong Kong System and method for counting objects
US11182903B2 (en) 2019-08-05 2021-11-23 Sony Corporation Image mask generation using a deep neural network
CN112347814A (zh) * 2019-08-07 2021-02-09 ZTE Corporation Passenger flow estimation and display method, system, and computer-readable storage medium
CN110688914A (zh) * 2019-09-09 2020-01-14 Suzhou Zhendi Intelligent Technology Co., Ltd. Gesture recognition method, smart device, storage medium, and electronic device
CN111680569B (zh) * 2020-05-13 2024-04-19 Beijing Zhongguang Shangyang Technology Co., Ltd. Image-analysis-based attendance detection method, apparatus, device, and storage medium
CN111915779B (zh) * 2020-07-31 2022-04-15 Zhejiang Dahua Technology Co., Ltd. Gate control method, apparatus, device, and medium
CN111931670B (zh) * 2020-08-14 2024-05-31 Chengdu Shucheng Technology Co., Ltd. Depth-image head detection and localization method and system based on a convolutional neural network
US11587325B2 (en) * 2020-09-03 2023-02-21 Industrial Technology Research Institute System, method and storage medium for detecting people entering and leaving a field
JP2022090491A (ja) * 2020-12-07 2022-06-17 Canon Inc. Image processing apparatus, image processing method, and program
CN113011297A (zh) * 2021-03-09 2021-06-22 Global Energy Interconnection Research Institute Co., Ltd. Edge-cloud-collaboration-based power equipment detection method, apparatus, device, and server
CN115082836B (zh) * 2022-07-23 2022-11-11 Shenzhen Shenmu Information Technology Co., Ltd. Behavior-recognition-assisted target object detection method and apparatus
US11914800B1 (en) 2022-10-28 2024-02-27 Dell Products L.P. Information handling system stylus with expansion bay and replaceable module
US11983337B1 (en) 2022-10-28 2024-05-14 Dell Products L.P. Information handling system mouse with strain sensor for click and continuous analog input
US11983061B1 (en) 2022-10-28 2024-05-14 Dell Products L.P. Information handling system peripheral device sleep power management
CN115797341B (zh) * 2023-01-16 2023-04-14 Sichuan University Method for automatically and instantly determining the natural head position on lateral cephalometric radiographs
CN116245911B (zh) * 2023-02-08 2023-11-03 Zhuhai Anlian Ruishi Technology Co., Ltd. Video line-crossing counting method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8582807B2 (en) * 2010-03-15 2013-11-12 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN104992167A (zh) * 2015-07-28 2015-10-21 Institute of Automation, Chinese Academy of Sciences Convolutional neural network-based face detection method and apparatus
CN105005774A (zh) * 2015-07-28 2015-10-28 Institute of Automation, Chinese Academy of Sciences Convolutional neural network-based face kinship recognition method and apparatus
CN105740758A (zh) * 2015-12-31 2016-07-06 Shanghai Jilian Network Technology Co., Ltd. Deep learning-based internet video face recognition method
CN106022295A (zh) * 2016-05-31 2016-10-12 Beijing QIYI Century Science & Technology Co., Ltd. Method and apparatus for determining a data position
CN106845383A (zh) * 2017-01-16 2017-06-13 Tencent Technology (Shanghai) Co., Ltd. Human head detection method and apparatus

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7127087B2 (en) * 2000-03-27 2006-10-24 Microsoft Corporation Pose-invariant face recognition system and process
GB2409028A (en) * 2003-12-11 2005-06-15 Sony Uk Ltd Face detection
US20050286753A1 (en) * 2004-06-25 2005-12-29 Triant Technologies Inc. Automated inspection systems and methods
US20100014755A1 (en) * 2008-07-21 2010-01-21 Charles Lee Wilson System and method for grid-based image segmentation and matching
CN102156863B (zh) * 2011-05-16 2012-11-14 Tianjin University Multi-moving-target tracking method across cameras
CN102902967B (zh) * 2012-10-16 2015-03-11 Third Eye (Tianjin) Biometric Technology Co., Ltd. Method for locating the iris and pupil based on human-eye structure classification
US20140307076A1 (en) * 2013-10-03 2014-10-16 Richard Deutsch Systems and methods for monitoring personal protection equipment and promoting worker safety
CN103559478B (zh) * 2013-10-07 2018-12-04 Tang Chunhui Passenger flow counting and event analysis method in overhead pedestrian video surveillance
US10275688B2 (en) * 2014-12-17 2019-04-30 Nokia Technologies Oy Object detection with neural network
US9524450B2 (en) * 2015-03-04 2016-12-20 Accenture Global Services Limited Digital image processing using convolutional neural networks
US10074041B2 (en) * 2015-04-17 2018-09-11 Nec Corporation Fine-grained image classification by exploring bipartite-graph labels
CN104922167B (zh) 2015-07-16 2018-09-18 Yuan Xuejun Ganoderma lucidum spore powder granules and preparation method thereof
CN105374050B (zh) * 2015-10-12 2019-10-18 Zhejiang Uniview Technologies Co., Ltd. Moving-target tracking recovery method and apparatus
CN105608690B (zh) * 2015-12-05 2018-06-08 Shaanxi Normal University Image segmentation method combining graph theory and semi-supervised learning
CN106022237B (zh) * 2016-05-13 2019-07-12 University of Electronic Science and Technology of China End-to-end convolutional neural network pedestrian detection method
WO2018121013A1 (en) * 2016-12-29 2018-07-05 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
WO2019092672A2 (en) * 2017-11-13 2019-05-16 Way2Vat Ltd. Systems and methods for neuronal visual-linguistic data retrieval from an imaged document
US11551026B2 (en) * 2018-11-27 2023-01-10 Raytheon Company Dynamic reconfiguration training computer architecture
US10607331B1 (en) * 2019-06-28 2020-03-31 Corning Incorporated Image segmentation into overlapping tiles

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8582807B2 (en) * 2010-03-15 2013-11-12 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN104992167A (zh) * 2015-07-28 2015-10-21 Institute of Automation, Chinese Academy of Sciences Convolutional neural network-based face detection method and apparatus
CN105005774A (zh) * 2015-07-28 2015-10-28 Institute of Automation, Chinese Academy of Sciences Convolutional neural network-based face kinship recognition method and apparatus
CN105740758A (zh) * 2015-12-31 2016-07-06 Shanghai Jilian Network Technology Co., Ltd. Deep learning-based internet video face recognition method
CN106022295A (zh) * 2016-05-31 2016-10-12 Beijing QIYI Century Science & Technology Co., Ltd. Method and apparatus for determining a data position
CN106845383A (zh) * 2017-01-16 2017-06-13 Tencent Technology (Shanghai) Co., Ltd. Human head detection method and apparatus

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490110A (zh) * 2019-01-29 2019-11-22 Wang Xinyue Passenger flow counting apparatus and method based on ergonomic feature detection
CN111008631A (zh) * 2019-12-20 2020-04-14 Zhejiang Dahua Technology Co., Ltd. Image association method and apparatus, storage medium, and electronic apparatus
CN111008631B (zh) * 2019-12-20 2023-06-16 Zhejiang Dahua Technology Co., Ltd. Image association method and apparatus, storage medium, and electronic apparatus
CN112364716A (zh) * 2020-10-23 2021-02-12 Lingdong Nuclear Power Co., Ltd. Nuclear power equipment abnormality information detection method and apparatus, and computer device

Also Published As

Publication number Publication date
EP3570209A1 (en) 2019-11-20
EP3570209A4 (en) 2020-12-23
CN106845383A (zh) 2017-06-13
US20190206085A1 (en) 2019-07-04
US20190206083A1 (en) 2019-07-04
CN106845383B (zh) 2023-06-06
US10796450B2 (en) 2020-10-06

Similar Documents

Publication Publication Date Title
WO2018130104A1 (zh) Human head detection method, electronic device and storage medium
CN110245662B (zh) Detection model training method and apparatus, computer device, and storage medium
WO2020238902A1 (zh) Image segmentation method, model training method, apparatus, device, and storage medium
WO2019096029A1 (zh) Living-body recognition method, storage medium, and computer device
CN109644255B (zh) Method and apparatus for annotating a video stream comprising a set of frames
EP1650711B1 (en) Image processing device, imaging device, image processing method
CN110929617B (zh) Face-swap synthetic video detection method and apparatus, electronic device, and storage medium
WO2020024395A1 (zh) Fatigue driving detection method and apparatus, computer device, and storage medium
CN111160291B (zh) Human-eye detection method based on depth information and CNN
CN112836640A (zh) Single-camera multi-target pedestrian tracking method
WO2023131301A1 (zh) Digestive system pathology image recognition method and system, and computer storage medium
CN113343985B (zh) License plate recognition method and apparatus
WO2022206680A1 (zh) Image processing method and apparatus, computer device, and storage medium
US20240119584A1 (en) Detection method, electronic device and non-transitory computer-readable storage medium
Ni et al. Pats: Patch area transportation with subdivision for local feature matching
CN112883941A (zh) Facial expression recognition method based on parallel neural networks
CN113780145A (zh) Sperm morphology detection method and apparatus, computer device, and storage medium
WO2021169625A1 (zh) Method and apparatus for detecting recaptured web photos, computer device, and storage medium
CN109447022A (zh) Shot type recognition method and apparatus
CN115147644A (zh) Training and description method for an image description model, and system, device, and storage medium
JP7059889B2 (ja) Learning apparatus, image generation apparatus, learning method, and learning program
CN111915713A (zh) Method for creating a three-dimensional dynamic scene, computer device, and storage medium
Midwinter et al. Unsupervised defect segmentation with pose priors
JP2004013615A (ja) Moving object monitoring apparatus
JP7360303B2 (ja) Image processing apparatus and image processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18738888

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2018738888

Country of ref document: EP