WO2023024779A1 - Portrait detection method, apparatus, electronic device and storage medium - Google Patents

Portrait detection method, apparatus, electronic device and storage medium

Info

Publication number
WO2023024779A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
feature image
detected
images
Prior art date
Application number
PCT/CN2022/107190
Other languages
English (en)
French (fr)
Inventor
李远哲
闵捷
Original Assignee
Siemens Ltd., China (西门子(中国)有限公司)
Priority date
Filing date
Publication date
Application filed by Siemens Ltd., China (西门子(中国)有限公司)
Publication of WO2023024779A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions

Definitions

  • The present application relates to the technical field of image processing, and in particular to a portrait detection method, apparatus, electronic device and storage medium.
  • In the related art, staff count the number of people entering and leaving a transportation hub at its entrances and exits, and the number of people gathered in the hub is determined from these counts.
  • Since a transportation hub usually has multiple entrances and exits, determining the number of gathered people in this way requires staff to be stationed at every entrance and exit, so more manpower is needed and the cost of counting the gathered crowd is relatively high.
  • the portrait detection method, device, electronic device and storage medium provided by the present application can reduce the cost of counting the number of people gathered.
  • In a first aspect, an embodiment of the present application provides a portrait detection method, including: acquiring an image to be detected, wherein the image to be detected includes at least one portrait; generating at least two first feature images of the image to be detected, wherein the first first feature image is obtained by feature extraction from the image to be detected and each subsequent first feature image is obtained by feature extraction from the previous first feature image; performing feature fusion on the at least two first feature images to obtain at least two second feature images; and determining, according to the at least two second feature images, the distribution of portraits in the image to be detected.
  • An embodiment of the present application further provides a portrait detection apparatus, including:
  • an acquisition module, configured to acquire an image to be detected, wherein the image to be detected includes at least one portrait;
  • a generation module, configured to generate at least two first feature images of the image to be detected acquired by the acquisition module, wherein the first first feature image is obtained by feature extraction from the image to be detected and each subsequent first feature image is obtained by feature extraction from the previous first feature image;
  • a fusion module, configured to perform feature fusion on the at least two first feature images generated by the generation module to obtain at least two second feature images; and
  • a detection module, configured to determine, according to the at least two second feature images obtained by the fusion module, the distribution of portraits in the image to be detected.
  • An embodiment of the present application further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus.
  • the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the portrait detection method provided in the first aspect.
  • An embodiment of the present application further provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed by a processor, the processor is caused to perform the operations corresponding to the portrait detection method provided in the first aspect.
  • An embodiment of the present application further provides a computer program product. The computer program product is tangibly stored on a computer-readable medium and includes computer-executable instructions which, when executed, cause at least one processor to execute the portrait detection method provided in the first aspect or any possible implementation manner of the first aspect.
  • In the above solutions, the image to be detected, which includes portraits, is collected from the place of interest; features are extracted from it to obtain a plurality of first feature images, the first feature images are fused to obtain a plurality of second feature images, and the distribution of portraits in the image to be detected is then determined according to the second feature images.
  • The number of gathered people and their distribution are thereby detected automatically, and there is no need to station staff at each entrance and exit of the place to count people, which saves manpower and reduces the cost of counting the number of people gathered in the place.
  • When the second feature images are obtained by feature fusion of the first feature images, feature fusion may be performed, according to the generation order of the first feature images, on at least two adjacently generated first feature images to obtain at least two second feature images, where different second feature images are obtained by fusing sets of first feature images that are not completely the same.
  • Because the first feature images are generated sequentially, with each subsequent first feature image obtained by extracting features from the previous one, some features of the previous first feature image may be discarded in the subsequent one, and the discarded features may correspond to the smaller portraits in the image to be detected.
  • By performing feature fusion on adjacently generated first feature images, it is ensured that the obtained second feature images do not lose features of the image to be detected, so that when the distribution of portraits is determined based on the second feature images, even the smaller portraits in the image to be detected can be identified, improving the detection accuracy of the number and distribution of portraits.
  • Each first feature image corresponds to a second feature image.
  • The second feature image corresponding to a later-generated first feature image is fused with the previously generated first feature image to obtain the second feature image corresponding to that previous first feature image. Therefore, for any first feature image, its corresponding second feature image includes all the features of the first feature images of higher order than it, and features of the image to be detected are not lost, so that when the number and distribution of portraits are determined based on the second feature images, the comprehensiveness of portrait recognition in the image to be detected is improved, as is the detection accuracy of the number and distribution of portraits.
  • For the corresponding second feature images: convolution processing is performed on the second feature image corresponding to the nth generated first feature image to obtain a third feature image; bilinear interpolation is performed on the third feature image to obtain a fourth feature image; the fourth feature image is fused with the (n-1)th generated first feature image to obtain a fifth feature image; and the fifth feature image is convolved to obtain the second feature image corresponding to the (n-1)th generated first feature image.
  • n is an integer greater than 1 and less than or equal to the total number of first feature images
  • the size of the second feature image and the third feature image corresponding to the nth generated first feature image are both C*W*H
  • C is the number of channels
  • W is the width of the image
  • H is the height of the image
  • the size of the fourth feature image is C*2W*2H
  • the size of the (n-1)th generated first feature image is C*2W*2H
  • the size of the fifth feature image is 2C*2W*2H
  • the size of the second feature image corresponding to the n-1th generated first feature image is C*2W*2H.
  • Before feature fusion is performed, bilinear interpolation is applied to the third feature image to obtain a fourth feature image having the same size as the (n-1)th generated first feature image, so that feature fusion can proceed smoothly.
  • The fifth feature image generated by feature fusion is convolved to obtain a second feature image with the same size as the (n-1)th generated first feature image, ensuring that the input and output feature images have the same size, which is convenient for subsequently determining the distribution of portraits in the image to be detected according to the second feature images, so that portrait detection can proceed smoothly.
  • The fifth feature image is obtained by performing receptive field enhancement processing on the second feature image, which increases the reference area in the image to be detected of the portraits in the fifth feature image. When the distribution of portraits is determined based on the fifth feature image, the ability to detect portraits of different sizes in the image to be detected is therefore improved, which in turn improves the detection accuracy of the number and distribution of portraits in the image to be detected.
  • When receptive field enhancement processing is performed on the second feature image to obtain the fifth feature image, the second feature image is convolved three times to obtain a seventh feature image, twice to obtain an eighth feature image, and once to obtain a ninth feature image.
  • The sizes of the second feature image and of its corresponding fifth, seventh, eighth and ninth feature images are all C*W*H, and the size of the tenth feature image corresponding to the second feature image is 3C*W*H.
  • After the seventh, eighth and ninth feature images are obtained, feature fusion is performed on them to obtain a tenth feature image, and the fifth feature image corresponding to the second feature image is obtained by convolving the tenth feature image.
  • Because the seventh, eighth and ninth feature images are obtained by applying different numbers of convolutions to the second feature image, the fifth feature image derived from them has a stronger receptive field than the second feature image, so that portraits of different sizes in the image to be detected can be accurately detected based on the fifth feature image, ensuring the detection accuracy of the number and distribution of portraits in the image to be detected.
  • The sixth feature image is first normalized, and the normalized sixth feature image is then input into a pre-trained first classifier, second classifier and third classifier respectively, to obtain center point information output by the first classifier, first image frame information output by the second classifier, and second image frame information output by the third classifier; the distribution of portraits in the image to be detected is then determined according to the center point information, the first image frame information and the second image frame information.
  • the center point information is used to indicate the center point coordinates of the portrait head in the image to be detected
  • the first image frame information includes the coordinate value of the rectangular frame used to mark the portrait head in the image to be detected
  • The second image frame information includes the coordinate values of the rectangular frames used to mark human bodies in the image to be detected. The positions of portrait heads in the image to be detected can be determined from the center point information and the first image frame information, and the positions of human bodies from the second image frame information; the number of portraits in the image to be detected can then be determined from the number of rectangular frames marking portrait heads or the number marking human bodies.
  • The positions of the rectangular frames determine the distribution of portraits in the image to be detected. Marking the portrait heads and human bodies with rectangular frames allows the number and distribution of portraits, and in turn the number and distribution of people gathered in the corresponding place, to be determined more accurately, which helps improve the user experience.
  • The normalized sixth feature image can also be input into a fourth classifier to obtain image frame quality information output by the fourth classifier; the image frame quality information indicates how accurately each rectangular frame used to mark a portrait head in the image to be detected actually marks that head. Target center points are then filtered out of the center point information according to the image frame quality information: a target center point is one whose corresponding head-marking rectangular frame has an accuracy below a preset accuracy threshold, and the coordinate values of the target center points are deleted from the center point information.
  • each rectangular frame determined by the second classifier corresponds to a central point coordinate in the central point information
  • Deleting from the center point information the center point coordinates corresponding to a rectangular frame that fails to accurately mark a portrait head discards that frame, avoiding misrecognition of portraits and further improving the detection accuracy of the number and distribution of portraits in the image to be detected.
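  • Taken together, the above steps describe a pipeline of sequential feature extraction, top-down feature fusion, receptive field enhancement, and several classifier heads. The sketch below is a minimal, hypothetical Python rendering of that overall flow; all function and parameter names are illustrative assumptions, not terms from the patent.

```python
# Hypothetical end-to-end sketch of the described pipeline (illustrative only).
def detect_portraits(image, backbone, fuse_top_down, enhance_modules,
                     fuse_enhanced, heads):
    first_feats = backbone(image)              # first feature images A1..AN
    second_feats = fuse_top_down(first_feats)  # second feature images B1..BN
    enhanced = [m(b) for m, b in zip(enhance_modules, second_feats)]  # C1..CN
    d = fuse_enhanced(enhanced)                # sixth feature image D
    # heads: center points, head boxes, body boxes, and box-quality scores;
    # center points whose head boxes score below a threshold are deleted.
    return heads(d)
```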
  • FIG. 1 is a flow chart of a human portrait detection method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic diagram of a feature fusion method provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic diagram of another feature fusion method provided in Embodiment 2 of the present application.
  • FIG. 4 is a flow chart of a feature fusion method provided in Embodiment 2 of the present application.
  • FIG. 5 is a schematic diagram of a portrait detection method provided in Embodiment 3 of the present application.
  • FIG. 6 is a schematic diagram of a receptive field enhancement processing method provided in Embodiment 3 of the present application.
  • FIG. 7 is a flow chart of a method for determining the number of portraits and the distribution of portraits provided in Embodiment 3 of the present application;
  • FIG. 8 is a schematic diagram of a human portrait detection device provided in Embodiment 4 of the present application.
  • FIG. 9 is a schematic diagram of another portrait detection device provided in Embodiment 4 of the present application.
  • FIG. 10 is a schematic diagram of another human image detection device provided in Embodiment 4 of the present application.
  • FIG. 11 is a schematic diagram of another human image detection device provided in Embodiment 4 of the present application.
  • FIG. 12 is a schematic diagram of an electronic device provided in Embodiment 5 of the present application.
  • Reference numerals: Ci: fifth feature image; 801: acquisition module; 802: generation module; 8043: detection sub-module; 805: calculation module; 806: screening module; 807: deletion module; 1202: processor; 1204: communication interface
  • For example, a subway station usually has four entrances and exits, and a railway station has multiple entrances and multiple exits; determining the number of gathered people in such a venue by manual counting requires staff at each entrance and exit, so more manpower is needed and the cost of counting the gathered crowd is higher.
  • In the embodiments of the present application, images to be detected that include portraits are collected from the place; features are extracted from the images to obtain a plurality of first feature images, feature fusion is performed on the first feature images to obtain a plurality of second feature images, and the distribution of portraits in the image to be detected is then determined according to the second feature images.
  • From the distribution of portraits in the image to be detected, the number and distribution of people gathered in the place can be determined.
  • In this way, the image to be detected, which includes portraits, is collected from the place whose head count and personnel distribution are to be determined, and is processed to determine the distribution of portraits in it, from which the number and distribution of gathered people in the place follow; no staff need to be assigned to count people at each entrance and exit, which saves manpower and reduces the cost of counting the number of people gathered in the place.
  • In the embodiments of the present application, feature images are extracted from the image to be detected, and the number and distribution of portraits in the image are determined by applying various kinds of processing to the feature images, such as feature extraction, feature fusion, and receptive field enhancement. The feature images involved (the first feature image, the second feature image, ..., the Nth feature image, and so on) are feature maps of convolutional layers.
  • Fig. 1 is a flow chart of a portrait detection method 100 provided in Embodiment 1 of the present application. As shown in Fig. 1, the portrait detection method 100 includes the following steps:
  • Step 101: Acquire an image to be detected.
  • the image to be detected is an image requiring portrait recognition, and the image to be detected includes at least one portrait.
  • the image to be detected is an image in a place with a large flow of people.
  • the image to be detected can be collected by a camera set at a high place in a place with a large flow of people.
  • Step 102: Generate at least two first feature images of the image to be detected.
  • In this step, features are first extracted from the image to be detected to obtain a first feature image, and features are then extracted from that first feature image to obtain a new first feature image; that is, the first first feature image is obtained by feature extraction from the image to be detected, and each subsequent first feature image is obtained by feature extraction from the previous first feature image.
  • For example, features are extracted from the image to be detected to obtain first feature image 1, from first feature image 1 to obtain first feature image 2, from first feature image 2 to obtain first feature image 3, and from first feature image 3 to obtain first feature image 4. That is, first feature image 1 is obtained by feature extraction from the image to be detected, first feature image 2 from first feature image 1, first feature image 3 from first feature image 2, and first feature image 4 from first feature image 3.
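  • As an illustration of this sequential extraction, the following is a minimal PyTorch sketch, assuming each extraction stage is a strided 3x3 convolution; the channel count and number of stages are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class FirstFeatureExtractor(nn.Module):
    """Produces the sequential first feature images A1, A2, ..., AN."""
    def __init__(self, in_ch=3, ch=64, num_stages=4):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch if i == 0 else ch, ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(num_stages)
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)      # each Ai is extracted from the previous one
            feats.append(x)
        return feats

feats = FirstFeatureExtractor()(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])  # resolution halves at each stage
```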
  • Step 103 Perform feature fusion on each first feature image to obtain at least two second feature images.
  • In this step, two or more first feature images are subjected to feature fusion to obtain at least two second feature images, wherein different second feature images are obtained by fusing sets of at least two first feature images that are not completely identical.
  • Feature fusion combines features extracted from an image into a representation more discriminative than its inputs; that is, feature fusion is performed on at least two first feature images to obtain a second feature image that is more discriminative than any of the first feature images used.
  • A serial feature fusion strategy or a parallel feature fusion strategy can be used.
  • Step 104: Determine, according to the at least two second feature images, the distribution of portraits in the image to be detected.
  • The second feature images are obtained by feature fusion of the first feature images, and the first feature images are extracted directly or indirectly from the image to be detected, so the second feature images include information such as the positions and outlines of the portraits in the image to be detected; the distribution of portraits can therefore be determined according to the second feature images.
  • In the portrait detection method of this embodiment, features are extracted from the image to be detected, which includes portraits, to obtain a plurality of first feature images; the first feature images are fused to obtain a plurality of second feature images; and the distribution of portraits in the image to be detected is determined according to the second feature images. Since the image to be detected is collected from the corresponding place, the portraits in it can be mapped to that place, so the number of gathered people and their distribution there can be determined from the distribution of portraits in the image. Detection is thus automatic, no staff are needed at each entrance and exit of the place, manpower is saved, and the cost of counting the gathered crowd is reduced.
  • The first first feature image is obtained by feature extraction from the image to be detected, and each subsequent first feature image is obtained by feature extraction from the previous one, so in order of acquisition the later first feature images are of higher order. A higher-order first feature image has stronger semantic information but lower resolution and poorer perception of detail, so small objects can be lost in the high-order first feature images.
  • The second feature images are obtained by feature fusion of different first feature images, ensuring that each second feature image includes high-order semantic information without losing small objects, so that the smaller portraits in the image to be detected can still be identified, thereby ensuring the detection accuracy of the number and distribution of portraits in the image to be detected.
  • the distribution of the portraits in the image to be detected may include the position distribution of the portraits in the image to be detected, and may include the number of portraits in the image to be detected.
  • According to the generation order of the first feature images, at least two adjacently generated first feature images can be fused to obtain at least two second feature images, wherein different second feature images are obtained by feature fusion of sets of at least two first feature images that are not completely the same.
  • Because each subsequent first feature image is obtained by feature extraction from the previous one, small objects in the previous first feature image may be lost in the subsequent one. Performing feature fusion, in generation order, on at least two adjacently generated first feature images to obtain a second feature image ensures that the second feature image retains those small objects, so that when the distribution of portraits in the image to be detected is determined according to the second feature images, the smaller portraits can still be identified, ensuring the detection accuracy of the number and distribution of portraits in the image to be detected.
  • For example, suppose the first feature images are first feature image 1, first feature image 2, first feature image 3 and first feature image 4. When performing feature fusion on the first feature images to generate second feature images, fusion can be performed on first feature images 1 and 2; on first feature images 2 and 3; on first feature images 3 and 4; on first feature images 1, 2 and 3; on first feature images 2, 3 and 4; or on first feature images 1, 2, 3 and 4. Each fusion yields one second feature image.
  • When fusing first feature images 1, 2 and 3, feature fusion of first feature image 1 and first feature image 2 can be performed first, and the result then fused with first feature image 3 to obtain a second feature image. Similarly, first feature images 2 and 3 can be fused first, and the result then fused with first feature image 4 to obtain a second feature image.
  • Fig. 2 is a schematic diagram of a feature fusion method provided in Embodiment 2 of the present application. As shown in Fig. 2, there are a total of N first feature images; according to the generation order of the first feature images, the first first feature image A1 is extracted from the image to be detected A0, and the nth first feature image An is extracted from the first feature image An-1, where n is an integer greater than 1 and less than or equal to N.
  • The second feature image Bn corresponding to the nth generated first feature image An is fused with the (n-1)th generated first feature image An-1 to obtain the second feature image Bn-1 corresponding to An-1.
  • Fig. 3 is a schematic diagram of another feature fusion method provided in Embodiment 2 of the present application. As shown in Fig. 3, there are a total of 4 first feature images. In generation order, the first first feature image A1 is extracted from the image to be detected A0, the second first feature image A2 from A1, the third first feature image A3 from A2, and the fourth first feature image A4 from A3.
  • Convolution processing is performed on the Nth generated first feature image to obtain the second feature image corresponding to it, and the second feature image corresponding to the nth generated first feature image is fused with the (n-1)th generated first feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image. The second feature images together therefore include all the feature information of the image to be detected, which improves the comprehensiveness of portrait recognition and thus ensures the detection accuracy of the number and distribution of portraits in the image to be detected.
  • The second feature image corresponding to a higher-order first feature image has lower resolution and accordingly includes fewer features; its size is relatively small, and the larger portraits in the image to be detected can be identified quickly through it.
  • The second feature image corresponding to a lower-order first feature image has higher resolution and accordingly includes more features; its size is larger, and the smaller portraits in the image to be detected can be identified through it.
  • The obtained second feature images thus have different resolutions: those with lower resolution include high-order features and can be used to quickly identify the larger portraits in the image to be detected, while those with higher resolution include more image information and can be used to identify the smaller portraits. Determining the distribution of portraits through all the second feature images therefore improves both the efficiency and the accuracy of portrait recognition in the image to be detected.
  • FIG. 4 is a flowchart of a feature fusion method 400 provided in Embodiment 2 of the present application. As shown in FIG. 4 , the feature fusion method 400 includes the following steps:
  • Step 401 Input a second feature image corresponding to the nth generated first feature image.
  • the size of the second feature image corresponding to the nth generated first feature image is C*W*H, where C is the number of channels, W is the width of the image, and H is the height of the image.
  • Defining the size of the second feature image corresponding to the nth generated first feature image as C*W*H merely illustrates how the size and number of channels of each feature image change during feature fusion; the size and number of channels are not specifically limited, because different first feature images have different sizes, and the second feature images corresponding to different first feature images likewise have different sizes.
  • Step 402 Perform convolution processing on the second feature image corresponding to the nth generated first feature image to obtain a third feature image.
  • For example, when obtaining the second feature image B3 corresponding to the first feature image A3, the second feature image B4 is first convolved to obtain the third feature image.
  • the size of the obtained third feature image is also C*W*H.
  • the size of the convolution kernel used may be C*3*3.
  • Step 403 Perform bilinear interpolation processing on the third feature image to obtain a fourth feature image.
  • The size of the (n-1)th generated first feature image is C*2W*2H; to enable feature fusion with it, bilinear interpolation is performed on the third feature image to obtain a fourth feature image of size C*2W*2H.
  • Bilinear interpolation can be performed on the third feature image through an upsampling layer to obtain the fourth feature image of size C*2W*2H.
  • Step 404 Perform feature fusion of the fourth feature image and the n-1th generated first feature image to obtain a fifth feature image.
  • The size of the second feature image corresponding to the nth generated first feature image is C*W*H, the size of the (n-1)th generated first feature image is C*2W*2H, and the size of the fourth feature image is also C*2W*2H; performing feature fusion on the fourth feature image and the (n-1)th generated first feature image therefore yields a fifth feature image of size 2C*2W*2H.
  • Step 405: Perform convolution processing on the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image.
  • The size of the fifth feature image is 2C*2W*2H, while the size of the (n-1)th generated first feature image is C*2W*2H. The second feature image corresponding to the (n-1)th generated first feature image should have the same size as that first feature image; for this reason, the fifth feature image is convolved to obtain the second feature image, of size C*2W*2H, corresponding to the (n-1)th generated first feature image.
  • the size of the convolution kernel used may be C*3*3.
  • In this feature fusion method, bilinear interpolation is performed on the third feature image before fusion so that the resulting fourth feature image has the same size as the (n-1)th generated first feature image, allowing feature fusion to proceed smoothly. The fifth feature image is then convolved to obtain the second feature image corresponding to the (n-1)th generated first feature image, so that this second feature image has the same size as that first feature image. Input and output feature images thus have the same size, which is convenient for subsequently determining the distribution of portraits in the image to be detected according to the second feature images, so that portrait detection proceeds smoothly.
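  • The following is a minimal PyTorch sketch of one step of this fusion method (steps 401 through 405), assuming the fusion in step 404 is channel-wise concatenation, which is consistent with the stated C+C=2C channel count; kernel parameters are otherwise illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseStep(nn.Module):
    """Produces B_(n-1) from B_n (C x W x H) and A_(n-1) (C x 2W x 2H)."""
    def __init__(self, ch):
        super().__init__()
        self.conv_in = nn.Conv2d(ch, ch, 3, padding=1)       # step 402
        self.conv_out = nn.Conv2d(2 * ch, ch, 3, padding=1)  # step 405

    def forward(self, b_n, a_prev):
        third = self.conv_in(b_n)                            # C x W x H
        fourth = F.interpolate(third, size=a_prev.shape[-2:],
                               mode="bilinear", align_corners=False)  # step 403
        fifth = torch.cat([fourth, a_prev], dim=1)           # 2C x 2W x 2H, step 404
        return self.conv_out(fifth)                          # C x 2W x 2H

b_prev = FuseStep(ch=64)(torch.randn(1, 64, 8, 8), torch.randn(1, 64, 16, 16))
print(tuple(b_prev.shape))  # (1, 64, 16, 16)
```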
  • The size of a portrait in the image to be detected varies with the person's distance from the image acquisition device: a person closer to the device appears larger in the image to be detected, while a person farther away appears smaller.
  • To handle this, receptive field enhancement processing can be performed on the second feature image, and the distribution of portraits in the image to be detected is then determined according to the second feature image after receptive field enhancement.
  • FIG. 5 is a schematic diagram of a portrait detection method provided in Embodiment 3 of the present application.
  • receptive field enhancement processing is performed on each second feature image to obtain a corresponding fifth feature image.
  • As shown in Fig. 5, receptive field enhancement processing is performed on the second feature image B1 to obtain the fifth feature image C1, on B2 to obtain C2, ..., on Bn-1 to obtain Cn-1, on Bn to obtain Cn, ..., and on BN to obtain CN.
  • After the fifth feature image corresponding to each second feature image is obtained, feature fusion is performed on the fifth feature images to obtain a sixth feature image D, and the distribution of portraits in the image to be detected is then determined according to the sixth feature image D.
  • The portraits in the image to be detected have different sizes. Obtaining the fifth feature image by performing receptive field enhancement on the second feature image increases the reference area in the image to be detected of the portraits in the fifth feature image, so that when the distribution of portraits is determined based on the fifth feature images, the ability to detect portraits of different sizes is improved, increasing the detection accuracy of the number and distribution of portraits in the image to be detected.
  • Specifically, the second feature image may be convolved different numbers of times, and the feature images obtained from these different convolution passes are then fused to obtain the fifth feature image.
  • Fig. 6 is a schematic diagram of a receptive field enhancement processing method provided in Embodiment 3 of the present application.
  • As shown in Fig. 6, the second feature image Bi is processed through three parallel convolution branches, and the fifth feature image Ci is obtained by performing feature fusion on the feature images produced by the three branches.
  • The size of the second feature image Bi is defined as C*W*H, where C is the number of channels, W is the width of the image, and H is the height of the image.
  • In the first branch, the second feature image Bi is convolved with a kernel of size C*3*3 to obtain feature image Bi11 of size C*W*H; Bi11 is convolved with a kernel of size C*3*3 to obtain feature image Bi12 of size C*W*H; and Bi12 is convolved with a kernel of size C*3*3 to obtain the seventh feature image Bi13 of size C*W*H.
  • In the second branch, Bi is convolved with a kernel of size C*3*3 to obtain feature image Bi21 of size C*W*H, and Bi21 is then convolved with a kernel of size C*3*3 to obtain the eighth feature image Bi22 of size C*W*H.
  • In the third branch, Bi is convolved with a kernel of size C*3*3 to obtain the ninth feature image Bi31 of size C*W*H.
  • The kernels used in the six convolutions are all of size C*3*3; the six convolutions may use the same kernel, different kernels, or partially shared kernels, which is not limited in this embodiment of the present application.
  • After the seventh feature image Bi13, the eighth feature image Bi22 and the ninth feature image Bi31 are obtained, feature fusion is performed on them to obtain the tenth feature image Bi123 of size 3C*W*H; the tenth feature image is then convolved with a kernel of size C*1*1 to obtain the fifth feature image Ci corresponding to the second feature image Bi. The size of Ci is the same as that of Bi, namely C*W*H.
  • In this receptive field enhancement method, the seventh, eighth and ninth feature images are obtained by convolving the second feature image different numbers of times; after they are fused into the tenth feature image, the tenth feature image is convolved to obtain a fifth feature image having the same size as the second feature image. The resulting fifth feature image has a stronger receptive field than the second feature image, so that portraits of different sizes in the image to be detected can be accurately detected based on the fifth feature image, ensuring the detection accuracy of the number and distribution of portraits in the image to be detected.
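  • A minimal PyTorch sketch of this enhancement module, following the branch structure of Fig. 6 (three, two and one 3x3 convolutions in parallel, concatenation to 3C channels, then a 1x1 reduction back to C); the channel count in the example is an assumption.

```python
import torch
import torch.nn as nn

class ReceptiveFieldEnhance(nn.Module):
    """Produces the fifth feature image C_i from the second feature image B_i."""
    def __init__(self, ch):
        super().__init__()
        conv3x3 = lambda: nn.Conv2d(ch, ch, 3, padding=1)
        self.branch1 = nn.Sequential(conv3x3(), conv3x3(), conv3x3())  # -> B_i13
        self.branch2 = nn.Sequential(conv3x3(), conv3x3())             # -> B_i22
        self.branch3 = conv3x3()                                       # -> B_i31
        self.reduce = nn.Conv2d(3 * ch, ch, 1)                         # 1x1 kernel

    def forward(self, b_i):
        tenth = torch.cat([self.branch1(b_i), self.branch2(b_i),
                           self.branch3(b_i)], dim=1)                  # 3C x W x H
        return self.reduce(tenth)                                      # C x W x H

c_i = ReceptiveFieldEnhance(64)(torch.randn(1, 64, 16, 16))
print(tuple(c_i.shape))  # (1, 64, 16, 16), same size as B_i
```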
  • When determining the distribution of portraits in the image to be detected according to the sixth feature image, the sixth feature image can be input into multiple pre-trained classifiers; the classifiers determine the center point coordinates of the portraits and the rectangular frames marking them, and the distribution of portraits in the image to be detected is then determined according to those center point coordinates and rectangular frames.
  • FIG. 7 is a flow chart of a method 700 for determining the number of portraits and portrait distribution provided in Embodiment 3 of the present application. As shown in FIG. 7 , the method 700 for determining the number of portraits and portrait distribution includes the following steps:
  • Step 701. Input the sixth feature image after normalization processing into the first classifier, and obtain center point information output by the first classifier.
  • The sixth feature image is first normalized so that it can be input into the pre-trained classifiers, and the classifiers can then detect portraits in the image to be detected based on the normalized sixth feature image.
  • group normalization processing may be performed on the sixth feature image.
  • the first classifier is trained in advance by image samples, and the first classifier is used to determine the coordinates of the center point of the portrait head in the original image corresponding to the feature image according to the input feature image.
  • After the sixth feature image is normalized, it is input into the first classifier to obtain the center point information output by the first classifier; the center point information indicates the center point coordinates of the portrait heads in the image to be detected, and according to these coordinates the head center points can be marked on the image to be detected.
  • Step 702 Input the sixth feature image after normalization processing into the second classifier, and obtain the first image frame information output by the second classifier.
  • the second classifier is trained in advance by image samples, and the second classifier is used to determine a rectangular frame for labeling a portrait head in an original image corresponding to the feature image according to the input feature image.
  • After the sixth feature image is normalized, it is input into the second classifier to obtain the first image frame information output by the second classifier; the first image frame information includes the coordinates of the rectangular frames used to mark portrait heads in the image to be detected.
  • the first image frame information includes the coordinate value of the upper left corner and the lower right corner of the image frame.
  • The image frames defined by the first image frame information mark the portrait heads in the image to be detected; therefore, combining the center point information output by the first classifier with the first image frame information, each portrait head can be marked with a rectangular frame on the image to be detected.
  • Step 703 Input the sixth feature image after normalization processing into the third classifier, and obtain the second image frame information output by the third classifier.
  • The third classifier is trained in advance on sample images and is used to determine, according to the input feature image, the rectangular frames used to mark human bodies in the original image corresponding to the feature image. After the sixth feature image is normalized, it is input into the third classifier to obtain the second image frame information output by the third classifier; the second image frame information includes the coordinate values of the rectangular frames used to mark human bodies in the image to be detected.
  • The second image frame information includes the coordinate values of the upper left and lower right corners of each image frame. Since the image frames defined by the second image frame information mark the human bodies in the image to be detected, each human body can be marked with a rectangle on the image according to this information.
  • The image to be detected may not include a complete human body; for example, it may contain only a person's head, or only a person's head and upper body.
  • The third classifier, trained on image samples, can predict the position of the entire human body in the image to be detected based on the portrait head, and then output the coordinate values of the image frame used to mark that human body.
  • Step 704: Determine, according to the center point information, the first image frame information and the second image frame information, the distribution of portraits in the image to be detected.
  • the center point information is used to indicate the coordinates of the center point of the portrait head in the image to be detected
  • the first image frame information is used to indicate the rectangular frames marking the portrait heads in the image to be detected
  • The second image frame information indicates the rectangular frames marking the human bodies in the image to be detected. Therefore, the positions of portrait heads can be determined from the center point information and the first image frame information, and the positions of human bodies from the second image frame information; the number of portraits in the image to be detected can then be determined from the number of portrait heads or human bodies, and the distribution of portraits from the positions of the heads and bodies in the image.
  • In this method, a plurality of classifiers are pre-trained, and the normalized sixth feature image is input into each of them to obtain the center point information, the first image frame information and the second image frame information. The positions of portrait heads in the image to be detected are determined from the center point information and the first image frame information, the positions of human bodies from the second image frame information, and the number of portraits from the number of rectangular frames marking heads or the number marking bodies; together these give the number and distribution of portraits in the image to be detected.
  • The number and distribution of portraits in the image to be detected can be determined based on the center point information, with coordinate deviation values used to determine the rectangular frames marking the portrait heads, which can improve the computing speed of the second classifier.
  • Determining portrait heads and human bodies separately can avoid conflicts between head features and body features, so that the number and distribution of portraits in the image to be detected can be determined more accurately.
  • A fourth classifier may be trained in advance on image samples; according to the input feature image, the fourth classifier determines accuracy information characterizing how well each rectangular frame marking a portrait head in the feature image's original image actually marks that head.
  • The image frame quality information indicates, for the image to be detected, how accurately each rectangular frame used to mark a portrait head marks that head.
  • Target center points can be determined from the center point information according to the image frame quality information, where a target center point is one whose corresponding head-marking rectangular frame has an accuracy below the preset accuracy threshold; the coordinate values of the target center points are then deleted from the center point information.
  • The pre-trained fourth classifier thus detects whether each rectangular frame determined by the second classifier accurately marks a portrait head. Because each rectangular frame determined by the second classifier corresponds to one center point coordinate in the center point information, when a rectangular frame is found unable to accurately mark a portrait head in the image to be detected, its corresponding center point coordinate is deleted from the center point information; the failed frame is thereby discarded, misidentification of portraits is avoided, and the detection accuracy of the number and distribution of portraits in the image to be detected is further improved.
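  • A minimal PyTorch sketch of this prediction stage, assuming the four classifiers are 1x1-convolution heads over the group-normalized sixth feature image and that quality-based filtering is a simple thresholded mask; head layouts, channel counts and the threshold value are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class PortraitHeads(nn.Module):
    def __init__(self, ch=64, groups=8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, ch)   # group normalization of D
        self.center = nn.Conv2d(ch, 1, 1)      # first classifier: head center points
        self.head_box = nn.Conv2d(ch, 4, 1)    # second classifier: head box coords
        self.body_box = nn.Conv2d(ch, 4, 1)    # third classifier: body box coords
        self.quality = nn.Conv2d(ch, 1, 1)     # fourth classifier: box quality

    def forward(self, d, quality_thresh=0.3):
        d = self.norm(d)
        centers = torch.sigmoid(self.center(d))
        quality = torch.sigmoid(self.quality(d))
        # Suppress center points whose head-marking boxes are judged inaccurate,
        # mirroring the deletion of target center points from the center info.
        centers = centers * (quality >= quality_thresh)
        return centers, self.head_box(d), self.body_box(d), quality

out = PortraitHeads()(torch.randn(1, 64, 64, 64))
print([tuple(o.shape) for o in out])
```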
  • FIG. 8 is a schematic diagram of a portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 8 , the portrait detection device 800 includes:
  • an acquisition module 801, configured to acquire an image to be detected, wherein the image to be detected includes at least one portrait;
  • a generation module 802, configured to generate at least two first feature images of the image to be detected acquired by the acquisition module 801, wherein the first first feature image is obtained by feature extraction from the image to be detected and each subsequent first feature image is obtained by feature extraction from the previous first feature image;
  • a fusion module 803, configured to perform feature fusion on the at least two first feature images generated by the generation module 802 to obtain at least two second feature images; and
  • a detection module 804, configured to determine, according to the at least two second feature images obtained by the fusion module 803, the distribution of portraits in the image to be detected.
  • The acquisition module 801 can be used to execute step 101 in the first embodiment above, the generation module 802 can be used to execute step 102, the fusion module 803 can be used to execute step 103, and the detection module 804 can be used to execute step 104.
  • In some embodiments, the fusion module 803 is configured to perform feature fusion on at least two adjacently generated first feature images, according to the generation order of the first feature images, to obtain at least two second feature images, wherein different second feature images are obtained by feature fusion of sets of first feature images that are not completely the same.
  • FIG. 9 is a schematic diagram of another portrait detection device 800 provided in Embodiment 4 of the present application.
  • the fusion module 803 includes:
  • a convolution sub-module 8031, configured to perform convolution processing on the Nth generated first feature image, according to the order in which the first feature images are generated, to obtain the second feature image corresponding to the Nth generated first feature image, where N is the number of first feature images;
  • a first fusion sub-module 8032, configured to perform feature fusion on the second feature image corresponding to the nth generated first feature image obtained by the convolution sub-module 8031 and the (n-1)th generated first feature image, to obtain the second feature image corresponding to the (n-1)th generated first feature image, where n is an integer greater than 1 and less than or equal to N.
  • the first fusion submodule 8032 is configured to perform the following operations:
  • the size of the (n-1)th generated first feature image is C*2W*2H, and the size of the fifth feature image is 2C*2W*2H;
  • FIG. 10 is a schematic diagram of another human portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 10 , the detection module 804 includes:
  • the enhancement sub-module 8041 is configured to perform receptive field enhancement processing on each second feature image to obtain a corresponding fifth feature image;
  • the second fusion sub-module 8042 is configured to perform feature fusion on the fifth feature images obtained by the enhancement sub-module 8041 to obtain a sixth feature image;
  • the detection sub-module 8043 is used to determine the distribution of portraits in the image to be detected according to the sixth feature image obtained by the second fusion sub-module 8042.
  • In one possible implementation, the enhancement sub-module 8041 is configured to perform the following processing for each second feature image: performing three convolution processing passes on the second feature image to obtain a seventh feature image; performing two convolution processing passes on the second feature image to obtain an eighth feature image; performing one convolution processing pass on the second feature image to obtain a ninth feature image; performing feature fusion on the seventh, eighth and ninth feature images to obtain a tenth feature image; and performing convolution processing on the tenth feature image to obtain the fifth feature image corresponding to that second feature image.
  • In one possible implementation, the detection sub-module 8043 is configured to perform the following processing: inputting the normalized sixth feature image into a first classifier to obtain center point information indicating the center point coordinates of portrait heads in the image to be detected; inputting the normalized sixth feature image into a second classifier to obtain first image frame information including the coordinate values of the rectangular frames marking portrait heads; inputting the normalized sixth feature image into a third classifier to obtain second image frame information including the coordinate values of the rectangular frames marking human bodies; and determining the distribution of portraits in the image to be detected according to the center point information, the first image frame information and the second image frame information.
  • FIG. 11 is a schematic diagram of still another portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 11, the portrait detection device 800 further includes:
  • a calculation module 805, configured to input the normalized sixth feature image into a fourth classifier to obtain image frame quality information output by the fourth classifier, wherein the fourth classifier is used to determine, from an input feature image, information characterizing how accurately the rectangular frames marking portrait heads in the corresponding original image mark those heads, and the image frame quality information indicates how accurately the rectangular frames marking portrait heads in the image to be detected mark the portrait heads;
  • a screening module 806, configured to determine target center points from the center point information according to the image frame quality information obtained by the calculation module 805, wherein the accuracy of the rectangular frame marking a portrait head that corresponds to a target center point is lower than a preset accuracy threshold;
  • a deletion module 807, configured to delete the coordinate values of the target center points determined by the screening module 806 from the center point information.
  • FIG. 12 is a schematic diagram of an electronic device provided in Embodiment 5 of the present application; the specific embodiments of the present application do not limit the specific implementation of the electronic device.
  • An electronic device 1200 provided by an embodiment of the present application includes: a processor (processor) 1202, a communications interface (Communications Interface) 1204, a memory (memory) 1206, and a communication bus 1208, wherein:
  • the processor 1202 , the communication interface 1204 , and the memory 1206 communicate with each other through the communication bus 1208 .
  • the communication interface 1204 is used for communicating with other electronic devices or servers.
  • the processor 1202 is configured to execute the program 1210, and may specifically execute the relevant steps in any of the foregoing portrait detection method embodiments.
  • the program 1210 may include program code, and the program code includes computer operation instructions.
  • the processor 1202 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in a smart device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
  • the memory 1206 is used to store the program 1210 .
  • the memory 1206 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory.
  • the program 1210 may be specifically configured to enable the processor 1202 to execute the portrait detection method in any of the preceding embodiments.
  • With the electronic device of the embodiments of the present application, after the image to be detected including portraits is acquired, features are extracted from it to obtain a plurality of first feature images, which are then fused to obtain a plurality of second feature images, and the distribution of portraits in the image to be detected is determined from the second feature images. Since the image to be detected can be collected from a given venue, the portraits in it can be mapped into that venue, so that the number of gathered people and their distribution can be determined from the distribution of portraits in the image. The number and distribution of gathered people are thus detected automatically, without stationing staff at each entrance and exit of the venue to count people, which saves manpower and reduces the cost of counting the number of people gathered in a venue.
  • the present application also provides a computer-readable storage medium storing instructions for causing a machine to execute the image detection method as described herein.
  • a system or device equipped with a storage medium may be provided, on which software program code realizing the functions of any of the above embodiments is stored, and the computer (or CPU or MPU) of the system or device is caused to read and execute the program code stored in the storage medium.
  • the program code itself read from the storage medium can realize the function of any one of the above-mentioned embodiments, so the program code and the storage medium storing the program code constitute a part of the present application.
  • Examples of storage media for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards, and ROM.
  • the program code can be downloaded from a server computer via a communication network.
  • the program code read from the storage medium may be written into a memory provided in an expansion board inserted into the computer or into a memory provided in an expansion module connected to the computer, and instructions based on the program code then cause a CPU mounted on the expansion board or the expansion module to perform some or all of the actual operations, thereby realizing the functions of any of the above embodiments.
  • the embodiment of the present application also provides a computer program product tangibly stored on a computer-readable medium and including computer-executable instructions which, when executed, cause at least one processor to execute the portrait detection methods provided in the foregoing embodiments. It should be understood that the solutions in this embodiment have the corresponding technical effects of the foregoing method embodiments, and details are not repeated here.
  • the hardware modules may be implemented mechanically or electrically.
  • a hardware module may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations.
  • the hardware modules may also include programmable logic or circuits (such as general-purpose processors or other programmable processors), which can be temporarily set by software to complete corresponding operations.
  • the specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be determined based on cost and time considerations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a portrait detection method and device, an electronic device and a storage medium. The portrait detection method includes: acquiring an image to be detected, wherein the image to be detected includes at least one portrait; generating at least two first feature images of the image to be detected, wherein the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding first feature image; performing feature fusion on the at least two first feature images to obtain at least two second feature images; and determining the distribution of portraits in the image to be detected according to the at least two second feature images. This solution reduces the cost of counting the number of gathered people.

Description

Portrait detection method and device, electronic device and storage medium
Technical Field
The present application relates to the technical field of image processing, and in particular to a portrait detection method and device, an electronic device and a storage medium.
Background
With the rapid development of cities, the flow of people through urban transport hubs keeps growing. Transport hubs such as subway stations, railway stations and airports carry large passenger flows, and when an accident or severe weather occurs, a large number of people can gather within a short time; such gatherings pose significant safety hazards. The number of people gathered in a transport hub therefore needs to be determined, so that flow-limiting measures can be taken when it exceeds the hub's carrying capacity, preventing accidents such as falls from platforms and stampedes.
At present, to determine the number of people gathered in a transport hub, staff members count the people entering and leaving the hub at its entrances and exits, and the number of people gathered inside is determined from all the staff members' counts.
Since a transport hub usually has multiple entrances and exits, this method requires stationing a staff member at every entrance and exit, which demands considerable manpower and makes counting the number of gathered people costly.
Summary of the Invention
In view of this, the portrait detection method and device, electronic device and storage medium provided by the present application can reduce the cost of counting the number of gathered people.
In a first aspect, an embodiment of the present application provides a portrait detection method, including:
acquiring an image to be detected, wherein the image to be detected includes at least one portrait;
generating at least two first feature images of the image to be detected, wherein the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding first feature image;
performing feature fusion on the at least two first feature images to obtain at least two second feature images;
determining the distribution of portraits in the image to be detected according to the at least two second feature images.
In a second aspect, an embodiment of the present application further provides a portrait detection device, including:
an acquisition module, configured to acquire an image to be detected, wherein the image to be detected includes at least one portrait;
a generation module, configured to generate at least two first feature images of the image to be detected acquired by the acquisition module, wherein the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding first feature image;
a fusion module, configured to perform feature fusion on the at least two first feature images generated by the generation module, to obtain at least two second feature images;
a detection module, configured to determine the distribution of portraits in the image to be detected according to the at least two second feature images obtained by the fusion module.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a communications interface, a memory and a communication bus, wherein the processor, the memory and the communications interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the operations corresponding to the portrait detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, cause the processor to execute the operations corresponding to the portrait detection method provided in the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product tangibly stored on a computer-readable medium and including computer-executable instructions which, when executed, cause at least one processor to execute the portrait detection method provided in the first aspect or in any possible implementation of the first aspect.
After the image to be detected including portraits is acquired, features are extracted from it to obtain a plurality of first feature images, the first feature images are fused to obtain a plurality of second feature images, and the distribution of portraits in the image to be detected is determined from the second feature images. Since the image to be detected can be collected from a given venue, the portraits in it can be mapped into that venue, so the number and distribution of people gathered there can be determined from the distribution of portraits in the image. The number and distribution of gathered people are thus detected automatically, without stationing staff at each entrance and exit of the venue to count people, which saves manpower and reduces the cost of counting the people gathered in the venue.
For any of the above aspects, optionally, when the second feature images are obtained by feature fusion of the first feature images, feature fusion is performed on at least two adjacently generated first feature images according to the generation order of the first feature images to obtain at least two second feature images, with different second feature images obtained by fusing sets of at least two first feature images that are not completely identical.
Since the first feature images are generated sequentially and each subsequent first feature image is obtained by extracting features from the preceding one, some features of the preceding first feature image may be discarded in the subsequent one, and the discarded features may correspond to small portraits in the image to be detected. Fusing adjacently generated first feature images ensures that the resulting second feature images do not lose features of the image to be detected, so that when the distribution of portraits is determined from the second feature images, even small portraits can be recognized, improving the accuracy of detecting the number and distribution of portraits in the image to be detected.
For any of the above aspects, optionally, when the second feature images are obtained by fusing adjacently generated first feature images, convolution processing is first performed on the last generated first feature image to obtain the second feature image corresponding to it; the second feature image corresponding to a later generated first feature image is then fused with the preceding first feature image to obtain the second feature image corresponding to that preceding first feature image.
Since the second feature image corresponding to a later generated first feature image is fused with the preceding first feature image to obtain the second feature image corresponding to the preceding one, for any first feature image the corresponding second feature image includes all the features of the first feature images of higher order than it, and no features of the image to be detected are lost. When the number and distribution of portraits are determined from the second feature images, the comprehensiveness of portrait recognition is therefore improved, and with it the accuracy of detecting the number and distribution of portraits.
For any of the above aspects, optionally, when the second feature image corresponding to a later generated first feature image is fused with the preceding first feature image to obtain the second feature image corresponding to the preceding one, convolution processing is performed on the second feature image corresponding to the nth generated first feature image to obtain a third feature image; bilinear interpolation is performed on the third feature image to obtain a fourth feature image; the fourth feature image is fused with the (n-1)th generated first feature image to obtain a fifth feature image; and convolution processing is performed on the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image. Here, n is an integer greater than 1 and less than or equal to the total number of first feature images; the second feature image corresponding to the nth generated first feature image and the third feature image both have size C*W*H, where C is the number of channels, W the image width and H the image height; the fourth feature image has size C*2W*2H; the (n-1)th generated first feature image has size C*2W*2H; the fifth feature image has size 2C*2W*2H; and the second feature image corresponding to the (n-1)th generated first feature image has size C*2W*2H.
Before feature fusion, bilinear interpolation is applied to the third feature image to obtain a fourth feature image with the same size as the (n-1)th generated first feature image, so that the fusion can proceed smoothly. After fusion, convolution is applied to the resulting fifth feature image to obtain a second feature image with the same size as the (n-1)th generated first feature image, ensuring that input and output feature images share the same size, which facilitates subsequently determining the distribution of portraits from the second feature images and allows portrait detection to proceed smoothly.
For any of the above aspects, optionally, when the distribution of portraits is determined from the second feature images, receptive field enhancement processing is first applied to each second feature image to obtain a corresponding fifth feature image; the fifth feature images are then fused into one sixth feature image; and the distribution of portraits in the image to be detected is determined from the sixth feature image.
Since portraits of different sizes appear in the image to be processed, applying receptive field enhancement to a second feature image to obtain a fifth feature image enlarges the reference region that the fifth feature image covers in the image to be detected. With this enlarged reference region, determining the distribution of portraits from the fifth feature images improves the ability to detect portraits of different sizes, and thus the accuracy of detecting the number and distribution of portraits in the image to be detected.
For any of the above aspects, optionally, when receptive field enhancement is applied to a second feature image to obtain a fifth feature image, for each second feature image: three convolutions are applied to the second feature image to obtain a seventh feature image, two convolutions to obtain an eighth feature image, and one convolution to obtain a ninth feature image; the seventh, eighth and ninth feature images are then fused to obtain a tenth feature image; and convolution is applied to the tenth feature image to obtain the fifth feature image corresponding to that second feature image. For any second feature image, the second feature image and its corresponding fifth, seventh, eighth and ninth feature images all have size C*W*H, and the corresponding tenth feature image has size 3C*W*H.
By convolving the second feature image different numbers of times, the seventh, eighth and ninth feature images are obtained; fusing them yields the tenth feature image, and convolving the tenth yields the fifth feature image corresponding to the second feature image. Because the seventh, eighth and ninth feature images come from different numbers of convolutions of the second feature image, the fifth feature image built from them has a stronger receptive field than the second feature image, so portraits of different sizes in the image to be detected can be detected accurately from the fifth feature images, ensuring the accuracy of detecting the number and distribution of portraits.
For any of the above aspects, optionally, when the distribution of portraits is determined from the sixth feature image, the sixth feature image is first normalized, and the normalized sixth feature image is then input separately into a pre-trained first classifier, second classifier and third classifier, obtaining center point information output by the first classifier, first image frame information output by the second classifier, and second image frame information output by the third classifier; the distribution of portraits in the image to be detected is then determined from the center point information, the first image frame information and the second image frame information.
The center point information indicates the center point coordinates of portrait heads in the image to be detected; the first image frame information includes the coordinate values of rectangular frames marking portrait heads in the image to be detected; and the second image frame information includes the coordinate values of rectangular frames marking human bodies in the image to be detected. The positions of portrait heads can be determined from the center point information and the first image frame information, and the positions of human bodies from the second image frame information; the number of portraits can then be determined from the number of head-marking or body-marking rectangular frames, and the distribution of portraits from the positions of those frames. Marking the portrait heads and human bodies with rectangular frames allows the number and distribution of portraits in the image to be determined more accurately, and hence the number and distribution of people gathered in the corresponding venue, which helps improve the user experience.
For any of the above aspects, optionally, the normalized sixth feature image may also be input into a fourth classifier to obtain image frame quality information output by it; the image frame quality information indicates how accurately the rectangular frames marking portrait heads in the image to be detected mark those heads. Target center points are then selected from the center point information according to the image frame quality information, a target center point being one whose corresponding head-marking rectangular frame has an accuracy lower than a preset accuracy threshold, and the coordinate values of the target center points are deleted from the center point information.
Since each rectangular frame determined by the second classifier corresponds to one center point coordinate in the center point information, when a rectangular frame determined by the second classifier is found not to mark a portrait head in the image to be detected accurately, the corresponding center point coordinate is deleted from the center point information and the inaccurate frame is discarded, avoiding misrecognition of portraits and further improving the accuracy of detecting the number and distribution of portraits in the image to be detected.
Brief Description of the Drawings
FIG. 1 is a flowchart of a portrait detection method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic diagram of a feature fusion method provided in Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of another feature fusion method provided in Embodiment 2 of the present application;
FIG. 4 is a flowchart of a feature fusion method provided in Embodiment 2 of the present application;
FIG. 5 is a schematic diagram of a portrait detection method provided in Embodiment 3 of the present application;
FIG. 6 is a schematic diagram of a receptive field enhancement processing method provided in Embodiment 3 of the present application;
FIG. 7 is a flowchart of a method for determining the number and distribution of portraits provided in Embodiment 3 of the present application;
FIG. 8 is a schematic diagram of a portrait detection device provided in Embodiment 4 of the present application;
FIG. 9 is a schematic diagram of another portrait detection device provided in Embodiment 4 of the present application;
FIG. 10 is a schematic diagram of yet another portrait detection device provided in Embodiment 4 of the present application;
FIG. 11 is a schematic diagram of still another portrait detection device provided in Embodiment 4 of the present application;
FIG. 12 is a schematic diagram of an electronic device provided in Embodiment 5 of the present application.
List of reference signs:
100: portrait detection method          400: feature fusion method
700: method for determining the number and distribution of portraits          800: portrait detection device
1200: electronic device          A0-AN: first feature images
B0-BN: second feature images          C0-CN: fifth feature images          D: sixth feature image
Bi: second feature image          Bi11, Bi12, Bi21: feature images          Bi13: seventh feature image
Bi22: eighth feature image          Bi31: ninth feature image          Bi123: tenth feature image
Ci: fifth feature image          801: acquisition module          802: generation module
803: fusion module          804: detection module          8031: convolution sub-module
8032: first fusion sub-module          8041: enhancement sub-module          8042: second fusion sub-module
8043: detection sub-module          805: calculation module          806: screening module
807: deletion module          1202: processor          1204: communications interface
1206: memory          1208: communication bus          1210: program
101: acquire an image to be detected
102: generate at least two first feature images of the image to be detected
103: perform feature fusion on the first feature images to obtain at least two second feature images
104: determine the distribution of portraits in the image to be detected according to the second feature images
401: input the second feature image corresponding to the nth generated first feature image
402: perform convolution processing on the second feature image corresponding to the nth first feature image to obtain a third feature image
403: perform bilinear interpolation on the third feature image to obtain a fourth feature image
404: perform feature fusion on the fourth feature image and the (n-1)th generated first feature image to obtain a fifth feature image
405: perform convolution processing on the fifth feature image to obtain the second feature image corresponding to the (n-1)th first feature image
701: input the normalized sixth feature image into the first classifier to obtain center point information
702: input the normalized sixth feature image into the second classifier to obtain first image frame information
703: input the normalized sixth feature image into the third classifier to obtain second image frame information
704: determine the distribution of portraits according to the center point information and the first and second image frame information
Detailed Description of the Embodiments
As mentioned above, in venues with heavy foot traffic such as subway stations, railway stations and airports, the number and distribution of gathered people must be determined in order to prevent accidents such as falls from platforms and stampedes. At present, the number of people entering and leaving a venue is counted manually at its entrances and exits, and the number of people gathered inside is determined from the counts. This approach can only determine the number of gathered people; it cannot determine how the people inside are distributed, which still requires on-site manual inspection. In addition, venues with heavy foot traffic usually have multiple entrances and exits; for example, a subway station typically has four entrances, and a railway station has multiple inbound and outbound gates. Manual counting requires staff at every entrance and exit, so determining the number of gathered people demands considerable manpower and makes the counting costly.
In the embodiments of the present application, for a venue where the number and distribution of gathered people need to be determined, an image to be detected including portraits is collected from the venue; features are extracted from the image to obtain a plurality of first feature images; feature fusion is performed on the first feature images to obtain a plurality of second feature images; and the distribution of portraits in the image is determined from the second feature images. Since the image to be detected is collected from the venue in question, the number and distribution of people gathered there can be determined from the distribution of portraits in the image. Thus, by collecting an image including portraits from the venue and processing it to determine the distribution of portraits, the number and distribution of gathered people are obtained without stationing staff at every entrance and exit to count people, saving manpower and reducing the cost of counting the people gathered in the venue.
It should be noted that the embodiments of the present application extract feature images from the image to be detected and determine the number and distribution of people through various types of processing on the feature images, such as feature extraction, feature fusion and receptive field enhancement; every feature image involved (the first feature image, the second feature image, ..., the Nth feature image, and so on) refers to a feature map in a convolutional layer.
The portrait detection method, device and electronic device provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Embodiment 1
FIG. 1 is a flowchart of a portrait detection method 100 provided in Embodiment 1 of the present application. As shown in FIG. 1, the portrait detection method 100 includes the following steps:
Step 101: acquire an image to be detected.
The image to be detected is an image on which portrait recognition is to be performed, and it includes at least one portrait. When determining the number and distribution of people gathered in a busy venue, the image to be detected is an image of that venue; for example, it may be captured by a camera mounted at a high position inside the venue.
Step 102: generate at least two first feature images of the image to be detected.
After the image to be detected is acquired, features are first extracted from it to obtain one first feature image, and features are then extracted from that first feature image to obtain a new first feature image; that is, the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding one.
For example, features are extracted from the image to be detected to obtain first feature image 1; from first feature image 1 to obtain first feature image 2; from first feature image 2 to obtain first feature image 3; and from first feature image 3 to obtain first feature image 4. That is, first feature image 1 is obtained from the image to be detected, first feature image 2 from first feature image 1, first feature image 3 from first feature image 2, and first feature image 4 from first feature image 3.
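As an illustration only, the sequential extraction of step 102 can be sketched as a chain of convolutional stages, each consuming the previous stage's output. The following is a minimal PyTorch sketch, not the network actually used in the embodiments; the class name TinyBackbone and all layer sizes are assumptions made for this example.

```python
# A minimal sketch (not the patent's actual network): each stage extracts
# features from the previous stage's output, mirroring step 102.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_ch=3, ch=16, num_stages=4):
        super().__init__()
        stages = []
        for i in range(num_stages):
            stages.append(nn.Sequential(
                # stride 2 halves W and H, as in typical feature pyramids
                nn.Conv2d(in_ch if i == 0 else ch * 2 ** (i - 1),
                          ch * 2 ** i, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            ))
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []  # the first feature images, in generation order
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

feats = TinyBackbone()(torch.randn(1, 3, 256, 256))
print([f.shape for f in feats])  # four progressively smaller first feature images
```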
Step 103: perform feature fusion on the first feature images to obtain at least two second feature images.
After the plurality of first feature images are obtained, feature fusion is performed on two or more of them to obtain at least two second feature images, with different second feature images obtained by fusing sets of at least two first feature images that are not completely identical.
The purpose of feature fusion is to merge features extracted from an image into a feature with more discriminative power than the inputs; that is, fusing at least two first feature images yields a second feature image with more discriminative power than each first feature image used. A serial or a parallel feature fusion strategy may be adopted. The serial strategy directly concatenates two features: if the two input features x and y have dimensions p and q, the output feature z has dimension p+q. The parallel strategy combines the two feature vectors into a complex vector: for input features x and y, the output feature is z = x + iy, where i is the imaginary unit.
It should be noted that, besides the serial and parallel strategies above, other types of feature fusion may also be used to obtain the second feature images from the first feature images; the embodiments of the present application do not limit the specific fusion method. A small sketch of the two named strategies follows.
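For illustration, the two strategies can be written out on flat vectors as below; padding y to the length of x for the complex combination is an assumption of this sketch, since z = x + iy requires equal dimensions.

```python
# Illustrative only: serial and parallel fusion on flat feature vectors
# rather than convolutional feature maps.
import torch
import torch.nn.functional as F

x = torch.randn(8)                       # input feature x, dimension p = 8
y = torch.randn(5)                       # input feature y, dimension q = 5

z_serial = torch.cat([x, y])             # serial strategy: dimension p + q = 13
y_padded = F.pad(y, (0, x.numel() - y.numel()))  # equal lengths needed for x + iy
z_parallel = torch.complex(x, y_padded)  # parallel strategy: complex vector z = x + iy
print(z_serial.shape, z_parallel.shape, z_parallel.dtype)
```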
Step 104: determine the distribution of portraits in the image to be detected according to the second feature images.
Since the second feature images are obtained by fusing the first feature images, and the first feature images are extracted directly or indirectly from the image to be detected, the second feature images include information reflecting the positions, contours and sizes of the portraits in the image to be detected, so the distribution of portraits can be determined from them.
In this embodiment of the present application, after the image to be detected including portraits is acquired, features are extracted from it to obtain a plurality of first feature images, which are fused to obtain a plurality of second feature images, and the distribution of portraits in the image is determined from the second feature images. Since the image can be collected from a given venue, the portraits in it can be mapped into that venue, so the number and distribution of gathered people can be determined from the distribution of portraits in the image; the number and distribution of gathered people are detected automatically, without stationing staff at each entrance and exit to count people, which saves manpower and reduces the counting cost.
It should be noted that, because the first of the first feature images is obtained from the image to be detected and each subsequent one from the preceding one, the later a first feature image is obtained, the higher its order; higher-order first feature images carry stronger semantic information but have lower resolution and weaker perception of detail, so small objects get lost in them. Fusing different first feature images to obtain the second feature images ensures that the second feature images contain high-order semantic information without losing small objects, guaranteeing that small portraits in the image to be detected can still be recognized and thus that the number and distribution of portraits are detected accurately.
It should also be noted that, in this and the following embodiments, the distribution of portraits in the image to be detected may include the positional distribution of the portraits and may include their number.
Embodiment 2
On the basis of the portrait detection method 100 of Embodiment 1, when the second feature images are obtained by fusing the first feature images, feature fusion may be performed on at least two adjacently generated first feature images according to their generation order to obtain at least two second feature images, with different second feature images obtained by fusing sets of at least two first feature images that are not completely identical.
In this embodiment, since each subsequent first feature image is obtained by extracting features from the preceding one, small objects present in the preceding first feature image may be lost in the subsequent one. Fusing adjacently generated first feature images according to their generation order ensures that the second feature images include the small objects, so that small portraits can be recognized when the number and distribution of portraits are determined from the second feature images, guaranteeing detection accuracy.
In one example, in order of generation the first feature images are first feature images 1, 2, 3 and 4. When generating second feature images, feature fusion may be performed on images 1 and 2; on images 2 and 3; on images 3 and 4; on images 1, 2 and 3; on images 2, 3 and 4; and on images 1, 2, 3 and 4, with each fusion yielding one second feature image.
It should be understood that when fusing first feature images 1, 2 and 3, images 1 and 2 may be fused first and the result then fused with image 3 to obtain a second feature image. When fusing images 2, 3 and 4, images 2 and 3 may be fused first and the result then fused with image 4. When fusing images 1, 2, 3 and 4, images 1 and 2 may be fused to obtain fusion result 1, which is fused with image 3 to obtain fusion result 2, which is then fused with image 4 to obtain a second feature image.
In one possible implementation, when fusing the first feature images to generate the second feature images, the second feature image corresponding to a subsequent first feature image may be fused with the preceding first feature image to obtain the second feature image corresponding to the preceding one. FIG. 2 is a schematic diagram of a feature fusion method provided in Embodiment 2 of the present application. As shown in FIG. 2, there are N first feature images in total; in order of generation, the first one, A1, is extracted from the image to be detected A0, and the nth one, An, is extracted from A(n-1), where n is an integer greater than 1 and less than or equal to N. Convolution processing is performed on the Nth generated first feature image AN to obtain the corresponding second feature image BN. The second feature image Bn corresponding to the nth generated first feature image An is fused with the (n-1)th generated first feature image A(n-1) to obtain the second feature image B(n-1) corresponding to A(n-1).
FIG. 3 is a schematic diagram of another feature fusion method provided in Embodiment 2 of the present application. As shown in FIG. 3, there are four first feature images in total; in order of generation, A1 is extracted from the image to be detected A0, A2 from A1, A3 from A2, and A4 from A3. Convolution processing is performed on A4 to obtain the corresponding second feature image B4; A3 and B4 are fused to obtain the second feature image B3 corresponding to A3; A2 and B3 are fused to obtain the second feature image B2 corresponding to A2; and A1 and B2 are fused to obtain the second feature image B1 corresponding to A1.
In this embodiment, convolution processing is performed on the Nth generated first feature image to obtain its corresponding second feature image, and the second feature image corresponding to the nth generated first feature image is fused with the (n-1)th generated first feature image to obtain the second feature image corresponding to the latter, so that the second feature images together include all the feature information of the image to be detected; this improves the comprehensiveness of portrait recognition and thereby guarantees the accuracy of detecting the number and distribution of portraits.
In this embodiment, for a later obtained first feature image, the corresponding second feature image has lower resolution, includes fewer features and has a smaller size, and can be used to quickly recognize larger portraits in the image to be detected. For an earlier obtained first feature image, the corresponding second feature image has higher resolution, includes more features and has a larger size, and can be used to recognize smaller portraits. The second feature images thus have different resolutions: lower-resolution ones include high-order features and serve to recognize larger portraits quickly, while higher-resolution ones carry more image information and serve to recognize smaller portraits. Determining the distribution of portraits from all the second feature images therefore improves both the efficiency and the accuracy of portrait recognition.
In one possible implementation, when the second feature image corresponding to the nth generated first feature image is fused with the (n-1)th generated first feature image to obtain the second feature image corresponding to the latter, bilinear interpolation can give the first and second feature images being fused the same size, so that the fusion proceeds smoothly. FIG. 4 is a flowchart of a feature fusion method 400 provided in Embodiment 2 of the present application. As shown in FIG. 4, the feature fusion method 400 includes the following steps:
Step 401: input the second feature image corresponding to the nth generated first feature image.
The second feature image corresponding to the nth generated first feature image has size C*W*H, where C is the number of channels, W the image width and H the image height.
It should be noted that defining this size as C*W*H merely illustrates how the sizes and channel counts of the feature images change during fusion; it does not specifically limit the size or channel count of the third feature image, because different first feature images have different sizes and so do their corresponding second feature images.
Step 402: perform convolution processing on the second feature image corresponding to the nth generated first feature image to obtain a third feature image.
When obtaining the second feature image corresponding to the (n-1)th generated first feature image, convolution processing is first performed on the second feature image corresponding to the nth generated first feature image to obtain a third feature image. Referring to FIG. 3, for example, when obtaining the second feature image B3 corresponding to the first feature image A3, convolution processing is first performed on the second feature image B4 to obtain a third feature image.
When convolution is applied to the second feature image corresponding to the nth generated first feature image, the resulting third feature image also has size C*W*H. The convolution kernel used may have size C*3*3.
Step 403: perform bilinear interpolation on the third feature image to obtain a fourth feature image.
When the second feature image corresponding to the nth generated first feature image has size C*W*H, the (n-1)th generated first feature image has size C*2W*2H. To enable fusion with the (n-1)th generated first feature image, bilinear interpolation is applied to the third feature image to obtain a fourth feature image of size C*2W*2H.
When applying bilinear interpolation to the third feature image of size C*W*H, upsampling-layer bilinear interpolation may be used to obtain the fourth feature image of size C*2W*2H.
Step 404: perform feature fusion on the fourth feature image and the (n-1)th generated first feature image to obtain a fifth feature image.
When the second feature image corresponding to the nth generated first feature image has size C*W*H, the (n-1)th generated first feature image has size C*2W*2H and the fourth feature image also has size C*2W*2H; fusing the fourth feature image with the (n-1)th generated first feature image yields a fifth feature image of size 2C*2W*2H.
Step 405: perform convolution processing on the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image.
Since the fifth feature image has size 2C*2W*2H while the (n-1)th generated first feature image has size C*2W*2H, and the second feature image corresponding to the (n-1)th generated first feature image should have the same size as that first feature image, convolution processing is applied to the fifth feature image to obtain a second feature image of size C*2W*2H corresponding to the (n-1)th generated first feature image.
When convolving the fifth feature image, the convolution kernel used may have size C*3*3.
In this embodiment, before fusion, bilinear interpolation is applied to the third feature image to obtain a fourth feature image with the same size as the (n-1)th generated first feature image, so that fusion can proceed smoothly. After fusion, convolution is applied to the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image, with the same size as that first feature image, ensuring that input and output feature images share the same size; this facilitates subsequently determining the distribution of portraits from the second feature images and allows portrait detection to proceed smoothly.
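A minimal sketch of steps 402 to 405 follows, assuming PyTorch and assuming channel concatenation as the fusion operation of step 404; the function and variable names are illustrative, and PyTorch's (N, C, H, W) tensor layout stands in for the C*W*H notation above.

```python
# Illustrative sketch of steps 402-405 (not the patent's exact network).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_step(b_n, a_prev, conv_in, conv_out):
    """b_n:    second feature image Bn,     shape (1, C, H, W)
       a_prev: first feature image A(n-1),  shape (1, C, 2H, 2W)"""
    third = conv_in(b_n)                                  # step 402: size C*W*H
    fourth = F.interpolate(third, scale_factor=2,
                           mode='bilinear', align_corners=False)  # step 403: C*2W*2H
    fifth = torch.cat([fourth, a_prev], dim=1)            # step 404: 2C*2W*2H
    return conv_out(fifth)                                # step 405: back to C*2W*2H

C = 16
conv_in = nn.Conv2d(C, C, kernel_size=3, padding=1)       # the C*3*3 kernel above
conv_out = nn.Conv2d(2 * C, C, kernel_size=3, padding=1)  # reduces 2C back to C
b_prev = fuse_step(torch.randn(1, C, 32, 32), torch.randn(1, C, 64, 64),
                   conv_in, conv_out)
print(b_prev.shape)  # torch.Size([1, 16, 64, 64]), same size as A(n-1)
```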
Embodiment 3
When the image to be detected includes multiple portraits, the sizes of the portraits vary with the distance between each person and the image acquisition device: people closer to the device appear as larger portraits, and people farther away as smaller ones. To recognize portraits of different sizes in the image to be detected, receptive field enhancement processing may be applied to the second feature images, and the distribution of portraits then determined from the enhanced second feature images.
FIG. 5 is a schematic diagram of a portrait detection method provided in Embodiment 3 of the present application. As shown in FIG. 5, receptive field enhancement processing is applied to each second feature image to obtain a corresponding fifth feature image; specifically, enhancement of the second feature image B1 yields the fifth feature image C1, of B2 yields C2, of B(n-1) yields C(n-1), of Bn yields Cn, and of BN yields CN. After the fifth feature images corresponding to all second feature images are obtained, they are fused into one sixth feature image D, from which the distribution of portraits in the image to be detected is determined.
In this embodiment, since people stand at different distances from the image acquisition device, the portraits in the image to be detected have different sizes. Applying receptive field enhancement to a second feature image to obtain a fifth feature image enlarges the reference region that the portraits in the fifth feature image cover in the image to be detected, so that determining the distribution of portraits from the fifth feature images improves the ability to detect portraits of different sizes and thus the accuracy of detecting their number and distribution.
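The embodiment does not fix how fifth feature images of different resolutions are combined into the single sixth feature image D; the sketch below assumes bilinear upsampling to the highest resolution followed by concatenation and a 1*1 convolution, which is one common choice.

```python
# Illustrative only: one assumed way to fuse the fifth feature images
# C1..CN of different resolutions into a single sixth feature image D.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_fifth_images(fifths, out_channels):
    target = fifths[0].shape[-2:]  # resolution of the highest-resolution Ci
    resized = [F.interpolate(c, size=target, mode='bilinear', align_corners=False)
               for c in fifths]
    stacked = torch.cat(resized, dim=1)
    # randomly initialized here for the sketch; a trained layer in practice
    project = nn.Conv2d(stacked.shape[1], out_channels, kernel_size=1)
    return project(stacked)  # sixth feature image D

fifths = [torch.randn(1, 16, 64, 64), torch.randn(1, 16, 32, 32),
          torch.randn(1, 16, 16, 16)]
print(fuse_fifth_images(fifths, 16).shape)  # torch.Size([1, 16, 64, 64])
```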
In one possible implementation, when receptive field enhancement is applied to a second feature image to obtain a fifth feature image, the second feature image may be convolved different numbers of times and the feature images obtained from the different numbers of convolutions then fused into the fifth feature image.
FIG. 6 is a schematic diagram of a receptive field enhancement processing method provided in Embodiment 3 of the present application. As shown in FIG. 6, for each second feature image Bi, convolution processing is applied through three parallel convolution pipelines, and the fifth feature image Ci is obtained by fusing the feature images produced by the three pipelines; here the size of Bi is defined as C*W*H, with C the number of channels, W the image width and H the image height.
In the first pipeline, Bi is first convolved with a kernel of size C*3*3 to obtain feature image Bi11 of size C*W*H; Bi11 is then convolved with a C*3*3 kernel to obtain feature image Bi12 of size C*W*H; and Bi12 is convolved with a C*3*3 kernel to obtain the seventh feature image Bi13 of size C*W*H.
In the second pipeline, Bi is first convolved with a C*3*3 kernel to obtain feature image Bi21 of size C*W*H, and Bi21 is then convolved with a C*3*3 kernel to obtain the eighth feature image Bi22 of size C*W*H.
In the third pipeline, Bi is convolved with a C*3*3 kernel to obtain the ninth feature image Bi31 of size C*W*H.
It should be noted that across the three pipelines, all six convolutions use kernels of size C*3*3; the six convolutions may use the same kernel or different kernels, or some of them may share a kernel, which the embodiments of the present application do not limit.
After the seventh feature image Bi13, the eighth feature image Bi22 and the ninth feature image Bi31 are obtained, they are fused to obtain the tenth feature image Bi123 of size 3C*W*H; a convolution with a kernel of size C*1*1 is then applied to the tenth feature image to obtain the fifth feature image Ci corresponding to Bi, whose size equals that of Bi, namely C*W*H.
In this embodiment, the seventh, eighth and ninth feature images are obtained by convolving the second feature image different numbers of times; after they are fused into the tenth feature image, the tenth feature image is convolved to obtain a fifth feature image with the same size as the second feature image. The fifth feature image thus has a stronger receptive field than the second feature image, so portraits of different sizes in the image to be detected can be detected accurately from it, guaranteeing the accuracy of detecting the number and distribution of portraits.
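The three parallel pipelines of FIG. 6 can be sketched as follows; this is an assumed PyTorch rendering (the class name RFEBlock is hypothetical), with independent 3*3 kernels per convolution, a point the embodiment explicitly leaves open.

```python
# Illustrative sketch of the receptive field enhancement of FIG. 6.
import torch
import torch.nn as nn

class RFEBlock(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, c):
        super().__init__()
        conv = lambda: nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.branch3 = nn.Sequential(conv(), conv(), conv())  # -> seventh feature image
        self.branch2 = nn.Sequential(conv(), conv())          # -> eighth feature image
        self.branch1 = conv()                                 # -> ninth feature image
        self.project = nn.Conv2d(3 * c, c, kernel_size=1)     # the C*1*1 convolution

    def forward(self, b_i):
        tenth = torch.cat([self.branch3(b_i),
                           self.branch2(b_i),
                           self.branch1(b_i)], dim=1)  # tenth feature image, 3C channels
        return self.project(tenth)                     # fifth feature image, size C*W*H

c_i = RFEBlock(16)(torch.randn(1, 16, 64, 64))
print(c_i.shape)  # same size as the input second feature image
```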
In one possible implementation, when the distribution of portraits is determined from the sixth feature image, the sixth feature image may be input into a plurality of pre-trained classifiers; the classifiers determine the center point coordinates of the portraits and the rectangular frames marking them, and the distribution of portraits in the image to be detected is then determined from those coordinates and frames.
FIG. 7 is a flowchart of a method 700 for determining the number and distribution of portraits provided in Embodiment 3 of the present application. As shown in FIG. 7, the method 700 includes the following steps:
Step 701: input the normalized sixth feature image into the first classifier to obtain the center point information output by the first classifier.
After the sixth feature image is obtained, it is first normalized so that it can subsequently be input into the pre-trained classifiers, which recognize the portraits in the image to be detected on the basis of the normalized sixth feature image. The normalization applied to the sixth feature image may specifically be group normalization.
The first classifier is trained in advance on image samples and is used to determine, from an input feature image, the center point coordinates of portrait heads in the original image corresponding to that feature image. After normalization, the sixth feature image is input into the first classifier to obtain the center point information it outputs, which indicates the center point coordinates of portrait heads in the image to be detected. Based on the output center point coordinates, the center points of portrait heads can be marked on the image to be detected.
Step 702: input the normalized sixth feature image into the second classifier to obtain the first image frame information output by the second classifier.
The second classifier is trained in advance on image samples and is used to determine, from an input feature image, the rectangular frames marking portrait heads in the corresponding original image. After normalization, the sixth feature image is input into the second classifier to obtain the first image frame information it outputs, which includes the coordinate values of the rectangular frames marking portrait heads in the image to be detected.
In one example, the first image frame information includes the top-left and bottom-right coordinate values of each frame, expressed as offsets from the center point of the portrait head. Since the frames defined by the first image frame information mark the heads of portraits in the image to be detected, combining the center point information output by the first classifier with the first image frame information allows each portrait head to be marked with a rectangular frame on the image to be detected.
Step 703: input the normalized sixth feature image into the third classifier to obtain the second image frame information output by the third classifier.
The third classifier is trained in advance on sample images and is used to determine, from an input feature image, the rectangular frames marking human bodies in the corresponding original image. After normalization, the sixth feature image is input into the third classifier to obtain the second image frame information it outputs, which includes the coordinate values of the rectangular frames marking human bodies in the image to be detected.
In one example, the second image frame information includes the top-left and bottom-right coordinate values of each frame. Since the frames defined by the second image frame information mark human bodies in the image to be detected, each human body can be marked with a rectangle on the image based on the second image frame information.
It should be noted that, due to occlusion and other causes, the image to be detected may not include complete human bodies; for example, it may include only a portrait head, or only a head and upper body. The third classifier trained on image samples can predict the position of the whole human body in the image to be detected from the portrait head, and output the coordinate values of the frame marking that body.
Step 704: determine the distribution of portraits in the image to be detected according to the center point information, the first image frame information and the second image frame information.
Since the center point information indicates the center point coordinates of portrait heads in the image to be detected, the first image frame information indicates the frames marking portrait heads and the second image frame information indicates the frames marking human bodies, the positions of portrait heads can be determined from the center point and first image frame information, and the positions of human bodies from the second image frame information. The number of portraits can then be determined from the number of portrait heads or human bodies, and the distribution of portraits from the positions of the heads and bodies.
In this embodiment, a plurality of classifiers are trained in advance, and the normalized sixth feature image is input into each of them to obtain the center point information, the first image frame information and the second image frame information. The positions of portrait heads are determined from the center point and first image frame information, and the positions of human bodies from the second image frame information; the number of portraits is determined from the number of head-marking or body-marking frames, and their distribution from the positions of those frames. Determining the number and distribution of portraits based on center point information, with coordinate offsets used to define the head-marking frames, speeds up the second classifier. Determining the number and distribution of portraits from both head-marking and body-marking frames avoids conflicts between head features and body features, so the portraits in the image to be detected are determined more accurately, and after the head-marking and body-marking frames are mapped into the corresponding venue, the number and distribution of people gathered there can be determined accurately.
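A minimal sketch of the classifier heads follows, assuming group normalization of the sixth feature image and 1*1 convolutional heads for the four classifiers; the head structure and the sigmoid activations are assumptions of this example, not details fixed by the embodiment.

```python
# Illustrative sketch of the heads of FIG. 7 plus the optional fourth classifier.
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, c, groups=8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, c)       # group normalization of D
        self.center = nn.Conv2d(c, 1, 1)          # first classifier: head center points
        self.head_box = nn.Conv2d(c, 4, 1)        # second classifier: head box offsets
        self.body_box = nn.Conv2d(c, 4, 1)        # third classifier: body box coordinates
        self.box_quality = nn.Conv2d(c, 1, 1)     # fourth classifier: frame quality

    def forward(self, d):
        d = self.norm(d)                          # normalized sixth feature image
        return (self.center(d).sigmoid(),         # center point information
                self.head_box(d),                 # first image frame information
                self.body_box(d),                 # second image frame information
                self.box_quality(d).sigmoid())    # image frame quality information

outs = DetectionHeads(16)(torch.randn(1, 16, 64, 64))
print([o.shape for o in outs])
```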
Optionally, on the basis of the method 700 shown in FIG. 7, a fourth classifier may be trained in advance on image samples; it is used to determine, from an input feature image, information characterizing how accurately the rectangular frames marking portrait heads in the corresponding original image mark those heads. After the sixth feature image is normalized, it is input into the fourth classifier to obtain the image frame quality information it outputs, which indicates how accurately the head-marking frames in the image to be detected mark the portrait heads. After the image frame quality information is obtained, target center points can be determined from the center point information according to it, a target center point being one whose corresponding head-marking frame has an accuracy lower than a preset accuracy threshold, and the coordinate values of the target center points are then deleted from the center point information.
In this embodiment, the pre-trained fourth classifier checks whether the frames determined by the second classifier mark portrait heads accurately. Since each frame determined by the second classifier corresponds to one center point coordinate in the center point information, when a frame is found not to mark a portrait head in the image to be detected accurately, its corresponding center point coordinate is deleted from the center point information and the inaccurate frame is discarded, avoiding misrecognition of portraits and further improving the accuracy of detecting the number and distribution of portraits.
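The filtering described above can be sketched as a simple threshold on per-center quality scores; the tensor shapes and the threshold value are illustrative assumptions.

```python
# Illustrative sketch: drop center points whose frame quality falls below a threshold.
import torch

def filter_centers(centers, quality, threshold=0.5):
    """centers: (K, 2) head center coordinates; quality: (K,) frame quality scores."""
    keep = quality >= threshold   # target center points are those below the threshold
    return centers[keep]          # their coordinates are deleted from the information

centers = torch.tensor([[12.0, 30.0], [48.0, 9.0], [77.0, 52.0]])
quality = torch.tensor([0.9, 0.3, 0.7])
print(filter_centers(centers, quality))  # the second center point is removed
```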
Embodiment 4
FIG. 8 is a schematic diagram of a portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 8, the portrait detection device 800 includes:
an acquisition module 801, configured to acquire an image to be detected, wherein the image to be detected includes at least one portrait;
a generation module 802, configured to generate at least two first feature images of the image to be detected acquired by the acquisition module 801, wherein the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding first feature image;
a fusion module 803, configured to perform feature fusion on the at least two first feature images generated by the generation module 802, to obtain at least two second feature images;
a detection module 804, configured to determine the distribution of portraits in the image to be detected according to the at least two second feature images obtained by the fusion module 803.
In this embodiment of the present application, the acquisition module 801 may be used to execute step 101 in Embodiment 1 above, the generation module 802 step 102, the fusion module 803 step 103, and the detection module 804 step 104.
In one possible implementation, as shown in FIG. 8, the fusion module 803 is configured to perform feature fusion on at least two adjacently generated first feature images according to the generation order of the first feature images, to obtain at least two second feature images, wherein different second feature images are obtained by feature fusion of sets of at least two first feature images that are not completely identical.
FIG. 9 is a schematic diagram of another portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 9, the fusion module 803 includes:
a convolution sub-module 8031, configured to perform convolution processing on the Nth generated first feature image according to the generation order of the first feature images, to obtain a second feature image corresponding to the Nth generated first feature image, where N is the number of first feature images;
a first fusion sub-module 8032, configured to perform feature fusion on the second feature image corresponding to the nth generated first feature image obtained by the convolution sub-module 8031 and the (n-1)th generated first feature image, to obtain a second feature image corresponding to the (n-1)th generated first feature image, where n is an integer greater than 1 and less than or equal to N.
In one possible implementation, as shown in FIG. 9, the first fusion sub-module 8032 is configured to perform the following operations:
performing convolution processing on the second feature image corresponding to the nth generated first feature image to obtain a third feature image, wherein the second feature image corresponding to the nth generated first feature image and the third feature image both have size C*W*H, C being the number of channels, W the image width and H the image height;
performing bilinear interpolation on the third feature image to obtain a fourth feature image of size C*2W*2H;
performing feature fusion on the fourth feature image and the (n-1)th generated first feature image to obtain a fifth feature image, wherein the (n-1)th generated first feature image has size C*2W*2H and the fifth feature image has size 2C*2W*2H;
performing convolution processing on the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image, whose size is C*2W*2H.
FIG. 10 is a schematic diagram of yet another portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 10, the detection module 804 includes:
an enhancement sub-module 8041, configured to perform receptive field enhancement processing on each second feature image to obtain a corresponding fifth feature image;
a second fusion sub-module 8042, configured to perform feature fusion on the fifth feature images obtained by the enhancement sub-module 8041, to obtain one sixth feature image;
a detection sub-module 8043, configured to determine the distribution of portraits in the image to be detected according to the sixth feature image obtained by the second fusion sub-module 8042.
In one possible implementation, as shown in FIG. 10, the enhancement sub-module 8041 is configured to perform the following processing for each second feature image:
performing three convolution processing passes on the second feature image to obtain a seventh feature image, wherein the second feature image and the seventh feature image both have size C*W*H, C being the number of channels, W the image width and H the image height;
performing two convolution processing passes on the second feature image to obtain an eighth feature image of size C*W*H;
performing one convolution processing pass on the second feature image to obtain a ninth feature image of size C*W*H;
performing feature fusion on the seventh, eighth and ninth feature images to obtain a tenth feature image of size 3C*W*H;
performing convolution processing on the tenth feature image to obtain the fifth feature image corresponding to the second feature image, whose size is C*W*H.
In one possible implementation, as shown in FIG. 10, the detection sub-module 8043 is configured to perform the following processing:
inputting the normalized sixth feature image into a first classifier to obtain center point information output by the first classifier, wherein the first classifier is used to determine, from an input feature image, the center point coordinates of portrait heads in the original image corresponding to that feature image, and the center point information indicates the center point coordinates of portrait heads in the image to be detected;
inputting the normalized sixth feature image into a second classifier to obtain first image frame information output by the second classifier, wherein the second classifier is used to determine, from an input feature image, the rectangular frames marking portrait heads in the corresponding original image, and the first image frame information includes the coordinate values of the rectangular frames marking portrait heads in the image to be detected;
inputting the normalized sixth feature image into a third classifier to obtain second image frame information output by the third classifier, wherein the third classifier is used to determine, from an input feature image, the rectangular frames marking human bodies in the corresponding original image, and the second image frame information includes the coordinate values of the rectangular frames marking human bodies in the image to be detected;
determining the distribution of portraits in the image to be detected according to the center point information, the first image frame information and the second image frame information.
FIG. 11 is a schematic diagram of still another portrait detection device 800 provided in Embodiment 4 of the present application. As shown in FIG. 11, the portrait detection device 800 further includes:
a calculation module 805, configured to input the normalized sixth feature image into a fourth classifier to obtain image frame quality information output by the fourth classifier, wherein the fourth classifier is used to determine, from an input feature image, information characterizing how accurately the rectangular frames marking portrait heads in the corresponding original image mark those heads, and the image frame quality information indicates how accurately the rectangular frames marking portrait heads in the image to be detected mark the portrait heads;
a screening module 806, configured to determine target center points from the center point information according to the image frame quality information obtained by the calculation module 805, wherein the accuracy of the rectangular frame marking a portrait head that corresponds to a target center point is lower than a preset accuracy threshold;
a deletion module 807, configured to delete the coordinate values of the target center points determined by the screening module 806 from the center point information.
It should be noted that, since the information exchange and execution processes among the modules and sub-modules of the above portrait detection device are based on the same concept as the foregoing portrait detection method embodiments, the details can be found in the descriptions of those embodiments and are not repeated here.
Embodiment 5
FIG. 12 is a schematic diagram of an electronic device provided in Embodiment 5 of the present application; the specific embodiments of the present application do not limit the specific implementation of the electronic device. Referring to FIG. 12, the electronic device 1200 provided by an embodiment of the present application includes: a processor (processor) 1202, a communications interface (Communications Interface) 1204, a memory (memory) 1206, and a communication bus 1208, wherein:
the processor 1202, the communications interface 1204 and the memory 1206 communicate with one another through the communication bus 1208;
the communications interface 1204 is used for communicating with other electronic devices or servers;
the processor 1202 is used to execute the program 1210, and may specifically execute the relevant steps in any of the foregoing portrait detection method embodiments.
Specifically, the program 1210 may include program code, and the program code includes computer operation instructions.
The processor 1202 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in a smart device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 1206 is used to store the program 1210. The memory 1206 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 1210 may specifically be used to cause the processor 1202 to execute the portrait detection method in any of the foregoing embodiments.
For the specific implementation of each step in the program 1210, reference may be made to the corresponding descriptions of the steps and units in any of the foregoing portrait detection method embodiments, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments and are not repeated here.
With the electronic device of the embodiments of the present application, after the image to be detected including portraits is acquired, features are extracted from it to obtain a plurality of first feature images, which are fused to obtain a plurality of second feature images, and the distribution of portraits in the image to be detected is determined from the second feature images. Since the image can be collected from a given venue, the portraits in it can be mapped into that venue, so the number and distribution of gathered people can be determined from the distribution of portraits in the image; the number and distribution of gathered people are detected automatically, without stationing staff at each entrance and exit to count people, which saves manpower and reduces the cost of counting the people gathered in the venue.
The present application also provides a computer-readable storage medium storing instructions for causing a machine to execute the image detection method as described herein. Specifically, a system or device equipped with a storage medium may be provided, on which software program code realizing the functions of any of the above embodiments is stored, and the computer (or CPU or MPU) of the system or device is caused to read and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium realizes the functions of any of the above embodiments, so the program code and the storage medium storing it constitute a part of the present application.
Examples of storage media for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer via a communication network.
Furthermore, it should be clear that some or all of the actual operations may be completed not only by executing the program code read out by a computer, but also by having an operating system or the like running on the computer act on instructions based on the program code, thereby realizing the functions of any of the above embodiments.
Furthermore, it can be understood that the program code read from the storage medium may be written into a memory provided in an expansion board inserted into the computer or into a memory provided in an expansion module connected to the computer, and instructions based on the program code then cause a CPU or the like mounted on the expansion board or expansion module to perform some or all of the actual operations, thereby realizing the functions of any of the above embodiments.
An embodiment of the present application further provides a computer program product tangibly stored on a computer-readable medium and including computer-executable instructions which, when executed, cause at least one processor to execute the portrait detection methods provided in the above embodiments. It should be understood that the solutions in this embodiment have the corresponding technical effects of the above method embodiments, which are not repeated here.
It should be noted that not all the steps and modules in the above flows and system structure diagrams are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and may be adjusted as required. The system structures described in the above embodiments may be physical or logical structures; that is, some modules may be realized by the same physical entity, some modules may be realized by several physical entities, or some modules may be realized jointly by certain components in several independent devices.
In the above embodiments, a hardware module may be implemented mechanically or electrically. For example, a hardware module may include permanently dedicated circuitry or logic (such as a dedicated processor, an FPGA or an ASIC) to complete the corresponding operations. A hardware module may also include programmable logic or circuitry (such as a general-purpose processor or another programmable processor) that can be temporarily configured by software to complete the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be determined based on cost and time considerations.
The present application has been shown and described in detail above through the drawings and preferred embodiments; however, the present application is not limited to these disclosed embodiments. Based on the above embodiments, those skilled in the art will appreciate that further embodiments of the present application can be obtained by combining the means of the different embodiments above, and such embodiments also fall within the protection scope of the present application.

Claims (12)

  1. A portrait detection method (100), characterized by comprising:
    acquiring (101) an image to be detected, wherein the image to be detected includes at least one portrait;
    generating (102) at least two first feature images of the image to be detected, wherein the first of the first feature images is obtained by extracting features from the image to be detected, and each subsequent first feature image is obtained by extracting features from the preceding first feature image;
    performing feature fusion (103) on the at least two first feature images to obtain at least two second feature images;
    determining (104) the distribution of portraits in the image to be detected according to the at least two second feature images.
  2. The method according to claim 1, characterized in that performing feature fusion (103) on the at least two first feature images to obtain at least two second feature images comprises:
    performing feature fusion on at least two adjacently generated first feature images according to the generation order of the first feature images, to obtain at least two second feature images, wherein different second feature images are obtained by feature fusion of sets of at least two first feature images that are not completely identical.
  3. The method according to claim 2, characterized in that performing feature fusion on at least two adjacently generated first feature images according to the generation order of the first feature images to obtain at least two second feature images comprises:
    performing convolution processing on the Nth generated first feature image according to the generation order of the first feature images, to obtain a second feature image corresponding to the Nth generated first feature image, where N is the number of first feature images;
    performing feature fusion on the second feature image corresponding to the nth generated first feature image and the (n-1)th generated first feature image, to obtain a second feature image corresponding to the (n-1)th generated first feature image, where n is an integer greater than 1 and less than or equal to N.
  4. The method according to claim 3, characterized in that performing feature fusion on the second feature image corresponding to the nth generated first feature image and the (n-1)th generated first feature image to obtain a second feature image corresponding to the (n-1)th generated first feature image comprises:
    performing convolution processing (402) on the second feature image corresponding to the nth generated first feature image to obtain a third feature image, wherein the second feature image corresponding to the nth generated first feature image and the third feature image both have a size of C*W*H, C being the number of channels, W the image width and H the image height;
    performing bilinear interpolation (403) on the third feature image to obtain a fourth feature image, wherein the fourth feature image has a size of C*2W*2H;
    performing feature fusion (404) on the fourth feature image and the (n-1)th generated first feature image to obtain a fifth feature image, wherein the (n-1)th generated first feature image has a size of C*2W*2H and the fifth feature image has a size of 2C*2W*2H;
    performing convolution processing (405) on the fifth feature image to obtain the second feature image corresponding to the (n-1)th generated first feature image, wherein the second feature image corresponding to the (n-1)th generated first feature image has a size of C*2W*2H.
  5. The method according to any one of claims 1 to 4, characterized in that determining the distribution of portraits in the image to be detected according to the at least two second feature images comprises:
    performing receptive field enhancement processing on each second feature image to obtain a corresponding fifth feature image;
    performing feature fusion on the fifth feature images to obtain one sixth feature image;
    determining the distribution of portraits in the image to be detected according to the sixth feature image.
  6. The method according to claim 5, characterized in that performing receptive field enhancement processing on each second feature image to obtain a corresponding fifth feature image comprises:
    for each second feature image, performing:
    three convolution processing passes on the second feature image to obtain a seventh feature image, wherein the second feature image and the seventh feature image both have a size of C*W*H, C being the number of channels, W the image width and H the image height;
    two convolution processing passes on the second feature image to obtain an eighth feature image, wherein the eighth feature image has a size of C*W*H;
    one convolution processing pass on the second feature image to obtain a ninth feature image, wherein the ninth feature image has a size of C*W*H;
    feature fusion on the seventh feature image, the eighth feature image and the ninth feature image to obtain a tenth feature image, wherein the tenth feature image has a size of 3C*W*H;
    convolution processing on the tenth feature image to obtain the fifth feature image corresponding to the second feature image, wherein the fifth feature image has a size of C*W*H.
  7. The method according to claim 5, characterized in that determining the distribution of portraits in the image to be detected according to the sixth feature image comprises:
    inputting (701) the normalized sixth feature image into a first classifier to obtain center point information output by the first classifier, wherein the first classifier is used to determine, from an input feature image, the center point coordinates of portrait heads in the original image corresponding to that feature image, and the center point information indicates the center point coordinates of portrait heads in the image to be detected;
    inputting (702) the normalized sixth feature image into a second classifier to obtain first image frame information output by the second classifier, wherein the second classifier is used to determine, from an input feature image, rectangular frames marking portrait heads in the original image corresponding to that feature image, and the first image frame information includes the coordinate values of the rectangular frames marking portrait heads in the image to be detected;
    inputting (703) the normalized sixth feature image into a third classifier to obtain second image frame information output by the third classifier, wherein the third classifier is used to determine, from an input feature image, rectangular frames marking human bodies in the original image corresponding to that feature image, and the second image frame information includes the coordinate values of the rectangular frames marking human bodies in the image to be detected;
    determining (704) the distribution of portraits in the image to be detected according to the center point information, the first image frame information and the second image frame information.
  8. The method according to claim 7, characterized in that the method further comprises:
    inputting the normalized sixth feature image into a fourth classifier to obtain image frame quality information output by the fourth classifier, wherein the fourth classifier is used to determine, from an input feature image, information characterizing how accurately the rectangular frames marking portrait heads in the original image corresponding to that feature image mark the portrait heads, and the image frame quality information indicates how accurately the rectangular frames marking portrait heads in the image to be detected mark the portrait heads;
    determining target center points from the center point information according to the image frame quality information, wherein the accuracy of the rectangular frame marking a portrait head that corresponds to a target center point is lower than a preset accuracy threshold;
    deleting the coordinate values of the target center points from the center point information.
  9. A portrait detection device (800), characterized by comprising modules for executing the operations of the method according to any one of claims 1-8.
  10. An electronic device (1200), characterized by comprising: a processor (1202), a communications interface (1204), a memory (1206) and a communication bus (1208), wherein the processor (1202), the memory (1206) and the communications interface (1204) communicate with one another through the communication bus (1208);
    the memory (1206) is used to store at least one executable instruction, and the executable instruction causes the processor (1202) to execute the operations corresponding to the portrait detection method according to any one of claims 1-8.
  11. A computer-readable storage medium, characterized in that computer instructions are stored on the computer-readable storage medium, and when executed by a processor, the computer instructions cause the processor to execute the method according to any one of claims 1-8.
  12. A computer program product, characterized in that the computer program product is tangibly stored on a computer-readable medium and includes computer-executable instructions which, when executed, cause at least one processor to execute the method according to any one of claims 1-8.
PCT/CN2022/107190 2021-08-26 2022-07-21 Portrait detection method and device, electronic device and storage medium WO2023024779A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110987099.9A 2021-08-26 2021-08-26 Portrait detection method and device, electronic device and storage medium
CN202110987099.9 2021-08-26

Publications (1)

Publication Number Publication Date
WO2023024779A1 true WO2023024779A1 (zh) 2023-03-02

Family

ID=85289928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107190 WO2023024779A1 (zh) 2021-08-26 2022-07-21 Portrait detection method and device, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115731585A (zh)
WO (1) WO2023024779A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555481A * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style recognition method and device, and computer-readable storage medium
CN111274994A * 2020-02-13 2020-06-12 腾讯科技(深圳)有限公司 Cartoon face detection method and device, electronic device and computer-readable medium
CN111783749A * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic device and storage medium
US20200356762A1 * 2017-11-10 2020-11-12 Koninklijke Philips N.V. Change-aware person identification
CN114220126A * 2021-12-17 2022-03-22 杭州晨鹰军泰科技有限公司 Object detection system and acquisition method


Also Published As

Publication number Publication date
CN115731585A (zh) 2023-03-03

Similar Documents

Publication Publication Date Title
Masood et al. License plate detection and recognition using deeply learned convolutional neural networks
US11455805B2 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
WO2020125216A1 (zh) Pedestrian re-identification method and apparatus, electronic device and computer-readable storage medium
CN108875481B (zh) Method, apparatus and system for pedestrian detection, and storage medium
CN112487848B (zh) Character recognition method and terminal device
CN110765833A (zh) Crowd density estimation method based on deep learning
CN109344746B (zh) Pedestrian counting method and system, computer device and storage medium
CN107292302B (zh) Method and system for detecting points of interest in an image
KR20110020718A (ko) Target analysis apparatus and method
CN112016605A (zh) Object detection method based on bounding-box corner alignment and boundary matching
WO2019119515A1 (zh) Face analysis and filtering methods, apparatus, embedded device, medium and integrated circuit
CN106874913A (zh) Dish detection method
CN107657220A (zh) Automatic detection method for leucorrhea mold based on HOG features and SVM
CN111008576A (zh) Pedestrian detection and model training and updating methods therefor, device and readable storage medium
CN110599463A (zh) Tongue image detection and positioning algorithm based on a lightweight cascaded neural network
CN113111838A (zh) Behavior recognition method, apparatus, device and storage medium
CN112036520A (zh) Giant panda age recognition method and apparatus based on deep learning, and storage medium
CN113076860B (zh) Bird detection system for field scenes
CN110728193A (zh) Method and device for detecting richness features of facial images
WO2023024779A1 (zh) Portrait detection method and device, electronic device and storage medium
CN112287905A (zh) Vehicle damage recognition method, apparatus, device and storage medium
CN114708582B (zh) AI- and RPA-based intelligent power data audit method and apparatus
KR102416714B1 (ko) System and method for city-scale tree mapping using 3D images and deep learning
CN113159193B (zh) Model training method, image recognition method, storage medium and program product
Creusen et al. A semi-automatic traffic sign detection, classification, and positioning system

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE