CN108229353B - Human body image classification method and apparatus, electronic device, storage medium, and program - Google Patents

Publication number
CN108229353B
CN108229353B (application CN201711399693.6A)
Authority
CN
China
Prior art keywords
human body
image
mask
value
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711399693.6A
Other languages
Chinese (zh)
Other versions
CN108229353A (en)
Inventor
李玉洁
旷章晖
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN201711399693.6A priority Critical patent/CN108229353B/en
Publication of CN108229353A publication Critical patent/CN108229353A/en
Application granted granted Critical
Publication of CN108229353B publication Critical patent/CN108229353B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a human body image classification method and apparatus, an electronic device, a storage medium, and a program. The method comprises: respectively performing feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image, the image to be processed containing at least one human body region; fusing the image features and the region mask to obtain region enhancement features of the human body region; and obtaining a classification result of the image to be processed from the region enhancement features based on a classification network. Embodiments of the invention can effectively attend to local information of the image rather than predicting the classification result of the image to be processed from global information alone.

Description

Human body image classification method and apparatus, electronic device, storage medium, and program
Technical Field
The present invention relates to computer vision technologies, and in particular, to a human body image classification method and apparatus, an electronic device, a storage medium, and a program.
Background
Image classification is an image processing method that distinguishes targets of different categories according to the different characteristics reflected in image information. It typically replaces human visual interpretation by using a computer to quantitatively analyze an image and assign the image, or each element in it, to one of several categories.
With the popularity of live webcasting, its regulation has been put on the agenda, and pornography monitoring of live streams is an important issue. At present, pornography monitoring in images or videos is mainly based on deep learning methods that directly extract features and classify them.
Disclosure of Invention
The embodiment of the invention provides a human body image classification technology.
The classification method of the human body image provided by the embodiment of the invention comprises the following steps:
respectively carrying out feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image to be processed; the image to be processed comprises at least one human body area; the region mask is used for highlighting a human body region in the image to be processed;
fusing the image features and the area mask to obtain area enhancement features of the human body area;
and obtaining a classification result of the image to be processed according to the regional enhancement features based on a classification network.
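The three claimed steps can be sketched as follows. This is an illustrative outline only: the callables `extract_features`, `extract_mask`, and `classify` are hypothetical stand-ins for the first neural network, the second neural network, and the classification network, whose architectures the embodiments do not fix, and the multiplicative fusion shown is just one of the fusion options described later.

```python
import numpy as np

def classify_human_image(image, extract_features, extract_mask, classify):
    """Sketch of the claimed pipeline: extract features and a region mask,
    fuse them, and classify the fused (region-enhanced) features."""
    features = extract_features(image)   # image features, H x W x C
    mask = extract_mask(image)           # region mask, H x W, larger near the body
    # Fusion: broadcast the mask over the channel dimension and multiply,
    # so features near the human body region are enhanced.
    enhanced = features * mask[..., None]
    return classify(enhanced)
```

The point of the structure is that the mask modulates the features before classification, so the classifier sees region-weighted rather than raw global features.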
In another embodiment based on the foregoing method of the present invention, the performing feature extraction and mask extraction operations on the image to be processed respectively to obtain image features corresponding to the image to be processed and an area mask corresponding to a human body area in the image to be processed includes:
performing feature extraction operation on an image to be processed by utilizing a first neural network to obtain image features corresponding to the image to be processed;
performing mask extraction operation on an image to be processed by utilizing a second neural network to obtain a region mask corresponding to a human body region in the image to be processed; the image features are the same size as the area mask in both length and width dimensions.
In another embodiment based on the above method of the present invention, the performing mask extraction on the image to be processed by using the second neural network to obtain an area mask corresponding to a human body area in the image to be processed includes:
performing human body key point detection operation on the image to be processed by utilizing a second neural network to obtain a key point feature map with a human body region outline; the human body key points are used for identifying the positions of human body areas in the image to be processed;
and obtaining the area mask by calculating the distances between all pixel points in the key point feature map and all the human body key points.
In another embodiment based on the above method of the present invention, obtaining the region mask by calculating distances between all pixel points in the keypoint feature map and all the human keypoints, includes:
calculating the distances between the pixel points in the key point feature map and all the human body key points to obtain the target distance values of the pixel points;
obtaining a mask value corresponding to the pixel point based on the target distance value of the pixel point by utilizing a normal distribution function;
and constructing the area mask based on mask values corresponding to all the pixel points in the key point feature map.
In another embodiment based on the above method of the present invention, the obtaining the target distance value of the pixel point by calculating the distance between the pixel point in the keypoint feature map and all the human keypoints includes:
calculating Euclidean distances between the pixel points in the key point feature map and the key points of the human body to obtain a plurality of distance values;
and taking the minimum distance value in the distance values as the target distance value of the pixel point.
In another embodiment of the foregoing method, the obtaining, by using a normal distribution function, a mask value corresponding to the pixel point based on the target distance value of the pixel point includes:
inputting the target distance value of the pixel point into a normal distribution function to obtain a mask intermediate value corresponding to the pixel point;
and calculating to obtain a mask value corresponding to the pixel point based on the mask intermediate value.
In another embodiment of the foregoing method according to the present invention, the obtaining a mask value corresponding to the pixel point based on the mask intermediate value includes:
adding a set value to the obtained mask intermediate value to obtain a corresponding mask value; the mask value corresponding to the pixel point with the distance to the human body key point smaller than the preset value is larger than a preset threshold value, and the mask value corresponding to the pixel point with the distance to the human body key point larger than the preset value is smaller than the preset threshold value.
In another embodiment based on the above method of the present invention, fusing the image feature and the region mask to obtain a region enhancement feature of the human body region, including:
performing dot product operation on elements in the image features and corresponding elements in the area mask to obtain area enhancement features of the human body area after the image features are enhanced; the image features correspond to sizes of the region masks in a length dimension and a width dimension, and the elements include pixel points in a feature map corresponding to the image features or vector values in feature vectors corresponding to the image features.
In another embodiment of the foregoing method according to the present invention, obtaining a classification result of the image to be processed according to the regional enhancement features based on a classification network includes:
obtaining a central point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
inputting the regional enhancement features of the central point coordinates into a classification network, obtaining a classification probability value through the classification network, and obtaining a classification result based on the classification probability value.
In another embodiment of the foregoing method according to the present invention, obtaining coordinates of a center point based on the obtained region enhancement features includes:
and screening the image characteristic diagram with enhanced human body region characteristics after fusion to obtain the human body region characteristics of the corresponding human body region, and obtaining the central point coordinates of the corresponding human body region based on the human body region characteristics.
In another embodiment of the above method according to the present invention, obtaining coordinates of a center point of the corresponding body region based on the body region features includes:
and obtaining all coordinates of the corresponding human body region in the image feature map based on the human body region features, and solving a two-dimensional mean value of all the coordinates of the corresponding human body region to obtain the coordinates of the central point of the corresponding human body region.
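The center-point computation above, taking the two-dimensional mean over all coordinates of a region, can be sketched as follows (the function name `region_center` is illustrative, not from the source):

```python
import numpy as np

def region_center(coords):
    """Center point of a body region: the two-dimensional mean of all
    coordinates belonging to that region. coords: iterable of (row, col)."""
    pts = np.asarray(coords, dtype=float)
    return pts.mean(axis=0)  # (mean_row, mean_col)
```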
According to another aspect of the embodiments of the present invention, there is provided a human body image classification apparatus including:
the extraction unit is used for respectively carrying out feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image to be processed; the image to be processed comprises at least one human body area; the region mask is used for highlighting a human body region in the image to be processed;
the fusion unit is used for fusing the image characteristics and the area mask to obtain the area enhancement characteristics of the human body area;
and the classification unit is used for obtaining a classification result of the image to be processed according to the regional enhancement features based on a classification network.
In another embodiment of the above apparatus according to the present invention, the extracting unit includes:
the characteristic extraction module is used for carrying out characteristic extraction operation on the image to be processed by utilizing a first neural network to obtain image characteristics corresponding to the image to be processed;
the mask extraction module is used for performing mask extraction operation on the image to be processed by utilizing a second neural network to obtain an area mask corresponding to the human body area in the image to be processed; the image features are the same size as the area mask in both length and width dimensions.
In another embodiment of the foregoing apparatus according to the present invention, the mask extracting module includes:
the key point detection module is used for executing human key point detection operation on the image to be processed by utilizing a second neural network to obtain a key point characteristic diagram with a human body region outline; the human body key points are used for identifying the positions of human body areas in the image to be processed;
and the mask area module is used for obtaining the area mask by calculating the distances between all pixel points in the key point feature map and all the human body key points.
In another embodiment of the foregoing apparatus according to the present invention, the mask area module includes:
the distance calculation module is used for calculating the distances between the pixel points in the key point feature map and all the human body key points to obtain the target distance values of the pixel points;
the mask value calculation module is used for obtaining a mask value corresponding to the pixel point based on the target distance value of the pixel point by utilizing a normal distribution function;
and the region determining module is used for forming the region mask based on mask values corresponding to all the pixel points in the key point feature map.
In another embodiment of the above apparatus according to the present invention, the distance calculating module is specifically configured to calculate euclidean distances between the pixel points in the key point feature map and the human key points, and obtain a plurality of distance values; and taking the minimum distance value in the distance values as the target distance value of the pixel point.
In another embodiment of the above apparatus according to the present invention, the mask value calculating module includes:
the intermediate value calculation module is used for inputting the target distance value of the pixel point into a normal distribution function to obtain a mask intermediate value corresponding to the pixel point;
and the mask value acquisition module is used for calculating to obtain a mask value corresponding to the pixel point based on the mask intermediate value.
In another embodiment based on the above apparatus of the present invention, the mask value obtaining module is specifically configured to add a set value to the obtained mask intermediate value to obtain a corresponding mask value; the mask value corresponding to the pixel point with the distance to the human body key point smaller than the preset value is larger than the preset threshold value, and the mask value corresponding to the pixel point with the distance to the human body key point larger than the preset value is smaller than the preset threshold value.
In another embodiment of the apparatus according to the present invention, the fusion unit is specifically configured to perform a dot product operation on elements in the image feature and corresponding elements in the area mask to obtain an area enhanced feature of the human body area after the image feature is enhanced; the image features correspond to sizes of the region masks in a length dimension and a width dimension, and the elements include pixel points in a feature map corresponding to the image features or vector values in feature vectors corresponding to the image features.
In another embodiment of the above apparatus according to the present invention, the classification unit includes:
the center obtaining module is used for obtaining a center point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
and the probability classification module is used for inputting the regional enhancement features of the central point coordinates into a classification network, obtaining a classification probability value through the classification network and obtaining a classification result based on the classification probability value.
In another embodiment of the apparatus according to the present invention, the center obtaining module is specifically configured to screen the image feature map with enhanced human body region features after fusion to obtain human body region features corresponding to the human body region, and obtain the coordinates of the center point of the corresponding human body region based on the human body region features.
In another embodiment of the above apparatus according to the present invention, the center obtaining module is further configured to obtain all coordinates of a corresponding human body region in the image feature map based on the human body region feature, and calculate a two-dimensional average value for all coordinates of the corresponding human body region to obtain a center point coordinate of the corresponding human body region.
According to another aspect of the embodiments of the present invention, there is provided an electronic device, including a processor, the processor including the human body image classification apparatus as described above.
According to an aspect of an embodiment of the present invention, there is provided an electronic device, including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions to perform the operations of the human body image classification method as described above.
According to an aspect of the embodiments of the present invention, there is provided a computer storage medium for storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the human body image classification method as described above.
According to an aspect of the embodiments of the present invention, there is provided a computer program, including computer readable code, characterized in that when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps in the classification method of human body images as described above.
Based on the human body image classification method and apparatus, electronic device, storage medium, and program provided by the embodiments of the present invention, feature extraction and mask extraction are respectively performed on an image to be processed to obtain image features corresponding to the image and a region mask corresponding to the human body region in it. This solves the technical problem of classifying only by feature extraction while ignoring the human body region in the image. Fusing the image features with the region mask yields the region enhancement features of the human body region, strengthening the features corresponding to the human body region and achieving the technical effect of using them as the main basis for classification. The region enhancement features are then input into a classification network to obtain the corresponding output, so that local information of the image is effectively attended to instead of predicting the classification result of the image to be processed from global information alone.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of a human body image classification method according to an embodiment of the present invention.
Fig. 2 is a specific example of the classification method of human body images according to the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the human body image classification device of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device for implementing a terminal device or a server according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
The computer system/server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In the prior art, pornography detection is mainly based on deep learning methods that directly extract features and classify them. Although the accuracy of image classification has improved markedly with the development of deep learning, difficult cases remain in the complex classification scenario of image pornography detection: for example, a small, localized pornographic action or an exposed sensitive region within an otherwise normal scene is hard to handle with direct classification.
Fig. 1 is a flowchart of a human body image classification method according to an embodiment of the present invention. As shown in fig. 1, the method of this embodiment includes:
step 101, respectively performing feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image to be processed.
Wherein the image to be processed contains at least one human body region, and the region mask is used to highlight the human body region in the image to be processed. A mask is a string of binary codes that performs a bitwise AND with a target field, shielding the current input bits. Here the mask is generated mainly according to distance to the key points: its value is large near the key points and small far from them, so that when features are fused, features near the human body region are enhanced and features far from it are suppressed, letting the classifier focus its attention on the region that needs it (the human body region). In this embodiment, the region mask is obtained by mask extraction based on the human body region in the image to be processed and is used to weight the feature values of the image: the value of the region mask is large over the human body region and small far from it.
And step 102, fusing the image characteristics and the area mask to obtain the area enhancement characteristics of the human body area.
Fusing the features with the region mask gives the human body region a distinguished representation within the features. Specifically, the fusion may multiply or superimpose (add) the feature points at corresponding positions; the invention does not limit the specific fusion method.
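The two fusion options just mentioned, multiplying or superimposing corresponding positions, can be sketched as follows. The helper `fuse` and its `mode` parameter are hypothetical conveniences, not names from the source; the mask is broadcast over the channel dimension so that it aligns with features of the same length and width.

```python
import numpy as np

def fuse(features, mask, mode="multiply"):
    """Fuse H x W x C features with an H x W region mask at corresponding
    positions, by element-wise multiplication or by superimposing (adding)."""
    m = mask[..., None]                  # H x W -> H x W x 1 for broadcasting
    if mode == "multiply":
        return features * m              # suppresses features far from the body
    return features + m                  # additive "superimpose" variant
```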
And 103, obtaining a classification result of the image to be processed according to the regional enhancement features based on the classification network.
Based on the human body image classification method provided by this embodiment of the invention, feature extraction and mask extraction are respectively performed on the image to be processed to obtain image features corresponding to the image and a region mask corresponding to the human body region in it. This solves the technical problem of classifying only by feature extraction while ignoring the human body region in the image. Fusing the image features with the region mask yields the region enhancement features of the human body region, strengthening the features corresponding to the human body region so that they serve as the main basis for classification. The region enhancement features are input into the classification network to obtain the corresponding output, so that local information of the image is effectively attended to instead of predicting the classification result of the image to be processed from global information alone.
In another embodiment of the classification method for human body images according to the present invention, based on the above embodiment, operation 101 includes:
performing feature extraction operation on the image to be processed by utilizing a first neural network to obtain image features corresponding to the image to be processed;
performing mask extraction operation on the image to be processed by utilizing a second neural network to obtain an area mask corresponding to the human body area in the image to be processed; the image features are the same size as the area mask in both the length and width dimensions.
In this embodiment, feature extraction and mask extraction are performed by different neural networks. The mask extraction network may reuse a key point extraction network, for example a Mask R-CNN key point detection network, while the feature extraction network may be a convolutional neural network (CNN). Because different tasks make the extracted features attend to different details, weights are not shared among the first neural network, the second neural network, and the classification network. The features have four dimensions; "size" in this embodiment refers to their length and width, and the features and the region mask being the same size in the length and width dimensions is what makes feature fusion possible.
In a specific example of the above embodiments of the classification method for human body images according to the present invention, performing a mask extraction operation on an image to be processed by using a second neural network to obtain a region mask corresponding to a human body region in the image to be processed, includes:
performing human body key point detection operation on the image to be processed by utilizing a second neural network to obtain a key point characteristic diagram with a human body region outline; the human body key points are used for identifying the positions of human body areas in the image to be processed;
and obtaining the area mask by calculating the distances between all pixel points in the key point feature map and all human body key points.
In this embodiment, since the focus is on the human body region, the second neural network detects key points of the human body, identifies the human body region from the image to be processed, and expresses the position of the corresponding human body region by using a mask to realize the salient representation of the human body region.
In a specific example of the above embodiments of the human body image classification method according to the present invention, obtaining an area mask by calculating distances between all pixel points in a key point feature map and all human body key points includes:
calculating the distances between pixel points in the key point feature map and all human body key points to obtain the target distance value of the pixel points;
acquiring mask values corresponding to the pixel points based on the target distance values of the pixel points by utilizing a normal distribution function;
and forming an area mask based on mask values corresponding to all pixel points in the key point feature map.
Specifically, the mask may be understood as an image with the same length and width as the feature. For any point on the mask, the distances from that point to all key points are calculated, and the minimum of these distances is taken as the target distance value of the point. The target distance value of each point is input as a parameter into the normal distribution function, yielding a value between 0 and 1, and that value plus 1 is the value of the mask. For a key point itself, the minimum distance is 0 and the normalized function output is 1. A normal distribution, also known as a Gaussian distribution, is a probability distribution of a continuous random variable with two parameters μ and σ²: the first parameter μ is the mean of the random variable that follows the normal distribution, and the second parameter σ² is its variance, so the normal distribution is denoted N(μ, σ²). The normal curve is bell-shaped, low at both ends and high in the middle, and symmetrical left and right, so it is often called the bell curve.
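As an illustrative sketch (not the patented implementation itself), the mask construction described above, the minimum distance to the key points fed through a normalized Gaussian and offset by the set value 1, could look like the following; the grid size, the (y, x) key-point format, and the `sigma` parameter are assumptions for illustration:

```python
import numpy as np

def region_mask(height, width, keypoints, sigma=8.0):
    """Mask whose value is 2 at a key point and decays toward 1 far away."""
    ys, xs = np.mgrid[0:height, 0:width]
    min_sq_dist = np.full((height, width), np.inf)
    for ky, kx in keypoints:
        sq_dist = (ys - ky) ** 2 + (xs - kx) ** 2   # squared Euclidean distance
        min_sq_dist = np.minimum(min_sq_dist, sq_dist)
    # Normalized Gaussian of the minimum distance: outputs 1 at distance 0.
    intermediate = np.exp(-min_sq_dist / (2.0 * sigma ** 2))
    return intermediate + 1.0   # add the set value 1, so mask values lie in (1, 2]

mask = region_mask(16, 16, [(4, 4), (10, 12)])
```

The mask value is exactly 2 at the key points themselves and approaches 1 for pixels far from every key point, matching the 1-to-2 range described in the embodiment.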
In a specific example of the above embodiments of the human body image classification method according to the present invention, obtaining a target distance value of a pixel point by calculating distances between the pixel point in the key point feature map and all human body key points includes:
calculating Euclidean distances between pixel points in the key point feature map and key points of a human body to obtain a plurality of distance values;
and taking the minimum distance value in the distance values as the target distance value of the pixel point.
In this embodiment, the distance between a pixel point and a key point is calculated as the Euclidean distance; in practical applications, other distance measures (such as the cosine distance) may also be used. The minimum of the distance values expresses how close a pixel point is to the key points, so the minimum distance value is used as the target distance value: the smaller the target distance value, the closer the pixel is to a human body key point, and a key point itself has a target distance value of 0. By inputting these values into the normal distribution function, a mask is obtained in which the value is largest at the human body key points and smaller for pixel points farther away.
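A minimal sketch of the target-distance computation for a single pixel, under the assumption that pixels and key points are given as (y, x) coordinate pairs:

```python
import numpy as np

def target_distance(pixel, keypoints):
    # Euclidean distance from one pixel to every human body key point;
    # the minimum of these distances is the pixel's target distance value.
    p = np.asarray(pixel, dtype=float)
    return min(np.linalg.norm(p - np.asarray(k, dtype=float)) for k in keypoints)

d = target_distance((0, 0), [(3, 4), (6, 8)])   # distances 5.0 and 10.0, so d = 5.0
```

A key point's own target distance value is 0, which is what makes its mask value maximal after the normal-distribution step.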
In a specific example of the above embodiments of the classification method for human body images according to the present invention, obtaining a mask value corresponding to a pixel point based on a target distance value of the pixel point by using a normal distribution function includes:
inputting the target distance value of the pixel point into a normal distribution function to obtain a mask intermediate value of the corresponding pixel point;
and calculating to obtain a mask value of the corresponding pixel point based on the mask intermediate value.
In this embodiment, distance calculation yields a target distance value that is smaller near a human body key point and larger farther away. Specifically, to calculate the mask value, the target distance value is input as a parameter into the normal distribution function, and a set value (e.g., 1) is added to the result. For a key point, the distance is 0, the normal distribution function outputs 1, and adding the set value (e.g., adding 1 yields 2) produces an increased value, so the features at the key point are amplified. Far from the key points, the distance is large, the normal distribution function output is close to 0, and adding the set value (e.g., adding 1 yields approximately 1) produces a value that represents the features of those points normally, or scales them only slightly. In this way, the human body key points are highlighted in the feature map while the features of other regions are represented normally or reduced only slightly, and the overall features of the image are not ignored.
In a specific example of the above embodiments of the human body image classification method according to the present invention, obtaining a mask value of a corresponding pixel point based on a mask intermediate value includes:
adding a set value to the obtained mask intermediate value to obtain a corresponding mask value; the mask value corresponding to the pixel point with the distance from the human body key point smaller than the preset value is larger than the preset threshold value, and the mask value corresponding to the pixel point with the distance from the human body key point larger than the preset value is smaller than the preset threshold value.
In this embodiment, adding the set value to the mask intermediate value increases the mask value near the human body key points relative to the mask value far from them. Specifically, the mask value ranges from 1 to 2: the mask value at a key point is 2, and the farther from the key points, the smaller the mask value. These are empirical values. When feature fusion is performed via the dot product operation, the feature value at a key point is doubled and the feature is enhanced; a mask value far from the key points is close to 1, so after the dot product the corresponding feature is almost unchanged; the closer a point is to a key point, the closer its mask value is to 2 and the more its local feature is enhanced. In short, features far from the key points are multiplied by approximately 1 and thus retained unchanged after fusion, nearby local features are enhanced, and the key points themselves are enhanced the most. In principle, the lower limit of the value is 1, but the upper limit is not necessarily 2; these are only empirical values and are not intended to limit the invention.
In another embodiment of the classification method for human body images according to the present invention, based on the above embodiment, operation 102 includes:
and performing dot product operation on the elements in the image characteristics and the corresponding elements in the area mask to obtain the area enhancement characteristics of the human body area after the image characteristics are enhanced.
The image features correspond to the region mask in size in the length and width dimensions, and the elements comprise pixel points in the feature map corresponding to the image features, or vector values in the feature vectors corresponding to the image features. Because the mask values corresponding to the human body region are large, the feature values of the human body region become large after the dot product, while the mask values far from the human body region are small, so those image features become small. The image features of the human body region are thus enhanced and the image features far from the human body region are suppressed, yielding enhanced features that attend to the local information of interest.
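The fusion in operation 102 can be sketched as an element-wise multiplication; the channel-first tensor layout and the sizes below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((64, 16, 16))   # image features: channels x height x width
mask = np.full((16, 16), 1.0)
mask[4:8, 4:8] = 2.0                  # hypothetical human body region with mask value 2

# Broadcast the H x W mask over all channels and multiply element-wise.
enhanced = features * mask[None, :, :]
```

Features inside the region are doubled, while features where the mask value is 1 pass through unchanged, so the human body region is enhanced without discarding the rest of the image.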
In another embodiment of the classification method for human body images according to the present invention, based on the above embodiment, operation 103 includes:
obtaining a central point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
and inputting the regional enhancement features of the central point coordinates into a classification network, obtaining a classification probability value through the classification network, and obtaining a classification result based on the classification probability value.
In this embodiment, when the regional enhancement features are classified by a classification network, softmax may specifically be used as the classification network. Softmax is a mapping function. A classification problem may be divided into two categories (e.g., pornography and non-pornography), in which case the network outputs two values: the first represents the possibility of belonging to the non-pornographic category, and the second represents the possibility of belonging to the pornographic category. For softmax, given two numbers a and b with a > b, after softmax a maps to a value close to 1, b maps to a value close to 0, and the two outputs sum to 1; the values after softmax are the probability values that actually represent the classification.
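The two-way softmax behaviour described above can be checked with a short sketch (the logit values are illustrative, not taken from the patent):

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Two outputs: [non-pornographic, pornographic]; the larger logit dominates.
probs = softmax([4.0, -1.0])
```

The outputs always sum to 1, and the larger input is mapped close to 1, which is why they can be read directly as classification probabilities.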
In a specific example of the above embodiments of the human body image classification method according to the present invention, obtaining coordinates of a center point based on the obtained region enhancement features includes:
and screening the image characteristic diagram with enhanced human body region characteristics after fusion to obtain the human body region characteristics of the corresponding human body region, and obtaining the central point coordinates of the corresponding human body region based on the human body region characteristics.
In this embodiment, to obtain the center point coordinates of the body region, the body region needs to be first separated, specifically, the body region features are obtained by screening, and the center point coordinates of the body region can be determined based on the body region features.
In a specific example of the above embodiments of the classification method for human body images according to the present invention, obtaining coordinates of a center point of a corresponding human body region based on characteristics of the human body region includes:
and obtaining all coordinates of the corresponding human body region in the image feature map based on the human body region features, and solving a two-dimensional mean value of all the coordinates of the corresponding human body region to obtain the coordinates of the central point of the corresponding human body region.
In this embodiment, the center point coordinate is obtained by calculating the two-dimensional mean, but in the actual application process, other modes of calculating the center point coordinate may also be applied, and the mode of calculating the center point coordinate is not limited in the present invention.
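A sketch of the two-dimensional mean described above, assuming (as an illustration) that the screened human body region is given as a boolean map over the feature grid:

```python
import numpy as np

def center_point(region):
    # region: boolean H x W map marking the pixels of the human body region.
    ys, xs = np.nonzero(region)
    # Two-dimensional mean of all region coordinates gives the center point.
    return ys.mean(), xs.mean()

region = np.zeros((16, 16), dtype=bool)
region[4:8, 6:10] = True      # hypothetical 4 x 4 human body region
cy, cx = center_point(region)
```

As the embodiment notes, other ways of computing a center coordinate (e.g., a bounding-box center) would serve the same purpose.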
Fig. 2 is a specific example of the classification method of human body images according to the present invention. As shown in fig. 2, when the present invention is applied to pornography detection, first, a feature and a human body region mask (intermediate mask value) are obtained by processing an image to be processed corresponding to a first processing frame; then corresponding to a second processing frame, fusing the obtained features and the reconstructed mask (mask value) to obtain region enhancement features for enhancing the human body region; and finally, corresponding to a third processing frame, outputting a classification result of pornography or non-pornography through the judgment of the CNN neural network.
Although highly abstract global feature information can be obtained after an image passes through a CNN in the prior art, local detail information is lost, so local pornographic information is easily missed by a general classification model in a large (normal) scene. According to the present method, the features of the human body region are enhanced while the global features are not reduced, or are reduced only slightly, so local detail information is not lost while the human body features are highlighted; this improves the pornography detection precision for pictures containing people and reduces the false detection rate.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Fig. 3 is a schematic structural diagram of an embodiment of the human body image classification device of the present invention. The apparatus of this embodiment may be used to implement the method embodiments of the present invention described above. As shown in fig. 3, the apparatus of this embodiment includes:
the extracting unit 31 is configured to perform feature extraction and mask extraction on the image to be processed respectively, so as to obtain an image feature corresponding to the image to be processed and an area mask corresponding to a human body area in the image to be processed.
Wherein the image to be processed comprises at least one human body region, and the region mask is used for highlighting the human body region in the image to be processed. A mask is a string of binary codes that performs a bitwise AND operation on a target field, shielding the current input bits. Specifically, the generation of the mask depends mainly on the distance to the key points: the mask has a large value near the key points and a small value far from them, so that when the features are fused, the features near the human body region are enhanced and the features far from the human body region are suppressed, allowing the classifier's attention to concentrate on the region that needs attention (the human body region). In this embodiment, the region mask is obtained through mask extraction based on the human body region in the image to be processed and is used to weight the feature values corresponding to the image to be processed, where the value of the region mask corresponding to the human body region is large and the value far from the human body region is small.
And a fusion unit 32, configured to fuse the obtained image feature and the region mask to obtain a region enhancement feature of the human body region.
And the classification unit 33 is configured to obtain a classification result of the image to be processed according to the regional enhancement features based on the classification network.
Based on the human body image classification device provided by the above embodiment of the present invention, feature extraction and mask extraction operations are respectively performed on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to the human body region in the image to be processed. This solves the technical problem of classifying by feature extraction alone while ignoring the human body region in the image: by fusing the image features and the region mask, the region enhancement feature of the human body region is obtained, the features corresponding to the human body region are enhanced, and the technical effect of using the human body region features as the main classification basis is achieved. The obtained region enhancement features are input into a classification network, and the corresponding output is obtained through the classification network, so that the local information of the image is effectively attended to while the classification result of the image to be processed is predicted together with the global information.
In another embodiment of the human body image classification device of the present invention, on the basis of the above embodiment, the extraction unit 31 includes:
the characteristic extraction module is used for carrying out characteristic extraction operation on the image to be processed by utilizing the first neural network to obtain the image characteristics corresponding to the image to be processed;
the mask extraction module is used for performing mask extraction operation on the image to be processed by utilizing the second neural network to obtain an area mask corresponding to the human body area in the image to be processed; the image features are the same size as the area mask in both the length and width dimensions.
In this embodiment, feature extraction and mask extraction are realized through different neural networks. The neural network for mask extraction may apply a network for key point extraction; specifically, a Mask R-CNN key point detection neural network may be adopted, and the feature extraction network may specifically adopt a convolutional neural network (CNN). Because different tasks cause the extracted features to focus on different details, weights cannot be shared among the first neural network, the second neural network, and the classification network. The feature has four dimensions; the size in this embodiment refers to the length and width of the feature, and the feature and the area mask are the same in the length and width dimensions, so that feature fusion can be realized.
In a specific example of the above embodiments of the human body image classification device according to the present invention, the mask extracting module includes:
the key point detection module is used for executing human key point detection operation on the image to be processed by utilizing the second neural network to obtain a key point characteristic diagram with a human body region outline; the human body key points are used for identifying the positions of human body areas in the image to be processed;
and the mask area module is used for obtaining an area mask by calculating the distances between all pixel points in the key point feature map and all human body key points.
In a specific example of the above embodiments of the human body image classification device according to the present invention, the mask region module includes:
the distance calculation module is used for calculating the distances between the pixel points in the key point feature map and all human body key points to obtain the target distance values of the pixel points;
the mask value calculation module is used for obtaining a mask value corresponding to the pixel point based on the target distance value of the pixel point by utilizing a normal distribution function;
and the area determining module is used for forming an area mask based on mask values corresponding to all pixel points in the key point feature map.
In a specific example of each of the above embodiments of the human body image classification apparatus of the present invention, the distance calculation module is specifically configured to calculate an euclidean distance between a pixel point in the key point feature map and a human body key point, and obtain a plurality of distance values; and taking the minimum distance value in the distance values as the target distance value of the pixel point.
In a specific example of the above embodiments of the apparatus for classifying human body images according to the present invention, the mask value calculating module includes:
the intermediate value calculation module is used for inputting the target distance value of the pixel point into the normal distribution function to obtain the mask intermediate value of the corresponding pixel point;
and the mask value acquisition module is used for calculating to obtain the mask value of the corresponding pixel point based on the mask intermediate value.
In a specific example of each of the above embodiments of the apparatus for classifying human body images according to the present invention, the mask value obtaining module is specifically configured to add a set value to an obtained mask intermediate value to obtain a corresponding mask value; the mask value corresponding to the pixel point with the distance from the human body key point smaller than the preset value is larger than the preset threshold value, and the mask value corresponding to the pixel point with the distance from the human body key point larger than the preset value is smaller than the preset threshold value.
In another embodiment of the human body image classification apparatus according to the present invention, on the basis of the above embodiment, the fusion unit 32 is specifically configured to perform a dot product operation on elements in the image features and corresponding elements in the area mask, so as to obtain the area enhancement features of the human body area after the image features are enhanced.
The image features correspond to the region mask in size in the length and width dimensions, and the elements comprise pixel points in the feature map corresponding to the image features, or vector values in the feature vectors corresponding to the image features. Because the mask values corresponding to the human body region are large, the feature values of the human body region become large after the dot product, while the mask values far from the human body region are small, so those features become small. The image features of the human body region are thus enhanced and the image features far from the human body region are suppressed, yielding enhanced features that attend to the local information of interest.
In another embodiment of the human body image classification device according to the present invention, on the basis of the above embodiment, the classification unit 33 includes:
the center obtaining module is used for obtaining a center point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
and the probability classification module is used for inputting the regional enhancement features of the central point coordinates into a classification network, obtaining a classification probability value through the classification network and obtaining a classification result based on the classification probability value.
In this embodiment, when the regional enhancement features are classified by a classification network, softmax may specifically be used as the classification network. Softmax is a mapping function. A classification problem may be divided into two categories (e.g., pornography and non-pornography), in which case the network outputs two values: the first represents the possibility of belonging to the non-pornographic category, and the second represents the possibility of belonging to the pornographic category. For softmax, given two numbers a and b with a > b, after softmax a maps to a value close to 1, b maps to a value close to 0, and the two outputs sum to 1; the values after softmax are the probability values that actually represent the classification.
In a specific example of each of the above embodiments of the apparatus for classifying human body images according to the present invention, the center obtaining module is specifically configured to screen the image feature map with enhanced features of the fused human body region to obtain the features of the human body region corresponding to the human body region, and obtain the coordinates of the center point of the corresponding human body region based on the features of the human body region.
In a specific example of the above embodiments of the human body image classification device according to the present invention, the center obtaining module is further configured to obtain all coordinates of a corresponding human body region in the image feature map based on the human body region feature, and calculate a two-dimensional average value for all coordinates of the corresponding human body region to obtain a center point coordinate of the corresponding human body region.
According to an aspect of the embodiments of the present invention, there is provided an electronic device, which includes a processor, wherein the processor includes the human body image classification apparatus according to any one of the above embodiments of the present invention.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions to perform the operations of any of the above-described embodiments of the human body image classification method of the present invention.
According to an aspect of the embodiments of the present invention, there is provided a computer storage medium for storing computer-readable instructions, which when executed, perform the operations of any one of the above embodiments of the human body image classification method according to the present invention.
According to another aspect of the embodiments of the present invention, there is provided a computer program, which includes a computer readable code, when the computer readable code is run on a device, a processor in the device executes instructions of the steps in the classification method of human body images according to the present invention.
The embodiment of the invention also provides electronic equipment, which can be a mobile terminal, a Personal Computer (PC), a tablet computer, a server and the like. Referring now to fig. 4, there is shown a schematic diagram of an electronic device 400 suitable for use in implementing a terminal device or server of an embodiment of the present application: as shown in fig. 4, the computer system 400 includes one or more processors, communication sections, and the like, for example: one or more Central Processing Units (CPUs) 401, and/or one or more image processors (GPUs) 413, etc., which may perform various appropriate actions and processes according to executable instructions stored in a Read Only Memory (ROM)402 or loaded from a storage section 408 into a Random Access Memory (RAM) 403. The communication section 412 may include, but is not limited to, a network card, which may include, but is not limited to, an ib (infiniband) network card.
The processor may communicate with the read-only memory 402 and/or the random access memory 403 to execute the executable instructions, connect with the communication part 412 through the bus 404, and communicate with other target devices through the communication part 412, thereby completing the operations corresponding to any one of the methods provided by the embodiments of the present application, for example, performing feature extraction and mask extraction on the image to be processed respectively to obtain the features corresponding to the image to be processed and the area mask corresponding to the human body area in the image to be processed; fusing the obtained features and the region mask to obtain region enhancement features of the human body region; inputting the regional enhancement features into a classification network, and outputting a classification result of the image to be processed through the classification network.
In addition, the RAM 403 can also store various programs and data necessary for the operation of the device. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. When the RAM 403 is present, the ROM 402 is an optional module. The RAM 403 stores executable instructions, or writes executable instructions into the ROM 402 at runtime, and the executable instructions cause the processor 401 to execute the operations corresponding to the above-described method. An input/output (I/O) interface 405 is also connected to the bus 404. The communication unit 412 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) connected to the bus link.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
It should be noted that the architecture shown in fig. 4 is only an optional implementation manner, and in a specific practical process, the number and types of the components in fig. 4 may be selected, deleted, added or replaced according to actual needs; in different functional component settings, separate settings or integrated settings may also be used, for example, the GPU and the CPU may be separately set or the GPU may be integrated on the CPU, the communication part may be separately set or integrated on the CPU or the GPU, and so on. These alternative embodiments are all within the scope of the present disclosure.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product including a computer program tangibly embodied on a machine-readable medium, the computer program including program code for performing the method illustrated in the flowchart, where the program code may include instructions corresponding to performing the steps of the method provided in the embodiments of the present disclosure, for example, performing feature extraction and mask extraction operations on an image to be processed, respectively, to obtain features corresponding to the image to be processed and an area mask corresponding to a human body area in the image to be processed; fusing the obtained features and the region mask to obtain region enhancement features of the human body region; inputting the regional enhancement features into a classification network, and outputting a classification result of the image to be processed through the classification network. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401.
The method and apparatus, device of the present invention may be implemented in a number of ways. For example, the method, apparatus and device of the present invention may be implemented by software, hardware, firmware or any combination of software, hardware and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention in its various embodiments, with the various modifications suited to the particular use contemplated.

Claims (25)

1. A method for classifying human body images, characterized by comprising the following steps:
respectively carrying out feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image to be processed; wherein the image to be processed comprises at least one human body region, and the region mask is used for highlighting the human body region in the image to be processed; generation of the region mask depends on the distance from the human body key points, the region mask taking larger values near the human body key points and smaller values farther away from them; and the human body key points are used for identifying the positions of human body regions in the image to be processed;
fusing the image features and the area mask to obtain area enhancement features of the human body area;
and obtaining a classification result of the image to be processed according to the regional enhancement features based on a classification network.
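By way of illustration only, the three steps of claim 1 can be sketched in NumPy. The two extraction networks are replaced by hypothetical stand-in arrays, and the classification network by a pooled linear layer with softmax; the shapes, the names `fuse` and `classify`, and all parameter values are assumptions made for this sketch, not part of the claimed embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(features, mask):
    """Claim 1, step 2: element-wise fusion. The (H, W) region mask is
    broadcast over the C feature channels, scaling up responses inside
    the human body region."""
    return features * mask[:, :, None]

def classify(region_enhanced, weights):
    """Claim 1, step 3, simplified: global-average-pool the enhanced
    features, apply a linear layer, then softmax. `weights` stands in
    for a trained classification network (hypothetical)."""
    pooled = region_enhanced.mean(axis=(0, 1))      # (C,)
    logits = pooled @ weights                       # (num_classes,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical stand-ins for the two extraction networks' outputs.
H, W, C, num_classes = 8, 8, 4, 3
features = rng.standard_normal((H, W, C))           # feature-extraction output
mask = np.zeros((H, W)); mask[2:6, 2:6] = 1.0       # mask-extraction output
probs = classify(fuse(features, mask), rng.standard_normal((C, num_classes)))
print(probs)  # class probabilities, summing to 1
```

Note that the mask here zeroes features outside the human body region entirely; claims 6 and 7 instead use a smooth, offset mask so that background features are attenuated rather than discarded.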
2. The method according to claim 1, wherein the performing feature extraction and mask extraction on the image to be processed respectively to obtain image features corresponding to the image to be processed and an area mask corresponding to a human body area in the image to be processed comprises:
performing feature extraction operation on an image to be processed by utilizing a first neural network to obtain image features corresponding to the image to be processed;
performing mask extraction operation on an image to be processed by utilizing a second neural network to obtain a region mask corresponding to a human body region in the image to be processed; the image features are the same size as the area mask in both length and width dimensions.
3. The method according to claim 2, wherein performing a mask extraction operation on the image to be processed by using the second neural network to obtain a region mask corresponding to a human body region in the image to be processed comprises:
performing human body key point detection operation on the image to be processed by utilizing a second neural network to obtain a key point feature map with a human body region outline;
and obtaining the area mask by calculating the distances between all pixel points in the key point feature map and all the human body key points.
4. The method of claim 3, wherein obtaining the region mask by calculating distances between all pixel points in the keypoint feature map and all the human keypoints comprises:
calculating the distances between the pixel points in the key point feature map and all the human body key points to obtain the target distance values of the pixel points;
obtaining a mask value corresponding to the pixel point based on the target distance value of the pixel point by utilizing a normal distribution function;
and constructing the area mask based on mask values corresponding to all the pixel points in the key point feature map.
5. The method according to claim 4, wherein the obtaining the target distance value of the pixel point by calculating the distance between the pixel point in the key point feature map and all the human body key points comprises:
calculating Euclidean distances between the pixel points in the key point feature map and the key points of the human body to obtain a plurality of distance values;
and taking the minimum distance value in the distance values as the target distance value of the pixel point.
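A minimal sketch of the target-distance computation of claims 3 and 5, assuming (row, col) key-point coordinates on the key point feature map grid; the function name and grid size are hypothetical:

```python
import numpy as np

def target_distances(height, width, keypoints):
    """Minimum Euclidean distance from each pixel to any human body key
    point (claims 3 and 5). `keypoints` is a (K, 2) sequence of (row, col)
    coordinates on the key point feature map."""
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([ys, xs], axis=-1).astype(float)     # (H, W, 2)
    kpts = np.asarray(keypoints, dtype=float)              # (K, 2)
    diff = coords[:, :, None, :] - kpts[None, None, :, :]  # (H, W, K, 2)
    dists = np.linalg.norm(diff, axis=-1)                  # (H, W, K) distances
    return dists.min(axis=-1)                              # per-pixel minimum

d = target_distances(5, 5, [(0, 0), (4, 4)])
print(d[0, 0], d[4, 4])  # both 0.0 — pixels lying on a key point
```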
6. The method according to claim 4, wherein the obtaining the mask value corresponding to the pixel point based on the target distance value of the pixel point by using a normal distribution function comprises:
inputting the target distance value of the pixel point into a normal distribution function to obtain a mask intermediate value corresponding to the pixel point;
and calculating to obtain a mask value corresponding to the pixel point based on the mask intermediate value.
7. The method of claim 6, wherein said computing a mask value corresponding to said pixel point based on said masked intermediate value comprises:
adding a set value to the obtained mask intermediate value to obtain the corresponding mask value; wherein the mask value corresponding to a pixel point whose distance to the human body key points is smaller than a preset value is larger than a preset threshold, and the mask value corresponding to a pixel point whose distance to the human body key points is larger than the preset value is smaller than the preset threshold.
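The normal-distribution mapping of claims 6 and 7 can be sketched as follows; `sigma` (the spread of the normal distribution) and `offset` (the set value) are hypothetical choices made for this sketch, not values disclosed in the patent:

```python
import numpy as np

def mask_values(target_dist, sigma=2.0, offset=1.0):
    """Claims 6 and 7 sketch: a normal-distribution (Gaussian) function of
    the target distance gives a mask intermediate value that peaks at the
    human body key points and decays with distance; adding the set value
    `offset` keeps pixels far from the key points from being suppressed
    to zero when the mask is later multiplied into the features."""
    intermediate = np.exp(-(target_dist ** 2) / (2.0 * sigma ** 2))
    return intermediate + offset

# Mask values for pixels at increasing distance from a key point.
m = mask_values(np.array([0.0, 2.0, 10.0]))
print(m)  # decreases with distance, approaching the offset
```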
8. The method according to any one of claims 1 to 7, wherein fusing the image feature and the region mask to obtain a region enhanced feature of the human body region comprises:
performing a dot product operation on elements in the image features and corresponding elements in the region mask to obtain the region enhancement features of the human body region with the image features enhanced; wherein the image features and the region mask have corresponding sizes in the length and width dimensions, and the elements comprise pixel points in a feature map corresponding to the image features or vector values in a feature vector corresponding to the image features.
9. The method according to any one of claims 1 to 7, wherein obtaining the classification result of the image to be processed according to the region enhancement features based on a classification network comprises:
obtaining a central point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
inputting the region enhancement features at the center point coordinates into a classification network, obtaining a classification probability value through the classification network, and obtaining a classification result based on the classification probability value.
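Claim 9's final step — reading the region enhancement feature vector at the center point and classifying it — might look like the following sketch, where `weights` stands in for a trained classification network and the array shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_at_center(enhanced, center, weights):
    """Claim 9 sketch: take the region enhancement feature vector at the
    center point coordinates and turn it into class probabilities with a
    softmax over a linear layer."""
    cy, cx = center
    feature = enhanced[cy, cx]            # (C,) vector at the center point
    logits = feature @ weights            # (num_classes,)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()               # classification probability values
    return int(probs.argmax()), probs     # classification result, probabilities

# Hypothetical enhanced feature map and classifier weights.
enhanced = rng.standard_normal((8, 8, 4))
weights = rng.standard_normal((4, 3))
result, probs = classify_at_center(enhanced, (3, 4), weights)
```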
10. The method of claim 9, wherein obtaining center point coordinates based on the obtained region enhancement features comprises:
and screening the fused image feature map with the enhanced human body region features to obtain the human body region features of the corresponding human body region, and obtaining the center point coordinates of the corresponding human body region based on the human body region features.
11. The method of claim 10, wherein obtaining center point coordinates of the corresponding body region based on the body region features comprises:
and obtaining all coordinates of the corresponding human body region in the image feature map based on the human body region features, and computing a two-dimensional mean of all the coordinates of the corresponding human body region to obtain the center point coordinates of the corresponding human body region.
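The screening and two-dimensional mean of claims 10 and 11 can be sketched as follows; the screening rule (`threshold` on a single-channel map) is a hypothetical simplification:

```python
import numpy as np

def region_center(enhanced, threshold=0.0):
    """Claims 10 and 11 sketch: screen the fused map for pixels belonging
    to the human body region, then take the two-dimensional mean of their
    coordinates as the center point of that region."""
    ys, xs = np.nonzero(enhanced > threshold)   # all coordinates of the region
    return float(ys.mean()), float(xs.mean())   # two-dimensional mean

# Toy single-channel enhanced map with one rectangular human body region.
demo = np.zeros((7, 7)); demo[2:5, 3:6] = 1.0
center = region_center(demo)
print(center)  # (3.0, 4.0) — the rectangle's centroid
```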
12. A classification apparatus for human body images, comprising:
the extraction unit is used for respectively carrying out feature extraction and mask extraction on an image to be processed to obtain image features corresponding to the image to be processed and a region mask corresponding to a human body region in the image to be processed; wherein the image to be processed comprises at least one human body region, and the region mask is used for highlighting the human body region in the image to be processed; generation of the region mask depends on the distance from the human body key points, the region mask taking larger values near the human body key points and smaller values farther away from them; and the human body key points are used for identifying the positions of human body regions in the image to be processed;
the fusion unit is used for fusing the image characteristics and the area mask to obtain the area enhancement characteristics of the human body area;
and the classification unit is used for obtaining a classification result of the image to be processed according to the regional enhancement features based on a classification network.
13. The apparatus of claim 12, wherein the extraction unit comprises:
the characteristic extraction module is used for carrying out characteristic extraction operation on the image to be processed by utilizing a first neural network to obtain image characteristics corresponding to the image to be processed;
the mask extraction module is used for performing mask extraction operation on the image to be processed by utilizing a second neural network to obtain an area mask corresponding to the human body area in the image to be processed; the image features are the same size as the area mask in both length and width dimensions.
14. The apparatus of claim 13, wherein the mask extraction module comprises:
the key point detection module is used for executing human key point detection operation on the image to be processed by utilizing a second neural network to obtain a key point characteristic diagram with a human body region outline;
and the mask area module is used for obtaining the area mask by calculating the distances between all pixel points in the key point feature map and all the human body key points.
15. The apparatus of claim 14, wherein the mask region module comprises:
the distance calculation module is used for calculating the distances between the pixel points in the key point feature map and all the human body key points to obtain the target distance values of the pixel points;
the mask value calculation module is used for obtaining a mask value corresponding to the pixel point based on the target distance value of the pixel point by utilizing a normal distribution function;
and the region determining module is used for forming the region mask based on mask values corresponding to all the pixel points in the key point feature map.
16. The apparatus according to claim 15, wherein the distance calculating module is specifically configured to calculate euclidean distances between the pixel points in the key point feature map and the human key points to obtain a plurality of distance values; and taking the minimum distance value in the distance values as the target distance value of the pixel point.
17. The apparatus of claim 15, wherein the mask value calculation module comprises:
the intermediate value calculation module is used for inputting the target distance value of the pixel point into a normal distribution function to obtain a mask intermediate value corresponding to the pixel point;
and the mask value acquisition module is used for calculating to obtain a mask value corresponding to the pixel point based on the mask intermediate value.
18. The apparatus according to claim 17, wherein the mask value obtaining module is specifically configured to add a set value to the obtained mask intermediate value to obtain the corresponding mask value; wherein the mask value corresponding to a pixel point whose distance to the human body key points is smaller than a preset value is larger than a preset threshold, and the mask value corresponding to a pixel point whose distance to the human body key points is larger than the preset value is smaller than the preset threshold.
19. The apparatus according to any one of claims 12 to 18, wherein the fusion unit is specifically configured to perform a dot product operation on elements in the image features and corresponding elements in the region mask to obtain the region enhancement features of the human body region with the image features enhanced; wherein the image features and the region mask have corresponding sizes in the length and width dimensions, and the elements comprise pixel points in a feature map corresponding to the image features or vector values in a feature vector corresponding to the image features.
20. The apparatus according to any one of claims 12-18, wherein the classification unit comprises:
the center obtaining module is used for obtaining a center point coordinate based on the obtained region enhancement feature; the central point coordinate corresponds to the center of a human body area in the image to be processed;
and the probability classification module is used for inputting the region enhancement features at the center point coordinates into a classification network, obtaining a classification probability value through the classification network, and obtaining a classification result based on the classification probability value.
21. The apparatus according to claim 20, wherein the center obtaining module is specifically configured to screen the fused image feature map with the enhanced human body region features to obtain the human body region features of the corresponding human body region, and to obtain the center point coordinates of the corresponding human body region based on the human body region features.
22. The apparatus according to claim 21, wherein the center obtaining module is further configured to obtain all coordinates of a corresponding body region in the image feature map based on the body region features, and calculate a two-dimensional average value for all coordinates of the corresponding body region to obtain a center point coordinate of the corresponding body region.
23. An electronic device, characterized in that it comprises a processor comprising the human body image classification apparatus of any one of claims 12 to 22.
24. An electronic device, comprising: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions to perform the operations of the human body image classification method of any one of claims 1 to 11.
25. A computer storage medium storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the human body image classification method according to any one of claims 1 to 11.
CN201711399693.6A 2017-12-21 2017-12-21 Human body image classification method and apparatus, electronic device, storage medium, and program Active CN108229353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711399693.6A CN108229353B (en) 2017-12-21 2017-12-21 Human body image classification method and apparatus, electronic device, storage medium, and program


Publications (2)

Publication Number Publication Date
CN108229353A CN108229353A (en) 2018-06-29
CN108229353B true CN108229353B (en) 2020-09-22

Family

ID=62648395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711399693.6A Active CN108229353B (en) 2017-12-21 2017-12-21 Human body image classification method and apparatus, electronic device, storage medium, and program

Country Status (1)

Country Link
CN (1) CN108229353B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898610B (en) * 2018-07-20 2020-11-20 电子科技大学 Object contour extraction method based on mask-RCNN
CN109446911B (en) * 2018-09-28 2021-08-06 北京陌上花科技有限公司 Image detection method and system
CN109463809A (en) * 2018-11-27 2019-03-15 浙江理工大学 A kind of production method of personalized fit brassiere
CN109522970B (en) * 2018-11-28 2021-05-04 南京旷云科技有限公司 Image classification method, device and system
CN111260537A (en) * 2018-12-03 2020-06-09 珠海格力电器股份有限公司 Image privacy protection method and device, storage medium and camera equipment
CN109697446B (en) * 2018-12-04 2021-12-07 北京字节跳动网络技术有限公司 Image key point extraction method and device, readable storage medium and electronic equipment
CN109711273B (en) * 2018-12-04 2020-01-17 北京字节跳动网络技术有限公司 Image key point extraction method and device, readable storage medium and electronic equipment
CN111091160B (en) * 2019-12-27 2024-05-03 北京蜜莱坞网络科技有限公司 Image classification method
CN112219224B (en) * 2019-12-30 2024-04-26 商汤国际私人有限公司 Image processing method and device, electronic equipment and storage medium
CN111192218B (en) * 2019-12-31 2023-11-24 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113780165A (en) * 2020-09-10 2021-12-10 深圳市商汤科技有限公司 Vehicle identification method and device, electronic equipment and storage medium
CN114038067B (en) * 2022-01-07 2022-04-22 深圳市海清视讯科技有限公司 Coal mine personnel behavior detection method, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN1323370C (en) * 2004-05-28 2007-06-27 中国科学院计算技术研究所 Method for detecting pornographic images


Similar Documents

Publication Publication Date Title
CN108229353B (en) Human body image classification method and apparatus, electronic device, storage medium, and program
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN107358149B (en) Human body posture detection method and device
CN108427927B (en) Object re-recognition method and apparatus, electronic device, program, and storage medium
CN108229297B (en) Face recognition method and device, electronic equipment and computer storage medium
CN110431560B (en) Target person searching method, device, equipment and medium
US11270158B2 (en) Instance segmentation methods and apparatuses, electronic devices, programs, and media
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN108229531B (en) Object feature extraction method and device, storage medium and electronic equipment
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
CN112966742A (en) Model training method, target detection method and device and electronic equipment
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN109816694B (en) Target tracking method and device and electronic equipment
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN115861400B (en) Target object detection method, training device and electronic equipment
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN113792526A (en) Training method of character generation model, character generation method, device, equipment and medium
CN111767750A (en) Image processing method and device
CN114511041A (en) Model training method, image processing method, device, equipment and storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN111932530A (en) Three-dimensional object detection method, device and equipment and readable storage medium
CN112634382B (en) Method and device for identifying and replacing images of unnatural objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant