CN111860169B - Skin analysis method, device, storage medium and electronic equipment - Google Patents

Skin analysis method, device, storage medium and electronic equipment

Info

Publication number
CN111860169B
CN111860169B
Authority
CN
China
Prior art keywords
skin
face image
image
feature map
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010562734.4A
Other languages
Chinese (zh)
Other versions
CN111860169A (en)
Inventor
陈坤鹏
姚聪
王鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202010562734.4A priority Critical patent/CN111860169B/en
Publication of CN111860169A publication Critical patent/CN111860169A/en
Application granted granted Critical
Publication of CN111860169B publication Critical patent/CN111860169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and provides a skin analysis method, a skin analysis device, a storage medium and an electronic device. The skin analysis method comprises the following steps: acquiring a skin image; processing the skin image with a semantic segmentation network to obtain a segmentation mask for the skin image, wherein the segmentation mask contains information indicating the flaw class to which each pixel in the skin image belongs; and determining, according to the segmentation mask, connected domains formed by pixels belonging to the same class of skin flaw in the skin image, and determining positioning frames containing the skin flaws according to the connected domains. The method can detect flaws on the skin using only the skin image to be analyzed and the semantic segmentation network, thereby effectively reducing the cost of skin analysis. Moreover, with this method the user does not need to visit a hospital or beauty institution in person, so the analysis is highly convenient. In addition, the semantic segmentation network adopted by the method performs pixel-level segmentation of the skin image, which also helps detect skin flaws accurately.

Description

Skin analysis method, device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a skin analysis method, a skin analysis device, a storage medium and electronic equipment.
Background
At present, people pay increasing attention to skin care and facial cosmetology. To analyze the skin condition of a human body and thereby provide a reference for the user's further skin care or cosmetic treatment, the traditional approach generally uses a skin analyzer for testing or relies on manual judgment by a professional doctor. The equipment cost or labor cost is relatively high, and users often need to travel to a hospital or beauty institution in person, so convenience is lacking.
Disclosure of Invention
An object of the embodiments of the present application is to provide a skin analysis method, a skin analysis device, a storage medium and an electronic device, so as to address the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides a skin analysis method, including: acquiring a skin image; processing the skin image by using a semantic segmentation network to obtain a segmentation mask for the skin image, wherein the segmentation mask contains information for indicating the flaw category of each pixel in the skin image; and determining a connected domain formed by pixels belonging to the same type of skin flaws in the skin image according to the segmentation mask, and determining a positioning frame containing the skin flaws according to the connected domain.
The method can detect flaws on the skin (the flaws reflect the skin condition) using only the skin image to be analyzed and the semantic segmentation network; it requires neither professional skin analysis equipment nor the intervention of professional personnel, and thus effectively reduces the cost of skin analysis. In addition, a user performing skin analysis with this method does not need to visit a hospital or beauty institution in person, because the skin image can be transmitted entirely online and the semantic segmentation network can likewise be deployed entirely online; that is, the method supports online skin analysis and is highly convenient. Furthermore, the semantic segmentation network adopted by the method performs pixel-level segmentation of the skin image, which also helps detect skin flaws accurately (most skin flaws cover only small areas).
In an implementation manner of the first aspect, the processing the skin image with the semantic segmentation network to obtain a segmentation mask for the skin image includes: extracting features of the skin image by using a first convolutional neural network to obtain a first feature map; carrying out multi-scale pooling on the first feature map to obtain a plurality of second feature maps with different scales; convolving each of the plurality of second feature maps to obtain a plurality of third feature maps with different scales; up-sampling each third feature map in the plurality of third feature maps to obtain a plurality of fourth feature maps with the same scale, wherein the scale of the fourth feature map is the same as that of the first feature map; and splicing the first feature map and the fourth feature maps to form a fifth feature map, and performing convolution processing on the fifth feature map to obtain the segmentation mask of the skin image.
In the implementation manner, the first feature map is subjected to multi-scale operation, which is favorable for capturing the features of targets with different scales, so that the quality of a fifth feature map obtained by final fusion is better, and further, the skin image can be better segmented based on the segmentation mask calculated by the fifth feature map.
In an implementation manner of the first aspect, the segmentation mask includes a single channel, and each pixel value in the channel takes an enumeration value, where the enumeration value characterizes the flaw class of the corresponding pixel in the skin image; or the segmentation mask comprises a plurality of channels, wherein the pixel values in each channel are 0 or 1, and the ordinal number of each channel represents a flaw class, a value of 1 marking the corresponding pixel of the skin image as belonging to that class.
According to the first aspect, the segmentation mask needs to include information indicating the flaw class to which each pixel in the skin image belongs. The above-described implementation exemplifies two schemes for realizing the segmentation mask: scheme 1 includes this information in the pixel values of the segmentation mask, and scheme 2 includes it in the channel ordinals of the segmentation mask.
In an implementation manner of the first aspect, the determining a positioning frame containing a skin flaw according to the connected domain includes: determining the minimum circumscribed rectangle of the connected domain; and when the area of the minimum circumscribed rectangle is smaller than a first preset threshold and/or the aspect ratio of the minimum circumscribed rectangle is smaller than a second preset threshold, determining the minimum circumscribed rectangle as a positioning frame containing a skin flaw.
The minimum circumscribed rectangle of the connected domain can be directly taken as the positioning frame of the skin flaw, or appropriate post-processing can be performed according to the steps set forth in the above implementation manner. If the area of the minimum circumscribed rectangle is large (larger than the first preset threshold), the rectangle is considered not to contain a skin flaw, because most skin flaws such as acne, spots and moles are small; and if the aspect ratio of the minimum circumscribed rectangle is large (larger than the second preset threshold), the rectangle is likewise considered not to contain a skin flaw, because most skin flaws such as acne, spots and moles are approximately circular rather than stripe-shaped. Of course, whether to use these two rules to filter the minimum circumscribed rectangle should also take into account the type of skin flaw to be detected; for example, skin scars (such as burns, scalds and cuts) may have a large area and an irregular shape, so processing logic designed for flaws such as acne, spots and moles cannot be applied to them.
In one implementation manner of the first aspect, the flaw classes include: background, acne, spots, moles, and scars.
In an implementation manner of the first aspect, the skin image is a face image, and the method further includes: extracting key points from the face image; dividing the face image into a plurality of regions according to the extracted key points, wherein the face image in each region is a local face image; performing feature extraction on the local face image with a second convolutional neural network to obtain a sixth feature map; inputting the sixth feature map into at least one classification prediction branch, and obtaining from each classification prediction branch a classification prediction score for one face attribute of the local face image, wherein the face attributes comprise skin attributes; and determining at least one face attribute of the face image according to the obtained at least one classification prediction score.
In the above implementation manner, if the skin image is a face image, then in addition to skin flaw detection, a convolutional neural network may be used to perform face attribute analysis (including analysis of the skin), so as to obtain a richer picture of the facial skin condition and provide valuable reference data for the user's further skin care or cosmetic treatment.
In an implementation manner of the first aspect, the feature extraction of the local face image by using a second convolutional neural network, to obtain a sixth feature map, includes: extracting multi-level features of the local face image by using a plurality of convolution modules which are connected in sequence, and splicing the multi-level features into the sixth feature map; wherein the plurality of convolution modules includes a Block module in ResNet networks and Inception Module in GoogleNet networks.
The ResNet network can effectively alleviate the vanishing gradient problem, while the multi-scale convolution structure of the GoogleNet network can effectively extract features of targets of different scales. The characteristics of the two networks are mainly embodied in the convolution modules they contain: the convolution modules in the ResNet network are Block modules, and the convolution modules in the GoogleNet network are Inception Module modules. This implementation combines the advantages of the ResNet network and the GoogleNet network and performs multi-level feature fusion, which improves the quality of the extracted features and makes the result of face attribute analysis more accurate.
In an implementation manner of the first aspect, the extracting, by using a plurality of convolution modules that are sequentially connected, a multi-level feature of the local face image, and splicing the multi-level feature into the sixth feature map includes: extracting low-level features of the local face image by using at least one Block module in ResNet networks; extracting middle layer features of the partial face image based on the low layer features of the partial face image using at least one Inception Module module in the GoogleNet network; extracting, with at least one Inception Module module in a GoogleNet network, high-level features of the partial face image based on mid-level features of the partial face image; and splicing the low-layer features, the middle-layer features and the high-layer features into the sixth feature map.
The low-level features are mainly contour features, while the high-level features are mainly abstract semantic features. In this implementation, low-, middle- and high-level features are all extracted and fused, which enhances the robustness of feature extraction and allows the extracted features to better represent targets of different scales.
In an implementation manner of the first aspect, the inputting the sixth feature map into at least one classification prediction branch and obtaining a classification prediction score for one face attribute of the local face image from each classification prediction branch includes: processing the sixth feature map through a full-connection layer and then inputting it into at least one classification prediction branch, where in each classification prediction branch the input features are processed through a sequentially connected full-connection layer and classifier to obtain the classification prediction score output by that branch for one face attribute of the local face image.
In this implementation, the features in the sixth feature map are integrated through a full-connection layer, and the integrated features are input into each classification prediction branch; each classification prediction branch outputs a classification prediction score for one face attribute of the local face image, and local face images at different positions may be input into different classification prediction branches to obtain classification prediction scores for different face attributes. For example, a local face image of the eye region may be input into two classification prediction branches, one predicting the presence of eye bags and the other predicting the presence of dark circles, yielding a score for eye bags and a score for dark circles; it is then possible to determine whether eye bags and dark circles are present based on these two scores.
In one implementation manner of the first aspect, the skin attributes include: whether eye bags are present, whether dark circles are present, whether various kinds of wrinkles are present, whether the skin is oily, whether pores are enlarged, and whether blackheads are present.
In an implementation manner of the first aspect, the face attributes further include facial feature attributes, where the facial feature attributes include whether the eyelid is a double eyelid.
Although the method of the present application is mainly intended for skin analysis, since the face has already been divided into regions, the division result may be reused to analyze some other face attributes (such as facial feature attributes) in passing. These attributes do not directly indicate the skin condition, but they are still helpful for a full understanding of the current face.
In one implementation manner of the first aspect, the method further includes: performing a weighted calculation according to the skin attributes determined from the face image and the skin flaws detected from the face image to obtain an overall skin score of the face image.
In this implementation, the overall skin score of the face image is obtained by weighting the skin attributes and the skin flaws, which lets the user grasp the condition of the facial skin at a glance. Giving a total score directly is more practical and friendly for users who do not want to study specific skin problems and only want a quick overview of their skin condition.
In a second aspect, an embodiment of the present application provides a skin analysis device, including: the image acquisition module is used for acquiring skin images; the semantic segmentation module is used for processing the skin image by utilizing a semantic segmentation network to obtain a segmentation mask aiming at the skin image, wherein the segmentation mask contains information for indicating the defect type of each pixel in the skin image; and the flaw positioning module is used for determining a connected domain formed by pixels belonging to the same type of skin flaws in the skin image according to the segmentation mask, and determining a positioning frame containing the skin flaws according to the connected domain.
In a third aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a skin analysis method provided by an embodiment of the present application;
FIG. 2 shows a block diagram of a semantic segmentation network provided by an embodiment of the present application;
FIG. 3 is a flow chart of another skin analysis method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram showing a key point detection result provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a region division result of a face image according to an embodiment of the present application;
FIG. 6 illustrates a block diagram of a second convolutional neural network provided by an embodiment of the present application;
FIG. 7 shows a block diagram of a skin analysis device according to an embodiment of the present application;
FIG. 8 shows a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It should be noted that like reference numerals and letters denote like items in the following figures; thus, once an item is defined in one figure, no further definition or explanation of it is necessary in the following figures. The terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
FIG. 1 shows a flowchart of a skin analysis method according to an embodiment of the present application. The method may be performed by an electronic device; FIG. 8 shows a possible structure of the electronic device and will be described later. The main function of the skin analysis method in FIG. 1 is to detect skin flaws, including but not limited to acne, spots, moles, scars, etc. on the skin. Referring to FIG. 1, the method includes:
step S110: a skin image is acquired.
The skin image is an image including human skin, and may be, for example, a face image, a leg image, an arm image, etc.; for simplicity, the face image will mainly be used as an example below. The manner of acquiring the skin image is not limited; it may, for example, be captured in real time or obtained from an existing dataset. The skin image may originate from the electronic device locally or from the network.
Step S120: the skin image is processed by using the semantic segmentation network, and a segmentation mask for the skin image is obtained.
Semantic segmentation refers to dividing the pixels in an image into semantically interpretable categories, which can be achieved by means of a semantic segmentation network; the semantic segmentation network may be, for example, a convolutional neural network. In the solution of the present application, the semantically interpretable categories are flaw classes, which include but are not limited to: background, acne, spots, moles, and scars, where background refers to non-flaw pixels, i.e., skin free of flaws or non-skin regions.
Most skin flaws are small in area (e.g., acne, spots and moles are generally not large), and if they are detected at a precision coarser than the pixel level, the detection result is likely to be inaccurate. By the definition above, the semantic segmentation network has pixel-level image segmentation capability, that is, it can output the flaw class of every pixel, so using a semantic segmentation network is more conducive to accurately detecting skin flaws than using other target detection algorithms.
Specifically, the semantic segmentation network outputs a segmentation mask for the skin image as a segmentation result, and the segmentation mask contains information for indicating the defect type of each pixel in the skin image (which is the meaning of pixel-level segmentation), so that the pixels belonging to each defect type in the skin image can be segmented by performing a certain operation (the operation mode is related to mask definition) on the skin image based on the segmentation mask.
In one implementation, the segmentation mask includes a single channel, and each pixel value in the channel takes an enumeration value, where the enumeration value characterizes the flaw class of the corresponding pixel in the skin image, so that the pixels belonging to any flaw class in the skin image can be segmented according to the enumeration values. For example, suppose the width and height of the skin image are w and h pixels, respectively, and there are four flaw classes (background, acne, spot, mole); then the dimensions of the segmentation mask are (w, h, 1), that is, the pixels in the segmentation mask correspond one-to-one to the pixels in the skin image, and the third dimension 1 represents the number of channels. Each pixel value of the segmentation mask can take one of four enumeration values 0, 1, 2 and 3, which represent the flaw class of the corresponding pixel in the skin image: 0 for background, 1 for acne, 2 for spot, and 3 for mole. For example, to segment the set of pixels belonging to acne in the skin image, it is only necessary to retain the skin-image pixels whose corresponding mask value is 1 and zero out the skin-image pixels whose corresponding mask value is not 1. In summary, this implementation encodes the flaw class of each pixel of the skin image in the pixel values of the segmentation mask.
In another implementation, the segmentation mask includes a plurality of channels, the pixel values in each channel are 0 or 1, and the ordinal number of each channel represents a flaw class, so that each channel can be used to segment the pixels of the skin image belonging to one flaw class. For example, suppose the width and height of the skin image are w and h pixels, respectively, and there are four flaw classes (background, acne, spot, mole); then the dimensions of the segmentation mask are (w, h, 4), that is, the pixels in the segmentation mask correspond one-to-one to the pixels in the skin image, and the third dimension 4 represents the number of channels. The ordinal numbers of the four channels represent the flaw classes: channel 0 for background, channel 1 for acne, channel 2 for spot, and channel 3 for mole (assuming channels are counted from 0). The pixel values in each channel can only take 0 or 1: 1 indicates that the skin-image pixel at that position belongs to the class of the current channel, and 0 indicates that it does not. For example, the pixels valued 1 in channel 1 of the segmentation mask correspond to the set of pixels belonging to acne in the skin image. Therefore, to segment the set of acne pixels in the skin image, channel 1 of the segmentation mask can be combined with the skin image by an AND operation, and the non-zero pixels in the result are the acne pixels of the skin image.
Of course, other implementations of the segmentation mask are not excluded. A code sketch of the two schemes above follows.
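The following is a minimal NumPy sketch, for illustration only, of how the pixels of one flaw class could be extracted under each of the two schemes above. The class indices follow the example above, arrays are indexed in (h, w) order, and all function names are hypothetical rather than part of the claimed method.

```python
import numpy as np

# Hypothetical class indices matching the example above:
# 0 = background, 1 = acne, 2 = spot, 3 = mole.
ACNE = 1

def class_pixels_single_channel(mask_hw: np.ndarray, cls: int) -> np.ndarray:
    """Scheme 1: mask of shape (h, w) holding enumerated class values.
    Returns a boolean map of the pixels labelled with class `cls`."""
    return mask_hw == cls

def class_pixels_one_hot(mask_hwc: np.ndarray, cls: int) -> np.ndarray:
    """Scheme 2: mask of shape (h, w, num_classes) with 0/1 values;
    the channel ordinal identifies the class."""
    return mask_hwc[:, :, cls] == 1

def segment_class(image_hw3: np.ndarray, class_map: np.ndarray) -> np.ndarray:
    """Keep the image pixels where class_map is True and zero the rest,
    as described for segmenting one flaw class above."""
    return image_hw3 * class_map[:, :, None].astype(image_hw3.dtype)
```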
In some alternatives, step S120 may be implemented according to the following steps; refer to the structure diagram of the semantic segmentation network shown in FIG. 2 when reading them.
(1) Feature extraction is performed on the skin image with a first convolutional neural network to obtain a first feature map. A first convolutional neural network dedicated to feature extraction may also be referred to as a backbone network (backbone). The backbone network may be implemented with mature networks that have good feature extraction capability, such as a ResNet network, a VGG16 network, a VGG19 network, a GoogleNet network, etc.; of course, a new network may also be designed as the backbone.
(2) Multi-scale pooling is performed on the first feature map to obtain a plurality of second feature maps of different scales. For example, in FIG. 2, the first feature map is pooled into four second feature maps of different scales, whose scales are 1/1, 1/2, 1/4 and 1/6 of the first feature map, respectively, where 1/1 can be understood as no pooling being performed. Of course, the number of second feature maps of different scales and the specific scale values can be set flexibly according to requirements; FIG. 2 is only an example. It should also be noted that a copy of the original first feature map is retained for use in the subsequent step (5).
(3) Convolution processing is performed on each of the plurality of second feature maps to obtain a plurality of third feature maps of different scales. The convolution processing here may refer to processing with one or more convolution layers, and other layers may of course be interspersed among the convolution layers. Referring to FIG. 2, the four second feature maps of different scales are convolved to obtain four third feature maps of different scales.
(4) Each of the plurality of third feature maps is upsampled to obtain a plurality of fourth feature maps of the same scale, where the scale of the fourth feature maps is the same as that of the first feature map. Since the pooling in step (2) can also be regarded as a downsampling operation (except the 1/1 pooling), step (4) can be regarded as the inverse of step (2), restoring the scale of the third feature maps to that of the original first feature map.
(5) The first feature map and the plurality of fourth feature maps are spliced (Concat) to obtain a fifth feature map, and convolution processing is performed on the fifth feature map to obtain the segmentation mask of the skin image. Here, the first feature map is the copy retained in step (2), and the plurality of fourth feature maps are obtained in step (4). The convolution processing again may refer to processing with one or more convolution layers, possibly with other layers interspersed. Possible forms of the segmentation mask have been exemplified above and are not repeated here.
In this implementation of the semantic segmentation network, the multi-scale operation on the first feature map helps capture the features of targets of different scales in the skin image, so the fifth feature map obtained by the final fusion is of better quality, and the segmentation mask computed from the fifth feature map can segment the skin image better. Of course, the semantic segmentation network is not limited to this implementation; for example, in some implementations the first feature map may be pooled at only a single scale. A code sketch of this implementation follows.
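For illustration only, the following PyTorch sketch implements steps (2) through (5), assuming a backbone has already produced the first feature map. The channel counts, the 1/1, 1/2, 1/4, 1/6 scale set from FIG. 2, and the bilinear upsampling mode are assumptions; the embodiment does not fix these choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidSegHead(nn.Module):
    """Steps (2)-(5): multi-scale pooling of the first feature map,
    per-scale convolution, upsampling back to the original scale,
    splicing (Concat), and a final convolution producing the mask."""

    def __init__(self, in_ch=512, branch_ch=128, num_classes=5,
                 fractions=(1, 2, 4, 6)):  # scales 1/1, 1/2, 1/4, 1/6
        super().__init__()
        self.fractions = fractions
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, kernel_size=1) for _ in fractions])
        self.classifier = nn.Conv2d(in_ch + branch_ch * len(fractions),
                                    num_classes, kernel_size=1)

    def forward(self, first):                       # the first feature map
        h, w = first.shape[2:]
        maps = [first]                              # retained copy, step (2)
        for frac, conv in zip(self.fractions, self.branches):
            second = F.adaptive_avg_pool2d(first, (max(h // frac, 1),
                                                   max(w // frac, 1)))
            third = conv(second)                    # step (3)
            fourth = F.interpolate(third, size=(h, w), mode='bilinear',
                                   align_corners=False)  # step (4)
            maps.append(fourth)
        fifth = torch.cat(maps, dim=1)              # splicing, step (5)
        return self.classifier(fifth)               # per-pixel class logits
```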
Step S130: and determining a connected domain formed by pixels belonging to the same type of skin flaws in the skin image according to the segmentation mask, and determining a positioning frame containing the skin flaws according to the connected domain.
As mentioned in the description of step S120, the pixels belonging to each flaw class in the skin image can be segmented using the segmentation mask; pixels that belong to the same skin flaw class and are adjacent to each other form a connected domain of that class (an isolated pixel can be regarded as forming a connected domain by itself). The position of each connected domain and the corresponding flaw class may be recorded; in the ideal case, each connected domain is one skin flaw to be detected, such as an acne, a spot, a mole, or a scar.
In some implementations, for each connected domain, a positioning frame containing the pixels constituting the connected domain is output according to the positions of those pixels, indicating that a skin flaw is detected at the positioning frame; the flaw class can of course be output at the same time. For example, the positioning frame may be the minimum circumscribed rectangle of the connected domain.
In other implementations, a positioning frame is not output for every connected domain. Although the semantic segmentation network has high precision, it cannot guarantee that every skin flaw detected from its output is correct; therefore, certain conditions can be set to screen the obtained connected domains, and only a connected domain meeting the conditions is considered a real skin flaw for which a positioning frame is output.
For example, the minimum circumscribed rectangle of the connected domain may first be determined, and it is then judged whether the area of the minimum circumscribed rectangle is smaller than the first preset threshold; if the area is small (smaller than the first preset threshold), the rectangle is considered to contain a skin flaw, because most skin flaws such as acne, spots and moles are small.
For another example, the minimum circumscribed rectangle of the connected domain may first be determined, and it is then judged whether the aspect ratio of the minimum circumscribed rectangle is smaller than the second preset threshold; if the aspect ratio is small (smaller than the second preset threshold), the rectangle is considered to contain a skin flaw, because most skin flaws such as acne, spots and moles are approximately circular rather than stripe-shaped.
In some implementations, the two judgments may also be combined: the minimum circumscribed rectangle is considered to contain a skin flaw only if its area is smaller than the first preset threshold and its aspect ratio is smaller than the second preset threshold.
Of course, if the above two judgment rules are used to filter the minimum circumscribed rectangles, the type of skin flaw corresponding to the connected domain should also be considered; for example, skin scars (such as burns, scalds and cuts) may have a large area and an irregular shape, so processing logic designed for flaws such as acne, spots and moles cannot be applied to them.
It will be appreciated that other methods of screening the connected domains may also be used. For example, a connected domain with too small an area (e.g., containing only one or two pixels) may be noise in the image and may be excluded from the skin flaws. For another example, if the skin image is a face image, the face is usually located in the center of the image, so a detected connected domain located at a corner or edge of the image can be regarded as a false detection because there is no skin at those positions, and so on. A code sketch combining the connected-domain extraction and the two filtering rules is given below.
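A minimal sketch of this localization step using OpenCV, assuming the per-class pixel map produced by the mask-decoding sketch earlier; the threshold values and the noise rule for tiny domains are illustrative, and scar-like classes would bypass the two filters as discussed above.

```python
import cv2
import numpy as np

def locate_flaws(class_map: np.ndarray,
                 max_area: float = 400.0,
                 max_aspect: float = 3.0):
    """class_map: boolean map of the pixels of one flaw class.
    Returns the minimum circumscribed rectangles kept by the two
    filtering rules (area and aspect ratio)."""
    num, labels = cv2.connectedComponents(class_map.astype(np.uint8))
    kept = []
    for label in range(1, num):            # label 0 is the background
        ys, xs = np.nonzero(labels == label)
        if xs.size <= 2:                   # tiny domain: likely image noise
            continue
        pts = np.column_stack([xs, ys]).astype(np.float32)
        rect = cv2.minAreaRect(pts)        # ((cx, cy), (rw, rh), angle)
        rw, rh = max(rect[1][0], 1.0), max(rect[1][1], 1.0)
        if rw * rh < max_area and max(rw, rh) / min(rw, rh) < max_aspect:
            kept.append(rect)
    return kept
```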
In summary, the skin analysis method can detect flaws on the skin using only the skin image to be analyzed and the semantic segmentation network; the flaws reflect the skin condition and can serve as a reference for the user's further skin care or cosmetic treatment. The method requires neither professional skin analysis equipment nor the intervention of professional personnel, thereby effectively reducing the cost of skin analysis. In addition, a user performing skin analysis with this method does not need to visit a hospital or beauty institution in person, because the skin image can be transmitted entirely online and the semantic segmentation network can be deployed entirely online. For example, the semantic segmentation network may be deployed on a server; the user submits his or her skin image from a client or web page to the server, and the server analyzes the skin image with the method and returns the result to the user. In other words, the method supports online skin analysis and is highly convenient. Furthermore, as already mentioned, the semantic segmentation network adopted by the method performs pixel-level segmentation of the skin image, which helps accurately detect skin flaws of small area; note, however, that the method is not limited to detecting small-area skin flaws, and can also detect large-area ones.
In addition to detecting flaws on the face, face attributes can be analyzed when the skin image is a face image. The face attributes include the skin attributes of the face, and the skin attributes include, but are not limited to, whether eye bags are present, whether dark circles are present, whether various kinds of wrinkles are present, whether the skin is oily, whether pores are enlarged, whether blackheads are present, and so on, so that the skin condition of the face can be analyzed more comprehensively, providing more valuable reference data for the user's further skin care or cosmetic treatment. FIG. 3 shows a flowchart of another skin analysis method provided by an embodiment of the present application, which may be performed by an electronic device; FIG. 8 shows a possible structure of the electronic device and will be described later. The main function implemented by the skin analysis method in FIG. 3 is face attribute analysis. The method in FIG. 3 may be performed before, after, or in parallel with the method in FIG. 1, or the method in FIG. 3 may be performed alone without performing the method in FIG. 1. Referring to FIG. 3, the method includes:
Step S210: and acquiring a face image.
Step S210 is similar to step S110 and is not repeated. In particular, if the method in FIG. 3 is performed after the method in FIG. 1, the face image has already been acquired in step S110, so it does not need to be acquired again and step S210 may be omitted.
Step S220: and extracting key points from the face image.
Key points in a face are positions in the face that have specific features, such as the corners of the mouth, the corners of the eyes, etc. Key point extraction may be performed with existing methods, for example, Multi-Task Cascaded Convolutional Neural Networks (MTCNN) or Ensemble of Regression Trees (ERT); their principles can be found in the prior art and are not detailed here. FIG. 4 shows a schematic diagram of a key point extraction result, with each circle representing one key point. A usage sketch for key point extraction follows.
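As one possible implementation of key point extraction, the sketch below uses the open-source mtcnn package; the package choice, the image path and the five-point landmark set it returns are assumptions, not requirements of the method.

```python
# One possible way to obtain face key points, using the open-source
# `mtcnn` package; the method itself does not mandate any library.
import cv2
from mtcnn import MTCNN

detector = MTCNN()
image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)
for face in detector.detect_faces(image):
    # `keypoints` holds left_eye, right_eye, nose, mouth_left, mouth_right;
    # the denser landmark set of FIG. 4 would require a different model.
    print(face["keypoints"])
```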
Step S230: the face image is divided into a plurality of areas according to the extracted key points, and the face image in each area is a local face image.
The face image may be divided into a plurality of preset regions, such as the forehead (rectangular frame 1 in FIG. 5), the eyes (rectangular frames 2 and 3 in FIG. 5), the cheeks (rectangular frames 4 and 5 in FIG. 5), and the chin (rectangular frame 6 in FIG. 5), according to the positions of the extracted key points. The division into regions, although not strictly limited, should enable each region to cover the one or more face attributes to be analyzed in the subsequent steps. For example, suppose the face is analyzed for the presence of dark circles; dark circles are typically located below the eye, so the eye region should cover the skin below the eye when the regions are divided.
Taking the division of an eye region as an example, a specific method of dividing the region based on the key point positions is as follows (a code sketch is given below): a key point above the eyebrow is selected to position the upper boundary of the rectangular frame (i.e., the rectangular frame corresponding to the eye region); the innermost key points of the left and right eye corners position the inner boundary of the rectangular frame; a key point on the face contour positions the outer boundary of the rectangular frame; and a key point above the cheek positions the lower boundary of the rectangular frame. Once the boundary positions of the rectangular frame are determined, the position of the whole rectangular frame is determined, completing the division of the eye region. Other regions may be divided in a similar manner.
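A minimal sketch of this boundary construction, assuming a dictionary of named landmarks; the landmark names are hypothetical, since the actual indices depend on the key point model in use.

```python
def eye_region_box(kps: dict) -> tuple:
    """Sketch of the eye-region division described above. `kps` maps
    hypothetical landmark names to (x, y) coordinates; the real indices
    depend on the landmark model in use."""
    top = kps["brow_upper"][1]            # key point above the eyebrow
    bottom = kps["cheek_upper"][1]        # key point above the cheek
    inner = kps["eye_corner_inner"][0]    # innermost eye-corner key point
    outer = kps["face_contour"][0]        # key point on the face contour
    left, right = sorted((inner, outer))
    return (left, top, right, bottom)     # rectangle as (x1, y1, x2, y2)
```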
Although face attribute analysis could also be performed on the entire face image, dividing the face into regions and using the local face images makes the feature extraction in the subsequent steps more accurate. Again taking the analysis of dark circles as an example: if the whole face image is used to extract features related to dark circles, the approach is feasible in theory but less accurate, because the face image covers a large area; if a local image of the eye region is used instead, its coverage is small and its content relatively simple, which improves the accuracy of feature extraction and hence the accuracy of judging whether dark circles are present.
Step S240: and carrying out feature extraction on the local face image by using a second convolution neural network to obtain a sixth feature map.
Like the first convolutional neural network mentioned above, a second convolutional neural network dedicated to feature extraction belongs to the backbone network. The second convolutional neural network can be implemented with mature networks that have good feature extraction capability, such as a ResNet network, a VGG16 network, a VGG19 network, a GoogleNet network, etc.; of course, a new network can also be designed as the backbone.
The ResNet network can effectively alleviate the vanishing gradient problem, and the multi-scale convolution structure of the GoogleNet network can effectively extract the features of targets of different scales; the characteristics of the two networks are mainly embodied in the structural design of the convolution modules they contain (the convolution modules in the ResNet network are Block modules, and those in the GoogleNet network are Inception Module modules; for their specific structures, reference may be made to the prior art). Therefore, in one implementation of the second convolutional neural network, in order to combine the advantages of the ResNet network and the GoogleNet network, a plurality of sequentially connected convolution modules (including Block modules from the ResNet network and Inception Module modules from the GoogleNet network) may first be used to extract multi-level features of the local face image, and the multi-level features are then fused into the sixth feature map, where the fusion may be splicing (Concat). This implementation combines the advantages of the ResNet network and the GoogleNet network through feature fusion and performs multi-level feature extraction, which improves the quality of the extracted features and makes the subsequent face attribute analysis more accurate.
One specific network design for the above implementation is given below, illustrated in connection with the network architecture shown in FIG. 6:
(1) At least one Block module in the ResNet network is used to extract the low-level features of the local face image. For example, referring to FIG. 6, the ResNet network can be divided into 5 parts, namely conv1, conv2_x, conv3_x, conv4_x and conv5_x (details of these 5 parts can be found in prior-art material on ResNet networks); the parts other than conv1 consist of Block modules, and in FIG. 6 the first 3 parts are used to extract the low-level features of the local face image.
(2) With at least one Inception Module module in the GoogleNet network, middle-level features of the local face image are extracted based on its low-level features. For example, referring to FIG. 6, 6 serially connected Inception Module modules are used to further extract the middle-level features of the local face image based on its low-level features.
(3) With at least one Inception Module module in the GoogleNet network, high-level features of the local face image are extracted based on its middle-level features. For example, referring to FIG. 6, 4 serially connected Inception Module modules are used to further extract the high-level features of the local face image based on its middle-level features.
(4) The low-level, middle-level and high-level features of the local face image are spliced into the sixth feature map.
In the above network structure, the low-level features are mainly contour features of the local face image, the high-level features are mainly abstract semantic features, and the middle-level features lie between the two. Extracting and fusing the low-, middle- and high-level features improves the robustness of feature extraction and allows the extracted features to better represent targets of different scales. Of course, the network also combines the respective advantages of the ResNet and GoogleNet networks. A simplified code sketch of this design follows.
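For illustration, the following PyTorch sketch assembles a backbone in the spirit of FIG. 6: three residual blocks for low-level features, six Inception-style modules for middle-level features, four for high-level features, and a splice of all three levels. The module definitions, channel counts and the pooling used to match spatial sizes before concatenation are simplified assumptions rather than the exact ResNet/GoogleNet structures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Simplified residual block in the style of ResNet's Block modules."""
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.c2(F.relu(self.c1(x))))

class InceptionLike(nn.Module):
    """Simplified multi-branch module in the style of Inception Module."""
    def __init__(self, ch):
        super().__init__()
        b = ch // 4
        self.b1 = nn.Conv2d(ch, b, 1)
        self.b3 = nn.Conv2d(ch, b, 3, padding=1)
        self.b5 = nn.Conv2d(ch, b, 5, padding=2)
        self.bp = nn.Conv2d(ch, b, 1)

    def forward(self, x):
        pooled = F.max_pool2d(x, 3, stride=1, padding=1)
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(pooled)], dim=1)

class HybridBackbone(nn.Module):
    """Low-level features from residual blocks, middle- and high-level
    features from Inception-style modules, spliced into the sixth map."""
    def __init__(self, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 7, stride=2, padding=3)
        self.low = nn.Sequential(*[ResBlock(ch) for _ in range(3)])
        self.mid = nn.Sequential(*[InceptionLike(ch) for _ in range(6)])
        self.high = nn.Sequential(*[InceptionLike(ch) for _ in range(4)])

    def forward(self, x):
        low = self.low(F.relu(self.stem(x)))
        mid = self.mid(low)
        high = self.high(mid)
        # Pool each level to a common size before splicing (Concat);
        # the embodiment does not specify how the scales are matched.
        feats = [F.adaptive_avg_pool2d(f, (7, 7)) for f in (low, mid, high)]
        return torch.cat(feats, dim=1)   # the sixth feature map
```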
Step S250: and inputting the sixth feature map to at least one classification prediction branch to obtain a classification prediction score, which is output by each classification prediction branch, for one face attribute in the local face image.
Step S260: and determining at least one face attribute in the face image according to the obtained at least one classification prediction score.
The above two steps are described together. Each classification prediction branch is itself a network whose input is the sixth feature map and whose output is a classification prediction score for one face attribute of the local face image, so the corresponding face attribute can be determined based on the score. Note that for a local face image, the sixth feature map is input only to the classification prediction branches corresponding to the face attributes to be analyzed in that image; it does not need to be input to all classification prediction branches. For example, the local face image of the eye region may be input to two classification prediction branches, one predicting the presence of eye bags and the other predicting the presence of dark circles, yielding a score for eye bags and a score for dark circles; it can then be further determined from the two scores whether the face in the local image has eye bags and dark circles, for example by setting a threshold: if the eye-bag score is greater than the threshold, eye bags are judged present, otherwise absent.
Further, in some implementations of step S250, the sixth feature map may first be processed by a full-connection layer (multiple full-connection layers may also be used) to integrate the features in the sixth feature map, and the integrated features are then input to at least one classification prediction branch. Each classification prediction branch is formed by a full-connection layer (or several) plus a classifier (such as softmax) and outputs the classification prediction score of the face attribute corresponding to that branch; in step S260, the corresponding face attribute can then be determined based on the classification prediction score. A sketch of this structure follows.
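A minimal sketch of the shared full-connection layer and the per-attribute branches, assuming binary attributes; the branch names, the dimensions and the use of softmax index 1 as the "attribute present" score are illustrative choices, not fixed by the description above.

```python
import torch
import torch.nn as nn

class AttributeBranches(nn.Module):
    """A shared full-connection layer integrates the sixth feature map;
    each branch is a full-connection layer plus a softmax classifier."""

    def __init__(self, feat_dim: int, shared_dim: int = 256,
                 branch_names=("eye_bags", "dark_circles")):
        super().__init__()
        self.shared = nn.Linear(feat_dim, shared_dim)
        self.branches = nn.ModuleDict(
            {name: nn.Linear(shared_dim, 2) for name in branch_names})

    def forward(self, sixth_map: torch.Tensor, wanted) -> dict:
        # Flatten the sixth feature map and integrate its features.
        x = torch.relu(self.shared(sixth_map.flatten(1)))
        # Only the branches relevant to this face region are evaluated;
        # index 1 of the softmax output is the "attribute present" score.
        return {name: torch.softmax(self.branches[name](x), dim=1)[:, 1]
                for name in wanted}
```

For the eye region, evaluating the module with wanted=("eye_bags", "dark_circles") yields the two scores, each of which can then be compared against a threshold as described above.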
The face attributes mentioned above are all skin-related attributes, but a face also has attributes that are unrelated, or at least not obviously related, to the skin, such as facial feature attributes (e.g., whether the eyelid is a double eyelid). Although the skin analysis method in FIG. 3 is mainly used for skin analysis, since the face has already been divided into regions, the division result may be reused to analyze some other face attributes in passing; these attributes do not directly indicate the skin condition but are still helpful for a full understanding of the current face. For example, to analyze double eyelids, the local face image of the eye region may be input to a classification prediction branch for predicting double eyelids to obtain a double-eyelid score, and whether the face in the local image has double eyelids can then be determined from the score.
In some implementations, if the method in FIG. 1 is performed as well as the method in FIG. 3, an overall skin score of the face image may be obtained by a weighted calculation over the skin attributes determined from the face image and the skin flaws detected from the face image. For example, a corresponding weight and score may be set for each skin attribute, and a corresponding weight and score may be set for each flaw, followed by a weighted summation. The score of a skin attribute can directly use the classification prediction score; the score of a flaw can be the mean of the scores of the pixels constituting the flaw, where the per-pixel scores can be output by the semantic segmentation network together with the segmentation mask and represent the flaw classification scores of the pixels. The overall skin score lets the user grasp the condition of the facial skin as a whole; giving a total score directly is more practical and friendly for users who do not want to study specific skin problems and only want a quick overview of their skin condition. A small sketch of such a weighted calculation follows.
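A small sketch of one possible weighted combination, assuming that higher flaw confidence should lower the total; the weights and the exact formula are design choices not fixed by the description above.

```python
def overall_skin_score(attribute_scores: dict, flaw_scores: list,
                       attribute_weights: dict, flaw_weight: float) -> float:
    """Hedged sketch of the weighted calculation; the weights and the
    exact combination formula are design choices not fixed above."""
    score = sum(attribute_weights[name] * s
                for name, s in attribute_scores.items())
    if flaw_scores:
        # Each flaw score is the mean per-pixel classification score of
        # its connected domain; more confident flaws lower the total.
        score += flaw_weight * (1.0 - sum(flaw_scores) / len(flaw_scores))
    return score
```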
FIG. 7 shows a functional block diagram of a skin analysis device 300 according to an embodiment of the present application. Referring to FIG. 7, the skin analysis device 300 includes:
an image acquisition module 310 for acquiring a skin image;
the semantic segmentation module 320 is configured to process the skin image by using a semantic segmentation network, and obtain a segmentation mask for the skin image, where the segmentation mask includes information for indicating a defect class to which each pixel in the skin image belongs;
The defect positioning module 330 is configured to determine a connected domain formed by pixels belonging to the same type of skin defect in the skin image according to the segmentation mask, and determine a positioning frame containing the skin defect according to the connected domain.
In one implementation of the skin analysis device 300, the semantic segmentation module 320 processes the skin image using a semantic segmentation network, the obtaining a segmentation mask for the skin image comprising: extracting features of the skin image by using a first convolutional neural network to obtain a first feature map; carrying out multi-scale pooling on the first feature map to obtain a plurality of second feature maps with different scales; convolving each of the plurality of second feature maps to obtain a plurality of third feature maps with different scales; up-sampling each third feature map in the plurality of third feature maps to obtain a plurality of fourth feature maps with the same scale, wherein the scale of the fourth feature map is the same as that of the first feature map; and splicing the first feature map and the fourth feature maps to form a fifth feature map, and performing convolution processing on the fifth feature map to obtain the segmentation mask of the skin image.
In one implementation of the skin analysis device 300, the segmentation mask includes a single channel, and each pixel value in the channel takes an enumeration value, where the enumeration value characterizes the defect class of the corresponding pixel of the skin image; or the segmentation mask comprises a plurality of channels, wherein the pixel values in each channel are 0 or 1, and the ordinal number of each channel represents a defect class, a value of 1 marking the corresponding pixel of the skin image as belonging to that class.
In one implementation of the skin analysis device 300, the defect positioning module 330 determines a positioning frame containing a skin defect according to the connected domain, including: determining the minimum circumscribed rectangle of the connected domain; and when the area of the minimum circumscribed rectangle is smaller than a first preset threshold and/or the aspect ratio of the minimum circumscribed rectangle is smaller than a second preset threshold, determining the minimum circumscribed rectangle as a positioning frame containing a skin defect.
In one implementation of the skin analysis device 300, the defect categories include: background, acne, spots, moles, and scars.
In one implementation of the skin analysis device 300, the skin image is a face image, and the device further includes:
a key point extraction module, configured to extract key points from the face image;
a region division module, configured to divide the face image into a plurality of regions according to the extracted key points, the face image within each region being a local face image (a cropping sketch follows this list);
a feature extraction module, configured to perform feature extraction on the local face image using a second convolutional neural network to obtain a sixth feature map;
a score prediction module, configured to input the sixth feature map to at least one classification prediction branch and obtain, from each classification prediction branch, a classification prediction score for one face attribute of the local face image, where the face attributes include skin attributes; and
an attribute analysis module, configured to determine at least one face attribute of the face image according to the obtained at least one classification prediction score.
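As a concrete illustration of the region division referenced above, the sketch below cuts local face images from the bounding boxes of per-region keypoint subsets; the landmark indices, region names, and expansion margin are hypothetical placeholders rather than the landmark scheme of this application.

```python
import numpy as np

# Hypothetical landmark groups: these indices into an (N, 2) keypoint array are
# placeholders, not the landmark scheme actually used by this application.
REGION_KEYPOINTS = {
    'left_eye': range(52, 62),
    'nose': range(43, 52),
    'mouth': range(84, 104),
}

def crop_local_faces(face_image, keypoints, margin=0.3):
    """Cut one local face image per region from the bounding box of its keypoints."""
    h, w = face_image.shape[:2]
    crops = {}
    for name, indices in REGION_KEYPOINTS.items():
        pts = keypoints[list(indices)]
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        mx, my = (x1 - x0) * margin, (y1 - y0) * margin  # expand the box slightly
        x0, y0 = max(int(x0 - mx), 0), max(int(y0 - my), 0)
        x1, y1 = min(int(x1 + mx), w), min(int(y1 + my), h)
        crops[name] = face_image[y0:y1, x0:x1]
    return crops
```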
In one implementation of the skin analysis device 300, the feature extraction module performs feature extraction on the local face image using a second convolutional neural network to obtain the sixth feature map as follows: extracting multi-level features of the local face image using a plurality of convolution modules connected in sequence, and splicing the multi-level features into the sixth feature map, where the plurality of convolution modules includes a Block module from a ResNet network and an Inception module from a GoogLeNet network.
In one implementation of the skin analysis device 300, the feature extraction module extracts the multi-level features of the local face image using the plurality of convolution modules connected in sequence and splices them into the sixth feature map as follows: extracting low-level features of the local face image using at least one Block module from a ResNet network; extracting mid-level features of the local face image based on the low-level features, using at least one Inception module from a GoogLeNet network; extracting high-level features of the local face image based on the mid-level features, using at least one Inception module from a GoogLeNet network; and splicing the low-level, mid-level, and high-level features into the sixth feature map.
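The following PyTorch sketch illustrates this low/mid/high-level hybrid of a ResNet-style Block and Inception-style modules with multi-level splicing; the block definitions are deliberately minimal, and all channel widths, module counts, and the 7x7 pooling size are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Minimal ResNet-style Block: two 3x3 convolutions plus a shortcut."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1)
        self.shortcut = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        out = self.conv2(F.relu(self.conv1(x)))
        return F.relu(out + self.shortcut(x))

class InceptionModule(nn.Module):
    """Minimal GoogLeNet-style module: parallel 1x1/3x3/5x5/pool branches, concatenated."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c = c_out // 4  # c_out is assumed divisible by 4
        self.b1 = nn.Conv2d(c_in, c, 1)
        self.b3 = nn.Conv2d(c_in, c, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c, 5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, c, 1))

    def forward(self, x):
        return F.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))

class HybridFaceBackbone(nn.Module):
    """Low-level features from a ResNet-style Block, mid/high-level features from
    Inception-style modules, with all three levels spliced into one feature map."""
    def __init__(self):
        super().__init__()
        self.low = BasicBlock(3, 64)
        self.mid = InceptionModule(64, 128)
        self.high = InceptionModule(128, 256)

    def forward(self, x):
        low = self.low(x)
        mid = self.mid(low)
        high = self.high(mid)
        # Pool each level to a common spatial size before splicing
        parts = [F.adaptive_avg_pool2d(t, (7, 7)) for t in (low, mid, high)]
        return torch.cat(parts, dim=1)  # the spliced "sixth feature map"
```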
In one implementation of the skin analysis device 300, the score prediction module obtains the classification prediction scores as follows: the sixth feature map is first processed by a fully-connected layer and then input to the at least one classification prediction branch; in each classification prediction branch, the input features are processed by a fully-connected layer and a classifier connected in sequence, yielding the classification prediction score output by that branch for one face attribute of the local face image.
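A sketch of the shared fully-connected layer followed by per-attribute branches might look as follows; the attribute names, layer sizes, and the binary (2-way) classifier per branch are assumptions, and the input dimension is chosen to match the backbone sketch above.

```python
import torch
import torch.nn as nn

class AttributeHead(nn.Module):
    """Shared fully-connected layer followed by one FC + classifier branch per
    face attribute; all names and sizes here are illustrative assumptions."""

    def __init__(self, feat_dim=(64 + 128 + 256) * 7 * 7, hidden=512,
                 attributes=('eye_bags', 'dark_circles', 'oily_skin')):
        super().__init__()
        self.shared = nn.Linear(feat_dim, hidden)
        self.branches = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(hidden, hidden // 2),
                                nn.ReLU(),
                                nn.Linear(hidden // 2, 2))  # 2-way classifier
            for name in attributes
        })

    def forward(self, sixth_map):
        shared = torch.relu(self.shared(sixth_map.flatten(1)))
        # One classification prediction score per branch (probability over 2 classes)
        return {name: branch(shared).softmax(dim=1)
                for name, branch in self.branches.items()}
```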
In one implementation of the skin analysis device 300, the skin attributes include: whether eye bags are present, whether dark circles are present, whether various types of wrinkles are present, whether the skin is oily, whether the pores are enlarged, and whether blackheads are present.
In one implementation of the skin analysis device 300, the face attributes further include facial feature attributes, the facial feature attributes including whether the eyelids are double eyelids.
In one implementation of the skin analysis device 300, the device further includes:
an overall evaluation module, configured to perform a weighted calculation on the skin attributes determined from the face image and the skin defects detected from the face image to obtain an overall skin score for the face image.
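A minimal sketch of such a weighted calculation is shown below; the 100-point scale, the penalty weights, and the choice to penalize by defect count are all assumptions for the example, as the application does not fix specific weights here.

```python
def overall_skin_score(attribute_probs, defect_counts,
                       attr_weights=None, defect_weights=None):
    """Start from a full score and subtract weighted penalties for the skin
    attributes predicted present and the skin defects detected."""
    attr_weights = attr_weights or {'eye_bags': 5.0, 'dark_circles': 4.0}
    defect_weights = defect_weights or {'acne': 1.5, 'scar': 2.0}
    score = 100.0
    for name, prob in attribute_probs.items():   # probability the attribute is present
        score -= attr_weights.get(name, 0.0) * prob
    for name, count in defect_counts.items():    # number of positioning frames found
        score -= defect_weights.get(name, 0.0) * count
    return max(score, 0.0)

# Example: overall_skin_score({'eye_bags': 0.8}, {'acne': 3}) -> 100 - 4.0 - 4.5 = 91.5
```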
The skin analysis device 300 according to the embodiment of the present application has been described in the foregoing method embodiments; for brevity, reference may be made to the corresponding details in the method embodiments for anything not repeated in the device embodiments.
Fig. 8 shows a possible structure of an electronic device 400 provided by an embodiment of the present application. Referring to Fig. 8, the electronic device 400 includes: a processor 410, a memory 420, and a communication interface 430, which are interconnected and communicate with each other through a communication bus 440 and/or other forms of connection mechanisms (not shown).
The memory 420 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 410, as well as other possible components, may access the memory 420 and read and/or write data in it.
The processor 410 includes one or more processors (only one is shown), which may be an integrated circuit chip with signal processing capability. The processor 410 may be a general-purpose processor, including a Central Processing Unit (CPU), a Micro Controller Unit (MCU), a Network Processor (NP), or another conventional processor; it may also be a special-purpose processor, including a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The communication interface 430 includes one or more interfaces (only one is shown), which may be used to communicate directly or indirectly with other devices for data interaction. The communication interface 430 may include interfaces for wired and/or wireless communication.
One or more computer program instructions may be stored in the memory 420 and may be read and executed by the processor 410 to implement the skin analysis method provided by the embodiments of the present application, as well as other desired functions.
It is to be understood that the configuration shown in Fig. 8 is merely illustrative, and the electronic device 400 may include more or fewer components than those shown in Fig. 8, or have a different configuration from that shown in Fig. 8. The components shown in Fig. 8 may be implemented in hardware, software, or a combination thereof. The electronic device 400 may be a physical device, such as a PC, a notebook computer, a tablet, a mobile phone, a server, or an embedded device, or may be a virtual device, such as a virtual machine or a virtualized container. The electronic device 400 is not limited to a single device and may also be a combination of a plurality of devices or a cluster of a large number of devices.
The embodiment of the present application also provides a computer-readable storage medium storing computer program instructions which, when read and executed by a processor of a computer, perform the skin analysis method provided by the embodiments of the present application. For example, the computer-readable storage medium may be implemented as the memory 420 in the electronic device 400 of Fig. 8.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Further, the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, the functional modules in the various embodiments of the present application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part.
The above description is only an example of the present application and is not intended to limit the scope of the present application; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of skin analysis, comprising:
acquiring a skin image;
processing the skin image by using a semantic segmentation network to obtain a segmentation mask for the skin image, wherein the segmentation mask contains information for indicating the flaw category of each pixel in the skin image;
determining a connected domain formed by pixels belonging to the same type of skin flaws in the skin image according to the segmentation mask, and determining a positioning frame containing the skin flaws according to the connected domain, wherein the positioning frame is a minimum circumscribed rectangle of the connected domain;
wherein the skin image is a face image, the method further comprising:
extracting key points from the face image;
dividing the face image into a plurality of regions according to the extracted key points, the face image within each region being a local face image;
extracting features of the local face image using a second convolutional neural network to obtain a sixth feature map;
inputting the sixth feature map to at least one classification prediction branch, and obtaining, from each classification prediction branch, a classification prediction score for one face attribute of the local face image, wherein the face attributes comprise skin attributes;
determining at least one face attribute in the face image according to the obtained at least one classification prediction score;
wherein the extracting features of the local face image using a second convolutional neural network to obtain a sixth feature map comprises:
extracting multi-level features of the local face image using a plurality of convolution modules connected in sequence, and splicing the multi-level features into the sixth feature map, wherein the plurality of convolution modules comprises a Block module from a ResNet network and an Inception module from a GoogLeNet network;
and wherein the extracting the multi-level features of the local face image using the plurality of convolution modules connected in sequence and splicing them into the sixth feature map comprises:
extracting low-level features of the local face image using at least one Block module from a ResNet network;
extracting mid-level features of the local face image based on the low-level features, using at least one Inception module from a GoogLeNet network;
extracting high-level features of the local face image based on the mid-level features, using at least one Inception module from a GoogLeNet network; and
splicing the low-level, mid-level, and high-level features into the sixth feature map.
2. The skin analysis method of claim 1, wherein the processing the skin image with a semantic segmentation network to obtain a segmentation mask for the skin image comprises:
extracting features of the skin image by using a first convolutional neural network to obtain a first feature map;
performing multi-scale pooling on the first feature map to obtain a plurality of second feature maps of different scales;
convolving each of the plurality of second feature maps to obtain a plurality of third feature maps of different scales;
up-sampling each of the plurality of third feature maps to obtain a plurality of fourth feature maps of the same scale, wherein the scale of each fourth feature map is the same as that of the first feature map; and
splicing the first feature map and the fourth feature maps into a fifth feature map, and convolving the fifth feature map to obtain the segmentation mask of the skin image.
3. The skin analysis method according to claim 1, wherein the segmentation mask comprises a single channel, each pixel value in the channel taking an enumeration value that characterizes the defect class of the corresponding pixel in the skin image; or
the segmentation mask comprises a plurality of channels, wherein the pixel value in each channel is 0 or 1, and the ordinal number of each channel represents the defect class of the corresponding pixel in the skin image.
4. The skin analysis method according to claim 1, wherein the determining a localization frame containing skin imperfections from the connected domain comprises:
determining the minimum circumscribed rectangle of the connected domain;
and when the area of the minimum circumscribed rectangle is smaller than a first preset threshold and/or the aspect ratio of the minimum circumscribed rectangle is smaller than a second preset threshold, determining the minimum circumscribed rectangle as a positioning frame containing a skin defect.
5. The skin analysis method of claim 1, wherein the defect categories comprise: background, acne, blemishes, nevi, and scars.
6. The skin analysis method according to claim 1, wherein the inputting the sixth feature map to at least one classification prediction branch and obtaining the classification prediction score output by each classification prediction branch for one face attribute of the local face image comprises:
processing the sixth feature map through a fully-connected layer and then inputting it to the at least one classification prediction branch, wherein in each classification prediction branch the input features are processed by a fully-connected layer and a classifier connected in sequence to obtain the classification prediction score output by that branch for one face attribute of the local face image.
7. The skin analysis method of claim 1, wherein the skin attributes comprise: whether eye bags are present, whether dark circles are present, whether various types of wrinkles are present, whether the skin is oily, whether the pores are enlarged, and whether blackheads are present.
8. The skin analysis method of claim 1, wherein the face attributes further comprise facial feature attributes, the facial feature attributes comprising whether the eyelids are double eyelids.
9. The skin analysis method of claim 1, wherein the method further comprises:
performing a weighted calculation on the skin attributes determined from the face image and the skin defects detected from the face image to obtain an overall skin score for the face image.
10. A skin analysis device, comprising:
an image acquisition module, configured to acquire a skin image;
a semantic segmentation module, configured to process the skin image using a semantic segmentation network to obtain a segmentation mask for the skin image, wherein the segmentation mask contains information indicating the defect class of each pixel in the skin image; and
a defect positioning module, configured to determine, according to the segmentation mask, a connected domain formed by pixels belonging to the same type of skin defect in the skin image, and to determine a positioning frame containing the skin defect according to the connected domain, wherein the positioning frame is a minimum circumscribed rectangle of the connected domain;
wherein the skin image is a face image, the device further comprising:
a key point extraction module, configured to extract key points from the face image;
a region division module, configured to divide the face image into a plurality of regions according to the extracted key points, the face image within each region being a local face image;
a feature extraction module, configured to perform feature extraction on the local face image using a second convolutional neural network to obtain a sixth feature map;
a score prediction module, configured to input the sixth feature map to at least one classification prediction branch and obtain, from each classification prediction branch, a classification prediction score for one face attribute of the local face image, wherein the face attributes comprise skin attributes; and
an attribute analysis module, configured to determine at least one face attribute of the face image according to the obtained at least one classification prediction score;
wherein the feature extraction module performing feature extraction on the local face image using a second convolutional neural network to obtain the sixth feature map comprises: extracting multi-level features of the local face image using a plurality of convolution modules connected in sequence, and splicing the multi-level features into the sixth feature map, wherein the plurality of convolution modules comprises a Block module from a ResNet network and an Inception module from a GoogLeNet network;
and wherein the feature extraction module extracting the multi-level features of the local face image using the plurality of convolution modules connected in sequence and splicing them into the sixth feature map comprises: extracting low-level features of the local face image using at least one Block module from a ResNet network; extracting mid-level features of the local face image based on the low-level features, using at least one Inception module from a GoogLeNet network; extracting high-level features of the local face image based on the mid-level features, using at least one Inception module from a GoogLeNet network; and splicing the low-level, mid-level, and high-level features into the sixth feature map.
11. A computer readable storage medium, characterized in that it has stored thereon computer program instructions which, when read and executed by a processor, perform the method according to any of claims 1-9.
12. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-9.
CN202010562734.4A 2020-06-18 2020-06-18 Skin analysis method, device, storage medium and electronic equipment Active CN111860169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010562734.4A CN111860169B (en) 2020-06-18 2020-06-18 Skin analysis method, device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111860169A CN111860169A (en) 2020-10-30
CN111860169B true CN111860169B (en) 2024-04-30

Family

ID=72986863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010562734.4A Active CN111860169B (en) 2020-06-18 2020-06-18 Skin analysis method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111860169B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581359B (en) * 2020-12-23 2023-06-09 Oppo(重庆)智能科技有限公司 Image processing method, device, terminal and storage medium
CN114983338A (en) * 2021-03-02 2022-09-02 华为技术有限公司 Skin detection method and electronic equipment
CN113139486A (en) * 2021-04-29 2021-07-20 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN113486768A (en) * 2021-07-01 2021-10-08 成都九章丽欣科技有限公司 Image recognition method for skin
KR102471441B1 (en) * 2021-12-20 2022-11-28 주식회사 아이코어 Vision inspection system for detecting failure based on deep learning
CN114596314A (en) * 2022-05-09 2022-06-07 合肥联亚制衣有限公司 Training method, device, equipment and medium for cloth flaw detection model
CN116993714A (en) * 2023-08-30 2023-11-03 深圳伯德睿捷健康科技有限公司 Skin detection method, system and computer readable storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10325146B2 (en) * 2016-05-08 2019-06-18 Modiface Inc. Hierarchical differential image filters for skin analysis
US10354159B2 (en) * 2016-09-06 2019-07-16 Carnegie Mellon University Methods and software for detecting objects in an image using a contextual multiscale fast region-based convolutional neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469302A (en) * 2016-09-07 2017-03-01 成都知识视觉科技有限公司 A kind of face skin quality detection method based on artificial neural network
CN108229296A (en) * 2017-09-30 2018-06-29 深圳市商汤科技有限公司 The recognition methods of face skin attribute and device, electronic equipment, storage medium
CN109447990A (en) * 2018-10-22 2019-03-08 北京旷视科技有限公司 Image, semantic dividing method, device, electronic equipment and computer-readable medium
CN110059635A (en) * 2019-04-19 2019-07-26 厦门美图之家科技有限公司 A kind of skin blemishes detection method and device
CN110148121A (en) * 2019-05-09 2019-08-20 腾讯科技(深圳)有限公司 A kind of skin image processing method, device, electronic equipment and medium
CN110472605A (en) * 2019-08-21 2019-11-19 广州纳丽生物科技有限公司 A kind of skin problem diagnostic method based on deep learning face subregion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Sajaa G. Mohammed et al.; "Image Segmentation for Skin Detection"; Journal of Southwest Jiaotong University; 2020-02-28; pp. 1-11 *
Hengshuang Zhao et al.; "Pyramid Scene Parsing Network"; 2017 IEEE Conference on Computer Vision and Pattern Recognition; pp. 6230-6239 *
Chen Yousheng et al.; "Facial skin blemish detection and segmentation method based on Mask R-CNN"; Laser Journal; 2019-12-25 (No. 12); pp. 23-26 *
Luo Huilan et al.; "Semantic segmentation combining context features with multi-layer CNN feature fusion"; Journal of Image and Graphics; 2019-12-16 (No. 12); pp. 148-157 *

Also Published As

Publication number Publication date
CN111860169A (en) 2020-10-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant