CN116964642A - System and method for acne counting, localization and visualization - Google Patents

System and method for acne counting, localization and visualization

Info

Publication number
CN116964642A
CN116964642A CN202180066794.XA CN202180066794A CN116964642A CN 116964642 A CN116964642 A CN 116964642A CN 202180066794 A CN202180066794 A CN 202180066794A CN 116964642 A CN116964642 A CN 116964642A
Authority
CN
China
Prior art keywords
acne
model
image
type
instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180066794.XA
Other languages
Chinese (zh)
Inventor
帕汉姆·阿拉比
玉泽·张
蒋若玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ba Lioulaiya
Original Assignee
Ba Lioulaiya
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ba Lioulaiya filed Critical Ba Lioulaiya
Priority claimed from PCT/EP2021/077204 (WO2022069754A1)
Publication of CN116964642A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

Systems, methods, and techniques provide for localization, counting, and visualization of acne. Images are processed using a trained model to identify objects. The model may be a deep learning (e.g., convolutional neural) network configured to classify objects with a focus on detection of small objects. The image may be a frontal or side facial image processed end-to-end. The model identifies and locates different types of acne. Instances are counted and visualized, for example by annotating the source image; example annotations are overlays identifying the type and location of each instance. Counting by acne type aids in scoring. Products and/or services may be recommended in response to the identification (e.g., type, location, count, and/or score) of acne.

Description

System and method for acne counting, localization and visualization
Cross reference
The present application claims the benefit of U.S. Provisional Application No. 63/086,694, filed on October 2, 2020, and, in other instances, priority thereto, the entire contents of which are incorporated herein by reference where permitted. The application also claims priority from French Application No. FR 2013002, filed on December 10, 2020, the entire contents of which are incorporated herein by reference where permitted.
Technical Field
The present application relates to the fields of computer image processing, convolutional neural networks, and dermatology, and more particularly, to systems and methods for acne counting, localization, and visualization, and electronic commerce systems and methods utilizing the same.
Background
Skin conditions such as acne often affect the face and may occur elsewhere on the body. When oil (sebum) or dead skin cells block hair follicles, various types of acne blemishes can result. Although common in adolescents, acne also occurs in people of other age groups. Given a facial image, the acne localization task aims to detect whether there is any acne in the facial portrait.
Acne localization is useful in downstream applications related to dermatology and image visualization.
Disclosure of Invention
According to embodiments, systems, methods, and techniques for acne localization, counting, and visualization are provided. Images are processed using a model to identify (classify) objects. In one embodiment, the model is a Convolutional Neural Network (CNN) configured to classify objects with a focus on detection of small objects. In one embodiment, the image is a facial image in a frontal or side mode processed end-to-end by the CNN, without requiring cropping. In one embodiment, the model identifies and locates different types of acne. Acne instances are counted (e.g., by type) and visualized, such as by annotating the source image. Example annotations are overlays identifying the type and location of each instance. Counting by acne type aids in scoring. In one embodiment, products and/or services may be recommended in response to the identification (e.g., type, location, count, and/or score) of acne. In one embodiment, a purchase is facilitated.
In one embodiment, a method is provided comprising: analyzing a source image to determine respective locations of acne instances; and visualizing the acne instances on the source image for display; wherein the source image is analyzed using a model configured to: detect at least one type of acne in the image; and focus on detecting small objects in the image.
In one embodiment, the model is a deep-learning neural network model configured for object classification and localization.
In one embodiment, the model is configured to process the image on a pixel level, operating end-to-end to directly detect acne locations without cropping the source image.
In one embodiment, the model generates a corresponding anchor box that provides location information for each acne instance detected.
In one embodiment, an anchor box aspect ratio for defining one of the anchor boxes is calculated from acne instances identified in an image dataset using k-means clustering.
In one embodiment, the model is block-based and the method includes providing a block of skin from the source image to the model for processing to detect an acne instance.
In one embodiment, the model is block-based and the method includes providing a block of skin from the source image to the model for processing to detect an acne instance, and wherein the block is determined from the skin mask.
In one embodiment, the visual acne instances indicate the respective locations of each instance on the source image.
In one embodiment, visualizing acne indicates the corresponding acne type for each instance on the source image.
In one embodiment, the at least one type of acne comprises one or more of persistent acne, inflammatory acne, and pigmented acne.
In one embodiment, the method includes determining a count of instances of acne.
In one embodiment, the method includes obtaining a recommendation for a product and/or service specific to an instance of treating acne.
In one embodiment, the method includes communicating with an electronic commerce system to purchase products and/or services.
In one embodiment, the source image includes a facial image in a front mode or a side mode.
In one embodiment, the method includes acquiring a source image.
In one embodiment, a method is provided comprising: analyzing a source image to determine respective locations of acne instances; and generating and providing an acne score responsive to a count of the instances; wherein the source image is analyzed using a model configured to: focus on detecting small objects in the image; and detect at least one type of acne in the image.
In one embodiment, the acne score is responsive to one or more of the location, count, and type of acne.
In one embodiment, the method includes generating a recommendation for one or more of a product and a service specific to the treatment of the acne instance.
In one embodiment, the recommendation is generated in response to a factor selected from the group consisting of: acne type, count by type, score by type, location of the acne, location of the purchaser, delivery location, regulatory requirements, contraindications, gender, co-recommendations, and likelihood of the user following usage guidelines.
In one embodiment, the method includes visualizing an acne instance on a source image.
In one embodiment, a computing device is provided that includes circuitry configured to perform a method according to any of the embodiments.
In one embodiment, a computing system is provided that includes circuitry configured to provide: an interface for receiving a source image and returning an annotated source image that visualizes acne instances determined by a model configured to process the source image; wherein the model is configured to: focus on detecting small objects in the image; and detect at least one type of acne in the image.
In one embodiment, a computing system is configured to provide: a recommendation component configured to recommend products and/or services specific to treating at least some instances of acne; and an e-commerce transaction component to facilitate purchase of products and/or services.
In one embodiment, the model includes one of: a block-based model configured to receive blocks of skin from the source image for processing to detect acne instances, the blocks determined from a skin mask; and a single-detection-layer model configured to output a plurality of predictions, including a three-dimensional tensor encoding bounding box, objectness, and class predictions, and wherein the plurality of predictions are filtered to remove redundant detections of the same acne instance.
Drawings
Figures 1A, 1B, and 1C are facial images using visual annotations to display acne examples by type and location, according to an embodiment.
Fig. 2 is an illustration of an example facial landmark-based skin mask for use in processing facial images, according to one embodiment.
Fig. 3 is an illustration of a facial image with acne visualization and mask visualization, according to one embodiment.
FIG. 4 is an e-commerce network diagram illustrating a client computing device configured to detect instances of acne in source face images, obtain product and/or service recommendations, and purchase the recommended products and/or services, according to one embodiment.
FIG. 5 is a block diagram of a computing device according to one embodiment.
FIG. 6 is a flow chart of operations according to one embodiment.
Detailed Description
According to one embodiment, a model configured to process source images to detect instances of acne is shown and described. The model is configured to determine a corresponding location of the acne instance, performing an acne localization task. In one embodiment, the model is configured to return bounding boxes or coordinates. In one embodiment, the model is further configured to classify the instances as a particular type of acne. In one embodiment, the source image is a full face image.
In one embodiment, the model is a deep learning model. In one embodiment, the model has a focus on small objects.
In one embodiment, the model is end-to-end and operates at the pixel level, meaning that the model processes the entire image (e.g., face) and detects acne directly on the image without any cropping.
In the following description, various embodiments of a model for processing a source image are described, wherein a first set of embodiments ("first model embodiments" of a first model type) is based on a modified YOLO object detection deep learning network, and a second set of embodiments ("second model embodiments" of a second model type) is based on a block-based deep learning network. (See J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv:1804.02767, Apr. 8, 2018, URL: arxiv.org/abs/1804.02767, incorporated herein by reference where permitted, hereinafter "YOLOv3".)
It should be appreciated that the various features described in association with embodiments of one model type also apply to embodiments of the other model type. By way of example, and not limitation, the acne localization, detection by acne type, acne counting (e.g., by type), and acne visualization features apply to both model types. In embodiments, the first and second model embodiments are useful for processing a complete facial image, whether a frontal image or a side image, without cropping. In embodiments, the first and second model embodiments process a complete facial image guided by a skin mask generated from the facial image. While some of these features relate to the operation of embodiments of the model itself, such as the acne detection features (e.g., localization and classification), other features (e.g., counting, visualization, etc.) are operations associated with processing the output from the model. These and other features are common among the model embodiments.
First model embodiment
Fig. 1A, 1B, and 1C illustrate examples of acne visualization after analyzing respective source images using a model, wherein three types of acne are detected and visualized, according to one embodiment. Types include persistent acne, inflammatory acne, and pigmented acne. The type of acne that can be detected depends on the training data used in training the model. This means that the model itself has the extensibility to learn and detect other types of acne if applicable training data is provided. It should be appreciated that in one embodiment, fewer types of acne may be trained for detection.
As shown, according to one embodiment, the model also supports acne detection on the three input facial views (left side, right side, and front). In this embodiment, the model is view-invariant and can process images from any view/mode.
Fig. 1A, 1B, and 1C are black-and-white (BW) simulation images 100, 102, and 104 showing annotated source images that visualize acne instances, according to one embodiment. The images 100, 102, and 104 were converted from original color images and edited for patent application compliance purposes, although in practice color images are used in embodiments, e.g., to provide a more realistic rendering of the subject. Image 100 is a left (side) view, image 102 is a right (side) view, and image 104 is a front view.
In this embodiment, the three-view annotation purposely focuses on a particular portion of the face in each view. For example, the frontal view 104 annotates acne around the mouth and in the center of the image, as such acne is more apparent in this view. Similarly, side views 100 and 102 annotate acne around the temples, while front view 104 does not. In this embodiment, the training data is labeled in this way for each view, and the model is trained accordingly to identify acne at these locations.
In this embodiment, instances are labeled with a type 106, a detection confidence metric (e.g., a numerical value) 108, and a bounding box 110. Acne instances may be in close proximity on the face, so that when visualized at a particular scale the labels may overlap. In one embodiment (although not shown in the BW example), labels of different colors or other means may be used to distinguish acne types. Other label types (e.g., color-coded bounding boxes without text) may be used, etc. In this example, acne types are denoted "infl" (inflammatory), "rete" (retentive), and "pigm" (pigmentary). With respect to the detection confidence metric, the value is in the range of 0 to 1, such that a metric of 0.58 indicates 58% confidence that the detected instance is acne. In this embodiment, further filtering is used to select among the detected acne instances for visualization. For example, a 50% threshold is used to visualize instances at or above the threshold. The threshold may vary.
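The following is an illustrative sketch (not the patent's implementation) of annotating a source image with type labels, confidence values, and bounding boxes for detections at or above a confidence threshold. The detection dictionary format is a hypothetical example.

```python
import cv2

ACNE_LABELS = {0: "infl", 1: "rete", 2: "pigm"}
LABEL_COLORS = {0: (0, 0, 255), 1: (0, 255, 0), 2: (255, 0, 0)}  # BGR per type

def annotate(image_bgr, detections, conf_threshold=0.5):
    """Draw a labeled box for each detection with confidence >= threshold.

    detections: list of dicts like {"cls": 0, "conf": 0.58, "box": (x1, y1, x2, y2)}.
    """
    out = image_bgr.copy()
    for det in detections:
        if det["conf"] < conf_threshold:
            continue  # filter low-confidence instances, e.g. the 50% threshold
        x1, y1, x2, y2 = map(int, det["box"])
        color = LABEL_COLORS[det["cls"]]
        label = f'{ACNE_LABELS[det["cls"]]} {det["conf"]:.2f}'
        cv2.rectangle(out, (x1, y1), (x2, y2), color, 1)
        cv2.putText(out, label, (x1, max(y1 - 3, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.35, color, 1)
    return out
```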
First model type structure
In one embodiment, the main architecture of the model is based on the YOLOv3 object detection algorithm.
Bounding box prediction (position)
Briefly, YOLOv3 predicts bounding boxes using dimension clusters as anchor boxes, predicting four coordinates for each bounding box, tx, ty, tw, th (sometimes referred to herein as a "bbox"). If the cell containing the box is offset from the top-left corner of the image by (cx, cy) and the bounding box prior has width and height pw, ph, the predictions correspond to bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, and bh = ph·e^th. That is, the center coordinates of the box are predicted relative to the location of filter application using a sigmoid function, and the width and height are predicted as offsets from the cluster centroids. Sum-of-squared-error loss is used during training.
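A minimal sketch of the YOLOv3-style bounding box decoding summarized above, assuming a single predicted box with raw outputs (tx, ty, tw, th), a grid-cell offset (cx, cy), and an anchor (prior) of size (pw, ph); coordinates are in grid/stride units. This is illustrative, not the patent's exact code.

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx   # box center x, offset within the grid cell
    by = sigmoid(ty) + cy   # box center y
    bw = pw * math.exp(tw)  # width predicted as an offset from the prior
    bh = ph * math.exp(th)  # height predicted as an offset from the prior
    return bx, by, bw, bh
```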
YOLOv3 describes predicting an objectness score for each bounding box using logistic regression. The score should be 1 if a bounding box prior overlaps a ground-truth object more than any other prior does. If a prior is not the best but overlaps a ground-truth object by more than a threshold (0.5 is used), the prediction is ignored. In YOLOv3, one bounding box prior is assigned to each ground-truth object; if a prior is not assigned to a ground-truth object, it incurs no loss for coordinate or class predictions, only for objectness.
Category prediction (e.g., object prediction or acne type)
Multi-label classification is used for class prediction for each box, via independent logistic classifiers. Binary cross-entropy loss is used to train the class predictions.
YOLOv3 describes predicting boxes at three scales. Several convolutional layers are added to the base feature extractor, the last of which predicts a three-dimensional tensor encoding bounding box, objectness, and class predictions. Thus, with 3 boxes per scale, the tensor is N × N × [3 × (4 + 1 + 80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.
The feature map from 2 layers earlier is then upsampled by 2× and concatenated with an earlier feature map from the network, combining the upsampled features with fine-grained features. Several more convolutional layers are added to process this combined feature map, and a similar tensor is predicted, now twice the size. The same approach is applied once more to predict a tensor for the third scale, combining previous computation with fine-grained features from earlier in the network. k-means clustering is used to determine the bounding box priors, with 9 clusters and 3 scales chosen and the clusters divided evenly across the scales.
The backbone network described in YOLOv3 for feature extraction is a 53-layer convolutional network named Darknet-53.
Modification of
According to embodiments herein, the model disclosed in YOLOv3 is modified to focus on smaller objects by:
reducing the size of the backbone network;
retaining only one YOLO detection layer and recalculating the anchor box sizes (aspect ratios); and
fine-tuning the model, for the current acne localization task, to best performance on the training dataset.
In the case where only one YOLO detection layer is retained, prediction is performed at a single scale level in this embodiment. In this embodiment, the YOLO detection layer is the layer that produces the bounding box predictions: it receives features (e.g., the output of the backbone network layers) and outputs objectness, class predictions, and bbox locations. The layer itself places no limit on the number of predictions; in one embodiment, during inference, the layer outputs a large number of predicted boxes, which are filtered using non-maximum suppression (NMS) and a manually set confidence threshold.
In this embodiment, NMS filtering is used at least in part to remove redundant detections of the same acne instance. When the model predicts two or more boxes for the same acne instance, NMS filters out the lower-confidence boxes and retains the highest-confidence box for that particular instance.
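An illustrative sketch of IoU-based non-maximum suppression as used to remove redundant detections of the same acne instance: boxes are processed in descending confidence order, and any remaining box that overlaps a kept box above the IoU threshold is discarded. The box format (x1, y1, x2, y2) is an assumption.

```python
import numpy as np

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    for idx in order:
        if all(iou(boxes[idx], boxes[k]) < iou_threshold for k in keep):
            keep.append(idx)
    return keep  # indices of the highest-confidence, non-redundant boxes
```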
In one embodiment, the model is trained to predict three classes of objects for the respective 3 acne types. There is no additional or background class because YOLO does not make background predictions (e.g., YOLOv3 does not draw a bounding box over background and label it as background; it predicts only target classes). The number and type of classes can easily be adjusted to the format of the data. For example, if more classes of objects (e.g., acne or other objects) are identified in the data, the model may be adjusted and trained to predict those additional classes.
With respect to tuning, in one embodiment, various operations are performed, including: 1) dense data augmentation, comprising converting the image to HSV color space and adding random color jitter (randomly changing values over a range) on the saturation and value channels, and applying random affine transformations to the image; 2) using a multi-step learning rate scheduler; and 3) using an evolutionary algorithm that evolves the above hyper-parameters by training for multiple rounds and iteratively selecting the best settings.
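An illustrative data augmentation sketch consistent with the tuning described above: random jitter on the saturation and value channels in HSV space, plus a random affine transform. The jitter ranges and affine limits are assumed values, not those of the patent.

```python
import cv2
import numpy as np

def hsv_jitter(image_bgr, s_gain=0.5, v_gain=0.5):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= 1.0 + np.random.uniform(-s_gain, s_gain)  # saturation channel
    hsv[..., 2] *= 1.0 + np.random.uniform(-v_gain, v_gain)  # value channel
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def random_affine(image_bgr, max_rotate=10, max_translate=0.1, max_scale=0.1):
    h, w = image_bgr.shape[:2]
    angle = np.random.uniform(-max_rotate, max_rotate)
    scale = 1.0 + np.random.uniform(-max_scale, max_scale)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    m[0, 2] += np.random.uniform(-max_translate, max_translate) * w
    m[1, 2] += np.random.uniform(-max_translate, max_translate) * h
    return cv2.warpAffine(image_bgr, m, (w, h), borderValue=(114, 114, 114))
```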
In one embodiment, the anchor box aspect ratios are calculated using k-means clustering of the acne instances identified (e.g., annotated by human reviewers such as experts) in the (training) image dataset, as described in YOLOv3. That is, the annotated boxes are clustered by aspect ratio using a k-means clustering operation. In YOLOv3, the anchor boxes are used to guide bounding box prediction. In other approaches, an operation may first find regions of the image that may contain a target object and then predict bounding boxes around the objects, but this is a computationally demanding task.
The understanding in the art has evolved, and it has been determined that predicting raw boxes from scratch is not necessary, because a dataset typically has some inherent characteristics. For example, an automobile dataset will tend to have long rectangular bboxes, while a facial dataset will tend to have nearly square bboxes. Using these principles, the image dataset is evaluated to cluster the bbox sizes and find common, shared sizes; these sizes are used as anchor boxes. The model then only needs to predict offsets from the anchor boxes, saving considerable computation. In one embodiment, the algorithm follows this method: it first collects all bounding boxes from the training annotations, extracts and formats them as (width, height) pairs, and then uses k-means (a clustering technique that partitions the data into k clusters) to find the cluster centroids. The number of clusters k is set manually and must be consistent with the number of anchor boxes the YOLO layer expects. In one embodiment, k = 5 for the model.
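A minimal sketch of deriving anchor box sizes from annotated (width, height) pairs with k-means, as described above, using k = 5 as in the embodiment described above. A plain NumPy k-means is used rather than a library call; this is illustrative only.

```python
import numpy as np

def kmeans_anchors(wh_pairs, k=5, iters=100, seed=0):
    wh = np.asarray(wh_pairs, dtype=np.float32)          # shape (N, 2)
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each annotated box to the nearest centroid (Euclidean in w, h)
        dists = np.linalg.norm(wh[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]  # anchors sorted by area

# Example: anchors = kmeans_anchors([(w1, h1), (w2, h2), ...], k=5)
```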
Table 1 shows the model structure by layer after modification as thus described, according to one embodiment.
TABLE 1
The model itself does not directly predict the total number of object instances (e.g., acne), either per class or in aggregate. The per-class or aggregate counts may therefore be determined by counting the acne instances per class in the model's predictions (outputs), for example after filtering the predictions using non-maximum suppression.
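An illustrative sketch of counting acne instances per class from the filtered (post-NMS) detections; the detection format is the same hypothetical one used in the earlier sketches.

```python
from collections import Counter

def count_by_type(detections, conf_threshold=0.5):
    counts = Counter(det["cls"] for det in detections
                     if det["conf"] >= conf_threshold)
    counts["total"] = sum(v for k, v in counts.items() if k != "total")
    return counts  # e.g. {0: 3, 1: 7, 2: 2, "total": 12}
```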
The model may be evaluated by comparing the counts determined from the predictions against counts based on ground truth. Common object detection metrics depend on how hits and misses are defined; a threshold on the overlap ratio (intersection over union, IoU) is typically used, but the threshold can be set arbitrarily high or low, which significantly affects the reported accuracy.
In one embodiment, counts are generated and used as a way to evaluate how well the model performs.
In one embodiment, counts are generated at inference time using the model for downstream tasks. Instead of (or in addition to) predicting the exact locations of acne, the model's API may be configured to return counts. A downstream application may use the counts, or even a score based on the predicted counts, for tasks such as skin condition severity estimation, product and/or service recommendation, and the like.
Data set
A facial image dataset containing 348 facial images of 116 different subjects (3 views per subject) was collected. A group of dermatologists labels these images and identifies any visible acne that may be of any of three types: inflammatory acne, persistent acne, and pigmented acne. The identified acne is then saved in the format of center point coordinates for defining ground truth values for training and testing, etc.
In one embodiment, a group of three dermatologists (specialists) examine and annotate the image set. Each expert annotates all images and identifies all three types of acne. In one embodiment, the annotated data (three versions of the annotation) is merged according to the following steps:
All three versions are combined; the total number of boxes after this step is the simple sum of the boxes from the three original annotations.
Redundant boxes are then filtered using logic similar to NMS. That is, if the IoU (intersection over union) of two boxes is greater than a certain threshold, the larger box is used as the ground truth (GT); if multiple boxes overlap each other and all have a large mutual IoU, the largest box is used as the GT. A sketch of this merging appears below.
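An illustrative sketch of merging the three annotators' boxes into ground truth: all boxes are pooled, then redundant boxes (pairwise IoU above a threshold) are collapsed, keeping the larger box. It reuses the iou() helper sketched earlier; the box format is the same assumed (x1, y1, x2, y2).

```python
def merge_annotations(boxes_per_annotator, iou_threshold=0.5):
    pooled = [b for boxes in boxes_per_annotator for b in boxes]  # simple union
    # sort by area (descending) so larger boxes are considered first and kept
    pooled.sort(key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    ground_truth = []
    for box in pooled:
        if all(iou(box, kept) <= iou_threshold for kept in ground_truth):
            ground_truth.append(box)  # keep the largest box of each overlap group
    return ground_truth
```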
Evaluation of
The performance of acne localization is measured using common object detection metrics, including precision (P), recall (R), mAP, and F1 score. Note that the mAP and F1 scores are computed directly from precision and recall. The performance evaluation for the localization task is shown in Table 2.
Class | Images | Targets | P | R | mAP | F1
All | 69 | 1860 | 0.225 | 0.568 | 0.313 | 0.322
Pigmentary | 69 | 796 | 0.233 | 0.531 | 0.267 | 0.324
Inflammatory | 69 | 275 | 0.246 | 0.636 | 0.443 | 0.355
Retentive | 69 | 784 | 0.197 | 0.536 | 0.23 | 0.288

TABLE 2
The following abbreviations are used for the acne types evaluated with the current model: Pigm = pigmentary, Infl = inflammatory, Rete = retentive; P = precision, R = recall.
The same model was also evaluated on the counting task; the results are shown in Table 3 below:

Metric | Pigmentary | Inflammatory | Retentive
<5 (proportion of cases) | 0.50725 | 0.82609 | 0.56522
Error by class | 6.058 | 2.4638 | 5.2319
Average error (all classes) | 4.584541063

TABLE 3
The calculated error is the absolute error (|# predicted − # ground truth|). The metrics evaluated are "<5" (i.e., the proportion of test cases in which the absolute error is less than 5), the error by class, and the average error.
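An illustrative computation of the counting metrics reported in Table 3: the absolute error per image, the proportion of test images with absolute error below 5, and the mean absolute error for a class. The per-image count lists are assumed inputs.

```python
import numpy as np

def count_metrics(pred_counts, gt_counts):
    errors = np.abs(np.asarray(pred_counts) - np.asarray(gt_counts))
    return {
        "<5": float((errors < 5).mean()),        # proportion of cases with error < 5
        "mean_abs_error": float(errors.mean()),  # mean absolute error for the class
    }

# Example: count_metrics(pred_counts=[4, 9, 1], gt_counts=[5, 7, 1])
```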
Although the above embodiments are described with respect to processing an entire facial image, in one embodiment facial landmarks may be used, for example, to define a skin mask indicating the portion of the image in which skin is depicted. The first model embodiment described above may then process the skin portion of the facial image as indicated by the skin mask.
In one embodiment, the operation may reject images that do not include a face and not process such images for acne localization and counting. For example, the operation may process the image to locate facial landmarks for the purpose of such rejection, or further process the image accordingly.
Second model embodiment-Block-based acne localization
One possible disadvantage of the YOLO-based method of the first embodiment described above is that noisy annotations can lead to weak results. Accordingly, provided herein is a second model embodiment according to a block-based method of locating acne spots. In one embodiment, the block-based method is applied to the entire facial image. In one embodiment, the block-based method is applied using a skin mask to process one or more portions of the facial image.
For example, in a mask-guided embodiment, at inference time a facial landmark detector pre-processes a source image (e.g., a self-captured image) of the face to define a skin mask for the face. An example of a landmark-based skin mask 200 is shown in Fig. 2, where the masked portion is white, showing the area processed for acne, and the non-masked portion is black, showing areas not processed. In this example, mask 200 shows an upper facial component 202 for the forehead, a T-zone component 204 above the bridge of the nose, and a lower facial component 206 for the cheeks, jaw, and chin, with portions for the eye regions (e.g., 208), lower nose region 210, and lip region 212 omitted. The background around the face is also omitted from the mask.
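A minimal sketch, under assumed inputs, of building a landmark-based skin mask like the one in Fig. 2: polygons for the skin regions to process are filled white, and polygons for excluded regions (eyes, lower nose, lips) are filled black. The region point lists are hypothetical outputs of a landmark detector; this is not the patent's implementation.

```python
import cv2
import numpy as np

def build_skin_mask(image_shape, include_regions, exclude_regions):
    """include_regions / exclude_regions: lists of polygons, each an array of
    (x, y) landmark-derived points, e.g. forehead, T-zone, lower face versus
    eyes, lower nose, lips."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for poly in include_regions:
        cv2.fillPoly(mask, [np.asarray(poly, dtype=np.int32)], 255)
    for poly in exclude_regions:
        cv2.fillPoly(mask, [np.asarray(poly, dtype=np.int32)], 0)
    return mask  # 255 = skin to analyze, 0 = ignored (background, eyes, lips, ...)
```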
At inference time, in mask-guided embodiments, blocks of a set resolution are created within the generated skin mask by scanning from top-left to bottom-right in steps of one third (1/3) of the block width. The block width is normalized to the face width, i.e., defined as the face width divided by 15. Each block is passed through the trained convolutional neural network (i.e., the second model embodiment) described below. In one embodiment, the model outputs a list of probabilities for the following categories: inflammatory, retentive, pigmentary (the three acne types or classes), and healthy skin. In one embodiment, non-maximum suppression (NMS) is applied to select the best box among the returned candidate acne detections.
In non-mask embodiments, similar operations are performed using blocks covering the entire facial image, including, for example, non-skin portions such as the background, hair, eyes, and lips. For example, at inference time in a non-mask embodiment, blocks of a set resolution are created within the entire image by scanning from top-left to bottom-right in steps of one third (1/3) of the block width, as described. Each block is passed through the trained convolutional neural network (i.e., the second model embodiment) described below. In one embodiment, the model outputs a list of probabilities for the following categories: inflammatory, retentive, pigmentary (the three acne types or classes), and healthy skin. In one embodiment, non-maximum suppression (NMS) is applied to select the best box among the returned candidate acne detections. The healthy skin category gives the classifier an option beyond the three acne categories.
In one embodiment, the selected boxes are used to perform acne counting and acne visualization using the source image. In one embodiment, a filter using a threshold confidence metric is applied to select among those best-box instances of detected acne. For example, only detected instances at or above the threshold (or only those above the threshold) are counted and/or visualized.
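An illustrative sketch of the block-based inference described above: the block width is the face width divided by 15, the scan stride is one third of the block width, and each block is classified into the three acne classes plus healthy skin. classify_block() is a hypothetical wrapper around the trained CNN; the threshold value and the subsequent NMS step mirror the earlier sketches.

```python
import numpy as np

ACNE_CLASSES = ("inflammatory", "retentive", "pigmentary")  # index 3 = healthy skin

def detect_acne_blocks(image, face_width, classify_block, skin_mask=None,
                       conf_threshold=0.5):
    block = max(int(face_width / 15), 1)
    stride = max(block // 3, 1)
    h, w = image.shape[:2]
    candidates = []
    for y in range(0, h - block + 1, stride):            # scan top-left to bottom-right
        for x in range(0, w - block + 1, stride):
            if skin_mask is not None and not skin_mask[y:y + block, x:x + block].any():
                continue                                  # skip blocks outside the mask
            probs = classify_block(image[y:y + block, x:x + block])
            cls = int(np.argmax(probs))
            if cls < 3 and probs[cls] >= conf_threshold:  # keep acne classes only
                candidates.append({"cls": cls, "conf": float(probs[cls]),
                                   "box": (x, y, x + block, y + block)})
    return candidates  # NMS is then applied to remove overlapping duplicates
```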
Fig. 3 is an illustration of a screen shot 300 showing a source face image 302 annotated to visualize acne (e.g., 304, 306, and 308), according to one embodiment. Screenshot 300 also shows a source image annotated to display an applicable skin mask 310 (here outlined with a dashed line) determined for the face. In fig. 3, the acne is visualized using a circle around the center point of the box of each detected acne instance to be visualized. In one embodiment, in practice, the circles are different colors or different gray values to distinguish 3 acne categories. In this illustration, a solid line pattern (e.g., 304) or one of two dashed line patterns (e.g., 306 and 308) is used to distinguish circles. In one embodiment, the skin mask is not visualized (not shown). The skin mask may not be visualized because one is not used, or because it is used, but it is not desired to be displayed.
Second model type structure and training
In one embodiment, the block-based model includes a residual network backbone (e.g., ResNet-50, with 50 neural network layers) and three fully connected layers, with leaky rectified linear unit (LeakyReLU) activation functions interleaved between adjacent fully connected (FC) layers. The final layers are thus FC1 → LeakyReLU1 → FC2 → LeakyReLU2 → FC3. (See K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385, Dec. 10, 2015, incorporated herein by reference where permitted.)
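A minimal PyTorch sketch, under stated assumptions, of the described block classifier: a ResNet-50 backbone followed by FC1 → LeakyReLU → FC2 → LeakyReLU → FC3 producing four outputs (three acne classes plus healthy skin). The hidden layer sizes (512, 128) are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AcneBlockClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        backbone = resnet50(weights=None)          # residual network backbone
        backbone.fc = nn.Identity()                # expose the 2048-d feature vector
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(2048, 512), nn.LeakyReLU(),  # FC1 -> LeakyReLU1
            nn.Linear(512, 128), nn.LeakyReLU(),   # FC2 -> LeakyReLU2
            nn.Linear(128, num_classes),           # FC3 -> class logits
        )

    def forward(self, x):
        return self.head(self.backbone(x))

# Example: logits = AcneBlockClassifier()(torch.randn(1, 3, 64, 64))
# Training uses nn.CrossEntropyLoss() on these logits, consistent with the
# cross-entropy training described below.
```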
To create the block-based model dataset, according to one embodiment, 2450 healthy blocks and 3577 acne blocks (blocks containing an instance of any of the three acne categories described above) were sampled from the full images of the dataset described above with reference to the first model embodiment. During the training phase, the network is trained with a standard cross-entropy loss function, and data augmentation includes random affine transformation, random horizontal flipping, and channel shuffling.
In one embodiment, the front image is used for masking and annotation.
To process a source image to detect acne instances by type, according to embodiments in which a mask is used, the source image is processed using landmark detection to define the applicable mask, defining the facial region with skin to be analyzed. A plurality of overlapping blocks (determined by the block size and step size) are extracted from the skin image according to the applicable mask and processed by the model. The model generates a list of probabilities over the following categories: inflammatory, retentive, pigmentary (the three acne types or classes), and healthy skin. The mask causes the operation to ignore any eyes, lips, hair, and background in the source image. Multiple detections of the same acne instance are filtered to select the better/best instance (e.g., using NMS). A second filter is applied to select the detected instances having at least a minimum confidence level.
Similarly, in embodiments using the second model type, the detected acne instances are counted by type, as described with reference to the embodiments of the first model. In one embodiment, a score is calculated. In one embodiment, the detected acne is visualized in association with the source image. In one embodiment, a mask is visualized in association with the source image, showing which skin areas were processed.
Downstream application
The following description applies to both the first model embodiment and the second model embodiment. According to an embodiment, the analysis operation utilizes a model configured to output the locations of acne instances detected in a source image (e.g., skin of a face, in a frontal or side mode), wherein different types (object classes) of acne can be distinguished by the model. For example, an analysis or other determination operation may count instances to determine an aggregate count and/or a count for each type.
The location, type, and/or count define acne data that is useful in downstream applications. In one embodiment, the acne instances are visualized on a source image or display using a visualization operation. In one embodiment, the source image is annotated (e.g., using overlay or other techniques) to indicate location. In one example, the location is a corresponding location for each instance on the source image. In one embodiment, the visualization operation indicates a respective acne type for each instance on the source image. In one embodiment, the visualization operation provides for the display of the count.
Acne data (e.g., count and type (classification)) helps predict a skin condition score in operation. For example, more persistent acne and less inflammatory acne indicates a milder condition, and vice versa.
In one embodiment, the operation recommends products and/or services for acne based on the acne data and/or information derived therefrom (e.g., scores). According to one embodiment, an operation (e.g., an e-commerce operation) facilitates a purchase.
In one embodiment, the visualization operation directs the application of the product, e.g., using the location to indicate the location of the application. In one example, the source image is used in the visualization to provide a user-specific tutorial via a Graphical User Interface (GUI).
Various examples and computing systems are contemplated for practical application of the model and its output. In one example, a computer network includes at least one client computing device (e.g., for a dermatologist or consumer) and a network-based e-commerce computing system. In this embodiment, the client computing device is able to determine instances of acne in a source image, obtain product and/or service recommendations, and purchase the recommended products and/or services.
The model for analyzing the source image and providing acne data as output may be configured to run on a user device, a server, or another remote device. An application may be configured to obtain a source image and use the model to generate acne data. The acne data may be used to provide one or more of a diagnosis (e.g., an acne score), a visualization of acne on the source image, and a recommendation of a product or service specific to treating the acne.
FIG. 4 is a block diagram of an example computer network 400 in which a personal-use computing device 402 operated by a user 404 communicates with remotely located server computing devices (i.e., server 408 and server 410) via a communication network 406. In one embodiment, the user 404 is a consumer. Also shown is a second user 412 and a second computing device 414 configured for communication via the communication network 406. In one embodiment, the second user 412 is a dermatologist. In one embodiment, server 408 is configured to provide an instance of the model 418 to process images to generate acne data and recommendations for products and/or services to treat acne. In one embodiment, server 408 generates acne scores and recommendations from received acne data (e.g., without processing the image itself). In one embodiment, server 408 generates acne data including location data for each instance of detected acne, for example, for visualizing the acne. In one embodiment, server 408 generates a visualization to provide to another device for display. In one embodiment, server 408 generates acne data including a count by type (e.g., without visualization) to generate a score.
In one embodiment, the server 408 generates recommendations (e.g., recommending products and/or services, which may include a treating practitioner/service provider as well as products and services) using rules, models, or other means. In one embodiment, the respective products are associated with respective treatment plans/instructions for use. In one embodiment, the recommendations are generated in response to factors such as acne type and count or score by type.
In one embodiment, server 410 provides an e-commerce interface to purchase products and/or services recommended by server 408, for example.
In one embodiment, computing device 402 is for personal use by user 404 and is not available to the public. However, services from the server are available to the public. Here, the public includes registered users and/or clients, and the like. A publicly available computing device 416 (which may be located at a brick-and-mortar store, for example) is also coupled to network 406.
The computing device 402 is configured to perform acne localization, etc., as described herein, i.e., evaluating acne locations and determining counts, etc. In this embodiment, a model (CNN) 418 is stored and utilized on board the computing device 402. In this embodiment, a second instance of the model 418 is stored at server 408 and provided for use by other computing devices, e.g., via a cloud service, web service, etc., for analyzing images received from computing devices (e.g., 416, etc.).
For example, the computing device 402 is configured to communicate with the server 408 to provide acne data (which may include scoring data) and receive product/service recommendations in response to the acne data and/or other information about the user (e.g., age, gender, etc.). Computing device 402 (or server 408 on behalf thereof) is configured to communicate with server 410 to obtain e-commerce services to purchase recommended products and/or services.
Computing device 402 is shown as a handheld mobile device (e.g., a smartphone or tablet). However, the physical device may be another computing device, such as a laptop computer, desktop computer, workstation, or the like. Acne localization and counting, etc., described herein may be implemented on other computing device types. According to an example, computing devices 402, 414, and 416 may be configured using one or more native applications or browser-based applications, for example.
In this embodiment, computing device 402 comprises a user device, for example, to obtain one or more images (e.g., pictures of skin, particularly of the face) and process the one or more images to generate corresponding acne data, etc. Such an activity is referred to as performing a skin diagnosis. The skin diagnosis may be performed in association with a skin treatment plan, wherein images are periodically acquired and analyzed to determine skin scores, such as for acne as described. The scores may be stored (locally, remotely, or both) and compared between sessions, for example, to display trends, improvements, etc. The skin score and/or skin image may be accessed by a user 404 of the computing device 402 and made available (e.g., via server 408 or otherwise communicated (electronically) via communication network 406) to another user of the computer system 400 (e.g., a second user 412, such as a dermatologist). The second computing device 414 may also perform the described skin diagnostics. It may receive images from a remote source (e.g., computing device 402 or server 408, etc.) and/or may capture images via an optical sensor (e.g., a camera) coupled thereto or in any other manner. As described, the model 418 may be stored and used from the second computing device 414 or from the server 408.
An application may be provided to perform skin diagnostics, suggest one or more products, and monitor skin changes after one or more product applications (which may define a treatment phase in a treatment plan) over a period of time. The computer application may provide a workflow such as a series of instructional Graphical User Interfaces (GUIs) and/or other user interfaces that are typically interactive and receive user input to perform any of the following activities:
skin diagnostics, such as acne;
product recommendations, such as treatment plans;
product procurement or other acquisition;
alert, instruct and/or record (e.g. log) the product application of the corresponding treatment phase;
subsequent (e.g., one or more follow-up) skin diagnosis; and
presenting results (e.g., comparison results);
for example, monitoring the progress of a skin treatment plan according to a treatment plan schedule. Any of these activities may generate data that may be stored remotely, such as for viewing by user 412, for viewing by another person, for aggregation of data with other users (e.g., to measure treatment plan efficacy in an aggregate), and so forth.
The comparison results (e.g., previous and subsequent results) may be presented via computing device 402, whether during treatment planning and/or upon completion of treatment planning, etc. As noted, various aspects of skin diagnostics may be performed on computing device 402 or by a remote coupling device (e.g., a server in the cloud or another arrangement).
Fig. 5 is a block diagram of a computing device 402 in accordance with one or more aspects of the present disclosure. Computing device 402 includes one or more processors 502, one or more input devices 504, gesture-based I/O devices 506, one or more communication units 508, and one or more output devices 510. Computing device 402 also includes one or more storage devices 512 that store one or more modules and/or data. According to one embodiment, the modules include a model 418, an application 516 with components for a graphical user interface (GUI 518) and/or workflow for therapy monitoring (e.g., therapy monitor 520), an image acquisition 522 (e.g., interface), and a therapy/product selector 530 (e.g., interface). The data may include one or more images (e.g., image 524) for processing, diagnostic data 526 (e.g., acne data, corresponding scores, ethnicity, gender, or other user data), treatment data 528 (e.g., log data related to a particular treatment), a treatment plan with a schedule (e.g., for reminders), and so forth.
Application 516 provides functionality to acquire one or more images (e.g., video) and process the images to determine a skin diagnosis of the deep neural network provided by model 418.
Storage device 512 may store additional modules, such as an operating system 532 and other modules (not shown) including communications modules; a graphics processing module (e.g., a GPU for the processor 502); a map module; a contact module; a calendar module; a photo/gallery module; photo (image/media) editing; a media player and/or streaming media module; social media applications; a browser module; etc. Herein, a memory device may be referred to as a memory unit.
Communication channel 538 may couple each of components 502, 504, 506, 508, 510, and 512 with any of the modules (e.g., 418 and 516) for inter-component communication, whether communicatively, physically, and/or operatively. In some examples, communication channel 538 may include a system bus, a network connection, an interprocess communication data structure, or any other method for communicating data.
The one or more processors 502 may implement functions and/or execute instructions within the computing device 402. For example, the processor 502 may be configured to receive instructions and/or data from the storage device 512 to perform the functions of the modules shown in fig. 5, etc. (e.g., operating system, application programs, etc.). Computing device 402 may store data/information to storage device 512. Some functions are described further below. It should be appreciated that operations may not fall entirely within modules 418 and 516 of fig. 5, such that one module may assist in the functionality of another module.
The computer program code for carrying out operations may be written in any combination of one or more programming languages, such as an object oriented programming language, such as Java, smalltalk, C ++ or the like, or a conventional procedural programming language, such as the "C" programming language or similar programming languages.
The computing device 402 may generate output for display on a screen of the gesture-based I/O device 506, or in some examples, for display by a projector, monitor, or other display device. It will be appreciated that gesture-based I/O device 506 may be configured using a variety of techniques (e.g., with respect to input capabilities: resistive touch screen, surface acoustic wave touch screen, capacitive touch screen, projected capacitive touch screen, pressure sensitive screen, acoustic pulse recognition touch screen, or another field-sensitive screen technology; and with respect to output capabilities: liquid Crystal Display (LCD), light Emitting Diode (LED) display, organic Light Emitting Diode (OLED) display, dot matrix display, e-ink, or similar monochrome or color display).
In the examples described herein, gesture-based I/O device 506 includes a touch screen device capable of receiving a haptic interaction or gesture as input from a user interacting with the touch screen. Such gestures may include a tap gesture, a drag or swipe gesture, a flick gesture, a pause gesture (e.g., a user touching the same location of the screen for at least a threshold period of time), where the user touches or points to one or more locations of the gesture-based I/O device 506. Gesture-based I/O device 506 may also include a non-click gesture. Gesture-based I/O device 506 may output or display information to a user, such as a graphical user interface. Gesture-based I/O device 506 may present various applications, functions, and capabilities of computing device 402, including, for example, applications 516 for capturing images, viewing images, processing images, and displaying new images, messaging applications, telephony communications, contact and calendar applications, web browsing applications, gaming applications, electronic book applications, and financial, payment, and other applications or functions, among others.
Although the present disclosure primarily shows and discusses gesture-based I/O device 506 in the form of a display screen device (e.g., a touch screen) with I/O capabilities, other examples of gesture-based input/output devices that may detect movement and that do not themselves include a screen may be utilized. In this case, the computing device 402 includes a display screen or GUI coupled to a display apparatus to present new images and applications 516. Computing device 402 may receive gesture-based input from a track pad/touch pad, one or more cameras, or another presence or gesture-sensitive input device, where presence means a presence aspect of a user, including, for example, all or part of the user's actions.
The one or more communication units 508 may communicate with external devices (e.g., server 408, server 410, second computing device 414) by sending and/or receiving network signals over one or more networks, such as via communication network 406, e.g., for purposes as described and/or for other purposes (e.g., printing). The communication units may include various antennas and/or network interface cards, chips (e.g., Global Positioning System (GPS)), etc., for wireless and/or wired communications.
The input device 504 and the output device 510 may include any of one or more buttons, switches, pointing devices, cameras, keyboards, microphones, one or more sensors (e.g., biometrics, etc.), speakers, bells, one or more lights, haptic (vibration) devices, etc. One or more of them may be coupled via a Universal Serial Bus (USB) or other communication channel (e.g., 538). A camera (input device 504) may be front-facing (i.e., on the same side as the display) to allow a user to capture images for a "selfie" while viewing gesture-based I/O device 506.
The one or more storage devices 512 may take different forms and/or configurations, for example, as short-term memory or long-term memory. The storage device 512 may be configured to store information for a short period of time as volatile memory that does not retain stored content when powered down. Examples of volatile memory include Random Access Memory (RAM), dynamic Random Access Memory (DRAM), static Random Access Memory (SRAM), and the like. In some examples, storage device 512 also includes one or more computer-readable storage media, e.g., for storing a greater amount of information than volatile memory and/or for long-term storage of such information, retaining information when powered down. Examples of non-volatile memory include magnetic hard disk, optical disk, floppy disk, flash memory, or forms of electrically programmable memory (EPROM) or Electrically Erasable and Programmable (EEPROM) memory.
In one embodiment, other user-oriented computing devices (e.g., 414 and 416) are similarly configured as computing device 402, as described. In one embodiment, the application 516 is configured differently for a dermatologist user, e.g., with functionality to work with multiple patients. In one embodiment, the application 516 is configured differently for devices located at the store, e.g., having functionality to work with multiple customers, for store brands, etc.
In another example, the network model 418 is located remotely and the computing device (e.g., any of 402, 414, and 416) is capable of transmitting images via a suitably configured application 516 for processing and return of diagnostic data (e.g., acne data). In such examples, application 516 is configured to perform these activities.
Although not shown, a computing device may be configured as a training environment to train the neural network model 418, for example, using a computing device as shown in FIG. 5 along with appropriate training and/or testing data.
Model 418 may employ a lightweight architecture suited to a computing device that is a mobile device (e.g., a smartphone or tablet) having fewer processing resources than a "larger" device (e.g., a laptop, desktop, workstation, server, or other comparable computing device).
Although not shown in detail, the server device shown is a computing device similar in basic structure (processor, storage device, communication device, input and output devices) to computing device 402, but acting as a server-side device. That is, server devices are typically not configured with consumer-oriented hardware, have fewer input and output devices, fewer user applications, server-oriented O/S, and so forth.
FIG. 6 is a flow diagram of operations 600 according to one embodiment.
At 602, a source (face) image is received. For example, the source image is a self-captured image captured by a camera of the computing device.
The operation at 604 is optional and is shown in dashed line fashion. A facial skin mask for guiding image processing is determined.
At 606, the operation processes the facial image using the model to generate a local instance of acne by type (e.g., one of three categories). In one embodiment, the model is configured to: detecting at least one type of acne in the image; and focus on detecting small objects in the image. In one embodiment, the model is a first model embodiment. In an embodiment, the model is a second model embodiment. In one embodiment, the treatment is directed by a skin mask generated by the operation at 604.
At 608, the detected instances are filtered to select better locations and instances with higher confidence levels, as described above. Thus, the final detected acne is determined by type.
At 610, a count is determined by type. At 610, a score representing the severity metric is generated, for example. At 612, the ultimately detected acne is visualized in association with the source image. For example, the images are annotated to show the final detected acne by type. In one embodiment, the image has annotations overlaid thereon. At 612, a score is presented.
At 614, a recommendation is obtained. At 614, the recommendation is presented. In one embodiment, the score is provided to an e-commerce service to generate a recommendation for the product and/or service.
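The following illustrative orchestration ties the operations 600 flow together, reusing the hypothetical helpers sketched earlier (build_skin_mask, detect_acne_blocks, nms, count_by_type, annotate). The scoring and recommendation steps are omitted because the patent does not specify their formulas or APIs; this is a sketch, not the patent's implementation.

```python
def analyze_source_image(image, face_width, classify_block,
                         include_regions=None, exclude_regions=None):
    mask = None
    if include_regions is not None:                       # optional step 604
        mask = build_skin_mask(image.shape, include_regions, exclude_regions or [])
    candidates = detect_acne_blocks(image, face_width, classify_block, mask)  # 606
    boxes = [c["box"] for c in candidates]
    scores = [c["conf"] for c in candidates]
    kept = [candidates[i] for i in nms(boxes, scores)]    # step 608: filter duplicates
    counts = count_by_type(kept)                          # step 610: count by type
    annotated = annotate(image, kept)                     # step 612: visualization
    return {"detections": kept, "counts": counts, "annotated_image": annotated}
```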
In one embodiment, operation 600 may be performed by a single computing device (e.g., device 402) in communication with another device (e.g., 408 and/or 410), if applicable.
In one embodiment, the operations are performed by more than one computing device (e.g., 402, 408, and 410). For example, operations at 606 include transmitting the source image to computing device 408, which provides the model as a service. Computing device 408 may also perform filtering (step 608). Device 408 may also count acne (step 610). Device 408 may also visualize acne (step 612) and return source images (e.g., for presentation by device 402) that overlap with the visualization and counting and/or scoring. For example, the third computing device 410 may provide recommendations in response to queries that include scores. In operation 600, the steps therein may be divided into more than one step or combined into fewer steps.
Practical implementations may include any or all of the features described herein. These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, components, program products, and in other ways, for performing the functions. Many embodiments have been described. However, it will be appreciated that various modifications may be made without departing from the spirit and scope of the processes and techniques described herein. Further, other steps may be provided from the described processes, or steps may be eliminated, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
Throughout the description and claims of this specification, the words "comprise" and "comprising" and variations thereof mean "including but not limited to" and are not intended to (nor do) exclude other elements, integers or steps. Throughout the specification, the singular encompasses the plural unless the context requires otherwise. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics, or groups described in conjunction with a particular aspect, embodiment, or example of the invention are to be understood to be applicable to any other aspect, embodiment, or example unless incompatible therewith. All of the features disclosed herein (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not limited to the details of any of the foregoing examples or embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

Claims (22)

1. A method, comprising:
analyzing the source images to determine respective locations of acne instances; and
visualizing the acne instances on the source image for display;
wherein the source image is analyzed using a model configured to:
detect at least one type of acne in the image; and
focus on detecting small objects in the image.
2. The method of claim 1, wherein the model is a deep-learning neural network model configured for object classification and localization.
3. The method of any of claims 1 and 2, wherein the model is configured to process the image on a pixel level, operating end-to-end to directly detect acne locations without cropping the source image.
4. A method according to any one of claims 1 to 3, wherein the model generates a respective anchor box providing positional information for each of the acne instances detected.
5. A method according to claim 3, wherein an anchor box aspect ratio used to define one of the anchor boxes is calculated using k-means clustering of acne instances identified from an image dataset.
6. The method of any of claims 1-5, wherein the model comprises a single detection layer to output a plurality of predictions including a three-dimensional tensor encoding bounding box, objectness, and class predictions, and wherein the plurality of predictions are filtered to remove redundant detections of the same acne instance.
7. A method according to any one of claims 1 to 3, wherein the model is block-based and the method comprises providing a block of skin from the source image to the model for processing to detect an acne instance.
8. The method of any one of claims 1 and 2, wherein the model is block-based and the method includes providing a block of skin from the source image to the model for processing to detect an acne instance, and wherein the block is determined from a skin mask.
9. The method of any one of claims 1 to 8, wherein visualizing the acne instances indicates a respective location of each of the instances on the source image.
10. The method of any one of claims 1 to 9, wherein visualizing the acne instances indicates a respective acne type for each of the instances on the source image.
11. The method of any one of claims 1 to 10, wherein the at least one type of acne comprises one or more of persistent acne, inflammatory acne, and pigmented acne.
12. The method of any one of claims 1 to 11, comprising determining a count of the acne instances.
13. The method of any one of claims 1 to 12, comprising obtaining a recommendation for one or more of a product and a service specific to treating the acne instance.
14. The method of claim 13, comprising communicating with an electronic commerce system to make a purchase of the product or service.
15. The method of claim 13 or 14, wherein the recommendation is generated in response to a factor selected from the group consisting of: acne type, count by type, score by type, location of the acne, location of the purchaser, delivery location, regulatory requirements, contraindication, gender, co-recommendation, and likelihood of the user following usage instructions.
16. A method, comprising:
analyzing the source images to determine respective locations of acne instances; and
generating and providing an acne score in response to a count of the instances;
wherein the source image is analyzed using a model configured to:
focus on detecting small objects in the image; and
detect at least one type of acne in the image;
wherein the model comprises one of:
a single detection layer model configured to output a plurality of predictions including a three-dimensional tensor encoding bounding box, objectness, and class predictions, and wherein the plurality of predictions are filtered to remove redundant detections of the same acne instance; and
a block-based model configured to receive a block of skin from the source image for processing to detect an acne instance, the block being determined from a skin mask.
17. The method of claim 16, wherein the acne score is responsive to one or more of a location, a count, and a type of acne.
18. The method of any one of claims 16 and 17, comprising generating a recommendation for one or more of a product and a service specific to treating the acne instance, the recommendation generated in response to a factor selected from the group consisting of: acne type, count by type, score by type, location of the acne, location of the purchaser, delivery location, regulatory requirements, contraindication, gender, co-recommendation, and likelihood of the user following usage instructions.
19. The method of any one of claims 16 to 18, comprising visualizing the acne instance on the source image.
20. A computing device comprising circuitry configured to perform the method of any of the preceding claims.
21. A computing system comprising circuitry configured to provide:
an interface for receiving a source image and returning an annotated source image that visualizes an acne instance determined by a model configured to process the source image;
wherein the model is configured to:
focus on detecting small objects in the image; and
detect at least one type of acne in the image;
wherein the model comprises one of:
a block-based model configured to receive a block of skin from the source image for processing to detect an acne instance, the block being determined from a skin mask; and
a single detection layer model configured to output a plurality of predictions including a three-dimensional tensor encoding bounding box, objectness, and class predictions, and wherein the plurality of predictions are filtered to remove redundant detections of the same acne instance.
22. The computing system of claim 21, configured to provide:
a recommendation component configured to recommend products and/or services specific to treating at least some of the acne instances; and
an e-commerce transaction component to facilitate purchase of the product and/or service.
CN202180066794.XA 2020-10-02 2021-10-01 System and method for acne counting, localization and visualization Pending CN116964642A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063086694P 2020-10-02 2020-10-02
US63/086,694 2020-10-02
FR2013002 2020-12-10
PCT/EP2021/077204 WO2022069754A1 (en) 2020-10-02 2021-10-01 Systems and methods for acne counting, localization and visualization

Publications (1)

Publication Number Publication Date
CN116964642A true CN116964642A (en) 2023-10-27

Family

ID=88451506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180066794.XA Pending CN116964642A (en) 2020-10-02 2021-10-01 System and method for acne counting, localization and visualization

Country Status (1)

Country Link
CN (1) CN116964642A (en)

Similar Documents

Publication Publication Date Title
JP7395604B2 (en) Automatic image-based skin diagnosis using deep learning
CN111448581B (en) System and method for image processing using deep neural networks
Yang et al. Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
US11908128B2 (en) Systems and methods to process images for skin analysis and to visualize skin analysis
Magdin et al. Real time facial expression recognition using webcam and SDK affectiva
Hu et al. Deep neural network-based speaker-aware information logging for augmentative and alternative communication
JPWO2020113326A5 (en)
WO2021076628A1 (en) Automatic pressure ulcer measurement
US20220108445A1 (en) Systems and methods for acne counting, localization and visualization
Singh et al. A robust, real-time camera-based eye gaze tracking system to analyze users’ visual attention using deep learning
CN116964642A (en) System and method for acne counting, localization and visualization
Meawad et al. Automatic detection of pain from spontaneous facial expressions
Anwar Real time facial expression recognition and eye gaze estimation system
Nethravathi et al. Acne Vulgaris Severity Analysis Application
Khaleel et al. Best low-cost methods for real-time detection of the eye and gaze tracking
US20240108278A1 (en) Cooperative longitudinal skin care monitoring
AU2021100211A4 (en) Predict Gender: Detect Faces and Predict their Gender, Age and Country Using Machine Learning Programming
US20230027320A1 (en) Movement Disorder Diagnostics from Video Data Using Body Landmark Tracking
CN117242528A (en) System and method for processing images for skin analysis and visualization of skin analysis
EP4002380A1 (en) Assessing a region of interest of a subject
Samuel Mathew et al. A Deep Learning Approach for Real-Time Analysis of Attendees’ Engagement in Public Events
WO2021003574A1 (en) Systems and methods to process images for skin analysis and to visualize skin analysis
Wang et al. Framework for facial recognition and reconstruction for enhanced security and surveillance monitoring using 3D computer vision
FR3117643A1 (en) SYSTEMS AND METHODS FOR COUNTING, LOCATING AND VISUALIZING ACNE
Sunny A Survey on Hand Gesture Using Imageprocessing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination