WO2019142127A1 - Method and system of creating multiple expression emoticons - Google Patents

Info

Publication number
WO2019142127A1
WO2019142127A1 PCT/IB2019/050390
Authority
WO
WIPO (PCT)
Prior art keywords
user, image, face, mask, creating
Prior art date
Application number
PCT/IB2019/050390
Other languages
French (fr)
Inventor
Feroz Abbasi
Original Assignee
Feroz Abbasi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feroz Abbasi filed Critical Feroz Abbasi
Publication of WO2019142127A1 publication Critical patent/WO2019142127A1/en

Classifications

    • G06V10/82: Image or video recognition using pattern recognition or machine learning using neural networks
    • G06F18/24137: Classification techniques based on distances to cluster centroids
    • G06T7/11: Region-based segmentation
    • G06T7/143: Segmentation involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T7/194: Segmentation involving foreground-background segmentation
    • G06V10/454: Local feature extraction integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V40/10: Human or animal bodies; body parts, e.g. hands
    • G06V40/162: Face detection, localisation or normalisation using pixel segmentation or colour matching
    • G06V40/171: Local features and components; facial parts; occluding parts, e.g. glasses
    • G06V40/174: Facial expression recognition
    • G06V40/178: Estimating age from a face image; using age information for improving recognition
    • G06T2207/10024: Colour image
    • G06T2207/20076: Probabilistic image processing
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30201: Face

Definitions

  • the present invention relates generally to the field of image processing and more particularly, to methods and systems for creating different expression emoticons.
  • Information handling devices for example laptop computers, tablets, smart phones, desktop computers, etc., may be used by users to communicate with one another.
  • a common form of communication is text based communication, e.g., chats communicated via Internet connected applications, SMS text message exchange, email exchange, and the like.
  • a typical aspect of such form of communication involves sharing of images and text with users.
  • the text may be used to represent general messages or to express an emotion. These days, several text-based emotion representations are replaced with graphical symbols known as emoticons.
  • the term "emoticon" as used herein is derived from the combination of the words "emotion" and "icon", and refers to a small graphical symbol used as a substitute for verbal or visual communication cues, meant to assist in conveying the mood or emotion of a textual message in instant messaging, short messaging service (SMS), email, chatrooms and other forms of textual communication.
  • Such emoticons are generally built into the messaging application or may be rendered from a third-party application (as an add-on) for use in a messaging session.
  • the user is provided with a palette consisting of a pre-existing number of pre-defined, styled emoticons.
  • the user is provided with an option to select one or more emoticons from the emoticon palette for sharing them with other users in the messaging session.
  • one or more emoticons may be selected by using a combination of one or more inputs on the keypad.
  • the combination of inputs is mapped to a particular emoticon and the emoticon is automatically displayed when the combination of inputs on the keypad is received.
  • the drawback with such an emoticon palette is that the emoticons are pre-defined and are the same for all users.
  • some customizable emoticons have become available on some messaging applications. For example, one existing system allows the user to import an image from the file system. The image selected by the user is rescaled to match the resolution of emoticons. However, even for such customizable emoticons, the image file has to be already available, and such customized emoticons are inserted in the messaging application.
  • a list/ palette of customized emoticons may be created by the user using different images exhibiting different expressions.
  • a method for generating an emoticon corresponding to at least one emotional expression of a user includes the steps of: segmenting an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image; ascertaining a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image; identifying a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises the remaining body of the user; processing the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user; and determining at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters.
  • a system for generating an emoticon corresponding to at least one emotional expression of a user, implementing the method described above, is also provided.
  • a method of creating multiple expression emoticons relating to a user includes the steps of: receiving a selection of an image from a user; processing said image to ascertain whether said image includes a human head including at least a face at the anterior part of said human head; and, on positive ascertaining, performing an image processing operation, said image processing operation including: 1) identifying a first portion in said image pertaining to said human head and a second portion in said image excluding said human head; 2) extracting said first portion from said image; 3) analyzing said first portion to determine one or more parameters including age, ethnicity and skin colour; 4) identifying a set of pre-stored overlays based on the determined one or more parameters, wherein each overlay in the identified set exhibits a unique expression; 5) performing a morphological transformation on said first portion; 6) overlaying each overlay exhibiting a unique expression onto said first portion obtained after the morphological transformation; and 7) creating one or more expression emoticons relating to the user.
  • Figure 1 illustrates a flow chart for a method for generating an emoticon corresponding to at least one emotional expression of a user, in accordance with an embodiment of the invention;
  • Figure 2 illustrates a block diagram of a system for generating an emoticon corresponding to at least one emotional expression of a user, in accordance with an embodiment of the invention;
  • Figure 3 illustrates exemplary images illustrating the obliterating process referred to in Figure 1;
  • Figure 4 illustrates processing of an exemplary image of a user in accordance with the invention;
  • Figure 5 illustrates a plurality of exemplary emoticons corresponding to the user image referred to in Figure 4, created in accordance with the invention;
  • Figure 6 illustrates a flow chart for a method of creating multiple expression emoticons in accordance with an embodiment of the invention.
  • Figure 7 illustrates a block diagram of a system for creating multiple expression emoticons in accordance with an embodiment of the invention.
  • the method 100 includes the step 102 of segmenting an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image.
  • the first set of segments of the image corresponds to portions of the image that relate to the human face including human head.
  • the remaining set of segments, i.e. the remaining portions of the image, are segmented for removal.
  • the image of the user may be taken in real time by the user using the image capturing device (camera) of the user's electronic device, or may be selected by the user from a pre-stored location in the user's electronic device.
  • the electronic device may include a smart phone, smart watch, smart glasses, mobile device, netbooks, notebooks and other smart devices.
  • the brightness, contrast on the face and the overall clarity may be adjusted for appropriate segmentation.
  • the method 100 ascertains, in step 104, presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image.
  • the user is generally provided with a view portion wherein the user is requested to fit the image into said view portion
  • the step 104 of ascertaining is performed.
  • the user may be sent an error notification to upload a fresh image appropriately in the view portion.
  • a first portion in the first section and a second portion in the first section is identified in step 106.
  • the first portion comprises the head and the face of the user and the second portion comprises remaining body of the user.
  • the first portion includes the sensory organs: the eyes, nose, ears, cheeks and mouth.
  • the human head is also intended to include hair on top of the head, i.e. the skull, and hair on the face in the form of a moustache and beard.
  • the human head may also include accessories such as glasses, turbans and other wearable items used by a user on daily basis.
  • image processing algorithms known to a person skilled in the art may be used to ascertain whether the image fitted into the view portion contains a human face or not.
  • a cascade classifier algorithm of Viola-Jones may be used to detect whether a human face exists in the view portion of the image.
  • the face detection technique involves identifying areas of contrast between light and dark parts of the image; for example, the bridge of the nose is usually lighter than the surrounding area on both sides, and the eye sockets are darker than the forehead. Using such contrasts, the face detection technique can detect faces.
  • the processing technique to identify a human face in an image may include separating skin regions from non-skin regions and then locating faces within the skin regions.
  • a chroma chart is prepared via a training process that shows likelihoods of different colours representing the skin.
  • a colour image is transformed into a gray scale image with the gray value at a pixel showing the likelihood of the pixel representing the skin.
  • skin regions are separated from non-skin regions. Then, using the luminance component of the colour image and by template matching, faces are located within the skin regions.
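The chroma-chart technique described above can be sketched with numpy alone. This is an illustrative reconstruction, not the patent's implementation: the skin training samples, the bin count and the 0.1 threshold are invented for the example.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Convert an (..., 3) float RGB array to (Cb, Cr) chroma components."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def train_chroma_chart(skin_samples, bins=32):
    """Build a 2-D histogram over (Cb, Cr) from known skin pixels.
    Normalised counts act as the likelihood of a colour being skin."""
    cb, cr = rgb_to_cbcr(skin_samples.astype(float))
    hist, cb_edges, cr_edges = np.histogram2d(
        cb, cr, bins=bins, range=[[0, 255], [0, 255]])
    hist /= hist.max()  # scale likelihoods to [0, 1]
    return hist, cb_edges, cr_edges

def skin_likelihood(image, chart):
    """Map each pixel to the grey value given by the chroma chart."""
    hist, cb_edges, cr_edges = chart
    cb, cr = rgb_to_cbcr(image.astype(float))
    i = np.clip(np.digitize(cb, cb_edges) - 1, 0, hist.shape[0] - 1)
    j = np.clip(np.digitize(cr, cr_edges) - 1, 0, hist.shape[1] - 1)
    return hist[i, j]  # grey-scale likelihood image

# Toy "training" data: a cluster of skin-like tones.
rng = np.random.default_rng(0)
skin = np.clip(rng.normal([200, 150, 120], 10, size=(500, 3)), 0, 255)
chart = train_chroma_chart(skin)

image = np.zeros((2, 2, 3))
image[0, 0] = [200, 150, 120]   # skin-like pixel
image[1, 1] = [10, 200, 30]     # green, non-skin pixel
likelihood = skin_likelihood(image, chart)
mask = likelihood > 0.1         # skin vs. non-skin regions
```

Thresholding the grey likelihood image separates skin regions from non-skin regions; a real system would train the chart on a large labelled corpus of skin pixels rather than a synthetic cluster.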
  • the first portion and the second portion of the first section are processed in step 108 to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user.
  • the method 100 further includes step 110 of determining at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters; wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user.
  • a plurality of region of interests within the first portion of the image are defined in step 112, wherein each of the plurality of region of interest comprises at least one real characteristic of the face of the user.
  • the real characteristic of the face may include various parts of the face including: right eye, left eye, upper lip, lower lip, right ear, left ear, nose, eyebrows, eyelashes etc.
  • the at least one real characteristic of the face of the user are obliterated within each of the plurality of region of interests in step 112.
  • the step 112 of obliterating the at least one real characteristic of the face of the user within each of the plurality of region of interests includes selecting a color of a neighboring region of the each of the plurality of region of interests respectively; and obliterating the at least one real characteristic of the face of the user with the selected color of the neighboring region of each of the plurality of region of interests respectively.
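The neighbouring-colour fill of step 112 can be illustrated with a small numpy sketch. The rectangular ROI and ring width are simplifications invented here; per the method, each region of interest would come from facial landmarks rather than fixed coordinates.

```python
import numpy as np

def obliterate_roi(image, top, left, height, width, ring=3):
    """Fill a rectangular region of interest with the mean colour of the
    pixels immediately surrounding it, hiding the real facial feature."""
    out = image.copy()
    h, w = image.shape[:2]
    # Bounding box of the ROI expanded by the sampling ring.
    t, l = max(top - ring, 0), max(left - ring, 0)
    b, r = min(top + height + ring, h), min(left + width + ring, w)
    neighbourhood = np.ones((b - t, r - l), dtype=bool)
    # Exclude the ROI itself so only surrounding pixels are sampled.
    neighbourhood[top - t: top - t + height, left - l: left - l + width] = False
    fill = out[t:b, l:r][neighbourhood].reshape(-1, image.shape[2]).mean(axis=0)
    out[top: top + height, left: left + width] = fill
    return out

# A 10x10 image: uniform skin tone with a dark "eye" square in the middle.
img = np.full((10, 10, 3), [200, 150, 120], dtype=float)
img[4:6, 4:6] = [20, 20, 20]          # the real characteristic (an eye)
clean = obliterate_roi(img, 4, 4, 2, 2)
```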
  • the details of obliterating procedure have been explained in detail in reference to Figure 4.
  • At least one template of the fictional characteristic of the face is overlaid on the obliterated at least one real characteristic within each of the plurality of region of interests in step 114 to generate the emoticon corresponding to the at least one emotional expression of the user.
  • step 102 of segmenting an image of the user into a plurality of segments comprises: accessing a first-type of convolutional neural network trained to segment humans within image to identify the first section and the remaining section of the image.
  • step 106 of identifying a first portion in the first section and a second portion in the first section comprises: accessing a second-type of convolutional neural network trained to segment human head from a body of the humans to identify the first and second portion in the first section of the image.
  • the method 100 further includes normalizing the first and second portions of the first section of the image to straighten face of the user in a two-dimensional plane.
  • the normalization process may use an affine transformation which maps the triangle formed by three vertices (corresponding to the eyes and the mouth) into a standard view. This normalization technique treats the face and the rest of the image as a thin sheet which can be scaled, rotated and sheared.
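The triangle-based normalisation reduces to solving for a 2x3 affine matrix from three point correspondences. The eye and mouth coordinates and the "standard view" below are assumed values for illustration:

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Solve the 2x3 affine matrix M such that M @ [x, y, 1] maps each
    source vertex (eyes and mouth) onto the corresponding standard vertex."""
    src_h = np.hstack([src, np.ones((3, 1))])       # 3x3, homogeneous coords
    # Solve src_h @ M.T = dst for the 2x3 matrix M.
    M = np.linalg.solve(src_h, dst).T
    return M

def apply_affine(M, pts):
    """Apply the affine matrix to an (n, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ M.T

# Tilted face: left eye, right eye, mouth centre (assumed coordinates).
src = np.array([[30.0, 40.0], [70.0, 30.0], [55.0, 80.0]])
# Standard upright view used for normalisation (assumed canonical layout).
dst = np.array([[30.0, 30.0], [70.0, 30.0], [50.0, 80.0]])

M = affine_from_triangles(src, dst)
mapped = apply_affine(M, src)
```

Applying M to the whole pixel grid (e.g. via an inverse warp with interpolation) scales, rotates and shears the image like a thin sheet so that the face sits upright in the two-dimensional plane.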
  • the method 100 further includes processing the first portion and the second portion of the first section of the image to remove an angle of tilt of the face within the image.
  • the step 108 of processing the first portion and the second portion of the first section to determine a plurality of parameters comprises accessing at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
  • the at least one template of the fictional characteristic of the face comprises at least one fictional lip, a fictional eye, a fictional nose, a fictional head gear and a combination thereof.
  • the method 100 further includes accessing a convolutional neural network to convert the emoticon into an artwork.
  • the method 100 further includes eliminating an unevenness within the head found in the segmented first section of the image.
  • the method 100 further includes rendering the emoticon with the body of the user on a display interface.
  • the display interface may include a display interface of user electronic device.
  • the method 100 further includes adjusting an intensity of at least brightness level and contrast level to increase clarity within the image.
  • the method 100 further includes activating at least one image capturing device on receiving an image capture request from the user to capture an image of the face of the user. In an embodiment, the method 100 further includes accessing at least one pre-stored image of the user, wherein the at least one pre-stored image capture at least one emotional expression of the face of the user.
  • the emoticons are configured to be used in a messaging session.
  • the emoticons may be used as images and may be shared with other users.
  • the user may add text to the emoticons.
  • the method 100 further includes associating a sound with each emoticon based on associated expression.
  • speech bubbles may be associated with each emoticon based on the expression.
  • the system 200 includes an image segmentor unit 202 configured to segment an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image.
  • An object identifier unit 204 is provided to: ascertain a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image; and identify a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises remaining body of the user.
  • the system 200 further includes a parameter identifier unit 206 configured to process the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user and an overlay identifier unit 208 configured to determine at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters, wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user.
  • the parameter identifier unit 206 is further configured to: access at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
  • a region of interest locator 210 is provided to define a plurality of region of interests within the first portion of the image, wherein each of the plurality of region of interest comprises at least one real characteristic of the face of the user and an obliterator unit 212 is provided to obliterate the at least one real characteristic of the face of the user within each of the plurality of region of interests.
  • the obliterator unit 212 selects a color of a neighboring region of the each of the plurality of region of interests respectively; and obliterates the at least one real characteristic of the face of the user with the selected color of the neighboring region of each of the plurality of region of interests respectively.
  • a controller unit 214 then overlays the at least one template of the fictional characteristic of the face on the obliterated at least one real characteristic within each of the plurality of region of interests to generate the emoticon corresponding to the at least one emotional expression of the user.
  • the at least one template of the fictional characteristic of the face comprises at least one fictional lip, a fictional eye, a fictional nose, a fictional head gear and a combination thereof.
  • the system 200 further includes a normalizer 216 to normalize the first and second portions of the first section of the image to straighten face of the user in a two-dimensional plane.
  • the controller unit 214 further processes the first portion and the second portion of the first section of the image to remove an angle of tilt of the face within the image.
  • the obliterating step 112 referred to in Figure 1 includes obliterating the at least one real characteristic of the face including the right eye, left eye, upper lip, lower lip, nose, left ear, right ear, eyebrows etc.
  • the process of obliterating includes creating a plurality of masks of the original user image (Figure 3(a)) as explained below:
  • the second mask is created by identifying the face portion using the 68 Dlib landmark points, filling it with white colour and overlaying it on the first mask, as illustrated in Figure 3(c). 3) A third mask pertaining to the head portion of the user is then created, wherein said creating of the third mask includes:
  • a mask is drawn that covers the head region.
  • an elliptical mask is created, such that its diameter is the distance between the outer points of the two eyes.
  • the ellipse is filled with white colour and is overlaid on the first mask.
  • the points pertaining to the left eye and right eye are determined using the Dlib face and feature detector. The same is illustrated in Figure 3(d).
  • the fourth mask includes both the head portion and face portion.
  • the obliterating process includes the steps of: a) identifying the Dlib points corresponding to the left eye and right eye respectively and storing said Dlib points in an array; b) creating circular portions enclosing the left eye and right eye respectively based on said Dlib points stored in the array; and c) changing the RGB value of said first mask in the area corresponding to said circular portions to (255, 255, 255).
  • the resulting mask is illustrated in Figure 3(g).
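Steps a) to c) can be sketched with numpy alone; the eye centres and radius below are stand-ins for the coordinates a Dlib landmark detector would return:

```python
import numpy as np

def whiten_eye_circles(mask, eye_points, radius):
    """Set the circular regions enclosing each eye to white (255, 255, 255),
    as in step c) of the obliterating process."""
    h, w = mask.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    for cx, cy in eye_points:                 # centre of each eye circle
        inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        mask[inside] = (255, 255, 255)
    return mask

# Toy 20x20 mask with hypothetical left/right eye centres.
mask = np.zeros((20, 20, 3), dtype=np.uint8)
eye_points = [(6, 10), (14, 10)]              # would come from Dlib landmarks
mask = whiten_eye_circles(mask, eye_points, radius=3)
```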
  • the obliterating process further includes identifying mean RGB value of RGB values corresponding to skin region below Dlib points corresponding to left eye and right eye respectively; changing RGB value of said first mask pertaining to area corresponding to said circular portions to identified mean RGB value.
  • the eye portions are obliterated by: 1) changing the RGB value of said first mask in the area corresponding to said circular portions to (255, 255, 255); 2) inverting the RGB values of the first mask so obtained; 3) combining said segmented image of the user with said inverted first mask (Figure 3(h)); and 4) obliterating the eye portion of the user by combining the first mask whose circular portions were changed to the identified mean RGB value with the image obtained in step 3).
  • Figure 3(i) illustrates the image where the eyes of the user have been obliterated.
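The inversion-and-combination sequence amounts to cutting the eye regions out of the segmented image and pasting skin-coloured patches back in. A minimal numpy sketch with a toy image in place of a real photograph (the pixel positions and colours are invented):

```python
import numpy as np

h, w = 8, 8
# Segmented user image: uniform skin tone with dark eye pixels.
image = np.full((h, w, 3), [200, 150, 120], dtype=np.uint8)
image[3, 2] = image[3, 5] = [20, 20, 20]          # the "eye" pixels

# First mask: white (255, 255, 255) inside the circular eye portions.
eye_mask = np.zeros((h, w, 3), dtype=np.uint8)
eye_mask[3, 2] = eye_mask[3, 5] = 255

# Invert the mask: eye portions become black, everything else white.
inverted = 255 - eye_mask

# Combine the segmented image with the inverted mask: the eye portions
# are blanked out, the rest of the face is kept unchanged.
kept = np.bitwise_and(image, inverted)

# Mean-colour mask: eye portions filled with the mean skin RGB value
# sampled below the eyes (here taken directly from a skin pixel).
mean_rgb = image[5, 2]
fill = np.where(eye_mask == 255, mean_rgb, 0).astype(np.uint8)

# Final combination: skin-coloured patches replace the real eyes.
obliterated = np.bitwise_or(kept, fill)
```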
  • for the lips, the process includes the steps of: 1) identifying the Dlib points corresponding to the lips in said image of the user and storing said Dlib points in an array; 2) creating a convex portion enclosed by the Dlib points corresponding to the lips; and 3) changing the RGB value of said first mask in the area corresponding to said convex portion to (255, 255, 255).
  • the resulting mask is illustrated in Figure 3(j). Thereafter, the mean RGB value of the skin region near the lips is identified, and the RGB value of said first mask in the area corresponding to said convex portion is changed to the identified mean RGB value.
  • alternatively, the process includes predicting the RGB value corresponding to the skin region in proximity to said convex portion; changing the RGB value of said first mask in the area corresponding to said convex portion to the predicted RGB value; and obliterating the lip portion of the user by combining the first mask so obtained with the image obtained after combining said segmented image of the user and said inverted first mask.
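Filling the convex lip region can be done with a half-plane test against each directed edge of the hull, with no imaging library. The quadrilateral below stands in for the convex hull of the Dlib lip landmarks; its vertices are ordered so the interior lies on the left of each edge:

```python
import numpy as np

def fill_convex(mask, hull, value=255):
    """Set every pixel inside the convex polygon `hull` (a list of (x, y)
    vertices, e.g. the convex hull of the Dlib lip points) to `value`."""
    h, w = mask.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    inside = np.ones((h, w), dtype=bool)
    for i in range(len(hull)):
        x0, y0 = hull[i]
        x1, y1 = hull[(i + 1) % len(hull)]
        # A point is inside iff it lies on the left of every directed edge.
        cross = (x1 - x0) * (yy - y0) - (y1 - y0) * (xx - x0)
        inside &= cross >= 0
    mask[inside] = value
    return mask

# Toy lip region: a convex quadrilateral on a 20x20 single-channel mask.
mask = np.zeros((20, 20), dtype=np.uint8)
lip_hull = [(5, 10), (10, 7), (15, 10), (10, 13)]   # stand-in (x, y) vertices
mask = fill_convex(mask, lip_hull)
```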
  • the at least one real characteristic of the face of the user may be selectively obliterated, wherein said selective obliterating includes analyzing the expression on the face of the original image using pre-trained CNNs. If the expression is negative, only the lip region is obliterated to generate other negative emotions by changing the lip template only; to generate all the other emotions, both the lip and the eye regions are obliterated. Similarly, if the expression is positive, only the lip region is obliterated to generate other positive emotions, and both the lip and the eye regions are obliterated to generate all other emotions. In an exemplary implementation, Figure 3(k) shows obliteration of the eyes only, Figure 3(l) obliteration of the lips only, and Figure 3(m) obliteration of both lips and eyes.
  • Figure 4(a) illustrates the original image of the user.
  • the original image may be clicked by the user in real time or may be selected from a pre-stored location.
  • the original image contains noise and unwanted features (segments) that may not be required for the purpose of creating emoticon.
  • Figure 4(b) illustrates the image undergoing the segmentation.
  • the head including the hair and face region of the portion are being marked for segmentation.
  • a plurality of markers are used to cover a wider region, and segmentation is done in the HSV and YCbCr domains.
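The YCbCr half of that segmentation can be sketched with the standard BT.601 conversion. The Cb/Cr skin ranges used below are commonly cited illustrative values, not thresholds taken from the patent:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 RGB -> YCbCr conversion (inputs in 0..255)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask_ycbcr(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Threshold the chroma channels to mark likely skin pixels.
    The ranges are illustrative defaults from the face-detection
    literature, not values specified by this document."""
    ycbcr = rgb_to_ycbcr(rgb.astype(float))
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb_range[0] <= cb) & (cb <= cb_range[1]) &
            (cr_range[0] <= cr) & (cr <= cr_range[1]))

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [220, 160, 130]    # skin-like tone
img[1, 1] = [0, 140, 255]      # blue background pixel
mask = skin_mask_ycbcr(img)
```

In practice this chroma mask would be intersected with an HSV-domain mask grown from the user-placed markers before the head region is finally segmented.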
  • Figure 4(c) shows the normalized image of the user.
  • Figure 4(d) indicates the image of the user after the obliteration step. In Figure 4(d), all the real characteristics of the face of the user except the nose are obliterated.
  • exemplary emoticons corresponding to user image referred in Figure 4 are illustrated. As can be seen, only the nose feature of the user that was left after the obliteration is present in all the emoticons and the remaining features have been overlaid using fictional characteristics to exhibit different emoticons with different expressions.
  • a method 600 for creating multiple expression emoticons relating to a user is disclosed.
  • the method illustrated in figure 6 is to be read in reference with method illustrated in Figure 1.
  • the method 600 includes step 602 of receiving a selection of an image from a user.
  • the user may use a camera to capture an image of the user's face. Additionally, or alternatively, the user may provide a previously created image of the user's face to the user device.
  • the user is generally provided with a view portion wherein the user is requested to fit the image into said view portion.
  • the method 600 processes said image as fitted into the view portion to ascertain if said image includes a human head including at least a face at anterior part of said human head in step 604.
  • the human head includes at least a human face containing the sensory organs: the eyes, nose, ears, cheeks and mouth.
  • the human head is also intended to include hair on top of the head, i.e. scalp hair, and hair on the face in the form of a moustache and beard.
  • the human head may also include accessories such as glasses, turbans and other wearable items used by a user on daily basis.
  • on negative ascertaining, the method 600 sends an error message to the user and requests the user to provide a fresh image.
  • the method 600 also sends an error message to the user and requests the user to provide a fresh/new image in case it is found that the image selected by the user is not in an appropriate format, or is not of appropriate size or clarity.
  • even after step 604, a request for a fresh/new image may still be made to the user.
  • image processing algorithms known to a person skilled in the art and as described previously in Figure 1 may be used to ascertain if the image fitted into the view portion contains human face or not.
  • the method 600 performs an image processing operation at step 606 to remove the background (unwanted portion) and separate the human head including human face from the image.
  • the image portion pertaining to said human head may be referred to as the first portion and the image portion excluding said human head may be referred to as the second portion.
  • Suitable human head (face) recognition algorithms, as suggested above, are used to identify the first portion in the image using trained CNNs.
  • the CNNs are suitably trained to identify skin regions and hair regions to make sure that any essential region that forms part of the human head is not removed as an unwanted portion.
  • Suitable background removal algorithms, such as the GrabCut algorithm for texture-based background removal, may be used.
  • a Convolutional Neural Network can be thought of as a layered image-processing pipeline designed to perform a particular task.
  • the goal of the pipeline is to take an image as input, perform mathematical operations and provide a high-level user-friendly response.
  • the processing within the network is sequential in nature: i.e., each layer in the network takes input from the layer(s) above it, does some computation before passing the resulting output to the next layer(s).
  • Each layer is composed of “neurons” that are connected to “neurons” of other (in most cases adjacent) layers. Each connection has a numeric weight associated with it that signifies its importance.
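The layered-pipeline idea above may be illustrated with a bare-bones forward pass in NumPy. This is a toy sketch, not the trained networks used by the method: each layer consumes the previous layer's output, and the kernel and weights shown would in practice be learned during training.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 "image" pushed through conv -> ReLU -> pool -> fully connected.
image = np.arange(36, dtype=np.float64).reshape(6, 6)
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # responds to left-to-right increase
features = max_pool(relu(conv2d(image, kernel)))
weights = np.ones(features.size) / features.size  # learned in real training
score = float(features.ravel() @ weights)         # high-level response
```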
  • Training and testing: before a CNN can be used for a task, it needs to be trained for that task.
  • the CNN is provided with a list of objects that need to be detected and classified by the network.
  • the method involves creating, training and storing several CNNs for various attributes selected by analyzing millions of faces of different users.
  • the CNNs are trained in a well-defined manner for each of the attributes identified by analyzing millions of faces of different users. For example, conventional techniques may involve a number of different training models, which may be utilized to respectively correspond to a particular aspect at which a human face may be depicted in a digital image.
  • a corresponding training model which may be useful primarily for a specific range of off-axis orientations, such as between approximately 26.0 degrees and 36.0 degrees, may be utilized.
  • two or more specific training models may be utilized.
  • parameters of a single neural network model for performing face detection may be developed.
  • Model parameters of a neural network used for face detection may, at least in some embodiments, be leveraged from training a neural network to detect a plurality of different faces.
  • one or more training modules may be provided to build and train CNNs. The CNNs may be trained by analyzing millions of face images and the attributes related thereto. A threshold may be set for each CNN to evaluate the performance of the respective CNN.
  • In another algorithm to detect human faces in colour images, as well as to remove the background from a single-face colour image, a colour histogram for skin colour (in the HSV space) is combined with a threshold on the grayscale image to detect skin regions in a given image. Then, in order to reduce the number of non-face regions, the number of holes in each selected region is calculated; if the value is less than a particular threshold, the region is selected. Also, the ratio of the height and width of the detected skin region is calculated to differentiate face and non-face regions. Finally, the Weber Local Descriptor (WLD) is calculated for each selected region; each region is then divided into equal-size blocks, and the corresponding entropy values of each block are calculated and compared with training samples to get the Euclidean distance between them.
  • WLD: Weber Local Descriptor.
  • the present invention is contemplated to cover any of the OpenCV algorithms used for human face detection and background removal.
  • the present invention may also involve use of Convolutional Neural Networks that are suitably trained to identify the skin regions and non-skin regions. Further, the CNNs may also be suitably trained to identify any accessories such as glasses, ear accessories, turbans etc. in an image. The present invention may also involve removing such accessories as part of the background using suitably trained CNNs and other OpenCV background removal algorithms.
  • the image processing operation 604 includes step of identifying and segmenting the hair portion.
  • the step is primarily used to identify scalp hair but may also be used to identify facial hair as well.
  • the step involves identifying approximate hair regions and placing a marker on the same. Thereafter, the MeanShift algorithm is applied for segmentation in the RGB domain.
  • the step further involves placing a plurality of markers on the hair to cover a wider region and segmenting in the HSV & YCbCr domains.
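The mean-shift step above may be sketched in NumPy as iterative mode seeking on pixel colours. This is a simplified illustration of the algorithm applied to the colour domain only (OpenCV's variant also operates over spatial coordinates); the colour clusters and marker position are invented for the example:

```python
import numpy as np

def mean_shift(points, start, bandwidth=20.0, iters=30):
    """Shift `start` toward the densest nearby mode of `points`."""
    mode = start.astype(np.float64)
    for _ in range(iters):
        dist = np.linalg.norm(points - mode, axis=1)
        window = points[dist < bandwidth]     # flat kernel window
        if len(window) == 0:
            break
        new_mode = window.mean(axis=0)        # mean-shift update
        if np.linalg.norm(new_mode - mode) < 1e-3:
            break
        mode = new_mode
    return mode

# Two colour clusters: dark "hair" pixels and light "skin" pixels.
rng = np.random.default_rng(0)
hair = rng.normal((40, 30, 25), 3.0, size=(100, 3))
skin = rng.normal((200, 160, 140), 3.0, size=(100, 3))
pixels = np.vstack([hair, skin])

# A marker placed near the hair region converges to the hair colour mode.
mode = mean_shift(pixels, np.array([50.0, 40.0, 35.0]))
```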
  • the step 606 may further include performing a morphological transformation on said first portion.
  • the morphological transformation is performed to make sure that the relevant features/portions of the face are not removed during the background/unwanted-portion removal process.
  • the morphological transformation helps in completing the human face if any portion has been inadvertently considered as a background portion and removed.
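A morphological closing (dilation followed by erosion) on the foreground mask is one way to achieve the completion described above: it heals small holes that background removal may have punched into the face region. A pure-NumPy sketch with a 3x3 structuring element:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    padded = np.pad(mask, 1)  # zero padding
    out = np.zeros_like(mask)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out |= padded[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

def erode(mask):
    """Binary erosion; pad with 1s so the image border is not eroded."""
    padded = np.pad(mask, 1, constant_values=1)
    out = np.ones_like(mask)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out &= padded[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

def close(mask):
    """Morphological closing: dilation then erosion."""
    return erode(dilate(mask))

# A face mask with a one-pixel hole inadvertently removed as background:
mask = np.ones((7, 7), dtype=np.uint8)
mask[3, 3] = 0
healed = close(mask)   # the hole is filled back in
```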
  • the method 600 includes analysing said first portion to determine one or more parameters including age, ethnicity and skin colour in step 608.
  • the aforesaid determination of one or more parameters including age, ethnicity and skin colour is again performed using suitably trained CNNs and other openCV algorithms.
  • training approaches for detection of multi-aspect images of human faces may utilize numerous distinct training models, which may include training models to detect images of human faces rotated in-plane, training models to detect cropped and/or occluded images of human faces, training models to detect human faces oriented off-axis, training models to detect skin color/texture, training models to detect ethnicity (demographic region), training models to detect age, and so forth.
  • the age may be determined using the algorithm provided in the paper entitled “Estimating The Age Of Human Face In Image Processing Using Matlab”.
  • the Viola-Jones algorithm may be used for detecting the age of the user.
  • the system as provided in US Patent No. US 7606621 B1, “Demographic classification using image components”, may be used for automatically extracting the demographic information from images.
  • the system therein detects the face in an image, locates different components, extracts component features, and then classifies the components to identify the age, gender, or ethnicity of the person(s) in the image.
  • the skin texture may be determined using the algorithm as described in “SKIN TEXTURE RECOGNITION USING NEURAL NETWORKS”. The above said algorithms are provided by way of examples.
  • the present implementation may be implemented using any suitable algorithm that helps in identifying the skin color (texture), age, ethnicity (demographic region).
  • the method 600 involves step 610 of identifying key asset/feature Regions of Interest (ROIs) and filling pixel portions corresponding to the ROIs with pixel portions from the close neighbourhood of said ROIs.
  • the method 600 involves identifying pixel portions corresponding to essential elements (ears, nose, lips, eye-brows) of the human face and replacing the pixel portions of said identified portions with pixel portions of the portions in the close neighbourhood to said essential elements. This is done to make sure that a consistent texture (skin tone) appearance is given to the first portion when said pixel portions of the essential elements are replaced with the pixel portions of the neighbourhood skin portions.
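The filling step above may be sketched as replacing each ROI with the mean colour of a thin ring of pixels surrounding it. This is a simplified stand-in for the method's neighbourhood fill; real implementations blend more carefully, and the ROI coordinates below are invented for the example:

```python
import numpy as np

def fill_roi_from_neighbourhood(img, top, left, h, w, ring=2):
    """Overwrite an ROI with the mean of the surrounding ring of pixels."""
    out = img.copy().astype(np.float64)
    t0, l0 = max(top - ring, 0), max(left - ring, 0)
    b1 = min(top + h + ring, img.shape[0])
    r1 = min(left + w + ring, img.shape[1])
    region = out[t0:b1, l0:r1]
    inner = np.zeros(region.shape[:2], dtype=bool)
    inner[top - t0:top - t0 + h, left - l0:left - l0 + w] = True
    neighbourhood_mean = region[~inner].mean(axis=0)  # ring pixels only
    out[top:top + h, left:left + w] = neighbourhood_mean
    return out.astype(img.dtype)

# A skin-toned patch with a dark "eye" ROI that gets painted over,
# giving the consistent skin-tone appearance described above:
img = np.full((10, 10, 3), 180, dtype=np.uint8)
img[3:6, 3:6] = 20                       # the real eye pixels
smoothed = fill_roi_from_neighbourhood(img, 3, 3, 3, 3)
```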
  • the step 610 further involves fine tuning the first portion obtained by filling said ROIs. The fine tuning involves a combination of morphology, blurring and CLAHE. The step 610 may be performed sequentially or simultaneously with step 608.
  • the method 600 includes step 612 of identifying a set of pre-stored overlays based on determined said one or more parameters, wherein each overlay in the identified set exhibits a unique expression.
  • the best set of overlays with different (unique) expressions is identified using the pre-stored overlays.
  • the term expression used herein may denote a facial expression exhibiting some emotion such as happiness, sadness, excitement, anger etc.
  • the process of creating the set of overlays includes analyzing a plurality of sample images including human faces using Convolutional Neural Networks to identify said one or more parameters including face shape, age, ethnicity and skin colour pertaining to said sample images; identifying an expression exhibited by said human face in each of said plurality of sample images; mapping said expressions exhibited by said human face in each of said plurality of sample images with identified said one or more parameters pertaining to said sample images; and creating and storing a set of unique overlays based on said mapping.
  • Each overlay may be different to exhibit a different expression based on the ethnicity. For example, a user belonging to an African region may have a different shape of eyes, lips, ears, nose in the overlay in comparison to a user from an Asian region.
  • the CNNs are suitably trained to identify different set of shapes and position of the various essential elements based on a user expression. For instance, a smiling expression may be portrayed with a different shape of the lips in comparison to an angry expression.
  • Each of the identified set of pre-stored overlays/asset templates exhibiting a unique expression and having essential features in accordance with the ethnicity, skin colour, age etc. is overlaid onto said first portion in step 614 for creating one or more second portions in step 616.
  • Each of the one or more second portions exhibits a unique expression.
  • the overlaying may involve suitable CNNs and image processing techniques for giving better results. Thereafter, the one or more second portions corresponding to each overlay exhibiting a unique expression are created.
  • the one or more second portions are essentially emoticons that exhibit different expressions. Suitable image processing techniques may be used for providing visual enrichment to said second portion.
  • the method 600 includes refining said second portions until a pre-set threshold is reached.
  • the CNNs are suitably trained to compare the original image of the user with the second portion to achieve closest resemblance but for different expressions.
  • multiple iterations using a plurality of CNNs may be performed until the desired results are achieved.
  • the CNNs are suitably trained to identify and recognize the eyebrows, eyes, mouth, nose, chin, forehead and other key elements of the human face.
  • the identification helps in overlaying the assets in templates in correct position.
  • the method 600 includes identification of the forehead and placement of the laugh lines, tweaking their distances basis the features of the user.
  • the method 600 includes altering the size of the assets for appropriate positioning. Dlib and trained CNNs are used for performing the aforesaid identification and alterations.
  • the method 600 involves step 618 of rescaling said one or more second portions to match a pre-determined resolution (e.g., similar to that of existing emoticons) and storing said resized one or more second portions.
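The rescaling step may be sketched with a nearest-neighbour resize in NumPy. Production code would typically use an area or bicubic filter, and the 128x128 target is an assumed emoticon resolution, not one stated by the method:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) image."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source col for each output col
    return img[rows[:, None], cols]

emoticon_res = (128, 128)                  # assumed target resolution
portrait = np.zeros((480, 360, 3), dtype=np.uint8)
portrait[:240] = 255                       # top half white, bottom half black
emoticon = resize_nearest(portrait, *emoticon_res)
```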
  • the method 600 includes rendering said one or more second portions corresponding to each overlay exhibiting a unique expression; fetching said one or more second portions; and storing said one or more second portions in a user device.
  • the method 600 includes identifying if the first portion or second portion fits appropriately in the view portion and performing the image processing operation in case the first portion or second portion do not fit therein.
  • the method 600 includes: resizing said one or more second portions corresponding to each overlay exhibiting a unique expression; fetching said one or more second portions; and storing said one or more second portions in a user device.
  • the method 600 includes determining the best-suited position for an asset overlaying template by using the centroid of another feature as a reference.
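Centroid-based placement may be sketched via first-order image moments: the centroid of a reference feature mask anchors where the overlay template is positioned. The feature names, gap and coordinates below are illustrative assumptions:

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a binary feature mask via first moments."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def place_above(reference_mask, gap=4):
    """Best-suited anchor for an overlay placed `gap` pixels above the
    reference feature (e.g. laugh lines relative to the mouth)."""
    cy, cx = centroid(reference_mask)
    return cy - gap, cx

# A 3x3 "mouth" blob centred at row 11, col 11 of a 20x20 face crop:
mouth = np.zeros((20, 20), dtype=np.uint8)
mouth[10:13, 10:13] = 1
anchor = place_above(mouth)
```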
  • the one or more second portions are configured to be used in a messaging session.
  • the one or more second portions may be used as an image and may be shared with other users.
  • the user may be provided with an option to add text to said second portions.
  • the method 600 includes ascertaining if said human head included in said image selected by the user fits substantially into a predetermined view portion.
  • the method 600 includes associating a sound with each second portion based on the associated expression.
  • the method 600 includes training a set of CNNs for detecting spectacles/glasses on said human face.
  • a combination of CNNs trained on a dataset of people wearing eye accessories will be used to determine if the human face in said image is wearing glasses, whether or not the lenses are opaque, and the size and thickness of the frame. Based on this analysis, the eye and the eyebrow overlay templates may be suitably configured.
  • the method 600 includes training a set of CNNs for facial hair detection.
  • a combination of CNNs trained on a dataset of users having facial hair (a beard or a moustache) will be used.
  • Based on the analysis of the facial hair pattern via neural networks, the lip and the laugh-lines templates will be configured such that they do not appear atop the facial hair.
  • the CNNs may be trained simultaneously as and when the other steps of the method are being performed.
  • the method may include receiving an input from the user in respect of face shape, age, skin tone, ethnicity, hair style, glasses style, etc.
  • the user may be provided with an option to select one of the options provided in respect of aforesaid face shape, age, skin tone, ethnicity, hair style, glasses style, etc.
  • the user may be provided with an option to do modifications to the created second portions based on its choice.
  • the modifications may include, but are not limited to, changing/adjusting the skin tone and overlaying any wearable accessory such as glasses onto said second portion (emoticon).
  • the method 600 includes storing the said second portions on a user device or on cloud.
  • the method 600 includes creating one or more second portion based on the expression exhibited by pre-defined standard emoticons. For instance, a personalized second portion may be created by analyzing the expression exhibited by a standard emoticon selected by the user and identifying an overlay based on the identified expression and creating the second portions accordingly.
  • the system 700 includes a receiving unit 702 for receiving a selection of an image from a user.
  • An ascertaining unit 704 is provided for ascertaining if said image includes a human head including at least a face at an anterior part of said human head.
  • An image processing unit 706 is provided for performing an image processing operation.
  • the image processing operation includes identifying a first portion in said image pertaining to said human head and a second portion in said image excluding said human head; and extracting said first portion from said image.
  • the system 700 further includes morphological transformation processor 708 for performing a morphological transformation on said first portion.
  • a CNN based analyzer 710, which in operational interconnection with said image processing unit 706, identifies key asset/feature Regions of Interest (ROIs) and replaces pixel portions corresponding to the ROIs with the pixel portions of the neighbourhood of said ROIs.
  • the CNN based analyzer 710 is further configured for analyzing said first portion to determine one or more parameters including age, ethnicity and skin color.
  • a controlling unit 712 identifies a set of overlays based on determined said one or more parameters, wherein each overlay in the identified set exhibits a unique expression.
  • the image processing unit 706 overlays each overlay exhibiting a unique expression onto said first portion obtained after morphological transformation and creates one or more second portions corresponding to each overlay exhibiting a unique expression.
  • An image rescaling unit 714 is further provided for rescaling said one or more second portions to match a pre-determined resolution.

Abstract

The present invention discloses a method of creating multiple expression emoticons relating to a user. The method includes the steps of: receiving a selection of an image from a user including at least a face at an anterior part of a human head; performing an image processing operation for extracting the human head portion and removing the background; analyzing said first portion to determine one or more parameters including age, ethnicity and skin colour; identifying key asset/feature Regions of Interest (ROIs) and filling pixel portions corresponding to the ROIs; identifying a set of pre-stored overlays/asset templates based on determined said one or more parameters; creating one or more second portions corresponding to each overlay exhibiting a unique expression; and rescaling said one or more second portions to match a pre-determined resolution and storing said resized one or more second portions.

Description

Method and System of creating multiple expression emoticons
FIELD OF THE INVENTION
The present invention relates generally to the field of image processing and more particularly, to methods and systems for creating different expression emoticons.
BACKGROUND OF THE INVENTION
Information handling devices, for example laptop computers, tablets, smart phones, desktop computers, etc., may be used by users to communicate with one another. A common form of communication is text-based communication, e.g., chats communicated via Internet-connected applications, SMS text message exchange, email exchange, and the like. A typical aspect of such forms of communication involves sharing of images and text with users. The text may be used to represent general messages or may be used to express an emotion. These days, several text-based emotion representations are replaced with graphical symbols known as emoticons. The term emoticon as used herein is derived from the combination of the words “emotion” and “icon”, and refers to a minute graphical symbol used as a substitute for verbal or visual communication cues meant to assist in conveying the mood or emotion of a textual message, instant messaging, short messaging service (SMS), email, chatroom and other forms of textual communication protocols. Such emoticons are generally inbuilt in the messaging application or may be rendered from a third-party application (as an add-on) for use in a messaging session. Generally, the user is provided with a palette consisting of a pre-existing number of pre-defined styled emoticons. The user is provided with an option to select one or more emoticons from the emoticon palette for sharing the same with other users in the messaging session. In another implementation, one or more emoticons may be selected by using a combination of one or more inputs on the keypad. The combination of inputs is mapped to a particular emoticon and the emoticon is automatically displayed when the combination of inputs on the keypad is received. The drawback with such an emoticon palette is that the emoticons are pre-defined and are the same for all users. More recently, some customizable emoticons have become available on some messaging applications.
For example, one existing system allows the user to import an image from the file system. The image selected by the user is rescaled to match the resolution of emoticons. However, even for such customizable emoticons, the image file has to be already available, and such customized emoticons are inserted in the messaging application. Accordingly, a list/palette of customized emoticons may be created by the user using different images exhibiting different expressions. A few systems also exist in the art that help in creating personalized animated emoticons based on an image from the user. However, such animated emoticons do not give a close resemblance to the user.
Accordingly, there exists a need for an improved system that overcomes the problems in the existing solution. A need exists for a system that can automatically create closely resembling emoticons exhibiting different expressions (emotions) of a user.
SUMMARY OF THE INVENTION
In an embodiment, a method for generating an emoticon corresponding to at least one emotional expression of a user is provided. The method includes the steps of: segmenting an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image; ascertaining a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image; identifying a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises the remaining body of the user; processing the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user; determining at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters, wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user; defining a plurality of regions of interest within the first portion of the image, wherein each of the plurality of regions of interest comprises at least one real characteristic of the face of the user; obliterating the at least one real characteristic of the face of the user within each of the plurality of regions of interest; and overlaying the at least one template of the fictional characteristic of the face on the obliterated at least one real characteristic within each of the plurality of regions of interest to generate the emoticon corresponding to the at least one emotional expression of the user.
In another embodiment, a system for generating an emoticon corresponding to at least one emotional expression of a user implementing the method described above is also provided. In an embodiment, a method of creating multiple expression emoticons relating to a user is provided. The method includes the steps of: receiving a selection of an image from a user; processing said image to ascertain if said image includes a human head including at least a face at an anterior part of said human head; on positive ascertaining, performing an image processing operation, said image processing operation including: 1) identifying a first portion in said image pertaining to said human head and a second portion in said image excluding said human head; 2) extracting said first portion from said image; analyzing said first portion to determine one or more parameters including age, ethnicity and skin colour; identifying a set of pre-stored overlays based on determined said one or more parameters, wherein each overlay in the identified set exhibits a unique expression; performing a morphological transformation on said first portion; overlaying each overlay exhibiting a unique expression onto said first portion obtained after morphological transformation; creating one or more second portions corresponding to each overlay exhibiting a unique expression; and rescaling said one or more second portions to match a pre-determined resolution and storing said resized one or more second portions.
It is an object of the invention to create emoticons based on age, skin tone and ethnicity. It is another object to create multiple emoticons using a single image from the user.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a flow chart for a method for generating an emoticon corresponding to at least one emotional expression of a user, in accordance with an embodiment of the invention;
Figure 2 illustrates a block diagram of a system for generating an emoticon corresponding to at least one emotional expression of a user, in accordance with an embodiment of the invention;
Figure 3 illustrates exemplary images illustrating the obliterating process referred in Figure 1;
Figure 4 illustrates processing of an exemplary image of a user in accordance with the invention;
Figure 5 illustrates plurality of exemplary emoticons corresponding to user image referred in Figure 4 created in accordance with the invention;
Figure 6 illustrates a flow chart for a method of creating multiple expression emoticons in accordance with an embodiment of the invention; and
Figure 7 illustrates a block diagram of a system for creating multiple expression emoticons in accordance with an embodiment of the invention.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present invention. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
Detailed Description:
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates. The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting. Embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a method for generating an emoticon corresponding to at least one emotional expression of a user is provided. The method 100 includes the step 102 of segmenting an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image. The first set of segments of the image corresponds to portions of the image that relate to the human face including the human head. The remaining sets of segments, i.e. the remaining portions of the image, are segmented for removal. The image of the user may be taken in real time by the user using the image capturing device (camera) of the user’s electronic device or may be selected by the user from a pre-stored location in the user’s electronic device. The electronic device may include a smart phone, smart watch, smart glass, mobile device, netbooks, notebooks and other smart devices. The brightness and contrast on the face and the overall clarity may be adjusted for appropriate segmentation. Thereafter, the method 100 ascertains, in step 104, the presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image. The user is generally provided with a view portion wherein the user is requested to fit the image into said view portion. Once the image is received and suitably fitted into the view portion, the step 104 of ascertaining is performed. On negative ascertaining, the user may be sent an error notification to upload a fresh image appropriately in the view portion. On positive ascertaining, a first portion in the first section and a second portion in the first section are identified in step 106. The first portion comprises the head and the face of the user and the second portion comprises the remaining body of the user.
For the purposes of the present disclosure, the first portion includes the sensory organs: the eyes, nose, ears, cheeks and mouth. The human head is also intended to include hair on top of the human face, i.e. the skull, and hair on the face in the form of a moustache and beard. The human head may also include accessories such as glasses, turbans and other wearable items used by a user on a daily basis. Suitable image processing algorithms known to a person skilled in the art may be used to ascertain whether the image fitted into the view portion contains a human face or not. For example, the Viola-Jones cascade classifier algorithm may be used to detect whether a human face exists in the view portion of the image. In one implementation, to determine whether the image includes a human face, the face detection technique involves identifying areas of contrast between light and dark parts of the image; for example, the bridge of the nose is usually lighter than the surrounding area on both sides, while the eye sockets are darker than the forehead. By repeatedly scanning through the image data and calculating the difference between the greyscale pixel values underneath the white boxes and the black boxes, the face detection technique can detect faces. In another method, the processing technique to identify a human face in an image may include separating skin regions from non-skin regions and then locating faces within the skin regions. A chroma chart is prepared via a training process that shows the likelihoods of different colours representing skin. Using the chroma chart, a colour image is transformed into a greyscale image with the grey value at a pixel showing the likelihood of the pixel representing skin. By segmenting the greyscale image, skin regions are separated from non-skin regions. Then, using the luminance component of the colour image and by template matching, faces are located within the skin regions.
It is to be noted that standard face detection techniques may be used to ascertain the presence of the user's face in an image. The first portion and the second portion of the first section are processed in step 108 to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user. The method 100 further includes step 110 of determining at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters, wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user. A plurality of regions of interest within the first portion of the image are defined in step 112, wherein each of the plurality of regions of interest comprises at least one real characteristic of the face of the user. The real characteristic of the face may include various parts of the face including: right eye, left eye, upper lip, lower lip, right ear, left ear, nose, eyebrows, eyelashes etc. The at least one real characteristic of the face of the user is obliterated within each of the plurality of regions of interest in step 112. The step 112 of obliterating the at least one real characteristic of the face of the user within each of the plurality of regions of interest includes selecting a color of a neighboring region of each of the plurality of regions of interest respectively; and obliterating the at least one real characteristic of the face of the user with the selected color of the neighboring region of each of the plurality of regions of interest respectively. The details of the obliterating procedure are explained in detail in reference to Figure 4.
Once the real characteristics of the face are obliterated, at least one template of the fictional characteristic of the face is overlaid on the obliterated at least one real characteristic within each of the plurality of region of interests in step 114 to generate the emoticon corresponding to the at least one emotional expression of the user.
In an embodiment, step 102 of segmenting an image of the user into a plurality of segments comprises: accessing a first-type of convolutional neural network trained to segment humans within an image to identify the first section and the remaining section of the image.
In an embodiment, step 106 of identifying a first portion in the first section and a second portion in the first section comprises: accessing a second-type of convolutional neural network trained to segment the human head from the body of the humans to identify the first and second portions in the first section of the image. In an embodiment, the method 100 further includes normalizing the first and second portions of the first section of the image to straighten the face of the user in a two-dimensional plane. In an implementation, the normalization process may use an affine transformation which maps the triangle formed by three vertices (corresponding to the eyes and the mouth) into a standard view. This normalization technique treats the face and the rest of the image as a thin sheet which can be scaled, rotated and sheared. In an embodiment, the method 100 further includes processing the first portion and the second portion of the first section of the image to remove an angle of tilt of the face within the image. In an embodiment, the step 108 of processing the first portion and the second portion of the first section to determine a plurality of parameters comprises accessing at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
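The affine normalization described above solves for the unique 2x3 matrix mapping the eye/mouth triangle onto a canonical triangle. A minimal NumPy sketch (the landmark coordinates are assumed inputs from a landmark detector such as dlib):

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Solve for the 2x3 affine matrix A mapping the three source
    landmarks (two eyes and mouth) onto the canonical destination
    triangle, so that A maps src exactly onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: each row is (x, y, 1).
    G = np.hstack([src, np.ones((3, 1))])
    # One linear solve, two right-hand sides (x' and y' columns).
    return np.linalg.solve(G, dst).T        # shape (2, 3)

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    G = np.hstack([pts, np.ones((len(pts), 1))])
    return G @ A.T
```

Because the transform is affine, it scales, rotates and shears the whole "thin sheet" uniformly, which is exactly the behaviour the normalization step relies on.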
In an embodiment, the at least one template of the fictional characteristic of the face comprises at least one of a fictional lip, a fictional eye, a fictional nose, a fictional head gear and a combination thereof. In an embodiment, the method 100 further includes accessing a convolutional neural network to convert the emoticon into an artwork. In an embodiment, the method 100 further includes eliminating an unevenness within the head found in the segmented first section of the image. In an embodiment, the method 100 further includes rendering the emoticon with the body of the user on a display interface. The display interface may include a display interface of the user's electronic device. In an embodiment, the method 100 further includes adjusting an intensity of at least a brightness level and a contrast level to increase clarity within the image.
In an embodiment, the method 100 further includes activating at least one image capturing device on receiving an image capture request from the user to capture an image of the face of the user. In an embodiment, the method 100 further includes accessing at least one pre-stored image of the user, wherein the at least one pre-stored image captures at least one emotional expression of the face of the user.
In another embodiment, the emoticons are configured to be used in a messaging session. In another implementation, the emoticons may be used as an image and may be shared with other users. In another embodiment, the user may be enabled to add text to the emoticons. In another embodiment, the method 100 further includes associating a sound with each emoticon based on the associated expression. In an embodiment, speech bubbles may be associated with each emoticon based on the expression.
Referring to Figure 2, a block diagram of a system for generating an emoticon corresponding to at least one emotional expression of a user in accordance with an embodiment of the present invention is provided. The system 200 includes an image segmentor unit 202 configured to segment an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image. An object identifier unit 204 is provided to: ascertain a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image; and identify a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises the remaining body of the user. The system 200 further includes a parameter identifier unit 206 configured to process the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user, and an overlay identifier unit 208 configured to determine at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters, wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user. The parameter identifier unit 206 is further configured to access at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
A region of interest locator 210 is provided to define a plurality of regions of interest within the first portion of the image, wherein each of the plurality of regions of interest comprises at least one real characteristic of the face of the user, and an obliterator unit 212 is provided to obliterate the at least one real characteristic of the face of the user within each of the plurality of regions of interest. The obliterator unit 212 selects a color of a neighboring region of each of the plurality of regions of interest respectively; and obliterates the at least one real characteristic of the face of the user with the selected color of the neighboring region of each of the plurality of regions of interest respectively.
A controller unit 214 then overlays the at least one template of the fictional characteristic of the face on the obliterated at least one real characteristic within each of the plurality of regions of interest to generate the emoticon corresponding to the at least one emotional expression of the user. The at least one template of the fictional characteristic of the face comprises at least one of a fictional lip, a fictional eye, a fictional nose, a fictional head gear and a combination thereof. The system 200 further includes a normalizer 216 to normalize the first and second portions of the first section of the image to straighten the face of the user in a two-dimensional plane. In an implementation, the controller unit 214 further processes the first portion and the second portion of the first section of the image to remove an angle of tilt of the face within the image.
Referring to Figure 3, exemplary images illustrating the obliterating process referred to in Figure 1 are provided. The obliterating step 112 referred to in Figure 1 includes obliterating the at least one real characteristic of the face including the right eye, left eye, upper lip, lower lip, nose, left ear, right ear, eyebrows etc. In an implementation, the process of obliterating includes creating a plurality of masks of the original user image (Figure 3(a)) as explained below:
1) creating a first mask of the image of the user by changing the RGB value corresponding to the image of the user to RGB (0,0,0). A black image of the same size as the original image of the user is created. The same is illustrated in Figure 3(b).
2) creating a second mask pertaining to face portion of the user, wherein said creating of second mask includes:
i. identifying plurality of dlib points pertaining to face portion of the user in the image of the user;
ii. creating an intermediary mask by changing RGB value corresponding to identified face portion of the user to RGB (255,255,255);
iii. overlaying intermediary mask on first mask to create said second mask;
The second mask is created by identifying the face portion using the 68 dlib points, filling it with white colour and overlaying it on the first mask. The same is illustrated in Figure 3(c). 3) creating a third mask pertaining to head portion of the user, wherein said creating of third mask includes:
i. creating an intermediary elliptical mask having diameter equal to the distance between the outer points of the left eye and right eye, and changing RGB value corresponding to identified intermediary elliptical mask to RGB (255,255,255);
ii. overlaying intermediary elliptical mask on first mask to create said third mask.
To cover the skin above the eyebrows, a mask is drawn that covers the head region. For this, an elliptical mask is created such that its diameter is the distance between the outer points of the two eyes. The ellipse is filled with white colour and is overlaid on the first mask. The points pertaining to the left eye and right eye are determined using the dlib face and feature recognition library. The same is illustrated in Figure 3(d).
4) creating a fourth mask (Figure 3(e)) by combining said second mask and third mask. The fourth mask includes both the head portion and face portion.
5) creating a fifth mask (Figure 3(f)) pertaining to facial skin region of the user, wherein said creating of fifth mask includes:
i. performing a Graph Cut on the image of the user on the basis of the fourth mask; and
ii. obtaining a segmented image pertaining to facial skin region of the user.
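Mask-building steps 1) to 4) above can be sketched with plain NumPy arrays. The landmark coordinates are assumed to come from a dlib detector, and for brevity the 68-point face polygon of step 2) is simplified here to a bounding-box fill:

```python
import numpy as np

def first_mask(h, w):
    # Step 1: an all-black RGB image the size of the original.
    return np.zeros((h, w, 3), dtype=np.uint8)

def face_mask(h, w, face_points):
    # Step 2 (simplified): fill the bounding region of the dlib face
    # landmarks with white RGB (255,255,255) over the first mask.
    m = first_mask(h, w)
    xs, ys = zip(*face_points)
    m[min(ys):max(ys) + 1, min(xs):max(xs) + 1] = 255
    return m

def head_mask(h, w, left_eye_outer, right_eye_outer):
    # Step 3: a circle whose diameter equals the distance between the
    # outer eye corners, filled white over the first mask (the patent's
    # ellipse, with equal axes, for simplicity).
    m = first_mask(h, w)
    (x1, y1), (x2, y2) = left_eye_outer, right_eye_outer
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    r = np.hypot(x2 - x1, y2 - y1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    m[((xx - cx) ** 2 + (yy - cy) ** 2) <= r ** 2] = 255
    return m

def combined_mask(face_m, head_m):
    # Step 4: union of the face mask and the head mask.
    return np.maximum(face_m, head_m)
```

The fifth mask would then be obtained by running a Graph Cut segmentation seeded with the combined mask, which is not reproduced here.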
Thereafter, the obliterating process includes the steps of: a) identifying Dlib points corresponding to the left eye and right eye respectively and storing said Dlib points in an array; b) creating circular portions enclosing the left eye and right eye respectively based on said Dlib points stored in the array; and c) changing the RGB value of said first mask pertaining to the area corresponding to said circular portions to (255,255,255). The resulting mask is illustrated in Figure 3(g). The obliterating process further includes identifying the mean RGB value of the RGB values corresponding to the skin region below the Dlib points corresponding to the left eye and right eye respectively; and changing the RGB value of said first mask pertaining to the area corresponding to said circular portions to the identified mean RGB value.
Thereafter, the eye portions are obliterated by: 1) changing the RGB value of said first mask pertaining to the area corresponding to said circular portions to (255,255,255); 2) inverting the RGB values of said first mask obtained after step 1); 3) combining said segmented image of the user and said first mask obtained after said inverting (Figure 3(h)); and 4) obliterating the eye portion of the user by combining the first mask obtained by changing the RGB value of said circular portions to the identified mean RGB value with the image obtained after combining said segmented image of the user and said first mask obtained after said inverting. Figure 3(i) illustrates the image where the eyes of the user have been obliterated.
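The core of the eye-obliteration steps above, painting a circular region with the mean RGB colour sampled from the skin below the eye, can be illustrated without dlib by taking the eye centre, radius and skin patch as given inputs (all names here are illustrative):

```python
import numpy as np

def obliterate_circle(img, center, radius, sample_region):
    """Fill a circular region around `center` with the mean RGB value
    of `sample_region` (a patch of skin below the eye), approximating
    the mean-colour obliteration of Figure 3."""
    out = img.copy()
    mean_rgb = sample_region.reshape(-1, 3).mean(axis=0)
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cx, cy = center
    inside = ((xx - cx) ** 2 + (yy - cy) ** 2) <= radius ** 2
    out[inside] = mean_rgb.astype(img.dtype)
    return out
```

Sampling just below the eye keeps the fill colour consistent with the surrounding skin tone, which is why the patent draws the replacement colour from the neighbouring region rather than a fixed palette.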
With regard to the obliteration process of the lips, the process includes the steps of: 1) identifying Dlib points corresponding to the lips in said image of the user and storing said Dlib points in an array; 2) creating a convex portion enclosed by the Dlib points corresponding to the lips; and 3) changing the RGB value of said first mask pertaining to the area corresponding to said convex portion to (255,255,255). The resulting mask is illustrated in Figure 3(j). Thereafter, the mean RGB value of the RGB values corresponding to the skin region below the Dlib points corresponding to the left eye and right eye respectively is identified, and the RGB value of said first mask pertaining to the area corresponding to said circular portions is changed to the identified mean RGB value. Thereafter, the process includes predicting the RGB value corresponding to the skin region in proximity to said convex portion; changing the RGB value of said first mask pertaining to the area corresponding to said convex portion to the predicted RGB value; and obliterating the lip portion of the user by combining the first mask obtained by changing the RGB value of said convex portion to the predicted RGB value with the image obtained after combining said segmented image of the user and said first mask obtained after said inverting.
In an implementation, the at least one real characteristic of the face of the user is selectively obliterated, wherein said selective obliterating includes analyzing the expression on the face in the image using pre-trained CNNs. For selective obliteration, the expression on the face of the original image is analysed using pre-trained CNNs. If the expression is negative, only the lip region is obliterated to generate other negative emotions by changing the lip template only; to generate all the other emotions, both the lip and the eye regions are obliterated. Similarly, if the expression is positive, only the lip region is obliterated to generate other positive emotions; to generate all other emotions, both the lip and the eye regions are obliterated. In an exemplary implementation, Figure 3(k) indicates obliteration of the eyes only, Figure 3(l) indicates obliteration of the lips only, and Figure 3(m) indicates obliteration of both lips and eyes.
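The selective-obliteration rule above reduces to a small decision function. The expression labels and valence groupings below are assumptions for illustration; the upstream expression classification by a pre-trained CNN is not reproduced:

```python
# Assumed valence groupings of expression labels (illustrative only).
NEGATIVE = {"anger", "sadness", "disgust", "fear"}
POSITIVE = {"happiness", "surprise"}

def regions_to_obliterate(detected_expression, target_expression):
    """Return which facial regions must be obliterated to turn the
    detected expression into the target one: if both share the same
    valence, retouching the lips alone suffices; otherwise both the
    lip and eye regions are replaced."""
    same_valence = (
        (detected_expression in NEGATIVE and target_expression in NEGATIVE)
        or (detected_expression in POSITIVE and target_expression in POSITIVE)
    )
    return ("lips",) if same_valence else ("lips", "eyes")
```

Obliterating fewer regions preserves more of the user's real face, which is why the same-valence case changes only the lip template.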
Referring to Figure 4, exemplary user images undergoing processing in accordance with the process of the present invention are illustrated. Figure 4(a) illustrates the original image of the user. The original image may be captured by the user in real time or may be selected from a pre-stored location. The original image contains noise and unwanted features (segments) that may not be required for the purpose of creating the emoticon. Figure 4(b) illustrates the image undergoing segmentation. As can be noticed, the head, including the hair, and the face region are marked for segmentation. A plurality of markers are used to cover a wider region and segmentation is done in the HSV and YCbCr domains. Figure 4(c) shows the normalized image of the user. As can be noticed, the segmented head is normalized to straighten the face on the X-Y plane. Figure 4(d) indicates the image of the user after the obliteration step. In Figure 4(d), all the real characteristics of the face of the user except the nose are obliterated.
Referring to Figure 5, exemplary emoticons corresponding to user image referred in Figure 4 are illustrated. As can be seen, only the nose feature of the user that was left after the obliteration is present in all the emoticons and the remaining features have been overlaid using fictional characteristics to exhibit different emoticons with different expressions.
Referring to Figure 6, a method 600 for creating multiple expression emoticons relating to a user is disclosed. The method illustrated in Figure 6 is to be read in reference with the method illustrated in Figure 1. The method 600 includes step 602 of receiving a selection of an image from a user. The user may use a camera to capture an image of the user's face. Additionally, or alternatively, the user may provide a previously created image of the user's face to the user device. The user is generally provided with a view portion wherein the user is requested to fit the image into said view portion. Once the image is received and suitably fitted into the view portion, the method 600 processes said image as fitted into the view portion to ascertain if said image includes a human head including at least a face at the anterior part of said human head in step 604. For the purposes of the present disclosure, the human head includes at least a human face containing the sensory organs: the eyes, nose, ears, cheeks and mouth. The human head is also intended to include hair on top of the human face, i.e. the skull, and hair on the face in the form of a moustache and beard. The human head may also include accessories such as glasses, turbans and other wearable items used by a user on a daily basis. On negative ascertaining in step 604, the method 600 sends an error message to the user and requests the user to include a fresh image. The method 600 also sends an error message to the user and requests the user to include a fresh/new image in case it is found that the image selected by the user is not in an appropriate format, or is not of appropriate size or clarity. In case it appears at step 604 that the image includes a human face but is not in an appropriate format, or is not of appropriate size or clarity, a request for a fresh/new image may still be made to the user.
Suitable image processing algorithms known to a person skilled in the art, as described previously in reference to Figure 1, may be used to ascertain whether the image fitted into the view portion contains a human face or not.
On positive ascertaining in step 604, the method 600 performs an image processing operation at step 606 to remove the background (unwanted portion) and separate the human head, including the human face, from the image. The image portion pertaining to said human head may be referred to as the first portion and the image portion excluding said human head may be referred to as the second portion. Suitable human head (face) recognition algorithms, as suggested above, are used to identify the first portion in the image using trained CNNs. The CNNs are suitably trained to identify skin regions and hair regions to make sure that any essential region that forms part of the human head is not removed as unwanted portion. Suitable background removal algorithms, such as the GrabCut algorithm for texture-based background removal, may be used.
A Convolutional Neural Network (CNN) can be thought of as a layered image-processing pipeline designed to perform a particular task. The goal of the pipeline is to take an image as input, perform mathematical operations and provide a high-level, user-friendly response. The processing within the network is sequential in nature: i.e., each layer in the network takes input from the layer(s) above it and does some computation before passing the resulting output to the next layer(s). Each layer is composed of "neurons" that are connected to "neurons" of other (in most cases adjacent) layers. Each connection has a numeric weight associated with it that signifies its importance. There are two main steps when working with CNNs: training and testing. Before a CNN can be used for a task, it needs to be trained for that task. In the training phase, the CNN is provided with a list of objects that need to be detected and classified by the network. The method involves creating, training and storing several CNNs for various attributes selected by analyzing millions of faces of different users. The CNNs are trained in a well-defined manner for each of the attributes identified by analyzing millions of faces of different users. For example, conventional techniques may involve a number of different training models, each utilized to correspond to a particular aspect at which a human face may be depicted in a digital image. For example, to detect an image of a human face oriented 60.0 degrees off-axis utilizing conventional measures, a corresponding training model, which may be useful primarily for a specific range of off-axis orientations, such as between approximately 26.0 degrees and 36.0 degrees, may be utilized. In another example of a conventional technique, to detect a partially-cropped image of a human face or to detect an image of a partially-occluded human face in a digital image, two or more specific training models may be utilized.
In an embodiment, parameters of a single neural network model for performing face detection may be developed. Model parameters of a neural network used for face detection may, at least in some embodiments, be leveraged from training a neural network to detect a plurality of different faces. In an embodiment, one or more training modules may be provided to build and train CNNs. The CNNs may be trained by analyzing millions of face images and the attributes related thereto. A threshold may be set for each CNN to evaluate the performance of the respective CNN.
In another algorithm to detect human faces in colour images, as well as to remove the background from a single-face colour image, the algorithm combines a colour histogram for skin colour (in the HSV space) with a threshold value on the greyscale image to easily detect skin regions in a given image. Then, in order to reduce the number of non-face regions, the number of holes in these selected regions is calculated; if the value is less than a particular threshold, the region is selected. Also, the ratio of the height and width of the detected skin region is calculated to differentiate face and non-face regions. Finally, a Weber Local Descriptor (WLD) is calculated for each selected region; each region is then divided into equal-size blocks and the corresponding entropy value of each block is calculated and compared with training samples to obtain the Euclidean distance between them. If the distance value lies between tested threshold values, the region block is a face; otherwise it is a non-face. Another technique for background removal may be based on the Graph Cut algorithm. It is to be noted that the present invention is contemplated to cover any of the OpenCV algorithms used for human face detection and background removal. The present invention may also involve the use of Convolutional Neural Networks that are suitably trained to identify the skin regions and non-skin regions. Further, the CNNs may also be suitably trained to identify any accessories such as glasses, ear accessories, turbans etc. in an image. The present invention may also involve removing such accessories as part of the background using suitably trained CNNs and other OpenCV background removal algorithms.
In an implementation, the image processing operation of step 606 includes a step of identifying and segmenting the hair portion. The step is primarily used to identify scalp hair but may also be used to identify facial hair as well. The step involves identifying approximate hair regions and placing a marker on the same. Thereafter, the MeanShift algorithm is applied for segmentation in the RGB domain. The step further involves placing a plurality of markers on the hair to cover a wider region and segmenting in the HSV and YCbCr domains.
Once the background (unwanted portion) is removed from the image and the first portion is extracted, the step 606 may further include performing a morphological transformation on said first portion. The morphological transformation is performed to make sure that the relevant features/portions of the face are not removed during the background/unwanted portion removal process. The morphological transformation helps in completing the human face if any portion has been inadvertently considered as a background portion and removed.
Once the first portion (human face) is extracted (and the background is removed from the image) and a morphological transformation is performed, the method 600 includes analysing said first portion to determine one or more parameters including age, ethnicity and skin colour in step 608. The aforesaid determination of one or more parameters including age, ethnicity and skin colour is again performed using suitably trained CNNs and other OpenCV algorithms. It is contemplated that conventional training approaches for detection of multi-aspect images of human faces may utilize numerous distinct training models, which may include training models to detect images of human faces rotated in-plane, training models to detect cropped and/or occluded images of human faces, training models to detect human faces oriented off-axis, training models to detect skin colour/texture, training models to detect ethnicity (demographic region), training models to detect age, and so forth. In one implementation, the age may be determined using the algorithm provided in the paper entitled "Estimating The Age Of Human Face In Image Processing Using Matlab". In another implementation, the Viola-Jones algorithm may be used for detecting the age of the user. In another implementation, the system provided in US patent No. US 7606621 B1, "Demographic classification using image components", may be used for automatically extracting the demographic information from images. The system therein detects the face in an image, locates different components, extracts component features, and then classifies the components to identify the age, gender, or ethnicity of the person(s) in the image. Similarly, the skin texture may be determined using the algorithm described in "SKIN TEXTURE RECOGNITION USING NEURAL NETWORKS". The above said algorithms are provided by way of examples.
The present implementation may be realized using any suitable algorithm that helps in identifying the skin colour (texture), age and ethnicity (demographic region).
Once the morphological transformation is performed and the one or more parameters have been identified, the method 600 involves step 610 of identifying key asset/feature regions of interest (ROIs) and filling pixel portions corresponding to the ROIs with pixel portions of the portions in the close neighbourhood of said ROIs. In particular, the method 600 involves identifying pixel portions corresponding to essential elements (ears, nose, lips, eyebrows) of the human face and replacing the pixel portions of said identified portions with pixel portions of the portions in the close neighbourhood of said essential elements. This is done to make sure that a consistent texture (skin tone) appearance is given to the first portion when said pixel portions of the essential elements are replaced with the pixel portions of the neighbourhood skin portions. In an embodiment, the step 610 further involves fine tuning the first portion obtained by filling said ROIs. The fine tuning involves a combination of morphology, blurring and CLAHE. The step 610 may be performed sequentially or simultaneously with step 608.
Once the asset ROIs are filled, the fine tuning is performed and the one or more parameters including age, ethnicity and skin colour are identified, the method 600 includes step 612 of identifying a set of pre-stored overlays based on the determined said one or more parameters, wherein each overlay in the identified set exhibits a unique expression. The best set of overlays with different (unique) expressions is identified from the pre-stored overlays. The term expression used herein may denote a facial expression exhibiting some emotion such as happiness, sadness, excitement, anger etc.
In an implementation, the process of creating a set of overlays includes analyzing a plurality of sample images including human faces using Convolutional Neural Networks to identify said one or more parameters including face shape, age, ethnicity and skin colour pertaining to said sample images; identifying an expression exhibited by said human face in each of said plurality of sample images; mapping said expressions exhibited by said human face in each of said plurality of sample images with the identified said one or more parameters pertaining to said sample images; and creating and storing a set of unique overlays based on said mapping. Each overlay may be different to exhibit a different expression based on the ethnicity. For example, a user belonging to an African region may have a different shape of eyes, lips, ears and nose in the overlay in comparison to a user from an Asian region. The CNNs are suitably trained to identify different sets of shapes and positions of the various essential elements based on a user expression. For instance, a smiling expression may be portrayed with a different shape of the lips in comparison to an angry expression.
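The parameter-to-overlay mapping of step 612 can be modelled as a keyed lookup into the pre-stored overlay sets. The category labels and template file names below are illustrative placeholders, not part of the disclosed system:

```python
# Hypothetical pre-stored overlay library, keyed by the parameters
# determined in step 608; each value maps expressions to templates.
OVERLAY_LIBRARY = {
    ("female", "adult", "asian"): {
        "happiness": "overlay_f_adult_asian_happy.png",
        "anger": "overlay_f_adult_asian_angry.png",
    },
    ("male", "adult", "african"): {
        "happiness": "overlay_m_adult_african_happy.png",
        "anger": "overlay_m_adult_african_angry.png",
    },
}

def select_overlays(gender, age_band, ethnicity):
    """Return the set of expression overlays matching the user's
    determined parameters (step 612); an empty dict means no
    pre-stored set exists for that combination."""
    return OVERLAY_LIBRARY.get((gender, age_band, ethnicity), {})
```

Keying on the full parameter tuple is what lets the system pick, for example, differently shaped eye and lip templates for users of different ethnicities, as the paragraph above describes.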
Each of the identified set of pre-stored overlays/asset templates, exhibiting a unique expression and having essential features in accordance with the ethnicity, skin colour, age etc., is overlaid onto said first portion in step 614 for creating one or more second portions in step 616. Each of the one or more second portions exhibits a unique expression. The overlaying may involve suitable CNNs and image processing techniques for giving better results. Thereafter, the one or more second portions corresponding to each overlay, each exhibiting a unique expression, are obtained. The one or more second portions are essentially emoticons that exhibit different expressions. Suitable image processing techniques may be used for providing visual enrichment to said second portions. This may include making changes to any part of the face, the colour/texture/skin tone of the face or any part thereof, or the dimensions/shape of any part of the face, to give a more realistic and close resemblance to the user. In another embodiment, the method 600 includes refining said second portions until a pre-set threshold is reached. The CNNs are suitably trained to compare the original image of the user with the second portion to achieve the closest resemblance but for different expressions. In an embodiment, multiple iterations using a plurality of CNNs may be performed until the desired results are achieved.
In an embodiment, the CNNs are suitably trained to identify and recognize the eyebrows, eyes, mouth, nose, chin, forehead and other key elements of the human face. This identification helps in overlaying the assets in the templates in the correct position. In an embodiment, the method 600 includes identification of the forehead and placement of the laugh lines, tweaking their distances based on the features of the user. In an embodiment, the method 600 includes altering the size of the assets for appropriate positioning. Dlib and trained CNNs are used for performing the aforesaid identification and alterations.
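The landmark-based positioning of assets described above can be sketched as follows. The landmark coordinates and asset dimensions are hypothetical stand-ins for Dlib detector output; the sketch only illustrates centring an asset template on a detected feature.

```python
import numpy as np

def centroid(points):
    """Mean position of a set of (x, y) landmark points."""
    pts = np.asarray(points, dtype=float)
    return pts.mean(axis=0)

def place_asset(anchor_points, asset_w, asset_h):
    """Top-left corner so the asset is centred on the landmark centroid."""
    cx, cy = centroid(anchor_points)
    return int(round(cx - asset_w / 2)), int(round(cy - asset_h / 2))

# e.g. four hypothetical landmark points around an eye
eye_pts = [(100, 60), (120, 58), (120, 66), (100, 64)]
print(place_asset(eye_pts, 30, 12))  # → (95, 56)
```

A real implementation would also scale the asset to the spread of the landmark points before placement, per the size-alteration step above.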
Thereafter, the method 600 involves step 618 of rescaling said one or more second portions to match a pre-determined resolution (e.g., similar to that of existing emoticons) and storing said resized one or more second portions. In an embodiment, the method 600 includes rendering said one or more second portions corresponding to each overlay exhibiting a unique expression; fetching said one or more second portions; and storing said one or more second portions in a user device. In an embodiment, the method 600 includes identifying if the first portion or second portion fits appropriately in the view portion and performing the image processing operation in case the first portion or second portion does not fit therein. In an embodiment, the method 600 includes determining the best-suited position for an asset overlaying template by using the centroid of another feature as a reference.
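The rescaling of step 618 can be sketched with a simple nearest-neighbour resize. A real implementation would likely use an interpolating resize from an image library; this numpy-only version is an editor's illustration of the index mapping.

```python
import numpy as np

def rescale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a HxW (or HxWxC) array."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source col for each output col
    return img[rows][:, cols]

src = np.arange(16, dtype=np.uint8).reshape(4, 4)
dst = rescale_nearest(src, 2, 2)
print(dst.shape)  # → (2, 2)
```

The same call upscales as well as downscales, so one routine serves both the emoticon-resolution target and the view-portion fitting check.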
In another embodiment, the one or more second portions are configured to be used in a messaging session. In another implementation, the one or more second portions may be used as images and may be shared with other users. In another embodiment, the user may be provided with an option to add text to said second portions. In another embodiment, the method 600 includes ascertaining if said human head included in said image selected by the user fits substantially into a predetermined view portion. In another embodiment, the method 600 includes associating a sound with each second portion based on the associated expression.
In another embodiment, the method 600 includes training a set of CNNs for detecting spectacles/glasses on said human face. A combination of CNNs trained on a dataset of people wearing eye accessories is used to determine if the human face in said image is wearing glasses, whether or not the lenses are opaque, and the size and thickness of the frame. Based on this analysis, the eye and eyebrow overlay templates may be suitably configured.
In another embodiment, the method 600 includes training a set of CNNs for facial hair detection. A combination of CNNs trained on a dataset of users having facial hair, such as a beard or a moustache, is used. Based on the analysis of the facial hair pattern via neural networks, the lip and laugh-line templates are configured such that they do not appear atop the facial hair.
In an embodiment, the pre-stored CNNs are modified or trained in real time based on the modifications/changes to be made. The CNNs may be trained simultaneously as and when the other steps of the method are being performed. In an embodiment, the method may include receiving an input from the user in respect of face shape, age, skin tone, ethnicity, hair style, glasses style, etc. In another implementation, the user may be provided with an option to select one of the options provided in respect of the aforesaid face shape, age, skin tone, ethnicity, hair style, glasses style, etc.
In another embodiment, the user may be provided with an option to make modifications to the created second portions based on the user's choice. The modifications may include, but are not limited to, changing/adjusting the skin tone and overlaying any wearable accessory, such as glasses, onto said second portion (emoticon). In another embodiment, the method 600 includes storing said second portions on a user device or on the cloud.
In another embodiment, the method 600 includes creating one or more second portions based on the expression exhibited by pre-defined standard emoticons. For instance, a personalized second portion may be created by analyzing the expression exhibited by a standard emoticon selected by the user, identifying an overlay based on the identified expression, and creating the second portions accordingly.
Referring to Figure 7, a block diagram of a system 700 for creating multiple expression emoticons relating to a user is provided. The system 700 includes a receiving unit 702 for receiving a selection of an image from a user. An ascertaining unit 704 ascertains if said image includes a human head including at least a face at the anterior part of said human head. An image processing unit 706 is provided for performing an image processing operation. The image processing operation includes identifying a first portion in said image pertaining to said human head and a second portion in said image excluding said human head, and extracting said first portion from said image. The system 700 further includes a morphological transformation processor 708 for performing a morphological transformation on said first portion. A CNN-based analyzer 710, in operational interconnection with said image processing unit 706, identifies key asset/feature regions of interest (ROIs) and replaces pixel portions corresponding to the ROIs with the pixel portions of the neighborhood of said ROIs. The CNN-based analyzer 710 is further configured for analyzing said first portion to determine one or more parameters including age, ethnicity and skin colour. Once the morphological transformation is performed, the ROIs are filled and the one or more parameters are identified, a controlling unit 712 identifies a set of overlays based on the determined one or more parameters, wherein each overlay in the identified set exhibits a unique expression. Thereafter, the image processing unit 706 overlays each overlay exhibiting a unique expression onto said first portion obtained after the morphological transformation and creates one or more second portions corresponding to each overlay exhibiting a unique expression. An image rescaling unit 714 is further provided for rescaling said one or more second portions to match a pre-determined resolution.
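As an editor's illustration only, the units 702 through 714 of system 700 can be thought of as stages of a linear pipeline acting on the image in turn. The class and stage bodies below are stubs, not the patent's implementation.

```python
class EmoticonPipeline:
    """Stand-in for system 700: each stage mirrors one of the units
    702-714 (receive, ascertain, process, transform, analyze,
    control, rescale), applied in order."""

    def __init__(self, *stages):
        self.stages = list(stages)

    def run(self, image):
        data = image
        for stage in self.stages:
            data = stage(data)  # each unit consumes the previous unit's output
        return data

# Seven stub stages, one per unit; each just increments a counter here.
pipeline = EmoticonPipeline(*([lambda d: d + 1] * 7))
print(pipeline.run(0))  # → 7
```

The point of the sketch is the data-flow ordering: the controlling unit's overlay selection (712) only runs once the analyzer (710) has produced the parameters, exactly as the paragraph above describes.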
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of the invention.

Claims

I claim:
1. A method for generating an emoticon corresponding to at least one emotional expression of a user, the method comprising:
segmenting an image of the user into a plurality of segments, wherein the plurality of segments comprises a first set of segments corresponding to a presence of the user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image;
ascertaining a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image;
identifying a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises remaining body of the user;
processing the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user;
determining at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters; wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user;
defining a plurality of region of interests within the first portion of the image, wherein each of the plurality of region of interest comprises at least one real characteristic of the face of the user;
obliterating the at least one real characteristic of the face of the user within each of the plurality of region of interests; and
overlaying the at least one template of the fictional characteristic of the face on the obliterated at least one real characteristic within each of the plurality of region of interests to generate the emoticon corresponding to the at least one emotional expression of the user.
2. The method as claimed in claim 1, wherein segmenting an image of the user into a plurality of segments comprises:
accessing a first-type of convolutional neural network trained to segment humans within image to identify the first section and the remaining section of the image;
wherein identifying a first portion in the first section and a second portion in the first section comprises:
accessing a second-type of convolutional neural network trained to segment human head from a body of the humans to identify the first and second portion in the first section of the image; and
wherein processing the first portion and the second portion of the first section to determine a plurality of parameters comprises:
accessing at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
3. The method as claimed in claim 1, wherein processing the first portion and the second portion of the first section to determine a plurality of parameters comprises: accessing at least a third type of convolutional neural network to classify the user into at least one gender category, an age category and an ethnicity category.
4. The method as claimed in claim 1, wherein the at least one template of the fictional characteristic of the face comprises at least one fictional lip, a fictional eye, a fictional nose, a fictional head gear and a combination thereof.
5. The method as claimed in claim 1, wherein obliterating the at least one real characteristic of the face of the user within each of the plurality of region of interests comprises:
selecting a color of a neighboring region of the each of the plurality of region of interests respectively; and
obliterating the at least one real characteristic of the face of the user with the selected color of the neighboring region of each of the plurality of region of interests respectively.
6. The method as claimed in claim 1, wherein obliterating the at least one real characteristic of the face of the user within each of the plurality of region of interests comprises: a. creating a first mask of the image of the user by changing RGB value corresponding to image of the user to RGB (0,0,0);
b. creating a second mask pertaining to face portion of the user, wherein said creating of second mask includes:
i. identifying plurality of Dlib points pertaining to face portion of the user in the image of the user;
ii. creating an intermediary mask by changing RGB value corresponding to identified face portion of the user to RGB (255,255,255); and
iii. overlaying intermediary mask on first mask to create said second mask;
c. creating a third mask pertaining to head portion of the user, wherein said creating of third mask includes:
i. creating an intermediary elliptical mask having diameter equal to the distance between the outer points of the left eye and right eye, and changing RGB value corresponding to identified intermediary elliptical mask to RGB (255,255,255);
ii. overlaying intermediary elliptical mask on first mask to create said third mask;
d. creating a fourth mask by combining said second mask and third mask; and
e. creating a fifth mask pertaining to facial skin region of the user, wherein said creating of fifth mask includes:
i. performing a Graph cut on image of the user basis the fourth mask; and
ii. obtaining a segmented image pertaining to facial skin region of the user.
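The mask-construction steps (a) through (d) of claim 6 can be illustrated with a small numpy sketch. This is an editor's illustration under stand-in geometry: the face region and the eye-corner ellipse are hard-coded shapes rather than Dlib output, and the Graph-cut step (e), which in practice might use a routine such as OpenCV's grabCut, is omitted.

```python
import numpy as np

H, W = 8, 8

# a. first mask: the user image with every pixel forced to RGB (0,0,0)
first_mask = np.zeros((H, W, 3), np.uint8)

# b. second mask: face landmark region painted (255,255,255) on the first mask
face_region = np.zeros((H, W), bool)
face_region[2:6, 2:6] = True                  # stand-in for the Dlib face hull
second_mask = first_mask.copy()
second_mask[face_region] = 255

# c. third mask: ellipse whose diameter spans the outer eye corners
yy, xx = np.mgrid[0:H, 0:W]
cy, cx, ry, rx = 3, 4, 2, 3                   # stand-in eye-corner geometry
ellipse = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
third_mask = first_mask.copy()
third_mask[ellipse] = 255

# d. fourth mask: union of the face (second) and head (third) masks
fourth_mask = np.maximum(second_mask, third_mask)

print(int(fourth_mask.max()), int(fourth_mask.min()))  # → 255 0
```

The fourth mask is what would seed the Graph cut of step (e), separating facial skin from the rest of the image.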
7. The method as claimed in claim 6 further comprising:
a. identifying Dlib points corresponding to left eye and right eye respectively and storing said Dlib points in an array;
b. creating circular portions enclosing left eye and right eye respectively based on said Dlib points stored in the array; and
c. changing RGB value of said first mask pertaining to area corresponding to said circular portions to (255,255,255).
8. The method as claimed in claim 7 further comprising:
a. identifying mean RGB value of RGB values corresponding to skin region below Dlib points corresponding to left eye and right eye respectively; and
b. changing RGB value of said first mask pertaining to area corresponding to said circular portions to identified mean RGB value.
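Claims 7 and 8 together describe painting each circular eye region with the mean colour of the skin strip just below it. A minimal numpy sketch of that obliteration step follows; the eye-centre coordinates and radius are hypothetical stand-ins for Dlib detections.

```python
import numpy as np

def fill_eye_with_local_skin(img, eye_cx, eye_cy, r):
    """Paint a circular eye region with the mean colour of the skin
    strip just below it (stand-in for the claim 8 obliteration step)."""
    h, w = img.shape[:2]
    # skin strip directly below the eye circle
    strip = img[min(eye_cy + r, h - 1):min(eye_cy + 2 * r, h),
                max(eye_cx - r, 0):min(eye_cx + r, w)]
    mean_rgb = strip.reshape(-1, 3).mean(axis=0)
    # circular region enclosing the eye
    yy, xx = np.mgrid[0:h, 0:w]
    circle = (yy - eye_cy) ** 2 + (xx - eye_cx) ** 2 <= r * r
    out = img.copy()
    out[circle] = mean_rgb.astype(np.uint8)
    return out

img = np.full((20, 20, 3), 200, np.uint8)   # uniform "skin"
img[8:12, 8:12] = 0                         # dark "eye" pixels
result = fill_eye_with_local_skin(img, 10, 10, 3)
print(int(result[10, 10, 0]))  # → 200
```

Because the fill colour is sampled locally rather than globally, the obliterated region blends with the surrounding skin tone, which is what lets the fictional eye template sit on a clean background.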
9. The method as claimed in claim 7 further comprising:
c. changing RGB value of said first mask pertaining to area corresponding to said circular portions to (255,255,255);
d. inverting RGB value of said first mask obtained after changing RGB value of said first mask pertaining to area corresponding to said circular portions to (255,255,255);
e. combining said segmented image of the user and said first mask obtained after said inverting; and
f. obliterating eye portion of the user by combining the first mask obtained after changing RGB value of said first mask pertaining to area corresponding to said circular portions to identified mean RGB value and the image obtained after combining said segmented image of the user and said first mask obtained after said inverting.
10. The method as claimed in claim 6 further comprising:
a. identifying Dlib points corresponding to lips in said image of the user and storing said Dlib points in an array;
b. creating a convex portion enclosed by Dlib points corresponding to lips; and
c. changing RGB value of said first mask pertaining to area corresponding to said convex portion to (255,255,255).
11. The method as claimed in claim 10 further comprising:
g. predicting RGB value corresponding to skin region in proximity to said convex portion;
h. changing RGB value of said first mask pertaining to area corresponding to said convex portion to predicted RGB value;
i. obliterating lip portion of the user by combining the first mask obtained after changing RGB value of said first mask pertaining to area corresponding to said convex portion to predicted RGB value and the image obtained after combining said segmented image of the user and said first mask obtained after said inverting.
12. The method as claimed in claim 1 further comprising obliterating the at least one real characteristic of the face of the user selectively, wherein said selective obliterating includes analyzing the expression on the face in the image using pre-trained CNNs.
13. The method as claimed in claim 1, wherein the at least one real characteristic of the face includes right eye, left eye, upper lip, lower lip, nose, left ear and right ear.
14. The method as claimed in claim 1, further comprising:
accessing a convolutional neural network to convert the emoticon into an artwork; and rendering the emoticon with the body of the user on a display interface.
15. A system for generating an emoticon corresponding to at least one emotional expression of a user, the system comprising:
an image segmentor unit configured to segment an image of the user into a plurality of segments, wherein the plurality of segments comprising a first set of segments corresponding to a presence of user in a first section of the image and a remaining set of segments comprising a plurality of objects excluding the user in a remaining section of the image;
an object identifier unit configured to:
ascertain a presence of a head of the user, at least an anterior portion of the head of the user and a face in the first section of the image; and
identify a first portion in the first section and a second portion in the first section, wherein the first portion comprises the head and the face of the user and the second portion comprises remaining body of the user;
a parameter identifier unit configured to process the first portion and the second portion of the first section to determine a plurality of parameters corresponding to at least one of a gender, an age and an ethnicity of the user;
an overlay identifier unit configured to determine at least one overlay template from a plurality of overlay templates based on the determined plurality of parameters, wherein each of the plurality of overlay templates comprises at least one template of a fictional characteristic of the face corresponding to at least one emotion of the user;
a region of interest locator configured to define a plurality of region of interests within the first portion of the image, wherein each of the plurality of region of interest comprises at least one real characteristic of the face of the user;
an obliterator unit configured to obliterate the at least one real characteristic of the face of the user within each of the plurality of region of interests; and
a controller unit configured to overlay the at least one template of the fictional characteristic of the face on the obliterated at least one real characteristic within each of the plurality of region of interests to generate the emoticon corresponding to the at least one emotional expression of the user.
16. A method of creating multiple expression emoticons relating to a user, said method comprising:
receiving a selection of an image from a user;
processing said image to ascertain if said image includes a human head including at least a face at anterior part of said human head;
on positive ascertaining, performing an image processing operation, said image processing operation including: identifying a first portion in said image pertaining to said human head and a second portion in said image excluding said human head;
extracting said first portion from said image;
performing morphological transformation on said first portion;
analyzing said first portion to determine one or more parameters including age, ethnicity and skin colour;
identifying a set of pre-stored overlay based on determined said one or more parameters, wherein each overlay in identified set exhibits a unique expression;
overlaying each overlay exhibiting a unique expression onto said first portion obtained after morphological transformation;
creating one or more second portions corresponding to each overlay exhibiting a unique expression; and
rescaling said one or more second portions to match a pre-determined resolution and storing said resized one or more second portions.
17. The method as claimed in claim 16, wherein said human head including at least a face at anterior part of said human head is ascertained using trained Convolutional Neural Networks (CNNs).
18. The method as claimed in claim 16, wherein said set of overlays is created by:
analyzing a plurality of sample images including human face using Convolutional Neural Networks to identify said one or more parameters including face shape, age, ethnicity and skin colour pertaining to said sample images;
identifying an expression exhibited by said human face in each of said plurality of sample images;
mapping said expressions exhibited by said human face in each of said plurality of sample images with identified said one or more parameters pertaining to said sample images; and
creating and storing a set of unique overlays based on said mapping.
19. The method as claimed in claim 16 further comprising rendering said one or more second portions corresponding to each overlay exhibiting a unique expression; fetching said one or more second portions; and storing said one or more second portions in a user device.
20. The method as claimed in claim 16 further comprising at least one of:
a. associating a sound with each second portion based on the associated expression; and
b. creating one or more second portion based on the expression exhibited by pre-defined standard emoticons, wherein said creating includes analyzing expression exhibited by a standard emoticon selected by the user and identifying an overlay based on the identified expression.
PCT/IB2019/050390 2018-01-17 2019-01-17 Method and system of creating multiple expression emoticons WO2019142127A1 (en)

Applications Claiming Priority (2)

Application Number: IN201811000732 — Priority Date: 2018-01-17; Filing Date: 2018-01-17

Publications (1)

Publication Number Publication Date
WO2019142127A1 true WO2019142127A1 (en) 2019-07-25

Family

ID=67302077

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/050390 WO2019142127A1 (en) 2018-01-17 2019-01-17 Method and system of creating multiple expression emoticons

Country Status (1)

Country Link
WO (1) WO2019142127A1 (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140049340A (en) * 2012-10-17 2014-04-25 에스케이플래닛 주식회사 Apparatus and methods of making user emoticon
CN107153496A (en) * 2017-07-04 2017-09-12 北京百度网讯科技有限公司 Method and apparatus for inputting emotion icons


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562510B2 (en) 2019-12-21 2023-01-24 Samsung Electronics Co., Ltd. Real-time context based emoticon generation system and method thereof
CN111541950A (en) * 2020-05-07 2020-08-14 腾讯科技(深圳)有限公司 Expression generation method and device, electronic equipment and storage medium
CN111541950B (en) * 2020-05-07 2023-11-03 腾讯科技(深圳)有限公司 Expression generating method and device, electronic equipment and storage medium
WO2021248382A1 (en) * 2020-06-10 2021-12-16 北京小米移动软件有限公司 Biological feature verification method and apparatus, electronic device, and storage medium
CN112750071A (en) * 2020-11-04 2021-05-04 上海序言泽网络科技有限公司 User-defined expression making method and system
CN112750071B (en) * 2020-11-04 2023-11-24 上海序言泽网络科技有限公司 User-defined expression making method and system
EP3896608A3 (en) * 2020-12-24 2022-02-23 Beijing Baidu Netcom Science and Technology Co., Ltd Meme generation method, apparatus, electronic device, storage medium and program product
US11875601B2 (en) 2020-12-24 2024-01-16 Beijing Baidu Netcom Science and Technology Co., Ltd Meme generation method, electronic device and storage medium


Legal Events

121 — Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19741841; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — Ep: pct application non-entry in european phase (Ref document number: 19741841; Country of ref document: EP; Kind code of ref document: A1)