WO2022001630A1 - Method and system for capturing at least one smart media - Google Patents

Method and system for capturing at least one smart media

Info

Publication number
WO2022001630A1
WO2022001630A1 · PCT/CN2021/099698 · CN2021099698W
Authority
WO
WIPO (PCT)
Prior art keywords
face
camera
preview frame
smart
distance
Prior art date
Application number
PCT/CN2021/099698
Other languages
French (fr)
Inventor
Shubham KABRA
ABDUSSAMAD, Md
Kaushal Prakash SHARMA
Nitin SETIA
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2022001630A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body

Definitions

  • the present disclosure generally relates to the field of media analysis and more particularly to a system and method for capturing at least one smart media.
  • Media capturing, such as image and video capturing, is a common action in the age of social media. Specifically, images shot with a front camera of a device, popularly known as selfies, have recently gained more and more attention from users. Capturing a selfie of oneself or of a group has become a trending activity on social media.
  • an object of the present disclosure is to provide a novel method and system for capturing at least one smart media. It is another object of the present disclosure to adjust at least one media based on at least one face detected in the at least one media. Also, it is another object of the present disclosure to reduce the bulkiness of face/s which are nearer to one or more camera sensors and/or face/s which cover a larger area in a camera preview frame. It is a further object of the present disclosure to normalize at least one face that appears abnormally large in the at least one media, such that the normalized at least one face covers a decent space in the at least one media instead of a larger space. Yet another object of the present disclosure is to accurately and efficiently normalize an area for one or more faces in one or more media such that the originality of the media is maintained.
  • the present disclosure provides a method and system for capturing at least one smart media.
  • One aspect of the present disclosure relates to a method of capturing at least one smart media.
  • the method encompasses receiving, at a camera unit, a camera preview frame to capture at least one media. Thereafter the method comprises performing, via a processing unit, a media analysis of the camera preview frame to identify at least one face in the camera preview frame. Further the method encompasses calculating, via the processing unit, a distance between the at least one face and at least one camera sensor.
  • the method thereafter leads to calculating, via a smart face module, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters.
  • the method further comprises normalizing, via the smart face module, the at least one face based on the face normalization level, to capture at least one smart media.
  • the system comprises a camera unit, configured to receive, a camera preview frame to capture at least one media. Further the system comprises a processing unit, configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame, and calculate, a distance between the at least one face and at least one camera sensor. Thereafter, the system comprises a smart face module, configured to calculate, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters. The smart face module is further configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
  • the user equipment comprises a system, configured to receive, a camera preview frame to capture at least one media.
  • the system is thereafter configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame.
  • the system is configured to calculate, a distance between the at least one face and at least one camera sensor.
  • the system calculates, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters.
  • the system is further configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
  • FIG. 1 illustrates a block diagram of the system [100] , for capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 2 illustrates an exemplary architecture of a camera unit [102] , in accordance with exemplary embodiments of the present disclosure.
  • FIG. 3 illustrates an exemplary method flow diagram [300] , depicting a method of capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 4 illustrates an exemplary flow diagram [400] , depicting an instance implementation of the process of capturing at least one smart selfie, in accordance with exemplary embodiments of the present disclosure.
  • media capturing has now become a crucial feature for users of smart devices who constantly wish to capture moments on their devices. It is therefore important to analyse and capture such media in the best possible way, for which a number of solutions have been developed over a period of time.
  • One of the major problems or limitations of capturing a media with the front camera of a device is that the face/s nearer to the camera sensor/device appear much larger than the rest of the image. There are no efficient solutions to overcome this problem.
  • one of the currently known solutions encompasses enhancing portrait images that are processed in a batch mode.
  • a batch processing method for enhancing an appearance of a face located in a digital image is provided in this currently known solution, where the image is one of a large number of images that are being processed through a batch process.
  • This currently known solution provides a script file including an instruction for a location of each original digital image to acquire an original digital image containing one or more faces. Further this currently known solution detects a location of facial feature points in the one or more faces. Further using the location of the facial feature points, this solution segments the face into different regions to determine one or more facially relevant characteristics of the different regions and to further select one or more enhancement filters. Further this solution executes the enhancement filters on the particular regions, thereby producing an enhanced digital image from the original digital image.
  • this known prior art solution only takes care of enhancement of a digital image from the original digital image based on facial feature points and enhancement filters. This solution fails to check the distance of one or more faces from the camera sensor unit/s. Further, as the prior art solution only considers enhancement based on the facial feature points, it fails to reduce the bulkiness of one or more faces caused by the lesser distance between the one or more faces and the one or more camera sensors.
  • one other prior art solution comprises facial image enhancement based on an image data. Following receipt of the image data, this prior art solution isolates a face region data from the image data. Further, luminance data and chrominance data are generated corresponding to the facial area data to further generate a smoothed luminance data. Mapping data is then generated in accordance with at least one non-skin tone region of the image data, via this prior art solution. Further, an enhanced image data is generated according to the smoothed luminance data, chrominance data, and mapping data.
  • the prior art solution is limited to facial image enhancement based on image data and fails to consider the distance of one or more faces in the camera preview frame from the one or more camera sensors as a deciding factor to provide a better and more efficient normalization of the face/s which appear abnormal in the camera preview frame.
  • the present disclosure provides a method and system for capturing at least one smart media.
  • the present disclosure proposes a method to normalize at least one face in at least one media.
  • the person who is nearer to a camera lens/sensor generally covers a larger area of the media/image than the rest of the faces in the field of view.
  • the present disclosure comprises reducing the bulkiness of the at least one face in a camera preview frame which is very near to the camera sensor/s, based on one or more efficient normalization levels. After applying the normalization level/s, the at least one face which appeared abnormally large is normalized such that it covers a decent space instead of the larger area it covered before.
  • the present disclosure encompasses calculating the normalization level based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor and one or more face parameters. Therefore, the present disclosure encompasses capturing at least one smart media based on the at least one normalized face in the camera preview frame.
  • the “camera preview frame” comprises at least one real time preview of an event picked up by a camera unit.
  • the real-time preview of the event comprises a real time preview of at least one of, at least one scene to be captured, at least one face and/or object to be captured and at least one surrounding or environment associated with the at least one face/object.
  • camera preview frame may refer to the preview generated by a camera unit, which further can be seen on a display unit of a user equipment/device or a system when the user opens a camera application.
  • a “processing unit” or “processor” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions.
  • a processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc.
  • the processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor.
  • a user equipment may be any electrical, electronic and computing device or equipment, having at least one camera unit installed on it.
  • the user equipment may include, but is not limited to, a mobile phone, smart phone, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, wearable device or any other computing device which is capable of capturing and analyzing one or more media.
  • the user equipment contains at least one input means configured to receive an input from a user, a camera unit, a processing unit, a storage unit, a display unit, a smart face module and any other such unit which is obvious to the person skilled in the art and is capable of implementing the features of the present disclosure.
  • a “smart face module” may be an intelligent unit having an analysing, computing, detecting, comparing and normalizing capability, and/or the smart face module may be any other such similar unit configured to implement the features of the present disclosure and is obvious to a person skilled in the art.
  • media refers to one or more images, one or more videos, one or more animations, etc. and any other type of media that can be captured using a camera and is obvious to a person skilled in the art.
  • FIG. 1 an exemplary block diagram of the system [100] , for capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure is shown.
  • the system [100] comprises, at least one camera unit [102] , at least one processing unit [104] and at least one smart face module [106] . All of these components/units are assumed to be connected to each other unless otherwise indicated below. Also, in Fig. 1 only a few units are shown; however, the system [100] may comprise multiple such units, or the system [100] may comprise any number of such units, obvious to a person skilled in the art or as required to implement the features of the present disclosure.
  • the system [100] is configured to capture at least one smart media with the help of the interconnection between its components/units.
  • the at least one camera unit [102] of the system [100] is configured to receive, a camera preview frame to capture at least one media.
  • the camera preview frame may further comprise one or more human-objects/faces.
  • the camera preview frame comprises at least one real time preview of an event picked up by the camera unit [102] .
  • the camera preview frame in this scenario comprises a real time preview of two or more persons being captured.
  • the at least one processing unit [104] is connected to the at least one camera unit [102] .
  • the processing unit [104] is configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame.
  • the processing unit [104] is further configured to perform the media analysis to identify at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame.
  • the at least one face detail of the at least one face may include, but is not limited to, one or more salient features of the face/s like shapes, sizes and color etc.
  • the determined face coordinate/s and the determined face detail/s may further provide one or more details relating to the area covered by the at least one face in the camera preview frame.
  • a camera preview frame comprising 2 people is received at the camera unit [102] to capture a media.
  • the processing unit [104] thereafter analyses the received camera preview frame to identify faces of the 2 people present under the camera preview frame.
  • the processing unit [104] further based on the analysis determines at least one face coordinate and at least one face detail of the identified 2 faces in the camera preview frame.
  • the processing unit [104] is further configured to calculate, a distance between the at least one face and at least one camera sensor [204] . For instance, the distance between each of the faces present in the camera field of view (FOV) and at least one camera sensor [204] is calculated via the processing unit [104] .
  • the processing unit [104] is further configured to calculate the distance based on the determined/identified at least one face coordinate and at least one face detail of the at least one face present in the camera preview frame. Further, considering the above example, the processing unit [104] may consider at least one of, at least one facial plane and at least one facial shape of the 2 identified faces in the camera preview frame to calculate the distance of the 2 identified faces from the one or more camera sensors [204] .
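The disclosure does not fix a concrete distance formula, only that the distance follows from the face coordinates and face details. One plausible sketch is a pinhole-camera model, where a face that appears wider in the preview frame is closer to the sensor. The constants and the function name below are illustrative assumptions, not values from the disclosure.

```python
# Illustrative pinhole-camera sketch (not the disclosure's actual formula).
AVG_FACE_WIDTH_CM = 15.0   # assumed average real-world face width
FOCAL_LENGTH_PX = 800.0    # assumed sensor focal length, in pixels

def estimate_distance_cm(face_box):
    """Estimate face-to-sensor distance from an (x, y, w, h) bounding box
    produced by face detection on the camera preview frame."""
    _, _, width_px, _ = face_box
    if width_px <= 0:
        raise ValueError("face width must be positive")
    # Pinhole relation: real_width / distance = pixel_width / focal_length
    return AVG_FACE_WIDTH_CM * FOCAL_LENGTH_PX / width_px
```

Under these assumed constants, a face spanning 400 px would be estimated at 30 cm; halving the pixel width doubles the estimated distance.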
  • the at least one smart face module [106] is connected to the at least one processing unit [104] and the at least one camera unit [102] .
  • the smart face module [106] is configured to calculate, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance calculated between the at least one face and the at least one camera sensor [204] , and one or more face parameters.
  • the one or more face parameters further comprises at least one of a face coordinate and a face detail.
  • the face detail may include, but is not limited to, one or more salient features of the face/s like shapes, sizes, color and the like.
  • the one or more face parameters may be artificial intelligence based smart parameters, which further are compared to at least one of the at least one face coordinate and the at least one face detail of the at least one face identified in the camera preview frame, to calculate the face normalization level/s.
  • the smart face module [106] is further configured to calculate the face normalization level in an event the distance between the at least one face and the at least one camera sensor [204] is less than the distance threshold. If, in an instance, the distance between the at least one face and the at least one camera sensor [204] is not less than the distance threshold, no normalization level is calculated via the smart face module [106] .
  • the face normalization level calculated for at least one face may be inversely proportional to the calculated distance between the at least one face and the at least one camera sensor [204] .
  • the distance threshold is determined via the smart face module [106] based on the distance between the at least one face and at least one camera sensor [204] . For instance, if a single face is identified via the processing unit [104] in the camera preview frame, the smart face module [106] , based on a calculated distance of the identified single face from the one or more camera sensors [204] , is further configured to calculate a distance threshold for the identified single face. In an example, the calculated distance of the identified single face is further analyzed with respect to the entire camera preview frame, via the smart face module [106] , to calculate the distance threshold.
  • the smart face module [106] is further configured to determine the distance threshold based on a median of a plurality of distances, wherein the plurality of distances are calculated between a plurality of faces and the at least one camera sensor, in an event the plurality of faces are identified in the camera preview frame.
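The threshold determination above can be sketched as follows. The multi-face median branch follows the disclosure; the single-face fallback (scaling the lone face's distance by 1.2) is an assumed stand-in, since the disclosure only says the single-face threshold is derived from that face's distance relative to the entire preview frame.

```python
import statistics

def distance_threshold(distances):
    """Distance threshold for the identified faces.

    Several faces: the median of their sensor distances (per the disclosure).
    Single face: a scaled copy of that face's distance -- the 1.2 factor is
    an assumed placeholder.
    """
    if not distances:
        raise ValueError("no faces identified in the preview frame")
    if len(distances) == 1:
        return distances[0] * 1.2  # assumed single-face heuristic
    return statistics.median(distances)
```

For four faces at 30, 50, 40 and 90 cm the threshold is 45 cm, so only the two faces closer than 45 cm become candidates for normalization.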
  • the smart face module [106] is configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
  • the at least one face is normalized via the smart face module [106] based on corresponding calculated face normalization level/s.
  • the smart face module [106] identifies a distance threshold based on a median of the corresponding distances of the 4 identified faces. Further, the smart face module [106] compares the distance threshold with each of the calculated distances of the 4 identified faces from the camera sensor/s. Say the comparison indicates that the calculated distances of F1 and F3 are less than the distance threshold and the calculated distances of F2 and F4 are more than the distance threshold.
  • the smart face module [106] in said scenario thereafter calculates a normalization level for each of the F1 and F3 based at least on the calculated distances of the F1 and F3 from the camera sensor/s and on one or more face parameters, to normalize both F1 and F3.
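The level calculation in this scenario can be sketched as below. The disclosure only states that a level is computed for faces closer than the threshold and that it is inversely proportional to the face's distance; the specific `threshold / d - 1` formula is a hypothetical placeholder satisfying those two properties.

```python
def normalization_levels(distances, threshold):
    """Per-face normalization level, or None for faces needing no change.

    A level is computed only when the face is closer than the threshold,
    and it grows as the distance shrinks; the exact formula is an assumed
    placeholder for the disclosure's inverse proportionality.
    """
    levels = []
    for d in distances:
        if d < threshold:
            levels.append(threshold / d - 1.0)  # closer face -> larger level
        else:
            levels.append(None)  # at or beyond the threshold: untouched
    return levels
```

With F1 to F4 at 20, 60, 30 and 80 cm and a threshold of 45 cm, only F1 and F3 receive levels (1.25 and 0.5), mirroring the scenario above.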
  • processing unit [104] is further configured to process a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
  • the processing unit [104] is configured to process the first set of pixels surrounding the at least one face to adjust at least one change occurred in the camera preview frame due to the normalization of at least one face, such that the initial camera preview frame is preserved.
  • the processing unit [104] in order to process the first set of pixels surrounding the at least one face, adjusts the camera preview frame (to an initial camera preview frame similar to the frame prior to normalization) based on the at least one normalization level associated with the at least one normalized face.
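As a toy illustration of this step, the sketch below shrinks a face region of a grayscale frame according to its normalization level and back-fills the vacated pixels from the surrounding background. The nearest-neighbour resampling, the `1 / (1 + level)` scale, and the constant fill value are all assumptions; the disclosure only requires that the surrounding pixels be processed so the initial preview frame is preserved.

```python
def normalize_face_region(frame, box, level):
    """Shrink the face at box = (x, y, w, h) inside a grayscale frame
    (a list of rows) by a factor derived from its normalization level,
    back-filling the vacated pixels from the surrounding background."""
    x, y, w, h = box
    scale = 1.0 / (1.0 + level)              # assumed level-to-scale mapping
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    # Nearest-neighbour downsample of the face crop (taken before clearing).
    crop = [[frame[y + (j * h) // new_h][x + (i * w) // new_w]
             for i in range(new_w)] for j in range(new_h)]
    # Crude stand-in for processing the first set of surrounding pixels:
    # fill the whole original region with a nearby background value.
    fill = frame[y - 1][x] if y > 0 else (frame[y][x - 1] if x > 0 else 0)
    for j in range(h):
        for i in range(w):
            frame[y + j][x + i] = fill
    # Paste the shrunken face back, centred where the original face sat.
    oy, ox = y + (h - new_h) // 2, x + (w - new_w) // 2
    for j in range(new_h):
        for i in range(new_w):
            frame[oy + j][ox + i] = crop[j][i]
    return frame
```

A production implementation would instead warp and inpaint smoothly; the point here is only the order of operations: crop, shrink, restore background, paste.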
  • FIG. 2 refers to an exemplary architecture of camera unit [102] , in accordance with the exemplary embodiments of the present disclosure.
  • the camera unit [102] comprises, at least one camera preview frame unit [202] , at least one camera sensor [204] , at least one camera HAL [206] and at least one camera framework [208] .
  • the camera unit [102] may comprise various other subunits, such as a camera driver etc., but the same are not shown in Figure 2 for the purpose of clarity.
  • the camera unit [102] may comprise multiple such units, or the camera unit [102] may comprise any number of such units, obvious to a person skilled in the art, required to implement the features of the present disclosure.
  • the camera preview frame unit [202] is configured to provide a graphical user interface presenting a preview of at least one camera preview frame to a user.
  • the present disclosure encompasses that the camera preview frame unit [202] is configured to display a camera preview frame on a display unit.
  • the display unit (not shown in Figure) may be a display unit integrated within the system [100] or may be any external display unit connected to the system [100] .
  • the camera preview frame comprises at least one real time preview of an event picked up by the camera unit [102] .
  • the real-time preview of the event comprises a real time preview of at least one of, at least one scene to be captured, at least one human-object/object/face to be captured and the surrounding or environment associated with the at least one human-object/human-face.
  • camera preview frame may refer to the preview generated by the camera unit [102] at the camera preview frame unit [202] , which is further displayed on the display unit.
  • the camera sensor [204] may include various types of camera sensors, including but not limited to, a main camera sensor configured to receive at least one real time event to further generate the camera preview frame, a depth sensor configured to provide various effects, a telephoto sensor configured to provide at least one zooming effect in the camera preview frame, and such other similar sensors which are obvious to a person skilled in the art to implement the features of the present disclosure.
  • the camera HAL [206] is configured to process real time data based on a triggering command received from at least one of the one or more subunits of the camera unit [102] and the system [100] .
  • the camera framework [208] is configured to provide a module enabling at least one of the one or more subunits of the camera unit [102] to interact with the system [100] .
  • the camera framework [208] is also configured to store files for input data, processing and the guiding mechanism.
  • an exemplary method flow diagram [300] depicting a method of capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure, is shown.
  • the method begins at step [302] .
  • the method begins when a user accesses a camera unit [102] to capture a media.
  • the method comprises receiving, at a camera unit [102] , a camera preview frame to capture at least one media.
  • the camera preview frame may further comprise one or more human-objects/faces.
  • the camera preview frame comprises at least one real time preview of an event picked up by one or more camera sensors [204] of the camera unit [102] .
  • the camera preview frame in this scenario comprises a real time preview of the user.
  • the method comprises performing, via a processing unit [104] , a media analysis of the camera preview frame to identify at least one face in the camera preview frame.
  • the method further encompasses performing via the processing unit [104] , the media analysis to identify at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame.
  • the at least one face detail of the at least one face may include, but is not limited to, one or more salient features of the face/s like shapes, sizes and colour etc.
  • the at least one face coordinate of the at least one face may further indicate at least one facial plane associated with the at least one face.
  • the method may use known facial recognition techniques for identifying the faces.
  • the identified face coordinate/s and the identified face detail/s of the at least one face identified in the camera preview frame may further provide one or more details relating to the area covered by the at least one face in the camera preview frame.
  • a camera preview frame comprising 5 people is received at the camera unit [102] to capture a media.
  • the method thereafter encompasses analyzing via the processing unit [104] the received camera preview frame to identify faces of the 5 people present under the camera preview frame.
  • the method further comprises determining via the processing unit [104] , at least one face coordinate and at least one face detail of each of the identified 5 faces in the camera preview frame, based on the analysis of camera preview frame.
  • the method comprises calculating, via the processing unit [104] , a distance between the at least one face and at least one camera sensor [204] .
  • the method encompasses calculating via the processing unit [104] , the distance of each of the faces present in the camera field of view (FOV) /camera preview frame from the at least one camera sensor [204] .
  • the method further encompasses calculating via the processing unit [104] , the distance based at least on the determined/identified, at least one face coordinate and at least one face detail of the at least one face present under the camera preview frame.
  • the method encompasses calculating via the processing unit [104] the distance of each of the 5 identified faces from the one or more camera sensors [204] , based on at least one of, at least one facial plane and at least one facial shape of each of the 5 identified faces in the camera preview frame.
  • the method comprises calculating, via a smart face module [106] , a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor [204] , and one or more face parameters.
  • the one or more face parameters further comprises at least one of a face coordinate and a face detail.
  • the face detail may include, but is not limited to, one or more salient features of the face/s like shapes, sizes, color and the like.
  • the one or more face parameters may be artificial intelligence based smart parameters, which further are compared to at least one of the at least one face coordinate and the at least one face detail of the at least one face identified in the camera preview frame, to calculate the face normalization level/s.
  • the method further encompasses calculating, via the smart face module [106] , the face normalization level in an event the distance between the at least one face and the at least one camera sensor is less than the distance threshold. If, in an instance, the distance between the at least one face and the at least one camera sensor [204] is not less than the distance threshold, the method does not encompass calculation of the normalization level for the at least one face.
  • the face normalization level for at least one face associated with a distance less than the distance threshold may be inversely proportional to the calculated distance between the at least one face and the at least one camera sensor [204] .
  • the method encompasses determining the distance threshold via the smart face module [106] , based on the distance between the at least one face and at least one camera sensor [204] . For instance, if a single face is identified via the processing unit [104] in the camera preview frame, the method leads to calculating, via the smart face module [106] , a distance threshold for the identified single face, based on a calculated distance of the identified single face from the one or more camera sensors [204] . In an example, the method further comprises analyzing, via the smart face module [106] , the calculated distance of the identified single face with respect to the entire camera preview frame, to calculate the distance threshold.
  • the method encompasses determining via the smart face module [106] , the distance threshold based on a median of a plurality of distances, wherein the plurality of distances are calculated between a plurality of faces and the at least one camera sensor [204] , in an event the plurality of faces are identified in the camera preview frame.
  • in such an event the distance threshold (K) may be as follows: K = median (d1, d2, ..., dn) , where d1 to dn are the distances calculated between the n identified faces and the at least one camera sensor [204] .
  • the method comprises normalizing, via the smart face module [106] , the at least one face based on the face normalization level, to capture at least one smart media.
  • the method comprises normalising the at least one face via the smart face module [106] based on corresponding calculated face normalization level/s.
  • the method identifies, via the processing unit [104] , 3 faces (i.e. T1, T2 and T3) in the camera preview frame, and the method also encompasses calculating, via the processing unit [104] , the corresponding distances of the 3 identified faces from the one or more camera sensors [204] .
  • the method further leads to identifying via the smart face module [106] , a distance threshold based on a median of the corresponding distances of the 3 identified faces. Further the method encompasses comparing via the smart face module [106] , the distance threshold with each of the calculated distances of the 3 identified faces from the camera sensor/s [204] .
  • the comparison further indicates that the calculated distance of T2 is less than the distance threshold and the calculated distances of T1 and T3 are more than the distance threshold. Therefore, in said scenario, the method further leads to calculating via the smart face module [106] , a normalization level for T2, based at least on the calculated distance of T2 from the camera sensor/s [204] and on one or more face parameters, to normalize T2.
  • the method comprises processing via the processing unit [104] , a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
  • the method encompasses processing via the processing unit [104] , the first set of pixels surrounding the at least one face to adjust at least one change occurred in the camera preview frame due to the normalization of the at least one face, and to further preserve the initial camera preview frame similar to that prior to normalization.
  • the method in order to process the first set of pixels surrounding the at least one face, adjusts the camera preview frame (to an initial camera preview frame similar to prior normalization) via the processing unit [104] , based on the at least one normalization level associated with the at least one normalized face.
  • After successfully capturing at least one smart media, the method further terminates at step [314] .
  • the user equipment comprises a system [100] , configured to receive, a camera preview frame to capture at least one media.
  • the system is thereafter configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame.
  • the system is configured to calculate, a distance between the at least one face and at least one camera sensor [204] .
  • the system is further configured to calculate, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor [204] , and one or more face parameters.
  • the system is configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
  • Referring to FIG. 4, an exemplary flow diagram [400] , depicting an instance implementation of the process of capturing at least one smart selfie, in accordance with exemplary embodiments of the present disclosure, is shown. As indicated in Figure 4, the process starts at step [402] .
  • the method encompasses accessing a camera unit [102] via a user, by opening a front camera of a user device, to capture one or more media.
  • the camera unit [102] is configured to receive, a camera preview frame comprising one or more human-objects/faces that are being captured. For instance, when a request to open the front camera is received at a camera HAL [206] , the camera HAL [206] opens the camera sensor [204] and starts providing the camera preview frame.
  • the camera preview frame is further shown as a preview on a display.
  • the method comprises receiving a selfie capturing request at the camera unit [102] .
  • this request is further passed on to the camera HAL [206] for further processing.
  • the method comprises identifying via the processing unit [104] , at least one face in the camera preview frame/field of view (FOV) . Also, on the received camera preview frame, all the face co-ordinates along with the salient features like shapes, sizes, colour of faces are identified via the processing unit [104] , based on a media analysis of the received camera preview frame.
  • the method comprises identifying via the processing unit [104] , a face count of the identified faces. If the face count of the identified faces is more than 0 (zero) the process further leads to step [412] , otherwise the process further leads to step [418] .
  • step [412] the method comprises calculating, via the processing unit [104] , a distance of each of the at least one face identified under the camera preview frame from the at least one camera sensor [204] .
  • the method encompasses determining if the distance of the at least one face identified under the camera preview frame from the at least one camera sensor [204] is less than a distance threshold (i.e. K) or not. If the calculated distance of one or more identified faces in the camera preview frame, is less than the distance threshold, the method further leads to step [416] , otherwise the method leads to step [418] .
  • the method comprises normalizing, via the smart face module [106] , the at least one face associated with a calculated distance less than the distance threshold, based on the face normalization level, to capture at least one smart media.
  • the face normalization level is calculated via the smart face module [106] and is based on a comparison of the distance threshold with the distance between the at least one face and at least one camera sensor [204] , and one or more face parameters.
  • the method also comprises processing a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
  • the method comprises capturing the at least one smart media based on the normalized at least one face. Also, the method at step [418] encompasses saving the captured at least one smart media at a storage unit.
  • After successfully capturing at least one smart media, the method further terminates at step [420] .
  • the present solution provides significant technical advancement over the existing solutions by capturing at least one smart media based on at least one face normalization.
  • the present disclosure ensures that when a media is captured, the faces/objects therein are not too large as compared to other faces/objects due to the varying distance from the camera sensor.
  • the disclosure automatically and dynamically normalises the large faces/objects in the preview frame and adjusts the media accordingly.
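The flow summarised above (steps [402] to [420] of FIG. 4) can be sketched in outline as follows. This is a minimal illustrative sketch, not the disclosed implementation: the `normalize_face` placeholder and the `capture_smart_selfie` name are assumptions, the per-face distances are assumed to be pre-computed, and the normalization level follows the disclosure's statement that it may be inversely proportional to the face's distance from the camera sensor.

```python
from statistics import median

def capture_smart_selfie(face_distances, frame):
    """Sketch of the FIG. 4 flow: faces closer to the sensor than the
    distance threshold K are normalized before the frame is captured."""
    if not face_distances:                # step [410]: face count is 0
        return frame                      # step [418]: capture/save as-is
    k = median(face_distances)            # distance threshold K (median)
    for face_id, distance in enumerate(face_distances):
        if distance < k:                  # step [414]: face too close
            # step [416]: level assumed inversely proportional to the
            # face's distance from the camera sensor
            level = k / distance
            frame = normalize_face(frame, face_id, level)
    return frame                          # step [418]: capture and save

def normalize_face(frame, face_id, level):
    """Placeholder for the smart face module's normalization step."""
    return frame
```

The placeholder leaves the frame untouched; in the disclosure the smart face module [106] would reduce the area covered by each too-close face before the media is saved.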

Abstract

A method (300) and a system (100) for capturing at least one smart media. The method (300) encompasses receiving, a camera preview frame to capture at least one media (304). Thereafter the method (300) comprises performing, a media analysis of the camera preview frame to identify at least one face in the camera preview frame (306). Further the method (300) encompasses calculating, a distance between the at least one face and at least one camera sensor (204) (308). The method (300) thereafter leads to calculating, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor (204), and one or more face parameters (310). The method (300) further comprises normalizing, the at least one face based on the face normalization level, to capture at least one smart media (312).

Description

METHOD AND SYSTEM FOR CAPTURING AT LEAST ONE SMART MEDIA
FIELD OF THE DISCLOSURE
The present disclosure generally relates to the field of media analysis and more particularly to a system and method for capturing at least one smart media.
BACKGROUND
This section is intended to provide information relating to field of the disclosure and thus any approach or functionality described below should not be assumed to be qualified as prior art merely by its inclusion in this section.
Media capturing, such as image and video capturing, is a common action or phenomenon in the age of social media. Specifically, images shot with a front camera of a device, popularly known as selfies, have recently gained more and more attraction from users. Capturing a selfie of one’s own or in a group has become a trending action on social media.
To help users to capture better media, various solutions have been developed, for example, features such as scene detection, multiple capturing modes, auto focus, auto zoom etc. have been adapted in camera devices to enhance the quality of media being captured. Although these existing technologies have provided various solutions to capture and analyse media, but these currently known solutions also have some limitations where there is a need for improvement.
SUMMARY
This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
In order to overcome at least a few problems associated with the known solutions, an object of the present disclosure is to provide a novel method and system for capturing at least one smart media. It is another object of the present disclosure to adjust at least one media based on at least one face detected in the at least one media. Also, it is another object of the present disclosure to reduce the bulkiness of face/s which are nearer to one or more camera sensors and/or the face/s which are covering a larger area in a camera preview frame. It is a further object of the present disclosure to normalise at least one face that appears to be abnormally large in the at least one media, such that the normalised at least one face covers a decent space in the at least one media instead of a larger space. Yet another object of the present disclosure is to accurately and efficiently normalize an area for one or more faces in one or more media such that the originality of the media is maintained.
In order to achieve the afore-mentioned objectives, the present disclosure provides a method and system for capturing at least one smart media. One aspect of the present disclosure relates to a method of capturing at least one smart media. The method encompasses receiving, at a camera unit, a camera preview frame to capture at least one media. Thereafter the method comprises performing, via a processing unit, a media analysis of the camera preview frame to identify at least one face in the camera preview frame. Further the method encompasses calculating, via the processing unit, a distance between the at least one face and at least one camera sensor. The method thereafter leads to calculating, via a smart face module, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters. The method further comprises normalizing, via the smart face module, the at least one face based on the face normalization level, to capture at least one smart media.
Further, another aspect of the present disclosure relates to a system for capturing at least one smart media. The system comprises a camera unit, configured to receive, a camera preview frame to capture at least one media. Further the system comprises a processing unit, configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame, and calculate, a distance between the at least one face and at least one camera sensor. Thereafter, the system comprises a smart face module, configured to calculate, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters. The smart face module is further configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
Yet, another aspect of the present disclosure relates to a user equipment for capturing at least  one smart media. The user equipment comprises a system, configured to receive, a camera preview frame to capture at least one media. The system thereafter configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame. Further the system is configured to calculate, a distance between the at least one face and at least one camera sensor. Thereafter the system calculates, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor, and one or more face parameters. The system further configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
FIG. 1 illustrates a block diagram of the system [100] , for capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure.
FIG. 2 illustrates an exemplary architecture of a camera unit [102] , in accordance with exemplary embodiments of the present disclosure.
FIG. 3 illustrates an exemplary method flow diagram [300] , depicting method of capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure.
FIG. 4 illustrates an exemplary flow diagram [400] , depicting an instance implementation of the process of capturing at least one smart selfie, in accordance with exemplary embodiments of the present disclosure.
The foregoing shall be more apparent from the following more detailed description of the  present disclosure.
DETAILED DESCRIPTION
In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above.
As discussed in the background section, media capturing has now become a crucial feature for users of smart devices who constantly wish to capture moments on their devices. It is therefore important to analyse and capture such media in the best possible way, for which a number of solutions have been developed over a period of time. One of the major problems or limitations of capturing a media with the front camera of a device is that the face/s nearer to the camera sensor/device appear to be much larger than the rest of the image. There are no efficient solutions to overcome this problem.
Most of the prior art solutions provide various techniques focused on enhancing a face size and providing cosmetic beauty effects based on the face attributes. Also, some of the currently known solutions require extracting a face region from an image to apply the face area corrections in the image. The presently known solutions fail to consider the distance of one or more faces in the camera preview frame from the one or more camera sensors as a deciding factor to provide a better and more efficient normalization of the face/s which appear to be abnormal in the camera preview frame.
For instance, one of the currently known solutions encompasses enhancing portrait images that are processed in a batch mode. A batch processing method for enhancing an appearance of a face located in a digital image is provided in this currently known solution, where the image is one of a large number of images that are being processed through a batch process. This currently known solution provides a script file including an instruction for a location of each original digital image to acquire an original digital image containing one or more faces. Further, this currently known solution detects a location of facial feature points in the one or more faces. Further, using the location of the facial feature points, this solution segments the face into different regions to determine one or more facially relevant characteristics of the different regions and to further select one or more enhancement filters. Further, this solution executes the enhancement filters on the particular regions, thereby producing an enhanced digital image from the original digital image.
Further, this known prior art solution only takes care of enhancement of a digital image from the original digital image based on facial feature points and enhancement filters. This solution fails to check the distance of one or more faces from the camera sensor unit/s. Further, as this prior art solution only considers enhancement based on the facial feature points, it fails to reduce the bulkiness of one or more faces caused due to the lesser distance between the one or more faces and the one or more camera sensors.
Also, one other prior art solution comprises facial image enhancement based on image data. Following receipt of the image data, this prior art solution isolates face region data from the image data. Further, luminance data and chrominance data are generated corresponding to the facial area data to further generate smoothed luminance data. Mapping data is then generated in accordance with at least one non-skin tone region of the image data, via this prior art solution. Further, enhanced image data is generated according to the smoothed luminance data, chrominance data, and mapping data. This prior art solution is limited to facial image enhancement based on image data and fails to consider the distance of one or more faces in the camera preview frame from the one or more camera sensors as a deciding factor to provide a better and more efficient normalization of the face/s which appear to be abnormal in the camera preview frame.
Therefore, in view of these and other existing limitations, there is an imperative need to provide a solution to overcome the limitations of prior existing solutions and to provide a more efficient method and system for capturing at least one smart media.
The present disclosure provides a method and system for capturing at least one smart media. The present disclosure proposes a method to normalize at least one face in at least one media. The person who is nearer to a camera lens/sensor generally covers larger area of media/image than the rest of the faces in field of view. The present disclosure comprises reducing bulkiness of the  at least one face in a camera preview frame which is very near to camera sensor/s, based on one or more efficient normalization levels. After applying the normalization level/s, the at least one face which appeared to be abnormally larger is then normalized such that it covers a decent space instead of large area as covered before.
The present disclosure encompasses calculating the normalization level based on a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor and one or more face parameters. Therefore, the present disclosure encompasses capturing at least one smart media based on the at least one normalized face in the camera preview frame.
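One purely illustrative reading of this calculation is sketched below: a normalization level is produced only for faces closer than the distance threshold, and grows both with closeness to the sensor and with the share of the preview frame the face covers. The function name and the linear weighting are assumptions for illustration, not the claimed formula.

```python
def face_normalization_level(distance, threshold, face_area, frame_area):
    """Return a normalization level for one face, or None when the face
    is at or beyond the distance threshold (no normalization needed).
    The level grows as the face gets closer to the camera sensor and as
    it covers more of the preview frame (illustrative weighting)."""
    if distance >= threshold:
        return None                                   # not abnormally close
    closeness = (threshold - distance) / threshold    # in (0, 1]
    coverage = face_area / frame_area                 # face parameter
    return closeness * (1.0 + coverage)
```

For example, a face at half the threshold distance that covers 10% of the frame would get a level of 0.5 × 1.1 = 0.55, while a face beyond the threshold gets none.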
As used herein, the “camera preview frame” comprises at least one real time preview of an event picked up by a camera unit. Further the real-time preview of the event comprises a real time preview of at least one of, at least one scene to be captured, at least one face and/or object to be captured and at least one surrounding or environment associated with the at least one face/object. For instance, camera preview frame may refer to the preview generated by a camera unit, which further can be seen on a display unit of a user equipment/device or a system when the user opens a camera application.
As used herein, a “processing unit” or “processor” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions. A processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc. The processor may perform signal coding data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor.
As used herein, “a user equipment” , “a user device” , “a smart camera device” , “a smart user device” , “an electronic device” may be any electrical, electronic and computing device or equipment, having at least one camera unit installed on it. The user equipment may include, but is not limited to, a mobile phone, smart phone, laptop, a general-purpose computer, desktop,  personal digital assistant, tablet computer, wearable device or any other computing device which is capable of capturing and analyzing one or more media. The user equipment contains at least one input means configured to receive an input from a user, a camera unit, a processing unit, a storage unit, a display unit, a smart face module and any other such unit which is obvious to the person skilled in the art and is capable of implementing the features of the present disclosure.
As used herein, a “smart face module” may be an intelligent unit having an analysing, computing, detecting, comparing and normalizing capability, and/or the smart face module may be any other such similar unit configured to implement the features of the present disclosure and is obvious to a person skilled in the art.
As used herein, “media” refers to one or more images, one or more videos, one or more animations, etc. and any other type of media that can be captured using a camera and is obvious to a person skilled in the art.
The present disclosure is further explained in detail below with reference now to the diagrams.
Referring to FIG. 1, an exemplary block diagram of the system [100] , for capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure is shown.
The system [100] comprises, at least one camera unit [102] , at least one processing unit [104] and at least one smart face module [106] . All of these components/units are assumed to be connected to each other unless otherwise indicated below. Also, in Fig. 1 only a few units are shown; however, the system [100] may comprise multiple such units, or the system [100] may comprise any such number of units, obvious to a person skilled in the art or as required to implement the features of the present disclosure.
The system [100] is configured to capture at least one smart media with the help of the interconnection between its components/units.
The at least one camera unit [102] of the system [100] is configured to receive, a camera preview frame to capture at least one media. The camera preview frame may further comprise one or more human-objects/faces. For instance, the camera preview frame comprises at least one real time preview of an event picked up by the camera unit [102] . Further considering an example, if a user accesses a camera unit [102] to capture a group selfie, the camera preview frame in this  scenario comprises a real time preview of two or more persons being captured.
The at least one processing unit [104] is connected to the at least one camera unit [102] . The processing unit [104] is configured to perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame. The processing unit [104] is further configured to perform the media analysis to identify at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame. The at least one face detail of the at least one face may include but not limited to one or more salient features of the face/s like shapes, sizes and color etc.
Also, the determined face coordinate/s and the determined face detail/s may further provide one or more details relating to the area covered by the at least one face in the camera preview frame.
For example, a camera preview frame comprising 2 people is received at the camera unit [102] to capture a media. The processing unit [104] thereafter analyses the received camera preview frame to identify faces of the 2 people present under the camera preview frame. The processing unit [104] further based on the analysis determines at least one face coordinate and at least one face detail of the identified 2 faces in the camera preview frame.
The processing unit [104] is further configured to calculate, a distance between the at least one face and at least one camera sensor [204] . For instance, the distance between each of the faces present under the camera field of view (FOV) and at least one camera sensor [204] is calculated via the processing unit [104] . The processing unit [104] is further configured to calculate the distance based on the determined/identified at least one face coordinate and at least one face detail of the at least one face present under the camera preview frame. Further, considering the above example, the processing unit [104] may consider at least one of, at least one facial plane and at least one facial shape of the 2 identified faces under the camera preview frame to calculate the distance of the 2 identified faces from the one or more camera sensors [204] .
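The disclosure does not fix a particular distance formula. One common approximation, given here only as an assumption and not as the claimed method, infers distance from the size of the detected face bounding box using a pinhole-camera model: distance ≈ focal length (in pixels) × real face width ÷ face width in pixels.

```python
def estimate_face_distance(face_width_px, focal_length_px,
                           real_face_width_cm=14.0):
    """Estimate the face-to-sensor distance (cm) from the width of the
    detected face bounding box, using a pinhole-camera approximation.
    The average real face width (~14 cm) is an assumed constant."""
    return focal_length_px * real_face_width_cm / face_width_px
```

Under these assumptions, a face 140 px wide seen through a 500 px focal length lens would be estimated at about 50 cm from the sensor.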
The at least one smart face module [106] , is connected to the at least one processing unit [104] and the at least one camera unit [102] . The smart face module [106] is configured to calculate, a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance calculated between the at least one face and the at least  one camera sensor [204] , and one or more face parameters. The one or more face parameters further comprises at least one of a face coordinate and a face detail. The face detail may include but not limited to one or more salient features of the face/s like shapes, sizes, color and the like. Also in an instance the one or more face parameters may be artificial intelligence based smart parameters, which further are compared to at least one of the at least one face coordinate and the at least one face detail of the at least one face identified in the camera preview frame, to calculate the face normalization level/s.
Also, the smart face module [106] is further configured to calculate the face normalization level in an event the distance between the at least one face and the at least one camera sensor [204] is less than the distance threshold. If in an instance, the distance between the at least one face and the at least one camera sensor [204] is not less than the distance threshold, no normalization level is calculated via the smart face module [106] in such instance.
Furthermore, in an instance the face normalization level calculated for at least one face may be inversely proportional to the calculated distance between the at least one face and the at least one camera sensor [204] .
Further, the distance threshold is determined via the smart face module [106] based on the distance between the at least one face and at least one camera sensor [204] . For instance, if a single face is identified via the processing unit [104] under the camera preview frame, the smart face module [106] , further based on a calculated distance of the identified single face from the one or more camera sensors [204] , is configured to calculate a distance threshold for the identified single face. In an example, the calculated distance of the identified single face is further analyzed with respect to the entire preview of the camera preview frame, via the smart face module [106] , to calculate the distance threshold.
Also, the smart face module [106] is further configured to determine the distance threshold based on a median of a plurality of distances, wherein the plurality of distances are calculated between a plurality of faces and the at least one camera sensor, in an event the plurality of faces are identified in the camera preview frame.
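For the multi-face case, this median-based determination can be sketched directly with Python's `statistics.median` (an illustrative sketch; the function name is an assumption):

```python
from statistics import median

def distance_threshold(distances):
    """Distance threshold used by the smart face module: the median of
    the calculated face-to-sensor distances when a plurality of faces
    is identified.  (For a single face the threshold would instead be
    derived from that face's distance relative to the entire preview;
    that variant is not sketched here.)"""
    if not distances:
        raise ValueError("no faces identified in the preview frame")
    return median(distances)
```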
Further the smart face module [106] is configured to normalize, the at least one face based on the face normalization level, to capture at least one smart media. The at least one face is  normalized via the smart face module [106] based on corresponding calculated face normalization level/s.
For example, consider that 4 faces (i.e. F1, F2, F3 and F4) are identified under the camera preview frame via the processing unit [104] , and the corresponding distances of the 4 identified faces from the one or more camera sensors [204] are calculated via the processing unit [104] . The smart face module [106] thereafter identifies a distance threshold based on a median of the corresponding distances of the 4 identified faces. Further, the smart face module [106] compares the distance threshold with each of the calculated distances of the 4 identified faces from the camera sensor/s. Say the comparison indicates that the calculated distances of F1 and F3 are less than the distance threshold and the calculated distances of F2 and F4 are more than the distance threshold. The smart face module [106] in said scenario thereafter calculates a normalization level for each of F1 and F3, based at least on the calculated distances of F1 and F3 from the camera sensor/s and on one or more face parameters, to normalize both F1 and F3.
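The F1–F4 example can be traced numerically; the distances below are invented purely for illustration:

```python
from statistics import median

# Illustrative distances (cm) of the four identified faces
distances = {"F1": 30, "F2": 60, "F3": 35, "F4": 70}

k = median(distances.values())    # threshold K = median of the distances
too_close = [face for face, d in distances.items() if d < k]
# Only the faces below K (here F1 and F3) receive a normalization level;
# F2 and F4 are left untouched, matching the example above.
```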
Thereafter, the processing unit [104] is further configured to process a first set of pixels surrounding the at least one face, the processing being based on the face normalization level. The processing unit [104] is configured to process the first set of pixels surrounding the at least one face to adjust at least one change occurred in the camera preview frame due to the normalization of the at least one face, such that the initial camera preview frame is preserved. For instance, the processing unit [104] , in order to process the first set of pixels surrounding the at least one face, adjusts the camera preview frame (to an initial camera preview frame similar to the frame prior to normalization) based on the at least one normalization level associated with the at least one normalized face.
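As a one-dimensional illustration only (the disclosure does not specify a resampling strategy, so the nearest-neighbour approach and function name below are assumptions), shrinking a face span by the normalization level and stretching the surrounding pixels so the frame keeps its original extent might look like:

```python
def adjust_row(row, face_start, face_end, level):
    """Shrink the face span of one pixel row by the normalization level
    and stretch the surrounding pixels to refill the frame, so the row
    keeps its original length (nearest-neighbour resample; a real
    implementation would warp and blend in two dimensions)."""
    face = row[face_start:face_end]
    new_len = max(1, round(len(face) / level))            # shrunk face
    shrunk = [face[int(i * len(face) / new_len)] for i in range(new_len)]
    rest = row[:face_start] + shrunk + row[face_end:]
    n = len(row)
    # stretch the adjusted row back to the original frame width
    return [rest[int(i * len(rest) / n)] for i in range(n)]
```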
FIG. 2 refers to an exemplary architecture of camera unit [102] , in accordance with the exemplary embodiments of the present disclosure.
The camera unit [102] comprises, at least one camera preview frame unit [202] , at least one camera sensor [204] , at least one camera HAL [206] and at least one camera framework [208] . Also, the camera unit [102] may comprise various other subunits, such as a camera driver etc., but the same are not shown in Figure 2 for the purpose of clarity. Also, in Figure 2 only a few units/sub-units of the camera unit [102] are shown; however, the camera unit [102] may comprise multiple such units or any such number of units, obvious to a person skilled in the art, required to implement the features of the present disclosure.
The camera preview frame unit [202] is configured to provide a graphical user interface to a user to provide a preview of at least one camera preview frame. The present disclosure encompasses that the camera preview frame unit [202] is configured to display a camera preview frame on a display unit. The display unit (not shown in the Figure) may be a display unit integrated within the system [100] or may be any external display unit connected to the system [100] . Also, the camera preview frame comprises at least one real time preview of an event picked up by the camera unit [102] . Further, the real-time preview of the event comprises a real time preview of at least one of, at least one scene to be captured, at least one human-object/object/face to be captured and the surrounding or environment associated with the at least one human-object/human-face. For instance, the camera preview frame may refer to the preview generated by the camera unit [102] at the camera preview frame unit [202] that is further displayed on the display unit.
The camera sensor [204] may include various types of camera sensors, including but not limited to: a main camera sensor configured to receive at least one real-time event to further generate the camera preview frame, a depth sensor configured to provide various effects, a telephoto sensor configured to provide at least one zooming effect in the camera preview frame, and other similar sensors obvious to a person skilled in the art to implement the features of the present disclosure.
The camera HAL [206] is configured to process real-time data based on a triggering command received from at least one of the one or more subunits of the camera unit [102] and the system [100].
The camera framework [208] is configured to provide a module through which at least one of the one or more subunits of the camera unit [102] interacts with the system [100]. The camera framework [208] is also configured to store files for the input data, the processing and the guiding mechanism.
Referring to Fig. 3, an exemplary method flow diagram [300], depicting a method of capturing at least one smart media, in accordance with exemplary embodiments of the present disclosure, is shown. As shown in Fig. 3, the method begins at step [302]. For instance, the method begins when a user accesses a camera unit [102] to capture a media.
At step [304], the method comprises receiving, at a camera unit [102], a camera preview frame to capture at least one media. The camera preview frame may further comprise one or more human-objects/faces. For instance, the camera preview frame comprises at least one real-time preview of an event picked up by one or more camera sensors [204] of the camera unit [102]. In an example, if a user accesses the camera unit [102] to capture a selfie, the camera preview frame in this scenario comprises a real-time preview of the user.
Next, at step [306], the method comprises performing, via a processing unit [104], a media analysis of the camera preview frame to identify at least one face in the camera preview frame. The method further encompasses performing, via the processing unit [104], the media analysis to identify at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame. The at least one face detail may include, but is not limited to, one or more salient features of the face/s, such as shape, size and colour. Also, the at least one face coordinate of the at least one face may further indicate at least one facial plane associated with the at least one face. The method may use known facial recognition techniques for identifying the faces.
Further, the identified face coordinate/s and the identified face detail/s of the at least one face identified under the camera preview frame may further provide one or more details relating to the area covered by the at least one face in the camera preview frame.
For example, a camera preview frame comprising 5 people is received at the camera unit [102] to capture a media. The method thereafter encompasses analyzing via the processing unit [104] the received camera preview frame to identify faces of the 5 people present under the camera preview frame. The method further comprises determining via the processing unit [104] , at least one face coordinate and at least one face detail of each of the identified 5 faces in the camera preview frame, based on the analysis of camera preview frame.
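The disclosure does not fix a particular representation for the identified face coordinates. As a rough illustration, a bounding box per face makes the "area covered by the face" computation concrete; the `Face` type and `area_fraction` helper below are hypothetical names, not part of the disclosed system.

```python
from dataclasses import dataclass

@dataclass
class Face:
    # Bounding box of a detected face in preview-frame pixel coordinates.
    x: int  # left edge
    y: int  # top edge
    w: int  # width in pixels
    h: int  # height in pixels

def area_fraction(face: Face, frame_w: int, frame_h: int) -> float:
    """Fraction of the preview frame covered by the face's bounding box."""
    return (face.w * face.h) / float(frame_w * frame_h)

# Two of the five faces from the example: a large (near) face and a small (far) one.
faces = [Face(100, 80, 300, 300), Face(700, 90, 120, 120)]
fractions = [area_fraction(f, 1280, 720) for f in faces]
```

A larger area fraction for one face relative to the others is exactly the symptom the later normalization step corrects.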
Next, at step [308], the method comprises calculating, via the processing unit [104], a distance between the at least one face and at least one camera sensor [204]. For instance, the method encompasses calculating, via the processing unit [104], the distance of each of the faces present under the camera field of view (FOV)/camera preview frame from the at least one camera sensor [204]. The method further encompasses calculating, via the processing unit [104], the distance based at least on the determined/identified at least one face coordinate and at least one face detail of the at least one face present under the camera preview frame. Further, considering the above example, the method encompasses calculating, via the processing unit [104], the distance of each of the 5 identified faces from the one or more camera sensors [204], based on at least one of, at least one facial plane and at least one facial shape of each of the 5 identified faces under the camera preview frame.
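The disclosure leaves the distance computation itself open (it is based on the face coordinates and facial plane/shape). One common way to approximate it from apparent face size is the pinhole-camera relation distance ≈ focal_length × real_width / pixel_width; in the sketch below, the average real face width (14 cm) and the focal length in pixels are illustrative assumptions, not values from the disclosure.

```python
def estimate_face_distance(face_px_width: float,
                           focal_px: float = 1000.0,
                           real_face_width_cm: float = 14.0) -> float:
    """Pinhole-camera approximation: distance = f * W / w.

    face_px_width      -- width of the detected face in preview pixels
    focal_px           -- camera focal length expressed in pixels (assumed)
    real_face_width_cm -- assumed average physical face width
    """
    return focal_px * real_face_width_cm / face_px_width

# A face twice as wide in the preview is estimated at half the distance.
near = estimate_face_distance(560.0)  # 25.0 cm
far = estimate_face_distance(280.0)   # 50.0 cm
```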
Next, at step [310], the method comprises calculating, via a smart face module [106], a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and the at least one camera sensor [204], and one or more face parameters. The one or more face parameters further comprise at least one of a face coordinate and a face detail. The face detail may include, but is not limited to, one or more salient features of the face/s, such as shape, size, colour and the like. Also, in an instance, the one or more face parameters may be artificial-intelligence-based smart parameters, which are further compared to at least one of the at least one face coordinate and the at least one face detail of the at least one face identified in the camera preview frame, to calculate the face normalization level/s.
Also, the method further encompasses calculating, via the smart face module [106], the face normalization level in an event the distance between the at least one face and the at least one camera sensor [204] is less than the distance threshold. If, in an instance, the distance between the at least one face and the at least one camera sensor [204] is more than the distance threshold, the method does not further encompass calculation of the normalization level for that face.
Furthermore, in an instance, the face normalization level for at least one face associated with a distance less than the distance threshold may be inversely proportional to the calculated distance between the at least one face and the at least one camera sensor [204].
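The disclosure only states that the normalization level is inversely related to the calculated distance; one plausible mapping (an assumption, not the disclosed formula) is a shrink factor of distance/threshold for faces closer than the threshold, so the closer the face, the stronger the correction:

```python
def face_normalization_level(distance: float, threshold: float) -> float:
    """Return a shrink factor in (0, 1].

    Faces at or beyond the threshold are left untouched (factor 1.0).
    A face closer than the threshold gets the factor distance/threshold,
    so the amount of shrinking grows as the distance decreases.
    """
    if distance >= threshold:
        return 1.0
    return distance / threshold

# A face at half the threshold distance would be shrunk to half size.
level_near = face_normalization_level(25.0, 50.0)  # 0.5
level_far = face_normalization_level(60.0, 50.0)   # 1.0 (no change)
```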
Further, the method encompasses determining the distance threshold, via the smart face module [106], based on the distance between the at least one face and the at least one camera sensor [204]. For instance, if a single face is identified via the processing unit [104] under the camera preview frame, the method leads to calculating, via the smart face module [106], a distance threshold for the identified single face, based on a calculated distance of the identified single face from the one or more camera sensors [204]. In an example, the method further comprises analyzing, via the smart face module [106], the calculated distance of the identified single face with respect to the entire preview of the camera preview frame, to calculate the distance threshold.
Also, the method encompasses determining, via the smart face module [106], the distance threshold based on taking a median of a plurality of distances, wherein the plurality of distances are calculated between a plurality of faces and the at least one camera sensor [204], in an event the plurality of faces are identified in the camera preview frame.
Further, in an example, the distance threshold (K) may be determined as follows:

K = Median (D1, D2, ..., Di),

where:

K = the distance threshold,

D1 to Di = the distances of the faces (1 to i) identified under the camera preview frame from the camera sensor [204], and

Di = the distance of the i-th face from the camera sensor [204].
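The median-based threshold above maps directly onto the standard library; a minimal sketch follows, with a guard for the empty case, which the disclosure does not address.

```python
from statistics import median

def distance_threshold(distances: list) -> float:
    """K = Median(D1, D2, ..., Di) over the faces found in the preview frame.

    For a single identified face the median is that face's own distance,
    matching the single-face case described above.
    """
    if not distances:
        raise ValueError("no faces identified in the camera preview frame")
    return median(distances)
```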
Next, at step [312], the method comprises normalizing, via the smart face module [106], the at least one face based on the face normalization level, to capture at least one smart media. The method comprises normalizing the at least one face via the smart face module [106] based on the corresponding calculated face normalization level/s.
For example, the method identifies, via the processing unit [104], 3 faces (i.e. T1, T2 and T3) under the camera preview frame, and the method encompasses calculating, via the processing unit [104], corresponding distances of the 3 identified faces from the one or more camera sensors [204]. The method further leads to identifying, via the smart face module [106], a distance threshold based on a median of the corresponding distances of the 3 identified faces. Further, the method encompasses comparing, via the smart face module [106], the distance threshold with each of the calculated distances of the 3 identified faces from the camera sensor/s [204]. Also, in an example, the comparison further indicates that the calculated distance of T2 is less than the distance threshold and the calculated distances of T1 and T3 are more than the distance threshold. Therefore, in said scenario, the method further leads to calculating, via the smart face module [106], a normalization level for T2, based at least on the calculated distance of T2 from the camera sensor/s [204] and on one or more face parameters, to normalize T2.
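The selection logic of the T1/T2/T3 scenario can be sketched as follows; the distance values are invented purely to reproduce the scenario in which only one face falls below the median-based threshold.

```python
from statistics import median

def faces_to_normalize(distances: dict) -> list:
    """Labels of faces whose distance is strictly below the median threshold K."""
    k = median(distances.values())
    return [label for label, d in distances.items() if d < k]

# Assumed distances (cm): K = median = 35, so only T2 qualifies for normalization.
selected = faces_to_normalize({"T1": 40.0, "T2": 20.0, "T3": 35.0})
```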
Also, the method comprises processing, via the processing unit [104], a first set of pixels surrounding the at least one face, the processing being based on the face normalization level. The method encompasses processing, via the processing unit [104], the first set of pixels surrounding the at least one face to adjust at least one change that occurred in the camera preview frame due to the normalization of the at least one face, and to thereby preserve an initial camera preview frame similar to that prior to normalization. For instance, in order to process the first set of pixels surrounding the at least one face, the method adjusts the camera preview frame (to an initial camera preview frame similar to that prior to normalization) via the processing unit [104], based on the at least one normalization level associated with the at least one normalized face.
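The disclosure does not specify how the surrounding pixels are adjusted. The NumPy sketch below shows one naive possibility: shrink the face's bounding box about its centre and keep the original surrounding pixels untouched, so the rest of the preview frame is preserved. Function and parameter names are illustrative; a real implementation would use a proper resampling filter and blend the seam around the shrunken face.

```python
import numpy as np

def shrink_face_region(frame, x, y, w, h, level):
    """Shrink the (x, y, w, h) face ROI by `level` (0 < level <= 1) about its
    centre; pixels of the ROI not covered by the shrunken face keep their
    original values, so the surrounding preview content is preserved."""
    out = frame.copy()
    roi = frame[y:y + h, x:x + w]
    new_h, new_w = max(1, int(h * level)), max(1, int(w * level))
    # Nearest-neighbour downscale (stand-in for a proper resampling filter).
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    small = roi[rows][:, cols]
    # Paste the shrunken face at the centre of its original bounding box.
    pad_y, pad_x = (h - new_h) // 2, (w - new_w) // 2
    out[y + pad_y:y + pad_y + new_h, x + pad_x:x + pad_x + new_w] = small
    return out

# Synthetic single-channel "frame" with a face box at (20, 20) of size 40x40.
frame = np.arange(100 * 100, dtype=np.int64).reshape(100, 100)
result = shrink_face_region(frame, 20, 20, 40, 40, 0.5)
```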
After successfully capturing at least one smart media, the method further terminates at step [314] .
Further, one aspect of the present disclosure relates to a user equipment for capturing at least one smart media. The user equipment comprises a system [100] configured to receive a camera preview frame to capture at least one media. The system is thereafter configured to perform a media analysis of the camera preview frame to identify at least one face in the camera preview frame. Further, the system is configured to calculate a distance between the at least one face and at least one camera sensor [204]. The system is further configured to calculate a face normalization level, wherein the face normalization level is based on a comparison of a distance threshold with the distance between the at least one face and the at least one camera sensor [204], and one or more face parameters. Thereafter, the system is configured to normalize the at least one face based on the face normalization level, to capture at least one smart media.
Referring to Fig. 4, an exemplary flow diagram [400] , depicting an instance implementation of the process of capturing at least one smart selfie, in accordance with exemplary embodiments of the present disclosure, is shown. As indicated in Figure 4, the process starts at step [402] .
At step [404], the method encompasses a user accessing the camera unit [102], by opening a front camera of a user device, to capture one or more media. Further, the camera unit [102] is configured to receive a camera preview frame comprising one or more human-objects/faces that are being captured. For instance, when a request to open the front camera is received at the camera HAL [206], the camera HAL [206] opens the camera sensor [204] and starts providing the camera preview frame. The camera preview frame is further shown as a preview on a display.
Next, at step [406] , the method comprises receiving a selfie capturing request at the camera unit [102] . In an instance when a user initiates the selfie capturing request, this request is further passed on to the camera HAL [206] for further processing.
Next, at step [408], the method comprises identifying, via the processing unit [104], at least one face in the camera preview frame/field of view (FOV). Also, on the received camera preview frame, all the face coordinates, along with the salient features such as the shapes, sizes and colours of the faces, are identified via the processing unit [104] based on a media analysis of the received camera preview frame.
Next, at step [410], the method comprises identifying, via the processing unit [104], a face count of the identified faces. If the face count of the identified faces is more than 0 (zero), the process further leads to step [412]; otherwise, the process further leads to step [418].
Next, at step [412], the method comprises calculating, via the processing unit [104], a distance of each of the at least one face identified under the camera preview frame from the at least one camera sensor [204].
Further, at step [414], the method encompasses determining whether the distance of the at least one face identified under the camera preview frame from the at least one camera sensor [204] is less than a distance threshold (i.e. K). If the calculated distance of one or more identified faces in the camera preview frame is less than the distance threshold, the method further leads to step [416]; otherwise, the method leads to step [418].
Next, at step [416], the method comprises normalizing, via the smart face module [106], the at least one face associated with a calculated distance less than the distance threshold, based on the face normalization level, to capture at least one smart media. The face normalization level is calculated via the smart face module [106] and is based on a comparison of the distance threshold with the distance between the at least one face and the at least one camera sensor [204], and one or more face parameters.
Further, the method also comprises processing a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
Next, at step [418] , the method comprises capturing the at least one smart media based on the normalized at least one face. Also, the method at step [418] encompasses saving the captured at least one smart media at a storage unit.
After successfully capturing at least one smart media, the method further terminates at step [420] .
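Taken together, steps [404]-[418] reduce to a small decision pipeline. The sketch below (hypothetical names, assumed distances) combines the face-count check, the median threshold and the distance/threshold normalization level into one function returning a per-face shrink factor, with 1.0 meaning the face is captured untouched.

```python
from statistics import median

def smart_capture_levels(face_distances):
    """Flow [400] in miniature: no faces -> capture as-is (empty result);
    otherwise K = median distance (step [414]), and faces closer than K
    get a normalization level proportional to their distance (step [416])."""
    if not face_distances:  # face count == 0, step [410]
        return {}
    k = median(face_distances.values())  # distance threshold, step [414]
    return {label: (d / k if d < k else 1.0)
            for label, d in face_distances.items()}

levels = smart_capture_levels({"T1": 40.0, "T2": 20.0, "T3": 35.0})
```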
As evident from the above disclosure, the present solution provides a significant technical advancement over the existing solutions by capturing at least one smart media based on at least one face normalization. The present disclosure ensures that when a media is captured, the faces/objects therein are not too large as compared to other faces/objects due to their varying distances from the camera sensor. The disclosure automatically and dynamically normalizes the large faces/objects in the preview frame and adjusts the media accordingly.
While considerable emphasis has been placed herein on the disclosed embodiments, it will be appreciated that many embodiments can be made and that many changes can be made to the embodiments without departing from the principles of the present disclosure. These and other changes in the embodiments of the present disclosure will be apparent to those skilled in the art, whereby it is to be understood that the foregoing descriptive matter to be implemented is illustrative and non-limiting.

Claims (15)

  1. A method for capturing at least one smart media, the method comprising:
    - receiving, at a camera unit [102] , a camera preview frame to capture at least one media;
    - performing, via a processing unit [104] , a media analysis of the camera preview frame to identify at least one face in the camera preview frame;
    - calculating, via the processing unit [104] , a distance between the at least one face and at least one camera sensor [204] ;
    - calculating, via a smart face module [106] , a face normalization level, wherein the face normalization level is based on:
    a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor [204] , and
    one or more face parameters;
    - normalizing, via the smart face module [106] , the at least one face based on the face normalization level, to capture at least one smart media.
  2. The method as claimed in claim 1, wherein the calculating, via the smart face module [106] , the face normalization level is further based on an event the distance between the at least one face and the at least one camera sensor [204] is less than the distance threshold.
  3. The method as claimed in claim 1, the method further comprising processing a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
  4. The method as claimed in claim 1, wherein the performing, via the processing unit, the media analysis further comprises identifying at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame.
  5. The method as claimed in claim 1, the method further comprises determining the distance threshold via the smart face module based on the distance between the at least one face and at least one camera sensor [204] .
  6. The method as claimed in claim 5, the method further comprises determining the distance threshold via the smart face module based on a median of distances between a plurality of faces and the at least one camera sensor [204] , in an event the plurality of faces are identified via the processing unit in the camera preview frame.
  7. The method as claimed in claim 1, wherein the one or more face parameters further comprises at least one of a face coordinate and a face detail.
  8. A system [100] for capturing at least one smart media, the system [100] comprising:
    - a camera unit [102] , configured to receive, a camera preview frame to capture at least one media;
    - a processing unit [104] , configured to:
    perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame, and
    calculate, a distance between the at least one face and at least one camera sensor [204] ;
    - a smart face module [106] , configured to:
    calculate, a face normalization level, wherein the face normalization level is based on:
    a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor [204] , and
    one or more face parameters; and
    normalize, the at least one face based on the face normalization level, to capture at least one smart media.
  9. The system as claimed in claim 8, wherein the smart face module [106] is further configured to calculate the face normalization level based on an event the distance  between the at least one face and the at least one camera sensor [204] is less than the distance threshold.
  10. The system as claimed in claim 8, wherein the processing unit [104] is further configured to process a first set of pixels surrounding the at least one face, the processing being based on the face normalization level.
  11. The system as claimed in claim 8, wherein the processing unit [104] is further configured to perform the media analysis to identify at least one face coordinate of the at least one face in the camera preview frame and at least one face detail of the at least one face in the camera preview frame.
  12. The system as claimed in claim 8, wherein the smart face module [106] is further configured to determine the distance threshold based on the distance between the at least one face and at least one camera sensor [204] .
  13. The system as claimed in claim 12, wherein the smart face module [106] is further configured to determine the distance threshold based on a median of distances between a plurality of faces and the at least one camera sensor [204] , in an event the plurality of faces are identified in the camera preview frame.
  14. The system as claimed in claim 8, wherein the one or more face parameters further comprises at least one of a face coordinate and a face detail.
  15. A user equipment for capturing at least one smart media, the user equipment comprising:
    - a system [100] , configured to:
    receive, a camera preview frame to capture at least one media,
    perform, a media analysis of the camera preview frame to identify at least one face in the camera preview frame,
    calculate, a distance between the at least one face and at least one camera sensor [204] ,
    calculate, a face normalization level, wherein the face normalization level is based on:
    a comparison of a distance threshold with the distance between the at least one face and at least one camera sensor [204] , and
    one or more face parameters, and
    normalize, the at least one face based on the face normalization level, to capture at least one smart media.
PCT/CN2021/099698 2020-06-29 2021-06-11 Method and system for capturing at least one smart media WO2022001630A1 (en)

Applications Claiming Priority (2)

Application Number: IN202041027481; Priority Date: 2020-06-29; Filing Date: 2020-06-29

Publications (1)

Publication Number: WO2022001630A1; Publication Date: 2022-01-06

Family ID: 79317426





Legal Events

121 — EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21832437; Country of ref document: EP; Kind code: A1)

NENP — Non-entry into the national phase (Ref country code: DE)

122 — EP: PCT application non-entry in European phase (Ref document number: 21832437; Country of ref document: EP; Kind code: A1)