CN112509005B - Image processing method, image processing device, electronic equipment and storage medium

Image processing method, image processing device, electronic equipment and storage medium

Info

Publication number
CN112509005B
CN112509005B (application CN202011434480.4A)
Authority
CN
China
Prior art keywords
face
key point
target
image
relative distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011434480.4A
Other languages
Chinese (zh)
Other versions
CN112509005A (en)
Inventor
刘易周 (Liu Yizhou)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011434480.4A priority Critical patent/CN112509005B/en
Publication of CN112509005A publication Critical patent/CN112509005A/en
Priority to PCT/CN2021/128769 priority patent/WO2022121577A1/en
Application granted granted Critical
Publication of CN112509005B publication Critical patent/CN112509005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method, an image processing apparatus, an electronic device, and a storage medium. The method comprises: acquiring face video data in real time, and taking each face image frame in the face video data as an image to be processed; performing face recognition on the image to be processed to obtain face key points; extracting reference key points of a preset area from the face key points, and determining an amplification factor according to the position information of the reference key points; and amplifying the face according to the amplification factor, and tracking the amplified face according to the face motion information obtained by face recognition. According to the disclosed scheme, using only the reference key points in the preset area avoids the performance bottleneck caused by a large number of key points, and determining the amplification factor with reference to a plurality of key points in the preset area ensures the accuracy of the face amplification.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
With the popularization of intelligent terminals and the development of image processing technologies, more and more applications can process faces in images to achieve desired effects, such as intelligent beautification, magic special effects, and face tracking. With the rapid development of terminal software and hardware, real-time rendering on intelligent terminals has become increasingly common, making it possible to display these effects in real time. For example, an image captured statically or in real time by the terminal may be semantically segmented by a deep neural network to obtain image processing results such as face key points, a hairstyle region mask map, and mask maps of facial feature positions, and many creative effects, such as enlargement and dislocation of facial features or face stickers and makeup, can be realized by using the obtained results.
In order to improve the face tracking precision or the face presentation effect, focusing processing may be performed on the face in an image so that the face is displayed in a larger area at the center of the screen. In the related art, the following two implementations are often adopted to perform focusing processing on a face:
(1) Acquiring all key point data of the face; transmitting all the face key point data from a Central Processing Unit (CPU) to a Graphics Processing Unit (GPU); creating a critical area of a virtual frame based on all the key point data; and focusing the critical area of the virtual frame to the center of the screen. Although this method has a good presentation effect, transmitting all the face key point data from the CPU to the GPU has a certain impact on device performance, because the number of face key points is large (it can reach several hundred).
(2) Acquiring data of a single face key point; performing a warp (warping) operation on the image directly according to the single face key point, and focusing the face to the center of the screen. Because fewer face key points are used, the stability of the presentation effect is poor.
Therefore, the related-art approaches cannot achieve both performance and accuracy.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium, which at least solve a problem in the related art that performance and accuracy of focusing processing on a face are incompatible. The technical scheme of the disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an image processing method, including:
acquiring face video data in real time, and taking each frame of face image frame in the face video data as an image to be processed;
carrying out face recognition on the image to be processed to obtain face key points;
extracting reference key points of a preset area from the face key points, and determining an amplification factor according to the position information of the reference key points;
and amplifying the face according to the amplification factor, and performing face tracking on the amplified face according to the face motion information obtained by face recognition.
In one embodiment, the method further comprises:
acquiring target key points in the central area of the face from the reference key points;
determining a target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position;
and moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
In one embodiment, the correspondence between the preset target key point position and the target position includes:
the position of the target key point comprises a plurality of value intervals, each value interval corresponds to a corresponding change rate, the change rate is the degree of the target position changing along with the position change of the target key point, and the value intervals are determined based on the size of the image acquisition page in the preset direction.
In one embodiment, the size of the image acquisition page in a preset direction is divided to obtain a first value space, a second value space and a third value space which are sequentially connected; the change rate is the degree of change of the target position along with the position change of the target key point, and comprises the following steps:
obtaining a coordinate value in the preset direction from the position of the target key point;
when the coordinate value is located in a first value space or a third value space, the change rate of the target position is a first change rate;
when the coordinate value is located in a second value space, the change rate of the target position is a second change rate;
the first rate of change is greater than the second rate of change.
In one embodiment, the target keypoints are nose tip keypoints.
In one embodiment, the determining the amplification factor according to the position information of the reference keypoint includes:
determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key point;
acquiring a three-dimensional angle of a human face, wherein the three-dimensional angle of the human face comprises a pitch angle and a yaw angle;
determining a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle;
obtaining a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight;
and determining the amplification factor as the ratio of the width of the image acquisition page to the sum of the products.
In one embodiment, the determining a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle includes:
determining the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle;
determining the second weight as a ratio of the yaw angle to a sum of the pitch angle and the yaw angle.
In one embodiment, the preset area is a face T-shaped area, and the face T-shaped area comprises a forehead central area and a face central area; the reference key points comprise a left eye key point, a right eye key point, a glabellar key point and a nose tip key point; the determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key point comprises:
determining the first relative distance as the distance between the key point of the left eye and the key point of the right eye;
determining the second relative distance as a distance between the glabellar key point and the nose tip key point.
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
the image acquisition module is configured to acquire face video data in real time and take each frame of face image frame in the face video data as an image to be processed;
the face recognition module is configured to perform face recognition on the image to be processed to obtain face key points;
the coefficient determining module is configured to extract reference key points of a preset area from the face key points, and determine an amplification coefficient according to position information of the reference key points;
and the face processing module is configured to amplify the face according to the amplification factor and track the face after amplification according to the face motion information obtained by face recognition.
In one embodiment, the apparatus further comprises:
a key point acquisition module configured to perform acquisition of a target key point in a face center region from the reference key points;
the position determining module is configured to determine a target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position;
and the moving module is configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
In one embodiment, the correspondence between the preset target key point position and the target position includes:
the position of the target key point comprises a plurality of value intervals, each value interval corresponds to a corresponding change rate, the change rate is the degree of the target position changing along with the position change of the target key point, and the value intervals are determined based on the size of the image acquisition page in the preset direction.
In one embodiment, the size of the image acquisition page in a preset direction is divided to obtain a first value space, a second value space and a third value space which are sequentially connected; the change rate is the degree of change of the target position along with the position change of the target key point, and comprises the following steps:
obtaining a coordinate value in the preset direction from the position of the target key point;
when the coordinate value is located in a first value space or a third value space, the change rate of the target position is a first change rate;
when the coordinate value is located in a second value space, the change rate of the target position is a second change rate;
the first rate of change is greater than the second rate of change.
In one embodiment, the target keypoints are nose tip keypoints.
In one embodiment, the coefficient determining module includes:
a distance determination unit configured to perform determining a first relative distance of a horizontal area and a second relative distance of a vertical area from the position information of the reference keypoint;
an angle acquisition unit configured to perform acquisition of a three-dimensional angle of a human face, the three-dimensional angle of the human face including a pitch angle and a yaw angle;
a weight determination unit configured to perform determination of a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance from the pitch angle and the yaw angle;
a calculation unit configured to perform obtaining a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight;
a coefficient determination unit configured to perform determination that the magnification coefficient is a ratio of a width of an image acquisition page to a sum of the products.
In one embodiment, the weight determination unit is configured to perform determining the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle; determining the second weight as a ratio of the yaw angle to a sum of the pitch angle and the yaw angle.
In one embodiment, the preset area is a face T-shaped area, and the face T-shaped area comprises a forehead central area and a face central area; the reference key points comprise a left eye key point, a right eye key point, a glabellar key point and a nose tip key point;
the distance determination unit is configured to perform determination that the first relative distance is a distance between the left-eye key point and the right-eye key point; determining the second relative distance as a distance between the glabellar key point and the nose tip key point.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the image processing method as described in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method described in any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the image processing method described in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
each face image frame in the face video data acquired in real time is taken as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points. Then, reference key points of a preset area are extracted from the face key points. The amplification factor is determined based on the position information of the reference key points in the preset area. Finally, the face is amplified according to the amplification factor, and face tracking is performed on the amplified face according to the face motion information obtained by face recognition. By adopting only the reference key points in the preset area, the performance bottleneck caused by a large number of key points is avoided; determining the amplification factor with reference to a plurality of key points in the preset area ensures the accuracy of the face amplification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a diagram illustrating an application environment of an image processing method according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating an image processing method according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating a step of determining a target location according to an exemplary embodiment.
FIG. 4 is a diagram illustrating a segmentation function in accordance with an exemplary embodiment.
FIG. 5 is a flowchart illustrating a step of determining a magnification factor in accordance with an exemplary embodiment.
FIG. 6 is a flowchart illustrating an image processing method according to an exemplary embodiment.
FIG. 7 is a schematic diagram illustrating processing an image according to an exemplary embodiment.
FIG. 8 is a diagram illustrating processing of an image according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment.
Fig. 10 is an internal block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
The image processing method provided by the present disclosure can be applied to the application environment shown in fig. 1. The terminal 110 is pre-deployed with a face pose estimation method for face pose estimation and an image processing logic for supporting image processing based on a face recognition result. The face pose estimation method can be a deep learning model-based method, an appearance-based method, a classification-based method, and the like. The face pose estimation method and image processing logic may be embedded in an application. The application is not limited to social applications, instant messaging applications, short video applications, etc. Specifically, the terminal 110 collects face video data in real time, and takes each frame of face image frame in the face video data as an image to be processed. And carrying out face recognition on the image to be processed to obtain face key points. Extracting reference key points of a preset area from the key points of the face, and determining an amplification factor according to the position information of the reference key points; and amplifying the face according to the amplification factor, and tracking the amplified face according to the face motion information obtained by face recognition. Among others, the terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
Fig. 2 is a flowchart illustrating an image processing method according to an exemplary embodiment. The method is used in the terminal 110 and, as illustrated in fig. 2, includes the following steps.
In step S210, the face video data is collected in real time, and each frame of the face image frame in the face video data is used as an image to be processed.
The face video data can be acquired through the image acquisition device. The image acquisition device may be a device provided in the terminal; or may be a stand-alone device such as a camera, video camera, etc.
Specifically, the client may automatically control the image capturing device to capture the face video data of the user after receiving the image processing instruction. The image processing instruction can be triggered by a user by clicking a preset face processing control and the like. The client takes the current frame of the human face image frame in the human face video data as an image to be processed, and processes the current frame of the human face image frame in real time according to the contents from the step S220 to the step S240 while acquiring the human face video data.
In some possible embodiments, the object to be processed may also be a human body part other than a face, such as a hand or a limb; it may even be another type of subject, such as an animal, a building, or a star.
In some possible embodiments, the image to be processed may also be a still image taken in advance, saved in a local database or server, or a still image taken in real time.
In step S220, face recognition is performed on the image to be processed to obtain face key points.
The face recognition of the image to be processed may adopt a method based on a deep learning model. The deep learning model may be any model that can be used for face key point recognition, such as a Deep Convolutional Neural Network (DCNN) model. The face key points may be predefined, and their number is at least one. During training of the deep learning model, each sample image is labeled according to predefined key point related information (such as key point ordering and key point positions). The deep learning model is trained with the labeled sample images to obtain a model capable of outputting the position information of the face key points. Specifically, the client inputs the acquired image to be processed into the trained deep learning model in real time to obtain the face key points.
In step S230, reference key points of a preset region are extracted from the key points of the face, and an amplification factor is determined according to the position information of the reference key points.
The preset region refers to a region that can approximately represent the position of the face in the image to be processed, and may be, for example, a contour region or a central region of the face. The preset area may contain a plurality of reference key points, where "a plurality" does not exclude the case of a single key point.
The amplification factor is used for amplifying the face so as to increase the proportion of the face part in the image acquisition page, thereby achieving the effect of highlighting the face.
Specifically, the key point related information of the reference key points is configured in the client in advance. After the face key points output by the deep learning model are obtained, the client can extract the reference key points from the face key points according to the key point related information of the reference key points. The client acquires the position information of the reference key points and calculates the size of the preset area from it. The size of the preset area can be represented by parameters such as the size of the bounding frame of the preset area or the distance between key points in the preset area. The amplification factor is then obtained through a preset algorithm, which is determined according to the specific situation: for example, the size of the preset area may be compared with a preset constant to obtain the amplification factor; or a correspondence between the size of the preset area and the amplification factor may be established in advance, and the matching amplification factor obtained from the correspondence; or a correlation function may be established through repeated experiments, and the amplification factor calculated by the correlation function from the size of the preset area.
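As an illustration of this step, the following Python sketch extracts the reference key points from the full key-point array by predefined indices and derives an amplification factor by the first of the options mentioned above (comparing the preset-area size with a preset constant). The index values in REFERENCE_INDICES and the constant reference_size are hypothetical placeholders; the real values depend on the key-point ordering of the deployed model and on the desired presentation effect.

    import numpy as np

    # Hypothetical indices of the reference key points within the model's
    # key-point ordering; the actual values depend on the deployed model.
    REFERENCE_INDICES = {"left_eye": 74, "right_eye": 77, "glabella": 43, "nose_tip": 46}

    def extract_reference_keypoints(face_keypoints: np.ndarray) -> dict:
        """Pick the (x, y) reference key points out of the full key-point array."""
        return {name: face_keypoints[idx] for name, idx in REFERENCE_INDICES.items()}

    def amplification_from_region_size(region_size: float, reference_size: float = 240.0) -> float:
        """Simplest option described above: compare the preset-area size
        with a preset constant to obtain an amplification factor."""
        return reference_size / max(region_size, 1e-6)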
In step S240, the face is amplified according to the amplification factor, and face tracking is performed on the amplified face according to the face motion information obtained by face recognition.
The face motion information includes, but is not limited to, face motion angles and face motion trajectories, and is output synchronously when the deep learning model outputs the face key points.
Specifically, after obtaining the amplification factor, the client displays the face in the current frame of the image to be processed enlarged by the amplification factor. Meanwhile, the client acquires the face motion information of the current frame and controls the amplified face in the image to be processed to move according to the face motion information, thereby achieving a real-time face tracking effect.
Further, in order to enable the image processing function to be more comprehensive, the process of amplifying the human face can be displayed through the animation special effect.
In the image processing method, each face image frame in the face video data collected in real time is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points. Then, reference key points of a preset area are extracted from the face key points. The amplification factor is determined based on the position information of the reference key points in the preset area. Finally, the face is amplified according to the amplification factor, and face tracking is performed on the amplified face according to the face motion information obtained by face recognition. By adopting only the reference key points in the preset area, the performance bottleneck caused by a large number of key points is avoided; determining the amplification factor with reference to a plurality of key points in the preset area ensures the precision of the face amplification processing.
In an exemplary embodiment, as shown in fig. 3, processing the face in the image may further include a process of moving the face. The method can be realized by the following steps:
in step S310, target keypoints in the center region of the face are acquired from the reference keypoints.
In step S320, a target position corresponding to the target key point is determined according to a preset correspondence between the position of the target key point and the target position.
In step S330, the face is moved in the direction of the target position relative to the target key point until the target key point reaches the target position.
The target key points refer to key points that can approximately represent the center position of the face. For example, the preset region may be a facial-feature region (including the regions of the eyes, nose and mouth), a T-shaped region (including the central region of the forehead and the central region of the face, i.e., a region formed by the forehead and the bridge of the nose), a nose bridge region (the region from the forehead to the tip of the nose), and the like; the target key point may correspondingly be selected as a nose tip key point, a nose wing key point, an upper lip key point, or the like in the preset area.
The target position is the position of the target key point after moving, and is used for enabling the face to be displayed near the center position of the image acquisition page under the condition that the face presenting effect is not influenced.
Specifically, the key point related information of the target key point may be pre-configured in the client. After the client side obtains the face key points, extracting the target key points from the face key points according to the key point related information of the target key points. And determining a target position corresponding to the target key point of the current frame from the preset corresponding relation between the position of the target key point and the target position according to the position information of the target key point. And the client moves the face according to the direction of the target position relative to the target key point until the target key point reaches the target position.
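A minimal Python sketch of the moving step, assuming the face key points, the target key point and the target position are all given in pixel coordinates; the progress parameter is an assumption that allows the movement to be animated gradually over several frames rather than applied at once.

    import numpy as np

    def move_face(face_points: np.ndarray, target_keypoint: np.ndarray,
                  target_position: np.ndarray, progress: float = 1.0) -> np.ndarray:
        """Translate all face points along the direction from the target key
        point towards the target position; progress in [0, 1] controls how far
        along that direction the face has moved in the current frame."""
        offset = (np.asarray(target_position) - np.asarray(target_keypoint)) * progress
        return np.asarray(face_points) + offset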
Further, after or before the face is moved, the face may be amplified by the amplification factor obtained in the above embodiment, with the target key point as the center.
Further, in order to enable the image processing function to be more comprehensive, the moving and amplifying process of the human face can be displayed through the animation special effect.
In the embodiment, the target key points are extracted from the reference key points, and the target positions are obtained by taking the target key points as the reference, so that on one hand, the efficiency of face processing can be improved; on the other hand, the target key points are points capable of roughly representing the center position of the face, so that the accuracy of face processing can be improved by adopting the target key points.
In an exemplary embodiment, the preset correspondence between the positions of the target key points and the target positions includes: the position of the target key point comprises a plurality of value intervals, each value interval corresponds to a corresponding change rate, the change rate is the degree of the target position changing along with the position change of the target key point, and the value intervals are determined based on the size of the image acquisition page in the preset direction.
The image acquisition page can be an image page displayed by the client.
The preset direction may be determined according to the shooting orientation of the image to be processed. For example, if the image is shot in portrait orientation, the preset direction may be the horizontal direction when the terminal device is held in portrait orientation. The size in the preset direction may be characterized by the pixel size.
The rate of change is used to measure the degree to which the target location changes with the location of the target keypoint within the image to be processed.
Specifically, the position of the target key point relative to the image to be processed falls either in an edge region or in a central region of the image to be processed. The value intervals of the edge region and the central region may be predefined. The change rate used when the target key point falls in the value interval of the edge region is different from the change rate used when it falls in the value interval corresponding to the central region.
In this embodiment, the corresponding change rate is configured according to the relative position of the target key point and the image to be processed, so that a better face tracking effect can be presented in the image processing process, and the presentation effect of the face is not distorted.
In an exemplary embodiment, the size of an image acquisition page in a preset direction is divided to obtain a first value space, a second value space and a third value space which are sequentially connected; the change rate is the degree of change of the target position along with the position change of the target key point, and comprises the following steps: acquiring coordinate values in a preset direction from the positions of the target key points; when the coordinate value is located in the first value space or the third value space, the change rate of the target position is a first change rate; when the coordinate value is located in the second value space, the change rate of the target position is a second change rate; the first rate of change is greater than the second rate of change.
The first rate of change and/or the second rate of change may be constant or indefinite.
Being sequentially connected means that the first value interval and the second value interval meet end to end, and the second value interval and the third value interval meet end to end. The first value interval and the third value interval can be used to represent the edge areas of the image acquisition page, and the second value interval can be used to represent the central region of the image acquisition page.
Illustratively, if the resolution of the image capturing device is 720 × 1280 px (pixels), 720 is the horizontal pixel width when the terminal device is held in portrait orientation, and 1280 is the vertical pixel height. The preset direction is the horizontal direction in portrait orientation. The pixel width 720 in the horizontal direction can then be divided into three consecutive value intervals, for example, 0 to 200, 200 to 520, and 520 to 720.
Specifically, the position of the target key point can be characterized by pixel coordinates. The client obtains the coordinate value in the preset direction from the position of the target key point and determines which value interval the coordinate value belongs to. If it belongs to the first or third value interval, the target key point is located in the edge area of the image acquisition page, and the client obtains the target position according to the first change rate; if the coordinate value is located in the second value interval, the target key point is located in the central area of the image acquisition page, and the client obtains the target position according to the second change rate. Since the first change rate is greater than the second change rate, the target position changes more in the edge regions than in the central region.
In one specific embodiment, the resolution of the image to be processed is 720 × 1280 px, where 720 is the horizontal pixel width when the terminal device is held in portrait orientation. The horizontal pixel width is divided into three sequentially connected value intervals of 0-200, 200-520, and 520-720. The pixel coordinate of the target position in the horizontal direction can be obtained by the following piecewise function:
[Piecewise function mapping currPixelPosX to offsetCenterPosX; rendered as an image in the original document.]
wherein offsetCenterPosX is the pixel coordinate of the target position in the horizontal direction, and currPixelPosX is the pixel coordinate of the target key point in the horizontal direction.
The target position may also be obtained with reference to the piecewise function shown in fig. 4. To improve the accuracy of the target position, the pixel coordinates of the target position may also be mapped to the 0-1 space. As shown in fig. 4, in the edge regions where the horizontal pixel coordinates are 0 to 200 and 520 to 720, the target position changes rapidly from (0, y) to (0.5, y) with the position of the target key point; in the central area where the horizontal pixel coordinates are 200 to 520, the target position changes gently in the vicinity of (0.5, y) with the position of the target key point, or even does not change at all. Here y represents the coordinate value of the target key point in the vertical direction. It should be understood that, although the change rate (which can be represented by a slope) in the piecewise function shown in fig. 4 is constant, the change rate of the piecewise function may also be non-constant in practical applications.
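The exact piecewise function is shown only as an image in the original document, so the Python sketch below is an assumed reconstruction that merely reproduces the behaviour described above: a steep first change rate in the edge intervals 0-200 and 520-720, a gentle (here zero) second change rate around 0.5 in the central interval 200-520, with the result mapped to the 0-1 space. The break points and slopes are illustrative assumptions, not the patented function.

    def target_position_x(curr_pixel_pos_x: float, width: float = 720.0,
                          left_edge: float = 200.0, right_edge: float = 520.0) -> float:
        """Assumed piecewise mapping from the target key point's horizontal pixel
        coordinate to the target position in the normalised 0-1 space."""
        if curr_pixel_pos_x < left_edge:
            # first value interval: steep change from 0 towards 0.5
            return 0.5 * curr_pixel_pos_x / left_edge
        if curr_pixel_pos_x <= right_edge:
            # second value interval: gentle change (kept constant here)
            return 0.5
        # third value interval: steep change from 0.5 towards 1.0
        return 0.5 + 0.5 * (curr_pixel_pos_x - right_edge) / (width - right_edge)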
In this embodiment, by configuring the piecewise function and obtaining the target position according to the piecewise function, the efficiency of image processing can be increased, so that the image processing process can present a better face tracking effect, and the face presentation effect is not distorted.
In an exemplary embodiment, as shown in fig. 5, in step S230, determining the amplification factor according to the position information of the reference keypoint may be implemented by:
in step S510, a first relative distance of the horizontal area and a second relative distance of the vertical area are determined according to the location information of the reference keypoint.
The horizontal area and the vertical area are obtained by dividing a preset area. And acquiring a group of key points which are respectively positioned at two ends of the horizontal area in the horizontal area, and calculating to obtain a first relative distance of the horizontal area according to the position information of the group of key points. Similarly, for the vertical area, a group of key points at two ends of the vertical area in the vertical area are obtained, and the second relative distance of the vertical area is calculated according to the position information of the group of key points.
In some possible embodiments, the preset area is a face T-shaped area, and the face T-shaped area includes a forehead central area and a face central area; the reference keypoints include left-eye keypoints, right-eye keypoints, glabellar keypoints, and nose tip keypoints. The first relative distance may be a distance between the left-eye key point and the right-eye key point, and may be calculated according to the position information of the left-eye key point and the right-eye key point. The second relative distance may be a distance between the key points between the eyebrows and the key points of the nose tip, and may be calculated according to the position information of the key points between the eyebrows and the key points of the nose tip.
In step S520, a three-dimensional face angle is obtained, where the three-dimensional face angle includes a pitch angle and a yaw angle.
The three-dimensional angle of the face can be represented by Euler angles. The Euler angles refer to the rotation angles of an object around the three coordinate axes (x, y, z) of a coordinate system, and can be obtained by performing pose estimation on the face key points. Illustratively, the pose estimation algorithm of OpenCV (an open-source computer vision library) is used to solve a rotation vector from the face key points, and the rotation vector is converted into Euler angles. In this embodiment, the Euler angles include a pitch angle and a yaw angle. Pitch represents the rotation angle of the object about the x-axis; yaw represents the rotation angle of the object about the y-axis.
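A hedged Python sketch of one common way to obtain the pitch and yaw angles with OpenCV, as mentioned above: solve a rotation vector with solvePnP from 2D key points and corresponding points of a generic 3D face model, convert it to a rotation matrix with Rodrigues, and decompose the matrix into Euler angles. The generic 3D model points, the focal-length approximation and the absence of lens distortion are assumptions; this is not necessarily the pose-estimation method used in the deployed system.

    import cv2
    import numpy as np

    def face_pitch_yaw(model_points_3d: np.ndarray, image_points_2d: np.ndarray,
                       image_width: int, image_height: int):
        """Estimate pitch and yaw (in degrees) from corresponding 3D model points
        and 2D face key points (the classic head-pose example uses six points)."""
        focal = image_width  # rough focal-length approximation
        camera_matrix = np.array([[focal, 0, image_width / 2.0],
                                  [0, focal, image_height / 2.0],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
        _, rvec, _ = cv2.solvePnP(model_points_3d, image_points_2d,
                                  camera_matrix, dist_coeffs)
        rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
        sy = np.sqrt(rot[0, 0] ** 2 + rot[1, 0] ** 2)
        pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))  # rotation about x
        yaw = np.degrees(np.arctan2(-rot[2, 0], sy))          # rotation about y
        return pitch, yaw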
In step S530, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance are determined according to the pitch angle and the yaw angle.
Specifically, for the horizontal region, the ratio of the pitch angle to the sum of the pitch angle and the yaw angle may be taken as the first weight A, that is:
A = pitch / (pitch + yaw)
for the vertical region, the ratio of yaw angle to the sum of pitch angle and yaw angle may be taken as a second weight B, namely:
B = yaw / (pitch + yaw)
in step S540, the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is acquired.
In step S550, the enlargement factor is determined as a ratio of the width of the image pickup page to the sum of the products.
Specifically, after obtaining a first relative distance of the horizontal area, a second relative distance of the vertical area, a first weight corresponding to the first relative distance, and a second weight corresponding to the second relative distance, the client calculates a sum of a product between the first relative distance and the first weight and a product between the second relative distance and the second weight. The sum of the products can be obtained by the following formula:
scaleHelpValue=ewidth*A+nHeight*B
wherein scaleHelpValue represents the sum of the products; ewidth represents the first relative distance of the horizontal region; nHeight represents the second relative distance of the vertical region; A represents the first weight; and B represents the second weight.
And finally, calculating the ratio of the width of the image acquisition page in the preset direction to the sum of the products as an amplification factor. The amplification factor can be obtained by the following formula:
scaleValue = width / scaleHelpValue
wherein, scaleValue represents an amplification factor; and width represents the width of the image acquisition page in the preset direction.
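Putting steps S510 to S550 together, the Python sketch below computes the amplification factor exactly as described by the formulas above, assuming the four reference key points are available as (x, y) pixel coordinates. Taking the absolute values of the pitch and yaw angles is an added assumption so that negative angles do not produce negative weights.

    import numpy as np

    def amplification_factor(left_eye, right_eye, glabella, nose_tip,
                             pitch: float, yaw: float, page_width: float) -> float:
        """scaleValue = width / (ewidth * A + nHeight * B)."""
        ewidth = np.linalg.norm(np.asarray(right_eye) - np.asarray(left_eye))    # first relative distance
        n_height = np.linalg.norm(np.asarray(nose_tip) - np.asarray(glabella))   # second relative distance
        p, y = abs(pitch), abs(yaw)            # assumption: use angle magnitudes
        total = max(p + y, 1e-6)
        weight_a = p / total                   # first weight A
        weight_b = y / total                   # second weight B
        scale_help_value = ewidth * weight_a + n_height * weight_b
        return page_width / max(scale_help_value, 1e-6)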
In the embodiment, the amplification factor can be quickly obtained according to the pre-configured calculation formula, so that the problem of performance bottleneck caused by more key points is solved, and the acquisition efficiency of the amplification factor is accelerated. The weights corresponding to the horizontal area and the vertical area are obtained by combining the three-dimensional angles of the face, and then reasonable amplification coefficients are obtained based on the weights, so that the face amplification precision can be ensured.
FIG. 6 is a flow diagram illustrating an image processing method according to an exemplary embodiment. In this embodiment, the terminal is a user handheld device with a built-in image capture device, for example, a smart phone, a tablet computer, a portable wearable device, and the like. The image to be processed is a current frame human face image frame in human face video data collected in real time through user handheld equipment. As shown in fig. 6, the following steps are included.
In step S602, facial video data is captured by a user handheld device.
In step S604, a face is identified in the current face image frame in the face video data through the deep learning model, so as to obtain a face key point.
In step S606, a three-dimensional face angle obtained by performing face pose estimation according to the face key points is obtained. The three-dimensional angle of the human face is characterized by Euler angles, including a pitch angle and a yaw angle.
In step S608, reference key points of the T-shaped region are extracted from the face key points. The reference key points include a left-eye key point, a right-eye key point, a nose tip key point, and a glabellar key point.
In step S610, a first relative distance of the horizontal area is calculated according to the position information of the left-eye key point and the right-eye key point, and a second relative distance of the vertical area is calculated according to the position information of the nose tip key point and the glabellar key point.
In step S612, a ratio of the pitch angle to a sum of the pitch angle and the yaw angle is used as a first weight; and taking the ratio of the yaw angle to the sum of the pitch angle and the yaw angle as a second weight.
In step S614, the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is acquired.
In step S616, the enlargement factor is determined as a ratio of the width of the image pickup page to the sum of the products.
In step S618, a target position corresponding to the position of the nose tip key point of the current frame is determined from the correspondence between the preset positions of the nose tip key points and the target positions. The correspondence between the positions of the key points of the tip of the nose and the target positions can be represented by a piecewise function, and the specific implementation manner of the piecewise function can refer to the above embodiments, which are not specifically set forth herein.
In step S620, the face is moved until the nose tip key point reaches the target position, and the face is amplified by the amplification factor with the nose tip key point as the center.
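For reference, a sketch that strings the helper functions from the earlier sketches into the per-frame flow of steps S604 to S620. The detect_face_keypoints and estimate_pitch_yaw callables are hypothetical stand-ins for the deployed key-point model and the pose-estimation step, respectively.

    import numpy as np

    def process_frame(frame: np.ndarray, detect_face_keypoints, estimate_pitch_yaw,
                      page_width: float = 720.0):
        """One frame of the flow: detect key points, estimate pose, compute the
        amplification factor and the target position of the nose tip key point."""
        keypoints = detect_face_keypoints(frame)                      # S604
        ref = extract_reference_keypoints(keypoints)                  # S608
        pitch, yaw = estimate_pitch_yaw(keypoints)                    # S606
        scale = amplification_factor(ref["left_eye"], ref["right_eye"],
                                     ref["glabella"], ref["nose_tip"],
                                     pitch, yaw, page_width)          # S610-S616
        target_x = target_position_x(float(ref["nose_tip"][0]), page_width)  # S618
        return scale, target_x                                        # used in S620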
Fig. 7 is a schematic view of a face processed in the manner of the present embodiment; fig. 8 is a schematic diagram of the single-key-point processing manner in the related art. Comparing fig. 7 and fig. 8, it can be seen that for the same original image, the related-art processing based on a single key point is not stable enough and is prone to distortion (the left ear is stretched too much). The manner of the present disclosure reduces the computational load on the device while achieving a better image processing effect.
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the illustrated order and may be performed in other orders. Moreover, at least a part of the steps in the above flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
Fig. 9 is a block diagram illustrating an image processing apparatus 900 according to an example embodiment. Referring to fig. 9, the apparatus 900 includes an image acquisition module 901, a face recognition module 902, a coefficient determination module 903, and a face processing module 904.
An image collecting module 901 configured to perform real-time collection of face video data, and take each frame of face image frame in the face video data as an image to be processed; a face recognition module 902 configured to perform face recognition on the image to be processed to obtain face key points; a coefficient determining module 903 configured to extract a reference key point of a preset region from the face key points, and determine an amplification coefficient according to position information of the reference key point; and the face processing module 904 is configured to perform face amplification according to the amplification factor, and perform face tracking on the amplified face according to the face motion information obtained by face recognition.
In an exemplary embodiment, the apparatus further comprises: a key point acquisition module configured to perform acquisition of a target key point in a face center region from reference key points; the position determining module is configured to determine a target position corresponding to a target key point according to the corresponding relation between the position of a preset target key point and the target position; and the moving module is configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
In an exemplary embodiment, the preset correspondence between the positions of the target key points and the target positions includes: the position of the target key point comprises a plurality of value intervals, each value interval corresponds to a corresponding change rate, the change rate is the degree of the target position changing along with the position change of the target key point, and the value intervals are determined based on the size of the image acquisition page in the preset direction.
In an exemplary embodiment, the size of an image acquisition page in a preset direction is divided to obtain a first value space, a second value space and a third value space which are connected in sequence; the change rate is the degree of change of the target position along with the position change of the target key point, and comprises the following steps: obtaining a coordinate value in a preset direction from the position of the target key point; when the coordinate value is located in the first value space or the third value space, the change rate of the target position is a first change rate; when the coordinate value is in the second value space, the change rate of the target position is a second change rate; the first rate of change is greater than the second rate of change.
In an exemplary embodiment, the target keypoints are nose tip keypoints.
In an exemplary embodiment, the coefficient determination module 903 comprises: a distance determination unit configured to perform determining a first relative distance of the horizontal area and a second relative distance of the vertical area from the position information of the reference keypoint; the angle acquisition unit is configured to acquire a human face three-dimensional angle, and the human face three-dimensional angle comprises a pitch angle and a yaw angle; a weight determination unit configured to perform determination of a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance from the pitch angle and the yaw angle; a calculation unit configured to perform obtaining a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight; a coefficient determination unit configured to perform determination of an amplification factor as a ratio of a width of the image acquisition page to a sum of the products.
In an exemplary embodiment, the weight determination unit is configured to perform determining the first weight as a ratio of a pitch angle to a sum of a pitch angle and a yaw angle; determining a second weight as a ratio of yaw angle to a sum of pitch angle and yaw angle.
In an exemplary embodiment, the preset region is a face T-shaped region, and the face T-shaped region includes a forehead central region and a face central region; the reference key points comprise a left eye key point, a right eye key point, a glabellar key point and a nose tip key point; a distance determination unit configured to perform determination that the first relative distance is a distance between the left-eye key point and the right-eye key point; determining the second relative distance as the distance between the glabellar key point and the nose tip key point.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 shows a block diagram of an apparatus 1000 for image processing according to an example embodiment. For example, the device 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 10, device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and a communication component 1016.
The processing component 1002 generally controls the overall operation of the device 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 1002 may include one or more processors 1020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation at the device 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1004 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 1006 provides power to the various components of the device 1000. The power component 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the device 1000 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1008 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 1000 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a Microphone (MIC) configured to receive external audio signals when the device 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1014 includes one or more sensors for providing status assessments of various aspects of the device 1000. For example, the sensor component 1014 may detect the open/closed status of the device 1000 and the relative positioning of components, such as the display and keypad of the device 1000, and may also detect a change in position of the device 1000 or a component of the device 1000, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and a change in temperature of the device 1000. The sensor component 1014 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor component 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the device 1000 and other devices. The device 1000 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1016 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1004 comprising instructions, executable by the processor 1020 of the device 1000 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. An image processing method, comprising:
acquiring face video data in real time, and taking each face image frame in the face video data as an image to be processed;
carrying out face recognition on the image to be processed to obtain face key points;
extracting reference key points of a preset region from the face key points according to key point related information of pre-configured reference key points, and determining an amplification factor according to position information of the reference key points, wherein the amplification factor is generated based on the size of the preset region, and the size of the preset region is calculated according to the position information of the reference key points;
moving the face until a target key point in the reference key points reaches a target position in the image to be processed, wherein the target position is used to cause the face to be displayed near the center of an image acquisition page;
and amplifying the face according to the amplification factor, and performing face tracking on the amplified face according to face motion information obtained by the face recognition.
2. The image processing method according to claim 1, characterized in that the method further comprises:
acquiring the target key point in a face central region from the reference key points;
determining the target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position;
and moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
3. The image processing method according to claim 2, wherein the preset correspondence between the position of the target key point and the target position comprises:
the position of the target key point falls into one of a plurality of value intervals, each value interval corresponds to a change rate, the change rate is the degree to which the target position changes with the change in the position of the target key point, and the value intervals are determined based on the size of the image acquisition page in a preset direction.
4. The image processing method according to claim 3, wherein the size of the image acquisition page in the preset direction is divided to obtain a first value interval, a second value interval and a third value interval which are connected in sequence; and the degree to which the target position changes with the change in the position of the target key point is determined by:
obtaining a coordinate value in the preset direction from the position of the target key point;
when the coordinate value is located in the first value interval or the third value interval, the change rate of the target position is a first change rate;
when the coordinate value is located in the second value interval, the change rate of the target position is a second change rate;
wherein the first change rate is greater than the second change rate.
5. The image processing method according to claim 3, wherein the target key point is a nose tip key point.
6. The image processing method according to claim 1, wherein determining the amplification factor according to the position information of the reference key points comprises:
determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points, wherein the horizontal area and the vertical area are obtained by dividing the preset region;
acquiring a three-dimensional face angle, wherein the three-dimensional face angle comprises a pitch angle and a yaw angle;
determining a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle;
obtaining a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight;
and determining the amplification factor as the ratio of the width of the image acquisition page to the sum of the products.
7. The image processing method according to claim 6, wherein the determining a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle comprises:
determining the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle;
determining the second weight as a ratio of the yaw angle to a sum of the pitch angle and the yaw angle.
8. The image processing method according to claim 6, wherein the preset region is a face T-shaped region, and the face T-shaped region comprises a forehead central region and a face central region; the reference key points comprise a left-eye key point, a right-eye key point, a glabellar key point and a nose tip key point;
the determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points comprises:
determining the first relative distance as the distance between the left-eye key point and the right-eye key point;
determining the second relative distance as a distance between the glabellar key point and the nose tip key point.
9. An image processing apparatus, characterized by comprising:
an image acquisition module configured to acquire face video data in real time and take each face image frame in the face video data as an image to be processed;
a face recognition module configured to perform face recognition on the image to be processed to obtain face key points;
a coefficient determination module configured to extract reference key points of a preset region from the face key points according to key point related information of pre-configured reference key points, and determine an amplification coefficient according to position information of the reference key points, wherein the amplification coefficient is generated based on the size of the preset region, and the size of the preset region is calculated according to the position information of the reference key points;
a moving module configured to move the face until a target key point in the reference key points reaches a target position in the image to be processed, wherein the target position is used to cause the face to be displayed near the center of an image acquisition page; and
a face processing module configured to amplify the face according to the amplification coefficient and perform face tracking on the amplified face according to face motion information obtained by the face recognition.
10. The image processing apparatus according to claim 9, characterized in that the apparatus further comprises:
a key point obtaining module configured to obtain the target key point in a face central region from the reference key points; and
a position determining module configured to determine the target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position;
wherein the moving module is configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
11. The image processing apparatus according to claim 10, wherein the preset correspondence between the position of the target key point and the target position comprises:
the position of the target key point falls into one of a plurality of value intervals, each value interval corresponds to a change rate, the change rate is the degree to which the target position changes with the change in the position of the target key point, and the value intervals are determined based on the distance between the position of the target key point and the boundary of the image acquisition page.
12. The image processing apparatus according to claim 11, wherein the degree to which the target position changes with the change in the position of the target key point is determined by:
when the distance between the target key point and the boundary of the image acquisition page is smaller than a threshold, the change rate of the target position is a first change rate;
when the distance is greater than or equal to the threshold, the change rate of the target position is a second change rate;
wherein the first change rate is greater than the second change rate.
13. The image processing apparatus according to claim 11, wherein the target key point is a nose tip key point.
14. The image processing apparatus according to claim 9, wherein the coefficient determination module comprises:
a distance determination unit configured to determine a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points, the horizontal area and the vertical area being obtained by dividing the preset region;
an angle acquisition unit configured to acquire a three-dimensional face angle, the three-dimensional face angle comprising a pitch angle and a yaw angle;
a weight determination unit configured to determine a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle;
a calculation unit configured to obtain a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight; and
a coefficient determination unit configured to determine the amplification coefficient as a ratio of a width of the image acquisition page to the sum of the products.
15. The image processing apparatus according to claim 14, wherein the weight determination unit is configured to determine the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle, and determine the second weight as a ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
16. The image processing apparatus according to claim 14, wherein the preset region is a face T-shaped region, and the face T-shaped region comprises a forehead central region and a face central region; the reference key points comprise a left-eye key point, a right-eye key point, a glabellar key point and a nose tip key point; and
the distance determination unit is configured to determine the first relative distance as a distance between the left-eye key point and the right-eye key point, and determine the second relative distance as a distance between the glabellar key point and the nose tip key point.
17. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the image processing method of any one of claims 1 to 8.
18. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method of any one of claims 1 to 8.
CN202011434480.4A 2020-12-10 2020-12-10 Image processing method, image processing device, electronic equipment and storage medium Active CN112509005B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011434480.4A CN112509005B (en) 2020-12-10 2020-12-10 Image processing method, image processing device, electronic equipment and storage medium
PCT/CN2021/128769 WO2022121577A1 (en) 2020-12-10 2021-11-04 Image processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011434480.4A CN112509005B (en) 2020-12-10 2020-12-10 Image processing method, image processing device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112509005A CN112509005A (en) 2021-03-16
CN112509005B true CN112509005B (en) 2023-01-20

Family

ID=74970472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011434480.4A Active CN112509005B (en) 2020-12-10 2020-12-10 Image processing method, image processing device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112509005B (en)
WO (1) WO2022121577A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509005B (en) * 2020-12-10 2023-01-20 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113778233B (en) * 2021-09-16 2022-04-05 广东魅视科技股份有限公司 Method and device for controlling display equipment and readable medium
CN116306733B (en) * 2023-02-27 2024-03-19 荣耀终端有限公司 Method for amplifying two-dimensional code and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108550185A (en) * 2018-05-31 2018-09-18 Oppo广东移动通信有限公司 Beautifying faces treating method and apparatus
CN110175558A (en) * 2019-05-24 2019-08-27 北京达佳互联信息技术有限公司 A kind of detection method of face key point, calculates equipment and storage medium at device
CN110415164A (en) * 2018-04-27 2019-11-05 武汉斗鱼网络科技有限公司 Facial metamorphosis processing method, storage medium, electronic equipment and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343B (en) * 2018-02-06 2019-06-07 北京达佳互联信息技术有限公司 Image processing method, system and server
CN110675310B (en) * 2019-07-02 2020-10-02 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112509005B (en) * 2020-12-10 2023-01-20 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415164A (en) * 2018-04-27 2019-11-05 武汉斗鱼网络科技有限公司 Facial metamorphosis processing method, storage medium, electronic equipment and system
CN108550185A (en) * 2018-05-31 2018-09-18 Oppo广东移动通信有限公司 Beautifying faces treating method and apparatus
CN110175558A (en) * 2019-05-24 2019-08-27 北京达佳互联信息技术有限公司 A kind of detection method of face key point, calculates equipment and storage medium at device

Also Published As

Publication number Publication date
CN112509005A (en) 2021-03-16
WO2022121577A1 (en) 2022-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant