WO2022121577A1 - Image processing method and apparatus

Image processing method and apparatus

Info

Publication number
WO2022121577A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
key point
target
image
relative distance
Prior art date
2020-12-10
Application number
PCT/CN2021/128769
Other languages
French (fr)
Chinese (zh)
Inventor
刘易周 (Liu Yizhou)
Original Assignee
北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co., Ltd. (北京达佳互联信息技术有限公司)
Publication of WO2022121577A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformation in the plane of the image
    • G06T3/40 - Scaling the whole image or part thereof
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G06T2207/30201 - Face
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, an apparatus, an electronic device, and a storage medium.
  • a deep neural network can be used to perform semantic segmentation on images captured statically or in real time by the terminal, to obtain image processing results such as face key points, a hairstyle-area mask map, and a facial-features mask map.
  • the processing results are used to achieve many creative effects, such as enlargement and dislocation of facial features, face stickers, and virtual makeup.
  • an image processing method is provided, including: collecting face video data, and using each frame of face image in the face video data as an image to be processed; performing face recognition on the image to be processed to obtain face key points; extracting reference key points of a preset area from the face key points, and determining an amplification factor according to the position information of the reference key points; and enlarging the face according to the amplification factor, and performing face tracking on the enlarged face according to the face motion information obtained by the face recognition.
  • the method further includes: obtaining a target key point in the central area of the face from the reference key points; determining, according to a preset correspondence between the position of the target key point and the target position, the target position corresponding to the target key point; and moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point includes a plurality of value intervals, each value interval corresponding to a rate of change; the rate of change is the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval connected in sequence; the rate of change is determined as follows: a coordinate value in the preset direction is obtained from the position of the target key point; in response to the coordinate value being located in the first value interval or the third value interval, the rate of change of the target position is a first rate of change; in response to the coordinate value being located in the second value interval, the rate of change of the target position is a second rate of change; the first rate of change is greater than the second rate of change.
  • the target key point is a nose tip key point.
  • the determining the magnification factor according to the position information of the reference key points includes: determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points; obtaining a three-dimensional face angle, the three-dimensional face angle including a pitch angle and a yaw angle; determining, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; obtaining the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and determining the magnification factor as the ratio of the width of the image capture page to the sum of the products.
  • the determining, according to the pitch angle and the yaw angle, the first weight corresponding to the first relative distance and the second weight corresponding to the second relative distance includes: determining the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle; and determining the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area includes the central area of the forehead and the central area of the face;
  • the reference key points include a left eye key point, a right eye key point, a key point between the eyebrows, and a nose tip key point;
  • the determining of the first relative distance of the horizontal area and the second relative distance of the vertical area according to the position information of the reference key points includes: determining the first relative distance as the distance between the left eye key point and the right eye key point; and determining the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
  • an image processing apparatus is provided, including: an image acquisition module configured to collect face video data and use each frame of face image in the face video data as an image to be processed; a face recognition module configured to perform face recognition on the to-be-processed image to obtain face key points; a coefficient determination module configured to extract reference key points of a preset area from the face key points and determine an amplification factor according to the position information of the reference key points; and a face processing module configured to enlarge the face according to the amplification factor and perform face tracking on the enlarged face according to the face motion information obtained by the face recognition.
  • the apparatus further includes: a key point acquisition module configured to acquire a target key point in the central area of the face from the reference key points; a position determination module configured to determine, according to a preset correspondence between the position of the target key point and the target position, the target position corresponding to the target key point; and a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point includes a plurality of value intervals, each value interval corresponding to a rate of change; the rate of change is the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval connected in sequence; the rate of change is determined as follows: a coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is located in the first value interval or the third value interval, the rate of change of the target position is a first rate of change; when the coordinate value is located in the second value interval, the rate of change of the target position is a second rate of change; the first rate of change is greater than the second rate of change.
  • the target key point is a nose tip key point.
  • the coefficient determination module includes: a distance determination unit configured to determine a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points; an angle acquisition unit configured to obtain a three-dimensional face angle, the three-dimensional face angle including a pitch angle and a yaw angle; a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; a computing unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a coefficient determination unit configured to determine the enlargement coefficient as the ratio of the width of the image capture page to the sum of the products.
  • the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area includes the central area of the forehead and the central area of the face;
  • the reference key points include a left eye key point, a right eye key point, a key point between the eyebrows, and a nose tip key point;
  • the distance determination unit is configured to determine the first relative distance as the distance between the left eye key point and the right eye key point, and to determine the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
  • an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to execute the instructions to implement the image processing method described in any one of the embodiments of the first aspect.
  • a storage medium is provided, where, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the image processing method described in any one of the embodiments of the first aspect.
  • a computer program product is provided, including a computer program, the computer program being stored in a readable storage medium; at least one processor of a device reads and executes the computer program from the readable storage medium, so that the device executes the image processing method described in any one of the embodiments of the first aspect.
  • Each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points.
  • the magnification factor is determined based on the position information of the reference key point in the preset area.
  • the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by face recognition.
  • FIG. 1 is an application environment diagram of an image processing method according to an exemplary embodiment.
  • Fig. 2 is a flowchart of an image processing method according to an exemplary embodiment.
  • Fig. 3 is a flowchart showing a step of determining a target position according to an exemplary embodiment.
  • Fig. 4 is a schematic diagram of a piecewise function according to an exemplary embodiment.
  • Fig. 5 is a flowchart showing a step of determining an amplification factor according to an exemplary embodiment.
  • Fig. 6 is a flowchart of an image processing method according to an exemplary embodiment.
  • Fig. 7 is a schematic diagram of processing an image according to an exemplary embodiment.
  • Fig. 8 is a schematic diagram of processing an image according to an exemplary embodiment.
  • Fig. 9 is a block diagram of an image processing apparatus according to an exemplary embodiment.
  • Fig. 10 is an internal structure diagram of an electronic device according to an exemplary embodiment.
  • a deep neural network can be used to perform semantic segmentation on images captured statically or in real time by the terminal, to obtain image processing results such as face key points, a hairstyle-area mask map, and a facial-features mask map.
  • the processing results are used to achieve many creative effects, such as enlargement and dislocation of facial features, face stickers, and virtual makeup.
  • the face in the image can be focused, so that the face can be displayed at the center of the screen with a larger area.
  • the following two implementations are often used to focus on the human face:
  • the image processing method provided by the present disclosure can be applied to the application environment shown in FIG. 1 .
  • the terminal 110 is pre-deployed with a face pose estimation method and with image processing logic that supports image processing based on the face recognition result.
  • the face pose estimation method can be a deep learning model-based method, an appearance-based method, a classification-based method, and the like. The face pose estimation method and the image processing logic can be embedded in an application. Such applications include, but are not limited to, social applications, instant messaging applications, short video applications, and the like.
  • the terminal 110 collects face video data and uses each frame of face image in the face video data as an image to be processed. Face recognition is performed on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points, and the amplification factor is determined according to the position information of the reference key points; the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by the face recognition.
  • the terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • FIG. 2 is a flowchart of an image processing method according to an exemplary embodiment. As shown in FIG. 2 , the image processing method used in the terminal 110 includes the following steps.
  • step S210 face video data is collected, and each frame of face image in the face video data is used as an image to be processed.
  • the face video data can be acquired by the image acquisition device.
  • the image acquisition device may be a device provided in the terminal; it may also be an independent device, such as a camera, a video camera, and the like.
  • the client may automatically control the image acquisition device to acquire the user's face video data after receiving the image processing instruction.
  • the image processing instruction may be triggered by the user by clicking on a preset face processing control or the like.
  • the client takes the face image of the current frame in the face video data as the image to be processed, and processes the face image of the current frame in real time according to the content described in steps S220 to S240 while collecting the face video data.
  • the object to be processed may also be a human body part other than the face, such as a hand or a limb; it may even be another type of object, such as an animal, a building, or a star.
  • the to-be-processed image may also be a pre-shot still image stored in a local database or a server, or a still image captured in real time.
  • step S220 face recognition is performed on the image to be processed to obtain face key points.
  • a method based on a deep learning model can be used for face recognition of the image to be processed.
  • the deep learning model can be any model that can be used for face key point recognition, for example, a DCNN (deep convolutional neural network) and the like.
  • the face key points can be predefined, and there is at least one of them.
  • each sample image is annotated according to pre-defined key point related information (such as key point ranking, key point positions, etc.).
  • the labeled sample images are used to train the deep learning model to obtain a deep learning model that can output the position information of the key points of the face.
  • the client inputs the acquired image to be processed into the trained deep learning model in real time to obtain the face key points.
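
As a concrete illustration of this step, the sketch below runs a key point model on each frame with OpenCV's DNN module. It is a minimal sketch only: the model file name, input size, and output layout are assumptions for illustration, not part of the disclosure.

```python
import cv2
import numpy as np

# Hypothetical exported keypoint model; the disclosure only requires a deep
# learning model that outputs the position information of face key points.
net = cv2.dnn.readNetFromONNX("face_keypoints.onnx")

def detect_face_keypoints(frame_bgr: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of face key point pixel coordinates."""
    h, w = frame_bgr.shape[:2]
    # Normalize and resize to the model's expected input (assumed 128x128).
    blob = cv2.dnn.blobFromImage(frame_bgr, scalefactor=1.0 / 255, size=(128, 128))
    net.setInput(blob)
    out = net.forward()        # assumed shape (1, N*2), values in [0, 1]
    pts = out.reshape(-1, 2).astype(np.float32)
    pts[:, 0] *= w             # scale back to pixel coordinates
    pts[:, 1] *= h
    return pts
```
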
  • step S230 the reference key points of the preset area are extracted from the face key points, and the magnification coefficient is determined according to the position information of the reference key points.
  • the preset area refers to an area that can roughly represent the position of the face in the image to be processed, for example, it may be a contour area, a center area, and the like of the human face.
  • there are typically multiple reference key points in the preset area, although the case of a single reference key point is not excluded.
  • the magnification factor is used to amplify the human face, so as to increase the proportion of the human face in the image capture page, so as to achieve the effect of highlighting the human face.
  • the key point related information of the reference key point is pre-configured in the client.
  • the client can extract the reference key points from the face key points according to the key point related information of the reference key points.
  • the client obtains the location information of the reference key point.
  • the size of the preset area is calculated according to the position information of the reference key points.
  • the size of the preset area can be characterized by parameters such as the frame size of the preset area and the distance between key points in the preset area.
  • the amplification factor is obtained by a preset algorithm, which can be chosen according to the specific situation. For example, the size of the preset area can be compared with a preset constant to obtain the magnification factor; a matching magnification factor can be looked up from a preset correspondence; or a correlation function can be established by summarizing multiple experiments, and the magnification factor can be calculated from the size of the preset area through the correlation function.
  • step S240 the human face is enlarged according to the enlargement coefficient, and face tracking is performed on the enlarged human face according to the face motion information obtained by the face recognition.
  • the face motion information includes, but is not limited to, the face motion angle, the face motion trajectory, and the like; the face motion information can be output synchronously when the face key points are output by the deep learning model.
  • after obtaining the magnification factor, the client displays the face in the current to-be-processed image enlarged by the magnification factor. At the same time, the client obtains the face motion information of the current frame and controls the enlarged face in the to-be-processed image to move according to the face motion information, so as to achieve a real-time face tracking effect.
  • the process of face enlargement can be displayed through animation special effects.
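
Enlarging the face about a center point by the magnification factor can be expressed as a single affine warp. The sketch below is one possible realization with OpenCV; the function name and the use of warpAffine are illustrative choices, not prescribed by the disclosure.

```python
import cv2
import numpy as np

def enlarge_about_point(frame: np.ndarray, center: tuple, scale: float) -> np.ndarray:
    """Scale the frame by `scale` about `center`, so the face grows in place.

    For a point p, the warp computes p' = scale * p + (1 - scale) * center,
    which leaves `center` fixed while everything else moves away from it.
    """
    h, w = frame.shape[:2]
    cx, cy = center
    M = np.float32([[scale, 0.0, (1.0 - scale) * cx],
                    [0.0, scale, (1.0 - scale) * cy]])
    return cv2.warpAffine(frame, M, (w, h))
```
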
  • each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points.
  • the magnification factor is determined based on the position information of the reference key point in the preset area.
  • the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by face recognition.
  • the processing of the human face in the image may further include a process of moving the human face. This can be achieved by the following steps:
  • step S310 the target key point in the central area of the face is obtained from the reference key point.
  • step S320 the target position corresponding to the target key point is determined according to the preset correspondence between the position of the target key point and the target position.
  • step S330 the face is moved in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the target key point refers to the key point that can roughly represent the central position of the face.
  • the preset area can be the facial features area (including the areas of the eyes, nose, and mouth), the T-shaped area of the face (including the central area of the forehead and the central area of the face), the nose bridge area (the area between the forehead and the nose tip), and the like; the target key point can be selected correspondingly within the preset area, for example, a nose tip key point.
  • the target position refers to the position where the target key point is moved, and is used to display the face near the center of the image capture page without affecting the rendering effect of the face.
  • keypoint-related information of target keypoints may be pre-configured in the client.
  • the target key point is extracted from the face key points according to the key point related information of the target key point.
  • the target position corresponding to the target key point of the current frame is determined from the preset correspondence between the position of the target key point and the target position. The client then moves the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • after or before the face is moved, the face can also be enlarged by the enlargement factor obtained in the above embodiments, with the target key point as the center.
  • the moving and enlarging process of the human face can be displayed through animation special effects.
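
One simple way to animate the movement toward the target position is to cover a fixed fraction of the remaining distance on each frame, which produces a smooth ease-out glide instead of a jump. The easing fraction below is an assumption for illustration; the disclosure only requires that the face keeps moving until the target key point reaches the target position.

```python
def step_toward_target(current: tuple, target: tuple, fraction: float = 0.3) -> tuple:
    """Advance the key point a fraction of the remaining distance per frame."""
    x, y = current
    tx, ty = target
    return (x + (tx - x) * fraction, y + (ty - y) * fraction)

# Per frame, the whole face is translated by the same offset that was applied
# to the target key point, so the face and the key point move together.
```
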
  • the target key points are extracted from the reference key points, and the target position is obtained based on the target key points.
  • on the one hand, the efficiency of face processing can be improved; on the other hand, since the target key point can roughly represent the central position of the face, using the target key point can also improve the accuracy of face processing.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point includes a plurality of value intervals, each value interval corresponds to a corresponding change rate, and the change rate is The degree to which the target position changes with the position of the target key point, and the value interval is determined based on the size of the image capture page in the preset direction.
  • the image collection page may be an image page displayed by the client.
  • the preset direction can be determined according to the shooting angle of the image to be processed. For example, when shooting in portrait (vertical-screen) mode, the preset direction may be the horizontal direction when the terminal device is held in portrait orientation.
  • the size in the preset direction can be characterized by the pixel size.
  • the rate of change is used to measure the degree to which the target position changes as the position of the target keypoints within the image to be processed changes.
  • the position of the target key point relative to the image to be processed may fall in an edge region or a central region of the image to be processed.
  • the value intervals of the edge area and the central area can be predefined.
  • the rate of change corresponding to the value interval of the edge area is different from the rate of change corresponding to the value interval of the central area.
  • the image processing process can present a better face tracking effect, and the presentation effect of the face will not be distorted.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval connected in sequence; the rate of change is determined in the following manner: the coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is in the first value interval or the third value interval, the rate of change of the target position is the first rate of change; when the coordinate value is in the second value interval, the rate of change of the target position is the second rate of change; the first rate of change is greater than the second rate of change.
  • the first rate of change and/or the second rate of change may be constant or variable.
  • sequential connection means that the first value interval and the second value interval are connected end to end, and the second value interval and the third value interval are connected end to end.
  • the first value interval and the third value interval can be used to represent the edge areas of the image capture page.
  • the second value interval can be used to represent the central area of the image capture page.
  • for example, the resolution of the image acquisition device is 720*1280 px (pixels), where 720 is the horizontal pixel width and 1280 is the vertical pixel height when the terminal device is held in portrait orientation.
  • the preset direction is the horizontal direction when the terminal device is held in portrait orientation.
  • the pixel width 720 in the horizontal direction can be divided to obtain three value intervals connected in sequence, for example, divided into three value intervals of 0-200, 200-520, and 520-720.
  • the locations of target keypoints may be characterized by pixel coordinates.
  • the client obtains the coordinate value in the preset direction from the position of the target key point and determines which value interval the coordinate value belongs to. If it belongs to the first value interval or the third value interval, the target key point is located in the edge area of the image capture page, and the client obtains the target position according to the first rate of change; if the coordinate value is in the second value interval, the target key point is located in the central area of the image capture page, and the client obtains the target position according to the second rate of change. Since the first rate of change is greater than the second rate of change, the change range of the target position in the edge area is greater than that in the central area.
  • the pixels of the image to be processed are 720*1280px, and 720 is the horizontal pixel width when the terminal device is placed in a vertical screen.
  • the horizontal pixels are divided into three value ranges of 0-200, 200-520, and 520-720 which are connected in sequence.
  • the pixel coordinates of the target position in the horizontal direction can be obtained by the following piecewise function:
  • offset and centerPosX are the pixel coordinates of the target position in the horizontal direction; curPixelPosX is the pixel coordinates of the target key point in the horizontal direction.
  • the target position can also be obtained with reference to the piecewise function shown in FIG. 4 .
  • the pixel coordinates of the target position can also be mapped to the space of 0-1.
  • the target position changes rapidly from (0, y) to (0.5, y) with the position of the target key point;
  • the target position changes smoothly around (0.5, y) with the position change of the target key point, or even does not change.
  • y represents the coordinate value of the target key point in the vertical direction. It can be understood that although the rate of change of the piecewise function shown in FIG. 4 (which can be represented by a slope) is a constant, the rate of change of the piecewise function may also be an indefinite number in practical applications.
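
Since the piecewise function itself is not reproduced above, the sketch below shows one continuous mapping consistent with the described behavior for the 0-200, 200-520, and 520-720 intervals: a larger slope (the first rate of change) in the two edge intervals and a smaller slope (the second rate of change) in the central interval. All slope and anchor values are assumptions for illustration.

```python
def target_pos_x(cur_pixel_pos_x: float) -> float:
    """Map the target key point's x coordinate to the target position's x."""
    if cur_pixel_pos_x < 200:                 # first value interval (left edge)
        return cur_pixel_pos_x * 1.5          # assumed first rate of change
    elif cur_pixel_pos_x <= 520:              # second value interval (center)
        return 300 + (cur_pixel_pos_x - 200) * 0.375  # assumed second rate
    else:                                     # third value interval (right edge)
        return 420 + (cur_pixel_pos_x - 520) * 1.5
```

The branch constants are chosen so the mapping is continuous (the edge branches meet the central branch at x = 200 and x = 520) and covers the full 0-720 range, with the edge slope 1.5 greater than the central slope 0.375.
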
  • in this way, the efficiency of image processing can be improved, so that the image processing process can present a better face tracking effect, and the presentation effect of the face will not be distorted.
  • the amplification factor is determined according to the position information of the reference key point, which can be achieved by the following steps:
  • step S510 the first relative distance of the horizontal area and the second relative distance of the vertical area are determined according to the position information of the reference key point.
  • the horizontal area and the vertical area are obtained by dividing the preset area.
  • for the horizontal area, a group of key points located at the two ends of the horizontal area is acquired, and the first relative distance of the horizontal area is calculated from the position information of this group of key points.
  • similarly, for the vertical area, a group of key points located at the two ends of the vertical area is acquired, and the second relative distance of the vertical area is calculated from the position information of this group of key points.
  • the preset area is a T-shaped area of the human face, and the T-shaped area of the human face includes the central area of the forehead and the central area of the human face; the reference key points include the left eye key point, the right eye key point, the eyebrow key point and the tip of the nose key point.
  • the first relative distance may be the distance between the left eye key point and the right eye key point, which may be calculated according to the position information of the left eye key point and the right eye key point.
  • the second relative distance may be the distance between the key point between the eyebrows and the key point of the tip of the nose, which may be calculated according to the position information of the key point between the eyebrows and the key point of the tip of the nose.
  • step S520 a three-dimensional angle of the face is obtained, and the three-dimensional angle of the face includes a pitch angle and a yaw angle.
  • the three-dimensional angle of the face can be represented by Euler angles.
  • Euler angle refers to the rotation angle of an object around the three coordinate axes (x, y, z axis) of the coordinate system.
  • the Euler angles can be obtained by performing pose estimation on the face key points, for example, with the pose estimation algorithm of OpenCV (an open source computer vision library).
  • the Euler angle includes a pitch angle and a yaw angle.
  • the pitch angle (pitch) represents the angle that the object rotates around the x-axis
  • the yaw angle (yaw) represents the angle that the object rotates around the y-axis.
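
As one way to obtain the pitch and yaw, the sketch below feeds a handful of 2D key points and generic 3D face-model coordinates into OpenCV's solvePnP, then decomposes the resulting rotation into Euler angles. The 3D model values and the uncalibrated-camera approximation are assumptions commonly used in pose estimation examples, not values taken from the disclosure.

```python
import cv2
import numpy as np

# Generic 3D reference positions (arbitrary face-model units) for six key
# points; illustrative values, not from the disclosure.
MODEL_POINTS = np.float32([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
])

def face_pitch_yaw(image_points: np.ndarray, frame_size: tuple) -> tuple:
    """Estimate (pitch, yaw) in degrees from six matching 2D key points."""
    h, w = frame_size
    focal = w  # common approximation when the camera is uncalibrated
    camera_matrix = np.float32([[focal, 0, w / 2],
                                [0, focal, h / 2],
                                [0, 0, 1]])
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points.astype(np.float32),
                               camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)
    # Euler decomposition: pitch is rotation about x, yaw about y.
    sy = np.sqrt(rot[0, 0] ** 2 + rot[1, 0] ** 2)
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))
    return pitch, yaw
```
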
  • step S530 a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance are determined according to the pitch angle and the yaw angle.
  • the ratio of the pitch angle to the sum of the pitch angle and the yaw angle can be used as the first weight A, that is: A = pitch / (pitch + yaw).
  • the ratio of the yaw angle to the sum of the pitch angle and the yaw angle can be used as the second weight B, that is: B = yaw / (pitch + yaw).
  • step S540 the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is obtained.
  • step S550 the magnification factor is determined as the ratio of the width of the image capture page to the sum of the products.
  • after the client obtains the first relative distance of the horizontal area, the second relative distance of the vertical area, the first weight corresponding to the first relative distance, and the second weight corresponding to the second relative distance, it calculates the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight.
  • the sum of the products can be obtained by the following formula: scaleHelpValue = ewidth × A + nHeight × B, where scaleHelpValue represents the sum of the products, ewidth represents the first relative distance of the horizontal area, nHeight represents the second relative distance of the vertical area, A represents the first weight, and B represents the second weight.
  • the magnification factor can be obtained by the following formula: scaleValue = width / scaleHelpValue, where scaleValue represents the magnification factor and width represents the width of the image capture page in the preset direction.
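
Putting steps S510 to S550 together, a compact sketch of the magnification factor computation might look as follows. The identifiers mirror the formulas above (ewidth, nHeight, A, B, scaleHelpValue, scaleValue); taking absolute values of the angles is an added assumption so the weights stay well-defined when pitch or yaw is negative.

```python
import numpy as np

def magnification_factor(left_eye, right_eye, brow_center, nose_tip,
                         pitch, yaw, width):
    """Compute scaleValue = width / (ewidth * A + nHeight * B)."""
    ewidth = np.hypot(right_eye[0] - left_eye[0],
                      right_eye[1] - left_eye[1])      # first relative distance
    n_height = np.hypot(nose_tip[0] - brow_center[0],
                        nose_tip[1] - brow_center[1])  # second relative distance
    p, y = abs(pitch), abs(yaw)   # assumption: use magnitudes of the angles
    a = p / (p + y)               # first weight  A = pitch / (pitch + yaw)
    b = y / (p + y)               # second weight B = yaw  / (pitch + yaw)
    scale_help_value = ewidth * a + n_height * b
    return width / scale_help_value
```
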
  • the amplification factor can be quickly obtained according to the preconfigured calculation formula, which avoids the performance bottleneck caused by many key points, and at the same time speeds up the acquisition efficiency of the amplification factor.
  • Fig. 6 is a flowchart of an image processing method according to some embodiments.
  • the terminal is a user's handheld device with a built-in image acquisition device, such as a smart phone, a tablet computer, a portable wearable device, and the like.
  • the image to be processed is the face image of the current frame in the face video data collected by the user's handheld device. As shown in Figure 6, the following steps are included.
  • step S602 face video data is collected through the user's handheld device.
  • step S604 face recognition is performed on the face image of the current frame in the face video data by using a deep learning model to obtain face key points.
  • step S606 the three-dimensional face angle is obtained by performing face pose estimation according to the face key points.
  • the three-dimensional angle of the face is represented by Euler angles, including pitch angle and yaw angle.
  • step S608 the reference key points of the T-shaped area are extracted from the face key points.
  • the reference key points include the left eye key point, the right eye key point, the nose tip key point, and the key point between the eyebrows.
  • step S610 the first relative distance of the horizontal area is obtained by calculating according to the position information of the left eye key point and the right eye key point.
  • the second relative distance of the vertical area is calculated according to the position information of the nose tip key point and the key point between the eyebrows.
  • step S612 the ratio of the pitch angle to the sum of the pitch angle and the yaw angle is used as the first weight; the ratio of the yaw angle to the sum of the pitch angle and the yaw angle is used as the second weight.
  • step S614 the product of the first relative distance and the first weight and the sum of the product of the second relative distance and the second weight are obtained.
  • step S616 the magnification factor is determined as the ratio of the width of the image capture page to the sum of the products.
  • step S618 the target position corresponding to the position of the nose tip key point in the current frame is determined from the preset correspondence between the position of the nose tip key point and the target position.
  • the correspondence between the position of the key point of the nose tip and the target position can be represented by a piecewise function, and the specific implementation of the piecewise function can refer to the above-mentioned embodiments, which will not be described in detail here.
  • step S620 the human face is moved until the nose tip key point reaches the target position; then, taking the nose tip key point as the center, the face is enlarged by the magnification factor.
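
Steps S618 and S620 can be realized as one combined affine transform: first translate the image so the nose tip lands on the target position, then scale about that position. The composition below is a hedged sketch under those assumptions, not the disclosed implementation.

```python
import cv2
import numpy as np

def move_and_enlarge(frame, nose_tip, target_pos, scale):
    """Translate so the nose tip reaches target_pos, then scale about it."""
    h, w = frame.shape[:2]
    tx = target_pos[0] - nose_tip[0]   # translation that moves the nose tip
    ty = target_pos[1] - nose_tip[1]
    cx, cy = target_pos                # nose tip position after translation
    # Composition of translate-then-scale-about-(cx, cy):
    # p' = scale * (p + t) + (1 - scale) * c
    M = np.float32([[scale, 0.0, scale * tx + (1.0 - scale) * cx],
                    [0.0, scale, scale * ty + (1.0 - scale) * cy]])
    return cv2.warpAffine(frame, M, (w, h))
```
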
  • FIG. 7 is a schematic diagram obtained by processing a human face by the method in this embodiment
  • FIG. 8 is a schematic diagram obtained by using a single-key-point processing method in the related art. Comparing FIG. 7 and FIG. 8, it can be seen that for the same original image, the single-key-point processing method in the related art may not be stable enough and is prone to distortion (for example, the left ear is overly stretched). By means of the present disclosure, the operating pressure of the device can be reduced, and a better image processing effect can be obtained.
  • although the steps in the above flowcharts are displayed in sequence according to the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, there is no strict order for the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages; these sub-steps or stages are not necessarily executed at the same time and may be executed at different times, and their execution order is not necessarily sequential: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
  • FIG. 9 is a block diagram of an image processing apparatus 900 according to some embodiments.
  • the apparatus 900 includes an image acquisition module 901 , a face recognition module 902 , a coefficient determination module 903 and a face processing module 904 .
  • the image acquisition module 901 is configured to collect face video data and use each frame of face image in the face video data as an image to be processed; the face recognition module 902 is configured to perform face recognition on the image to be processed to obtain face key points; the coefficient determination module 903 is configured to extract the reference key points of the preset area from the face key points and determine the amplification factor according to the position information of the reference key points; the face processing module 904 is configured to enlarge the face according to the amplification factor and perform face tracking on the enlarged face according to the face motion information obtained by the face recognition.
  • the apparatus further includes: a key point acquisition module configured to acquire a target key point in the central area of the face from the reference key points; a position determination module configured to determine, according to the preset correspondence between the position of the target key point and the target position, the target position corresponding to the target key point; and a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point includes a plurality of value intervals, each value interval corresponds to a corresponding change rate, and the change rate is The degree to which the target position changes with the position of the target key point, and the value interval is determined based on the size of the image capture page in the preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval connected in sequence; the rate of change is determined in the following manner: the coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is in the first value interval or the third value interval, the rate of change of the target position is the first rate of change; when the coordinate value is in the second value interval, the rate of change of the target position is the second rate of change; the first rate of change is greater than the second rate of change.
  • the target keypoint is a nose tip keypoint.
  • the coefficient determination module 903 includes: a distance determination unit configured to determine a first relative distance of the horizontal area and a second relative distance of the vertical area according to the position information of the reference key points; an angle acquisition unit configured to obtain the three-dimensional face angle, the three-dimensional face angle including a pitch angle and a yaw angle; a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; a calculation unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a coefficient determination unit configured to determine the magnification factor as the ratio of the width of the image capture page to the sum of the products.
  • the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area of the human face includes the central area of the forehead and the central area of the human face;
  • the reference key points include the left eye key point, the right eye key point, the key point between the eyebrows, and the nose tip key point;
  • the distance determination unit is configured to determine the first relative distance as the distance between the left eye key point and the right eye key point, and to determine the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
  • Fig. 10 shows a block diagram of a device 1000 for image processing according to some embodiments.
  • device 1000 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, or the like.
  • a device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and Communication component 1016.
  • the processing component 1002 generally controls the overall operation of the device 1000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 1002 can include one or more processors 1020 to execute instructions to perform all or some of the steps of the methods described above.
  • processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components.
  • processing component 1002 may include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
  • Memory 1004 is configured to store various types of data to support operation at device 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, and the like. Memory 1004 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • Power supply assembly 1006 provides power to various components of device 1000 .
  • Power supply components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 1000 .
  • Multimedia component 1008 includes a screen that provides an output interface between the device 1000 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 1008 includes a front-facing camera and/or a rear-facing camera. When the device 1000 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 1010 is configured to output and/or input audio signals.
  • audio component 1010 includes a microphone (MIC) that is configured to receive external audio signals when device 1000 is in operating modes, such as call mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 1004 or transmitted via communication component 1016 .
  • audio component 1010 also includes a speaker for outputting audio signals.
  • the I/O interface 1012 provides an interface between the processing component 1002 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 1014 includes one or more sensors for providing status assessment of various aspects of device 1000 .
  • the sensor component 1014 can detect the open/closed state of the device 1000 and the relative positioning of components, such as the display and keypad of the device 1000; the sensor component 1014 can also detect a change in the position of the device 1000 or of a component of the device 1000, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and temperature changes of the device 1000.
  • Sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1016 is configured to facilitate wired or wireless communication between device 1000 and other devices.
  • Device 1000 may access wireless networks based on communication standards, such as WiFi, carrier networks (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 1016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 1016 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • device 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • non-transitory computer-readable storage medium including instructions, such as memory 1004 including instructions, executable by the processor 1020 of the device 1000 to perform the method described above.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

Abstract

An image processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring face video data and using each face image frame in the face video data as an image to be processed; performing face recognition on the image to be processed, so as to obtain key points of a face; extracting a reference key point of a preset region from the key points of the face, and determining an amplification coefficient according to position information of the reference key point; and amplifying the face according to the amplification coefficient, and performing face tracking on the amplified face according to face motion information obtained from face recognition.

Description

图像处理方法及装置Image processing method and device
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本申请基于申请号为202011434480.4、申请日为2020年12月10日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。This application is based on the Chinese patent application with the application number of 202011434480.4 and the filing date of December 10, 2020, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is incorporated herein by reference.
技术领域technical field
本公开涉及图像处理技术领域,尤其涉及一种图像处理方法、装置、电子设备及存储介质。The present disclosure relates to the technical field of image processing, and in particular, to an image processing method, an apparatus, an electronic device, and a storage medium.
背景技术Background technique
随着智能终端的普及以及图像处理技术的发展,越来越多的应用程序可以对图像中的人脸进行处理,以达到需要的效果,例如,智能美颜、魔法特效、人脸追踪等。而随着智能终端软硬件的飞速发展,实时渲染技术在智能终端的应用变得越来越广,在智能终端进行这些效果的实时展现也变为可能。例如,可以通过深度神经网络对终端静态、亦或实时拍摄的图像进行语义分割,获得人脸的关键点、发型区域掩码图、五官位置的掩码图等图像处理结果,利用所得到的图像处理结果实现很多有创意的效果,例如,五官的放大、错位,人脸的贴纸、贴妆等。With the popularization of smart terminals and the development of image processing technology, more and more applications can process faces in images to achieve desired effects, such as smart beauty, magic special effects, and face tracking. With the rapid development of software and hardware of intelligent terminals, the application of real-time rendering technology in intelligent terminals has become more and more extensive, and it is also possible to display these effects in real time in intelligent terminals. For example, a deep neural network can be used to perform semantic segmentation on the images captured by the terminal or in real time to obtain image processing results such as key points of the face, the mask map of the hairstyle area, and the mask map of the facial features. The processing results achieve many creative effects, such as the enlargement and dislocation of facial features, face stickers, and makeup.
SUMMARY OF THE INVENTION
According to a first aspect of the embodiments of the present disclosure, an image processing method is provided, including: collecting face video data, and using each frame of face image in the face video data as an image to be processed; performing face recognition on the image to be processed to obtain face key points; extracting reference key points of a preset region from the face key points, and determining a magnification factor according to position information of the reference key points; and magnifying the face according to the magnification factor, and performing face tracking on the magnified face according to face motion information obtained by the face recognition.
In one embodiment, the method further includes: obtaining a target key point in a central region of the face from the reference key points; determining a target position corresponding to the target key point according to a preset correspondence between positions of the target key point and target positions; and moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
In one embodiment, the preset correspondence between positions of the target key point and target positions includes: the position of the target key point falls into one of a plurality of value intervals, each value interval corresponding to a respective change rate, where the change rate is the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
In one embodiment, the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval that adjoin in sequence, and the change rate is determined as follows: a coordinate value in the preset direction is obtained from the position of the target key point; in response to the coordinate value falling within the first value interval or the third value interval, the change rate of the target position is a first change rate; in response to the coordinate value falling within the second value interval, the change rate of the target position is a second change rate; and the first change rate is greater than the second change rate.
In one embodiment, the target key point is a nose tip key point.
In one embodiment, determining the magnification factor according to the position information of the reference key points includes: determining a first relative distance of a horizontal region and a second relative distance of a vertical region according to the position information of the reference key points; obtaining a three-dimensional face angle, the three-dimensional face angle including a pitch angle and a yaw angle; determining, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; obtaining the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and determining the magnification factor as the ratio of the width of the image capture page to the sum of the products.
In one embodiment, determining the first weight corresponding to the first relative distance and the second weight corresponding to the second relative distance according to the pitch angle and the yaw angle includes: determining the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle; and determining the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
In one embodiment, the preset region is a T-shaped region of the face, the T-shaped region including a central forehead region and a central face region; the reference key points include a left-eye key point, a right-eye key point, a key point between the eyebrows, and a nose tip key point; and determining the first relative distance of the horizontal region and the second relative distance of the vertical region according to the position information of the reference key points includes: determining the first relative distance as the distance between the left-eye key point and the right-eye key point; and determining the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
According to a second aspect of the embodiments of the present disclosure, an image processing apparatus is provided, including: an image acquisition module configured to collect face video data and use each frame of face image in the face video data as an image to be processed; a face recognition module configured to perform face recognition on the image to be processed to obtain face key points; a factor determination module configured to extract reference key points of a preset region from the face key points and determine a magnification factor according to position information of the reference key points; and a face processing module configured to magnify the face according to the magnification factor and perform face tracking on the magnified face according to face motion information obtained by the face recognition.
In one embodiment, the apparatus further includes: a key point obtaining module configured to obtain a target key point in a central region of the face from the reference key points; a position determination module configured to determine a target position corresponding to the target key point according to a preset correspondence between positions of the target key point and target positions; and a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
In one embodiment, the preset correspondence between positions of the target key point and target positions includes: the position of the target key point falls into one of a plurality of value intervals, each value interval corresponding to a respective change rate, where the change rate is the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
In one embodiment, the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval that adjoin in sequence, and the change rate is determined as follows: a coordinate value in the preset direction is obtained from the position of the target key point; in response to the coordinate value falling within the first value interval or the third value interval, the change rate of the target position is a first change rate; in response to the coordinate value falling within the second value interval, the change rate of the target position is a second change rate; and the first change rate is greater than the second change rate.
In one embodiment, the target key point is a nose tip key point.
In one embodiment, the factor determination module includes: a distance determination unit configured to determine a first relative distance of a horizontal region and a second relative distance of a vertical region according to the position information of the reference key points; an angle obtaining unit configured to obtain a three-dimensional face angle, the three-dimensional face angle including a pitch angle and a yaw angle; a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; a calculation unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a factor determination unit configured to determine the magnification factor as the ratio of the width of the image capture page to the sum of the products.
In one embodiment, the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
In one embodiment, the preset region is a T-shaped region of the face, the T-shaped region including a central forehead region and a central face region; the reference key points include a left-eye key point, a right-eye key point, a key point between the eyebrows, and a nose tip key point; and the distance determination unit is configured to determine the first relative distance as the distance between the left-eye key point and the right-eye key point, and to determine the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to execute the instructions to implement the image processing method described in any one of the embodiments of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, a storage medium is provided. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the image processing method described in any one of the embodiments of the first aspect.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The program product includes a computer program stored in a readable storage medium. At least one processor of a device reads and executes the computer program from the readable storage medium, so that the device executes the image processing method described in any one of the embodiments of the first aspect.
Each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points. Then, reference key points of a preset region are extracted from the face key points, and a magnification factor is determined based on the position information of the reference key points in the preset region. Finally, the face is magnified according to the magnification factor, and face tracking is performed on the magnified face according to the face motion information obtained by the face recognition. Using only the reference key points of the preset region avoids the performance bottleneck caused by a large number of key points, while determining the magnification factor with multiple key points of the preset region as a reference also ensures the accuracy of the face magnification.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
FIG. 1 is a diagram of an application environment of an image processing method according to an exemplary embodiment.
FIG. 2 is a flowchart of an image processing method according to an exemplary embodiment.
FIG. 3 is a flowchart of a step of determining a target position according to an exemplary embodiment.
FIG. 4 is a schematic diagram of a piecewise function according to an exemplary embodiment.
FIG. 5 is a flowchart of a step of determining a magnification factor according to an exemplary embodiment.
FIG. 6 is a flowchart of an image processing method according to an exemplary embodiment.
FIG. 7 is a schematic diagram of processing an image according to an exemplary embodiment.
FIG. 8 is a schematic diagram of processing an image according to an exemplary embodiment.
FIG. 9 is a block diagram of an image processing apparatus according to an exemplary embodiment.
FIG. 10 is a diagram of the internal structure of an electronic device according to an exemplary embodiment.
DETAILED DESCRIPTION
In order to enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the description and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or order. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
With the popularization of smart terminals and the development of image processing technology, more and more applications can process the faces in images to achieve desired effects, such as smart beautification, magic special effects, and face tracking. With the rapid development of the software and hardware of smart terminals, real-time rendering technology is being applied to smart terminals ever more widely, and displaying these effects in real time on a smart terminal has become possible. For example, a deep neural network can perform semantic segmentation on images captured by the terminal, either statically or in real time, to obtain image processing results such as face key points, a hairstyle-region mask map, and a facial-feature-position mask map. These results can be used to achieve many creative effects, such as enlarging or dislocating facial features, and applying stickers or makeup to the face.
In order to improve the face tracking accuracy or improve the presentation of the face, the face in an image can be focused so that it is displayed at the center of the screen with a larger area. In the related art, the following two approaches are commonly used to focus on the face:
(1) Obtain all the key point data of the face; transfer all the face key point data from the CPU (Central Processing Unit) to the GPU (Graphics Processing Unit); create a critical region of a virtual box based on all the key point data; and focus the critical region of the virtual box to the center of the screen. Although this approach yields a good presentation effect, the total number of face key points is large (up to hundreds), and transferring all the face key point data from the CPU to the GPU has a certain impact on the performance of the device.
(2) Obtain the data of a single face key point, and directly perform an image warp operation according to the single face key point to focus the face to the center of the screen. Because few face key points are used, this approach produces a less stable presentation effect.
Therefore, in the related art, it is difficult to achieve both performance and accuracy.
The image processing method provided by the present disclosure can be applied to the application environment shown in FIG. 1. A face pose estimation method for estimating face poses and image processing logic supporting image processing based on face recognition results are pre-deployed in the terminal 110. The face pose estimation method may be a method based on a deep learning model, an appearance-based method, a classification-based method, or the like. The face pose estimation method and the image processing logic may be embedded in an application. The application is not limited to a social application, an instant messaging application, a short video application, or the like. In some embodiments, the terminal 110 collects face video data and uses each frame of face image in the face video data as an image to be processed; performs face recognition on the image to be processed to obtain face key points; extracts reference key points of a preset region from the face key points and determines a magnification factor according to the position information of the reference key points; and magnifies the face according to the magnification factor and performs face tracking on the magnified face according to the face motion information obtained by the face recognition. The terminal 110 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet computer, or a portable wearable device.
FIG. 2 is a flowchart of an image processing method according to an exemplary embodiment. As shown in FIG. 2, the image processing method is used in the terminal 110 and includes the following steps.
In step S210, face video data is collected, and each frame of face image in the face video data is used as an image to be processed.
The face video data can be collected by an image acquisition apparatus. The image acquisition apparatus may be an apparatus provided in the terminal, or an independent apparatus such as a camera or a video camera.
In some embodiments, the client may automatically control the image acquisition apparatus to collect the user's face video data after receiving an image processing instruction. The image processing instruction may be triggered by the user, for example by tapping a preset face processing control. The client takes the current frame of face image in the face video data as the image to be processed, and processes the current frame in real time according to steps S220 to S240 while the face video data is being collected.
In some embodiments, the object to be processed may also be a human body part other than the face, such as a hand or a limb, or even another category of object, such as an animal, a building, or a celestial body.
In some embodiments, the image to be processed may also be a pre-captured static image stored in a local database or a server, or a static image captured in real time.
In step S220, face recognition is performed on the image to be processed to obtain face key points.
Face recognition on the image to be processed may adopt a method based on a deep learning model. The deep learning model may be any model that can be used for face key point recognition, for example, a DCNN (Deep Convolutional Neural Network). The face key points may be predefined, and their number is at least one. During training of the deep learning model, each sample image is annotated according to predefined key point information (for example, key point ordering and key point positions). The annotated sample images are used to train the deep learning model to obtain a model that can output the position information of the face key points. In some embodiments, the client inputs the acquired image to be processed into the trained deep learning model in real time to obtain the face key points.
In step S230, reference key points of a preset region are extracted from the face key points, and a magnification factor is determined according to the position information of the reference key points.
The preset region is a region that can roughly represent the position of the face in the image to be processed; for example, it may be the contour region or the central region of the face. The preset region may contain multiple reference key points, where "multiple" does not exclude the case of a single key point.
The magnification factor is used to magnify the face so as to increase the proportion of the face in the image capture page, thereby highlighting the face.
In some embodiments, the key point information of the reference key points is pre-configured in the client. After obtaining the face key points output by the deep learning model, the client can extract the reference key points from the face key points according to the key point information of the reference key points. The client obtains the position information of the reference key points and calculates the size of the preset region from it. The size of the preset region can be characterized by parameters such as the size of the bounding box of the preset region and the distances between key points in the preset region. The magnification factor is then obtained by a preset algorithm, which depends on the specific situation. For example, the size of the preset region may be compared with a preset constant to obtain the magnification factor; or a correspondence between sizes of the preset region and magnification factors may be established in advance and the matching magnification factor retrieved from it; or a correlation function may be established by summarizing multiple experiments and used to calculate the magnification factor from the size of the preset region.
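As an illustration of the correspondence-table option above, the following Python sketch looks up a magnification factor from the computed preset-region size. The breakpoints, factor values, and function names are invented for the example and are not taken from the disclosure.

# Illustrative correspondence table: preset-region size (px) -> magnification
# factor; both the breakpoints and the factors are invented for the example.
SIZE_TO_FACTOR = [(80, 3.0), (160, 2.0), (240, 1.5)]

def factor_from_table(region_size_px):
    # Return the magnification factor matching the preset-region size,
    # mimicking the "pre-established correspondence" option described above.
    for max_size, factor in SIZE_TO_FACTOR:
        if region_size_px <= max_size:
            return factor
    return 1.0  # region already large enough; no magnification needed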
In step S240, the face is magnified according to the magnification factor, and face tracking is performed on the magnified face according to the face motion information obtained by the face recognition.
The face motion information includes, but is not limited to, the face motion angle, the face motion trajectory, and the like, and can be output by the deep learning model synchronously with the face key points.
In some embodiments, after obtaining the magnification factor, the client displays the face in the current frame of the image to be processed magnified by the magnification factor. At the same time, the client obtains the face motion information of the current frame and controls the magnified face in the image to be processed to move according to the face motion information, so as to achieve real-time face tracking.
Further, in order to make the image processing function more comprehensive, the face magnification process can be displayed through animated special effects.
In the above image processing method, each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points. Then, reference key points of a preset region are extracted from the face key points, and a magnification factor is determined based on the position information of the reference key points in the preset region. Finally, the face is magnified according to the magnification factor, and face tracking is performed on the magnified face according to the face motion information obtained by the face recognition. Using only the reference key points of the preset region avoids the performance bottleneck caused by a large number of key points, while determining the magnification factor with multiple key points of the preset region as a reference also ensures the accuracy of the face magnification.
In an exemplary embodiment, as shown in FIG. 3, processing the face in the image may further include a process of moving the face, which can be achieved through the following steps.
In step S310, a target key point in the central region of the face is obtained from the reference key points.
In step S320, a target position corresponding to the target key point is determined according to a preset correspondence between positions of the target key point and target positions.
In step S330, the face is moved in the direction of the target position relative to the target key point until the target key point reaches the target position.
The target key point is a key point that can roughly represent the center of the face. For example, the preset region may be the facial-features region (the region containing the eyes, nose, and mouth), the T-shaped region (including the central region of the forehead and the central region of the face, that is, the region formed by the forehead and the bridge of the nose), or the nose bridge region (the region from the forehead to the tip of the nose); the target key point may then be selected from the preset region accordingly, for example, the nose tip key point, a nose wing key point, or an upper lip key point.
The target position is the position at which the target key point is located after the movement; it is used to display the face near the center of the image capture page without affecting the presentation of the face.
In some embodiments, the key point information of the target key point may be pre-configured in the client. After the client obtains the face key points, it extracts the target key point from the face key points according to the key point information of the target key point. Then, according to the position information of the target key point, the target position corresponding to the target key point of the current frame is determined from the preset correspondence between positions of the target key point and target positions. The client moves the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
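A minimal Python sketch of this moving step follows; the function names and the smoothing fraction are illustrative and not taken from the disclosure.

def move_offset(target_key_point, target_position):
    # Translation that places the target key point exactly on the target
    # position; a renderer would apply this offset to the face region.
    dx = target_position[0] - target_key_point[0]
    dy = target_position[1] - target_key_point[1]
    return dx, dy

def step_towards(current, target, t=0.3):
    # Optional per-frame smoothing: cover a fraction t of the remaining
    # distance each frame instead of jumping straight to the target, which
    # gives the animated transition mentioned below.
    return (current[0] + t * (target[0] - current[0]),
            current[1] + t * (target[1] - current[1]))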
Further, after the face is moved or before the face is moved, the face may also be magnified by the magnification factor obtained in the above embodiments, with the target key point as the center.
Further, in order to make the image processing function more comprehensive, the movement and magnification of the face can be displayed through animated special effects.
In this embodiment, the target key point is extracted from the reference key points and the target position is obtained with the target key point as the reference. On the one hand, this improves the efficiency of face processing; on the other hand, since the target key point is a point that can roughly represent the center of the face, using the target key point also improves the accuracy of face processing.
In some embodiments, the preset correspondence between positions of the target key point and target positions includes: the position of the target key point falls into one of a plurality of value intervals, each value interval corresponding to a respective change rate, where the change rate is the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
The image capture page may be the image page displayed by the client.
The preset direction may be determined according to the shooting orientation of the image to be processed. For example, if the image is shot in portrait orientation, the preset direction may be the horizontal direction when the terminal device is held in portrait orientation. The size in the preset direction can be characterized by a pixel size.
The change rate measures the degree to which the target position changes as the position of the target key point within the image to be processed changes.
In some embodiments, the position of the target key point relative to the image to be processed may fall in an edge region or the central region of the image to be processed. The value intervals of the edge regions and the central region can be predefined. The change rate when the target key point is within a value interval of an edge region differs from the change rate within the value interval corresponding to the central region.
In this embodiment, by configuring the change rate according to the position of the target key point relative to the image to be processed, the image processing can present a good face tracking effect without distorting the presentation of the face.
In some embodiments, the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval, and a third value interval that adjoin in sequence. The change rate is determined as follows: a coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value falls within the first value interval or the third value interval, the change rate of the target position is a first change rate; when the coordinate value falls within the second value interval, the change rate of the target position is a second change rate; the first change rate is greater than the second change rate.
The first change rate and/or the second change rate may be a constant or may vary.
Adjoining in sequence means that the first value interval and the second value interval connect end to end, and the second value interval and the third value interval connect end to end. The first value interval and the third value interval can be used to represent the edge regions of the image capture page, and the second value interval can be used to represent the central region of the image capture page.
In some embodiments, if the resolution of the image acquisition apparatus is 720*1280 px, 720 is the horizontal pixel width when the terminal device is held in portrait orientation and 1280 is the vertical pixel height. The preset direction is the horizontal direction when the terminal device is held in portrait orientation. The horizontal pixel width of 720 can then be divided into three value intervals that adjoin in sequence, for example 0-200, 200-520, and 520-720.
In some embodiments, the position of the target key point can be characterized by pixel coordinates. The client obtains the coordinate value in the preset direction from the position of the target key point and determines which value interval the coordinate value belongs to. If it belongs to the first value interval or the third value interval, the target key point is located in an edge region of the image capture page, and the client obtains the target position according to the first change rate; if the coordinate value falls within the second value interval, the target key point is located in the central region of the image capture page, and the client obtains the target position according to the second change rate. Since the first change rate is greater than the second change rate, the target position will change by a larger amount in the edge regions than in the central region.
In a specific embodiment, the resolution of the image to be processed is 720*1280 px, where 720 is the horizontal pixel width when the terminal device is held in portrait orientation. The horizontal pixel width is divided into three value intervals that adjoin in sequence: 0-200, 200-520, and 520-720. The pixel coordinate of the target position in the horizontal direction can be obtained by the following piecewise function:
[Formula shown as an image in the original: a piecewise function giving the target position coordinate (offset/centerPosX) as a function of curPixelPosX over the three value intervals 0-200, 200-520, and 520-720.]
where offset and centerPosX denote the pixel coordinate of the target position in the horizontal direction, and curPixelPosX denotes the pixel coordinate of the target key point in the horizontal direction.
The target position can also be obtained with reference to the piecewise function shown in FIG. 4. To improve the accuracy of the target position, the pixel coordinate of the target position can also be mapped to the range 0-1. As shown in FIG. 4, in the edge regions where the horizontal pixel coordinate is 0-200 or 520-720, the target position changes rapidly from (0, y) to (0.5, y) as the position of the target key point changes; in the central region where the horizontal pixel coordinate is 200-520, the target position changes gently around (0.5, y) as the position of the target key point changes, or even does not change. Here, y denotes the coordinate value of the target key point in the vertical direction. It can be understood that although the change rate of the piecewise function shown in FIG. 4 (which can be represented by its slope) is constant, in practical applications the change rate of the piecewise function may also vary.
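A Python sketch of such a piecewise mapping is given below. Only the fast-at-the-edges, flat-at-the-center shape is taken from the description; the exact endpoint values of the disclosure's function are not recoverable from the text, so the slopes used here are assumptions.

def target_pos_x(cur_pixel_pos_x, width=720.0):
    # Map the target key point's horizontal pixel coordinate to a target
    # position in normalized [0, 1] page coordinates: steep in the edge
    # intervals, flat around the center, as described for FIG. 4.
    left, right = 200.0, 520.0
    if cur_pixel_pos_x < left:
        return 0.5 * cur_pixel_pos_x / left  # edge interval 0-200
    if cur_pixel_pos_x > right:
        return 0.5 + 0.5 * (cur_pixel_pos_x - right) / (width - right)  # 520-720
    return 0.5  # central interval 200-520: hold near the page center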
In this embodiment, by configuring a piecewise function and obtaining the target position according to it, the efficiency of image processing can be improved, and the image processing can present a good face tracking effect without distorting the presentation of the face.
In some embodiments, as shown in FIG. 5, determining the magnification factor according to the position information of the reference key points in step S230 can be achieved through the following steps.
In step S510, a first relative distance of a horizontal region and a second relative distance of a vertical region are determined according to the position information of the reference key points.
The horizontal region and the vertical region are obtained by dividing the preset region. A pair of key points located at the two ends of the horizontal region is obtained, and the first relative distance of the horizontal region is calculated from the position information of this pair of key points. Similarly, for the vertical region, a pair of key points located at the two ends of the vertical region is obtained, and the second relative distance of the vertical region is calculated from their position information.
In some embodiments, the preset region is a T-shaped region of the face, which includes the central region of the forehead and the central region of the face, and the reference key points include a left-eye key point, a right-eye key point, a key point between the eyebrows, and a nose tip key point. The first relative distance may then be the distance between the left-eye key point and the right-eye key point, calculated from their position information, and the second relative distance may be the distance between the key point between the eyebrows and the nose tip key point, calculated from their position information.
In step S520, a three-dimensional face angle is obtained, the three-dimensional face angle including a pitch angle and a yaw angle.
The three-dimensional face angle can be represented by Euler angles, that is, the rotation angles of an object about the three coordinate axes (x, y, and z) of a coordinate system. The Euler angles can be obtained by performing pose estimation on the face key points. In some embodiments, the pose estimation algorithm of OpenCV (an open-source computer vision library) is used to solve for a rotation vector from the face key points and convert the rotation vector into Euler angles. In this embodiment, the Euler angles include the pitch angle and the yaw angle: the pitch angle is the angle of rotation of the object about the x-axis, and the yaw angle is the angle of rotation about the y-axis.
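A minimal sketch of this pose estimation step with OpenCV is given below. The six-point generic head model, the camera intrinsics approximated from the frame size, and the Euler decomposition convention are all assumptions for illustration; the disclosure does not specify them.

import cv2
import numpy as np

# Hypothetical 3D positions of six face key points in a generic head model
# (arbitrary units); a real system would use model points matching its own
# key point definition.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],           # nose tip
    [0.0, -330.0, -65.0],      # chin
    [-225.0, 170.0, -135.0],   # left eye outer corner
    [225.0, 170.0, -135.0],    # right eye outer corner
    [-150.0, -150.0, -125.0],  # left mouth corner
    [150.0, -150.0, -125.0],   # right mouth corner
], dtype=np.float64)

def face_pitch_yaw(image_points, frame_w, frame_h):
    # image_points: (6, 2) float64 array of the matching 2D key points.
    focal = frame_w  # crude intrinsics guess in the absence of calibration
    camera_matrix = np.array([[focal, 0.0, frame_w / 2.0],
                              [0.0, focal, frame_h / 2.0],
                              [0.0, 0.0, 1.0]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                  camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    sy = np.hypot(rot[0, 0], rot[1, 0])
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))  # rotation about x
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))          # rotation about y
    return pitch, yaw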
In step S530, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance are determined according to the pitch angle and the yaw angle.
In some embodiments, for the horizontal region, the ratio of the pitch angle to the sum of the pitch angle and the yaw angle can be used as the first weight A, that is:
A = pitch / (pitch + yaw)
For the vertical region, the ratio of the yaw angle to the sum of the pitch angle and the yaw angle can be used as the second weight B, that is:
B = yaw / (pitch + yaw)
In step S540, the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is obtained.
In step S550, the magnification factor is determined as the ratio of the width of the image capture page to the sum of the products.
In some embodiments, after obtaining the first relative distance of the horizontal region, the second relative distance of the vertical region, the first weight corresponding to the first relative distance, and the second weight corresponding to the second relative distance, the client calculates the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight. The sum of the products can be obtained by the following formula:
scaleHelpValue = ewidth * A + nHeight * B
where scaleHelpValue denotes the sum of the products, ewidth denotes the first relative distance of the horizontal region, nHeight denotes the second relative distance of the vertical region, A denotes the first weight, and B denotes the second weight.
Finally, the ratio of the width of the image capture page in the preset direction to the sum of the products is calculated as the magnification factor. The magnification factor can be obtained by the following formula:
scaleValue = width / scaleHelpValue
where scaleValue denotes the magnification factor and width denotes the width of the image capture page in the preset direction.
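Putting steps S510 to S550 together, a Python sketch follows. It assumes, as one reasonable reading, that the pitch and yaw enter the weights as non-negative magnitudes, and it guards the degenerate pitch + yaw = 0 case; the disclosure spells out neither point.

import math

def magnification_factor(left_eye, right_eye, glabella, nose_tip,
                         pitch, yaw, page_width):
    # First relative distance: horizontal region (eye-to-eye distance).
    ewidth = math.dist(left_eye, right_eye)
    # Second relative distance: vertical region (key point between the
    # eyebrows to nose tip distance).
    n_height = math.dist(glabella, nose_tip)
    pitch, yaw = abs(pitch), abs(yaw)  # assumption: use angle magnitudes
    angle_sum = pitch + yaw
    a = pitch / angle_sum if angle_sum else 0.5  # first weight A
    b = yaw / angle_sum if angle_sum else 0.5    # second weight B
    scale_help_value = ewidth * a + n_height * b  # sum of the products
    return page_width / scale_help_value          # scaleValue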
In this embodiment, the magnification factor can be obtained quickly according to the pre-configured calculation formulas, which avoids the performance bottleneck caused by a large number of key points and speeds up the acquisition of the magnification factor. By combining the three-dimensional face angle to obtain the respective weights of the horizontal region and the vertical region, and then deriving a reasonable magnification factor based on the weights, the accuracy of the face magnification can be ensured.
FIG. 6 is a flowchart of an image processing method according to some embodiments. In this embodiment, the terminal is a user handheld device with a built-in image acquisition apparatus, for example, a smartphone, a tablet computer, or a portable wearable device. The image to be processed is the current frame of face image in the face video data collected by the user handheld device. As shown in FIG. 6, the method includes the following steps.
In step S602, face video data is collected through the user handheld device.
In step S604, face recognition is performed, through the deep learning model, on the current frame of face image in the face video data to obtain face key points.
In step S606, the three-dimensional face angle obtained by performing face pose estimation according to the face key points is obtained. The three-dimensional face angle is represented by Euler angles, including the pitch angle and the yaw angle.
In step S608, the reference key points of the T-shaped region are extracted from the face key points. The reference key points include the left-eye key point, the right-eye key point, the nose tip key point, and the key point between the eyebrows.
In step S610, the first relative distance of the horizontal region is calculated from the position information of the left-eye key point and the right-eye key point, and the second relative distance of the vertical region is calculated from the position information of the nose tip key point and the key point between the eyebrows.
In step S612, the ratio of the pitch angle to the sum of the pitch angle and the yaw angle is used as the first weight, and the ratio of the yaw angle to the sum of the pitch angle and the yaw angle is used as the second weight.
In step S614, the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is obtained.
In step S616, the magnification factor is determined as the ratio of the width of the image capture page to the sum of the products.
In step S618, the target position corresponding to the position of the nose tip key point in the current frame is determined from the preset correspondence between positions of the nose tip key point and target positions. The correspondence can be represented by a piecewise function; for its specific implementation, reference may be made to the above embodiments, so it is not elaborated here.
In step S620, the face is moved until the nose tip key point reaches the target position, and then, with the nose tip key point as the center, the face is magnified by the magnification factor.
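One way to realize the "magnify about the nose tip" operation of step S620 with OpenCV is an affine warp whose pivot is the nose tip key point; this is an illustrative choice, as the disclosure does not prescribe a rendering API.

import cv2

def magnify_about_point(frame, pivot, scale):
    # A rotation angle of 0 turns getRotationMatrix2D into a pure scaling
    # about the pivot, so the pivot (the nose tip) stays fixed on screen.
    h, w = frame.shape[:2]
    m = cv2.getRotationMatrix2D(pivot, 0.0, scale)
    return cv2.warpAffine(frame, m, (w, h))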
图7为通过本实施例中的方式对人脸进行处理得到的示意图;图8为采用相关技术中单一关键点的处理方式得到的示意图。对比图7和图8可知,对于相同的原始图像,通过相关技术中单一关键点的处理方式,可能不够稳定、易发生畸变(左耳处过于拉伸)。而通过本公开的方式即可以减轻设备的运行压力,也可以得到较佳的图像处理效果。FIG. 7 is a schematic diagram obtained by processing a human face by the method in this embodiment; FIG. 8 is a schematic diagram obtained by using a processing method of a single key point in the related art. Comparing Fig. 7 and Fig. 8, it can be seen that for the same original image, the processing method of a single key point in the related art may not be stable enough and prone to distortion (the left ear is too stretched). By means of the present disclosure, the operating pressure of the device can be reduced, and a better image processing effect can also be obtained.
应该理解的是,虽然上述流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行 并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the above flow charts are displayed in sequence according to the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, there is no strict order in the execution of these steps, and these steps may be performed in other orders. Moreover, at least a part of the steps in the above flow chart may include multiple steps or multiple stages. These steps or stages are not necessarily executed at the same time, but may be executed at different times. The execution sequence of these steps or stages It is also not necessarily performed sequentially, but may be performed alternately or alternately with other steps or at least a portion of a step or phase within the other steps.
图9是根据一些实施例示出的一种图像处理装置900框图。参照图9,该装置900包括图像采集模块901、人脸识别模块902、系数确定模块903和人脸处理模块904。FIG. 9 is a block diagram of an image processing apparatus 900 according to some embodiments. Referring to FIG. 9 , the apparatus 900 includes an image acquisition module 901 , a face recognition module 902 , a coefficient determination module 903 and a face processing module 904 .
图像采集模块901,被配置为采集人脸视频数据,将人脸视频数据中的每帧人脸图像作为待处理图像;人脸识别模块902,被配置为对待处理图像进行人脸识别,得到人脸关键点;系数确定模块903,被配置为从人脸关键点中提取预设区域的参考关键点,根据参考关键点的位置信息确定放大系数;人脸处理模块904,被配置为根据放大系数对人脸进行放大,并根据人脸识别得到的人脸运动信息对放大后的人脸进行人脸跟踪。The image acquisition module 901 is configured to collect face video data, and use each frame of face image in the face video data as an image to be processed; the face recognition module 902 is configured to perform face recognition on the image to be processed, and obtain a human face. face key points; the coefficient determination module 903 is configured to extract the reference key points of the preset area from the face key points, and determine the amplification factor according to the position information of the reference key points; the face processing module 904 is configured to be based on the amplification factor The face is enlarged, and face tracking is performed on the enlarged face according to the face motion information obtained by face recognition.
在一些实施例中,所述装置还包括:关键点获取模块,被配置为从参考关键点中获取人脸中心区域中的目标关键点;位置确定模块,被配置为根据预设的目标关键点的位置和目标位置的对应关系,确定与目标关键点对应的目标位置;移动模块,被配置为以目标位置相对目标关键点的方向对人脸进行移动,直至目标关键点到达目标位置。In some embodiments, the apparatus further includes: a key point acquisition module configured to acquire target key points in the central area of the face from reference key points; a position determination module configured to obtain target key points according to preset target key points The corresponding relationship between the position and the target position is determined, and the target position corresponding to the target key point is determined; the moving module is configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
在一些实施例中,预设的目标关键点的位置和目标位置的对应关系,包括:目标关键点的位置包括多个取值区间,每个取值区间与相应的变化率对应,变化率为目标位置随着目标关键点的位置变化而变化的程度,取值区间基于图像采集页面在预设方向上的尺寸确定。In some embodiments, the preset correspondence between the position of the target key point and the target position includes: the position of the target key point includes a plurality of value intervals, each value interval corresponds to a corresponding change rate, and the change rate is The degree to which the target position changes with the position of the target key point, and the value interval is determined based on the size of the image capture page in the preset direction.
在一些实施例中,将图像采集页面在预设方向上的尺寸进行划分,得到依次衔接的第一取值空间、第二取值空间和第三取值空间;变化率按照以下方式确定:从目标关键点的位置中获取预设方向上的坐标值;当坐标值位于第一取值空间或第三取值空间时,目标位置的变化率为第一变化率;当坐标值位于第二取值空间时,目标位置的变化率为第二变化率;第一变化率大于第二变化率。In some embodiments, the size of the image capture page in the preset direction is divided to obtain a first value space, a second value space and a third value space connected in sequence; the rate of change is determined in the following manner: from The coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is in the first value space or the third value space, the change rate of the target position is the first change rate; when the coordinate value is in the second value space, the change rate is the first change rate; In the value space, the rate of change of the target position is the second rate of change; the first rate of change is greater than the second rate of change.
In some embodiments, the target key point is the nose tip key point.
In some embodiments, the coefficient determination module 903 includes: a distance determination unit configured to determine a first relative distance of a horizontal region and a second relative distance of a vertical region according to the position information of the reference key points; an angle acquisition unit configured to obtain three-dimensional angles of the face, the three-dimensional angles including a pitch angle and a yaw angle; a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; a calculation unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a coefficient determination unit configured to determine the magnification factor as the ratio of the width of the image capture page to the sum of the products.
In some embodiments, the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
In some embodiments, the preset region is the T-shaped region of the face, which includes the central forehead region and the central face region; the reference key points include a left-eye key point, a right-eye key point, a glabella (between-the-eyebrows) key point, and a nose tip key point. The distance determination unit is configured to determine the first relative distance as the distance between the left-eye key point and the right-eye key point, and the second relative distance as the distance between the glabella key point and the nose tip key point.
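Combining the units above, the following Python sketch derives the magnification factor from the T-zone key points and the pitch and yaw angles as just described: the weighted sum of the two relative distances is divided into the page width. The coordinates, the use of absolute angle values, and the equal-weight fallback for a perfectly frontal face (pitch = yaw = 0) are illustrative assumptions, not values fixed by the disclosure.

    import math

    # Minimal sketch of the coefficient determination module.
    def magnification_factor(left_eye, right_eye, glabella, nose_tip,
                             pitch, yaw, page_width):
        d1 = math.dist(left_eye, right_eye)   # first relative distance (horizontal)
        d2 = math.dist(glabella, nose_tip)    # second relative distance (vertical)
        total = abs(pitch) + abs(yaw)
        if total == 0:                        # assumed fallback for a frontal face
            w1 = w2 = 0.5
        else:
            w1 = abs(pitch) / total           # first weight  = pitch / (pitch + yaw)
            w2 = abs(yaw) / total             # second weight = yaw / (pitch + yaw)
        return page_width / (d1 * w1 + d2 * w2)

    # Made-up key points on a 720-px-wide image capture page
    f = magnification_factor(left_eye=(300, 340), right_eye=(420, 340),
                             glabella=(360, 335), nose_tip=(360, 430),
                             pitch=10.0, yaw=5.0, page_width=720)
    print(round(f, 2))  # about 6.45 with these made-up numbers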
As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
FIG. 10 is a block diagram of a device 1000 for image processing according to some embodiments. For example, the device 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 10, the device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and a communication component 1016.
The processing component 1002 generally controls the overall operation of the device 1000, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 1002 may include one or more processors 1020 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 1002 may include one or more modules that facilitate interaction between the processing component 1002 and the other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation of the device 1000. Examples of such data include instructions for any application or method operated on the device 1000, contact data, phone book data, messages, pictures, videos, and the like. The memory 1004 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 1006 provides power to the various components of the device 1000. The power supply component 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the device 1000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe. In some embodiments, the multimedia component 1008 includes a front camera and/or a rear camera. When the device 1000 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a microphone (MIC), which is configured to receive external audio signals when the device 1000 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, the audio component 1010 further includes a speaker for outputting audio signals.
The I/O interface 1012 provides an interface between the processing component 1002 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 1014 includes one or more sensors for providing status assessments of various aspects of the device 1000. For example, the sensor component 1014 may detect the open/closed state of the device 1000 and the relative positioning of components (for example, the display and keypad of the device 1000); the sensor component 1014 may also detect a change in position of the device 1000 or of one of its components, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and changes in the temperature of the device 1000. The sensor component 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the device 1000 and other devices. The device 1000 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In some embodiments, the communication component 1016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In some embodiments, the communication component 1016 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In some embodiments, the device 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described above.
In some embodiments, a non-transitory computer-readable storage medium including instructions, such as the memory 1004 including instructions, is also provided; the instructions are executable by the processor 1020 of the device 1000 to perform the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

  1. An image processing method, comprising:
    capturing face video data, and treating each frame of face image in the face video data as an image to be processed;
    performing face recognition on the image to be processed to obtain face key points;
    extracting reference key points of a preset region from the face key points, and determining a magnification factor according to position information of the reference key points;
    magnifying the face according to the magnification factor, and performing face tracking on the magnified face according to face motion information obtained by the face recognition.
  2. The image processing method according to claim 1, wherein the method further comprises:
    obtaining, from the reference key points, a target key point in a central region of the face;
    determining a target position corresponding to the target key point according to a preset correspondence between positions of the target key point and target positions;
    moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  3. The image processing method according to claim 2, wherein the preset correspondence between the position of the target key point and the target position comprises:
    the position of the target key point spans a plurality of value intervals, each value interval corresponding to a respective rate of change, the rate of change being the degree to which the target position varies as the position of the target key point varies, and the value intervals being determined based on a size of an image capture page in a preset direction.
  4. The image processing method according to claim 3, wherein the size of the image capture page in the preset direction is divided into a first value space, a second value space, and a third value space that adjoin in sequence; and the rate of change is determined as follows:
    obtaining a coordinate value in the preset direction from the position of the target key point;
    in response to the coordinate value being located in the first value space or the third value space, the rate of change of the target position is a first rate of change;
    in response to the coordinate value being located in the second value space, the rate of change of the target position is a second rate of change;
    the first rate of change being greater than the second rate of change.
  5. The image processing method according to claim 3, wherein the target key point is a nose tip key point.
  6. The image processing method according to claim 1, wherein determining the magnification factor according to the position information of the reference key points comprises:
    determining a first relative distance of a horizontal region and a second relative distance of a vertical region according to the position information of the reference key points;
    obtaining three-dimensional angles of the face, the three-dimensional angles comprising a pitch angle and a yaw angle;
    determining, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance;
    obtaining a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight;
    determining the magnification factor as a ratio of a width of an image capture page to the sum of the products.
  7. The image processing method according to claim 6, wherein determining, according to the pitch angle and the yaw angle, the first weight corresponding to the first relative distance and the second weight corresponding to the second relative distance comprises:
    determining the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle;
    determining the second weight as a ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  8. The image processing method according to claim 6, wherein the preset region is a T-shaped region of the face, the T-shaped region comprising a central forehead region and a central face region; and the reference key points comprise a left-eye key point, a right-eye key point, a glabella key point, and a nose tip key point;
    wherein determining the first relative distance of the horizontal region and the second relative distance of the vertical region according to the position information of the reference key points comprises:
    determining the first relative distance as a distance between the left-eye key point and the right-eye key point;
    determining the second relative distance as a distance between the glabella key point and the nose tip key point.
  9. An image processing apparatus, comprising:
    an image acquisition module configured to capture face video data and to treat each frame of face image in the face video data as an image to be processed;
    a face recognition module configured to perform face recognition on the image to be processed to obtain face key points;
    a coefficient determination module configured to extract reference key points of a preset region from the face key points, and to determine a magnification factor according to position information of the reference key points;
    a face processing module configured to magnify the face according to the magnification factor, and to perform face tracking on the magnified face according to face motion information obtained by the face recognition.
  10. The image processing apparatus according to claim 9, wherein the apparatus further comprises:
    a key point acquisition module configured to obtain, from the reference key points, a target key point in a central region of the face;
    a position determination module configured to determine a target position corresponding to the target key point according to a preset correspondence between positions of the target key point and target positions;
    a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  11. The image processing apparatus according to claim 10, wherein the preset correspondence between the position of the target key point and the target position comprises:
    the position of the target key point spans a plurality of value intervals, each value interval corresponding to a respective rate of change, the rate of change being the degree to which the target position varies as the position of the target key point varies, and the value intervals being determined based on a distance between the position of the target key point and a boundary of an image capture page.
  12. The image processing apparatus according to claim 11, wherein the rate of change is determined as follows:
    in response to a distance between the target key point and the boundary of the image capture page being less than a threshold, the rate of change of the target position is a first rate of change;
    in response to the distance being greater than or equal to the threshold, the rate of change of the target position is a second rate of change;
    the first rate of change being greater than the second rate of change.
  13. The image processing apparatus according to claim 11, wherein the target key point is a nose tip key point.
  14. The image processing apparatus according to claim 9, wherein the coefficient determination module comprises:
    a distance determination unit configured to determine a first relative distance of a horizontal region and a second relative distance of a vertical region according to the position information of the reference key points;
    an angle acquisition unit configured to obtain three-dimensional angles of the face, the three-dimensional angles comprising a pitch angle and a yaw angle;
    a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance;
    a calculation unit configured to obtain a sum of a product of the first relative distance and the first weight and a product of the second relative distance and the second weight;
    a coefficient determination unit configured to determine the magnification factor as a ratio of a width of an image capture page to the sum of the products.
  15. The image processing apparatus according to claim 14, wherein the weight determination unit is configured to determine the first weight as a ratio of the pitch angle to a sum of the pitch angle and the yaw angle, and to determine the second weight as a ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  16. The image processing apparatus according to claim 14, wherein the preset region is a T-shaped region of the face, the T-shaped region comprising a central forehead region and a central face region; and the reference key points comprise a left-eye key point, a right-eye key point, a glabella key point, and a nose tip key point;
    wherein the distance determination unit is configured to determine the first relative distance as a distance between the left-eye key point and the right-eye key point, and to determine the second relative distance as a distance between the glabella key point and the nose tip key point.
  17. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following processing:
    capturing face video data, and treating each frame of face image in the face video data as an image to be processed;
    performing face recognition on the image to be processed to obtain face key points;
    extracting reference key points of a preset region from the face key points, and determining a magnification factor according to position information of the reference key points;
    magnifying the face according to the magnification factor, and performing face tracking on the magnified face according to face motion information obtained by the face recognition.
  18. A storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the following processing:
    capturing face video data, and treating each frame of face image in the face video data as an image to be processed;
    performing face recognition on the image to be processed to obtain face key points;
    extracting reference key points of a preset region from the face key points, and determining a magnification factor according to position information of the reference key points;
    magnifying the face according to the magnification factor, and performing face tracking on the magnified face according to face motion information obtained by the face recognition.
PCT/CN2021/128769 2020-12-10 2021-11-04 Image processing method and apparatus WO2022121577A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011434480.4 2020-12-10
CN202011434480.4A CN112509005B (en) 2020-12-10 2020-12-10 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022121577A1 true WO2022121577A1 (en) 2022-06-16

Family

ID=74970472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128769 WO2022121577A1 (en) 2020-12-10 2021-11-04 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN112509005B (en)
WO (1) WO2022121577A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509005B (en) * 2020-12-10 2023-01-20 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113778233B (en) * 2021-09-16 2022-04-05 广东魅视科技股份有限公司 Method and device for controlling display equipment and readable medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343A (en) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 Image processing method, system and server
CN110415164A (en) * 2018-04-27 2019-11-05 武汉斗鱼网络科技有限公司 Facial metamorphosis processing method, storage medium, electronic equipment and system
CN108550185A (en) * 2018-05-31 2018-09-18 Oppo广东移动通信有限公司 Beautifying faces treating method and apparatus
CN110175558A (en) * 2019-05-24 2019-08-27 北京达佳互联信息技术有限公司 A kind of detection method of face key point, calculates equipment and storage medium at device
US20200335136A1 (en) * 2019-07-02 2020-10-22 Beijing Dajia Internet Information Technology Co., Ltd. Method and device for processing video
CN112509005A (en) * 2020-12-10 2021-03-16 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306733A (en) * 2023-02-27 2023-06-23 荣耀终端有限公司 Method for amplifying two-dimensional code and electronic equipment
CN116306733B (en) * 2023-02-27 2024-03-19 荣耀终端有限公司 Method for amplifying two-dimensional code and electronic equipment

Also Published As

Publication number Publication date
CN112509005B (en) 2023-01-20
CN112509005A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
US11114130B2 (en) Method and device for processing video
WO2022121577A1 (en) Image processing method and apparatus
WO2019134516A1 (en) Method and device for generating panoramic image, storage medium, and electronic apparatus
US11030733B2 (en) Method, electronic device and storage medium for processing image
CN109087238B (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN109242765B (en) Face image processing method and device and storage medium
JP2016531362A (en) Skin color adjustment method, skin color adjustment device, program, and recording medium
CN112348933B (en) Animation generation method, device, electronic equipment and storage medium
US11308692B2 (en) Method and device for processing image, and storage medium
CN109325908B (en) Image processing method and device, electronic equipment and storage medium
EP3975046B1 (en) Method and apparatus for detecting occluded image and medium
WO2023273499A1 (en) Depth measurement method and apparatus, electronic device, and storage medium
WO2023273498A1 (en) Depth detection method and apparatus, electronic device, and storage medium
CN107977636B (en) Face detection method and device, terminal and storage medium
CN112541400A (en) Behavior recognition method and device based on sight estimation, electronic equipment and storage medium
CN111144266B (en) Facial expression recognition method and device
CN110807769B (en) Image display control method and device
WO2020114097A1 (en) Boundary box determining method and apparatus, electronic device, and storage medium
CN111340691A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2021189927A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN107239758B (en) Method and device for positioning key points of human face
CN111489284B (en) Image processing method and device for image processing
CN113642551A (en) Nail key point detection method and device, electronic equipment and storage medium
CN110110742B (en) Multi-feature fusion method and device, electronic equipment and storage medium
CN116320721A (en) Shooting method, shooting device, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21902284

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21902284

Country of ref document: EP

Kind code of ref document: A1