WO2022121577A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2022121577A1
WO2022121577A1 (PCT/CN2021/128769; CN2021128769W)
Authority
WO
WIPO (PCT)
Prior art keywords
face
key point
target
image
relative distance
Prior art date
Application number
PCT/CN2021/128769
Other languages
English (en)
Chinese (zh)
Inventor
刘易周
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022121577A1

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/20 — Analysis of motion
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformation in the plane of the image
    • G06T3/40 — Scaling the whole image or part thereof
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 — Feature extraction; Face representation
    • G06V40/171 — Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10016 — Video; Image sequence
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20081 — Training; Learning
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/30 — Subject of image; Context of image processing
    • G06T2207/30196 — Human being; Person
    • G06T2207/30201 — Face
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, an apparatus, an electronic device, and a storage medium.
  • a deep neural network can be used to perform semantic segmentation on images captured by the terminal, either offline or in real time, to obtain image processing results such as face key points, a mask map of the hairstyle area, and mask maps of the facial features.
  • these processing results enable many creative effects, such as enlargement and displacement of facial features, face stickers, and makeup.
  • an image processing method, including: collecting face video data, and using each frame of face image in the face video data as an image to be processed; performing face recognition on the image to be processed to obtain face key points; extracting reference key points of a preset area from the face key points, and determining an amplification factor according to position information of the reference key points; and enlarging the face according to the amplification factor, and performing face tracking on the enlarged face according to face motion information obtained by the face recognition.
  • the method further includes: obtaining a target key point in the central area of the face from the reference key points; determining a target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position; and moving the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point falls into one of a plurality of value intervals, and each value interval corresponds to a change rate; the change rate characterizes the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval and a third value interval connected in sequence; the change rate is determined as follows: the coordinate value in the preset direction is obtained from the position of the target key point; in response to the coordinate value being located in the first value interval or the third value interval, the change rate of the target position is a first change rate; in response to the coordinate value being located in the second value interval, the change rate of the target position is a second change rate; the first change rate is greater than the second change rate.
  • the target key point is a nose tip key point.
  • the determining the amplification factor according to the position information of the reference key points includes: determining a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points; obtaining a three-dimensional angle of the face, the three-dimensional angle including a pitch angle and a yaw angle; determining, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; obtaining the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and determining the amplification factor as the ratio of the width of the image capture page to the sum of the products.
  • the determining a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance according to the pitch angle and the yaw angle includes: determining the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle; and determining the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area includes the central area of the forehead and the central area of the face;
  • the reference key points include a left eye key point, a right eye key point, a key point between the eyebrows, and a nose tip key point;
  • the determining the first relative distance of the horizontal area and the second relative distance of the vertical area according to the position information of the reference key points includes: determining the first relative distance as the distance between the left eye key point and the right eye key point; and determining the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
  • an image processing apparatus, including: an image acquisition module configured to collect face video data and use each frame of face image in the face video data as an image to be processed; a face recognition module configured to perform face recognition on the to-be-processed image to obtain face key points; a coefficient determination module configured to extract reference key points of a preset area from the face key points and determine an amplification factor according to the position information of the reference key points; and a face processing module configured to enlarge the human face according to the amplification factor and perform face tracking on the enlarged face according to the face motion information obtained by the face recognition.
  • the apparatus further includes: a key point acquisition module configured to acquire a target key point in the central area of the face from the reference key points; a position determination module configured to determine a target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position; and a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point falls into one of a plurality of value intervals, and each value interval corresponds to a change rate; the change rate characterizes the degree to which the target position changes as the position of the target key point changes, and the value intervals are determined based on the size of the image capture page in a preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval and a third value interval connected in sequence; the change rate is determined as follows: the coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is located in the first value interval or the third value interval, the change rate of the target position is the first change rate; when the coordinate value is located in the second value interval, the change rate of the target position is the second change rate; the first change rate is greater than the second change rate.
  • the target key point is a nose tip key point.
  • the coefficient determination module includes: a distance determination unit configured to determine a first relative distance of a horizontal area and a second relative distance of a vertical area according to the position information of the reference key points; an angle acquisition unit configured to obtain a three-dimensional angle of the face, the three-dimensional angle including a pitch angle and a yaw angle; a weight determination unit configured to determine, according to the pitch angle and the yaw angle, a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance; a computing unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a coefficient determination unit configured to determine the enlargement coefficient as the ratio of the width of the image capture page to the sum of the products.
  • the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area includes the central area of the forehead and the central area of the face;
  • the reference key points include a left eye key point, a right eye key point, a key point between the eyebrows, and a nose tip key point;
  • the distance determination unit is configured to determine the first relative distance as the distance between the left eye key point and the right eye key point, and to determine the second relative distance as the distance between the key point between the eyebrows and the nose tip key point.
  • an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image processing method described in any one of the embodiments of the first aspect.
  • a storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the image processing method described in any one of the embodiments of the first aspect.
  • a computer program product including a computer program, the computer program being stored in a readable storage medium; at least one processor of a device reads and executes the computer program from the readable storage medium, so that the device executes the image processing method described in any one of the embodiments of the first aspect.
  • Each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points.
  • the magnification factor is determined based on the position information of the reference key point in the preset area.
  • the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by face recognition.
  • FIG. 1 is an application environment diagram of an image processing method according to an exemplary embodiment.
  • Fig. 2 is a flowchart of an image processing method according to an exemplary embodiment.
  • Fig. 3 is a flowchart showing a step of determining a target position according to an exemplary embodiment.
  • Fig. 4 is a schematic diagram of a piecewise function according to an exemplary embodiment.
  • Fig. 5 is a flowchart showing a step of determining an amplification factor according to an exemplary embodiment.
  • Fig. 6 is a flowchart of an image processing method according to an exemplary embodiment.
  • Fig. 7 is a schematic diagram of processing an image according to an exemplary embodiment.
  • Fig. 8 is a schematic diagram of processing an image according to an exemplary embodiment.
  • Fig. 9 is a block diagram of an image processing apparatus according to an exemplary embodiment.
  • Fig. 10 is an internal structure diagram of an electronic device according to an exemplary embodiment.
  • the face in the image can be focused, so that the face can be displayed at the center of the screen with a larger area.
  • the following two implementations are often used to focus on the human face:
  • the image processing method provided by the present disclosure can be applied to the application environment shown in FIG. 1 .
  • the terminal 110 is pre-deployed with a face pose estimation method for face pose estimation, and an image processing logic supporting image processing based on the face recognition result.
  • the face pose estimation method can be a deep learning model-based method, an appearance-based method, a classification-based method, and the like. Face pose estimation methods and image processing logic can be embedded in an application. Such applications include, but are not limited to, social applications, instant messaging applications, short video applications, and the like.
  • the terminal 110 collects face video data, and uses each frame of face image in the face video data as an image to be processed. Perform face recognition on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points, and the amplification factor is determined according to the position information of the reference key points; the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by the face recognition.
  • the terminal 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices.
  • FIG. 2 is a flowchart of an image processing method according to an exemplary embodiment. As shown in FIG. 2 , the image processing method used in the terminal 110 includes the following steps.
  • step S210 face video data is collected, and each frame of face image in the face video data is used as an image to be processed.
  • the face video data can be acquired by the image acquisition device.
  • the image acquisition device may be a device provided in the terminal; it may also be an independent device, such as a camera, a video camera, and the like.
  • the client may automatically control the image acquisition device to acquire the user's face video data after receiving the image processing instruction.
  • the image processing instruction may be triggered by the user by clicking on a preset face processing control or the like.
  • the client takes the face image of the current frame in the face video data as the image to be processed, and processes the face image of the current frame in real time according to the content described in steps S220 to S240 while collecting the face video data.
  • the object to be processed may also be a human body part other than the face, such as a hand or a limb, or even another type of object, such as an animal, a building, or a star.
  • the to-be-processed image may also be a pre-shot still image stored in a local database or a server, or a still image captured in real time.
  • step S220 face recognition is performed on the image to be processed to obtain face key points.
  • a method based on a deep learning model can be used for face recognition of the image to be processed.
  • the deep learning model can be any model that can be used for face key point recognition, for example, DCNN (Deep Convolutional Network, deep convolutional neural network model) and the like.
  • the face key points can be predefined, and there is at least one of them.
  • each sample image is annotated according to pre-defined key point related information (such as key point ranking, key point positions, etc.).
  • the labeled sample images are used to train the deep learning model to obtain a deep learning model that can output the position information of the key points of the face.
  • the client inputs the acquired image to be processed into the trained deep learning model in real time to obtain the face key points.
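  • As a rough, non-authoritative sketch of the capture-and-recognition loop described above (steps S210 and S220), the snippet below uses OpenCV for frame capture; detect_face_keypoints is a hypothetical stand-in for the trained deep learning model and is not an API named in this disclosure:

```python
import cv2

def detect_face_keypoints(frame):
    """Hypothetical stand-in for the trained deep learning model described above;
    it would return a list of (x, y) face key points for the given frame."""
    raise NotImplementedError

cap = cv2.VideoCapture(0)               # collect face video data from the camera
while cap.isOpened():
    ok, frame = cap.read()              # each frame is an image to be processed
    if not ok:
        break
    keypoints = detect_face_keypoints(frame)
    # ... steps S230-S240: extract reference key points, compute the magnification
    # factor, enlarge the face, and track it using the face motion information
cap.release()
```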
  • step S230 the reference key points of the preset area are extracted from the face key points, and the magnification coefficient is determined according to the position information of the reference key points.
  • the preset area refers to an area that can roughly represent the position of the face in the image to be processed, for example, it may be a contour area, a center area, and the like of the human face.
  • there may be multiple reference key points in the preset area, although the case of a single reference key point is not excluded.
  • the magnification factor is used to amplify the human face, so as to increase the proportion of the human face in the image capture page, so as to achieve the effect of highlighting the human face.
  • the key point related information of the reference key point is pre-configured in the client.
  • the client can extract the reference key points from the face key points according to the key point related information of the reference key points.
  • the client obtains the location information of the reference key point.
  • the size of the preset area is calculated according to the position information of the reference key points.
  • the size of the preset area can be characterized by parameters such as the frame size of the preset area and the distance between key points in the preset area.
  • the magnification factor is obtained by a preset algorithm, which depends on the specific situation.
  • for example, the size of the preset area can be compared with a preset constant to obtain the magnification factor; or a mapping between sizes of the preset area and magnification factors can be established in advance, and the matching magnification factor is obtained from the mapping; or a correlation function is established by summarizing multiple experiments, and the magnification factor is calculated from the size of the preset area through the correlation function.
  • step S240 the human face is enlarged according to the enlargement coefficient, and face tracking is performed on the enlarged human face according to the face motion information obtained by the face recognition.
  • the face motion information is not limited to including the face motion angle, the face motion trajectory, etc., and the face motion information can be output synchronously when the key points of the face are output through the deep learning model.
  • after obtaining the magnification factor, the client displays the face in the to-be-processed image of the current frame enlarged by the magnification factor. At the same time, the client obtains the face motion information of the current frame and controls the enlarged face in the to-be-processed image to move according to the face motion information, so as to achieve the effect of real-time face tracking.
  • the process of face enlargement can be displayed through animation special effects.
  • each frame of face image in the collected face video data is used as an image to be processed, and face recognition is performed on the image to be processed to obtain face key points.
  • the reference key points of the preset area are extracted from the face key points.
  • the magnification factor is determined based on the position information of the reference key point in the preset area.
  • the face is enlarged according to the amplification factor, and face tracking is performed on the enlarged face according to the face motion information obtained by face recognition.
  • the processing of the human face in the image may further include a process of moving the human face. This can be achieved by the following steps:
  • step S310 the target key point in the central area of the face is obtained from the reference key point.
  • step S320 the target position corresponding to the target key point is determined according to the preset correspondence between the position of the target key point and the target position.
  • step S330 the face is moved in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the target key point refers to the key point that can roughly represent the central position of the face.
  • the preset area can be the facial features area (including the areas of the eyes, nose, and mouth), the T-shaped area of the face (including the central area of the forehead and the central area of the face), the nose bridge area (the area between the point between the eyebrows and the nose tip), and the like; the target key point can be selected correspondingly within the preset area, for example, the nose tip key point.
  • the target position refers to the position where the target key point is moved, and is used to display the face near the center of the image capture page without affecting the rendering effect of the face.
  • keypoint-related information of target keypoints may be pre-configured in the client.
  • the target key points are extracted from the face key points according to the key point related information of the target key points.
  • the target position corresponding to the target key point of the current frame is determined from the preset correspondence between the position of the target key point and the target position. The client moves the face according to the direction of the target position relative to the target key point until the target key point reaches the target position.
  • after the face is moved, or before the face is moved, the face can also be enlarged by the enlargement factor obtained in the above embodiment, with the target key point as the center.
  • the moving and enlarging process of the human face can be displayed through animation special effects.
  • the target key points are extracted from the reference key points, and the target position is obtained based on the target key points.
  • on the one hand, the efficiency of face processing can be improved; on the other hand, since the target key points can roughly represent the central position of the face, using the target key points can also improve the accuracy of face processing.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point falls into one of a plurality of value intervals, each value interval corresponds to a change rate, the change rate characterizes the degree to which the target position changes with the position of the target key point, and the value intervals are determined based on the size of the image capture page in the preset direction.
  • the image collection page may be an image page displayed by the client.
  • the preset direction can be determined according to the shooting angle of the image to be processed.
  • for example, if the shooting angle is portrait (vertical screen) shooting, the preset direction may be the horizontal direction when the terminal device is held in portrait orientation.
  • the size in the preset direction can be characterized by the pixel size.
  • the rate of change is used to measure the degree to which the target position changes as the position of the target keypoints within the image to be processed changes.
  • the possible positions of the target key point relative to the image to be processed include the edge region and the central region of the image to be processed.
  • the value interval of the edge area and the center area can be predefined.
  • the change rate corresponding to the value interval of the edge area is different from the change rate corresponding to the value interval of the central area.
  • the image processing process can present a better face tracking effect, and the presentation effect of the face will not be distorted.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval and a third value interval connected in sequence; the change rate is determined in the following manner: the coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is in the first value interval or the third value interval, the change rate of the target position is the first change rate; when the coordinate value is in the second value interval, the change rate of the target position is the second change rate; the first change rate is greater than the second change rate.
  • the first change rate and/or the second change rate may be constant or variable.
  • "connected in sequence" means that the first value interval and the second value interval are connected end to end, and the second value interval and the third value interval are connected end to end.
  • the first value interval and the third value interval can be used to represent the edge area of the image capture page, and the second value interval can be used to represent the central area of the image capture page.
  • for example, suppose the resolution of the image acquisition device is 720*1280 px (pixels), where 720 is the pixel width in the horizontal direction and 1280 is the pixel width in the vertical direction when the terminal device is held in portrait orientation.
  • the preset direction is then the horizontal direction when the terminal device is held in portrait orientation.
  • the pixel width of 720 in the horizontal direction can be divided to obtain three value intervals connected in sequence, for example, 0-200, 200-520, and 520-720.
  • the locations of target keypoints may be characterized by pixel coordinates.
  • the client obtains the coordinate value in the preset direction from the position of the target key point and determines which value interval the coordinate value belongs to. If it belongs to the first or the third value interval, the target key point is located in the edge area of the image capture page, and the client obtains the target position according to the first change rate; if the coordinate value is in the second value interval, the target key point is located in the central area of the image capture page, and the client obtains the target position according to the second change rate. Since the first change rate is greater than the second change rate, the target position changes more in the edge area than in the central area.
  • the pixels of the image to be processed are 720*1280px, and 720 is the horizontal pixel width when the terminal device is placed in a vertical screen.
  • the horizontal pixels are divided into three value ranges of 0-200, 200-520, and 520-720 which are connected in sequence.
  • the pixel coordinates of the target position in the horizontal direction can be obtained by the following piecewise function:
  • offset and centerPosX are the pixel coordinates of the target position in the horizontal direction; curPixelPosX is the pixel coordinates of the target key point in the horizontal direction.
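  • The piecewise function itself is not reproduced in this text. Purely as an illustrative sketch (the constants and exact form are assumptions, not the patent's values), a function with the behavior described above — a larger change rate in the edge intervals 0-200 and 520-720 and a smaller change rate in the central interval 200-520 — could look like this:

```python
def target_pos_x(cur_pixel_pos_x: float, width: float = 720.0,
                 edge: float = 200.0) -> float:
    """Illustrative piecewise mapping from the target key point's x-coordinate
    (curPixelPosX) to the target position's x-coordinate; assumed constants."""
    center = width / 2.0                       # 360 for a 720 px wide capture page
    if cur_pixel_pos_x < edge:                 # left edge interval (0-200)
        # steep slope: the target position moves quickly toward the page center
        return center * (cur_pixel_pos_x / edge)
    elif cur_pixel_pos_x <= width - edge:      # central interval (200-520)
        # gentle slope (here zero): the target position stays at the page center
        return center
    else:                                      # right edge interval (520-720)
        return center + center * ((cur_pixel_pos_x - (width - edge)) / edge)
```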
  • the target position can also be obtained with reference to the piecewise function shown in FIG. 4 .
  • the pixel coordinates of the target position can also be mapped to the space of 0-1.
  • the target position changes rapidly from (0, y) to (0.5, y) with the position of the target key point;
  • the target position changes smoothly around (0.5, y) with the position change of the target key point, or even does not change.
  • y represents the coordinate value of the target key point in the vertical direction. It can be understood that although the change rate of the piecewise function shown in FIG. 4 (which can be represented by a slope) is a constant, the change rate of the piecewise function may also be non-constant in practical applications.
  • In this way, the efficiency of image processing can be improved, the image processing process can present a better face tracking effect, and the presentation of the face will not be distorted.
  • the amplification factor is determined according to the position information of the reference key point, which can be achieved by the following steps:
  • step S510 the first relative distance of the horizontal area and the second relative distance of the vertical area are determined according to the position information of the reference key point.
  • the horizontal area and the vertical area are obtained by dividing the preset area.
  • for the horizontal area, a group of key points located at the two ends of the horizontal area is acquired, and the first relative distance of the horizontal area is calculated from the position information of that group of key points.
  • similarly, for the vertical area, a group of key points located at the two ends of the vertical area is obtained, and the second relative distance of the vertical area is calculated from the position information of that group of key points.
  • the preset area is a T-shaped area of the human face, and the T-shaped area of the human face includes the central area of the forehead and the central area of the human face; the reference key points include the left eye key point, the right eye key point, the eyebrow key point and the tip of the nose key point.
  • the first relative distance may be the distance between the left eye key point and the right eye key point, which may be calculated according to the position information of the left eye key point and the right eye key point.
  • the second relative distance may be the distance between the key point between the eyebrows and the key point of the tip of the nose, which may be calculated according to the position information of the key point between the eyebrows and the key point of the tip of the nose.
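  • A minimal sketch of this distance computation, assuming each key point is given as an (x, y) pixel coordinate (the function and parameter names are illustrative only):

```python
import math

def relative_distances(left_eye, right_eye, brow_center, nose_tip):
    """Euclidean distances between the reference key points of the T-shaped area."""
    first_relative_distance = math.dist(left_eye, right_eye)     # horizontal area
    second_relative_distance = math.dist(brow_center, nose_tip)  # vertical area
    return first_relative_distance, second_relative_distance
```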
  • step S520 a three-dimensional angle of the face is obtained, and the three-dimensional angle of the face includes a pitch angle and a yaw angle.
  • the three-dimensional angle of the face can be represented by Euler angles.
  • Euler angle refers to the rotation angle of an object around the three coordinate axes (x, y, z axis) of the coordinate system.
  • Euler angles can be obtained by performing gesture recognition on the key points of the face.
  • for example, the Euler angles can be obtained by using the pose estimation algorithm of OpenCV (an open source computer vision library).
  • the Euler angle includes a pitch angle and a yaw angle.
  • the pitch angle (pitch) represents the angle that the object rotates around the x-axis
  • the yaw angle (yaw) represents the angle that the object rotates around the y-axis.
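  • A hedged sketch of this pose estimation using OpenCV's solvePnP (the generic 3D face model points, the rough focal-length guess, and the Euler-angle convention below are assumptions of this sketch, not details given in the disclosure):

```python
import cv2
import numpy as np

def face_pitch_yaw(image_points, object_points, frame_size):
    """Estimate the face pitch and yaw (in degrees) from 2D face key points
    (image_points, N x 2) matched to a generic 3D face model (object_points, N x 3)."""
    w, h = frame_size
    focal = w                                   # rough focal-length assumption
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))              # assume no lens distortion
    _ok, rvec, _tvec = cv2.solvePnP(object_points, image_points,
                                    camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)
    # pitch = rotation about the x-axis, yaw = rotation about the y-axis
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], np.hypot(rot[2, 1], rot[2, 2])))
    return pitch, yaw
```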
  • step S530 a first weight corresponding to the first relative distance and a second weight corresponding to the second relative distance are determined according to the pitch angle and the yaw angle.
  • the ratio of the pitch angle to the sum of the pitch angle and the yaw angle can be used as the first weight A, that is:
  • the ratio of the yaw angle to the sum of the pitch angle and the yaw angle can be used as the second weight B, namely:
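  • The weight formulas referenced above (after "that is:" and "namely:") are not reproduced in this text; written out from the surrounding description, they are:

$$A = \frac{\mathrm{pitch}}{\mathrm{pitch} + \mathrm{yaw}}, \qquad B = \frac{\mathrm{yaw}}{\mathrm{pitch} + \mathrm{yaw}}$$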
  • step S540 the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is obtained.
  • the magnification factor is determined as the ratio of the width of the image capturing page to the sum of the products.
  • after the client obtains the first relative distance of the horizontal area, the second relative distance of the vertical area, the first weight corresponding to the first relative distance, and the second weight corresponding to the second relative distance, the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is calculated.
  • the sum of the products can be obtained by the following formula:
  • scaleHelpValue represents the sum of the products; ewidth represents the first relative distance of the horizontal area; nHeight represents the second relative distance of the vertical area; A represents the first weight; B represents the second weight.
  • magnification factor can be obtained by the following formula:
  • scaleValue represents the magnification factor
  • width represents the width of the image capture page in the preset direction.
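  • Putting the variable definitions above together, a small sketch of this computation (a reconstruction from the descriptions, not the patent's literal formulas; it assumes pitch + yaw is non-zero):

```python
def magnification_factor(e_width, n_height, pitch, yaw, page_width):
    """scaleValue computed from eWidth, nHeight, the weights A and B, and the
    width of the image capture page, following the description above."""
    a = pitch / (pitch + yaw)                        # first weight A
    b = yaw / (pitch + yaw)                          # second weight B
    scale_help_value = e_width * a + n_height * b    # sum of the products
    scale_value = page_width / scale_help_value      # magnification factor
    return scale_value
```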
  • the amplification factor can be quickly obtained according to the preconfigured calculation formula, which avoids the performance bottleneck caused by many key points, and at the same time speeds up the acquisition efficiency of the amplification factor.
  • Fig. 6 is a flowchart of an image processing method according to some embodiments.
  • the terminal is a user's handheld device with a built-in image acquisition device, such as a smart phone, a tablet computer, a portable wearable device, and the like.
  • the image to be processed is the face image of the current frame in the face video data collected by the user's handheld device. As shown in Figure 6, the following steps are included.
  • step S602 face video data is collected through the user's handheld device.
  • step S604 face recognition is performed on the face image of the current frame in the face video data by using a deep learning model to obtain face key points.
  • step S606 the three-dimensional angle of the face is obtained by estimating the pose of the face according to the face key points.
  • the three-dimensional angle of the face is represented by Euler angles, including pitch angle and yaw angle.
  • step S608 the reference key points of the T-shaped area are extracted from the face key points.
  • the reference key points include the left eye key point, the right eye key point, the nose tip key point and the eyebrow tip key point.
  • step S610 the first relative distance of the horizontal area is obtained by calculating according to the position information of the left eye key point and the right eye key point.
  • the second relative distance of the vertical area is calculated according to the position information of the key point of the nose tip and the key point of the eyebrow tip.
  • step S612 the ratio of the pitch angle to the sum of the pitch angle and the yaw angle is used as the first weight; the ratio of the yaw angle to the sum of the pitch angle and the yaw angle is used as the second weight.
  • step S614 the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight is obtained.
  • the magnification factor is determined as the ratio of the width of the image capturing page to the sum of the products.
  • step S618 the target position corresponding to the position of the nose tip key point in the current frame is determined from the preset correspondence between the position of the nose tip key point and the target position.
  • the correspondence between the position of the key point of the nose tip and the target position can be represented by a piecewise function, and the specific implementation of the piecewise function can refer to the above-mentioned embodiments, which will not be described in detail here.
  • step S620 the human face is moved until the nose tip key point reaches the target position, and then, with the nose tip key point as the center, the face is enlarged by the magnification factor.
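  • A hedged sketch of step S620 as a single affine warp (the warp-based rendering and the names are assumptions of this sketch): every pixel p is mapped to scale * (p - nose_tip) + target_pos, so the nose tip lands on the target position and the face is enlarged about the nose tip.

```python
import cv2
import numpy as np

def move_and_zoom_face(frame, nose_tip, target_pos, scale_value):
    """Translate the frame so the nose-tip key point lands on target_pos and
    enlarge it by scale_value with the nose tip as the center."""
    h, w = frame.shape[:2]
    nx, ny = nose_tip
    tx, ty = target_pos
    m = np.float32([[scale_value, 0, tx - scale_value * nx],
                    [0, scale_value, ty - scale_value * ny]])
    return cv2.warpAffine(frame, m, (w, h))
```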
  • FIG. 7 is a schematic diagram obtained by processing a human face by the method in this embodiment
  • FIG. 8 is a schematic diagram obtained by using a processing method of a single key point in the related art. Comparing Fig. 7 and Fig. 8, it can be seen that for the same original image, the processing method of a single key point in the related art may not be stable enough and prone to distortion (the left ear is too stretched). By means of the present disclosure, the operating pressure of the device can be reduced, and a better image processing effect can also be obtained.
  • although the steps in the above flow charts are displayed in sequence according to the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, there is no strict order for the execution of these steps, and they may be performed in other orders. Moreover, at least a part of the steps in the above flow charts may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times; they are also not necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • FIG. 9 is a block diagram of an image processing apparatus 900 according to some embodiments.
  • the apparatus 900 includes an image acquisition module 901 , a face recognition module 902 , a coefficient determination module 903 and a face processing module 904 .
  • the image acquisition module 901 is configured to collect face video data and use each frame of face image in the face video data as an image to be processed; the face recognition module 902 is configured to perform face recognition on the image to be processed to obtain face key points; the coefficient determination module 903 is configured to extract the reference key points of the preset area from the face key points and determine the amplification factor according to the position information of the reference key points; the face processing module 904 is configured to enlarge the face according to the amplification factor and perform face tracking on the enlarged face according to the face motion information obtained by face recognition.
  • the apparatus further includes: a key point acquisition module configured to acquire target key points in the central area of the face from the reference key points; a position determination module configured to determine the target position corresponding to the target key point according to a preset correspondence between the position of the target key point and the target position; and a moving module configured to move the face in the direction of the target position relative to the target key point until the target key point reaches the target position.
  • the preset correspondence between the position of the target key point and the target position includes: the position of the target key point falls into one of a plurality of value intervals, each value interval corresponds to a change rate, the change rate characterizes the degree to which the target position changes with the position of the target key point, and the value intervals are determined based on the size of the image capture page in the preset direction.
  • the size of the image capture page in the preset direction is divided to obtain a first value interval, a second value interval and a third value interval connected in sequence; the change rate is determined in the following manner: the coordinate value in the preset direction is obtained from the position of the target key point; when the coordinate value is in the first value interval or the third value interval, the change rate of the target position is the first change rate; when the coordinate value is in the second value interval, the change rate of the target position is the second change rate; the first change rate is greater than the second change rate.
  • the target keypoint is a nose tip keypoint.
  • the coefficient determination module 903 includes: a distance determination unit configured to determine a first relative distance of the horizontal area and a second relative distance of the vertical area according to the position information of the reference key points; an angle acquisition unit configured to obtain the three-dimensional angle of the face, the three-dimensional angle including a pitch angle and a yaw angle; and a weight determination unit configured to determine, according to the pitch angle and the yaw angle, the first weight corresponding to the first relative distance and the second weight corresponding to the second relative distance.
  • a calculation unit configured to obtain the sum of the product of the first relative distance and the first weight and the product of the second relative distance and the second weight; and a coefficient determination unit configured to determine the magnification factor as the ratio of the width of the image capture page to the sum of the products.
  • the weight determination unit is configured to determine the first weight as the ratio of the pitch angle to the sum of the pitch angle and the yaw angle, and to determine the second weight as the ratio of the yaw angle to the sum of the pitch angle and the yaw angle.
  • the preset area is a T-shaped area of the human face, and the T-shaped area of the human face includes the central area of the forehead and the central area of the human face;
  • the reference key points include the left eye key point, the right eye key point, the eyebrow key point and the tip of the nose key point;
  • a distance determination unit configured to determine the first relative distance as the distance between the left eye key point and the right eye key point; and determine the second relative distance as the distance between the eyebrow key point and the nose tip key point.
  • Fig. 10 shows a block diagram of a device 1000 for image processing according to some embodiments.
  • device 1000 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, or the like.
  • a device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and Communication component 1016.
  • the processing component 1002 generally controls the overall operation of the device 1000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 1002 can include one or more processors 1020 to execute instructions to perform all or some of the steps of the methods described above.
  • processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components.
  • processing component 1002 may include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
  • Memory 1004 is configured to store various types of data to support operation at device 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, and the like. Memory 1004 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • Power supply assembly 1006 provides power to various components of device 1000 .
  • Power supply components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 1000 .
  • Multimedia component 1008 includes a screen that provides an output interface between the device 1000 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 1008 includes a front-facing camera and/or a rear-facing camera. When the device 1000 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 1010 is configured to output and/or input audio signals.
  • audio component 1010 includes a microphone (MIC) that is configured to receive external audio signals when device 1000 is in operating modes, such as call mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 1004 or transmitted via communication component 1016 .
  • audio component 1010 also includes a speaker for outputting audio signals.
  • the I/O interface 1012 provides an interface between the processing component 1002 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 1014 includes one or more sensors for providing status assessment of various aspects of device 1000 .
  • the sensor component 1014 can detect the open/closed state of the device 1000 and the relative positioning of components (such as the display and keypad of the device 1000), and can also detect a change in the position of the device 1000 or of a component of the device 1000, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and a temperature change of the device 1000.
  • Sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1016 is configured to facilitate wired or wireless communication between device 1000 and other devices.
  • Device 1000 may access wireless networks based on communication standards, such as WiFi, carrier networks (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 1016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 1016 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • device 1000 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • non-transitory computer-readable storage medium including instructions, such as memory 1004 including instructions, executable by the processor 1020 of the device 1000 to perform the method described above.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

Abstract

The present disclosure relates to an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: collecting face video data and using each frame of face image in the face video data as an image to be processed; performing face recognition on the image to be processed to obtain face key points; extracting a reference key point of a preset area from the face key points and determining an amplification factor according to position information of the reference key point; and enlarging the face according to the amplification factor, and then performing face tracking on the enlarged face according to face motion information obtained from the face recognition.
PCT/CN2021/128769 2020-12-10 2021-11-04 Image processing method and apparatus WO2022121577A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011434480.4 2020-12-10
CN202011434480.4A CN112509005B (zh) 2020-12-10 2020-12-10 图像处理方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022121577A1 true WO2022121577A1 (fr) 2022-06-16

Family

ID=74970472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128769 WO2022121577A1 (fr) 2020-12-10 2021-11-04 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN112509005B (fr)
WO (1) WO2022121577A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306733A (zh) * 2023-02-27 2023-06-23 荣耀终端有限公司 一种放大二维码的方法及电子设备

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509005B (zh) * 2020-12-10 2023-01-20 北京达佳互联信息技术有限公司 图像处理方法、装置、电子设备及存储介质
CN113778233B (zh) * 2021-09-16 2022-04-05 广东魅视科技股份有限公司 一种操控显示设备的方法、装置及可读介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343A (zh) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 图像处理方法、系统及服务器
CN108550185A (zh) * 2018-05-31 2018-09-18 Oppo广东移动通信有限公司 人脸美化处理方法和装置
CN110175558A (zh) * 2019-05-24 2019-08-27 北京达佳互联信息技术有限公司 一种人脸关键点的检测方法、装置、计算设备及存储介质
CN110415164A (zh) * 2018-04-27 2019-11-05 武汉斗鱼网络科技有限公司 人脸变形处理方法、存储介质、电子设备及系统
US20200335136A1 (en) * 2019-07-02 2020-10-22 Beijing Dajia Internet Information Technology Co., Ltd. Method and device for processing video
CN112509005A (zh) * 2020-12-10 2021-03-16 北京达佳互联信息技术有限公司 图像处理方法、装置、电子设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343A (zh) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 图像处理方法、系统及服务器
CN110415164A (zh) * 2018-04-27 2019-11-05 武汉斗鱼网络科技有限公司 人脸变形处理方法、存储介质、电子设备及系统
CN108550185A (zh) * 2018-05-31 2018-09-18 Oppo广东移动通信有限公司 人脸美化处理方法和装置
CN110175558A (zh) * 2019-05-24 2019-08-27 北京达佳互联信息技术有限公司 一种人脸关键点的检测方法、装置、计算设备及存储介质
US20200335136A1 (en) * 2019-07-02 2020-10-22 Beijing Dajia Internet Information Technology Co., Ltd. Method and device for processing video
CN112509005A (zh) * 2020-12-10 2021-03-16 北京达佳互联信息技术有限公司 图像处理方法、装置、电子设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306733A (zh) * 2023-02-27 2023-06-23 荣耀终端有限公司 一种放大二维码的方法及电子设备
CN116306733B (zh) * 2023-02-27 2024-03-19 荣耀终端有限公司 一种放大二维码的方法及电子设备

Also Published As

Publication number Publication date
CN112509005A (zh) 2021-03-16
CN112509005B (zh) 2023-01-20

Similar Documents

Publication Publication Date Title
US11114130B2 (en) Method and device for processing video
WO2022121577A1 (fr) Procédé et appareil de traitement d'images
WO2019134516A1 (fr) Procédé et dispositif de génération d'image panoramique, support d'informations et appareil électronique
US11030733B2 (en) Method, electronic device and storage medium for processing image
CN109087238B (zh) 图像处理方法和装置、电子设备以及计算机可读存储介质
CN109242765B (zh) 一种人脸图像处理方法、装置和存储介质
CN108470322B (zh) 处理人脸图像的方法、装置及可读存储介质
JP2016531362A (ja) 肌色調整方法、肌色調整装置、プログラム及び記録媒体
CN112348933B (zh) 动画生成方法、装置、电子设备及存储介质
US20200312022A1 (en) Method and device for processing image, and storage medium
CN109325908B (zh) 图像处理方法及装置、电子设备和存储介质
EP3975046B1 (fr) Procédé et appareil de détection d'image occluse et support
WO2023273499A1 (fr) Procédé et appareil de mesure de profondeur, dispositif électronique et support de stockage
WO2023273498A1 (fr) Procédé et appareil de détection de profondeur, dispositif électronique et support de stockage
CN107977636B (zh) 人脸检测方法及装置、终端、存储介质
CN112541400A (zh) 基于视线估计的行为识别方法及装置、电子设备、存储介质
CN111144266B (zh) 人脸表情的识别方法及装置
CN110807769B (zh) 图像显示控制方法及装置
WO2020114097A1 (fr) Procédé et appareil de détermination de zone de délimitation, dispositif électronique et support de stockage
CN111340691A (zh) 图像处理方法、装置、电子设备及存储介质
CN107239758B (zh) 人脸关键点定位的方法及装置
CN111489284B (zh) 一种图像处理方法、装置和用于图像处理的装置
CN113642551A (zh) 指甲关键点检测方法、装置、电子设备及存储介质
CN110110742B (zh) 多特征融合方法、装置、电子设备及存储介质
CN116320721A (zh) 一种拍摄方法、装置、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21902284

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21902284

Country of ref document: EP

Kind code of ref document: A1