CN110136229B - Method and equipment for real-time virtual face changing - Google Patents


Info

Publication number
CN110136229B
Authority
CN
China
Prior art keywords
face
point information
feature point
template
image frame
Prior art date
Legal status
Active
Application number
CN201910448048.1A
Other languages
Chinese (zh)
Other versions
CN110136229A (en)
Inventor
钟文坤
韩磊
Current Assignee
Guangzhou Hiscene Information Technology Co ltd
Original Assignee
Guangzhou Hiscene Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Hiscene Information Technology Co ltd
Priority to CN201910448048.1A
Publication of CN110136229A
Application granted
Publication of CN110136229B
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/60 - Editing figures and text; Combining figures or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application aims to provide a method and a device for real-time virtual face changing. The method comprises the following steps: acquiring face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced; tracking the face feature point information in the current image frame of the video stream; replacing the face to be replaced with a template face according to the tracked face feature point information and the template feature point information of the template face; and performing color fusion and edge processing on the current image frame after the face replacement. By associating the face feature point information between consecutive frames and tracking the feature points stably, face jitter is avoided, the face-changing effect looks more natural, and the user experience is improved.

Description

Method and equipment for real-time virtual face changing
Technical Field
The present application relates to the field of image processing, and in particular, to a technique for real-time virtual face changing.
Background
As social networks, live video streaming, interactive entertainment and other forms of entertainment become part of daily life, face-related image processing techniques such as face beautification, cartoonization and virtual face changing are increasingly popular, and demand for such functions keeps growing. Virtual face changing, as an important part of face editing, is widely used; for example, a customer-service agent's face can be changed according to the user's preference during a call, providing a better visual experience and adding interest and practicality.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for real-time virtual face-changing.
According to one aspect of the present application, there is provided a method for real-time virtual face-changing, the method comprising:
acquiring face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced;
tracking the face feature point information in the current image frame of the video stream;
replacing the face to be replaced with a template face according to the tracked face feature point information and template feature point information of the template face;
and performing color fusion and edge processing on the current image frame after the face replacement.
According to another aspect of the present application, there is provided an apparatus for real-time virtual face-changing, the apparatus comprising:
a first module, configured to acquire face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced;
a second module, configured to track the face feature point information in the current image frame of the video stream;
a third module, configured to replace the face to be replaced with a template face according to the tracked face feature point information and template feature point information of the template face;
and a fourth module, configured to perform color fusion and edge processing on the current image frame after the face replacement.
According to one aspect of the present application, there is provided an apparatus for real-time virtual face-changing, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of the method as described above.
According to one aspect of the present application, there is provided a computer readable medium storing instructions that, when executed, cause a processor to perform the operations of the method described above.
Compared with the prior art, the present application obtains face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced, tracks the face feature point information in the current image frame of the video stream, replaces the face to be replaced with a template face according to the tracked face feature point information and the template feature point information of the template face, and performs color fusion and edge processing on the current image frame after the replacement. By associating the face feature point information between the preceding and following frames of the video stream and tracking the feature points stably, the face jitter caused by the limited localization accuracy of face feature points is avoided, the face-changing effect looks more natural, and the user experience is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 illustrates an example of real-time virtual face replacement according to an embodiment of the present application, where figure (a) shows a template face, figure (b) shows a face to be replaced, and figure (c) shows the face after replacement;
FIG. 2 illustrates a flow chart of a method for real-time virtual face-changing according to one embodiment of the present application;
FIG. 3 illustrates an example diagram of face detection according to one embodiment of the present application;
FIG. 4 illustrates an example diagram of extracting face feature points according to one embodiment of the present application;
FIG. 5 illustrates an example diagram of a triangulation according to one embodiment of the present application;
FIG. 6 illustrates an example of affine transformation based on triangulation according to one embodiment of the present application, where figure (a) shows the template feature point information of a template face, figure (b) shows the face to be replaced, and figure (c) shows the face feature point information of the face to be replaced; the triangles in the template feature point information of figure (a) correspond to the triangles in the face feature point information of figure (c);
FIG. 7 illustrates an example diagram of a color fusion according to one embodiment of the present application;
fig. 8 illustrates an exemplary diagram of face contour adaptation according to an embodiment of the present application, where fig. (a) illustrates a template face, fig. (b) illustrates a face to be replaced, and fig. (c) illustrates a face to be replaced after face contour adaptation;
FIG. 9 illustrates functional blocks of an apparatus for real-time virtual face-changing according to one embodiment of the present application;
FIG. 10 illustrates an exemplary system that can be used to implement various embodiments described herein.
The same or similar reference numbers in the drawings refer to the same or similar parts.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In one typical configuration of the present application, the terminal, the devices of the service network, and the trusted party each include one or more processors (e.g., central processing units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in a computer readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in the present application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example, through a touch pad), such as a smartphone or a tablet computer, and the mobile electronic product may run any operating system, such as Android or iOS. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of servers; here, the cloud consists of a large number of computers or network servers based on cloud computing, which is a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, a touch terminal, or the network device and a touch terminal through a network.
Of course, those skilled in the art will appreciate that the above devices are merely examples; other existing or future devices, if applicable to the present application, are also intended to fall within the scope of the present application and are incorporated herein by reference.
In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Fig. 1 illustrates an example of real-time virtual face replacement according to an embodiment of the present application, where figure (a) shows a template face, figure (b) shows a face to be replaced, and figure (c) shows the face after replacement. The present solution is performed by a computing device, including but not limited to a user device, a network device, or a combination of a user device and a network device. The computing device comprises an output device, such as a display screen, for outputting the processed image information; the computing device also comprises a video image processing means for detecting, replacing and otherwise processing faces in the images. The computing device tracks the face feature point information of the face to be replaced shown in figure (b) across the image frames of the video stream, replaces the face to be replaced with the template face shown in figure (a) based on triangulation and affine transformation, and performs color fusion, edge processing and the like on the replaced current image frame to obtain the final result shown in figure (c). Existing virtual face-changing methods mostly blend the face into the background with a Poisson fusion algorithm, whose computational cost is too high to meet real-time requirements; moreover, when the illumination of the two faces differs greatly, the fusion result looks unnatural and traces of artificial editing are visible to the naked eye.
Referring to the example shown in fig. 1, fig. 2 illustrates a method for real-time virtual face changing according to an aspect of the present application, wherein the method includes steps S101, S102, S103 and S104. In step S101, a computing device obtains face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced; in step S102, the computing device tracks the face feature point information in the current image frame of the video stream; in step S103, the computing device replaces the face to be replaced with the template face according to the tracked face feature point information and the template feature point information of the template face; in step S104, the computing device performs color fusion and edge processing on the current image frame after the face replacement.
In step S101, the computing device obtains face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced. For example, the computing device further comprises an input device, such as a data transmission interface, for receiving a video stream about the face to be replaced sent by another device; alternatively, the computing device further comprises a camera device, such as a camera or a depth camera, for capturing the video stream containing the face to be replaced. The face feature point information of the face to be replaced includes image position information, in the current image frame, of feature points corresponding to landmark features such as the eyes, nose, mouth and face contour of the face to be replaced. After receiving the video stream about the face to be replaced sent by another device, the computing device extracts face feature points of the face to be replaced based on the face position in the current image frame, for example by face feature point extraction, thereby obtaining the face feature point information. As in some implementations, step S101 includes a substep S1011 (not shown) and a substep S1012 (not shown). In step S1011, the computing device obtains image position information of the face to be replaced in the current image frame of the video stream containing the face to be replaced; in step S1012, the computing device extracts the face feature point information in the current image frame according to the image position information. For example, the image position information of the face to be replaced includes pixel coordinates, in the image coordinate system of the current image frame, of the contour of the face to be replaced or of a custom circumscribing figure (such as a circumscribed rectangle), for example the line equations of the four border segments of the circumscribed rectangle, or the pixel coordinates of at least two opposite corner points of the circumscribed rectangle. In some embodiments, the image position information of the face to be replaced is determined by a selection made in the current image frame based on a user operation (such as a frame selection); for example, the corresponding face circumscribed rectangle is determined at the position where the user clicks or draws a selection box. In other embodiments, the image position information of the face to be replaced is determined by the computing device performing face detection on the current image frame: in step S1011, the computing device performs face detection on the current image frame of the video stream containing the face to be replaced, determines image position information corresponding to at least one face, and determines the image position information of the face to be replaced from the image position information of the at least one face.
For example, the face detection includes the computing device determining, in the current image frame, the pixel coordinates of one or more face contours or circumscribed rectangles through face-related features. Face detection algorithms include, but are not limited to, the Adaboost cascade face detection algorithm based on Haar features, ACF (Aggregate Channel Features for Multi-view Face Detection), DPM (Deformable Part Model), Cascade CNN (A Convolutional Neural Network Cascade for Face Detection), DenseBox (DenseBox: Unifying Landmark Localization with End to End Object Detection), Faceness-Net (Faceness-Net: Face Detection through Deep Facial Part Responses), Face R-CNN, PyramidBox (PyramidBox: A Context-assisted Single Shot Face Detector), and the like. Here, the Adaboost cascade face detection algorithm based on Haar features is used; it is fast and robust, achieves good face detection results, and yields the rectangular frame shown in the face detection example of fig. 3 together with the two diagonal coordinates (x1, y1), (x2, y2) of the rectangular frame. Of course, those skilled in the art will appreciate that the above face detection algorithms are merely examples; other existing or future face detection algorithms, if applicable to the present application, are also intended to fall within the scope of the present application and are incorporated herein by reference.
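For illustration only, the following Python/OpenCV sketch shows one way to run a Haar-feature Adaboost cascade detector of the kind described above and obtain the rectangle corners (x1, y1), (x2, y2); the stock OpenCV frontal-face cascade file and the detection parameters are assumptions, since the patent does not name a specific implementation.
import cv2
# OpenCV ships a pretrained Adaboost cascade over Haar features for frontal faces.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)
def detect_faces(frame_bgr):
    # Returns a list of face rectangles as diagonal corners (x1, y1, x2, y2).
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # mild illumination normalization
    rects = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60))
    return [(int(x), int(y), int(x + w), int(y + h)) for (x, y, w, h) in rects]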
In some embodiments, the computing device may detect, through the above face detection algorithm, that one or more face circumscribed rectangular frames exist in the current image frame, and may determine the face circumscribed rectangular frame of the face to be replaced from the one or more face circumscribed rectangular frames based on a user's selection operation or the like; alternatively, the computing device performs target tracking according to the image position information of the face to be replaced in the previous image frame of the video stream, and determines the image position information of the face to be replaced from the image position information corresponding to the one or more face circumscribed rectangles. The computing device then extracts the feature point information of the face based on the image position information of the face to be replaced. Feature point localization algorithms include, but are not limited to, GBDT (Gradient Boosting Decision Tree), ASM (Active Shape Models), AAM (Active Appearance Models), DCNN (Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Network Cascade), TCDCN (Facial Landmark Detection by Deep Multi-task Learning), MTCNN (Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks), TCNN (Facial Landmark Detection with Tweaked Convolutional Neural Networks), and the like. Here, GBDT is used as a regression-tree-based face alignment algorithm that regresses the face shape step by step from the current shape to the true shape by building cascaded residual regression trees; each leaf node of each tree stores a residual regression amount, and when an input falls on a node the residual is added to the input to achieve regression, so that superimposing all the residuals finally yields the extracted face feature points shown in the example of fig. 4, in which 75 feature points of the face are extracted. The algorithm is fast and stable: each face can be located in only 1-2 milliseconds on a PC. The number of extracted feature points is not limited to 75; other configurations, such as 29, 68, 83 or 106 key points, may also be used. Of course, those skilled in the art will appreciate that the above facial feature point localization algorithms are merely examples; other existing or future facial feature point localization algorithms, if applicable to the present application, are also intended to fall within the scope of the present application and are incorporated herein by reference.
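The 75-point GBDT model described above is internal to the application; as a stand-in for illustration, the sketch below uses dlib's 68-point ensemble-of-regression-trees predictor, which follows the same cascaded residual-regression idea. The model file name and the 68-point layout are dlib assumptions, not part of the patent.
import cv2
import dlib
import numpy as np
# Cascaded regression-tree landmark model (comparable in spirit to the GBDT alignment above).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file
def extract_landmarks(frame_bgr, face_rect):
    # face_rect is (x1, y1, x2, y2); returns an (N, 2) float32 array of landmark pixel coordinates.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    x1, y1, x2, y2 = face_rect
    shape = predictor(gray, dlib.rectangle(x1, y1, x2, y2))
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)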
In step S102, the computing device tracks the face feature point information in the current image frame of the video stream. Currently known face feature point localization methods exhibit a certain instability: even if a face stays still in the video, its feature point positions, limited by localization accuracy, jitter by several pixels between consecutive frames. If the directly located feature points were used for the subsequent face replacement, the replaced face would show noticeable jitter in the video stream. In the present scheme, the face feature point information is tracked between the preceding and following frames, which allows effective correction and yields a stable face image. The feature point tracking includes tracking based on the face feature point information of the face to be replaced in the previous image frame to obtain predicted, tracked face feature point information, and determining the final face feature point information by combining it with the face feature point information located in the current image frame. As in some implementations, in step S102 the computing device tracks the face feature point information in the current frame of the video stream using an optical flow tracking algorithm. For example, the optical flow tracking algorithm includes the KLT (Kanade-Lucas-Tomasi) optical flow method, and the specific procedure is as follows: for each feature point, assume that the face feature point information of the previous image frame has been extracted from that frame by the feature point extraction method described above; the position of the face feature point in the current image frame is predicted by the KLT optical flow method to obtain the corresponding tracked face feature point information, the face feature point information of the current image frame is extracted by the same extraction method, and the two are combined to obtain the corresponding final face feature point information. As in some embodiments, step S102 includes a substep S1021 (not shown) and a substep S1022 (not shown). In step S1021, the computing device predicts the tracked face feature point information of the face to be replaced in the current image frame of the video stream by an optical flow tracking algorithm, according to the face feature point information of the face to be replaced in the previous image frame of the video stream; in step S1022, the computing device determines the final face feature point information in the current image frame according to the tracked face feature point information and the face feature point information located in the current image frame.
In some embodiments, in step S1022, the computing device obtains weight information of the tracked face feature point information and weight information of the face feature point information located in the current image frame, and determines the final face feature point information in the current image frame according to the tracked face feature point information, the weight information of the tracked face feature point information, and the weight information of the located face feature point information. For example, for each feature point, let p(t-1) be the position coordinate obtained from the face feature point information in the previous image frame, and let p0(t) be the position of the feature point in the current frame predicted by the KLT optical flow method. Let p(t) be the coordinate of the face feature point located in the current frame; the final coordinate ps(t) of the face feature point information is then obtained by combining the tracking result with the current localization result:
ps(t)=(1-α)p(t)+αp0(t) (1)
wherein α represents the weight of the tracked face feature point information, and 1-α represents the weight of the face feature point information located in the current image frame. Here, α may be preset or determined from parameters of the preceding and current image frames. As in some embodiments, the weight of the tracked face feature point information is inversely related to the displacement of the face to be replaced between the previous image frame and the current image frame.
For example, assume α is an exponential correlation function based on a natural base e:
α = exp(-d²/(2σ²))  (2)
wherein,
d=||p(t)-p(t-1)|| (3)
σ² = (h*h)/1500 (4)
In formula (4), h is the height of the face rectangle obtained by face detection; the value 1500 is only an example, and other values may be chosen for the denominator.
According to the above formulas, the final coordinate combines, through the respective weights, the pixel coordinate of the face feature point located in the current frame and the pixel coordinate of the face feature point predicted from the previous frame. If the face moves a lot from the previous image frame to the current image frame, d is large and α is small, and the final feature point coordinate ps(t) tends toward the currently located coordinate; if the face moves little from the previous image frame to the current image frame, d is small and α is large, and the final pixel coordinate ps(t) of the face feature point information tends toward the tracked face feature point information obtained by tracking from the previous image frame. This method fully takes the motion of the face and the face resolution (namely h) into account to adaptively adjust the positions of the face feature points in the current frame, and offers better stability and real-time performance than smoothing the trajectory over consecutive frames.
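A minimal sketch of the stabilization in formulas (1) to (4), using OpenCV's pyramidal Lucas-Kanade (KLT) tracker. The Gaussian form assumed for equation (2) and the per-point application of d are reconstructions from the description above; the window size and pyramid depth are illustrative choices, not values from the patent.
import cv2
import numpy as np
def stabilize_landmarks(prev_gray, cur_gray, prev_pts, cur_pts, face_height):
    # prev_pts: final landmarks ps(t-1); cur_pts: landmarks p(t) located in the current frame.
    p0, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts.reshape(-1, 1, 2).astype(np.float32), None,
        winSize=(21, 21), maxLevel=3)                      # predict p0(t) by KLT optical flow
    p0 = p0.reshape(-1, 2)
    status = status.reshape(-1)
    sigma2 = (face_height * face_height) / 1500.0          # formula (4)
    d = np.linalg.norm(cur_pts - prev_pts, axis=1)         # formula (3), per feature point
    alpha = np.exp(-(d * d) / (2.0 * sigma2))              # formula (2), assumed Gaussian form
    alpha[status == 0] = 0.0                               # if tracking failed, fall back to the located point
    # formula (1): ps(t) = (1 - alpha) * p(t) + alpha * p0(t)
    return (1.0 - alpha[:, None]) * cur_pts + alpha[:, None] * p0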
In step S103, the computing device replaces the face to be replaced with the template face according to the tracked face feature point information and the template feature point information of the template face. For example, the template feature point information of the template face includes image position information, in the template image, of features such as the eyes, nose, mouth and face contour of the template face used to replace the face to be replaced. The computing device obtains the face feature points after the feature point tracking processing and, based on these face feature points and the template feature point information of the template face used for replacement, replaces the face to be replaced with the template face by means of, for example, Delaunay triangulation and face contour adaptation.
In some embodiments, the method further includes a step S106 (not shown) before step S103. In step S106, the computing device detects, based on the tracked face feature points and the template feature point information of the template face, whether the face shape of the face to be replaced matches that of the template face; if the face shapes of the face to be replaced and the template face do not match, face contour adaptation is performed on the face to be replaced according to the tracked face feature point information and the template feature point information of the template face, so that the processed face to be replaced matches the face shape of the template face.
For example, the face shape matching includes, but is not limited to, matching the degree of coincidence between the polygon formed by the outer contour of the face feature point information and the polygon formed by the outer contour of the template feature point information under similar conditions. For example, the circumscribed face rectangle corresponding to the outer contour of the face feature point information and the circumscribed template rectangle corresponding to the outer contour of the template feature point information are first determined; with the lengths or widths of the circumscribed face rectangle and the circumscribed template rectangle made consistent and their centers made to coincide, the overlapping area S1 and the non-overlapping area S2 of the polygon formed by the outer contour of the face feature point information and the polygon formed by the outer contour of the template feature point information are calculated. If the proportion of the non-overlapping area S2 to the sum (S1+S2) of the overlapping and non-overlapping areas is greater than or equal to a certain threshold, it is determined that the face shapes of the face feature point information and the template feature point information do not match; if the proportion of the non-overlapping area S2 to the sum (S1+S2) is smaller than the threshold, it is determined that the face shapes of the face feature point information and the template feature point information match. In some embodiments, if the face shapes match, the computing device may divide at least one of the face to be replaced and the template face into a plurality of triangular regions according to the face feature point information and the template feature point information of the template face, and replace the face to be replaced with the template face by affine transformation; if the face shapes do not match, the computing device performs face contour adaptation according to the tracked face feature point information and the template feature point information of the template face.
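The face-shape test above can be sketched as follows: the two outer contours are scaled to the same bounding-rectangle height, translated so the rectangle centers coincide, rasterized, and the ratio S2/(S1+S2) is compared with a threshold. The threshold value 0.2 is an illustrative assumption.
import cv2
import numpy as np
def shapes_match(face_contour, template_contour, threshold=0.2):
    # Each contour is an (N, 2) array of outer-contour feature points.
    face = face_contour.astype(np.float32)
    tpl = template_contour.astype(np.float32)
    fx, fy, fw, fh = cv2.boundingRect(face)
    tx, ty, tw, th = cv2.boundingRect(tpl)
    tpl = tpl * (float(fh) / th)                           # make both rectangles the same height
    tx, ty, tw, th = cv2.boundingRect(tpl)
    size = 2 * max(fw, fh, tw, th) + 4                     # common canvas with coinciding centers
    center = np.float32([size / 2.0, size / 2.0])
    face = face - np.float32([fx + fw / 2.0, fy + fh / 2.0]) + center
    tpl = tpl - np.float32([tx + tw / 2.0, ty + th / 2.0]) + center
    canvas_f = np.zeros((size, size), np.uint8)
    canvas_t = np.zeros((size, size), np.uint8)
    cv2.fillPoly(canvas_f, [np.round(face).astype(np.int32)], 1)
    cv2.fillPoly(canvas_t, [np.round(tpl).astype(np.int32)], 1)
    s1 = np.logical_and(canvas_f, canvas_t).sum()          # overlapping area S1
    s2 = np.logical_xor(canvas_f, canvas_t).sum()          # non-overlapping area S2
    return s2 / float(s1 + s2) < threshold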
As in some embodiments, in step S103, the computing device divides at least one of the face to be replaced and the template face into a plurality of triangular regions according to the face feature point information and the template feature point information of the template face, and replaces the face to be replaced with the template face by affine transformation. For example, based on the 75 feature points of the template face shown in fig. 4, the computing device divides the face region into 119 triangles over the 75 feature points by Delaunay triangulation, as in the triangulation example shown in fig. 5; each triangle consists of 3 feature points, for a total of 119 groups of feature points. Then, as shown in fig. 6, figure (a) shows the template feature point information of the template face, figure (b) shows the face to be replaced, and figure (c) shows the face feature point information of the face to be replaced, where each triangle in the template feature point information of figure (a) corresponds to a triangle in the face feature point information of figure (c). The image of each triangle of the face in the template feature point information of the template face is transformed by affine transformation into the triangular region formed by the corresponding face feature points of the current frame; a correspondence is thereby established between the template feature point information of the template face and the face feature point information of the current image frame, and the template face is transformed onto the face to be replaced according to this correspondence and overlaid on the face to be replaced in the current image frame.
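The triangulation-plus-affine warp can be sketched with OpenCV as below: the template points are triangulated once, and each template triangle is warped into the triangle spanned by the corresponding (stabilized) face feature points. Matching triangles by nearest point index is an implementation convenience assumed here, not a requirement of the patent.
import cv2
import numpy as np
def delaunay_indices(points, image_shape):
    # Triangulate `points` (N, 2) and return index triples into `points`.
    subdiv = cv2.Subdiv2D((0, 0, image_shape[1], image_shape[0]))
    for p in points:
        subdiv.insert((float(p[0]), float(p[1])))
    triangles = []
    for t in subdiv.getTriangleList():
        ids = [int(np.argmin(np.linalg.norm(points - v, axis=1))) for v in t.reshape(3, 2)]
        if len(set(ids)) == 3:
            triangles.append(ids)
    return triangles
def warp_template_onto_face(template_img, template_pts, frame, face_pts):
    out = frame.copy()
    for ids in delaunay_indices(template_pts, template_img.shape):
        src = template_pts[ids].astype(np.float32)
        dst = face_pts[ids].astype(np.float32)
        sx, sy, sw, sh = cv2.boundingRect(src)
        dx, dy, dw, dh = cv2.boundingRect(dst)
        m = cv2.getAffineTransform(src - np.float32([sx, sy]), dst - np.float32([dx, dy]))
        patch = template_img[sy:sy + sh, sx:sx + sw]
        warped = cv2.warpAffine(patch, m, (dw, dh), flags=cv2.INTER_LINEAR,
                                borderMode=cv2.BORDER_REFLECT_101)
        mask = np.zeros((dh, dw), np.uint8)
        cv2.fillConvexPoly(mask, np.round(dst - np.float32([dx, dy])).astype(np.int32), 1)
        roi = out[dy:dy + dh, dx:dx + dw]
        roi[mask == 1] = warped[mask == 1]   # copy only the pixels inside the destination triangle
    return out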
In other embodiments, step S103 includes a substep S1031 (not shown) and a substep S1032 (not shown). In step S1031, the computing device performs face contour adaptation on the face to be replaced according to the tracked face feature point information and the template feature point information of the template face; in step S1032, the computing device replaces the face to be replaced with the template face according to the contour-adapted face feature point information and the template feature point information of the template face. For example, although a face replaced on the basis of Delaunay triangulation looks natural after the transformation, if the face shape of the real face differs greatly from that of the template face, the transformed face keeps the shape of the original face to be replaced, which as a whole is visibly inconsistent with the template face. To reduce this effect and make the deformed face closer to the template face, the face shape of the template face is adapted to the face shape of the face to be replaced: the face shape of the face to be replaced is deformed so that the face shapes of the face to be replaced and the template face become similar, and the template face is then substituted onto the face to be replaced to obtain the corresponding current image frame. Since the transformation matrix from the contour of the template face to the face to be replaced contains 6 unknowns, at least 3 point pairs are needed; that is, at least three point pairs are taken from the contour of the template feature point information and the contour of the face feature point information, the corresponding transformation matrix information is calculated, and the face feature point information is then adjusted according to this transformation matrix information. As in some embodiments, step S1031 includes a substep S10311 (not shown) and a substep S10312 (not shown). In step S10311, the computing device takes at least three point pairs from the contour of the face feature point information and the contour of the template feature point information, and determines the transformation matrix information from the template feature point information to the face feature point information; in step S10312, the computing device adjusts the face feature point information according to the transformation matrix information. For example, the contour of the face feature point information includes the facial outline feature points and the feature points of the four eye corners among the face feature points, and the contour of the template feature point information includes the facial outline feature points and the four eye-corner feature points of the template face; in the feature point set shown in fig. 4 these are 17 outline feature points plus the four eye-corner feature points, 21 feature points in total. Alternatively, the contour of the face feature point information may include only the facial outline feature points among the face feature points, and the contour of the template feature point information only the facial outline feature points of the template face.
At least three point pairs are taken from the contour of the face feature point information and the contour of the template feature point information, the 6 unknowns are solved, and the corresponding transformation matrix information is obtained, so that the face feature point information can be adjusted according to the transformation matrix information. For example, the adjusted position of a face feature point is calculated from the corresponding template feature point and the transformation matrix information, and the pixel coordinates of the face feature point information are adjusted based on this adjusted position; the adjustment of the face feature points may concern only the contour of the face feature point information, or all the face feature point information corresponding to the whole face.
In some embodiments, in step S10311, the computing device takes at least three point pairs from the contour of the face feature point information and the contour of the template feature point information, and determines the optimal transformation matrix information from the template feature point information to the face feature point information by the least squares principle; in step S10312, the computing device adjusts the face feature point information according to the optimal transformation matrix information. For example, let the feature point coordinates of the template feature point information of the template face be (x, y) and the feature point coordinates of the face feature point information of the face to be replaced be (x', y'); with at least three feature point pairs, the optimal similarity transformation matrix M from the template face to the real face is calculated by the least squares method according to the following formula:
M = argmin_M Σ_i || M·(x_i, y_i, 1)^T - (x'_i, y'_i)^T ||²  (5)
The similarity transformation matrix M is calculated according to formula (5), and the template feature point information of the template face is transformed into the current image frame based on this similarity transformation matrix, as shown in fig. 8, where figure (a) shows the template face, figure (b) shows the face to be replaced, and figure (c) shows the face to be replaced after face contour adaptation. The computing device projects the outer contour of the template feature points, the dark points shown in figure (a), into figure (b) through the similarity transformation matrix, and transforms the contour of the face feature points, the light points in figure (b), to the positions of the corresponding dark points, giving the face contour adaptation result shown in figure (c).
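A compact sketch of the least-squares fit in formula (5): with at least three contour point pairs, the 2x3 matrix M that minimizes the sum of squared residuals is obtained with NumPy's lstsq (cv2.estimateAffine2D would serve equally well); this is an illustration, not the patent's exact solver.
import numpy as np
def fit_transform_lstsq(template_pts, face_pts):
    # Solve for the 2x3 matrix M minimizing sum_i ||M (x_i, y_i, 1)^T - (x_i', y_i')^T||^2.
    n = template_pts.shape[0]
    assert n >= 3, "at least three point pairs are needed for the 6 unknowns"
    A = np.hstack([template_pts, np.ones((n, 1))])     # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, face_pts, rcond=None)   # 3x2 least-squares solution
    return M.T                                         # 2x3 transformation matrix
def apply_transform(M, pts):
    return pts @ M[:, :2].T + M[:, 2]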
In some embodiments, in step S10312, the computing device adjusts the face feature point information according to the optimal transformation matrix information and a moving least squares image deformation algorithm. For example, as shown in fig. 8, let the coordinates of the light points in figure (b) (the facial contour feature points of the face to be replaced) be (x1n, y1n) (namely x', y'), and the coordinates of the dark points (the facial contour feature points of the template face) be (x2n, y2n) (namely the points obtained by transforming x, y through the transformation matrix), with n = 0 … 20. Based on these 21 point pairs, the face in figure (b) is deformed into a face shape matching the dark points by the moving least squares image deformation method, yielding figure (c) (in fact the whole image is deformed by this algorithm, but only the face part deforms noticeably).
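The moving-least-squares deformation mentioned above (affine variant, in the style of the moving-least-squares image deformation literature) can be sketched as follows; it is evaluated on a coarse grid and interpolated with cv2.remap for speed. The grid step and weight exponent are illustrative assumptions, not values from the patent.
import cv2
import numpy as np
def mls_affine_warp(img, src_pts, dst_pts, a=1.0, grid_step=8):
    # Deform img so that the control points src_pts move to dst_pts.
    h, w = img.shape[:2]
    gx, gy = np.meshgrid(np.arange(0, w, grid_step), np.arange(0, h, grid_step))
    v = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float64)
    # Backward mapping: control points in the output image are dst_pts, mapping to src_pts.
    p, q = dst_pts.astype(np.float64), src_pts.astype(np.float64)
    d2 = ((v[:, None, :] - p[None, :, :]) ** 2).sum(-1)
    wgt = 1.0 / np.maximum(d2, 1e-8) ** a                     # MLS weights
    wsum = wgt.sum(1, keepdims=True)
    p_star = (wgt[:, :, None] * p[None]).sum(1) / wsum        # weighted centroids
    q_star = (wgt[:, :, None] * q[None]).sum(1) / wsum
    ph = p[None] - p_star[:, None, :]
    qh = q[None] - q_star[:, None, :]
    A = np.einsum('vi,vij,vik->vjk', wgt, ph, ph)             # per-grid-point 2x2 normal matrices
    B = np.einsum('vi,vij,vik->vjk', wgt, ph, qh)
    M = np.linalg.solve(A, B)
    f = np.einsum('vj,vjk->vk', v - p_star, M) + q_star       # where each output grid point samples from
    map_x = cv2.resize(f[:, 0].reshape(gx.shape).astype(np.float32), (w, h))
    map_y = cv2.resize(f[:, 1].reshape(gy.shape).astype(np.float32), (w, h))
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)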
Then, according to the contour feature points of the transformed face, for example the 21 dark points in fig. 8 (c), these points are connected in order on the face to be replaced to obtain a polygon, and the polygonal image of the template face is transformed onto the corresponding polygonal region of the face to be replaced, giving the new face after replacement.
In some embodiments, after face contour adaptation of the face feature point information and the template feature point information, the subsequent face replacement, color fusion, edge processing and the like may be performed directly; in other embodiments, to make the face-changed image more natural, triangulation, affine transformation and the like may additionally be performed after the face contour adaptation to carry out the replacement between the template feature points and the face feature point information. In some embodiments, in step S1032, the computing device divides at least one of the face to be replaced and the template face into a plurality of triangular regions according to the contour-adapted face feature point information and the template feature point information of the template face, and replaces the face to be replaced with the template face by affine transformation. For example, at least one of the template feature point information and the face feature point information is triangulated, a mapping relationship between the template feature point information of the template face and the face feature point information of the current image frame is established based on the resulting triangles, and the template face is transformed onto the face to be replaced according to this mapping relationship and overlaid on the face to be replaced in the current image frame.
In step S104, the computing device performs color fusion and edge processing on the current image frame after the face replacement. For example, because the skin color and illumination of the face template are likely to differ from those of the current real face, and because parts such as the forehead of the real face may not be covered by the face template, the covered face region and the uncovered skin region may differ in skin color and tone, producing obvious traces of artificial editing; color fusion of the face is therefore needed. Most currently known face fusion algorithms use a general Poisson fusion algorithm, whose computational cost is large and which is difficult to run in real time; moreover, when the illumination of the two faces differs greatly, the fusion quality suffers. The present scheme provides a stable face color fusion method: since the overall colors of the two faces are fused by adjusting hue (H), saturation (S) and value (V), and the HSV color space matches human visual intuition, the color fusion is performed on the current image frame after face replacement in the HSV color space. In some embodiments, the color fusion may be based on the colors of all pixels of the image frame; in other embodiments, it is based on the colors of the pixels of the face region before and after replacement. As in some embodiments, in step S104 the computing device converts the color spaces of the original current image frame before the face replacement and of the current image frame after the face replacement into the HSV color space, performs the color fusion, and converts the color space of the fused current image frame back into the RGB color space; a filter is then used to perform edge processing on the color-fused current image frame. The original current image frame refers to the image frame in the original video stream before the face is replaced, and the current image frame refers to the current image frame after the face to be replaced has been replaced by the template face. The computing device first converts the color spaces of the original current image frame and the current image frame from RGB to HSV, obtains the fused current image frame based on a preset color fusion algorithm, and converts the color space of the image frame from HSV back to RGB, thereby obtaining the corresponding current image frame. At this point, a relatively obvious boundary still exists at the edge of the replacement face in the current image frame, leaving visible traces of artificial editing; the computing device therefore smooths the transition between the background of the current image frame and the template face (the replacement face) with a filter, the filter including but not limited to Gaussian filtering, domain smoothing filtering, median filtering and the like. In some embodiments, the color fusion algorithm includes: stretching the H channel of the HSV color space to 0-180 and the S and V channels to 0-255, and calculating the pixel averages of the original current image frame and the current image frame in each color channel; and determining the HSV color distribution of the fused current image frame according to the original current image frame, the per-channel pixel averages of the current image frame and a preset fusion algorithm.
For example, the computing device converts the original current image frame and the current image frame into the HSV image space to obtain HSV images I1 and I2, stretches the H channel to 0-180 and the S and V channels to 0-255, so that each pixel occupies 3 bytes, which is convenient for computer operation. For each of the H, S and V channels of the two images, the average pixel value is calculated, giving the mean pixel value M1h of the H channel of I1, the mean pixel value M1s of the S channel of I1, the mean pixel value M1v of the V channel of I1, and likewise M2h, M2s and M2v for I2. The fused face HSV image is then calculated according to the following formulas:
M1h=(M1h+90)%180;
M2h=(M2h+90)%180;
I3h(i,j)=(M1h+I2h(i,j)–M2h+180)%180;
I3s(i,j)=M1s+I2s(i,j)–M2s;
I3v(i,j)=M1v+I2v(i,j)–M2v;
When the value of I3s(i,j) is greater than 255, I3s(i,j) is set to 255; when the value of I3s(i,j) is less than 0, I3s(i,j) is set to 0; I3v(i,j) is treated in the same way.
Here, I3h(i,j), I3s(i,j) and I3v(i,j) are the pixel values of the three HSV channels of the fused face image at image coordinate (i,j), and I2h(i,j), I2s(i,j) and I2v(i,j) are the pixel values of the three HSV channels of the replaced face image at image coordinate (i,j).
Finally, the fused face image is converted from the HSV color space to the BGR color space, as shown in fig. 7, which illustrates an example of the replaced current image frame after color fusion.
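The mean-shift fusion of the formulas above, plus the Gaussian edge smoothing, can be sketched as follows. The per-channel means are computed here over the whole image for simplicity (the text notes they may instead be restricted to the face region), and face_mask is an assumed single-channel 0/1 mask of the replaced face area, not something defined by the patent.
import cv2
import numpy as np
def fuse_colors_hsv(original_bgr, swapped_bgr):
    # I1: original frame, I2: frame after face replacement; returns the fused frame I3 in BGR.
    i1 = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)  # H in 0-180, S/V in 0-255
    i2 = cv2.cvtColor(swapped_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    m1h, m1s, m1v = (int(i1[..., c].mean()) for c in range(3))
    m2h, m2s, m2v = (int(i2[..., c].mean()) for c in range(3))
    m1h = (m1h + 90) % 180
    m2h = (m2h + 90) % 180
    i3 = np.empty_like(i2)
    i3[..., 0] = (m1h + i2[..., 0] - m2h + 180) % 180          # hue: circular mean shift
    i3[..., 1] = np.clip(m1s + i2[..., 1] - m2s, 0, 255)       # saturation: clipped mean shift
    i3[..., 2] = np.clip(m1v + i2[..., 2] - m2v, 0, 255)       # value: clipped mean shift
    return cv2.cvtColor(i3.astype(np.uint8), cv2.COLOR_HSV2BGR)
def smooth_edges(original_bgr, fused_bgr, face_mask, ksize=15):
    # Feather the face mask with a Gaussian filter so the replaced region blends into the background.
    soft = cv2.GaussianBlur(face_mask.astype(np.float32), (ksize, ksize), 0)[..., None]
    return (soft * fused_bgr + (1.0 - soft) * original_bgr).astype(np.uint8)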
In some embodiments, the video stream comprises a real-time video stream transmitted by a user device during real-time video communication. For example, user A holds a user device (such as a mobile phone) and establishes a real-time video communication connection with user B through a corresponding application on the device; the user device of user A receives, over the real-time video communication, the real-time video stream captured by user B. Based on an operation of user A, the user device of user A performs real-time virtual face changing on the video stream captured by user B within the real-time video stream, for example by performing the virtual face-changing operation locally on user A's device, or by sending a corresponding face-changing instruction to the corresponding network device and receiving the face-changed real-time video stream returned by the network device. Alternatively, based on an operation of user B, the user device of user B performs the real-time virtual face-changing operation on the captured real-time video stream and transmits the face-changed video stream to user A; here the virtual face-changing operation may be completed on user B's device or on the network device.
In some embodiments, the video stream comprises a real-time video stream captured by a camera of an augmented reality device. For example, user A holds an augmented reality device comprising a camera for capturing a real-time video stream of the scene currently in front of the user. The real-time video stream contains a corresponding face to be replaced; the augmented reality device performs the virtual face-changing operation on the real-time video stream, or transmits the real-time video stream to a network device, which performs the virtual face changing on it. In some implementations, the method further includes a step S105 (not shown), in which the computing device determines the presentation position information of the replaced face on a display device of the augmented reality device and presents the replaced template face based on the presentation position information. For example, the augmented reality device comprises a display device for overlaying the template face; the augmented reality device calculates the position of the face to be replaced on the display device based on the corresponding coordinate conversion parameters and overlays the corresponding template face at that position. In some embodiments, the real-time video stream is presented through the display device of the augmented reality device, and the augmented reality device overlays the corresponding template face at the corresponding position (the position of the face to be replaced) on the display device; in other embodiments, the real-time video stream is used to calculate the position information of the face to be replaced on the display device, and the template face is overlaid at the corresponding position of the display device, achieving the augmented reality effect of replacing the real-world face to be replaced with the template face.
Referring to the example shown in fig. 1, fig. 9 illustrates a computing device for real-time virtual face changing according to one aspect of the present application, wherein the device includes a first module 101, a second module 102, a third module 103 and a fourth module 104. The first module 101 is configured to obtain face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced; the second module 102 is configured to track the face feature point information in the current image frame of the video stream; the third module 103 is configured to replace the face to be replaced with the template face according to the tracked face feature point information and the template feature point information of the template face; and the fourth module 104 is configured to perform color fusion and edge processing on the current image frame after the face replacement.
The first module 101 is configured to obtain face feature point information of a face to be replaced in a current image frame of a video stream containing the face to be replaced. For example, the computing device further comprises an input device, such as a data transmission interface, for receiving a video stream about the face to be replaced sent by another device; alternatively, the computing device further comprises a camera device, such as a camera or a depth camera, for capturing the video stream containing the face to be replaced. The face feature point information of the face to be replaced includes image position information, in the current image frame, of feature points corresponding to landmark features such as the eyes, nose, mouth and face contour of the face to be replaced. After receiving the video stream about the face to be replaced sent by another device, the computing device extracts face feature points of the face to be replaced based on the face position in the current image frame, for example by face feature point extraction, thereby obtaining the face feature point information. As in some embodiments, the first module 101 includes a sub-module 1011 (not shown) and a sub-module 1012 (not shown), where the sub-module 1011 is configured to obtain image position information of the face to be replaced in the current image frame of the video stream containing the face to be replaced, and the sub-module 1012 is configured to extract the face feature point information in the current image frame according to the image position information. For example, the image position information of the face to be replaced includes pixel coordinates, in the image coordinate system of the current image frame, of the contour of the face to be replaced or of a custom circumscribing figure (such as a circumscribed rectangle), for example the line equations of the four border segments of the circumscribed rectangle, or the pixel coordinates of at least two opposite corner points of the circumscribed rectangle. In some embodiments, the image position information of the face to be replaced is determined by a selection made in the current image frame based on a user operation (such as a frame selection); for example, the corresponding face circumscribed rectangle is determined at the position where the user clicks or draws a selection box. In other embodiments, the image position information of the face to be replaced is determined by the computing device performing face detection on the current image frame: the sub-module 1011 is configured to perform face detection on the current image frame of the video stream containing the face to be replaced, determine image position information corresponding to at least one face, and determine the image position information of the face to be replaced from the image position information of the at least one face.
For example, the face detection includes the computing device determining, in the current image frame, the contours or circumscribed-rectangle pixel coordinates of one or more faces through face-related features. Face detection algorithms include, but are not limited to, the Adaboost cascade face detection algorithm based on Haar features, ACF (Aggregate Channel Features for Multi-view Face Detection), DPM (Deformable Part Model), Cascade CNN (A Convolutional Neural Network Cascade for Face Detection), DenseBox (DenseBox: Unifying Landmark Localization with End to End Object Detection), Faceness-Net (Faceness-Net: Face Detection through Deep Facial Part Responses), Face R-CNN, PyramidBox (PyramidBox: A Context-assisted Single Shot Face Detector), and the like. Here, the Adaboost cascade face detection algorithm based on Haar features is used; the method is fast, robust and realizes face detection well, yielding the rectangular frame in the face detection example of fig. 3 together with the two diagonal coordinates (x1, y1), (x2, y2) of that rectangular frame. Of course, those skilled in the art will appreciate that the above face detection algorithms are merely exemplary, and that other existing or future face detection algorithms, as applicable to the present application, are intended to be within the scope of the present application and are incorporated herein by reference.
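As an illustrative aid only (not part of the original disclosure), the following minimal sketch shows how such a Haar-feature Adaboost cascade detector can be run with OpenCV; the cascade file name and the detection parameters are assumptions taken from OpenCV's stock distribution.

```python
# Minimal sketch: Haar-cascade face detection with OpenCV.
# The cascade file and parameters are illustrative assumptions,
# not the patent's own implementation.
import cv2

def detect_faces(frame_bgr):
    """Return face bounding boxes as (x1, y1, x2, y2) tuples."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns rectangles as (x, y, w, h)
    rects = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(60, 60))
    return [(x, y, x + w, y + h) for (x, y, w, h) in rects]
```

Each returned rectangle corresponds to the pair of diagonal coordinates (x1, y1), (x2, y2) described above.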
In some embodiments, the computing device may detect one or more face circumscribed rectangular frames in the current image frame through the face detection algorithm, and may determine the face circumscribed rectangular frame of the face to be replaced from them based on a selection operation of a user or the like; alternatively, the computing device performs target tracking according to the image position information of the face to be replaced in the previous image frame of the video stream, and determines the image position information of the face to be replaced from the image position information corresponding to the one or more face circumscribed rectangular frames. The computing device then extracts feature point information of the face based on the image position information of the face to be replaced. Feature point positioning algorithms include, but are not limited to, GBDT (Gradient Boosting Decision Tree), ASM (Active Shape Models), AAM (Active Appearance Models), DCNN (Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Network Cascade), TCDCN (Facial Landmark Detection by Deep Multi-task Learning), MTCNN (Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks), TCNN (Facial Landmark Detection with Tweaked Convolutional Neural Networks), and the like. Here, GBDT is used as a regression-tree-based face alignment algorithm that regresses the face shape step by step from the current shape to the true shape by building cascaded residual regression trees; each leaf node of each tree stores a residual regression quantity which is added to an input that falls on that node, and superimposing all residuals finally yields the extracted face feature point information illustrated in fig. 4, where 75 feature points of the face are extracted. The algorithm is fast and stable: each face can be located in only 1-2 milliseconds on a PC. The number of extracted feature points is not limited to 75; other numbers of key points, such as 29, 68, 83 or 106, may also be used. Of course, those skilled in the art will appreciate that the above facial feature point positioning algorithms are merely exemplary, and that other existing or future facial feature point positioning algorithms, as applicable to the present application, are intended to be within the scope of the present application and are incorporated herein by reference.
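By way of illustration only, the sketch below uses dlib's ensemble-of-regression-trees shape predictor, which implements the same cascaded-regression idea; dlib's stock model outputs 68 points rather than the 75 points mentioned above, and the model file path is an assumption.

```python
# Sketch: cascaded regression-tree landmark localization via dlib's ERT model.
# The 68-point model and its file name are assumptions (the text above uses 75).
import dlib
import numpy as np

predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(gray, box):
    """box = (x1, y1, x2, y2) from face detection; returns an (N, 2) array."""
    rect = dlib.rectangle(*map(int, box))
    shape = predictor(gray, rect)
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)
```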
And a second module 102, configured to track feature point information of the face feature point information in the current image frame of the video stream. For example, currently known face feature point positioning methods have a certain instability: even if a face is motionless in a video, its face feature point information, limited by positioning accuracy, jitters by several pixels between adjacent frames, and if the directly positioned face feature point information is used for subsequent face replacement or the like, the replaced face exhibits a certain jitter in the video stream. In the present scheme, the face feature point information in the preceding and current frames is tracked, so that effective correction processing can be carried out and a stable face image presentation effect is obtained. The feature point information tracking comprises tracking based on the face feature point information of the face to be replaced in the previous image frame to obtain predicted tracking face feature point information, and determining the final face feature point information by combining it with the face feature point information located in the current image frame. As in some embodiments, the second module 102 is configured to perform feature point information tracking on the face feature point information in the current image frame of the video stream using an optical flow tracking algorithm. For example, the optical flow tracking algorithm includes the KLT (Kanade-Lucas-Tomasi) optical flow method, and the specific procedure is as follows: for each feature point, the face feature point information corresponding to the previous image frame is extracted from the previous image frame of the video stream by the above extraction method, the position of the feature point in the current image frame is predicted based on the KLT optical flow method to obtain the corresponding tracking face feature point information, and the face feature point information corresponding to the current image frame is extracted by the same extraction method, so that the corresponding final face feature point information is obtained by combining the two. In some embodiments, the second module 102 includes a two-one sub-module 1021 (not shown) and a two-two sub-module 1022 (not shown), where the two-one sub-module 1021 is configured to predict the tracking face feature point information of the face to be replaced in the current image frame of the video stream by using an optical flow tracking algorithm according to the face feature point information of the face to be replaced in the previous image frame of the video stream; and the two-two sub-module 1022 is configured to determine the final face feature point information in the current image frame according to the tracking face feature point information and the face feature point information located in the current image frame.
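The KLT prediction step described above can be sketched as follows, assuming OpenCV's pyramidal Lucas-Kanade implementation; the window size and pyramid depth are illustrative parameters, not values taken from the disclosure.

```python
# Sketch: predict the previous frame's landmarks in the current frame with KLT.
import cv2
import numpy as np

def track_landmarks_klt(prev_gray, cur_gray, prev_pts):
    """prev_pts: (N, 2) float32 landmark coordinates from the previous frame."""
    p_prev = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    p_pred, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, p_prev, None, winSize=(21, 21), maxLevel=3)
    return p_pred.reshape(-1, 2), status.reshape(-1)
```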
In some embodiments, the two-two sub-module 1022 is configured to obtain weight information of the tracking face feature point information and weight information corresponding to the face feature point information in the current image frame, and to determine the final face feature point information in the current image frame according to the tracking face feature point information, the face feature point information in the current image frame and their respective weight information. For example, for each feature point, let p(t-1) be the position coordinate obtained from the face feature point information in the previous image frame, let p0(t) be the position of that feature point in the current frame predicted by the KLT optical flow method, and let p(t) be the coordinate of the face feature point information located in the current frame. The final coordinate ps(t) of the face feature point information is then obtained by combining the tracking result and the current positioning result:
ps(t)=(1-α)p(t)+αp0(t) (6)
wherein α represents the weight information of the tracking face feature point information, and 1-α represents the weight information of the face feature point information in the current image frame. Here, α may be preset or may be determined based on related parameters of the previous and current image frames. As in some embodiments, the weight information of the tracking face feature point information is inversely related to the displacement of the face to be replaced between the previous image frame and the current image frame.
For example, assume α is an exponential correlation function based on a natural base e:
α = e^(-d²/(2σ²))  (7)
wherein:
d=||p(t)-p(t-1)|| (8)
σ² = (h*h)/1500 (9)
in the formula (9), h is the height of the rectangular frame of the face obtained by face detection, wherein the value of 1500 is only an example, and other values can be selected as the corresponding denominator.
According to the above formulas, the final coordinates combine, through the respective weight information, the pixel coordinates of the face feature point information obtained by current positioning and the pixel coordinates of the face feature points predicted from the previous frame. If the face moves a lot from the previous image frame to the current image frame, d is large and α is small, so the final feature point coordinates ps(t) tend toward the currently positioned coordinates; if the face moves little from the previous image frame to the current image frame, d is small and α is large, so ps(t) tends toward the tracking face feature point information predicted from the previous image frame. The method adaptively adjusts the positions of the face feature points of the current frame by fully considering both the motion of the face and its resolution (namely h), and is more stable and real-time than smoothing the trajectory over consecutive frames.
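The adaptive blending of equations (6) to (9) can be sketched as follows; the Gaussian form of α reflects the reconstruction of equation (7) above, and the constant 1500 is the example value from the text, so both should be treated as tunable assumptions.

```python
# Sketch: blend located landmarks with KLT-predicted landmarks, eq. (6)-(9).
import numpy as np

def smooth_landmarks(p_cur, p_pred, p_prev, face_h):
    """p_cur: points located in the current frame; p_pred: KLT prediction;
    p_prev: points of the previous frame; face_h: face rectangle height."""
    d = np.linalg.norm(p_cur - p_prev, axis=1)      # per-point displacement, eq. (8)
    sigma2 = (face_h * face_h) / 1500.0             # eq. (9)
    alpha = np.exp(-(d * d) / (2.0 * sigma2))       # eq. (7): small motion -> large alpha
    alpha = alpha[:, None]
    return (1.0 - alpha) * p_cur + alpha * p_pred   # eq. (6)
```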
And the three modules 103 are used for replacing the face to be replaced by the template face according to the face feature point information and the template feature point information of the template face after the feature point information is tracked. For example, the template feature point information of the template face includes image position information of the features such as eyes, nose, mouth, face contour, etc. of the template face for replacing the face to be replaced in the template image, etc. The computing equipment acquires the face feature points after feature point information tracking processing, and replaces the face to be replaced with the template face based on the face feature points and template feature point information of the template face for replacing the face to be replaced, such as Delaunay triangulation, face contour adaptation and the like.
In some embodiments, the apparatus further includes a six module 106 (not shown) whose processing precedes that of the three-module 103, and the six module 106 is configured to detect whether the face shape of the face to be replaced matches that of the template face based on the face feature point information after feature point information tracking and the template feature point information of the template face; and, if the face shape of the face to be replaced does not match that of the template face, to carry out face contour adaptation processing on the face to be replaced according to the face feature point information after feature point information tracking and the template feature point information of the template face, so that the processed face to be replaced matches the face shape of the template face.
For example, the face shape matching includes, but is not limited to, matching the coincidence ratio of the polygon formed by the outer contour of the face feature point information and the polygon formed by the outer contour of the template feature point information under similar conditions. For instance, a circumscribed face rectangle corresponding to the outer contour of the face feature point information and a circumscribed template rectangle corresponding to the outer contour of the template feature point information are first determined; with the lengths or widths of the two circumscribed rectangles made consistent and their centers made to coincide, the overlapping area S1 and the non-overlapping area S2 of the face polygon and the template polygon are calculated. If the proportion of the non-overlapping area S2 to the sum (S1+S2) of the overlapping and non-overlapping areas is greater than or equal to a certain threshold value, it is determined that the face shapes of the face feature point information and the template feature point information do not match; if that proportion is smaller than the threshold value, it is determined that the face shapes match. In some embodiments, if the face shapes match, the computing device may divide at least one of the face to be replaced and the template face into a plurality of triangle areas according to the face feature point information and the template feature point information of the template face, and replace the face to be replaced with the template face according to affine transformation; if they do not match, the computing device performs face contour adaptation according to the face feature point information after feature point information tracking and the template feature point information of the template face.
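A possible sketch of this shape test, assuming both outer contours are given as point arrays: the polygons are scaled to a common size, centered on one canvas, rasterized, and compared by their overlapping and non-overlapping areas. The canvas size and threshold are assumptions.

```python
# Sketch: polygon coincidence test for face-shape matching (S2/(S1+S2) < thresh).
import cv2
import numpy as np

def shapes_match(face_contour, template_contour, thresh=0.15, canvas=512):
    def normalize(poly):
        poly = np.asarray(poly, np.float32)
        mins, maxs = poly.min(axis=0), poly.max(axis=0)
        w, h = maxs - mins
        scale = 0.8 * canvas / max(w, h)               # common scale for both shapes
        center = (mins + maxs) / 2.0
        return (poly - center) * scale + canvas / 2.0  # coincident centers

    masks = []
    for poly in (face_contour, template_contour):
        mask = np.zeros((canvas, canvas), np.uint8)
        cv2.fillPoly(mask, [normalize(poly).astype(np.int32)], 1)
        masks.append(mask.astype(bool))
    s1 = np.logical_and(*masks).sum()                  # overlapping area S1
    s2 = np.logical_xor(*masks).sum()                  # non-overlapping area S2
    return s2 / float(s1 + s2) < thresh
```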
In some embodiments, a three-module 103 is configured to divide at least one of the face to be replaced and the template face into a plurality of triangle areas according to the face feature point information and the template feature point information of the template face, and to replace the face to be replaced with the template face according to affine transformation. For example, based on the 75 feature points of the template face shown in fig. 4, the computing device divides the face region into 119 triangles over those 75 feature points by the Delaunay triangulation method, as in the triangulation example shown in fig. 5; each triangle is composed of 3 feature points, giving 119 groups of feature points in total. Then, as shown in fig. 6, diagram (a) shows an example of the template feature point information of the template face, diagram (b) shows an example of the face feature point information of the face to be replaced, and diagram (c) shows an example of the triangulated face feature point information of the face to be replaced. The template feature point information of diagram (a) corresponds triangle by triangle to the face feature point information of diagram (c); the image of each face triangle in the template feature point information is transformed by affine transformation into the triangle area formed by the corresponding face feature point information of the current frame, a correspondence is thus finally established between the template feature information of the template face and the face feature point information of the current image frame, the template face is transformed into the face to be replaced according to this correspondence, and the face to be replaced is covered in the current image frame.
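A compact sketch of this triangulate-and-warp step using OpenCV follows; warping the full image per triangle is wasteful but keeps the example short, and the triangle indexing by nearest landmark is an illustrative shortcut rather than the disclosure's exact procedure.

```python
# Sketch: Delaunay triangulation of the template landmarks and per-triangle
# affine warping of the template face onto the face to be replaced.
import cv2
import numpy as np

def delaunay_triangles(points, size):
    """Return triangles as index triplets into `points`; size = (w, h)."""
    subdiv = cv2.Subdiv2D((0, 0, size[0] + 1, size[1] + 1))
    for p in points:
        subdiv.insert((float(p[0]), float(p[1])))
    tris = []
    for x1, y1, x2, y2, x3, y3 in subdiv.getTriangleList():
        idx = [int(np.linalg.norm(points - np.array([tx, ty]), axis=1).argmin())
               for (tx, ty) in ((x1, y1), (x2, y2), (x3, y3))]
        if len(set(idx)) == 3:
            tris.append(tuple(idx))
    return tris

def warp_face(template_img, template_pts, frame, frame_pts):
    out = frame.copy()
    h, w = template_img.shape[:2]
    for i, j, k in delaunay_triangles(template_pts, (w, h)):
        src = np.float32([template_pts[i], template_pts[j], template_pts[k]])
        dst = np.float32([frame_pts[i], frame_pts[j], frame_pts[k]])
        m = cv2.getAffineTransform(src, dst)
        warped = cv2.warpAffine(template_img, m, (out.shape[1], out.shape[0]))
        mask = np.zeros(out.shape[:2], np.uint8)
        cv2.fillConvexPoly(mask, dst.astype(np.int32), 1)
        out[mask == 1] = warped[mask == 1]   # copy the warped triangle pixels
    return out
```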
In other embodiments, the three-module 103 includes a three-one sub-module 1031 (not shown) and a three-two sub-module 1032 (not shown). The three-one sub-module 1031 is configured to perform face contour adaptation on the face to be replaced according to the face feature point information after feature point information tracking and the template feature point information of the template face; and the three-two sub-module 1032 is configured to replace the face to be replaced with the template face according to the face feature point information after face contour adaptation and the template feature point information of the template face. For example, when face replacement is carried out based on the Delaunay triangulation method, the transformed face looks natural, but if the face shape of the real face differs greatly from that of the template face, the transformed face keeps the original shape of the face to be replaced and, to the naked eye, its overall shape is inconsistent with the template face. To reduce this influence and make the deformed face shape closer to the template face, the face shape of the template face is matched to the face shape of the face to be replaced, the face shape of the face to be replaced is deformed so that the two face shapes become similar, and the template face is then replaced onto the face to be replaced to obtain the corresponding current image frame. Since the transformation matrix from the contour of the template face to the face to be replaced contains 6 unknowns, at least 3 point pairs are needed; that is, at least three point pairs are taken from the contour of the template feature point information and the contour of the face feature point information, the corresponding transformation matrix information is calculated, and the face feature point information is then adjusted according to that transformation matrix information. As in some embodiments, the three-one sub-module 1031 includes a three-one unit 10311 (not shown) and a three-two unit 10312 (not shown). The three-one unit 10311 is configured to take at least three point pairs from the contour of the face feature point information and the contour of the template feature point information and to determine the transformation matrix information from the template feature point information to the face feature point information; and the three-two unit 10312 is configured to adjust the face feature point information according to the transformation matrix information. For example, the contour of the face feature point information includes the face outline feature points and the feature points of the four eye corners among the face feature points, and the contour of the template feature point information includes the template face outline feature points and the feature points of the four eye corners; in the feature point set shown in fig. 4 these are 17 outline feature points plus the four eye-corner feature points, 21 feature points in total. Alternatively, the contour of the face feature point information may include only the face outline feature points among the face feature points, and the contour of the template feature point information only the template face outline feature points.
At least three point pairs are taken from the contour of the face feature point information and the contour of the template feature point information, the 6 unknowns are solved, and the corresponding transformation matrix information is obtained, so that the face feature point information can be adjusted according to the transformation matrix information. For example, the pixel coordinates of the face feature points are calculated from the template feature points and the transformation matrix information; or adjustment position information for the face feature points is calculated from the template feature points and the transformation matrix information, and the pixel coordinates of the face feature point information are adjusted based on that adjustment position information. The adjustment of the face feature points may be applied only to the contour of the face feature point information, or to all face feature point information corresponding to the whole face.
In some embodiments, a three-one unit 10311 is configured to take at least three point pairs from the contour of the face feature point information and the contour of the template feature point information, and determine optimal transformation matrix information from the template feature point information to the face feature point information by using a least squares principle; and a three-two unit 10312 for adjusting the facial feature point information according to the optimal transformation matrix information. For example, assuming that the feature point coordinates of the template feature point information of the template face are (x, y), the feature point coordinates of the face feature point information of the face to be replaced are (x ', y'), calculating an optimal similarity transformation matrix M from the template face to the real face using a least square method according to the following formula with at least three feature point pairs,
M = argmin_M Σ_i ||M·(x_i, y_i, 1)^T - (x'_i, y'_i)^T||²  (10)
According to formula (10), the similarity transformation matrix M is calculated, and the template feature point information of the template face is transformed into the current image frame based on the similarity transformation matrix, as shown in fig. 8, where diagram (a) shows the template face, diagram (b) shows the face to be replaced, and diagram (c) shows the face to be replaced after face contour adaptation. The computing device projects the outer contour of the template feature points (the dark points shown in diagram (a)) into diagram (b) through the similarity transformation matrix, and moves the contour of the face feature points (the light points in diagram (b)) to the positions of the corresponding dark points, forming the result after face contour adaptation shown in diagram (c).
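A brief sketch of estimating such a transform from the contour point pairs follows; cv2.estimateAffinePartial2D solves the similarity (4-parameter) least-squares problem, while cv2.estimateAffine2D would solve the full 6-unknown affine mentioned above. The function choice and the robust method flag are assumptions.

```python
# Sketch: least-squares transform from template contour points to face contour
# points, and its application to feature points.
import cv2
import numpy as np

def contour_transform(template_contour_pts, face_contour_pts):
    src = np.asarray(template_contour_pts, np.float32).reshape(-1, 1, 2)
    dst = np.asarray(face_contour_pts, np.float32).reshape(-1, 1, 2)
    m, _inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.LMEDS)
    return m                                # 2x3 matrix M

def apply_transform(m, pts):
    pts = np.asarray(pts, np.float32)
    return pts @ m[:, :2].T + m[:, 2]       # M * (x, y, 1)^T for every point
```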
In some embodiments, a three-two unit 10312 is used for adjusting the face feature point information according to the optimal transformation matrix information and a moving least squares image deformation algorithm. For example, as shown in fig. 8, assume the coordinates of the light points of diagram (b) are (x1n, y1n) (i.e., the actual x', y') and the coordinates of the dark points are (x2n, y2n) (i.e., the points obtained by transforming x, y with the transformation matrix), with n indexing the 21 contour points. The face of diagram (b) is deformed toward the face shape of the dark points using the moving least squares image deformation method according to these 21 point pairs, obtaining image (c) (in practice the whole image is deformed by the algorithm, but the deformation is obvious mainly in the face part).
Then, according to the feature points of the transformed face contour, such as the 21 dark points in fig. 8(c), the points are connected in sequence on the face to be replaced to obtain a polygon, and the polygon image of the template face is transformed to the corresponding polygon area of the face to be replaced, obtaining the new face after replacement.
In some embodiments, after face contour adaptation of the face feature point information and the template feature point information, the subsequent face replacement, color fusion, edge processing and the like may be performed directly; in other embodiments, in order to make the face-changed image more natural, triangulation, affine transformation and the like may additionally be performed after the face contour adaptation to carry out the replacement based on the template feature points and the face feature point information. In some embodiments, the three-two sub-module 1032 is configured to divide at least one of the face to be replaced and the template face into a plurality of triangle areas according to the face feature point information after face contour adaptation and the template feature point information of the template face, and to replace the face to be replaced with the template face according to affine transformation. For example, at least one of the template feature point information and the face feature point information is triangulated, a mapping relation between the template feature information of the template face and the face feature point information of the current image frame is established based on the obtained triangles, the template face is transformed into the face to be replaced according to the mapping relation, and the face to be replaced is covered in the current image frame.
And the four modules 104 are used for carrying out color fusion and edge processing on the current image frame after the face replacement. For example, because the face template is likely to be inconsistent with the skin color and illumination of the current real face (for instance, the forehead part of the real face may not be covered by the face template), the covered face area and the uncovered skin area may differ in skin color and tone, producing obvious traces of artificial editing, so color fusion needs to be performed on the face. Most currently known face fusion algorithms adopt the general Poisson fusion algorithm, whose computation cost is large and hard to reconcile with real-time requirements, and whose fusion effect degrades when the illumination difference between the two faces is large. The present scheme provides a stable face color fusion method: because fusing the overall colors of two faces amounts to adjusting hue (H), saturation (S) and value (V), and the HSV color space accords with human visual intuition, the color fusion is carried out on the face-replaced current image frame in the HSV color space. In some embodiments, the color fusion may be processed based on the colors of all pixels of the image frame; in other embodiments, the color fusion is based on the colors of the pixels of the face region before and after replacement. As in some embodiments (corresponding to step S104 of the method), the computing device converts the color spaces of the original current image frame before face replacement and of the current image frame after face replacement into the HSV color space, performs color fusion, and converts the color space of the fused current image frame back into the RGB color space; edge processing is then performed on the color-fused current image frame using a filter. The original current image frame refers to an image frame in the original video stream before the face is replaced, and the current image frame refers to the current image frame after the face to be replaced has been replaced with the template face. The computing device first converts the color spaces of the original current image frame and the current image frame from the RGB color space to the HSV space, obtains the fused current image frame based on a preset color fusion algorithm, and converts the color space of the image frame from the HSV space back to the RGB space, thereby obtaining the corresponding current image frame. At this point a relatively obvious boundary still exists at the edge of the replaced face in the current image frame, leaving visible traces of artificial editing, so the computing device smooths the transition between the background of the current image frame and the template face (the replacing face) through a filter, where the filter includes, but is not limited to, Gaussian filtering, neighborhood smoothing filtering, median filtering and the like.
In some embodiments, the color fusion algorithm comprises: stretching the H channel in the HSV color space to 0-180 and the S and V channels to 0-255, and respectively calculating the pixel average value of the original current image frame and of the current image frame in each color channel; and determining the HSV color distribution of the fused current image frame according to the original current image frame, the pixel average values of each color channel in the current image frame and a preset fusion algorithm. For example, the computing device converts the original current image frame and the current image frame into the HSV image space respectively to obtain HSV images I1 and I2, and stretches the H channel to 0-180 and the S and V channels to 0-255, so that one pixel occupies 3 bytes and is convenient for computer operation. For each of the H, S and V channels of the two images, the average pixel value is calculated respectively, yielding the mean pixel value M1h of the H channel of I1, the mean pixel value M1s of the S channel of I1, the mean pixel value M1v of the V channel of I1, and likewise M2h, M2s and M2v for I2. The fused face HSV image is then calculated according to the following formulas:
M1h=(M1h+90)%180;
M2h=(M2h+90)%180;
I3h(i,j)=(M1h+I2h(i,j)–M2h+180)%180;
I3s(i,j)=M1s+I2s(i,j)–M2s;
I3v(i,j)=M1v+I2v(i,j)–M2v;
When the value of I3s(i,j) is larger than 255, I3s(i,j) is set to 255; when the value of I3s(i,j) is smaller than 0, I3s(i,j) is set to 0; I3v(i,j) is handled in the same way.
wherein I3h(i,j), I3s(i,j) and I3v(i,j) are the pixel values of the three HSV channels of the fused face image at image coordinate (i,j), and I2h(i,j), I2s(i,j) and I2v(i,j) are the pixel values of the three HSV channels of the replaced face image at image coordinate (i,j).
Finally, the fused face image is converted from the HSV color space to the BGR color space, as shown in fig. 7, which illustrates an example of the replaced current image frame after color fusion.
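As an illustrative sketch of the mean-shift fusion above (OpenCV's 8-bit HSV already uses H in 0-180 and S, V in 0-255, so no extra stretching is needed); the whole-frame means and the final Gaussian smoothing of the full image are simplifications, since an implementation may restrict both to the face region and its boundary band.

```python
# Sketch: HSV mean-shift color fusion of the swapped frame toward the original
# frame, following the formulas above, plus a simple edge-smoothing pass.
import cv2
import numpy as np

def fuse_colors(original_bgr, swapped_bgr):
    i1 = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    i2 = cv2.cvtColor(swapped_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    m1 = i1.reshape(-1, 3).mean(axis=0)                  # M1h, M1s, M1v
    m2 = i2.reshape(-1, 3).mean(axis=0)                  # M2h, M2s, M2v
    m1h, m2h = (m1[0] + 90) % 180, (m2[0] + 90) % 180
    out = np.empty_like(i2)
    out[..., 0] = (m1h + i2[..., 0] - m2h + 180) % 180           # hue, circular in [0, 180)
    out[..., 1] = np.clip(m1[1] + i2[..., 1] - m2[1], 0, 255)    # saturation
    out[..., 2] = np.clip(m1[2] + i2[..., 2] - m2[2], 0, 255)    # value
    fused = cv2.cvtColor(out.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Edge processing: a real implementation would blur only a band around the
    # replaced-face boundary; blurring the whole frame keeps the sketch short.
    return cv2.GaussianBlur(fused, (3, 3), 0)
```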
In some embodiments, the video stream comprises a real-time video stream transmitted by the user device during real-time video communication. For example, the user A holds a user device (such as a mobile phone, etc.), the user A establishes a real-time video communication connection with the user B through a corresponding application in the user device, and the user device of the user A receives the real-time video stream shot by the user B through real-time video communication. Based on the operation of the user A, the user equipment of the user A performs real-time virtual face changing on the video stream shot by the user B in the real-time video stream, such as performing virtual face changing operation locally on the user equipment A, or sending a corresponding face changing operation instruction to the corresponding network equipment, and receiving the real-time video stream after face changing returned by the network equipment; or the user equipment of the user B performs real-time virtual face changing operation on the shot real-time video stream based on the operation of the user B, and transmits the video stream subjected to face changing to the user A and the like, wherein the virtual face changing operation can be completed at the user equipment end of the user B or at the network equipment end.
In some embodiments, the video stream comprises a real-time video stream captured by a camera of an augmented reality device. For example, a user A holds an augmented reality device comprising a camera for capturing the real-time video stream in front of the current user A. The real-time video stream contains a corresponding face to be replaced, and the augmented reality device performs the virtual face changing operation on the real-time video stream, or transmits the real-time video stream to a network device which performs the virtual face changing on the real-time video stream, and the like. In some implementations, the apparatus further includes a five module 105 (not shown) for determining presentation position information of the replaced face on a display device of the augmented reality device, and for presenting the replaced template face based on the presentation position information. For example, the augmented reality device comprises a display device for overlaying the template face; the augmented reality device calculates the position of the face to be replaced in the display device based on the corresponding coordinate conversion parameters, and superimposes the corresponding template face at that position. In some embodiments, the real-time video stream is presented through the display device of the augmented reality device, and the augmented reality device superimposes the corresponding template face at the corresponding position (the position of the face to be replaced) of the display device; in other embodiments, the real-time video stream is used to calculate the position information of the face to be replaced in the display device, and the template face is overlaid and presented at the corresponding position of the display device, thereby achieving the augmented reality effect of replacing the face to be replaced in the real world with the template face.
In addition to the methods and apparatus described in the above embodiments, the present application also provides a computer-readable storage medium storing computer code which, when executed, performs a method as described in any one of the preceding claims.
The present application also provides a computer program product which, when executed by a computer device, performs a method as claimed in any preceding claim.
The present application also provides a computer device comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 10 illustrates an exemplary system that may be used to implement various embodiments described herein;
in some embodiments, as shown in fig. 10, system 1000 can be implemented as any of the devices described above in each of the described embodiments. In some embodiments, system 1000 can include one or more computer-readable media (e.g., system memory or NVM/storage 1020) having instructions and one or more processors (e.g., processor(s) 1005) coupled with the one or more computer-readable media and configured to execute the instructions to implement the modules to perform the actions described herein.
For one embodiment, the system control module 1010 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 1005 and/or any suitable device or component in communication with the system control module 1010.
The system control module 1010 may include a memory controller module 1030 to provide an interface to the system memory 1015. The memory controller module 1030 may be a hardware module, a software module, and/or a firmware module.
System memory 1015 may be used, for example, to load and store data and/or instructions for system 1000. For one embodiment, system memory 1015 may comprise any suitable volatile memory, such as, for example, suitable DRAM. In some embodiments, the system memory 1015 may comprise double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, the system control module 1010 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 1020 and communication interface(s) 1025.
For example, NVM/storage 1020 may be used to store data and/or instructions. NVM/storage 1020 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1020 may include storage resources that are physically part of the device on which system 1000 is installed or which may be accessed by the device without being part of the device. For example, NVM/storage 1020 may be accessed over a network via communication interface(s) 1025.
Communication interface(s) 1025 may provide an interface for system 1000 to communicate over one or more networks and/or with any other suitable device. The system 1000 may wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic of one or more controllers (e.g., memory controller module 1030) of the system control module 1010. For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic of one or more controllers of the system control module 1010 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic of one or more controllers of the system control module 1010. For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic of one or more controllers of the system control module 1010 to form a system on chip (SoC).
In various embodiments, system 1000 may be, but is not limited to being: a server, workstation, desktop computing device, or mobile computing device (e.g., laptop computing device, handheld computing device, tablet, netbook, etc.). In various embodiments, system 1000 may have more or fewer components and/or different architectures. For example, in some embodiments, system 1000 includes one or more cameras, keyboards, liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, application Specific Integrated Circuits (ASICs), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions as described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Furthermore, portions of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Communication media includes media whereby a communication signal containing, for example, computer readable instructions, data structures, program modules, or other data, is transferred from one system to another. Communication media may include conductive transmission media such as electrical cables and wires (e.g., optical fibers, coaxial, etc.) and wireless (non-conductive transmission) media capable of transmitting energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium, such as a carrier wave or similar mechanism, such as that embodied as part of spread spectrum technology. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory, such as random access memory (RAM, DRAM, SRAM); and nonvolatile memory such as flash memory, various read only memory (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memory (MRAM, feRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed computer-readable information/data that can be stored for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to operate a method and/or a solution according to the embodiments of the present application as described above.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Claims (17)

1. A method for real-time virtual face-changing, wherein the method comprises:
acquiring face characteristic point information of a face to be replaced in a current image frame of a video stream of the face to be replaced;
tracking the characteristic point information of the face characteristic point information in the current image frame of the video stream;
according to the characteristic point information of the face and the template characteristic point information of the template face after the characteristic point information is tracked, replacing the face to be replaced with the template face;
performing color fusion and edge processing on the current image frame after the face replacement;
wherein the step of tracking the feature point information of the face feature point information in the current image frame of the video stream includes:
extracting face feature point information of the face to be replaced in a previous image frame in the video stream, and predicting face feature point information of a face feature point corresponding to the previous image frame in a current image frame of the video stream by utilizing an optical flow tracking algorithm, wherein the optical flow tracking algorithm comprises a KLT optical flow method;
acquiring weight information of the tracking face feature point information and weight information corresponding to the face feature point information of the current image frame, wherein the weight information of the tracking face feature point information is inversely related to the displacement of the face to be replaced in the previous image frame and the current image frame; and determining final face feature point information in the current image frame according to the face feature point information, the face feature point information of the current image frame, the weight information of the face feature point information and the weight information of the face feature point information of the current image frame.
2. The method according to claim 1, wherein the replacing the face to be replaced with the template face according to the face feature point information tracked by the feature point information and the template feature point information of the template face comprises:
performing face contour adaptation on the face to be replaced according to the face feature point information tracked by the feature point information and the template feature point information of the template face;
and replacing the face to be replaced with the template face according to the face feature point information after the face contour adaptation and the template feature point information of the template face.
3. The method according to claim 2, wherein the face contour adaptation of the face to be replaced according to the face feature point information tracked by the feature point information and the template feature point information of the template face includes:
at least three point pairs are taken from the outline of the face feature point information and the outline of the template feature point information, and transformation matrix information from the template feature point information to the face feature point information is determined;
and adjusting the face characteristic point information according to the transformation matrix information.
4. A method according to claim 3, wherein the taking at least three point pairs from the contour of the face feature point information and the contour of the template feature point information, determining the transformation matrix information of the template feature point information to the face feature point information, includes:
at least three point pairs are taken from the outline of the face feature point information and the outline of the template feature point information, and the optimal transformation matrix information from the template feature point information to the face feature point information is determined by utilizing a least square principle;
wherein the adjusting the face feature point information according to the transformation matrix information includes:
and adjusting the face characteristic point information according to the optimal transformation matrix information.
5. The method of claim 4, wherein the adjusting the face feature point information according to the optimal transformation matrix information comprises:
and adjusting the face characteristic point information according to the optimal transformation matrix information and a moving least square image deformation algorithm.
6. The method according to any one of claims 2 to 5, wherein the face feature point information adapted according to the face contour and the template feature point information of the template face, replacing the face to be replaced with the template face, includes:
Dividing at least one of the face to be replaced and the template face into a plurality of triangular areas according to the face feature point information after the face contour is adapted and the template feature point information of the template face, and replacing the face to be replaced with the template face according to affine transformation.
7. The method according to claim 1, wherein the replacing the face to be replaced with the template face according to the face feature point information tracked by the feature point information and the template feature point information of the template face comprises:
dividing at least one of the face to be replaced and the template face into a plurality of triangular areas according to the face feature point information and the template feature point information of the template face, and replacing the face to be replaced with the template face according to affine transformation.
8. The method according to any one of claims 1 to 5, wherein the acquiring face feature point information about the face to be replaced in a current image frame of a video stream of the face to be replaced includes:
acquiring image position information of a face to be replaced in a current image frame in a video stream of the face to be replaced;
And extracting face characteristic point information in the current image frame according to the image position information.
9. The method of claim 8, wherein the acquiring image location information about the face to be replaced in a current image frame in a video stream of the face to be replaced comprises:
and carrying out face detection on a current image frame in a video stream related to the face to be replaced, determining image position information corresponding to at least one face, and determining the image position information of the face to be replaced from the image position information of the at least one face.
10. The method according to any one of claims 1 to 5, wherein the performing color fusion and edge processing on the current image frame after the face replacement includes:
converting the color space of the original current image frame before face replacement and the color space of the current image frame after face replacement into an HSV color space, performing color fusion, and converting the color space of the current image frame after fusion into an RGB color space;
and performing edge processing on the current image frame after the color fusion by using a filter.
11. The method of claim 10, wherein the color fusion comprises:
stretching an H channel in the HSV color space to 0-180, and stretching an S channel and a V channel to 0-255, and respectively calculating pixel average values of the original current image frame and the current image frame in each color channel;
And determining HSV color distribution in the fused current image frame according to the original current image frame, the pixel average value of each color channel in the current image frame and a preset fusion algorithm.
12. The method of any of claims 1-5, wherein the video stream comprises a real-time video stream transmitted by a user device at the time of real-time video communication.
13. The method of any of claims 1-5, wherein the video stream comprises a real-time video stream captured by a camera of an augmented reality device.
14. The method of claim 13, wherein the method further comprises:
determining presentation position information of the replaced face on a display device of the augmented reality device;
and presenting the replaced template face based on the presentation position information.
15. An apparatus for real-time virtual face-changing, wherein the apparatus comprises:
the face feature point information of the face to be replaced in the current image frame of the video stream of the face to be replaced is acquired;
the two-module is used for tracking the characteristic point information of the face characteristic point information in the current image frame of the video stream;
The three modules are used for replacing the face to be replaced by the template face according to the face feature point information tracked by the feature point information and the template feature point information of the template face;
the four modules are used for carrying out color fusion and edge processing on the current image frame after the face replacement;
wherein the step of tracking the feature point information of the face feature point information in the current image frame of the video stream includes:
extracting face feature point information of the face to be replaced in a previous image frame in the video stream, and predicting face feature point information of a face feature point corresponding to the previous image frame in a current image frame of the video stream by utilizing an optical flow tracking algorithm, wherein the optical flow tracking algorithm comprises a KLT optical flow method;
acquiring weight information of the tracking face feature point information and weight information corresponding to the face feature point information of the current image frame, wherein the weight information of the tracking face feature point information is inversely related to the displacement of the face to be replaced in the previous image frame and the current image frame; and determining final face feature point information in the current image frame according to the face feature point information, the face feature point information of the current image frame, the weight information of the face feature point information and the weight information of the face feature point information of the current image frame.
16. An apparatus for real-time virtual face-changing, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the operations of the method of any one of claims 1 to 14.
17. A computer readable medium storing instructions that, when executed, cause a system to perform the operations of the method of any one of claims 1 to 14.
CN201910448048.1A 2019-05-27 2019-05-27 Method and equipment for real-time virtual face changing Active CN110136229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910448048.1A CN110136229B (en) 2019-05-27 2019-05-27 Method and equipment for real-time virtual face changing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910448048.1A CN110136229B (en) 2019-05-27 2019-05-27 Method and equipment for real-time virtual face changing

Publications (2)

Publication Number Publication Date
CN110136229A CN110136229A (en) 2019-08-16
CN110136229B true CN110136229B (en) 2023-07-14

Family

ID=67582124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910448048.1A Active CN110136229B (en) 2019-05-27 2019-05-27 Method and equipment for real-time virtual face changing

Country Status (1)

Country Link
CN (1) CN110136229B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570348B (en) * 2019-09-10 2023-09-15 连尚(新昌)网络科技有限公司 Face image replacement method and equipment
CN110838084B (en) * 2019-09-24 2023-10-17 咪咕文化科技有限公司 Method and device for transferring style of image, electronic equipment and storage medium
CN111028144B (en) * 2019-12-09 2023-06-20 腾讯音乐娱乐科技(深圳)有限公司 Video face changing method and device and storage medium
CN111126325B (en) * 2019-12-30 2023-01-03 哈尔滨工程大学 Intelligent personnel security identification statistical method based on video
CN111354059B (en) * 2020-02-26 2023-04-28 北京三快在线科技有限公司 Image processing method and device
CN111402118B (en) * 2020-03-17 2023-03-24 腾讯科技(深圳)有限公司 Image replacement method and device, computer equipment and storage medium
CN111476871B (en) * 2020-04-02 2023-10-03 百度在线网络技术(北京)有限公司 Method and device for generating video
CN111476710B (en) * 2020-04-13 2022-12-02 上海艾麒信息科技有限公司 Video face changing method and system based on mobile platform
CN113691833B (en) * 2020-05-18 2023-02-03 北京搜狗科技发展有限公司 Virtual anchor face changing method and device, electronic equipment and storage medium
CN111614925B (en) * 2020-05-20 2022-04-26 广州视源电子科技股份有限公司 Figure image processing method and device, corresponding terminal and storage medium
CN111563490B (en) * 2020-07-14 2020-11-03 北京搜狐新媒体信息技术有限公司 Face key point tracking method and device and electronic equipment
CN111918089A (en) * 2020-08-10 2020-11-10 广州繁星互娱信息科技有限公司 Video stream processing method, video stream display method, device and equipment
CN112188145A (en) * 2020-09-18 2021-01-05 随锐科技集团股份有限公司 Video conference method and system, and computer readable storage medium
CN112508773B (en) * 2020-11-20 2024-02-09 小米科技(武汉)有限公司 Image processing method and device, electronic equipment and storage medium
CN114339398A (en) * 2021-12-24 2022-04-12 天翼视讯传媒有限公司 Method for real-time special effect processing in large-scale video live broadcast
CN116740764A (en) * 2023-06-19 2023-09-12 北京百度网讯科技有限公司 Image processing method and device for virtual image and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6492986B1 (en) * 1997-06-02 2002-12-10 The Trustees Of The University Of Pennsylvania Method for human face shape and motion estimation based on integrating optical flow and deformable models
CN101068314A (en) * 2006-09-29 2007-11-07 腾讯科技(深圳)有限公司 Network video frequency showing method and system
CN105069746B (en) * 2015-08-23 2018-02-16 杭州欣禾圣世科技有限公司 Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN105469056A (en) * 2015-11-26 2016-04-06 小米科技有限责任公司 Face image processing method and device
CN108304758B (en) * 2017-06-21 2020-08-25 腾讯科技(深圳)有限公司 Face characteristic point tracking method and device
CN107610202B (en) * 2017-08-17 2020-11-03 北京觅己科技有限公司 Face image replacement method, device and storage medium
CN107622472A (en) * 2017-09-12 2018-01-23 北京小米移动软件有限公司 Face dressing moving method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7848548B1 (en) * 2007-06-11 2010-12-07 Videomining Corporation Method and system for robust demographic classification using pose independent model from sequence of face images
WO2018170864A1 (en) * 2017-03-20 2018-09-27 成都通甲优博科技有限责任公司 Face recognition and tracking method
CN108875506A (en) * 2017-11-17 2018-11-23 北京旷视科技有限公司 Face shape point-tracking method, device and system and storage medium
CN109598196A (en) * 2018-10-29 2019-04-09 华中科技大学 A kind of multiform becomes the characteristic point positioning method of multi-pose Face sequence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deepfake Video Detection Using Recurrent Neural Networks; David Guera et al.; IEEE; 1-6 *

Also Published As

Publication number Publication date
CN110136229A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110136229B (en) Method and equipment for real-time virtual face changing
US11809998B2 (en) Maintaining fixed sizes for target objects in frames
CN107993216B (en) Image fusion method and equipment, storage medium and terminal thereof
CN109887003B (en) Method and equipment for carrying out three-dimensional tracking initialization
CN113741698B (en) Method and device for determining and presenting target mark information
WO2021213067A1 (en) Object display method and apparatus, device and storage medium
JP6154075B2 (en) Object detection and segmentation method, apparatus, and computer program product
JP2019504386A (en) Facial image processing method and apparatus, and storage medium
US20190147224A1 (en) Neural network based face detection and landmark localization
US11176355B2 (en) Facial image processing method and apparatus, electronic device and computer readable storage medium
CN111556336B (en) Multimedia file processing method, device, terminal equipment and medium
US20110299774A1 (en) Method and system for detecting and tracking hands in an image
US11409794B2 (en) Image deformation control method and device and hardware device
US20230351604A1 (en) Image cutting method and apparatus, computer device, and storage medium
CN107944420A (en) The photo-irradiation treatment method and apparatus of facial image
CN110858277A (en) Method and device for obtaining attitude classification model
CN113709519A (en) Method and equipment for determining live broadcast shielding area
CN113965665A (en) Method and equipment for determining virtual live broadcast image
CN113435367A (en) Social distance evaluation method and device and storage medium
CN112101275B (en) Human face detection method, device, equipment and medium for multi-view camera
AU2015258346A1 (en) Method and system of transitioning between images
CN111507142A (en) Facial expression image processing method and device and electronic equipment
US11481940B2 (en) Structural facial modifications in images
Zhang et al. Semantic saliency driven camera control for personal remote collaboration
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant