WO2016011834A1 - Image processing method and system - Google Patents


Info

Publication number
WO2016011834A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
person
character
model
image
Prior art date
Application number
PCT/CN2015/077353
Other languages
French (fr)
Chinese (zh)
Inventor
邢小月
姜涌
孟昭龙
Original Assignee
邢小月
姜涌
孟昭龙
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 邢小月, 姜涌, 孟昭龙
Publication of WO2016011834A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The present invention provides an image processing method and system. The method comprises: simulating a model of a first person from at least one image containing the first person; determining a target image containing a second person; determining feature information with which the second person is displayed in the target image; adjusting the display of the first person in the model of the first person according to the feature information; and, in the target image, replacing the second person with the display-adjusted first person. The technical solution provided in the present invention solves the problem that, because no relation exists between the persons before and after replacement, the replaced person appears inconsistent with, and in conflict with, the background.

Description

Image processing method and system

Technical Field

The present invention relates to image processing technology, and in particular to an image processing method and system.

Background Art

In the prior art, when image processing replaces a person in one image with a person from another image, the head or face of one person is simply cut out along its contour and superimposed at the corresponding position on the image of the other person, similar to the effect of a photo-sticker booth. On the one hand, because illumination, viewing angle and so on differ from the background, the pasted person clashes with the background in color and tone; on the other hand, when one person's face is pasted onto another person's face, only the original person's expression can be kept, and that expression is usually inconsistent with the background. The disharmony between person and background produced by prior-art replacement clearly cannot meet users' needs.

The shortcoming of the prior art is as follows:

There is no relation between the persons before and after replacement, so the replaced person appears inconsistent with, and in conflict with, the background.

Summary of the Invention

In view of the above problem, the present invention provides an image processing method and system for solving the problem that, in simulated image replacement, the person's image is inconsistent with the background of the replaced image.
An embodiment of the present invention provides an image processing method, which may comprise the following steps:

simulating a model of a first person from at least one image containing the first person;

determining a target image containing a second person;

determining feature information with which the second person is displayed in the target image;

adjusting the display of the first person in the model of the first person according to the feature information;

in the target image, replacing the second person with the display-adjusted first person.
An embodiment of the present invention provides an image processing system, which may comprise:

a model simulation module, configured to simulate a model of a first person from at least one image containing the first person;

a target image determination module, configured to determine a target image containing a second person;

a feature information determination module, configured to determine feature information with which the second person is displayed in the target image;

a display adjustment module, configured to adjust the display of the first person in the model of the first person according to the feature information;

a person replacement module, configured to replace, in the target image, the second person with the display-adjusted first person.

The beneficial effects of the present invention are as follows:

In the technical solution provided by the embodiments of the present invention, a model of the first person is first simulated and then adjusted according to the feature information with which the second person (the person being replaced) is displayed in the target image. The first person thus has the same display characteristics in the target image as the second person, which overcomes the problem of inconsistency with the target image background after replacement.
Brief Description of the Drawings

Specific embodiments of the present invention will be described below with reference to the accompanying drawings, in which:

Figure 1 is a schematic flowchart of the image processing method in an embodiment of the present invention;

Figure 2 is a schematic flowchart of the face detection algorithm in an embodiment of the present invention;

Figure 3 is a schematic diagram of extracting Haar-like features in an embodiment of the present invention;

Figure 4 is a schematic flowchart of the integral-image method in an embodiment of the present invention;

Figure 5 is a schematic diagram of the cascade (waterfall) detector in an embodiment of the present invention;

Figure 6 is a schematic diagram of the labeled face in an embodiment of the present invention;

Figure 7 is a schematic diagram of the creation of local features in an embodiment of the present invention;

Figure 8 is a schematic flowchart of the method for calculating the new position of each feature point in an embodiment of the present invention;

Figure 9 is a schematic diagram of face detection results in an embodiment of the present invention;

Figure 10 is a schematic flowchart of three-dimensional face reconstruction in an embodiment of the present invention;

Figure 11 is a schematic diagram of the original image and the three-dimensional model in an embodiment of the present invention;

Figure 12 is a schematic diagram of example expressions of the model in an embodiment of the present invention;

Figure 13 is a schematic diagram of facial expression feature points in an embodiment of the present invention;

Figure 14 is a schematic structural diagram of the image processing system in an embodiment of the present invention.
Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The illustrative embodiments of the present invention and their description are used to explain the present invention, but are not intended as limitations of the present invention.
Figure 1 is a schematic flowchart of the image processing method. As shown in Figure 1, the method may comprise the following steps:

Step 101: simulate a model of the first person from at least one image containing the first person;

Step 102: determine a target image containing the second person;

Step 103: determine feature information with which the second person is displayed in the target image;

Step 104: adjust the display of the first person in the model of the first person according to the feature information;

Step 105: in the target image, replace the second person with the display-adjusted first person. An illustrative sketch of this flow is given below.
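By way of illustration only, the flow of steps 101 to 105 can be sketched in Python as follows. Every helper name here (build_model, locate_second_person, extract_features, adjust_model, composite) is a hypothetical placeholder for the operations described in the steps, not an identifier defined by the invention:

```python
# Illustrative pipeline sketch; all helper functions are hypothetical placeholders.
def replace_person(first_person_images, target_image):
    model = build_model(first_person_images)             # step 101: simulate the model
    person2 = locate_second_person(target_image)         # step 102: find the second person
    features = extract_features(target_image, person2)   # step 103: pose/lighting/expression
    rendered = adjust_model(model, features)             # step 104: adjust the display
    return composite(target_image, rendered, person2)    # step 105: replace in the target
```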
Specifically, when performing the image replacement, the editing of the person can be completed automatically according to the picture or image sequence provided by the user. For example, this may be as follows:

a. the user provides one or more pictures or an image sequence as material, all of which contain the same person, namely the first person;

b. the system simulates a model of the first person from the material provided by the user; the model can be adjusted correspondingly for different viewing angles, illumination and so on, and can undergo different deformations;

c. the user specifies another person in a picture or image sequence, namely the second person;

d. the system detects the relevant feature information of the specified second person in each frame of the image; this feature information refers to features such as position, contour, relative viewing angle, illumination and deformation;

e. on each frame of the image, the model of the first person is adjusted to the characteristics of the second person in that frame, and the second person is replaced.
The implementation describes the extraction, processing, adjustment and conversion performed on a single image. Since an image sequence of multiple pictures and every frame of a video image consist of single images, it is easy, on the basis of the technical solution provided by the embodiments of the present invention, to derive the processing of an image sequence composed of multiple or batched pictures, or of a video image. For example, one of the simplest ways is to perform the replacement on each image of the image sequence or video and then compose the replaced image sequence or video, as sketched below. How to extend from the processing of a single image to the processing of a whole image sequence or video is easily understood, and correspondingly modified, by a person skilled in the art.
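As an illustration of this frame-by-frame extension, and assuming a process_frame function that performs the single-image replacement described above, a video could be processed with OpenCV's Python interface as follows (the file names and codec are example values):

```python
import cv2

def process_video(src_path, dst_path, process_frame):
    """Apply single-image replacement to every frame and recompose the video."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(process_frame(frame))  # replace the person in this single frame
    cap.release()
    out.release()
```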
In implementations, a person in the embodiments of the present invention may be an anthropomorphic character, such as a cartoon character or a 3D character; it is not limited to human beings, nor need it be a naturally existing person, and all of these are called a "person" in the embodiments. Most of the following embodiments also take the processing of human images as the example, because it is the most representative and the most complex, so the portrait is used here for the explanation. However, the technical solution provided by the embodiments of the present invention can also be used for other image processing, because what it discloses is a replacement scheme for image processing; that is, the scheme of the embodiments can be adopted whenever the replacement of any pattern is to be achieved in the image processing field, and in theory it is not limited to persons. The portrait is used only to teach a person skilled in the art specifically how to implement the present invention, and does not mean that the invention can be used only for portraits; in implementation it can be used in the corresponding environment according to practical needs.
In an implementation, simulating a model of the face of the first person from at least one image containing the first person may comprise:

detecting the region of the face of the first person;

determining, in the detected face region, the regions of the facial features and the contour of the cheeks;

obtaining the simulated model of the face of the first person by fitting the detected facial-feature regions and cheek contour onto an existing three-dimensional (3D) face model.

Specifically, the simulated model of the first person may be a model of the whole body of the person or a model of the person's face. The embodiments take the implementation for the person's face as the example, but a person skilled in the art will appreciate that, by processing with the corresponding image tools, implementations not limited to the face, such as a model of the person's whole body, can be obtained.
Taking the implementation for the person's face as the example, the procedure may be as follows:

a. detect the position and region of the face of the first person from the pictures or image sequence provided by the user;

b. in the detected face region, determine the regions of the facial features and the contour of the cheeks, such as the eyes, nose, eyebrows, mouth and ears;

c. fit the detected facial-feature regions and cheek contour onto an existing 3D face model, so that it can automatically present different viewing angles, illumination and expression changes according to the parameter settings.
In an implementation, determining the feature information with which the face of the second person is displayed in the target image may comprise:

detecting the region of the face of the second person;

determining, in the detected face region, the regions of the facial features and the contour of the cheeks;

determining, from the detected facial-feature regions and cheek contour, the feature information with which the face of the second person is displayed in the target image.

Specifically, detecting the relevant characteristic information of the specified second person in each frame may be as follows:

a. detect the position and region of the face of the second person from the picture or image sequence;

b. in the detected face region, determine the regions of the facial features and the contour of the cheeks, such as the eyes, nose, eyebrows, mouth and ears;

c. infer the relevant characteristic information of the second person from the detected facial-feature regions and cheek contour; this characteristic information includes changes of viewing angle, illumination and expression, and so on.
Specifically, in the detected face region, a face recognition algorithm is used to determine the facial-feature regions and the cheek contour; the ASM (Active Shape Model) algorithm may be used for this purpose.

In the implementation, the ASM algorithm is used for the explanation because it is typical among face alignment algorithms, commonly used, and easily understood and implemented by a person skilled in the art, so the ASM algorithm is taken as the example here. In theory, however, other algorithms are also possible, as long as the purpose of determining the facial-feature regions and the cheek contour can be achieved; for example, the AAM (Active Appearance Model) or SDM (Supervised Descent Method) algorithm may be used. The ASM algorithm is therefore used only to teach a person skilled in the art specifically how to implement the present invention, and does not mean that only the ASM algorithm can be used; in implementation, the appropriate algorithm can be determined according to practical needs.
In an implementation, replacing the second person with the display-adjusted first person means replacing the face of the second person with the display-adjusted face of the first person, according to the face region of the first person and the face region of the second person.

Specifically, further replacing the second person with the first-person model may be as follows (an illustrative compositing sketch is given after these steps):

a. adjust the model of the first person according to the relevant feature information of the second person, so that it resembles the relevant characteristics of the second person;

b. erase the face region of the second person in each frame using the detected facial-feature regions and cheek contour;

c. in each frame, place the adjusted model of the first person in the face region of the second person.
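A hedged sketch of steps b and c follows, using OpenCV's Python interface. Poisson blending via cv2.seamlessClone is one possible compositing choice assumed here, not a requirement of the text; rendered_face stands for the adjusted first-person rendering and contour_pts for the detected cheek-contour landmarks:

```python
import cv2
import numpy as np

def paste_face(target, rendered_face, contour_pts):
    """Erase the second person's face region and composite the adjusted model."""
    pts = np.int32(contour_pts)
    mask = np.zeros(target.shape[:2], np.uint8)
    cv2.fillConvexPoly(mask, pts, 255)           # step b: face region from the contour
    x, y, w, h = cv2.boundingRect(pts)
    canvas = target.copy()
    canvas[y:y + h, x:x + w] = cv2.resize(rendered_face, (w, h))
    center = (x + w // 2, y + h // 2)
    # step c: blend the adjusted first-person rendering into the erased region
    return cv2.seamlessClone(canvas, target, mask, center, cv2.NORMAL_CLONE)
```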
In an implementation, in the detected face region, a face recognition algorithm may be used to determine the facial-feature regions and the cheek contour.

In an implementation, after the second person in the target image has been replaced with the display-adjusted first person, the method may further comprise:

adding an image to the first person in the target image.

This facilitates adding images such as props to the person after replacement, the props including glasses, hats, clothes, backpacks, shoes and the like.
In implementations, there are many methods for detecting the position and region of the face of the first person or the second person from the pictures or image sequence provided by the user, as shown in Figure 2.
Among the listed methods, statistical-model-based methods are currently the more popular ones; for details see Liang Luhong et al., "A Survey of Face Detection Research" (Chinese Journal of Computers, Vol. 25, No. 5, May 2002). This approach has considerable advantages:

1. It does not depend on prior knowledge or parametric models of faces, avoiding errors caused by inaccurate or incomplete knowledge;

2. The model parameters are obtained by learning from examples, which is statistically more reliable;

3. The range of detectable patterns can be extended, and robustness improved, by adding training examples.
I. Statistical-Model Methods

The face detection algorithm based on ensemble machine learning, proposed by Viola and Jones around 2001, has clear advantages over other methods; for details see Ai Haizhou et al., "Face Detection and Retrieval" (Natural Science Foundation Project 60273005), and Wu Bo et al., "Multi-view face detection based on the continuous AdaBoost algorithm" (Journal of Computer Research and Development, 2005). Recent literature also indicates that no other face detection method superior to the Viola-Jones method has yet been found; for details see N. Degtyarev et al., "Comparative Testing of Face Detection Algorithms" (Image and Signal Processing, 2010). The method not only has high detection accuracy; most importantly, it runs much faster than other methods.

Several key steps of the Viola-Jones face detection method are described below; for details see Paul Viola and Michael Jones, "Rapid object detection using a boosted cascade of simple features" (Accepted Conference on Computer Vision and Pattern Recognition, 2001):
1. Extract Haar-like features

Haar-like features are a kind of simple rectangular feature proposed by Viola et al., so named because of their resemblance to Haar wavelets. A Haar-like feature is defined as the difference between the weighted gray-level sums of the regions corresponding to the black rectangle and the white rectangle in an image sub-window. Figure 3 shows the two simplest feature operators; as can be seen in Figure 3, the operators compute large values at particular facial structures.
2. Compute the integral image

When the number of operators is large, the amount of computation above becomes too great. Viola et al. devised the integral-image method, which greatly accelerates the computation. As shown in Figure 4, the value at point 1 is the pixel integral over region A, and the value at point 2 is the pixel integral over regions A and B. After a single integration pass over the whole picture, the pixel integral over any region D can conveniently be computed as 4 + 1 - 2 - 3, i.e. from the values at the four corner points. A runnable illustration follows.
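The following minimal NumPy illustration shows the integral-image trick: after one cumulative-sum pass, the pixel sum over any rectangle is obtained from four lookups, matching the 4 + 1 - 2 - 3 rule above:

```python
import numpy as np

img = np.random.randint(0, 256, (240, 320)).astype(np.int64)
# Integral image with an extra zero row and column so corner lookups are uniform.
ii = np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(y0, x0, y1, x1):
    """Pixel sum over img[y0:y1, x0:x1] via four corner lookups."""
    return ii[y1, x1] + ii[y0, x0] - ii[y0, x1] - ii[y1, x0]

assert rect_sum(10, 20, 50, 90) == img[10:50, 20:90].sum()
```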
3. Train the AdaBoost model

In the discrete AdaBoost algorithm, the result computed by a Haar-like feature operator minus some threshold can be regarded as a face detector. Because its accuracy is not high, it is called a weak classifier. In each round of the AdaBoost loop, the training picture library is first classified with the various weak classifiers; the weak classifier with the highest accuracy is retained, the weights of the misclassified pictures are increased, and the next round begins. Finally, the weak classifiers retained in each round are combined into an accurate face detector, called a strong classifier. For the specific computation flow, see Wu Bo et al., "Multi-view face detection based on the continuous AdaBoost algorithm" (Journal of Computer Research and Development, 2005), and Paul Viola and Michael Jones, "Rapid object detection using a boosted cascade of simple features" (Accepted Conference on Computer Vision and Pattern Recognition, 2001).
4. Build the cascade detector

The cascade ("waterfall") detector is a detection structure proposed to address the speed of face detection. As shown in Figure 5, each layer of the cascade is a strong classifier trained by the AdaBoost algorithm. The threshold of each layer is set so that most face images can pass; on this basis, as many negatives as possible are discarded. The later a layer is positioned, the more complex it is and the stronger its classification ability.

Such a detector structure is like a series of sieves of decreasing mesh size: each step can sift out some of the negatives missed by the earlier sieves, and the samples that finally pass all the sieves are accepted as faces. For the cascade-detector training algorithm, see Wu Bo et al., "Multi-view face detection based on the continuous AdaBoost algorithm" (Journal of Computer Research and Development, 2005).
To implement the above algorithm, the OpenCV (Open Source Computer Vision Library) face detection program flow is used; the specific program source code can be found at the following address: http://www.opencv.org.cn/index.php/%E4%BA%BA%E8%84%B8%E6%A3%80%E6%B5%8B.

OpenCV is a cross-platform computer vision library released under an open-source license that can run on the Linux, Windows and Mac OS operating systems. It is lightweight and efficient, consisting of a set of C functions and a small number of C++ classes; it also provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general-purpose algorithms in image processing and computer vision.

OpenCV's face detection program uses the Viola-Jones face detection method; it mainly calls a trained cascade classifier (cascade) to perform pattern matching.
cvHaarDetectObjects first converts the image to grayscale, decides from the input parameters whether to perform Canny edge pruning (not used by default), and then performs the matching. After matching, the matched windows found are collected and noise is filtered: if the number of neighboring detections exceeds the specified value (the min_neighbors argument passed in), the detection is taken as an output result; otherwise it is discarded.

Matching loop: the matching classifier is enlarged by the scale factor (a passed-in value) while the original image is reduced by the same factor, and matching proceeds until the matching classifier becomes larger than the original image, at which point the matching results are returned. During matching, cvRunHaarClassifierCascade is called to perform the matching; all results are stored in a CvSeq* seq (a dynamically growing element sequence), and the results are passed to cvHaarDetectObjects.

The cvRunHaarClassifierCascade function as a whole performs matching according to the input image and cascade, and can match in different ways depending on the type of cascade passed in (tree, stump (an incomplete tree), or others).

The function cvRunHaarClassifierCascade is used for detection on a single picture. Before the function is called, cvSetImagesForHaarClassifierCascade is first used to set the integral images and the appropriate scale factor (=> window size). A positive value is returned when the analyzed rectangle passes every layer of the cascade classifier (it is then a candidate target); otherwise 0 or a negative value is returned. A usage sketch in the modern Python interface follows.
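For reference, the same detection flow in OpenCV's modern Python interface might look as follows; cv2.CascadeClassifier wraps the same kind of trained Haar cascade that cvHaarDetectObjects uses, and the file name input.jpg is an example:

```python
import cv2

# Load a pretrained frontal-face Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# scaleFactor and minNeighbors correspond to the scale and min_neighbors
# parameters discussed above.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```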
The classifier used is a Haar classifier, and the training of the Haar classifier is independent of the face detection process. Training of the classifier is divided into two stages:

a. create the samples, done with the createsamples.exe tool supplied with OpenCV;

b. train the classifier and generate the xml file, done with the haartraining.exe tool supplied with OpenCV.

For the training process, see 1 and 2 below:
1. http://034080116.blog.163.com/blog/static/334061912009641073715/;

2. \OpenCV\apps\HaarTraining\doc\haartraining.doc;
Of the two addresses above, address 1 can be viewed as a blog post, and the Haar-training source files referred to by address 2 can be found in the OpenCV installation package directory after download and installation.

Meanwhile, the AdaBoost training algorithm used in OpenCV is Gentle AdaBoost, the scheme best suited to face detection. For details see:
1. http://www.opencv.org.cn/forum/viewtopic.php?f=1&t=4264#p15258

2. http://www.opencv.org.cn/forum/viewtopic.php?t=3880
For example, determining, within the detected face region, the regions of the facial features, their positional relations and the contour information of the cheeks, such as the eyes, nose, eyebrows, mouth and ears, can be achieved by many algorithms. The present invention preferentially uses the ASM algorithm, which is introduced below.

ASM is an algorithm based on the Point Distribution Model (PDM). In a PDM, the geometry of objects of similar shape, such as faces, hands, hearts or lungs, can be represented by serially concatenating the coordinates of several key feature points (landmarks) into a shape vector. The embodiments of the present invention take the face as the example to introduce the basic principle and method of the algorithm. First, a face picture labeled with 68 key feature points is given, as shown in Figure 6. In practical application, ASM comprises two parts: training and searching.
I. ASM Training

ASM training consists of two parts.

1. Building the shape model. This part consists of the following steps.

1.1 Collect n training samples

If ASM training is to be performed on the key facial regions of the face, n sample pictures containing face regions need to be collected. Note that the collected pictures only need to contain a face region; issues such as normalization of the image size need not be considered here.

1.2 Manually record the k key feature points in each training sample

As shown in Figure 6, for any picture in the training set, the position coordinates of a number of key feature points (68 in Figure 6) need to be recorded and saved in a text file. This step can be completed by a program written by a programmer: the program loads one training sample at a time, and the user clicks the key feature points in the picture in turn; at each click, the program automatically records the coordinates of the current mouse-click position and saves them for later use.

1.3 Construct the shape vectors of the training set
The k key feature points labeled in one picture are combined into a shape vector:

$$a_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \ldots, x_{ik}, y_{ik})^T \quad (1)$$

where $(x_{ij}, y_{ij})$ denotes the coordinates of the j-th feature point on the i-th training sample, and n denotes the number of training samples. The n training samples thus form n shape vectors.
1.4 Shape normalization

The purpose of this step is to normalize (align) the manually labeled face shapes, eliminating the non-shape interference caused in the pictures by external factors such as viewing angle, distance and pose, so that the point distribution model becomes more effective. In general, this step is performed with the Procrustes method. Simply put, a series of point distribution models are aligned to one and the same point distribution model by suitable translation, rotation and scaling, without changing the point distribution models themselves; this removes the disorder of the acquired raw data and reduces the interference of non-shape factors.

Aligning the training set $\pi = \{a_1, a_2, \ldots, a_n\}$ with the Procrustes method requires computing four parameters for each $a_i$: a rotation angle $\theta_i$, a scale $s_i$, a horizontal translation $t_{x_i}$ and a vertical translation $t_{y_i}$. Let $M(s_i, \theta_i)[a_i]$ denote the transformation that rotates $a_i$ by $\theta_i$ and scales it by $s_i$. The process of aligning $a_i$ to $a_k$ is then that of finding $\theta_i$, $s_i$ and the translation $t_i$ that minimize

$$E = \big(a_k - M(s_i, \theta_i)[a_i] - t_i\big)^T W \big(a_k - M(s_i, \theta_i)[a_i] - t_i\big).$$

Here W is a diagonal weight matrix, which can be obtained by the following computation: let $R_{kl}$ denote the distance between the k-th and l-th points in one image, and let $V_{R_{kl}}$ denote the variance of $R_{kl}$ between the different images of the whole training set; the weight of the k-th point is then

$$w_k = \Big(\sum_{l} V_{R_{kl}}\Big)^{-1}.$$

It is not difficult to see that the Procrustes method is just a way of solving for the transformation. In ASM, Procrustes is used precisely to perform the alignment of the point distribution models, with the following specific steps:
(1) Align all face models in the training set to the first face model;

(2) Compute the average face model

$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i;$$

(3) Align all face models to the average face model $\bar{a}$;

(4) Repeat (2) and (3) until convergence.
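An illustrative NumPy sketch of this iterative alignment, simplified to the unweighted case W = I (the weighted form follows analogously), might be:

```python
import numpy as np

def align(shape, ref):
    """Similarity-align one (k, 2) shape to a reference shape (W = I case)."""
    a = shape - shape.mean(axis=0)
    b = ref - ref.mean(axis=0)
    u, sig, vt = np.linalg.svd(a.T @ b)   # SVD of the 2x2 cross-covariance
    r = u @ vt                            # optimal rotation
    s = sig.sum() / (a ** 2).sum()        # optimal scale
    return s * (a @ r) + ref.mean(axis=0)

def procrustes_align(shapes, iters=10):
    shapes = [align(s, shapes[0]) for s in shapes]   # (1) align all to the first shape
    for _ in range(iters):                           # (4) repeat until convergence
        mean = np.mean(shapes, axis=0)               # (2) average model
        shapes = [align(s, mean) for s in shapes]    # (3) re-align to the average
    return np.mean(shapes, axis=0), shapes
```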
1.5 Apply PCA to the aligned shape vectors

(1) Compute the average shape vector:

$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i \quad (2)$$

(2) Compute the covariance matrix:

$$S = \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})(a_i - \bar{a})^T \quad (3)$$
(3) Compute the eigenvalues of the covariance matrix S and sort them from largest to smallest:

this gives $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_q$, where $\lambda_1 > 0$. Select the first t eigenvectors $P = (p_1, p_2, \ldots, p_t)$ such that the corresponding eigenvalues satisfy

$$\sum_{i=1}^{t} \lambda_i \ge f_v \, V_T \quad (4)$$

Here $f_v$ is a proportion coefficient determined by the number of eigenvectors, usually taken as 95%, and $V_T$ is the sum of all eigenvalues, i.e.

$$V_T = \sum_i \lambda_i.$$
Any shape vector used for training can then be expressed as

$$a \approx \bar{a} + P b_s \quad (5)$$

In the expression above, $b_s$ is a vector containing t parameters, where

$$b_s = P^T (a - \bar{a}).$$

In addition, to ensure that the shapes produced by varying $b_s$ remain similar to the shapes in the training set, some restrictions must be placed on $b_s$, namely

$$D_m = \sqrt{\sum_{i=1}^{t} \frac{b_{s,i}^2}{\lambda_i}} \le D_{max},$$

where $D_{max}$ is usually 3. If $D_m > D_{max}$ during an update, the constraint is enforced by rescaling

$$b_s \leftarrow b_s \cdot \frac{D_{max}}{D_m}.$$
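A compact, runnable NumPy sketch of this section (eqs. (2) to (5) and the D_max constraint) might be:

```python
import numpy as np

def build_shape_model(X, fv=0.95):
    """X: (n, 2k) matrix of aligned shape vectors; returns mean, P, eigenvalues."""
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # eq. (3)
    lam, vec = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]                 # eigenvalues, largest first
    lam, vec = lam[order], vec[:, order]
    t = np.searchsorted(np.cumsum(lam) / lam.sum(), fv) + 1   # eq. (4)
    return mean, vec[:, :t], lam[:t]

def project_and_limit(shape, mean, P, lam, d_max=3.0):
    """Express a shape via eq. (5), constraining D_m <= D_max."""
    b = P.T @ (shape - mean)                      # b_s = P^T (a - mean)
    d = np.sqrt(np.sum(b ** 2 / lam))             # D_m
    if d > d_max:
        b *= d_max / d                            # rescale onto the constraint
    return mean + P @ b                           # eq. (5)
```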
2. Construct a local feature for each feature point

In order to find a new position for each feature point in every iteration, a local feature must be built separately for each of them. For the i-th feature point, the creation of its local feature is shown in Figure 7: on both sides of the i-th feature point in the i-th training image, m pixels are selected along the direction perpendicular to the line connecting the two feature points before and after that point, forming a vector of length 2m+1; differentiating the gray values of the pixels contained in this vector yields a local texture $g_{ij}$. Performing the same operation on the i-th feature point of the other training sample images in the training set yields the n local textures $g_{i1}, g_{i2}, \ldots, g_{in}$ of the i-th feature point. Their mean is then computed:
$$\bar{g}_i = \frac{1}{n} \sum_{j=1}^{n} g_{ij} \quad (6)$$
as well as the covariance:

$$S_i = \frac{1}{n} \sum_{j=1}^{n} (g_{ij} - \bar{g}_i)(g_{ij} - \bar{g}_i)^T \quad (7)$$
This yields the local feature of the i-th feature point. Performing the same operation on all the other feature points gives the local feature of every feature point. The similarity between a new profile g at a feature point and its trained local feature can then be measured by the Mahalanobis distance:

$$f(g) = (g - \bar{g}_i)^T S_i^{-1} (g - \bar{g}_i) \quad (8)$$
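A minimal NumPy sketch of the local-feature construction (eqs. (6) to (8)) might be as follows; normalizing each derivative profile by its absolute sum is a common ASM convention assumed here:

```python
import numpy as np

def train_local_model(profiles):
    """profiles: (n, 2m+1) raw gray-level profiles of one landmark, one per image."""
    g = np.diff(profiles.astype(float), axis=1)       # gray-value derivatives
    g /= np.abs(g).sum(axis=1, keepdims=True) + 1e-9  # assumed normalization step
    g_mean = g.mean(axis=0)                           # eq. (6)
    S_i = np.cov(g, rowvar=False, bias=True)          # eq. (7)
    return g_mean, np.linalg.pinv(S_i)

def mahalanobis(g, g_mean, S_inv):                    # eq. (8)
    d = g - g_mean
    return float(d @ S_inv @ d)
```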
II. ASM Search

Once the ASM model has been obtained by training on the sample set, the ASM search can be carried out. First an affine transformation is applied to the average shape to obtain an initial model:

$$X = M(s, \theta)[\bar{a}] + X_c \quad (9)$$

The expression above means that the average shape is rotated counterclockwise about its center by $\theta$ and scaled by s, and then translated by $X_c$, to obtain the initial model X.
This initial model is used to search the new image for the target shape, so that the feature points of the final shape found are as close as possible to the corresponding true feature points. This search process is realized mainly through the affine transformation and changes of the parameter b. The specific algorithm can be implemented by repeating the following two steps:

2.1 Compute the new position of each feature point

First the initial ASM model is overlaid on the image, as shown in Figure 8.

For the i-th feature point in the model, $l$ (with $l > m$) pixels are selected on each side of it, centered on it, along the direction perpendicular to the line connecting the two feature points before and after it; the gray-value derivatives of these pixels are computed and normalized, yielding a local feature that contains $2(l-m)+1$ sub-local features. The formula above is then used to compute the Mahalanobis distance between these sub-local features and the trained local feature of the current feature point; the center of the sub-local feature with the smallest Mahalanobis distance is the new position of the current feature point, which produces a displacement. Finding the new positions of all feature points and assembling their displacements into a vector gives

$$dX = (dX_1, dX_2, \ldots, dX_k).$$
2.2 Update of the affine-transformation parameters and b

Through the affine transformation and adjustment of its parameters, the current feature-point positions X are made as close as possible to the corresponding new positions X + dX.
After the affine transformation, the changes $ds$, $d\theta$, $dX_c$ and $db$ of the affine-transformation parameters can be obtained. Writing a for the current model shape, from eq. (9):

$$X + dX = M\big(s(1+ds),\ (\theta+d\theta)\big)[a + da] + (X_c + dX_c) \quad (10)$$

At the same time, X itself can be expressed by eq. (9); therefore the above can also be written as

$$M\big(s(1+ds),\ (\theta+d\theta)\big)[a + da] = M(s, \theta)[a] + dX + X_c - (X_c + dX_c) \quad (11)$$

Also from eq. (9):

$$M^{-1}(s, \theta)[\cdot] = M(s^{-1}, -\theta)[\cdot] \quad (12)$$

From eqs. (11) and (12):

$$da = M\big((s(1+ds))^{-1},\ -(\theta+d\theta)\big)\big[M(s, \theta)[a] + dX - dX_c\big] - a \quad (13)$$

At the same time, from eq. (5):

$$a + da \approx \bar{a} + P(b + db) \quad (14)$$

Subtracting eq. (5) from eq. (14) gives

$$da \approx P\, db \quad (15)$$

that is,

$$db = P^{-1} da \quad (16)$$

$$db = P^{T} da \quad (17)$$

Combining eqs. (17) and (13), db can be found. The parameter update process described above is therefore:

$$X_c = X_c + w_t\, dX_c, \quad Y_c = Y_c + w_t\, dY_c, \quad \theta = \theta + w_\theta\, d\theta, \quad s = s(1 + w_s\, ds), \quad b = b + w_b\, db \quad (18)$$

In the expressions above, $w_t$, $w_\theta$, $w_s$, $w_b$ are weights used to control the parameter changes. A new shape can then be obtained from eqs. (5) and (9). The search process ends when the changes of the affine-transformation parameters and of b are no longer large, or when the number of iterations reaches a specified threshold. The detection results are shown in Figure 9.
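The control flow of the search can be summarized as follows. This is only a high-level sketch: best_profile_positions, fit_pose_and_shape, limit_b, reconstruct and max_displacement are hypothetical placeholders for the operations of sections 2.1 and 2.2:

```python
# High-level sketch of the ASM search loop; all helpers are hypothetical.
def asm_search(image, model, X, n_iters=50, tol=0.5):
    for _ in range(n_iters):
        # 2.1: slide each landmark along its normal to the sub-profile with the
        # smallest Mahalanobis distance (eq. 8), giving target positions X + dX
        X_target = best_profile_positions(image, X, model)
        # 2.2: update the pose (s, theta, Xc) and shape parameters b (eq. 18)
        s, theta, Xc, b = fit_pose_and_shape(X_target, model)
        b = limit_b(b, model.eigenvalues)              # keep D_m <= D_max
        X_new = reconstruct(model, s, theta, Xc, b)    # eqs. (5) and (9)
        if max_displacement(X_new, X) < tol:           # stop when changes are small
            return X_new
        X = X_new
    return X
```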
The detected facial-feature regions and cheek contour are fitted onto an existing 3D face model, so that it can automatically present different viewing angles, illumination and expression changes according to parameter settings. The specific implementation method is as follows:

The "BJUT-3D Face Database" three-dimensional face database is selected. After preprocessing such as resampling, smoothing and coordinate correction, the data of 100 men and 100 women, each with about 60,000 points and 120,000 triangles, are selected as the dense face sample set. Then 60 three-dimensional feature points per person are selected by manual interaction as the sparsely corresponding sample set, and the average model of these 200 people is used as the general model.
Reconstruction is divided into the following four steps, as shown in Figure 10:

a. Detect the face feature points with the ASM template, using the improved ASM algorithm to automatically extract its 60 feature points;

b. Obtain the depth information of the feature points using a sparse deformation model: using prior statistical knowledge of three-dimensional faces, the set of three-dimensional feature-point samples is made, through planar projection and linear combination, to optimally approximate the two-dimensional feature points of the photograph, thereby obtaining the three-dimensional coordinates corresponding to the photograph's feature points;

c. Deform the general face model into the specific three-dimensional face according to the displacements of the three-dimensional feature points. The thin-plate spline (TPS) interpolation algorithm is selected; see Bookstein F. L., "Principal warps: thin-plate splines and the decomposition of deformations" (IEEE Trans. on PAMI, 1989, 11(6): 567-585). The original model is elastically deformed into the specific face model;

d. Reconstruct the color information of the model through texture mapping: the photograph texture is affine-transformed and then orthogonally projected onto the surface of the three-dimensional model.
Further, after the original person model has been adjusted and substituted into the image containing the target person, the method may further comprise:

adding props to the replaced original person, the props including glasses, hats, clothes, backpacks and shoes.

Specifically, after the above automatic editing system has substituted the user-specified first person into the picture or image sequence of the second person, props may further be added to the substituted first person; the props may be glasses, hats, clothes, backpacks and so on.
Further, adjusting and substituting the original person model into the image containing the target person may also comprise:

adjusting so that, based on the two-dimensional feature points detected by ASM, all the feature points used for texture mapping fall within the face region.

Further, for all the feature points of the texture mapping, the method may also comprise:

the feature points used having been corrected by the skin-color model.

For example, the present invention uses the two-dimensional feature points detected by ASM in the model reconstruction process, while the feature points used for texture mapping need to be corrected based on a skin-color model so that all feature points fall within the face region, thereby avoiding missing side textures during texture mapping.
1) Skin-color point determination

Skin-color information is detected using a method based on the YUV and YIQ spaces, with Gamma correction added to reduce the influence of illumination on image quality; see Chen Lu and Yang Jie, "Automatic 3D face model reconstruction using".
In the YUV space, U and V are two mutually orthogonal vectors in the plane, and the chrominance signal (i.e. the sum of U and V) is a two-dimensional vector, called the chrominance signal vector. Each color corresponds to one chrominance signal vector; its saturation is represented by the modulus Ch, and its hue by the phase angle θ:

$$Ch = \sqrt{U^2 + V^2} \quad (19)$$

$$\theta = \arctan\big(V / U\big) \quad (20)$$
A pixel P of the color image is transformed from RGB space to YUV space; if the condition $\theta_P \in [105, 150]$ is satisfied, P is a skin-color point. In the YIQ space, the I component represents hues from orange to cyan: the smaller the I value, the more yellow and the less cyan it contains. Experiments and statistical analysis show that the I value of skin color in the YIQ space varies within [20, 90]. Gamma correction is applied separately to the three components R, G and B, and the corrected values are denoted $R_{gamma}$, $G_{gamma}$ and $B_{gamma}$:
$$U = -0.147 \times R_{gamma} - 0.289 \times G_{gamma} + 0.436 \times B_{gamma} \quad (21)$$

$$V = 0.615 \times R_{gamma} - 0.515 \times G_{gamma} - 0.100 \times B_{gamma} \quad (22)$$

$$\theta = \arctan\big(V / U\big) \quad (23)$$

$$I = 0.596 \times R_{gamma} - 0.274 \times G_{gamma} - 0.322 \times B_{gamma} \quad (24)$$
Based on the obtained θ and I values, the pixel is then judged: if it satisfies

$$\theta_P \in [105, 150] \quad \text{and} \quad I \in [20, 90] \quad (25)$$

the pixel is determined to be a skin-color point.
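A direct transcription of this skin-color test (eqs. (21) to (25)) might be as follows; the gamma value 2.2 and the atan2 form of the hue angle in eq. (23) are assumptions, since the text does not fix them:

```python
import numpy as np

def is_skin(r, g, b, gamma=2.2):
    """Return True if an 8-bit RGB pixel passes the test of eq. (25)."""
    rg, gg, bg = 255.0 * (np.array([r, g, b], dtype=float) / 255.0) ** (1.0 / gamma)
    u = -0.147 * rg - 0.289 * gg + 0.436 * bg      # eq. (21)
    v = 0.615 * rg - 0.515 * gg - 0.100 * bg       # eq. (22)
    i = 0.596 * rg - 0.274 * gg - 0.322 * bg       # eq. (24)
    theta = np.degrees(np.arctan2(v, u)) % 360.0   # eq. (23), assumed atan2 form
    return 105.0 <= theta <= 150.0 and 20.0 <= i <= 90.0
```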
2) Correcting the feature points

Because the ASM template is a symmetric template, feature extraction on a face that is not fully frontal will push the feature points of one side out of bounds, causing missing side information in the subsequent texture reconstruction. Skin-color determination is therefore applied to the side feature points: if a point is not a skin-color point, it falls outside the face, and it is moved in towards the center of the face until all side feature points are skin-color points.

3) Texture mapping with the corrected feature points

Since model reconstruction must use symmetric feature points, the uncorrected two-dimensional feature points are still used to compute the three-dimensional feature points and finally obtain the model. The corrected feature points are used to map the three-dimensional feature points during texturing, which effectively avoids missing side textures.
The three-dimensional face models of different poses, illumination and expressions generated from this model are also quite realistic. As shown in Figure 11 (the original input face image and the generated three-dimensional face model), to synthesize rich facial expressions, 44 basic Action Units (AUs) are established based on the Facial Action Coding System (FACS); each AU can control the displacement of one or several facial feature points in three-dimensional space. Combining different AUs can produce various expressions such as joy, anger and sorrow. TPS is used to interpolate and deform the three-dimensional feature points to realize expression changes; Figure 12 shows examples of simulated expressions.

The relevant characteristic information of the second person is inferred from the detected facial-feature regions and cheek contour. This characteristic information includes changes of viewing angle, illumination and expression, and so on. As shown in Figure 13, the specific embodiment is as follows.

Once the feature points of the person in the picture or image sequence have been confirmed, the AU units defined in advance (as described above) can easily be found. These AUs can describe a facial expression in very fine detail; the algorithm determines the expression of the person's face once the specific positions of the feature points and the specific configuration of the AUs have been determined.

As for estimating the pose of the person's head from the face feature points in the two-dimensional image, the algorithm uses the POSIT method.
1. Basic idea: the algorithm is divided into two parts

(1) An orthographic projection transformation with a scale factor, i.e. scaled orthographic projection (SOP): the rotation matrix and translation vector are obtained from a system of linear equations;

(2) From the resulting rotation matrix and translation-vector coefficients, the scale factor is updated; the original points are then updated with the scale factor, and the process iterates.
2. Algorithm procedure:
(1) Assume a rotation matrix

$$R = \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}$$

with row vectors $R_1, R_2, R_3$, and a translation vector $T = (T_x, T_y, T_z)^T$; f is the focal length. In the perspective projection transformation,

$$x = f\,\frac{R_1 \cdot a + T_x}{R_3 \cdot a + T_z}, \qquad y = f\,\frac{R_2 \cdot a + T_y}{R_3 \cdot a + T_z},$$

whereas in SOP,

$$x = s\,(R_1 \cdot a + T_x), \qquad y = s\,(R_2 \cdot a + T_y),$$

where the scale factor is $s = f / T_z$.

(2) Applying the basic perspective projection transformation, a 3D point $a = (a_x, a_y, a_z)^T$ is perspectively projected onto the image plane to obtain the homogeneous coordinates $m = (wx, wy, w)^T$; the transformation process is

$$\begin{pmatrix} wx \\ wy \\ w \end{pmatrix} = \begin{pmatrix} f R_1 & f T_x \\ f R_2 & f T_y \\ R_3 & T_z \end{pmatrix} \begin{pmatrix} a \\ 1 \end{pmatrix}.$$

Because m is in homogeneous coordinates, dividing the right-hand side of the equation by $T_z$ has no effect, which gives

$$wx = s\,(R_1 \cdot a + T_x), \qquad wy = s\,(R_2 \cdot a + T_y), \qquad w = \frac{R_3 \cdot a}{T_z} + 1,$$

where $s = f / T_z$.
(3) The transformation process is now

    | wx |   | s·R11  s·R12  s·R13  s·Tx |   | a |
    | wy | = | s·R21  s·R22  s·R23  s·Ty | · | 1 |

that is, the system of equations

    wx = s·R11·ax + s·R12·ay + s·R13·az + s·Tx
    wy = s·R21·ax + s·R22·ay + s·R23·az + s·Ty

with w initialized to 1;
(4) Let K1 = (sR11, sR12, sR13, sTx)^T and K2 = (sR21, sR22, sR23, sTy)^T, collect the image coordinates into x' = (w0·x0, ..., wn·xn)^T and y' = (w0·y0, ..., wn·yn)^T, and let A be the (n+1)×4 matrix

    A = | a0^T  1 |
        | ...     |
        | an^T  1 |

The initial system of equations then becomes

    A·K1 = x',    A·K2 = y'.

Applying the least-squares method gives the solution

    K1 = (A^T·A)^(-1)·A^T·x',    K2 = (A^T·A)^(-1)·A^T·y'.
(5) With at least 4 non-coplanar 2D-3D point pairs, K1 and K2 can be solved for; dividing them by the known constant s yields R1, R2, Tx and Ty, after which R3 = R1 × R2 is obtained and R1, R2, R3 are normalized to unit vectors;
(6) Then w is updated as

    w = R3·a/Tz + 1.

For all 2D-3D point pairs, s = f/Tz is a constant: f is the focal length, a known fixed parameter, and Tz is likewise treated as a known fixed parameter, which may be taken as the mean Z coordinate of all the 3D points. Since a differs from one 3D point to another, w differs from point to point as well, so each original 2D point becomes (wx, wy)^T;
(7) Then, returning to step (2), the system of equations is solved again by least squares using the original 3D points and the updated 2D points to obtain new K1 and K2; w is then updated once more, and the 2D point coordinates are updated again.
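Steps (1) and (2) above are easy to sanity-check numerically. The sketch below, with made-up values, evaluates one point under both projection models and shows how close the SOP approximation is when Tz dominates.

```python
import numpy as np

f = 800.0                              # focal length (arbitrary)
R = np.eye(3)                          # face looking straight at the camera
T = np.array([0.05, -0.02, 2.0])       # (Tx, Ty, Tz)
a = np.array([0.03, 0.04, 0.10])       # one 3D face point

persp = f * (R[:2] @ a + T[:2]) / (R[2] @ a + T[2])   # full perspective, step (1)
s = f / T[2]                                          # scale factor s = f/Tz
sop = s * (R[:2] @ a + T[:2])                         # scaled orthographic projection

print(persp, sop)   # the two agree when |R3·a| << Tz (shallow face, distant camera)
```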
3. Solution procedure:
(1) The initial camera parameters are given: the focal length f, the image center (cx, cy), and the image extent, i.e. the range of valid 2D coordinate values.
(2) There are 8 unknowns in total, so at least 4 2D-3D point pairs are required;
(3) The first 2D-3D point pair must be (0,0)-(0,0,0);
(4) The stopping conditions for the algorithm are: a cap on the number of iterations, and a threshold on the per-iteration change (the accuracy) of the 2D points (a code sketch of the full iteration follows).
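Collecting steps (1)-(7) and the solution procedure into code gives the sketch below. It follows the classic POSIT iteration, with one deliberate deviation from the text above: instead of fixing s from a known Tz, s is recovered each round from the norms of K1 and K2 (the usual POSIT variant), after which Tz = f/s.

```python
import numpy as np

def posit(pts3d, pts2d, f, max_iter=20, tol=1e-4):
    """Iterative pose recovery from >= 4 non-coplanar 2D-3D point pairs.
    pts3d[0] must be (0, 0, 0) and pts2d[0] must be (0, 0), with image
    coordinates already centered on (cx, cy), as required in 3.(1)-(3)."""
    n = len(pts3d)
    A = np.hstack([pts3d, np.ones((n, 1))])    # (n, 4) design matrix, step (4)
    w = np.ones(n)                             # homogeneous terms, start at 1
    for _ in range(max_iter):
        # Steps (3)-(4): least-squares solve of A·K1 = w*x and A·K2 = w*y.
        K1, *_ = np.linalg.lstsq(A, w * pts2d[:, 0], rcond=None)
        K2, *_ = np.linalg.lstsq(A, w * pts2d[:, 1], rcond=None)
        # Step (5), with the deviation noted above: s from the row norms.
        n1, n2 = np.linalg.norm(K1[:3]), np.linalg.norm(K2[:3])
        s = 0.5 * (n1 + n2)
        R1, R2 = K1[:3] / n1, K2[:3] / n2
        R3 = np.cross(R1, R2)
        Tx, Ty, Tz = K1[3] / s, K2[3] / s, f / s
        # Step (6): update w = R3·a/Tz + 1 for every point.
        w_new = pts3d @ R3 / Tz + 1.0
        if np.max(np.abs(w_new - w)) < tol:    # stop condition from 3.(4)
            w = w_new
            break
        w = w_new                              # step (7): iterate from step (2)
    return np.vstack([R1, R2, R3]), np.array([Tx, Ty, Tz])
```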
In implementation, when the display of the first person's face is adjusted in the model of the first person according to the feature information, the feature information may be one of the following parameters or a combination thereof: the three-dimensional (3D) pose of the second person's face, the states of the basic Action Units (AUs) of the second person's face, the length-to-width ratio of the second person's face contour, and the lightness or darkness of the skin around the feature points of the second person's face.
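These parameters can be bundled into one record that travels from the feature-determination step to the adjustment step. The container below is a hypothetical shape for it; the field names and types are chosen only for illustration.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FaceFeatureInfo:
    """Feature information of the second person, per the parameters above.
    Field names and types are illustrative, not prescribed by the text."""
    pose: np.ndarray                                     # 3x3 rotation, 3D face pose
    au_states: dict = field(default_factory=dict)        # AU id -> activation in [0, 1]
    aspect_ratio: float = 1.0                            # contour length / width
    skin_lightness: dict = field(default_factory=dict)   # landmark id -> mean brightness
```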
Specifically, replacing the second person with the model of the first person may proceed as follows (a per-frame sketch follows the steps):
a. Adjust the model of the first person according to the relevant feature information of the second person so that it matches the corresponding characteristics of the second person; this can be broken down as:
a.1 Adjust the pose of the first person's 3D model according to the estimated 3D pose of the second person's face;
a.2 Adjust the expression of the first person's 3D model according to the estimated AU states of the second person;
a.3 Adjust the face shape of the first person's 3D model according to the contour of the second person's face, mainly its length-to-width ratio;
a.4 Adjust the lightness of the face around the corresponding feature points of the first person according to the lightness or darkness of the skin around all the feature points of the second person's face.
b. Using the detected facial-feature regions and cheek contour, erase the face region of the second person in each frame;
c. In each frame, place the adjusted model of the first person over the face region of the second person.
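A per-frame sketch of steps a-c using OpenCV is given below. `render` is an assumed helper that rasterizes the adjusted first-person model and its mask, and Poisson blending via seamlessClone is a choice made here for illustration; the text above does not prescribe a blending method.

```python
import cv2
import numpy as np

def replace_face(frame, face_hull, model, info, render):
    """One frame of steps a-c. face_hull: (k, 2) int32 contour points of the
    second person's face; render(model, info): assumed helper returning the
    adjusted first-person face patch and its 8-bit mask (step a)."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    cv2.fillConvexPoly(mask, face_hull, 255)                 # face region from contour
    erased = cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)  # step b: wipe the old face
    patch, patch_mask = render(model, info)                  # step a, rasterized
    x, y, w, h = cv2.boundingRect(face_hull)
    center = (x + w // 2, y + h // 2)
    # Step c: composite; Poisson blending keeps skin tone coherent with the frame.
    return cv2.seamlessClone(patch, erased, patch_mask, center, cv2.NORMAL_CLONE)
```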
Based on the same inventive concept, an embodiment of the present invention further provides an image processing system. Since the principle by which the system solves the problem is similar to that of the image processing method, the implementation of the system may refer to the implementation of the method, and repeated description is omitted.
Fig. 14 is a schematic structural diagram of the image processing system. As shown in Fig. 14, the system may include:
a model simulation module 1401, configured to simulate a model of the first person according to at least one image containing the first person;
a target image determining module 1402, configured to determine a target image containing the second person;
a feature information determining module 1403, configured to determine feature information for displaying the second person in the target image;
an adjustment display module 1404, configured to adjust the display of the first person in the model of the first person according to the feature information;
a person replacement module 1405, configured to replace, in the target image, the second person with the display-adjusted first person.
In implementation, the model simulation module 1401 may include:
a first detecting unit, configured to detect the face region of the first person;
a first determining unit, configured to determine, within the detected face region, the facial-feature regions and the cheek contour;
a fitting unit, configured to fit the detected facial-feature regions and cheek contour onto an existing 3D face model to obtain the simulated model of the first person's face.
In implementation, the target image determining module 1402 may include:
a second detecting unit, configured to detect the face region of the second person;
a second determining unit, configured to determine, within the detected face region, the facial-feature regions and the cheek contour;
a feature unit, configured to determine, according to the detected facial-feature regions and cheek contour, the feature information for displaying the second person's face in the target image.
In implementation, the feature information determining module 1403 is further configured to replace the second person's face with the display-adjusted first person's face according to the face region of the first person and the face region of the second person.
In implementation, the adjustment display module 1404 is further configured to adjust, in the model of the first person, the display of the first person's face according to feature information consisting of one of the following parameters or a combination thereof: the 3D pose of the second person's face, the AU states of the second person's face, the length-to-width ratio of the second person's face contour, and the lightness or darkness of the skin around the feature points of the second person's face.
In implementation, the adjustment display module 1404 is further configured to determine, within the detected face region, the facial-feature regions and the cheek contour using a face recognition algorithm.
In implementation, the system may further include:
a prop adding module, configured to add an image for the first person in the target image after the second person has been replaced with the display-adjusted first person in the target image.
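The module split of Fig. 14 maps directly onto a class skeleton. This is a structural sketch only, with bodies elided, since each module's behaviour is specified above.

```python
class ImageProcessingSystem:
    """Class skeleton mirroring modules 1401-1405 of Fig. 14."""

    def simulate_model(self, images):               # 1401: model simulation module
        ...

    def determine_target_image(self, images):       # 1402: target image determining module
        ...

    def determine_features(self, target_image):     # 1403: feature information determining module
        ...

    def adjust_display(self, model, features):      # 1404: adjustment display module
        ...

    def replace_person(self, target_image, model):  # 1405: person replacement module
        ...
```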
In the technical solution provided by the embodiments of the present invention, a model of the original person is simulated, the feature information of the target person is taken into account, and the original person's model is adjusted and substituted into the image in which the target person appears. This solves the problems that, when a person is replaced by image simulation, the person's image does not match the shooting angle of the target image and the person's expression cannot be changed. The solution can be applied in many scenarios, such as friendship, romance, parent-child and karaoke face swapping, immersively bringing one person into the environment of another; a virtual person can also be created to stand in for another person in some activity; and a disliked person in a photo can be replaced with a favorite one.
With the technical solution provided by the embodiments of the present invention, a user needs only a single picture to replace or edit the face of any person in any photo or video. After replacement, the shooting angle of the first person's face changes with the shooting angle of the target person, and the expression of the first person's face changes with the expression of the target person; the first person can thus be immersively brought into the world of the target person.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Claims (14)

1. An image processing method, characterized in that it comprises the following steps:
    simulating a model of a first person according to at least one image containing the first person;
    determining a target image containing a second person;
    determining feature information for displaying the second person in the target image;
    adjusting the display of the first person in the model of the first person according to the feature information;
    and replacing, in the target image, the second person with the display-adjusted first person.
2. The method according to claim 1, characterized in that simulating the model of the first person's face according to the at least one image containing the first person comprises:
    detecting the face region of the first person;
    determining, within the detected face region, the facial-feature regions and the cheek contour;
    fitting the detected facial-feature regions and cheek contour onto an existing three-dimensional (3D) face model to obtain the simulated model of the first person's face.
3. The method according to claim 1 or 2, characterized in that determining the feature information for displaying the second person's face in the target image comprises:
    detecting the face region of the second person;
    determining, within the detected face region, the facial-feature regions and the cheek contour;
    determining, according to the detected facial-feature regions and cheek contour, the feature information for displaying the second person's face in the target image.
4. The method according to claim 3, characterized in that replacing the second person with the display-adjusted first person comprises replacing the second person's face with the display-adjusted first person's face according to the face region of the first person and the face region of the second person.
5. The method according to claim 4, characterized in that, when the display of the first person's face is adjusted in the model of the first person according to the feature information, the feature information is one of the following parameters or a combination thereof: the 3D pose of the second person's face, the states of the basic Action Units (AUs) of the second person's face, the length-to-width ratio of the second person's face contour, and the lightness or darkness of the skin around the feature points of the second person's face.
6. The method according to claim 3, characterized in that, within the detected face region, a face recognition algorithm is used to determine the facial-feature regions and the cheek contour.
7. The method according to any one of claims 1 to 6, characterized in that, after replacing the second person with the display-adjusted first person in the target image, it further comprises:
    adding an image for the first person in the target image.
8. An image processing system, characterized in that it comprises:
    a model simulation module, configured to simulate a model of a first person according to at least one image containing the first person;
    a target image determining module, configured to determine a target image containing a second person;
    a feature information determining module, configured to determine feature information for displaying the second person in the target image;
    an adjustment display module, configured to adjust the display of the first person in the model of the first person according to the feature information;
    and a person replacement module, configured to replace, in the target image, the second person with the display-adjusted first person.
9. The system according to claim 8, characterized in that the model simulation module comprises:
    a first detecting unit, configured to detect the face region of the first person;
    a first determining unit, configured to determine, within the detected face region, the facial-feature regions and the cheek contour;
    a fitting unit, configured to fit the detected facial-feature regions and cheek contour onto an existing 3D face model to obtain the simulated model of the first person's face.
10. The system according to claim 8 or 9, characterized in that the target image determining module comprises:
    a second detecting unit, configured to detect the face region of the second person;
    a second determining unit, configured to determine, within the detected face region, the facial-feature regions and the cheek contour;
    a feature unit, configured to determine, according to the detected facial-feature regions and cheek contour, the feature information for displaying the second person's face in the target image.
11. The system according to claim 10, characterized in that the feature information determining module is further configured to replace the second person's face with the display-adjusted first person's face according to the face region of the first person and the face region of the second person.
12. The system according to claim 11, characterized in that the adjustment display module is further configured to adjust, in the model of the first person, the display of the first person's face according to feature information consisting of one of the following parameters or a combination thereof: the 3D pose of the second person's face, the AU states of the second person's face, the length-to-width ratio of the second person's face contour, and the lightness or darkness of the skin around the feature points of the second person's face.
13. The system according to claim 10, characterized in that the adjustment display module is further configured to determine, within the detected face region, the facial-feature regions and the cheek contour using a face recognition algorithm.
14. The system according to any one of claims 8 to 13, characterized in that it further comprises:
    a prop adding module, configured to add an image for the first person in the target image after the second person has been replaced with the display-adjusted first person in the target image.
PCT/CN2015/077353 2014-07-23 2015-04-24 Image processing method and system WO2016011834A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410352939.4A CN104123749A (en) 2014-07-23 2014-07-23 Picture processing method and system
CN2014103529394 2014-07-23
