WO2023162132A1

WO2023162132A1 - Image transformation device, method, and program

Info

Publication number: WO2023162132A1
Application number: PCT/JP2022/007870
Authority: WO
Inventors: 雄貴蔵内; 真奈笹川; 直紀萩山; 文香佐野; 隆二山本
Original assignee: 日本電信電話株式会社
Priority date: 2022-02-25
Filing date: 2022-02-25
Publication date: 2023-08-31
Also published as: JPWO2023162132A1

Abstract

An image transformation device according to an embodiment comprises: a feature point recognition unit that recognizes feature points of facial parts recognized from an image including a human face; a change amount correction unit that, when transforming the expression on the recognized face in the image into a target transformed facial expression, corrects an amount of change representing the amount of deformation for each feature point of the facial parts that corresponds to the transformed facial expression, on the basis of at least one of the ratio of the angle of the face in the image as measured from the front to a limit angle of the face at which the face can be recognized from the front, and the ratio of the area of the face not hidden by any objects to the entire area of the face; and a facial expression transformation unit that deforms the feature points according to the corrected amount of change and thereby obtains a transformed image in which the expression on the human face has been transformed.

Description

Image conversion device, method and program

The embodiments of the present invention relate to an image conversion device, method and program.

Non-Patent Document 1 discloses the possibility of manipulating emotional experience through real-time facial expression deformation (facial expression conversion) feedback. In Non-Patent Document 1, a subject's face is tracked in real time to perform natural facial expression transformation processing. In Non-Patent Document 1, the Rigid MLS (Moving Least Squares) method is used as an image transformation method to transform facial expressions in facial images. The Rigid MLS method is a method of distorting an image by recognizing feature points in an image and moving them. Such a technique is also disclosed in Non-Patent Document 2. The face image is an image obtained by photographing the face of the subject, an image obtained by extracting the face of a computer-generated avatar, or the like.

However, if the above feature points cannot be recognized because the angle of the subject's face changes or a part of the face is hidden, facial expression conversion stops at unnatural timing. , only face images obtained by unnatural conversion can be obtained. In other words, it is not possible to seamlessly convert expressions appearing in facial images.

SUMMARY OF THE INVENTION The present invention has been made in view of the above circumstances, and its object is to provide an image conversion apparatus, method, and program capable of seamlessly converting expressions appearing in facial images. It is in.

To solve the above problems, an image conversion apparatus according to an aspect of the present invention includes a feature point recognition unit that recognizes feature points of facial parts recognized from an image including a human face; Based on at least one of the ratio of the angle of the face from the front to the limit angle at which the face cannot be recognized from the front, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face a change amount correction unit for correcting a change amount representing a deformation amount of each of the feature points of the facial parts according to the converted expression when converting the recognized facial expression into the converted expression to be converted; and an expression conversion unit for obtaining a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change.

In order to solve the above problems, an image conversion method according to this aspect is a method performed by an image conversion device for converting an expression in an image of a person's face, wherein a feature point recognition unit of the image conversion device: recognizing feature points of facial parts recognized from an image containing a human face; and the ratio of the area excluding the area where the face is obscured by an object to the entire area of the face. correcting the amount of change representing the amount of deformation of each of the feature points of the facial parts according to the converted facial expression when converting to the converted facial expression; obtaining a transformed image in which the expression of the person's face is transformed by transforming the feature points by an amount.

According to the present invention, facial expressions appearing in facial images can be seamlessly converted.

FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to one embodiment of the invention. FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device. FIG. 3 is a diagram showing an example of facial feature points. FIG. 4 is a diagram showing an example of a storage form of feature points. FIG. 5 is a diagram showing an example of a storage form of the amount of change. FIG. 6 is a flow chart showing an example of image conversion processing operation by the image conversion device. FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculator. FIG. 8 is a diagram showing an example of a grid cell (grid area) processed by the display ratio calculator.

[One embodiment]
An embodiment according to the present invention will be described below with reference to the drawings.
(Configuration example)
FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to one embodiment of the invention.
In the example shown in FIG. 1, an image conversion apparatus 100 according to an embodiment of the present invention includes an image acquisition unit 11, a feature point recognition unit 12, a face angle calculation unit 13, a display ratio calculation unit 14, and a converted facial expression input unit 15. , a variation storage unit 16 , a variation correction unit 17 , an expression conversion unit 18 , and an image output unit 19 .

The image acquisition unit 11 acquires the face image of the user from, for example, an image captured by a web camera or an avatar. The image acquisition unit 11 outputs the acquired face image to the feature point recognition unit 12, the display ratio calculation unit 14, and the facial expression conversion unit 18. FIG.

The feature point recognition unit 12 receives the face image acquired by the image acquisition unit 11, and recognizes feature points of facial parts recognized from the face image. A method of recognizing feature points in the feature point recognition unit 12 will be described later. The feature point recognition section 12 outputs the recognized feature points to the face angle calculation section 13 and the change amount correction section 17 .

The face angle calculation unit 13 receives the feature points recognized by the feature point recognition unit 12 as input, and calculates the current position of the center of the face with reference to the angle of the face in the face image, for example, the position when the face is facing forward. and (sometimes referred to as the angle of the face from the front) is calculated, and the calculated angle data (data) is output to the change amount correction unit 17 .

The display ratio calculation unit 14 receives the face image acquired by the image acquisition unit 11 as input, calculates the ratio of the portion of the face that is hidden from the entire face with respect to the face image, and uses the calculated ratio data as the amount of change. Output to the correction unit 17 .

The converted facial expression input unit 15 inputs a converted facial expression (sometimes referred to as a converted facial expression to be converted), which is a facial expression to be converted, such as a smile, specified and input by the user from a user interface such as a keyboard. to get The converted facial expression input unit 15 outputs the acquired converted facial expression to the change amount correction unit 17 .

The amount of change representing the amount of deformation (the amount of movement of coordinate values) for each feature point is stored (stored) in advance in the amount of change storage unit 16 for each facial expression to be converted. The amount of change is information indicating how much the coordinate values of each feature point should be moved according to the facial expression to be converted. The amount of change can be obtained in advance by, for example, adjusting a specific face image so that the user applies facial expression transformation processing to an expressionless face so as to obtain a natural facial expression.

The change amount correction unit 17 inputs the feature points recognized by the feature point recognition unit 12 , the face angle calculated by the face angle calculation unit 13 , and the display ratio calculated by the display ratio calculation unit 14 .
Further, the change amount correction unit 17 reads from the change amount storage unit 16 the change amount according to the desired facial expression to be converted indicated by the converted facial expression input from the converted facial expression input unit 15 .
Based on the input feature points, face angle, and display ratio, the change amount correction unit 17 calculates the amount of change corrected by a formula to be described later for the amount of change in the facial expression that is to be converted, and calculates the amount of change that has been calculated. The data is output to the facial expression conversion section 18 .

The facial expression conversion unit 18 receives the amount of change corrected by the amount of change correction unit 17 as input. The facial expression conversion unit 18 corrects each feature point in the input face image based on the corrected amount of change, that is, the amount of change representing the amount of deformation according to the converted facial expression to be converted. By moving based on the amount of movement, which is the amount of change, a face image in which the expression of the face image is converted is obtained. The facial expression conversion section 18 outputs the converted face image to the image output section 19 .

The image output unit 19 receives the face image after conversion from the facial expression conversion unit 18, and outputs the input face image. Here, output includes, for example, storing in a storage medium, displaying on a display, transmitting to another device via a communication network, and the like.

FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device 100. As shown in FIG.
The image conversion apparatus 100 is configured by a computer such as a personal computer, a smart phone, a server computer, or the like. As shown in FIG. 2, the image conversion device 100 has a hardware processor (sometimes simply referred to as a processor) 111A such as a CPU (Central Processing Unit). By using a multi-core and multi-thread CPU, a plurality of information processes can be executed at the same time. Also, the processor 111A may include multiple CPUs. In the image conversion apparatus 100, a program memory 111B, a data memory 112, a communication interface 114, and an input/output interface 113 are connected to the processor 111A via a bus 115. connected through

The communication interface 114 can include, for example, one or more wired or wireless communication modules. The communication interface 114 can communicate with other computers, web cameras, etc. connected via a cable, a LAN (Local Area Network), or a network (NW) such as the Internet. can.

An input device 200 and an output device 300 are connected to the input/output interface 113 . The input device 200 includes an input device such as a keyboard, a pointing device such as a mouse, a sensor device such as a camera, and the like. Also, the output device 300 is a display device such as a liquid crystal display, a CRT (Cathode Ray Tube) display, or the like. The input device 200 and the output device 300 can also be those using a so-called tablet type input/display device. This type of input/display device is configured by arranging an input detection sheet that employs an electrostatic method or a pressure method on the display screen of a display device that uses liquid crystal or organic EL (Electro Luminescence), for example. be. The input/output interface 113 inputs operation information input by the input device 200 to the processor 111A, and causes the output device 300 to display display information generated by the processor 111A.

Note that the input device 200 and the output device 300 do not have to be connected to the input/output interface 113 . The input device 200 and the output device 300 are provided with a communication unit for connecting to the communication interface 114 directly or via a network, so that information can be exchanged with the processor 111A.

In addition, the input/output interface 113 may have a read/write function of a recording medium such as a semiconductor memory such as a flash memory, or read/write of such a recording medium. It may have a connection function with a reader writer (reader writer) with the function. Furthermore, the input/output interface 113 may have a connection function with other devices.

The program memory 111B is used as a non-temporary tangible computer-readable storage medium by combining a non-volatile memory that can be written and read at any time and a non-volatile memory that can only be read at any time. It is a thing. Non-volatile memories that can be written and read at any time are, for example, HDDs (Hard Disk Drives), SSDs (Solid State Drives), and the like. A non-volatile memory that can only be read at any time is, for example, a ROM (Read Only Memory). The program memory 111B stores a program necessary for the processor 111A to execute various control processes according to one embodiment, such as an image conversion program. That is, the image acquisition unit 11, the feature point recognition unit 12, the face angle calculation unit 13, the display ratio calculation unit 14, the converted expression input unit 15, the change amount correction unit 17, the expression conversion unit 18, and the image output unit 19 Each of the processing function units in each unit can be realized by causing the processor 111A to read and execute the image conversion program stored in the program memory 111B. Some or all of these processing functions may be implemented in various other forms, including integrated circuits such as Application Specific Integrated Circuits (ASICs) or field-programmable gate arrays (FPGAs). May be.

The data memory 112 is used as a tangible computer-readable storage medium, for example, by combining the above nonvolatile memory and a volatile memory such as RAM (random access memory). This data memory 112 is used to store various data acquired and created in the process of performing various processes. That is, in the data memory 112, an area for storing various data is appropriately secured in the process of performing various processes.

FIG. 3 is a diagram showing an example of facial feature points. The asterisks in FIG. 3 are feature points recognized by the processor 111A, and the numbers attached to each feature point are unique feature point IDs (IDentifiers) for identifying each feature point. The number of feature point IDs and the portion of the face for each feature point ID are determined by the feature point recognition method employed. For example, the feature point with the feature point ID "18" is predetermined as the left edge of the left eyebrow.

FIG. 4 is a diagram showing an example of a storage form of feature points. As shown in FIG. 4, the data memory 112 stores the x-coordinates and y-coordinates of the feature points in the face image in association with the feature point IDs in the form of a table. Coordinate values are in pixels. Therefore, in the example of FIG. 3, the data memory 112 stores the xy coordinates of the feature points with the feature point IDs "1" to "68".

The data memory 112 stores the converted facial expression designated by the user, which is obtained when the processor 111A operates as the above-described converted facial expression input unit 15. FIG.
The data memory 112 can store the conversion amount stored in the change amount storage unit 16 described above.

FIG. 5 is a diagram showing an example of the storage form of the amount of change. As shown in FIG. 5, in the data memory 112, the amount of change in the x-coordinate and the amount of change in the y-coordinate of the feature point are stored in association with the feature point ID for each transformed facial expression, regardless of the person who is the subject. The amount of change is stored in a table format. The delta value is in pixels. The amount of change is represented by the direction and amount of movement of the feature point. For example, a movement amount of "+1" represents a movement of 1 pixel in the positive direction.

The data memory 112 can store face images converted when the processor 111A operates as the facial expression converter 18 described above.
In addition, the data memory 112 can store various intermediate data generated during the operation of the processor 111A.

(motion)
Next, the operation of the image conversion device 100 will be described.
FIG. 6 is a flow chart showing an example of image conversion processing operation by the image conversion device 100 . The processor 111A of the image conversion device 100 reads and executes the image conversion program stored in the program memory 111B, thereby starting the operation of the image conversion device 100 shown in this flowchart. Execution of the image conversion program by the processor 111A is started when an instruction to perform image conversion is issued from the input device 200 via the input/output interface 113 or via the communication interface 114 .

The processor 111A operates as the converted facial expression input unit 15 and waits for the user to input a specified converted facial expression, such as a smile, which is the facial expression to be converted (step S1). For example, the processor 111A determines whether or not the input signal from the input device 200 via the input/output interface 113 or the communication interface 114 includes a designated input of a converted facial expression. If there is an input specifying a converted facial expression, the processor 111A proceeds to the process of step S2.

The processor 111A stores the designated converted facial expression in the data memory 112 (step S2).

The processor 111A operates as the image acquisition unit 11 and acquires a face image (step S3). For example, the processor 111A acquires an image of the subject's face captured by the camera of the input device 200 via the input/output interface 113 . Alternatively, processor 111A obtains through communication interface 114 a face image captured by a network-connected web camera or other computer-generated avatar face. Processor 111A causes data memory 112 to store the acquired face image.

The processor 111A operates as the feature point recognition unit 12 and recognizes feature points from the face image stored in the data memory 112 (step S4). The processor 111A uses, for example, the face_landmark_detection function of dlib (see, for example, http://dlib.net/face_landmark_detection.py.html) to recognize feature points in the face image. Specifically, the processor 111A extracts a luminance gradient direction distribution called HOG (Histogram of Oriented Gradients) features from the input face image. A model that is learned based on data that associates HOG features with positions of facial feature points is generally provided. Therefore, the processor 111A inputs the extracted HOG features to this learning model to obtain the positions of the feature points of the face. The processor 111A causes the data memory 112 to store the positions of the acquired feature points.

The processor 111A operates as the face angle calculation unit 13 and calculates the angle of the face in the face image using, for example, opencv (step S5).
Specifically, the processor 111A measures in advance the three-dimensional position (P_3d) of the feature points of the facial parts when the face is facing forward, and stores this in the data memory 112. FIG.
The processor 111A obtains the two-dimensional position (P'_2d) of the current feature point of the facial part of the facial image.
The processor 111A calculates the two-dimensional position (P_2d) of the feature point of the facial part when the three-dimensional position (P_3d) is rotated or moved.
The processor 111A calculates the two-dimensional positions using, for example, the opencv ProjectPoints2 function (see, for example, http://opencv.jp/opencv-2svn/py/camera_calibration_and_3d_reconstruction.html#projectpoints2).

The processor 111A calculates the sum of squares (D) of the distances between the two-dimensional position (P_2d) and the two-dimensional position (P'_2d).
The processor 111A obtains the angle (and the amount of movement) that minimizes the sum of squares D by global optimization.

The processor 111A uses, for example, the opencv solvPnP function (see, for example, http://opencv.jp/opencv-2svn/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp) to determine the angle to be minimized (and The amount of movement) is calculated as the angle (a) of the face from the front.

The processor 111A activates the face recognition tool and moves the face, and obtains the positions of the feature points when recognition becomes impossible. is calculated in advance as an angle that does not occur, and is stored in the data memory 112 .

Next, the processor 111A operates as the display ratio calculation unit 14 and calculates the display ratio of the face, which is the ratio of the area hidden by objects other than the face to the entire area of the face with respect to the face image. (Step S6). For example, if 10% of the entire face is hidden by objects other than the face, the display ratio of the face is 10%.

Here, an example of calculation by the display ratio calculation unit 14 will be described with reference to FIGS. 7 and 8. FIG.　

FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculation unit. FIG. 8 is a diagram showing an example of grid cells processed by the display ratio calculator. Here, an example relating to an input image containing animals and various objects will be described, but the same can be applied when these are human faces and objects hiding faces, such as hands or other objects. be.

In the examples shown in FIGS. 7 and 8, the known YOLO (You Only Look Once) (general object detection technique by deep learning) can be used. This technique is disclosed, for example, in the following documents.
“Joseph Redmon, et al., “YOLOv3: An Incremental Improvement”, arXiv preprint, arXiv:1804.02767, 2018.”

In this method, the processor 111A resizes the face image into a square, and transforms it into a CNN (Convolutional Neural Network), which is a neural network widely used in the field of image processing, as shown in FIG. )). The processor 111A extracts features from the face image through 24 convolution layers and 4 pooling layers (see symbol a in FIG. 7) in the CNN shown in FIG. In the Conn. Layer (see symbol b in FIG. 7), the Bounding Box of objects in the image and the object type probabilities can be estimated. The final output size 7×7 of the convolutional layer matches the number of divisions of the grid cell.

The input image is divided into S×S grid cells as shown in FIG. 8 (see FIG. 8(a)).
The processor 111A estimates the bounding box of B objects for each of the divided grid cells. The processor 111A stores a total of five scores for each bounding box, consisting of the coordinate values, width, height (x, y, w, h) of the bounding box and the confidence score that the bounding box is an object. Output the value (see (b) in FIG. 8).

The coordinate values x, y are the center coordinates of the Bounding Box with reference to the grid cell boundary, the width w and height h are relative values to the size of the entire image, and the confidence score is the Represents the probability that the bounding box is an object or background. This probability is "1" for an object and "0" for a background.

As an index to measure the estimation accuracy of the object area, there is IoU (Intersection over Union), which indicates the degree of agreement between the correct bounding box and the estimated bounding box. In the above YOLO, the reliability score of the Bounding Box represents the IoU.

The processor 111A estimates object type probabilities for each grid cell. For example, the processor 111A estimates the probability of belonging to which class, that is, the conditional probability, when the grid cell is an object in C classification classes (( c) see).

The processor 111A integrates the class probabilities estimated here with the above bounding boxes to obtain a plurality of bounding boxes indicating what the object is (see (d) in FIG. 8).

The processor 111A selects these bounding boxes, including overlapping regions, based on the bounding boxes with high reliability scores using a method called NMS ((Non-Maximum Suppression) (see (e) in FIG. 8). NMS. suppresses a region with a large IoU value (a high degree of overlap) with a threshold value, thereby obtaining a detection result of an object region.

When there is a face area and an object area overlapping this area, the processor 111A can calculate the display ratio of the face by dividing the area of the overlapping area by the area of the face area. can.

Next, the processor 111A operates as the change amount correction unit 17, reads the change amount according to the facial expression to be converted from the change amount storage unit 16, and performs the feature points recognized in S4, the face angle calculated in S5, Then, based on the display ratio calculated in S6, the amount of change obtained by correcting the read amount of change according to the expression to be converted is calculated (step S7).

Specifically, the processor 111A obtains the angle of the face, that is, the angle a of the face from the front, the limit face angle A at which recognition is possible, and the ratio H of the area where the face is hidden to the area of the entire face. , the amount of change in expression conversion is attenuated, that is, the amount of change is corrected according to the following equation (1), and the corrected result is stored in the data memory 112 .
ΔP _new =ΔP・(1−H)・a/A … Formula (1)
ΔP _new on the left side of equation (1) is the amount of change after attenuation, that is, after correction of the facial expression transformation, and ΔP on the right side is the amount of change of the facial expression transformation before correction.

That is, in the above example, (1) the ratio a/A between the face angle a from the front and the limit face angle A that can be recognized, and (2) the area where the face is hidden with respect to the entire face area. Based on the ratio H, the amount of change after correction is calculated.
Not limited to this example, for example, within the range of allowable accuracy, (1) the ratio a / A of the face angle a from the front and the limit face angle A that can be recognized, and (2) The amount of change after correction may be calculated based on one of the ratio H of the area where the face is hidden to the area of the entire face.

By correcting the amount of change in this way, even if the face angle changes or a part of the face is hidden, even if the feature points cannot be recognized, the facial expression conversion stops at an unnatural timing. It is possible to convert the expression of the face image naturally.

The processor 111A operates as the facial expression transforming unit 18 to transform the facial image stored in the data memory 112 (step S8). That is, the processor 111A converts the face image based on the result of correcting the amount of change corresponding to the converted facial expression stored in the data memory 112 . For example, processor 111A utilizes an implementation of MLS (see, eg, https://github.com/Jarvis73/Moving-Least-Squares), or the like.

Specifically, the processor 111A moves each feature point by the amount of change after correction of the amount of change corresponding to the converted facial expression stored in the data memory 112 . For example, when converting a facial expression into a smile, the x-y coordinates of the control point with feature point ID "1" are (23, 45) before conversion (see FIG. 4). is incremented by 1 and the y coordinate is incremented by 2 (see FIG. 5) to move the pixel of the feature point to (24, 47).

Then, for the feature points, the processor 111A applies Affine transformation (including Helmert transformation = similarity transformation and rigid deformation = rigid deformation) shown in Equation (2) below.

However, x and y in the above equation (2) are coordinates of nearby feature points, x' and y' are coordinates obtained by adding the amount of change to the coordinates of the feature points, and a, b, c, and d is a parameter, and t _x , t _y are translation parameters. The processor 111A calculates the least square means of the coordinates x, y of the feature points and the coordinates x′, y′ obtained by adding the amount of change, and sets the parameters a, b, c, d, t _x , t _y are determined by global optimization. Then, the processor 111A uses x and y as the coordinates of the target point to be transformed, and uses the determined parameters to determine the coordinates after transformation. The processor 111A uses the parameters a, b, c, d, t _x , and t _y thus obtained to obtain the coordinates after the feature point is transformed by the above affine transformation.

The processor 111A stores the converted face image in the data memory 112 as a converted image.

The processor 111A operates as the image output unit 19 and outputs the converted image stored in the data memory 112 (step S9). For example, the processor 111A causes the output device 300 to display a facial image via the input/output interface 113. FIG. Alternatively, the processor 111A transmits to the network via the communication interface 114 and displays it on a display device connected to the network, or displays it on a display unit of another computer connected to the network.

The processor 111A determines whether or not to end the operation as the image conversion device 100 shown in the flowchart of FIG. 6 (step S10). For example, the processor 111A checks whether or not the user has instructed to end image conversion from the input device 200 via the input/output interface 113 or via the communication interface 114 . Here, when ending the above operation (YES in step S10), the processor 111A ends the operation shown in the flowchart of FIG.

On the other hand, if the above operation has not yet been completed (NO in step S10), the processor 111A operates as the converted facial expression input unit 15 and determines whether or not the user has entered a change designation input for the converted facial expression. (step S11). If there is no change specification input for the converted facial expression (NO in step S11), the processor 111A proceeds to the process of step S3. Also, if there is an input specifying a change in the converted facial expression (YES in step S10), the processor 111A proceeds to the process of step S2.

The image conversion device 100 according to the embodiment described above includes a face angle calculator 13 , a display ratio calculator 14 , a change amount corrector 17 , and a facial expression converter 18 . A facial expression conversion unit 18 obtains a transformed image in which the facial expression of a person is transformed by transforming the feature points by a deformation amount according to the transformed facial expression to be transformed.
Therefore, the image conversion apparatus 100 according to one embodiment stops facial expression conversion at unnatural timing even if feature points cannot be recognized due to a change in the angle of the face or a part of the face being hidden. Therefore, the expression of the face image can be converted naturally.

[Other embodiments]
In addition, this invention is not limited to the said one embodiment.
For example, the flow of each process described above is not limited to the procedures described above, and the order of some steps may be changed, and some steps may be performed in parallel. .

In addition, the flow of each process described above was for the case of converting the expression of a face image acquired in real time, but instead of real-time processing, the expression of a saved face image is converted. can be applied as well.

In addition, the method described in each embodiment can be applied to a program (software means) that can be executed by a computer (computer), such as a magnetic disk (floppy disk, hard disk). etc.), optical discs (CD-ROM, DVD, MO, etc.), semiconductor memories (ROM, RAM, flash memory, etc.), etc., and can be transmitted and distributed via communication media. The programs stored on the medium also include a setting program for configuring software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes this apparatus reads a program recorded on a recording medium, and optionally constructs software means by a setting program, and executes the above-described processing by controlling the operation by this software means. The term "recording medium" as used herein is not limited to those for distribution, and includes storage media such as magnetic disks, semiconductor memories, etc. provided in computers or devices connected via a network.

It should be noted that the present invention is not limited to the above-described embodiments, and can be variously modified in the implementation stage without departing from the gist of the present invention. Further, each embodiment may be implemented in combination as appropriate, in which case the combined effect can be obtained. Furthermore, various inventions are included in the above embodiments, and various inventions can be extracted by combinations selected from a plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiments, if the problem can be solved and effects can be obtained, the configuration with the constituent elements deleted can be extracted as an invention.

DESCRIPTION OF SYMBOLS 100... Image conversion apparatus 11... Image acquisition part 12... Feature point recognition part 13... Face angle calculation part 14... Display ratio calculation part 15... Conversion facial expression input part 16... Change amount storage part 17... Change amount correction part 18... Facial expression conversion Unit 19 Image output unit 111A Processor 111B Program memory 112 Data memory 113 Input/output interface 114 Communication interface 115 Bus 200 Input device 300 Output device

Claims

a feature point recognition unit that recognizes feature points of facial parts recognized from an image containing a human face;
The ratio of the angle of the face from the front to the limit angle at which the face cannot be recognized from the front in the image, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face Based on at least one of them, when the recognized facial expression is converted into a converted expression to be converted, a change amount representing a deformation amount of each of the feature points of the facial parts corresponding to the converted expression is corrected. a change amount correction unit;
a facial expression transforming unit that transforms the feature points according to the corrected amount of change to obtain a transformed image in which the facial expression of the person is transformed;
An image conversion device comprising:
The change amount correction unit
At least the ratio of the angle of the face from the front to the limit angle at which the face cannot be recognized from the front in the image, and the ratio of the area of the face excluding the area where the face is hidden by an object to the total area of the face correcting the amount of change by multiplying one by a predetermined amount of change for each feature point of the face part;
2. The image conversion device according to claim 1.
calculating a two-dimensional position of the feature point of the face part when the three-dimensional position of the feature point of the face part is rotated or moved when the face is facing forward; calculating as the angle of the face from the front the angle at which the sum of squares of the distances from the two-dimensional positions of the feature points of the face part is minimized;
2. The image conversion device according to claim 1.
a storage device that stores in advance a change amount representing a deformation amount for each of the feature points for each of the converted facial expressions to be converted;
a conversion expression input unit for inputting the conversion expression to be converted;
further comprising
The change amount correction unit
reading the amount of change corresponding to the input converted facial expression from the storage device and correcting the read amount of change;
4. The image conversion device according to any one of claims 1 to 3.
A method performed by an image transformation device for transforming expressions in an image of a person's face, comprising:
Recognizing feature points of facial parts recognized from an image including a human face by a feature point recognition unit of the image conversion device;
The ratio of the angle of the face from the front to the limit angle at which the face cannot be recognized from the front in the image and the face hidden by an object with respect to the entire area of the face are obtained by the change amount correction unit of the image conversion device. Each of the feature points of the facial parts corresponding to the converted facial expression when the recognized facial expression is converted to the converted facial expression to be converted based on at least one of the proportions of the areas from which the area where the facial expression is removed is correcting the amount of change representing the amount of deformation for
Obtaining a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change by the expression conversion unit of the image conversion device;
An image conversion method comprising:
An image conversion processing program that causes a processor to function as each part of the image conversion device according to any one of claims 1 to 4.