WO2023162132A1 - Image transformation device, method, and program - Google Patents
Image transformation device, method, and program
- Publication number
- WO2023162132A1 PCT/JP2022/007870 JP2022007870W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- image
- converted
- amount
- change
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- the embodiments of the present invention relate to an image conversion device, method and program.
- Non-Patent Document 1 discloses the possibility of manipulating emotional experience through real-time facial expression deformation (facial expression conversion) feedback.
- a subject's face is tracked in real time to perform natural facial expression transformation processing.
- the Rigid MLS (Moving Least Squares) method is used as an image transformation method to transform facial expressions in facial images.
- the Rigid MLS method is a method of distorting an image by recognizing feature points in an image and moving them.
- Such a technique is also disclosed in Non-Patent Document 2.
- the face image is an image obtained by photographing the face of the subject, an image obtained by extracting the face of a computer-generated avatar, or the like.
- however, if the above feature points cannot be recognized because the angle of the subject's face changes or a part of the face is hidden, facial expression conversion stops at an unnatural timing, so only face images obtained by unnatural conversion can be obtained. In other words, it is not possible to seamlessly convert expressions appearing in facial images.
- the present invention has been made in view of the above circumstances, and its object is to provide an image conversion apparatus, method, and program capable of seamlessly converting expressions appearing in facial images.
- an image conversion apparatus includes: a feature point recognition unit that recognizes feature points of facial parts from an image including a human face; a change amount correction unit that, when converting the recognized facial expression into a converted expression to be converted, corrects a change amount representing the deformation amount of each of the feature points of the facial parts according to the converted expression, based on at least one of the ratio of the angle of the face from the front to the limit angle at which the face can no longer be recognized from the front, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face; and an expression conversion unit that obtains a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change.
- an image conversion method is a method performed by an image conversion device for converting an expression in an image of a person's face, and includes: recognizing, by a feature point recognition unit of the image conversion device, feature points of facial parts from an image containing a human face; correcting, by a change amount correction unit of the image conversion device, a change amount representing the deformation amount of each of the feature points of the facial parts according to a converted facial expression to be converted, based on at least one of the ratio of the angle of the face from the front to the limit angle at which the face can no longer be recognized from the front, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face; and obtaining, by an expression conversion unit of the image conversion device, a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change.
- facial expressions appearing in facial images can be seamlessly converted.
- FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to one embodiment of the invention.
- FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device.
- FIG. 3 is a diagram showing an example of facial feature points.
- FIG. 4 is a diagram showing an example of a storage form of feature points.
- FIG. 5 is a diagram showing an example of a storage form of the amount of change.
- FIG. 6 is a flow chart showing an example of image conversion processing operation by the image conversion device.
- FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculator.
- FIG. 8 is a diagram showing an example of a grid cell (grid area) processed by the display ratio calculator.
- FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to one embodiment of the invention.
- an image conversion apparatus 100 according to an embodiment of the present invention includes an image acquisition unit 11, a feature point recognition unit 12, a face angle calculation unit 13, a display ratio calculation unit 14, a converted facial expression input unit 15, a change amount storage unit 16, a change amount correction unit 17, a facial expression conversion unit 18, and an image output unit 19.
- the image acquisition unit 11 acquires the face image of the user from, for example, an image captured by a web camera or an avatar.
- the image acquisition unit 11 outputs the acquired face image to the feature point recognition unit 12, the display ratio calculation unit 14, and the facial expression conversion unit 18.
- the feature point recognition unit 12 receives the face image acquired by the image acquisition unit 11, and recognizes feature points of facial parts recognized from the face image. A method of recognizing feature points in the feature point recognition unit 12 will be described later.
- the feature point recognition section 12 outputs the recognized feature points to the face angle calculation section 13 and the change amount correction section 17 .
- the face angle calculation unit 13 receives the feature points recognized by the feature point recognition unit 12 as input, calculates the angle of the face in the face image, for example the angle between the current position of the center of the face and its position when the face is facing forward (sometimes referred to as the angle of the face from the front), and outputs the calculated angle data to the change amount correction unit 17.
- the display ratio calculation unit 14 receives the face image acquired by the image acquisition unit 11 as input, calculates the ratio of the hidden portion of the face to the entire face in that face image, and outputs the calculated ratio data to the change amount correction unit 17.
- the converted facial expression input unit 15 acquires a converted facial expression (sometimes referred to as a converted facial expression to be converted), which is the target expression such as a smile, specified and input by the user from a user interface such as a keyboard. The converted facial expression input unit 15 outputs the acquired converted facial expression to the change amount correction unit 17.
- the amount of change representing the amount of deformation (the amount of movement of coordinate values) of each feature point is stored in advance in the change amount storage unit 16 for each facial expression to be converted.
- the amount of change is information indicating how much the coordinate values of each feature point should be moved according to the facial expression to be converted.
- the amount of change can be obtained in advance by, for example, the user applying facial expression transformation processing to an expressionless face in a specific face image and adjusting it so that a natural facial expression is obtained.
- the change amount correction unit 17 receives the feature points recognized by the feature point recognition unit 12, the face angle calculated by the face angle calculation unit 13, and the display ratio calculated by the display ratio calculation unit 14. Further, the change amount correction unit 17 reads from the change amount storage unit 16 the change amount corresponding to the target expression indicated by the converted facial expression input from the converted facial expression input unit 15. Based on the input feature points, face angle, and display ratio, the change amount correction unit 17 corrects the read change amount using a formula described later, and outputs the corrected change amount data to the facial expression conversion unit 18.
- the facial expression conversion unit 18 receives the amount of change corrected by the amount of change correction unit 17 as input.
- the facial expression conversion unit 18 moves each feature point in the input face image by the corrected amount of change, that is, by the movement amount representing the deformation according to the converted facial expression to be converted, thereby obtaining a face image in which the expression of the face image is converted.
- the facial expression conversion section 18 outputs the converted face image to the image output section 19 .
- the image output unit 19 receives the face image after conversion from the facial expression conversion unit 18, and outputs the input face image.
- output includes, for example, storing in a storage medium, displaying on a display, transmitting to another device via a communication network, and the like.
- FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device 100.
- the image conversion apparatus 100 is configured by a computer such as a personal computer, a smart phone, a server computer, or the like.
- the image conversion device 100 has a hardware processor (sometimes simply referred to as a processor) 111A such as a CPU (Central Processing Unit). By using a multi-core and multi-thread CPU, a plurality of information processes can be executed at the same time. Also, the processor 111A may include multiple CPUs.
- a program memory 111B, a data memory 112, a communication interface 114, and an input/output interface 113 are connected to the processor 111A via a bus 115.
- the communication interface 114 can include, for example, one or more wired or wireless communication modules.
- the communication interface 114 can communicate with other computers, web cameras, and the like connected via a cable, a LAN (Local Area Network), or a network (NW) such as the Internet.
- the input device 200 includes an input device such as a keyboard, a pointing device such as a mouse, a sensor device such as a camera, and the like.
- the output device 300 is a display device such as a liquid crystal display, a CRT (Cathode Ray Tube) display, or the like.
- the input device 200 and the output device 300 can also be those using a so-called tablet type input/display device.
- this type of input/display device is configured, for example, by arranging an input detection sheet employing an electrostatic or pressure-sensitive method on the display screen of a display device using liquid crystal or organic EL (Electro Luminescence).
- the input/output interface 113 inputs operation information input by the input device 200 to the processor 111A, and causes the output device 300 to display display information generated by the processor 111A.
- the input device 200 and the output device 300 do not have to be connected to the input/output interface 113 .
- the input device 200 and the output device 300 are provided with a communication unit for connecting to the communication interface 114 directly or via a network, so that information can be exchanged with the processor 111A.
- the input/output interface 113 may have a read/write function for a recording medium such as a semiconductor memory, e.g., a flash memory, or may have a function for connecting to a reader/writer that has a read/write function for such a recording medium. Furthermore, the input/output interface 113 may have a function for connecting to other devices.
- the program memory 111B is a non-transitory tangible computer-readable storage medium that combines a non-volatile memory that can be written and read at any time and a non-volatile memory that can only be read. Non-volatile memories that can be written and read at any time are, for example, HDDs (Hard Disk Drives) and SSDs (Solid State Drives). A non-volatile memory that can only be read is, for example, a ROM (Read Only Memory).
- the program memory 111B stores a program necessary for the processor 111A to execute various control processes according to one embodiment, such as an image conversion program.
- each of the processing function units in each unit can be realized by causing the processor 111A to read and execute the image conversion program stored in the program memory 111B.
- some or all of these processing functions may be realized in various other forms, including integrated circuits such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs).
- the data memory 112 is used as a tangible computer-readable storage medium, for example, by combining the above nonvolatile memory and a volatile memory such as RAM (random access memory).
- This data memory 112 is used to store various data acquired and created in the process of performing various processes. That is, in the data memory 112, an area for storing various data is appropriately secured in the process of performing various processes.
- FIG. 3 is a diagram showing an example of facial feature points.
- the asterisks in FIG. 3 are feature points recognized by the processor 111A, and the numbers attached to each feature point are unique feature point IDs (IDentifiers) for identifying each feature point.
- the number of feature point IDs and the portion of the face for each feature point ID are determined by the feature point recognition method employed. For example, the feature point with the feature point ID "18" is predetermined as the left edge of the left eyebrow.
- FIG. 4 is a diagram showing an example of a storage form of feature points.
- the data memory 112 stores the x-coordinates and y-coordinates of the feature points in the face image in association with the feature point IDs in the form of a table. Coordinate values are in pixels. Therefore, in the example of FIG. 3, the data memory 112 stores the xy coordinates of the feature points with the feature point IDs "1" to "68".
- the data memory 112 stores the converted facial expression designated by the user, which is obtained when the processor 111A operates as the above-described converted facial expression input unit 15.
- the data memory 112 can store the change amounts stored in the change amount storage unit 16 described above.
- FIG. 5 is a diagram showing an example of the storage form of the amount of change.
- the amount of change in the x-coordinate and the amount of change in the y-coordinate of the feature point are stored in association with the feature point ID for each transformed facial expression, regardless of the person who is the subject.
- the amount of change is stored in a table format.
- the delta value is in pixels.
- the amount of change is represented by the direction and amount of movement of the feature point. For example, a movement amount of "+1" represents a movement of 1 pixel in the positive direction.
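As a simple illustration of the storage forms shown in FIG. 4 and FIG. 5, the two tables could be held in memory as dictionaries keyed by feature point ID. Except for the (23, 45) / (+1, +2) example also used later in the text, the IDs and values below are made up for illustration only.

```python
# Illustrative in-memory representation of the tables in FIG. 4 and FIG. 5.

# FIG. 4: feature point ID -> (x, y) coordinates in pixels
feature_points = {
    1: (23, 45),
    2: (25, 60),
    # ... up to ID 68 for the 68-point scheme of FIG. 3
}

# FIG. 5: converted expression -> feature point ID -> (dx, dy) change in pixels
change_amounts = {
    "smile": {
        1: (+1, +2),
        2: (0, -1),
    },
}

# Applying a change amount moves a feature point, e.g. (23, 45) -> (24, 47).
px, py = feature_points[1]
dx, dy = change_amounts["smile"][1]
print((px + dx, py + dy))  # (24, 47)
```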
- the data memory 112 can store face images converted when the processor 111A operates as the facial expression converter 18 described above. In addition, the data memory 112 can store various intermediate data generated during the operation of the processor 111A.
- FIG. 6 is a flow chart showing an example of image conversion processing operation by the image conversion device 100 .
- the processor 111A of the image conversion device 100 reads and executes the image conversion program stored in the program memory 111B, thereby starting the operation of the image conversion device 100 shown in this flowchart. Execution of the image conversion program by the processor 111A is started when an instruction to perform image conversion is issued from the input device 200 via the input/output interface 113 or via the communication interface 114 .
- the processor 111A operates as the converted facial expression input unit 15 and waits for the user to input a specified converted facial expression, such as a smile, which is the facial expression to be converted (step S1). For example, the processor 111A determines whether or not the input signal from the input device 200 via the input/output interface 113 or the communication interface 114 includes a designated input of a converted facial expression. If there is an input specifying a converted facial expression, the processor 111A proceeds to the process of step S2.
- the processor 111A stores the designated converted facial expression in the data memory 112 (step S2).
- the processor 111A operates as the image acquisition unit 11 and acquires a face image (step S3).
- the processor 111A acquires an image of the subject's face captured by the camera of the input device 200 via the input/output interface 113 .
- the processor 111A obtains, through the communication interface 114, a face image captured by a network-connected web camera, or the face of an avatar generated by another computer.
- Processor 111A causes data memory 112 to store the acquired face image.
- the processor 111A operates as the feature point recognition unit 12 and recognizes feature points from the face image stored in the data memory 112 (step S4).
- the processor 111A uses, for example, the face_landmark_detection function of dlib (see, for example, http://dlib.net/face_landmark_detection.py.html) to recognize feature points in the face image.
- the processor 111A extracts a luminance gradient direction distribution called HOG (Histogram of Oriented Gradients) features from the input face image.
- a model that is learned based on data that associates HOG features with positions of facial feature points is generally provided. Therefore, the processor 111A inputs the extracted HOG features to this learning model to obtain the positions of the feature points of the face.
- the processor 111A causes the data memory 112 to store the positions of the acquired feature points.
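A minimal sketch of this kind of landmark recognition using dlib's pretrained 68-point shape predictor, in the spirit of the face_landmark_detection example referenced above; the model file path and the 1-based ID numbering are assumptions made for illustration.

```python
import cv2
import dlib

# Pretrained 68-point model distributed alongside dlib's examples; the file
# must be downloaded separately (path assumed here for illustration).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def recognize_feature_points(image_path):
    """Return {feature point ID: (x, y)} for the first detected face,
    numbering the points 1..68 as in FIG. 3."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)  # upsample once to help with small faces
    if not faces:
        return {}
    shape = predictor(gray, faces[0])
    return {i + 1: (shape.part(i).x, shape.part(i).y)
            for i in range(shape.num_parts)}
```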
- the processor 111A operates as the face angle calculation unit 13 and calculates the angle of the face in the face image using, for example, opencv (step S5). Specifically, the processor 111A measures in advance the three-dimensional positions (P_3d) of the feature points of the facial parts when the face is facing forward, and stores them in the data memory 112. The processor 111A obtains the two-dimensional positions (P'_2d) of the current feature points of the facial parts in the face image. The processor 111A calculates the two-dimensional positions (P_2d) of the feature points of the facial parts when the three-dimensional positions (P_3d) are rotated or translated.
- the processor 111A calculates the two-dimensional positions using, for example, the opencv ProjectPoints2 function (see, for example, http://opencv.jp/opencv-2svn/py/camera_calibration_and_3d_reconstruction.html#projectpoints2).
- the processor 111A calculates the sum of squares (D) of the distances between the two-dimensional position (P_2d) and the two-dimensional position (P'_2d). The processor 111A obtains the angle (and the amount of movement) that minimizes the sum of squares D by global optimization.
- the processor 111A uses, for example, the opencv solvePnP function (see, for example, http://opencv.jp/opencv-2svn/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp) to calculate the angle (and the amount of movement) that minimizes D as the angle (a) of the face from the front.
- the processor 111A also runs the face recognition tool while the face is moved, obtains the angle at which the feature points can no longer be recognized, calculates this in advance as the limit angle beyond which recognition is not possible, and stores it in the data memory 112.
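A hedged sketch of the face-angle estimation in step S5 using cv2.solvePnP; the simple pinhole camera matrix, the assumption of no lens distortion, and the use of the rotation-vector norm as the "angle from the front" are simplifications for illustration, not details taken from the publication.

```python
import numpy as np
import cv2

def face_angle_from_front(p_3d, p_2d, image_size):
    """Estimate the rotation that maps the pre-measured frontal 3D feature
    points P_3d onto the current 2D feature points P'_2d (step S5).
    p_3d: (N, 3) float array, p_2d: (N, 2) float array."""
    h, w = image_size
    focal = float(w)  # rough pinhole assumption: focal length ~ image width
    camera_matrix = np.array([[focal, 0, w / 2.0],
                              [0, focal, h / 2.0],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros(5)  # assume no lens distortion
    # solvePnP minimizes the reprojection error, i.e. the squared distances
    # between projectPoints(P_3d) and P'_2d, and returns a rotation vector
    # whose norm is the rotation angle in radians.
    ok, rvec, tvec = cv2.solvePnP(np.asarray(p_3d, dtype=np.float64),
                                  np.asarray(p_2d, dtype=np.float64),
                                  camera_matrix, dist_coeffs)
    angle_deg = float(np.degrees(np.linalg.norm(rvec)))
    return angle_deg, rvec, tvec
```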
- the processor 111A operates as the display ratio calculation unit 14 and calculates the display ratio of the face, which is the ratio of the area hidden by objects other than the face to the entire area of the face with respect to the face image. (Step S6). For example, if 10% of the entire face is hidden by objects other than the face, the display ratio of the face is 10%.
- an example of calculation by the display ratio calculation unit 14 will be described with reference to FIGS. 7 and 8.
- FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculation unit.
- FIG. 8 is a diagram showing an example of grid cells processed by the display ratio calculator.
- an example relating to an input image containing animals and various objects will be described, but the same applies when these are a human face and objects hiding the face, such as hands or other objects.
- for the object detection, the known YOLO (You Only Look Once) technique can be used.
- This technique is disclosed, for example, in the following documents. “Joseph Redmon, et al., “YOLOv3: An Incremental Improvement”, arXiv preprint, arXiv:1804.02767, 2018.”
- the processor 111A resizes the face image into a square and inputs it to a CNN (Convolutional Neural Network), a type of neural network widely used in the field of image processing, as shown in FIG. 7.
- the processor 111A extracts features from the face image through 24 convolution layers and 4 pooling layers (see symbol a in FIG. 7) in the CNN shown in FIG. 7. In the fully connected layers (Conn. Layer, see symbol b in FIG. 7), the bounding boxes of objects in the image and the object class probabilities can be estimated.
- the final output size of 7 × 7 of the convolution layers matches the number of grid cell divisions.
- the input image is divided into S × S grid cells as shown in FIG. 8 (see FIG. 8(a)).
- the processor 111A estimates the bounding box of B objects for each of the divided grid cells.
- the processor 111A outputs, for each bounding box, a total of five values: the coordinate values, width, and height (x, y, w, h) of the bounding box, and a confidence score indicating whether the bounding box contains an object (see (b) in FIG. 8).
- the coordinate values x, y are the center coordinates of the bounding box with reference to the grid cell boundary, the width w and height h are relative to the size of the entire image, and the confidence score represents the probability that the bounding box is an object rather than background. This probability is "1" for an object and "0" for background.
- the processor 111A estimates object class probabilities for each grid cell. For example, for C classification classes, the processor 111A estimates the conditional probability of each class given that the grid cell contains an object (see (c) in FIG. 8).
- the processor 111A integrates the class probabilities estimated here with the above bounding boxes to obtain a plurality of bounding boxes indicating what the object is (see (d) in FIG. 8).
- the processor 111A narrows down these bounding boxes, which include overlapping regions, keeping the bounding boxes with high confidence scores, using a method called NMS (Non-Maximum Suppression) (see (e) in FIG. 8).
- NMS suppresses regions whose IoU (Intersection over Union) value, i.e., degree of overlap, with a higher-scoring box exceeds a threshold, thereby obtaining the detection result of the object regions.
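As a rough illustration of the NMS step just described (not code taken from the publication), a minimal Python version might look like this:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep boxes in decreasing score order, dropping any box whose IoU with
    an already kept box exceeds the threshold (the suppression step above)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept

print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
          [0.9, 0.8, 0.7]))  # [0, 2]: the second box overlaps the first too much
```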
- the processor 111A can calculate the display ratio of the face by dividing the area of the region where the face region and an object region overlap by the area of the face region.
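Once a face region and occluding object regions are available, the hidden ratio used later in the correction could be computed from their bounding boxes, for example as in the sketch below; the box format and the simplification of ignoring overlaps between the occluding objects themselves are assumptions.

```python
def overlap_area(box_a, box_b):
    """Area of the intersection of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def hidden_ratio(face_box, object_boxes):
    """Ratio H of the face area covered by detected occluding objects (step S6).
    Overlaps between the occluders themselves are ignored, a simplification."""
    face_area = (face_box[2] - face_box[0]) * (face_box[3] - face_box[1])
    hidden = sum(overlap_area(face_box, b) for b in object_boxes)
    return min(1.0, hidden / face_area)

# Example: a hand box covering 10% of the face box gives H = 0.1,
# matching the 10% example in the text.
face = (100, 100, 300, 300)
hand = (100, 100, 200, 140)
print(hidden_ratio(face, [hand]))  # 0.1
```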
- the processor 111A operates as the change amount correction unit 17, reads the change amount corresponding to the expression to be converted from the change amount storage unit 16, and calculates a corrected version of the read change amount based on the feature points recognized in S4, the face angle calculated in S5, and the display ratio calculated in S6 (step S7).
- the processor 111A obtains the angle of the face, that is, the angle a of the face from the front, the limit face angle A up to which recognition is possible, and the ratio H of the area where the face is hidden to the area of the entire face.
- the amount of change in expression conversion is attenuated, that is, the amount of change is corrected according to the following equation (1), and the corrected result is stored in the data memory 112 .
- ΔP_new = ΔP × (1 − H) × (1 − a/A) … (1)
- in equation (1), ΔP_new on the left side is the amount of change after attenuation, that is, after correction of the facial expression transformation, and ΔP on the right side is the amount of change of the facial expression transformation before correction.
- the amount of change after correction is calculated.
- alternatively, the corrected amount of change may be calculated based on only one of (1) the ratio a/A of the face angle a from the front to the limit face angle A up to which recognition is possible, and (2) the ratio H of the area where the face is hidden to the area of the entire face.
- as a result, even if the feature points cannot be recognized because the angle of the face changes or a part of the face is hidden, the facial expression conversion does not stop at an unnatural timing, and it is possible to convert the expression of the face image naturally.
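A minimal sketch of the correction in step S7, assuming the attenuation factor (1 − H) × (1 − a/A) reconstructed above for equation (1); the exact formula in the original publication may differ, so this is illustrative only.

```python
def correct_change_amount(delta, hidden_ratio_h, face_angle_a, limit_angle_A):
    """Attenuate the per-feature-point change amounts.  The factor
    (1 - H) * (1 - a/A) follows the reconstruction of equation (1) above;
    the exact formula in the original publication may differ."""
    factor = (1.0 - hidden_ratio_h) * (1.0 - face_angle_a / limit_angle_A)
    factor = max(0.0, factor)  # fade to zero once the face is unrecognizable
    return {pid: (dx * factor, dy * factor) for pid, (dx, dy) in delta.items()}

# Half the face hidden and the face halfway to the limit angle quarters the change.
smile_delta = {1: (1, 2), 2: (0, -1)}
print(correct_change_amount(smile_delta, 0.5, 30.0, 60.0))
# {1: (0.25, 0.5), 2: (0.0, -0.25)}
```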
- the processor 111A operates as the facial expression transforming unit 18 to transform the facial image stored in the data memory 112 (step S8). That is, the processor 111A converts the face image based on the result of correcting the amount of change corresponding to the converted facial expression stored in the data memory 112 .
- processor 111A utilizes an implementation of MLS (see, eg, https://github.com/Jarvis73/Moving-Least-Squares), or the like.
- the processor 111A moves each feature point by the amount of change after correction of the amount of change corresponding to the converted facial expression stored in the data memory 112 .
- for example, the x-y coordinates of the control point with feature point ID "1" are (23, 45) before conversion (see FIG. 4). The x coordinate is incremented by 1 and the y coordinate by 2 (see FIG. 5), so the pixel of that feature point moves to (24, 47).
- in the above equation (2), x and y are the coordinates of nearby feature points, x' and y' are the coordinates obtained by adding the amount of change to the coordinates of the feature points, a, b, c, and d are parameters, and t_x and t_y are translation parameters.
- the processor 111A determines the parameters a, b, c, d, t_x, and t_y by global optimization so as to minimize the least-squares error between the coordinates x, y of the feature points mapped by the transformation and the coordinates x', y' obtained by adding the change amounts.
- the processor 111A uses x and y as the coordinates of the target point to be transformed, and uses the determined parameters to determine the coordinates after transformation.
- the processor 111A uses the parameters a, b, c, d, t x , and t y thus obtained to obtain the coordinates after the feature point is transformed by the above affine transformation.
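A hedged sketch of the parameter estimation described here: an unweighted least-squares fit of an affine transformation mapping the feature point coordinates onto the coordinates shifted by the change amounts. The exact parameter layout of equation (2) is not reproduced in the text, and the MLS method referenced above actually repeats such a fit per target point with distance-based weights, so this global fit only illustrates the idea.

```python
import numpy as np

def fit_affine(src_points, dst_points):
    """Unweighted least-squares fit of a, b, c, d, t_x, t_y so that
    x' ~= a*x + b*y + t_x and y' ~= c*x + d*y + t_y.  The parameter layout
    of equation (2) is assumed; src/dst are N x 2 arrays of feature point
    coordinates before and after adding the change amounts."""
    src = np.asarray(src_points, dtype=np.float64)
    dst = np.asarray(dst_points, dtype=np.float64)
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # rows: [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params  # 3 x 2 array: [[a, c], [b, d], [t_x, t_y]]

def apply_affine(params, points):
    """Transform target points with the fitted parameters (step S8)."""
    pts = np.asarray(points, dtype=np.float64)
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ params

# Fit on three feature points shifted by the change amount (+1, +2),
# then transform a nearby target point.
src = [(23, 45), (40, 50), (30, 70)]
dst = [(24, 47), (41, 52), (31, 72)]
params = fit_affine(src, dst)
print(apply_affine(params, [(30, 55)]))  # approximately [[31. 57.]]
```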
- the processor 111A stores the converted face image in the data memory 112 as a converted image.
- the processor 111A operates as the image output unit 19 and outputs the converted image stored in the data memory 112 (step S9).
- the processor 111A causes the output device 300 to display a facial image via the input/output interface 113.
- alternatively, the processor 111A transmits the converted image via the communication interface 114 to the network and displays it on a display device connected to the network, or on a display unit of another computer connected to the network.
- the processor 111A determines whether or not to end the operation as the image conversion device 100 shown in the flowchart of FIG. 6 (step S10). For example, the processor 111A checks whether or not the user has instructed to end image conversion from the input device 200 via the input/output interface 113 or via the communication interface 114 . Here, when ending the above operation (YES in step S10), the processor 111A ends the operation shown in the flowchart of FIG.
- if the operation is not to be ended (NO in step S10), the processor 111A operates as the converted facial expression input unit 15 and determines whether or not the user has entered an input specifying a change of the converted facial expression (step S11). If there is no input specifying a change of the converted facial expression (NO in step S11), the processor 111A proceeds to the process of step S3. If there is an input specifying a change of the converted facial expression (YES in step S11), the processor 111A proceeds to the process of step S2.
- the image conversion device 100 includes a face angle calculator 13 , a display ratio calculator 14 , a change amount corrector 17 , and a facial expression converter 18 .
- the facial expression conversion unit 18 obtains a converted image in which the facial expression of a person is converted by deforming the feature points by a change amount according to the converted facial expression to be converted. Therefore, even if feature points cannot be recognized due to a change in the angle of the face or a part of the face being hidden, the image conversion apparatus 100 according to one embodiment does not stop facial expression conversion at an unnatural timing, and the expression of the face image can be converted naturally.
- the method described in each embodiment can be stored, as a program (software means) that can be executed by a computer, in a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disc (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via communication media.
- the programs stored on the medium also include a setting program for configuring software means (including not only execution programs but also tables and data structures) to be executed by the computer.
- a computer that realizes this apparatus reads a program recorded on a recording medium, and optionally constructs software means by a setting program, and executes the above-described processing by controlling the operation by this software means.
- the term "recording medium” as used herein is not limited to those for distribution, and includes storage media such as magnetic disks, semiconductor memories, etc. provided in computers or devices connected via a network.
- the present invention is not limited to the above-described embodiments, and can be variously modified in the implementation stage without departing from the gist of the present invention. Further, each embodiment may be implemented in combination as appropriate, in which case the combined effect can be obtained. Furthermore, various inventions are included in the above embodiments, and various inventions can be extracted by combinations selected from a plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiments, if the problem can be solved and effects can be obtained, the configuration with the constituent elements deleted can be extracted as an invention.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
An image transformation device according to an embodiment comprises: a feature point recognition unit that recognizes feature points of facial parts recognized from an image including a human face; a change amount correction unit that, when transforming the expression on the recognized face in the image into a target transformed facial expression, corrects an amount of change representing the amount of deformation for each feature point of the facial parts that corresponds to the transformed facial expression, on the basis of at least one of the ratio of the angle of the face in the image as measured from the front to a limit angle of the face at which the face can be recognized from the front, and the ratio of the area of the face not hidden by any objects to the entire area of the face; and a facial expression transformation unit that deforms the feature points according to the corrected amount of change and thereby obtains a transformed image in which the expression on the human face has been transformed.
Description
The embodiments of the present invention relate to an image conversion device, method and program.
Non-Patent Document 1 discloses the possibility of manipulating emotional experience through real-time facial expression deformation (facial expression conversion) feedback. In Non-Patent Document 1, a subject's face is tracked in real time to perform natural facial expression transformation processing. In Non-Patent Document 1, the Rigid MLS (Moving Least Squares) method is used as an image transformation method to transform facial expressions in facial images. The Rigid MLS method is a method of distorting an image by recognizing feature points in an image and moving them. Such a technique is also disclosed in Non-Patent Document 2. The face image is an image obtained by photographing the face of the subject, an image obtained by extracting the face of a computer-generated avatar, or the like.
However, if the above feature points cannot be recognized because the angle of the subject's face changes or a part of the face is hidden, facial expression conversion stops at an unnatural timing, so only face images obtained by unnatural conversion can be obtained. In other words, it is not possible to seamlessly convert expressions appearing in facial images.
The present invention has been made in view of the above circumstances, and its object is to provide an image conversion apparatus, method, and program capable of seamlessly converting expressions appearing in facial images.
To solve the above problems, an image conversion apparatus according to one aspect of the present invention includes: a feature point recognition unit that recognizes feature points of facial parts from an image including a human face; a change amount correction unit that, when converting the recognized facial expression into a converted expression to be converted, corrects a change amount representing the deformation amount of each of the feature points of the facial parts according to the converted expression, based on at least one of the ratio of the angle of the face from the front to the limit angle at which the face can no longer be recognized from the front, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face; and an expression conversion unit that obtains a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change.
To solve the above problems, an image conversion method according to one aspect is a method performed by an image conversion device for converting an expression in an image of a person's face, and includes: recognizing, by a feature point recognition unit of the image conversion device, feature points of facial parts from an image containing a human face; correcting, by a change amount correction unit of the image conversion device, a change amount representing the deformation amount of each of the feature points of the facial parts according to a converted facial expression to be converted, based on at least one of the ratio of the angle of the face from the front to the limit angle at which the face can no longer be recognized from the front, and the ratio of the area excluding the area where the face is hidden by an object to the entire area of the face; and obtaining, by an expression conversion unit of the image conversion device, a converted image in which the expression of the person's face is converted by deforming the feature points according to the corrected amount of change.
According to the present invention, facial expressions appearing in facial images can be seamlessly converted.
[One embodiment]
An embodiment according to the present invention will be described below with reference to the drawings.
(Configuration example)
FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to one embodiment of the invention.
In the example shown in FIG. 1, an image conversion apparatus 100 according to an embodiment of the present invention includes an image acquisition unit 11, a feature point recognition unit 12, a face angle calculation unit 13, a display ratio calculation unit 14, a converted facial expression input unit 15, a change amount storage unit 16, a change amount correction unit 17, a facial expression conversion unit 18, and an image output unit 19.
The image acquisition unit 11 acquires the face image of the user from, for example, an image captured by a web camera or an avatar. The image acquisition unit 11 outputs the acquired face image to the feature point recognition unit 12, the display ratio calculation unit 14, and the facial expression conversion unit 18.
The feature point recognition unit 12 receives the face image acquired by the image acquisition unit 11, and recognizes feature points of facial parts recognized from the face image. A method of recognizing feature points in the feature point recognition unit 12 will be described later. The feature point recognition section 12 outputs the recognized feature points to the face angle calculation section 13 and the change amount correction section 17 .
The face angle calculation unit 13 receives the feature points recognized by the feature point recognition unit 12 as input, calculates the angle of the face in the face image, for example the angle between the current position of the center of the face and its position when the face is facing forward (sometimes referred to as the angle of the face from the front), and outputs the calculated angle data to the change amount correction unit 17.
The display ratio calculation unit 14 receives the face image acquired by the image acquisition unit 11 as input, calculates the ratio of the hidden portion of the face to the entire face in that face image, and outputs the calculated ratio data to the change amount correction unit 17.
The converted facial expression input unit 15 acquires a converted facial expression (sometimes referred to as a converted facial expression to be converted), which is the target expression such as a smile, specified and input by the user from a user interface such as a keyboard. The converted facial expression input unit 15 outputs the acquired converted facial expression to the change amount correction unit 17.
The amount of change representing the amount of deformation (the amount of movement of coordinate values) of each feature point is stored in advance in the change amount storage unit 16 for each facial expression to be converted. The amount of change is information indicating how much the coordinate values of each feature point should be moved according to the facial expression to be converted. The amount of change can be obtained in advance by, for example, the user applying facial expression transformation processing to an expressionless face in a specific face image and adjusting it so that a natural facial expression is obtained.
The change amount correction unit 17 receives the feature points recognized by the feature point recognition unit 12, the face angle calculated by the face angle calculation unit 13, and the display ratio calculated by the display ratio calculation unit 14. Further, the change amount correction unit 17 reads from the change amount storage unit 16 the change amount corresponding to the target expression indicated by the converted facial expression input from the converted facial expression input unit 15. Based on the input feature points, face angle, and display ratio, the change amount correction unit 17 corrects the read change amount using a formula described later, and outputs the corrected change amount data to the facial expression conversion unit 18.
The facial expression conversion unit 18 receives the amount of change corrected by the change amount correction unit 17 as input. The facial expression conversion unit 18 moves each feature point in the input face image by the corrected amount of change, that is, by the movement amount representing the deformation according to the converted facial expression to be converted, thereby obtaining a face image in which the expression of the face image is converted. The facial expression conversion unit 18 outputs the converted face image to the image output unit 19.
The image output unit 19 receives the face image after conversion from the facial expression conversion unit 18, and outputs the input face image. Here, output includes, for example, storing in a storage medium, displaying on a display, transmitting to another device via a communication network, and the like.
FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device 100.
The image conversion apparatus 100 is configured by a computer such as a personal computer, a smart phone, or a server computer. As shown in FIG. 2, the image conversion device 100 has a hardware processor (sometimes simply referred to as a processor) 111A such as a CPU (Central Processing Unit). By using a multi-core and multi-thread CPU, a plurality of information processes can be executed at the same time. The processor 111A may also include multiple CPUs. In the image conversion apparatus 100, a program memory 111B, a data memory 112, a communication interface 114, and an input/output interface 113 are connected to the processor 111A via a bus 115.
The communication interface 114 can include, for example, one or more wired or wireless communication modules. The communication interface 114 can communicate with other computers, web cameras, and the like connected via a cable, a LAN (Local Area Network), or a network (NW) such as the Internet.
An input device 200 and an output device 300 are connected to the input/output interface 113. The input device 200 includes, for example, a keyboard, a pointing device such as a mouse, and a sensor device such as a camera. The output device 300 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. The input device 200 and the output device 300 can also be a so-called tablet-type input/display device. This type of input/display device is configured, for example, by arranging an input detection sheet employing an electrostatic or pressure-sensitive method on the display screen of a display device using liquid crystal or organic EL (Electro Luminescence). The input/output interface 113 inputs operation information input by the input device 200 to the processor 111A, and causes the output device 300 to display display information generated by the processor 111A.
Note that the input device 200 and the output device 300 do not have to be connected to the input/output interface 113 . The input device 200 and the output device 300 are provided with a communication unit for connecting to the communication interface 114 directly or via a network, so that information can be exchanged with the processor 111A.
The input/output interface 113 may have a read/write function for a recording medium such as a semiconductor memory, e.g., a flash memory, or may have a function for connecting to a reader/writer that has a read/write function for such a recording medium. Furthermore, the input/output interface 113 may have a function for connecting to other devices.
The program memory 111B is a non-transitory tangible computer-readable storage medium that combines a non-volatile memory that can be written and read at any time and a non-volatile memory that can only be read. Non-volatile memories that can be written and read at any time are, for example, HDDs (Hard Disk Drives) and SSDs (Solid State Drives). A non-volatile memory that can only be read is, for example, a ROM (Read Only Memory). The program memory 111B stores a program necessary for the processor 111A to execute various control processes according to one embodiment, such as an image conversion program. That is, each of the processing functions of the image acquisition unit 11, the feature point recognition unit 12, the face angle calculation unit 13, the display ratio calculation unit 14, the converted facial expression input unit 15, the change amount correction unit 17, the facial expression conversion unit 18, and the image output unit 19 can be realized by causing the processor 111A to read and execute the image conversion program stored in the program memory 111B. Some or all of these processing functions may be realized in various other forms, including integrated circuits such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs).
The data memory 112 is used as a tangible computer-readable storage medium, for example, by combining the above nonvolatile memory and a volatile memory such as RAM (random access memory). This data memory 112 is used to store various data acquired and created in the process of performing various processes. That is, in the data memory 112, an area for storing various data is appropriately secured in the process of performing various processes.
FIG. 3 is a diagram showing an example of facial feature points. The asterisks in FIG. 3 are feature points recognized by the processor 111A, and the numbers attached to each feature point are unique feature point IDs (IDentifiers) for identifying each feature point. The number of feature point IDs and the portion of the face for each feature point ID are determined by the feature point recognition method employed. For example, the feature point with the feature point ID "18" is predetermined as the left edge of the left eyebrow.
FIG. 4 is a diagram showing an example of how the feature points are stored. As shown in FIG. 4, the data memory 112 stores, in table form, the x and y coordinates of each feature point in the face image in association with its feature point ID. The coordinate values are in pixels. In the example of FIG. 3, the data memory 112 therefore stores the xy coordinates of the feature points with feature point IDs "1" to "68".
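As a concrete illustration of the storage form in FIG. 4, the table might be held in memory as a simple mapping from feature point ID to pixel coordinates. This is only a sketch; the variable names are illustrative, and only the (23, 45) coordinates for feature point ID "1" come from the example used later in this description.

```python
# Sketch of the FIG. 4 storage form: feature point ID -> (x, y) in pixels.
# Only ID 1's coordinates follow the example in the text; the rest are placeholders.
feature_points = {
    1: (23, 45),
    2: (25, 60),
    # ... entries up to ID 68
    68: (110, 42),
}

x, y = feature_points[18]  # e.g. the feature point at the left end of the left eyebrow
```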
The data memory 112 stores the converted facial expression designated by the user, which is acquired when the processor 111A operates as the converted expression input unit 15 described above. The data memory 112 can also store the change amounts held in the change amount storage unit 16 described above.
FIG. 5 is a diagram showing an example of how the change amounts are stored. As shown in FIG. 5, the data memory 112 stores, in table form and for each converted expression, the change amount of the x coordinate and the change amount of the y coordinate of each feature point in association with its feature point ID, as change amounts that do not depend on the person being imaged. The change amount values are in pixels. A change amount is represented by the direction and distance that a feature point moves; for example, a movement amount of "+1" represents a movement of one pixel in the positive direction.
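Likewise, the per-expression change amounts of FIG. 5 could be kept as a nested mapping keyed first by converted expression and then by feature point ID. This is a sketch under the same caveat; only the (+1, +2) entry for feature point ID "1" of a smile reflects the example given later in this description.

```python
# Sketch of the FIG. 5 storage form: converted expression -> feature point ID -> (dx, dy) in pixels.
# Only the smile entry for ID 1 follows the example in the text; the rest are placeholders.
change_amounts = {
    "smile": {
        1: (+1, +2),
        2: (0, -1),
        # ... entries up to ID 68
    },
    "sad": {
        1: (-1, 0),
        # ... entries up to ID 68
    },
}
```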
The data memory 112 can store the face images converted when the processor 111A operates as the expression conversion unit 18 described above. The data memory 112 can also store various intermediate data generated while the processor 111A operates.
(Operation)
Next, the operation of the image conversion device 100 will be described.
FIG. 6 is a flowchart showing an example of the image conversion processing performed by the image conversion device 100. The processor 111A of the image conversion device 100 reads and executes the image conversion program stored in the program memory 111B, thereby starting the operation of the image conversion device 100 shown in this flowchart. Execution of the image conversion program by the processor 111A starts when an instruction to perform image conversion is received from the input device 200 via the input/output interface 113 or via the communication interface 114.
The processor 111A operates as the converted expression input unit 15 and waits for the user to specify a converted expression, that is, the target expression to convert to, such as a smile (step S1). For example, the processor 111A determines whether an input signal from the input device 200, received via the input/output interface 113 or the communication interface 114, contains a designation of a converted expression. When a converted expression has been designated, the processor 111A proceeds to step S2.
The processor 111A stores the designated converted expression in the data memory 112 (step S2).
The processor 111A operates as the image acquisition unit 11 and acquires a face image (step S3). For example, the processor 111A acquires an image of the subject's face captured by the camera of the input device 200 via the input/output interface 113. Alternatively, the processor 111A acquires, via the communication interface 114, a face image captured by a web camera connected to the network or the face of an avatar generated by another computer. The processor 111A stores the acquired face image in the data memory 112.
The processor 111A operates as the feature point recognition unit 12 and recognizes feature points in the face image stored in the data memory 112 (step S4). The processor 111A recognizes the feature points using, for example, dlib's face_landmark_detection function (see, for example, http://dlib.net/face_landmark_detection.py.html). Specifically, the processor 111A extracts from the input face image a distribution of luminance gradient directions called HOG (Histogram of Oriented Gradients) features. Models trained on data that associates HOG features with the positions of facial feature points are publicly available, so the processor 111A inputs the extracted HOG features to such a trained model and obtains the positions of the facial feature points. The processor 111A stores the obtained feature point positions in the data memory 112.
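The following sketch shows how such landmark recognition might be invoked with dlib's Python bindings, as referenced above. The image path and the publicly distributed 68-point predictor file name are assumptions for illustration.

```python
# Sketch of 68-point landmark recognition with dlib (HOG-based detector + trained predictor).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

image = cv2.imread("face.png")  # assumed input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

for face_rect in detector(gray, 1):
    shape = predictor(gray, face_rect)
    # Store as in FIG. 4: feature point ID (1-68) -> pixel coordinates.
    feature_points = {i + 1: (shape.part(i).x, shape.part(i).y) for i in range(68)}
```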
The processor 111A operates as the face angle calculation unit 13 and calculates the angle of the face in the face image, using, for example, OpenCV (step S5).
Specifically, the processor 111A measures in advance the three-dimensional positions (P_3d) of the feature points of the facial parts when the face is facing forward, and holds them in the data memory 112.
The processor 111A obtains the current two-dimensional positions (P'_2d) of the feature points of the facial parts in the face image.
The processor 111A calculates the two-dimensional positions (P_2d) of the feature points of the facial parts obtained when the three-dimensional positions (P_3d) are rotated or translated.
The processor 111A calculates these two-dimensional positions using, for example, OpenCV's ProjectPoints2 function (see, for example, http://opencv.jp/opencv-2svn/py/camera_calibration_and_3d_reconstruction.html#projectpoints2).
The processor 111A calculates the sum of squares (D) of the distances between the two-dimensional positions (P_2d) and the two-dimensional positions (P'_2d). The processor 111A then obtains, by global optimization, the angle (and translation) that minimizes this sum of squares D.
The processor 111A uses, for example, OpenCV's solvePnP function (see, for example, http://opencv.jp/opencv-2svn/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp) to obtain this minimizing angle (and translation), and takes it as the angle (a) of the face from the front.
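One possible shape of this face-angle estimation, using OpenCV's solvePnP as referenced above, is sketched below. The pinhole camera matrix, the choice of yaw as the angle from the front, and the function names are assumptions for illustration, not details taken from the publication.

```python
# Sketch: fit the pre-measured frontal 3-D landmarks (P_3d) to the observed 2-D
# landmarks (P'_2d) with solvePnP, which minimizes the reprojection error, and
# read the face angle off the recovered rotation.
import cv2
import numpy as np

def face_angle_from_front(P_3d, P2d_observed, image_size):
    w, h = image_size
    camera_matrix = np.array([[w, 0, w / 2],
                              [0, w, h / 2],
                              [0, 0, 1]], dtype=np.float64)  # rough pinhole camera, focal ~ width
    dist_coeffs = np.zeros(4)
    ok, rvec, tvec = cv2.solvePnP(np.asarray(P_3d, dtype=np.float64),
                                  np.asarray(P2d_observed, dtype=np.float64),
                                  camera_matrix, dist_coeffs)
    rot_mat, _ = cv2.Rodrigues(rvec)
    angles, *_ = cv2.RQDecomp3x3(rot_mat)  # (pitch, yaw, roll) in degrees
    return angles[1]  # yaw, taken here as the angle a of the face from the front
```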
The processor 111A also calculates in advance the limit face angle (A) beyond which recognition fails, as an angle that does not depend on the person being imaged, by running the face recognition tool while the face is moved and recording the feature point positions at the moment recognition fails, and holds this angle in the data memory 112.
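A rough sketch of this offline calibration of the limit angle A is given below; `recognize_landmarks` is a hypothetical stand-in for the dlib-based recognition step, and the sweep over pre-captured frames is an assumption about how the measurement might be driven.

```python
# Sketch: sweep frames captured while the face is turned away from the front and
# record the largest angle at which landmark recognition still succeeds.
def find_limit_angle(frames_with_angles, recognize_landmarks):
    limit = 0.0
    for angle, frame in frames_with_angles:         # e.g. [(0.0, img0), (5.0, img1), ...]
        if recognize_landmarks(frame) is not None:  # recognition still works at this angle
            limit = max(limit, angle)
    return limit  # the person-independent limit angle A
```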
Next, the processor 111A operates as the display ratio calculation unit 14 and calculates the display ratio of the face, which is the proportion of the entire face region that is hidden by objects other than the face (step S6). For example, if 10% of the entire face is hidden by objects other than the face, the display ratio of the face is 10%.
Here, an example of the calculation performed by the display ratio calculation unit 14 will be described with reference to FIGS. 7 and 8.
FIG. 7 is a diagram showing an example of the neural network used by the display ratio calculation unit. FIG. 8 is a diagram showing an example of the grid cells processed by the display ratio calculation unit. The example described here concerns an input image containing animals and various objects, but the same approach can be applied when the image contains a human face and an object hiding the face, such as a hand or another object.
In the example shown in FIGS. 7 and 8, the known YOLO (You Only Look Once) method, a general object detection technique based on deep learning, can be used. This technique is disclosed, for example, in the following document:
"Joseph Redmon, et al., "YOLOv3: An Incremental Improvement", arXiv preprint, arXiv:1804.02767, 2018."
In this method, the processor 111A resizes the face image to a square and inputs it to a CNN (Convolutional Neural Network), a type of neural network widely used in image processing, as shown in FIG. 7. The processor 111A extracts features from the face image through the 24 convolutional layers and 4 pooling layers of the CNN shown in FIG. 7 (see symbol a in FIG. 7), and the 2 fully connected layers (see symbol b in FIG. 7) estimate the bounding boxes of the objects in the image and the probabilities of the object classes. The final output size of the convolutional layers, 7×7, matches the number of grid cell divisions.
The input image is divided into S×S grid cells as shown in FIG. 8 (see FIG. 8(a)).
For each of the divided grid cells, the processor 111A estimates B object bounding boxes. For each bounding box, the processor 111A outputs a total of five values: the coordinates, width, and height of the bounding box (x, y, w, h) and a confidence score indicating that the bounding box contains an object (see FIG. 8(b)).
The coordinate values x and y are the center coordinates of the bounding box relative to the grid cell boundary, the width w and height h are values relative to the size of the entire image, and the confidence score represents the probability that the bounding box contains an object rather than background. This probability is "1" for an object and "0" for background.
As an index for measuring the accuracy of object region estimation, there is IoU (Intersection over Union), which expresses how well the estimated bounding box matches the ground-truth bounding box. In YOLO, the confidence score of a bounding box represents the IoU.
The processor 111A also estimates the object class probabilities for each grid cell. For example, with C classification classes, the processor 111A estimates, for the case where a grid cell contains an object, the probability that it belongs to each class, that is, the conditional probability (see FIG. 8(c)).
By integrating the class probabilities estimated here with the bounding boxes described above, the processor 111A obtains a plurality of bounding boxes indicating what each object is (see FIG. 8(d)).
The processor 111A then selects among these bounding boxes, including overlapping regions, using a technique called NMS (Non-Maximum Suppression), taking the bounding boxes with high confidence scores as the reference (see FIG. 8(e)). NMS suppresses, using a threshold, regions whose IoU value is large (that is, regions with a high degree of overlap). This yields the object region detection result.
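The IoU computation and the NMS selection described above can be sketched as follows; the box format (x1, y1, x2, y2) and the 0.5 threshold are illustrative choices, not values from the publication.

```python
# Sketch of IoU and non-maximum suppression over boxes in (x1, y1, x2, y2) form.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Visit boxes from highest to lowest confidence; keep a box only if it does not
    # overlap an already-kept box beyond the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept  # indices of the surviving detections
```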
When there is a face region and an object region overlapping it, the processor 111A can calculate the above display ratio of the face by dividing the area of the overlapping region by the area of the face region.
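The division described here could look like the following sketch, again assuming boxes in (x1, y1, x2, y2) form taken from the detection result above.

```python
# Sketch: fraction of the face region covered by an occluding object region.
def hidden_face_ratio(face_box, object_box):
    ix1 = max(face_box[0], object_box[0])
    iy1 = max(face_box[1], object_box[1])
    ix2 = min(face_box[2], object_box[2])
    iy2 = min(face_box[3], object_box[3])
    overlap = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    face_area = (face_box[2] - face_box[0]) * (face_box[3] - face_box[1])
    return overlap / float(face_area)  # e.g. 0.1 when 10% of the face is hidden
```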
Next, the processor 111A operates as the change amount correction unit 17, reads from the change amount storage unit 16 the change amounts corresponding to the target expression, and, based on the feature points recognized in S4, the face angle calculated in S5, and the display ratio calculated in S6, calculates corrected change amounts from the change amounts read for the target expression (step S7).
Specifically, the processor 111A obtains the face angle, that is, the angle a of the face from the front and the limit face angle A beyond which recognition fails, as well as the ratio H of the hidden face area to the entire face area, and accordingly attenuates, that is, corrects, the change amounts for the expression conversion according to the following equation (1), holding the corrected result in the data memory 112.
ΔP_new = ΔP · (1 − H) · a / A … Equation (1)
ΔP_new on the left side of equation (1) is the attenuated, that is, corrected, change amount for the expression conversion, and ΔP on the right side is the change amount before correction.
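As a sketch, equation (1) applied to the whole table of change amounts could be written as below; the function and argument names are illustrative.

```python
# Sketch of equation (1): attenuate each stored change amount by (1 - H) * a / A.
def correct_change_amounts(delta_p, H, a, A):
    # delta_p: {feature_point_id: (dx, dy)} read from the change amount storage unit 16
    # H: ratio of the hidden face area, a: face angle from the front, A: recognition limit angle
    scale = (1.0 - H) * (a / A)
    return {fid: (dx * scale, dy * scale) for fid, (dx, dy) in delta_p.items()}
```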
That is, in the above example, the corrected change amount is calculated based on both (1) the ratio a/A of the face angle a from the front to the limit face angle A beyond which recognition fails, and (2) the ratio H of the hidden face area to the entire face area.
The calculation is not limited to this example; for instance, within an allowable range of accuracy, the corrected change amount may be calculated based on only one of (1) the ratio a/A of the face angle a from the front to the limit face angle A, and (2) the ratio H of the hidden face area to the entire face area.
By correcting the change amounts in this way, even if the feature points cannot be recognized because the angle of the face has changed or part of the face is hidden, the expression conversion no longer stops at an unnatural moment, and the expression of the face image can be converted naturally.
The processor 111A operates as the expression conversion unit 18 and converts the expression of the face image stored in the data memory 112 (step S8). That is, the processor 111A converts the face image based on the corrected change amounts, corresponding to the converted expression, stored in the data memory 112. For example, the processor 111A uses an implementation of MLS (Moving Least Squares; see, for example, https://github.com/Jarvis73/Moving-Least-Squares).
Specifically, the processor 111A moves each feature point by the corrected change amount, corresponding to the converted expression, stored in the data memory 112. For example, when converting the expression to a smile, the control point with feature point ID "1" has xy coordinates (23, 45) before conversion (see FIG. 4), so the processor 111A adds "+1" to the x coordinate and "+2" to the y coordinate (see FIG. 5), performing a conversion that moves the pixel of that feature point to (24, 47).
For the feature points, the processor 111A then applies the affine transformation shown in equation (2) below (which includes the Helmert transformation, i.e., similarity transformation, and rigid deformation).
In equation (2), x and y are the coordinates of a nearby feature point, x' and y' are the coordinates obtained by adding the change amount to that feature point's coordinates, a, b, c, and d are parameters, and t_x and t_y are translation parameters. The processor 111A computes the least-squares error between the feature point coordinates x, y and the coordinates x', y' obtained by adding the change amounts, and obtains the parameters a, b, c, d, t_x, t_y that minimize it by global optimization. Then, taking x, y as the coordinates of the target point to be transformed, the processor 111A uses the obtained parameters a, b, c, d, t_x, t_y to obtain the coordinates after applying the above affine transformation.
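Since equation (2) itself is shown only as a figure in the publication, the sketch below assumes the standard affine form x' = a·x + b·y + t_x, y' = c·x + d·y + t_y and uses an ordinary least-squares fit (numpy's lstsq) in place of the global optimization mentioned in the text.

```python
# Sketch: fit the affine parameters from nearby feature points (x, y) -> (x', y')
# and apply them to a target pixel.
import numpy as np

def fit_affine(src_pts, dst_pts):
    src = np.asarray(src_pts, dtype=np.float64)   # (N, 2): feature point coordinates
    dst = np.asarray(dst_pts, dtype=np.float64)   # (N, 2): coordinates plus change amounts
    A = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2): [[a, c], [b, d], [tx, ty]]
    return params

def apply_affine(params, point):
    x, y = point
    return tuple(np.array([x, y, 1.0]) @ params)  # transformed (x', y')
```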
The processor 111A stores the face image converted in this way in the data memory 112 as the converted image.
The processor 111A operates as the image output unit 19 and outputs the converted image stored in the data memory 112 (step S9). For example, the processor 111A displays the face image on the output device 300 via the input/output interface 113. Alternatively, the processor 111A transmits it over the network via the communication interface 114 and displays it on a display device connected to the network or on the display unit of another computer connected to the network.
The processor 111A determines whether to end the operation as the image conversion device 100 shown in the flowchart of FIG. 6 (step S10). For example, the processor 111A checks whether the user has instructed it, from the input device 200 via the input/output interface 113 or via the communication interface 114, to end the image conversion. When the operation is to be ended (YES in step S10), the processor 111A ends the operation shown in the flowchart of FIG. 6.
On the other hand, when the operation is not yet to be ended (NO in step S10), the processor 111A operates as the converted expression input unit 15 and determines whether the user has entered an instruction to change the converted expression (step S11). If there is no instruction to change the converted expression (NO in step S11), the processor 111A proceeds to step S3. If there is an instruction to change the converted expression (YES in step S11), the processor 111A proceeds to step S2.
The image conversion device 100 according to the embodiment described above includes the face angle calculation unit 13, the display ratio calculation unit 14, the change amount correction unit 17, and the expression conversion unit 18. The expression conversion unit 18 obtains a converted image in which the expression of the person's face has been converted, by transforming the feature points by the deformation amounts corresponding to the converted expression to be converted to.
Therefore, even if the feature points cannot be recognized because the angle of the face has changed or part of the face is hidden, the image conversion device 100 according to the embodiment no longer stops the expression conversion at an unnatural moment and can convert the expression of the face image naturally.
[Other Embodiments]
The present invention is not limited to the embodiment described above.
For example, the flow of each process described above is not limited to the described procedure; the order of some steps may be changed, and some steps may be performed in parallel.
The flow of each process described above also assumed that the expression of a face image acquired in real time is converted in real time, but the same approach can be applied, instead of real-time processing, to converting the expression of a stored face image.
The methods described in the embodiments can be stored, as programs (software means) executable by a computer, on recording media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via communication media. The programs stored on the media include a setting program that configures, within the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes this device reads the programs recorded on the recording medium, constructs the software means using the setting program where applicable, and executes the processing described above under the control of that software means. The recording media referred to in this specification are not limited to those for distribution and include storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.
The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are removed from all the constituent elements shown in the embodiment, a configuration from which those constituent elements have been removed can be extracted as an invention as long as the problem can be solved and the effects can be obtained.
100…Image conversion device
11…Image acquisition unit
12…Feature point recognition unit
13…Face angle calculation unit
14…Display ratio calculation unit
15…Converted expression input unit
16…Change amount storage unit
17…Change amount correction unit
18…Expression conversion unit
19…Image output unit
111A…Processor
111B…Program memory
112…Data memory
113…Input/output interface
114…Communication interface
115…Bus
200…Input device
300…Output device
Claims (6)
- An image conversion device comprising:
a feature point recognition unit that recognizes feature points of facial parts recognized from an image containing a human face;
a change amount correction unit that corrects change amounts, each representing a deformation amount of one of the feature points of the facial parts corresponding to a converted expression, used when converting the recognized facial expression into the converted expression to be converted to, based on at least one of (i) the ratio of the angle of the face from the front to the limit angle beyond which the face in the image can no longer be recognized from the front, and (ii) the proportion, relative to the entire face region, of the region excluding the region where the face is hidden by an object; and
an expression conversion unit that obtains a converted image in which the expression of the human face has been converted, by deforming the feature points by the corrected change amounts.
- The image conversion device according to claim 1, wherein the change amount correction unit corrects the change amounts by multiplying a predetermined change amount for each of the feature points of the facial parts by at least one of (i) the ratio of the angle of the face from the front to the limit angle beyond which the face in the image can no longer be recognized from the front, and (ii) the proportion, relative to the entire area of the face, of the area excluding the area where the face is hidden by an object.
- The image conversion device according to claim 1, wherein the angle of the face from the front is calculated by calculating the two-dimensional positions of the feature points of the facial parts obtained when the three-dimensional positions of the feature points of the facial parts with the face facing forward are rotated or translated, and finding the angle that minimizes the sum of squared distances between the calculated two-dimensional positions and the current two-dimensional positions of the feature points of the facial parts.
- The image conversion device according to any one of claims 1 to 3, further comprising:
a storage device in which, for each converted expression to be converted to, change amounts each representing a deformation amount of one of the feature points are stored in advance; and
a converted expression input unit that inputs the converted expression to be converted to,
wherein the change amount correction unit reads from the storage device the change amounts corresponding to the input converted expression and corrects the read change amounts.
- An image conversion method performed by an image conversion device that converts an expression in an image of a human face, the method comprising:
recognizing, by a feature point recognition unit of the image conversion device, feature points of facial parts recognized from an image containing the human face;
correcting, by a change amount correction unit of the image conversion device, change amounts, each representing a deformation amount of one of the feature points of the facial parts corresponding to a converted expression, used when converting the recognized facial expression into the converted expression to be converted to, based on at least one of (i) the ratio of the angle of the face from the front to the limit angle beyond which the face in the image can no longer be recognized from the front, and (ii) the proportion, relative to the entire face region, of the region excluding the region where the face is hidden by an object; and
obtaining, by an expression conversion unit of the image conversion device, a converted image in which the expression of the human face has been converted, by deforming the feature points by the corrected change amounts.
- An image conversion processing program that causes a processor to function as each of the units of the image conversion device according to any one of claims 1 to 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2024502365A JPWO2023162132A1 (en) | 2022-02-25 | 2022-02-25 | |
PCT/JP2022/007870 WO2023162132A1 (en) | 2022-02-25 | 2022-02-25 | Image transformation device, method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/007870 WO2023162132A1 (en) | 2022-02-25 | 2022-02-25 | Image transformation device, method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023162132A1 true WO2023162132A1 (en) | 2023-08-31 |
Family
ID=87765082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/007870 WO2023162132A1 (en) | 2022-02-25 | 2022-02-25 | Image transformation device, method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2023162132A1 (en) |
WO (1) | WO2023162132A1 (en) |
- 2022-02-25 WO PCT/JP2022/007870 patent/WO2023162132A1/en active Application Filing
- 2022-02-25 JP JP2024502365A patent/JPWO2023162132A1/ja active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005215763A (en) * | 2004-01-27 | 2005-08-11 | Konica Minolta Photo Imaging Inc | Method, device and program for image processing |
JP2011060038A (en) * | 2009-09-10 | 2011-03-24 | Seiko Epson Corp | Image processing apparatus |
Also Published As
Publication number | Publication date |
---|---|
JPWO2023162132A1 (en) | 2023-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10566026B1 (en) | Method for real-time video processing involving changing features of an object in the video | |
US11915514B2 (en) | Method and apparatus for detecting facial key points, computer device, and storage medium | |
JP6798183B2 (en) | Image analyzer, image analysis method and program | |
CN112967236B (en) | Image registration method, device, computer equipment and storage medium | |
WO2015139574A1 (en) | Static object reconstruction method and system | |
CN110969245B (en) | Target detection model training method and device for medical image | |
US9443325B2 (en) | Image processing apparatus, image processing method, and computer program | |
US10977767B2 (en) | Propagation of spot healing edits from one image to multiple images | |
JP5227629B2 (en) | Object detection method, object detection apparatus, and object detection program | |
JP7149124B2 (en) | Image object extraction device and program | |
KR102344373B1 (en) | Apparatus and method for generating feature maps | |
JP2007087345A (en) | Information processing device, control method therefor, computer program, and memory medium | |
JP7064257B2 (en) | Image depth determination method and creature recognition method, circuit, device, storage medium | |
JP2020109626A (en) | Apparatus and method for identifying articulatable part of physical object using multiple 3d point clouds | |
CN112381061A (en) | Facial expression recognition method and system | |
JP2018055199A (en) | Image processing program, image processing device, and image processing method | |
Hu et al. | Towards effective learning for face super-resolution with shape and pose perturbations | |
CN113435367A (en) | Social distance evaluation method and device and storage medium | |
JP2007282906A (en) | Method, apparatus, and program of medical image processing | |
WO2023162132A1 (en) | Image transformation device, method, and program | |
WO2022181253A1 (en) | Joint point detection device, teaching model generation device, joint point detection method, teaching model generation method, and computer-readable recording medium | |
JP2017122993A (en) | Image processor, image processing method and program | |
KR102593247B1 (en) | Geometric calibration method and apparatus of computer tomography | |
WO2023162131A1 (en) | Image converting device, image converting method, and image converting program | |
CN116758205B (en) | Data processing method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22928659; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 2024502365; Country of ref document: JP |