WO2020090128A1 - Image processing device, method, and computer program - Google Patents

Image processing device, method, and computer program

Info

Publication number
WO2020090128A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
moving image
data
still image
face
Prior art date
Application number
PCT/JP2019/004530
Other languages
French (fr)
Japanese (ja)
Inventor
健志 加畑
Original Assignee
有限会社アドリブ
西村 昇
山本 慎也
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 有限会社アドリブ, 西村 昇, 山本 慎也
Priority to JP2019507886A (JP6516316B1)
Publication of WO2020090128A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working

Definitions

  • the present invention relates to an image processing technique that can be applied to, for example, a video conference.
  • Video conferencing may be realized using an expensive dedicated device (dedicated system), or it may be realized using a simple general-purpose device (system) together with software for sending and receiving video, such as Skype (trademark) provided by Microsoft (trademark) Corporation. Whether it is realized by a dedicated device or by a general-purpose device, the general principle of the video conference remains the same. For example, in a one-to-one video conference, both participants prepare a computer connected to the network.
  • a display and a camera are connected to each of these computers.
  • the camera is a digital camera capable of capturing moving images, and captures the participants of the video conference.
  • moving image data of the moving image in which the face of one participant is captured by one camera is sent to the other computer via the one participant's computer and the network.
  • a moving image in which the face of one participant is reflected is displayed on the other display connected to the other computer.
  • the other participant can thereby see the face of one participant.
  • voice and text can also be exchanged between the two computers (or both participants), and at least one of them is usually required; however, since the exchange of voice and text is unrelated to this application, its description is basically omitted hereafter.
  • to the other participant receiving the moving image data generated by one camera, the line of sight or the face of one participant shown in the moving image displayed on the other display in front of them appears not to face the direction of the other participant but to be looking downward, for example.
  • the phenomenon described above, in which one participant's line of sight or face appears averted in the moving image displayed on the other display in front of the other participant, occurs not only when one camera is above the widthwise center of one display but wherever one camera is placed around one display. However, the direction in which the line of sight or face of one participant appears to be averted differs depending on the position at which one camera is arranged.
  • conventionally, to correct such a moving image, stereo imaging is performed using two cameras, or, if imaging is performed by one camera, the large number of still images forming the moving image data generated by that camera must include depth data.
  • however, stereo cameras and cameras that generate depth data are not common as cameras, and a technology that compels the user to prepare such uncommon hardware is extremely difficult to spread.
  • on the other hand, modern laptop personal computers, smartphones, tablets, and other computers have built-in cameras, and webcams and other cameras used in combination with desktop personal computers have also become widespread.
  • such general cameras cannot create moving image data including depth data, and a technology is not suitable for practical use or widespread adoption unless it can be applied to such widespread cameras.
  • it is an object of the present invention to provide a technology that can be used mainly in combination with a general camera in a video conference system, that reduces the sense of discomfort felt about the direction of a face or a line of sight in a moving image displayed on a display in front of a viewer, that is inexpensive, and that does not easily cause delay.
  • in other words, the aim is that the line of sight or the face of one participant shown in the moving image displayed on the other display in front of the other participant receiving the moving image data generated by one camera should face the direction of the other participant.
  • one display is usually not entirely transparent, so one camera would be placed somewhere around one display.
  • it is possible, at least in theory, to correct the moving image data created by one camera so that the moving image based on that data appears as if it had been captured by a virtual camera existing at a virtual position behind the display (including inside the display; the same applies below). Since the face of one participant included in the video based on moving image data corrected in such a manner basically faces the front, the feeling of strangeness given to the other participant viewing the other display can be suppressed. The present invention is based on such knowledge.
  • the present invention is an image processing device comprising: a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image; a converted moving image data generation unit that converts each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of a converted still image, i.e. the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face facing front, and that generates converted moving image data, which is moving image data composed of a large number of continuous converted still image data; and a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit.
  • the converted moving image data generation unit in this image processing device includes: a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of a large number of faces; a three-dimensional model rotation unit that performs a process of rotating each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. A sketch of this per-frame pipeline is shown below.
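Purely as an illustration of the data flow just described, the following sketch strings the three units together. None of the function signatures come from the patent; the estimator and renderer are injected stand-ins for the machine-learned conversion algorithm and the two-dimensional image generation unit.

```python
from typing import Callable, Iterable, Iterator
import numpy as np

# Hypothetical signatures; the patent names the units but prescribes no API.
EstimateMesh = Callable[[np.ndarray], np.ndarray]            # frame -> Nx3 vertices
RenderMesh = Callable[[np.ndarray, np.ndarray], np.ndarray]  # vertices, frame -> frame

def convert_moving_image(
    frames: Iterable[np.ndarray],   # continuous 2D still images, no depth data
    estimate_mesh: EstimateMesh,    # three-dimensional model generation unit
    render_mesh: RenderMesh,        # two-dimensional image generation unit
    rotation: np.ndarray,           # constant 3x3 rotation (actual -> virtual camera)
) -> Iterator[np.ndarray]:
    """Per-frame pipeline: 3D model generation -> rotation by a constant
    angle -> 2D image generation (one converted still image per frame)."""
    for frame in frames:
        vertices = estimate_mesh(frame)    # 3D model of the face portion
        rotated = vertices @ rotation.T    # three-dimensional model rotation unit
        yield render_mesh(rotated, frame)  # converted still image data
```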
  • the number of cameras in the present invention is one.
  • the camera according to the present invention is a general camera, and the still image data forming the moving image data does not include depth data.
  • the camera may be integrated with the image processing apparatus or may be a separate body.
  • when the image processing device is configured by a computer as described in the background art (for example, a desktop computer without a camera), the camera is separate from the image processing device.
  • the camera in that case may be, for example, a publicly known or well-known webcam itself.
  • the camera in this case, which is separate from the computer serving as the image processing apparatus, is connected to that computer by wire or wirelessly.
  • many known or well-known laptop personal computers and computers such as smartphones and tablets have an integrated camera.
  • in such a case, the camera is integrated with the image processing apparatus, and the part of the computer excluding the camera is the image processing device according to the present invention.
  • likewise, if the present invention takes the form of a web camera, the part of the web camera excluding the camera itself is the image processing device according to the present invention.
  • the camera exists at the actual position, which is a predetermined position.
  • for example, if a display is connected to a computer serving as the image processing apparatus, the actual position is generally a predetermined location around the display.
  • in the case of a computer with an integrated display, the camera is generally attached at a predetermined position above that display, and that position is the actual position of the camera in that case. If the image processing apparatus of the present invention has a webcam-like appearance, the position where it is attached is the actual position of the camera. In any case, the camera at the actual position captures the target face, which is the face of one imaged person. The camera can capture a moving image and generates moving image data of that moving image.
  • the moving image data generated by the camera is general data, for example, MJPEG data.
  • the moving image data in the invention of the present application is data of a moving image composed of a large number of continuous still image data which is data about a two-dimensional still image, and this is very general moving image data.
  • the image processing apparatus includes a moving image data receiving unit that receives moving image data generated by the camera from the camera.
  • when the image processing device and the camera are connected by wire, the moving image data receiving unit will generally be the input terminal provided on the image processing device for realizing the wired connection with the camera.
  • when the image processing device and the camera communicate wirelessly, the moving image data receiving unit will generally be the receiving device provided in the image processing device for realizing the wireless communication with the camera.
  • when the image processing device and the camera are integrated, the moving image data receiving unit will generally be an interface provided in the image processing device for realizing the connection with the camera.
  • the image processing apparatus includes a conversion moving image data generation unit.
  • the converted moving image data generation unit converts at least a plurality of still image data included in the moving image data into converted still image data.
  • the moving image data received from the camera by the image processing apparatus, and the still image data included in the moving image data, are generated by the camera at the actual position, and the moving image or still images based on them show the target face as captured from the actual position.
  • the converted still image data is generated based on the still image data, or by converting the still image data. It is data of a converted still image, i.e. the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the user faces front (taking a natural posture). That is, the target face included in the converted still image is the target face as imaged from the virtual position in front of the user's face.
  • since the virtual position of the camera is fixed and the relative positional relationship between the actual position and the virtual position of the camera is constant, the process of converting the still image data received from the camera into converted still image data is basically the same for every still image data subject to such conversion. The process is therefore "lighter" than if a different process had to be performed for each image, and such conversion is less likely to cause a delay in the moving image.
  • the converted moving image data is a series of converted still image data generated one after another by the converted moving image data generation unit.
  • the still image data is data of a still image (so-called frame) forming a moving image.
  • the image processing apparatus may generate converted still image data from all the still image data received from the camera, but doing so may cause a delay in the moving image.
  • therefore, if emphasis is placed on avoiding delay, the still image data to be converted into converted still image data can be, for example, every second or every third still image data included in the moving image data (every two frames or every three frames). The frame rate of the converted moving image data (the number of converted still image data included in the converted moving image data per second) then becomes smaller than the frame rate of the moving image data (the number of still image data included in the moving image data per second), but if the frame rate of the converted moving image data is at least about 10 fps, the moving image based on the converted moving image data can still be used as a moving image. A sketch of this subsampling is shown below.
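As a concrete illustration of the stride just described (the stride of two or three and the roughly 10 fps floor are the figures given in the text; everything else is an assumption of the sketch):

```python
def subsample_frames(frames, source_fps=30, stride=3):
    """Pass only every `stride`-th still image on to conversion.

    With a 30 fps source and stride 3, the converted moving image data
    runs at 10 fps, approximately the floor the text gives for the
    result to still read as a moving image.
    """
    assert source_fps / stride >= 10, "converted frame rate would fall below ~10 fps"
    for i, frame in enumerate(frames):
        if i % stride == 0:
            yield frame  # this frame proceeds to 3D model generation
```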
  • the image processing device includes a moving image data output unit.
  • the moving image data output unit has a function of outputting the converted moving image data generated by the converted moving image data generation unit.
  • the converted moving image data is output from the image processing device to another device, for example.
  • the other device serving as the output destination of the converted moving image data may be a device (such as a display) directly connected to the image processing device by wire or wirelessly, or a device connected to the image processing device via the network (for example, a display connected to another image processing device). If the image processing device includes a display, the output destination of the converted moving image data may be the display included in the image processing device.
  • if the image processing device takes the form of a web camera, it is used in the same manner as the web camera in a conventional video conference system.
  • in that case, the moving image data input to the computer in the video conference system is the converted moving image data from the beginning.
  • the target face in the converted still image based on each of the converted still image data included in the converted moving image data generated as described above has the same orientation as if it had been captured by the camera at the virtual position in front of the target face. Therefore, when a moving image based on the converted moving image data generated by the above-described image processing apparatus is displayed on some display, the target face displayed on the display is basically in a front-facing state.
  • the application of the image processing device of the present application is not limited to a video conference; when it is applied to a video conference, for example, a moving image based on the converted moving image data generated by the image processing device described above is displayed to the other party.
  • the target face reflected in the moving image based on the converted moving image data obtained by the present invention faces the front, including its line of sight, when the imaged person takes a natural posture while the target face is displayed on the display.
  • if the imaged person rotates their face or moves their line of sight, the target face displayed on the display also rotates, or its line of sight moves, accordingly.
  • this is because the converted moving image is equivalent to a moving image captured with the camera located at the virtual position: an image reflecting the movement of the target face or of the line of sight is displayed. Compared with a simple image conversion, the target face displayed on the display is therefore not unnatural.
  • the image processing devices may be configured to be communicable via a predetermined network and used in pairs, with the converted moving image data generated by one of the image processing devices sent bidirectionally to the other image processing device via the network. By doing so, a video conference similar to the conventional one can be realized.
  • the application of the image processing device in the present invention is not limited to the video conference system. For example, it is also known that when you watch a moving image of your own face taken as a selfie on the display of your own smartphone, tablet, or desktop or laptop computer, there is a sense of discomfort because the face is not facing you or its line of sight is not facing the front. Such a problem can also be solved by the image processing device according to the present invention. In this case, naturally, the converted moving image data created from the moving image data by the image processing device need not be sent to a computer or the like owned by another person.
  • as described above, the converted moving image data generation unit included in the image processing device comprises: a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of a large number of faces; a three-dimensional model rotation unit that performs a process of rotating each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
  • the three-dimensional model generation unit generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data.
  • the generation of the three-dimensional model is performed using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of many faces.
  • in recent years, technology has been developed that automatically generates a three-dimensional model of the face portion of a face reflected in a still image from one general two-dimensional still image (in other words, from the data of a single face photograph).
  • in this technology, a large number of two-dimensional still images including human faces, generated by imaging various human faces from various angles, are machine-learned as samples by a computer, yielding a conversion algorithm, that is, an algorithm for generating a three-dimensional model of a human face from a still image.
  • in the present invention, this conversion algorithm is used to automatically generate a three-dimensional model of the face portion of the target face reflected in the still image specified by the still image data.
  • the face portion means a portion of the human head, which is generally in front of the ears and below the forehead.
  • the above-mentioned technology developed in recent years, which automatically creates a three-dimensional model of the face portion reflected in a still image from one general two-dimensional still image in which a face is reflected, is recognized worldwide as an interesting technology. However, although it has been recognized as interesting, it has had few practical uses so far.
  • the present invention proposes a practical application of such a technique.
  • the conversion algorithm described above is for generating a three-dimensional model of at least the face portion of the target face, and the two-dimensional still image used as the source when generating the three-dimensional model need not be captured by a stereo camera and need not include depth data. That is, the camera used in combination with the image processing apparatus of the present invention may be a general camera.
  • the three-dimensional model may be any model as long as it is created by the above method; it is, for example, a wire frame model.
  • the three-dimensional model generation unit generates a three-dimensional model based on at least a plurality of still image data forming the moving image data. This "at least a plurality of still image data" is still image data that is the target of the above-mentioned conversion.
  • the three-dimensional model rotation unit performs a process of rotating a plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle. This corresponds to the process of directing the face orientation specified by the three-dimensional model toward the camera at the virtual position.
  • the two-dimensional image generation unit generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. That is, the two-dimensional image generation unit generates converted still image data about the converted still image by creating data about the two-dimensional still image again from the three-dimensional model.
  • because the relative positional relationship between the actual position and the virtual position of the camera is constant, the angle (including, of course, the direction of rotation) by which the three-dimensional model rotation unit rotates the 3D model is the same regardless of which still image data the processing is based on.
  • the processing performed by the three-dimensional model generation unit, the three-dimensional model rotation unit, and the two-dimensional image generation unit is likewise the same for every still image data subject to image processing, regardless of which still image data it is. This is another reason why the problem of moving image delay is unlikely to occur. A sketch of such a constant rotation follows.
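A sketch of the constant rotation, assuming the common convention that the pitch axis runs horizontally through the head (an assumption, since the patent leaves the axis open); the point is that the matrix is computed once and reused unchanged for every frame:

```python
import numpy as np

def pitch_rotation_matrix(theta_deg: float) -> np.ndarray:
    """3x3 rotation about the horizontal (X) axis by a constant angle.

    Because the actual/virtual camera geometry does not change during a
    session, this matrix is built once and applied identically to the
    vertices of every frame's 3D face model.
    """
    t = np.radians(theta_deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(t), -np.sin(t)],
                     [0.0, np.sin(t), np.cos(t)]])

R = pitch_rotation_matrix(10.0)  # e.g. a camera mounted above the display
# rotated_vertices = vertices @ R.T  # the identical operation per frame
```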
  • the three-dimensional model generation unit may extract the face portion of the target face reflected in the still image specified by the still image data to generate the three-dimensional model, and may also generate background image data, which is data about a two-dimensional still image of the portion of the still image other than the face portion of the target face. The two-dimensional image generation unit may then generate the converted still image data by pasting face image data, which is data obtained by converting the three-dimensional model rotated by the three-dimensional model rotation unit back into two-dimensional data, onto the face portion of the target face in the background image data.
  • in other words, the three-dimensional model generation unit recognizes the face portion of the target face reflected in the still image and extracts that portion to generate the three-dimensional model, while the other portions (for example, the ears and hair of the target face, or the background behind the owner of the target face) are left as they are as a two-dimensional still image.
  • the three-dimensional model rotation unit then rotates the three-dimensional model, the two-dimensional image generation unit converts the rotated three-dimensional model into a two-dimensional image, and that image is pasted onto the portion of the still image from which the face portion of the target face was extracted.
  • the rotated face image and the remainder of the still image generated by the three-dimensional model generation unit do not necessarily match exactly, which suggests that some unnaturalness may occur in the target face included in the still image specified by the converted still image data.
  • nevertheless, the discomfort felt by a person who sees a moving image based on the converted moving image data in which such converted still image data are connected is much smaller than the discomfort caused when the target face in the moving image does not face the viewer. Although the mechanism is not known in detail, it is considered that when a person recognizes a face, the brain mainly recognizes the eyes of the person being recognized, and that if the eyes face the front, other unnaturalness is largely not recognized.
  • therefore, the effect of the present invention is sufficient even if the method of generating a converted still image described above is adopted. At least when the rotation angle of the target face is about 15 degrees or less, the discomfort felt by a person who views the moving image based on the converted moving image data is small enough not to be a practical problem.
  • the three-dimensional model generation unit may perform predetermined two-dimensional image processing on the still image of the portion other than the face portion of the target face before producing the background image data of the still image.
  • two-dimensional image processing here means image processing that does not involve three-dimensional modeling of a subject in the still image. For example, when the three-dimensional model of the face portion of the target face is rotated, its apparent length in, say, the vertical direction may change. In response to such an apparent change in length, the three-dimensional model generation unit can perform a process of changing the vertical length (enlargement or reduction) of the still image of the portion other than the face portion of the target face.
  • examples of two-dimensional image processing include scaling of the image in one direction as described above, scaling in two directions, rotation, and the like. Doing so can further reduce the above-mentioned unnaturalness, hardly recognized by the brain, that may occur in the target face in the converted still image. However, adding such processing to the still image of the portion other than the face portion of the target face is not essential. A sketch of the paste-back step follows.
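A sketch of the paste-back step, under the assumption that the extracted face portion is tracked as a boolean mask (the mask representation is an assumption of this sketch, not something the patent specifies):

```python
import numpy as np

def composite_converted_frame(background: np.ndarray,
                              face_mask: np.ndarray,
                              rendered_face: np.ndarray) -> np.ndarray:
    """Paste the re-rendered (rotated) face back over the background.

    background    -- HxWx3 still image, everything except the face left as-is
    face_mask     -- HxW boolean array marking the extracted face portion
    rendered_face -- HxWx3 two-dimensional rendering of the rotated 3D model
    """
    out = background.copy()
    out[face_mask] = rendered_face[face_mask]  # seams may not match exactly;
    return out                                 # the text argues the eyes dominate perception
```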
  • as described above, the three-dimensional model rotation unit rotates the three-dimensional model; it may do so about a predetermined point.
  • the process of rotating the three-dimensional model can be a process of rotating it about a certain axis of the model (for example, a horizontal straight line penetrating both ears, or a straight line vertically penetrating the center of the skull when viewed in plan view, or both of these straight lines can serve as the axis); effectively, these are roll, yaw, and pitch rotation processes.
  • the rotation of the three-dimensional model around a certain point can be executed by transforming the spatial coordinates, and can be regarded as the rotation of the space itself in which the three-dimensional model exists.
  • the predetermined point can be, for example, the lens position of one camera.
  • setting the lens position of the camera as the predetermined point makes the position of the predetermined point easy to decide. Regardless of whether the predetermined point is the lens position of the camera, if the predetermined point is taken as the origin of the virtual space in which the three-dimensional model exists, the calculation of the spatial coordinates becomes easy, as in the sketch below.
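A sketch of rotation about an arbitrary pivot by translating the pivot to the origin, rotating, and translating back; this is the standard construction, not code from the patent:

```python
import numpy as np

def rotate_about_point(vertices: np.ndarray, R: np.ndarray, pivot: np.ndarray) -> np.ndarray:
    """Rotate an Nx3 vertex array by a 3x3 matrix R about an arbitrary pivot.

    Choosing the pivot (for example, the camera lens position) as the
    origin of the virtual space reduces this to a plain matrix product.
    """
    return (vertices - pivot) @ R.T + pivot
```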
  • the three-dimensional model rotation unit included in the image processing apparatus of the present invention performs a process of rotating the plurality of three-dimensional models generated by the three-dimensional model generation unit by rotation angles that are constant angles.
  • the fixed rotation angle for rotating the three-dimensional model can be determined as follows. First, the rotation angle may be determined in advance; in that case, the rotation angle is recorded in the image processing apparatus. The rotation angle is determined by the relative positional relationship between the actual position and the virtual position of the camera. If the image processing apparatus is, for example, a laptop personal computer, a smartphone, or a tablet, and the camera is fixedly attached to the housing, the actual position of the camera is fixed relative to the image processing apparatus.
  • if the virtual position of the camera is determined to be an appropriate position, such as behind the display included in the laptop personal computer, smartphone, or tablet, the relative positional relationship between the actual position and the virtual position of the camera is uniquely determined.
  • moreover, a laptop personal computer, smartphone, or tablet serving as the image processing apparatus is usually used with the user's face at a roughly predictable distance from the display, so the rotation angle can be determined in advance in consideration of how the device is used.
  • for example, a computer program for causing a computer such as a laptop personal computer, a smartphone, or a tablet to function as the image processing apparatus of the present invention may hold data specifying, for each of various computer models, the rotation angle that can be derived from the virtual position of the camera in that model (or from the relationship between its actual position and virtual position); that is, a large number of pairs of computer models and camera positions.
  • the computer model may be identified automatically by a function of the computer program after the computer program is installed on the computer, or the computer program may have a function that allows the user to input the computer model after installation.
  • this makes it possible to automatically determine the rotation angle suitable for the image processing apparatus from the relationship between the model and the virtual position. On the other hand, even when the image processing device is configured by a desktop computer, or when the image processing device is integrated with the camera and has the appearance of a webcam, if the position of the camera (its actual position) is determined at least to some extent, the relative positional relationship between the actual position of the camera and a virtual position set, for example, behind the display is uniquely determined.
  • in such a case also, once the relative positional relationship between the actual position and the virtual position of the camera is uniquely determined, it is possible to predetermine the rotation angle.
  • for example, if the user is notified by some means of an instruction such as "place the camera a specified number of centimeters above the center of the display in the width direction, and use the device with the target face a specified number of centimeters away from the virtual position of the camera", and the rotation angle is determined in advance on the basis of the virtual position so defined, the effect that the target face in the moving image based on the converted moving image data generated by the image processing device correctly faces the front can be obtained more reliably. A sketch of the geometry is shown below.
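As a sketch of the underlying geometry: if the real camera sits some height above the virtual camera position and the target face is some distance from the display, the pitch correction is roughly the angle between the two camera rays at the face. This simplified model is an illustration, not a formula stated in the patent:

```python
import math

def pitch_correction_deg(camera_height_cm: float, face_distance_cm: float) -> float:
    """Approximate constant rotation angle for a camera mounted directly
    above the display, with the virtual camera behind the display at the
    height of the face (simplified geometry assumed for illustration).
    """
    return math.degrees(math.atan2(camera_height_cm, face_distance_cm))

# e.g. camera 15 cm above the virtual position, face 60 cm from the display:
# pitch_correction_deg(15, 60) -> about 14 degrees, within the roughly
# 15-degree range the text says remains practically comfortable
```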
  • the rotation angle may not be determined in advance and may be determined by the image processing device when the image processing device is used.
  • the image processing device may be configured to determine the rotation angle before starting the generation of the converted moving image data.
  • the image processing apparatus may be configured to determine the rotation angle by performing a predetermined calculation based on the moving image data received by the moving image data receiving unit.
  • since the image processing device receives moving image data from the camera, the image processing apparatus can generate a three-dimensional model from that moving image data with its three-dimensional model generation unit. It can therefore determine by calculation how far the 3D model must be rotated so that, in the still image based on the converted still image data, the target face of a user facing the camera faces the camera at the virtual position.
  • alternatively, the image processing apparatus may include an input device reception unit that receives, from an input device used to input predetermined parameters necessary for determining the rotation angle, data about those parameters, and the rotation angle may be determined by performing a predetermined calculation based on the parameter data accepted by the input device reception unit.
  • a computer constituting an image processing apparatus is generally connected to, or integrated with, an input device (for example, a keyboard, a mouse, or a touch panel), from which the user can enter such parameters. Determining the rotation angle by calculation based on the parameters input from such an input device is also within the present invention.
  • the parameters are, for example, information specifying the shape and size of the display, information specifying where the actual position of the camera is (for example, immediately above the display at the center of its width direction, or at the upper right corner of the display), and information specifying the distance from the display to the target face.
  • the image processing device may also include a sensor reception unit that receives data about the parameters from a sensor that detects predetermined parameters required to determine the rotation angle, and the rotation angle may be determined by performing a predetermined calculation based on the parameter data received by the sensor reception unit.
  • the sensor is, for example, a publicly known or well-known distance measuring device connected to the image processing device and provided at either end of the display in the width direction.
  • in this case, the present invention determines an appropriate rotation angle using a parameter (for example, the distance from the display to the target face) obtained by the distance measuring device.
  • the parameter to be measured by the sensor is not limited to the distance.
  • the sensor may measure a parameter useful for obtaining a relative positional relationship between the real position and the virtual position of the camera and a relationship between the virtual position of the camera and the target face.
  • the moving image data output unit in the image processing device may be connected to a predetermined display that displays a moving image based on the converted moving image data.
  • the image processing apparatus in this case includes a rotation angle change data reception unit that receives rotation angle change data, which is data for changing the rotation angle, and the three-dimensional model rotation unit changes the rotation angle by which it rotates the three-dimensional model based on the rotation angle change data received by the rotation angle change data reception unit.
  • a moving image based on the converted moving image data is displayed on the display in substantially real time.
  • the user inputs rotation angle change data while looking at his or her own face (the target face) displayed on the display and, for example, by rotating the target face little by little, can adjust the target face displayed on the display until it basically faces the front.
  • the angle by which the three-dimensional model is rotated when the target face displayed on the display basically faces the front is then fixed as the rotation angle.
  • the rotation directions of the three-dimensional model in this case may be, for example, only the vertical direction (around the X axis) and the horizontal direction (around the Y axis), although they are not limited to these.
  • the user can input the rotation angle change data using the input device described above (a sketch of such an adjustment loop follows). Note that, of course, the above-mentioned four approaches for determining the rotation angle when it is not determined in advance can be used in combination as required.
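A sketch of the adjustment loop, with injected stand-ins for the image processing unit, the display output, and the input device; the arrow-key binding is an assumption, since the patent only requires that some input device supply rotation angle change data:

```python
def calibrate_rotation(frames, convert_frame, show_preview, poll_key, step_deg=1.0):
    """Let the user nudge pitch/yaw while watching their own converted
    face, then fix the converged angles as the constant rotation angle.
    """
    pitch = yaw = 0.0
    for frame in frames:
        show_preview(convert_frame(frame, pitch, yaw))
        key = poll_key()
        if key == "up":
            pitch += step_deg      # rotate a little around the X axis
        elif key == "down":
            pitch -= step_deg
        elif key == "left":
            yaw -= step_deg        # rotate a little around the Y axis
        elif key == "right":
            yaw += step_deg
        elif key == "enter":
            break                  # face looks frontal; fix the angle
    return pitch, yaw
```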
  • the moving image data receiving unit may directly receive the moving image data from the camera (eg, without passing through another device or device).
  • the moving image data receiving unit may receive the moving image data from the camera via a predetermined network.
  • in this case, the image processing device uses a so-called cloud computing technique. That is, for example, a computer near the user receives the moving image data from the camera and sends it to the image processing device at a remote place via a network (for example, the Internet).
  • the converted moving image data generated by performing the image processing as already described in the image processing device is returned from the image processing device to the user's computer via the network.
  • a computer near the user can use the converted moving image data received from the image processing apparatus as moving image data received from the camera.
  • the computer can send the converted moving image data to a computer on the other end of the video conference via a network.
  • when the image processing apparatus is configured by using the technology of cloud computing, the computer used by the user is not required to have high specifications regarding image processing.
  • alternatively, the destination to which the image processing apparatus transmits the converted moving image data, generated by converting the moving image data received via the network from the computer of one participant, may be not the computer of one participant but the computer of the other participant.
  • the inventor of the present application also proposes a method executed by an image processing apparatus as one aspect of the present invention.
  • the effect of this method is equal to the effect of the image processing apparatus according to the present invention.
  • the method is executed by a computer including a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image.
  • the method comprises: a converted moving image data generation step of converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of the converted still image, i.e. the two-dimensional still image captured by the camera when the camera is present at a virtual position that is a predetermined position on a virtual straight line extending in the front direction from the target face facing front, thereby generating converted moving image data, which is moving image data formed by a large number of continuous converted still image data; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. The converted moving image data generation step includes: a three-dimensional model generation process of generating, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face obtained by machine learning of many faces; a three-dimensional model rotation process of rotating each of the three-dimensional models of the target face so generated by a rotation angle that is a constant angle; and a two-dimensional image generation process of generating the converted still image data based on each of the rotated three-dimensional models.
  • the present inventor also proposes, as one aspect of the present invention, a computer program for causing a predetermined, for example general-purpose, computer to function as the image processing apparatus.
  • the effect of such a computer program is equal to the effect of the image processing apparatus according to the present invention, with the additional effect that a predetermined computer can be made to function as the image processing apparatus according to the present application.
  • the computer program, as one example, causes a computer provided with a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image, to execute:
  • a converted moving image data generation process of converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of the two-dimensional still image captured by the camera when the camera is present at a virtual position that is a predetermined position on a virtual straight line extending in the front direction from the target face facing front, thereby generating converted moving image data that is moving image data composed of a large number of continuous converted still image data; and a moving image data output process of outputting the converted moving image data generated by the converted moving image data generation process.
  • the converted moving image data generation process includes: a three-dimensional model generation process of generating the three-dimensional model of the face portion of the target face from each of the at least a plurality of still image data included in the moving image data; a three-dimensional model rotation process of performing a process of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation process by a rotation angle that is a constant angle; and a two-dimensional image generation process of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation process.
  • the computer program causes the computer to execute these processes.
  • FIG. 2 is a perspective view showing an appearance of a communication system of the video conference system shown in FIG. 1.
  • FIG. 3 is a diagram showing a hardware configuration of the computer device shown in FIG. 2.
  • FIG. 3 is a block diagram showing functional blocks generated inside the computer device shown in FIG. 2.
  • FIG. 5 is a block diagram showing an example of functional blocks generated inside the image processing unit shown in FIG. 4.
  • FIG. 1 schematically shows the overall configuration of a preferred embodiment of a system including an image processing device of the present invention.
  • the system according to the first embodiment is a video conference system.
  • the video conference system includes a first communication system 10-1 and a second communication system 10-2. All of these are connectable to the network 400.
  • Network 400 is, but is not limited to, the Internet in this embodiment.
  • the first communication system 10-1 in this embodiment is used by one user who participates in a video conference, and the second communication system 10-2 is used by the other user who participates in the video conference.
  • the first communication system 10-1 and the second communication system 10-2 have substantially the same configuration in relation to the invention of the present application and have the same functions and effects; therefore, both are hereinafter collectively referred to as the communication system 10 where appropriate.
  • as shown in FIG. 2, which is a perspective view of the external appearance of the communication system 10, the communication system 10 in this embodiment includes a computer device 100 as an image processing device, a display 101, and a camera 210.
  • the computer device 100, the display 101, and the camera 210 in this embodiment are all separate bodies, although not limited thereto.
  • the computer device 100 in this embodiment is configured by a general-purpose computer.
  • the computer device 100 may be a commercially available product. More specifically, the computer device 100 in this embodiment is a known or well-known desktop personal computer.
  • the computer device 100 is capable of communication via the network 400.
  • the counterpart of the communication performed by the computer apparatus 100 via the network 400 includes at least the computer apparatus 100 included in the communication system 10 paired with the communication system 10 including the computer apparatus 100.
  • the above-described display 101 is connected to the computer device 100.
  • the display 101 is for displaying a still image or a moving image, and a known or known one can be used.
  • the display 101 in this embodiment is required to be able to display a moving image.
  • the display 101 may be a commercially available product and may be publicly known or publicly known, and is, for example, a liquid crystal display.
  • the display 101 in this embodiment is connected to the computer apparatus 100 by a cable, but may instead be wirelessly connected to the computer apparatus 100.
  • the technique used for connecting the computer device 100 and the display 101 may be publicly known or well known.
  • the computer device 100 also includes an input device 102.
  • the input device 102 is used by the user to make a desired input to the computer device 100.
  • a known or well-known input device 102 can be used.
  • although the input device 102 of the computer device 100 in this embodiment is a keyboard, the input device 102 is not limited to this; a numeric keypad, a trackball, a mouse, or publicly known or well-known voice input using a microphone can also be used.
  • if the display 101 is a touch panel, the display 101 also functions as the input device 102.
  • The single camera 210 described above is connected to the computer device 100.
  • the camera 210 is a digital camera capable of capturing a moving image, and is capable of outputting moving image data that is data regarding the captured moving image.
  • the moving image data generated by the camera 210 is composed of a large number of continuous still image data which are data about a two-dimensional still image.
  • the camera 210 having such a function is publicly known or well known, and is commercially available.
  • the still image data is, for example, MJPEG format data, and the still image data does not include depth data.
  • the camera 210 in this embodiment may be such, and for example, a commercially available webcam can be used as the camera 210 in this embodiment.
  • the camera 210 outputs moving image data to the computer device 100. To enable this, the camera 210 is connected to the computer device 100, for example, by wire. Such connection may be wireless.
  • the technique used for connecting the computer device 100 and the camera 210 may be publicly known or well known.
  • the camera 210 is fixedly arranged at a predetermined position.
  • the predetermined position may basically be anywhere, but is a position where the target face, which is the face of the user who uses the communication system 10 shown in FIG. 2, is reflected in the moving image captured by the camera 210.
  • the camera 210 is fixed to the upper side of the display 101 at approximately the center in the width direction of the display 101.
  • the actual position of the camera 210 shown in FIG. 2 is the actual position of the camera in the present invention.
  • the hardware configuration of the computer device 100 is shown in FIG.
  • the hardware includes a CPU (central processing unit) 111, a ROM (read only memory) 112, a RAM (random access memory) 113, and an interface 114, which are interconnected by a bus 116.
  • the CPU 111 is a computing device that performs computation.
  • the CPU 111 executes the processing described below by executing a computer program recorded in the ROM 112 or the RAM 113, for example.
  • the hardware may include an HDD (hard disk drive) or other large-capacity recording device, and the computer program described above may be recorded in the large-capacity recording device.
  • the computer program mentioned here includes at least a computer program for causing the computer apparatus 100 to execute a process, which will be described later, for generating converted moving image data by converting moving image data.
  • This computer program may be pre-installed in the computer device 100 or may be installed afterwards.
  • the computer program may be installed in the computer device 100 via a predetermined recording medium (not shown) such as a memory card, or via a network such as a LAN or the Internet.
  • the ROM 112 stores computer programs and data necessary for the CPU 111 to execute the processing described below.
  • the computer program recorded in the ROM 112 is not limited to this, and may include other programs such as an OS, a web browser for browsing web pages via the Internet, and a mailer for handling electronic mail.
  • the RAM 113 provides a work area necessary for the CPU 111 to perform processing. In some cases, (at least a part of) the computer program and data described above may be recorded.
  • the interface 114 is for exchanging data between the CPU 111, the RAM 113, etc. connected by the bus 116 and the outside.
  • the above-described display 101, input device 102, and camera 210 are connected to the interface 114.
  • the operation content input from the input device 102 is input to the bus 116 from the interface 114.
  • the moving image data sent from the camera 210 is also input to the bus 116 from the interface 114. Further, as is well known, data for displaying an image on the display 101 is sent from the bus 116 to the interface 114 and output from the interface 114 to the display 101.
  • the interface 114 is also connected to a transmission / reception mechanism (not shown), which is a known means for communicating with the outside via the network 400, i.e. the Internet, making it possible to send and receive data via the network 400.
  • the data transmission / reception via the network 400 may be performed by wire or wirelessly.
  • the configuration of the transmission / reception mechanism may be publicly known or well known.
  • the data received by the transmission / reception mechanism from the network 400 is received by the interface 114, and the data passed from the interface 114 to the transmission / reception mechanism is transmitted by the transmission / reception mechanism to the outside via the network 400; in this embodiment, for example, it is sent to the computer device 100 included in the communication system 10 of the other party.
  • the functional blocks shown in FIG. 4 are generated inside the computer device 100.
  • the following functional blocks may be generated by the above-mentioned computer program alone, which causes the computer apparatus 100 to execute the processing described below, or they may be generated by that computer program in cooperation with the OS or other computer programs installed on the computer apparatus 100.
  • An input unit 121, a main control unit 122, an image processing unit 123, and an output unit 125 are generated in the computer device 100 in relation to the functions of the present invention.
  • the input unit 121 receives an input from the interface 114.
  • Input from the interface 114 to the input unit 121 includes input from the input device 102.
  • the input from the input device 102 includes, for example, designation data and start data.
  • input data such as the designation data and the start data are entered from the input device 102.
  • all the data from the input device 102 are sent from the input unit 121 to the main control unit 122.
  • the data input from the interface 114 to the input unit 121 also includes data sent from the computer device 100 included in the communication system 10 that is a counterpart of the video conference and received by the transmission / reception mechanism. Such data is, for example, converted moving image data described later.
  • when converted moving image data is received by the input unit 121 via the transmission / reception mechanism and the interface 114, the input unit 121 sends it to the main control unit 122.
  • the data input from the interface 114 to the input unit 121 also includes moving image data sent from the camera 210.
  • the input unit 121 sends it to the main control unit 122.
  • the main controller 122 controls the entire functional blocks generated in the computer device 100.
  • the main control unit 122 controls communication between the communication systems 10 for realizing a video conference.
  • the main control unit 122 may receive designated data and start data from the input unit 121. When receiving the designated data and the start data, the main control unit 122 is configured to execute the processes described below.
  • the main control unit 122, having received the designation data, sends it to the output unit 125.
  • the main control unit 122 may receive, from the input unit 121, the converted moving image data transmitted from the computer device 100 included in the communication system 10 that is the other party of the video conference and received by the transmission / reception mechanism. Upon receiving this, the main control unit 122 sends the converted moving image data to the output unit 125.
  • the main control unit 122 may receive the moving image data sent from the camera 210 from the input unit 121.
  • the main control unit 122 which has received this, sends the moving image data to the image processing unit 123 when the conditions described later are satisfied.
  • the image processing unit 123 performs image processing.
  • the image processing unit 123 may receive the moving image data from the main control unit 122 as described above. When the moving image data is received, the image processing unit 123 performs image processing on the moving image data and converts the moving image data into converted moving image data.
  • the moving image data is composed of a large number of continuous still image data which are data about a two-dimensional still image. Then, the target face is reflected in the still image based on each still image data. The image processing unit 123 converts such moving image data into converted moving image data.
  • the image processing unit 123 converts a plurality of still image data included in the moving image data into converted still image data, and the converted still image data are made continuous to form the converted moving image data. That is, the converted moving image data is a series of converted still image data.
  • the converted still image data is data of a converted still image that is a two-dimensional still image.
  • the converted moving image data is general moving image data, for example, data in the MJPEG format.
  • the moving image data, and the still image data included in it, are generated by the camera 210 at the actual position, and the moving image or still images based on them show the target face as captured from the actual position.
  • the converted still image data is data of a converted still image, which is data generated based on the still image data or by converting the still image data.
  • the converted still image is a two-dimensional still image as it would be captured by the camera if the camera were present at the virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (that is, when the user takes a natural posture). That is, the target face included in the converted still image specified by the converted still image data is the target face as imaged from the virtual position in front of the user's face, and is basically in a front-facing state.
  • the virtual position of the camera 210 will be described later in detail.
  • the still image data is data of a still image (so-called frame) that constitutes a moving image.
  • the image processing apparatus may generate the converted still image data from all the still image data received from the camera, but doing so may cause a delay in the moving image. Therefore, if emphasis is placed on avoiding delay, the still image data to be converted into converted still image data can be limited to, for example, one in every two or one in every three (every second frame or every third frame) of the still image data included in the moving image data. The frame rate of the converted moving image data (the number of converted still image data included in the converted moving image data per second) then becomes smaller than the frame rate of the moving image data (the number of still image data included in the moving image data per second), but the moving image based on the converted moving image data can still be used as a moving image.
  • the still image data to be converted need not be extracted at a fixed interval such as every second or every third frame.
  • the image processing unit 123 sends the generated converted moving image data to the output unit 125.
  • the output unit 125 outputs the data generated by the functional blocks in the computer device 100 to the interface 114.
  • the output unit 125 may receive the designated data from the main control unit 122. When the designated data is received, the output unit 125 sends it to the transmission / reception mechanism via the interface 114.
  • the designated data is information that identifies the computer device 100 included in the communication system 10 of the other party when the video conference is held.
  • the output unit 125 may receive the converted moving image data from the main control unit 122.
  • the converted moving image data is sent from the computer device 100 included in the communication system 10 of the other party.
  • the output unit 125 sends it via the interface 114 to the display 101 connected to the computer apparatus 100.
  • a moving image based on the converted moving image data is displayed on the display 101.
  • the output unit 125 may receive the converted moving image data from the image processing unit 123.
  • the converted moving image data is generated in the computer device 100 in which the output unit 125 is located.
  • the output unit 125 sends it to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism is configured to send the converted moving image data to the computer device 100 specified by the above-mentioned designated data.
  • the video conference system includes the first communication system 10-1 used by one user participating in the video conference, and the second communication system 10-2 used by the other user participating in the video conference.
  • Both users prepare for the video conference.
  • one user holds the video conference while watching the display 101 in the first communication system 10-1, and the other user does so while watching the display 101 in the second communication system 10-2. Therefore, one user moves to a position in front of the display 101 in the first communication system 10-1, and the other user moves to a position in front of the display 101 in the second communication system 10-2.
  • prior to the video conference, the two users who will hold it are specified.
  • the identification of the two users can be realized by using a known or well-known technique.
  • the two users can be specified by at least one of the two users participating in the video conference designating the other party with whom the video conference is to be held. Of course, both users may designate each other.
  • in this embodiment, one user designates the other party with whom the video conference is to be held, and the designated user approves it, whereby the two users who will hold the conference are specified. The case where the other party is identified from the side of one user who uses the first communication system 10-1 will be described as an example. First, the user who uses the first communication system 10-1 operates the input device 102 included in the first communication system 10-1 to generate the designated data.
  • the designated data is information that identifies the user of the other party who holds the video conference. For example, each of the users who may participate in the video conference is given an ID that is a unique identifier.
  • the user using the first communication system 10-1 can input the designated data by inputting this ID using the input device 102 or by selecting from the IDs registered in advance. In this example, it is assumed that the designation data designates the ID of the user who uses the second communication system 10-2.
  • the input designated data reaches the input unit 121 from the input device 102 via the interface 114.
  • the input unit 121 further attaches the ID of the first communication system 10-1 itself to the designated data and sends them to the output unit 125 via the main control unit 122.
  • the designated data and the ID of the first communication system 10-1 are sent from the output unit 125 to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism transmits the ID of the first communication system 10-1, via the network 400, to the communication system 10 operated by the user having the ID specified by the designated data, that is, to the computer device 100 of the second communication system 10-2.
  • the user of the first communication system 10-1 identifies the user of the second communication system 10-2 as the other party of the video conference.
  • the user of the first communication system 10-1 applies for a video conference with the user of the second communication system 10-2.
  • the computer device 100 of the second communication system 10-2 receives the ID of the first communication system 10-1 transmitted from the computer device 100 of the first communication system 10-1 via the network 400 by the transmission / reception mechanism.
  • the ID reaches the input unit 121 from the transmission / reception mechanism via the interface 114, and is further transmitted to the main control unit 122.
  • the main control unit 122 generates an image indicating that the user of the first communication system 10-1 has applied for the video conference, for example an image including the ID, sent from the first communication system 10-1, of the user of the first communication system 10-1, and sends the data of the image to the output unit 125.
  • the output unit 125 sends the image data to the display 101 via the interface 114.
  • an image indicating that the user of the first communication system 10-1 has applied for the video conference is displayed on the display 101 included in the second communication system 10-2.
  • if the user of the second communication system 10-2 agrees to hold the video conference, he or she uses the input device 102 to make an input indicating the intention to approve it. This corresponds to the designated data in the computer device 100 included in the second communication system 10-2. If the user of the second communication system 10-2 does not agree to hold the video conference, he or she either makes no input indicating approval or makes an input indicating the intention not to accept the video conference with the user of the first communication system 10-1.
  • here it is assumed that approval is given, and the designated data, which is data indicating that approval, is input to the computer device 100 included in the second communication system 10-2.
  • the designated data is sent to the main control unit 122 via the interface 114 and the input unit 121.
  • when the main control unit 122 receives it, the main control unit 122 generates data indicating that the video conference is ready to be conducted, and sends it to the output unit 125.
  • the data is sent from the output unit 125 to the transmitting / receiving mechanism via the interface 114, and is then sent from the transmitting / receiving mechanism to the first communication system 10-1 via the network 400.
  • the transmission / reception mechanism of the computer device 100 in the first communication system 10-1 receives the data sent from the second communication system 10-2.
  • the data is sent from the transmission / reception mechanism to the main control unit 122 of the computer apparatus 100 of the first communication system 10-1 via the interface 114 and the input unit 121.
  • the computer device 100 in the first communication system 10-1 and the computer device 100 in the second communication system 10-2 are now ready to exchange with each other the converted moving image data, which is the data about the moving images necessary for the video conference.
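  • For illustration only, the designation-and-approval exchange described above can be sketched as follows. The message names and the send / receive interface are assumptions introduced here, not part of this embodiment; the sketch only fixes the order of the exchange.

```python
from dataclasses import dataclass

@dataclass
class ConferenceRequest:      # designated data plus the caller's own ID (hypothetical names)
    sender_id: str            # ID of the first communication system 10-1
    target_id: str            # ID chosen via the input device 102

@dataclass
class ConferenceReply:
    accepted: bool            # True if the designated user approved the conference

def initiate_conference(network, my_id: str, peer_id: str) -> bool:
    """Runs on the calling side (10-1): send the request, wait for approval."""
    network.send(peer_id, ConferenceRequest(sender_id=my_id, target_id=peer_id))
    reply: ConferenceReply = network.receive(peer_id)
    return reply.accepted     # both sides may now exchange converted moving image data
```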
  • both users participating in the video conference position themselves so that the target faces, which are the faces of both users, are located within the imaging range of the camera 210 included in the communication system 10 near each user.
  • the user for example, adjusts his / her own posture or adjusts the position and angle of the camera 210 as necessary. This completes the preparation for the video conference.
  • the video conference is started.
  • the converted moving image data generated by the first communication system 10-1 is transmitted to the second communication system 10-2, and a moving image based on the converted moving image data is displayed on the display 101 included in the second communication system 10-2 and viewed by the user of the second communication system 10-2.
  • similarly, the converted moving image data generated by the second communication system 10-2 is transmitted to the first communication system 10-1, and a moving image based on it is displayed on the display 101 included in the first communication system 10-1.
  • since both processes are equivalent, the following description focuses only on the processing in which the converted moving image data is generated in the first communication system 10-1, sent to the second communication system 10-2, and displayed as a moving image on the display 101 included in the second communication system 10-2.
  • the user of the first communication system 10-1 uses the input device 102 to input start data.
  • the start data is sent from the input device 102 to the main control unit 122 in the computer device 100 of the first communication system 10-1 as in the case of the designated data.
  • the main control unit 122 which has received it starts the process for transmitting the converted moving image data to the computer device 100 in the second communication system 10-2.
  • moving image data is sent from the camera 210 connected to the computer apparatus 100 to the computer apparatus 100 regardless of whether or not start data is input.
  • the moving image data is constantly sent to the main control unit 122 via the interface 114 and the input unit 121.
  • until the start data is input, the main control unit 122 does not perform any processing even if it receives the moving image data. When the start data has been input, the main control unit 122 sends the received moving image data to the image processing unit 123.
  • the image processing unit 123 that has received the moving image data performs a process of converting the moving image data into converted moving image data.
  • the moving image data and the converted moving image data are as described above, and the conversion may be performed in any way.
  • four types of conversion methods, i.e., the first to fourth conversion methods, are proposed.
  • the image processing unit 123 includes a frame dropping unit that extracts at least a plurality of still image data from the still image data included in the moving image data as a target of image processing (conversion).
  • the frame dropping unit is not essential as described later.
  • the image processing unit 123 also includes a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data extracted by the frame dropping unit, a three-dimensional model of the face portion of the target face appearing in the still image specified by that still image data.
  • the image processing unit 123 also includes a three-dimensional model rotation unit that performs a process of rotating the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a fixed angle.
  • the image processing unit 123 also includes a two-dimensional image generation unit that generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. These functions are the same from the first conversion method to the fourth conversion method.
  • the difference between the first to fourth conversion methods lies, in general, only in how the rotation angle (including the rotation direction) by which the three-dimensional model rotation unit rotates the three-dimensional model of the target face is determined.
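  • The shared structure of the four conversion methods can be summarized in the following sketch. The helper callables stand in for the four units described above and are hypothetical; only the order of the stages comes from this description.

```python
from typing import Any, Callable, Iterable, List, Tuple

def convert_moving_image(
    frames: Iterable[Any],
    build_face_model: Callable[[Any], Tuple[Any, Any]],   # still -> (3D face model, background)
    rotate_model: Callable[[Any], Any],                   # applies the fixed rotation angle
    render_frame: Callable[[Any, Any], Any],              # (rotated model, background) -> 2D image
    keep_every: int = 6,
) -> List[Any]:
    """Four-stage conversion pipeline shared by the first to fourth conversion methods."""
    converted = []
    for index, still in enumerate(frames):
        if index % keep_every != 0:                       # frame dropping unit 123A
            continue
        model, background = build_face_model(still)       # 3D model generation unit 123B
        rotated = rotate_model(model)                     # 3D model rotation unit 123C
        converted.append(render_frame(rotated, background))  # 2D image generation unit 123D
    return converted                                      # a series of converted still images
```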
  • when the image processing unit 123 executes the first conversion method, the image processing unit 123 is configured as shown in FIG.
  • the image processing unit 123 in this case includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • the frame dropping unit 123A extracts at least a plurality of still image data as image processing (conversion) targets from the still image data included in the moving image data. Only the extracted still image data is converted from still image data to converted still image data.
  • not all of the still image data included in the moving image data are subject to conversion into converted still image data, because the computing power of the computer device 100 may be insufficient to convert the moving image data into the converted moving image data (or each still image data into converted still image data) with the immediacy required. Therefore, if the computing power of the computer device 100 is sufficient, the frame dropping unit 123A is unnecessary. Although not limited to this, the frame dropping unit 123A in this embodiment extracts every sixth still image data (skipping five frames in between) from the 60 fps moving image data sent from the camera 210, that is, ten still image data per second.
  • the frame dropping unit 123A need not always extract still image data at a fixed interval, and the number of still image data extracted per second need not be 10; it can be, for example, about 6 to 8, or more.
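  • As one illustration of the point that extraction need not occur at a fixed interval, the following sketch drops frames adaptively using a hypothetical per-frame time budget; the parameter name and the strategy are assumptions, not part of this embodiment.

```python
import time

def drop_frames_adaptive(stills, budget_per_frame: float = 0.1):
    """Keep the next frame only once the converter has caught up (non-uniform dropping).

    budget_per_frame is a hypothetical processing-time budget in seconds; the
    embodiment only requires that extraction need not be at a fixed interval.
    """
    next_allowed = 0.0
    for still in stills:
        now = time.monotonic()
        if now >= next_allowed:
            next_allowed = now + budget_per_frame
            yield still
```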
  • the three-dimensional model generation unit 123B generates, from each of the at least a plurality of still image data extracted by the frame dropping unit 123A, a three-dimensional model of the face portion of the target face appearing in the still image specified by that still image data.
  • the three-dimensional model is, for example, a wire frame model, but is not limited to this.
  • the three-dimensional model rotation unit 123C performs a process of rotating each of the three-dimensional models generated by the three-dimensional model generation unit 123B by a certain rotation angle.
  • the orientation and angle in which each of the 3D models is rotated is constant for all 3D models.
  • the two-dimensional image generation unit 123D also generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit 123C.
  • the rotation angle by which the three-dimensional model rotation unit 123C rotates the three-dimensional model is determined so that, when a two-dimensional image is generated based on the rotated three-dimensional model (that is, when the model is returned to a two-dimensional image), the target face (more precisely, the face portion of the target face) included in that two-dimensional image is the same as the target face as it would appear if captured by a camera at the virtual position.
  • the virtual position is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (when the user takes a natural posture). That is, the three-dimensional model rotation unit 123C rotates the three-dimensional model of the face portion of the target face so that, as far as the target face is concerned, the moving image data (or still image data) captured by the camera 210 at the actual position becomes the same as data captured by a virtual camera at the virtual position. In the first conversion method, the rotation angle is predetermined.
  • the data that specifies the rotation angle is, for example, recorded in advance in the three-dimensional model rotation unit 123C, and the three-dimensional model rotation unit 123C rotates the three-dimensional model by the rotation angle specified by the data that specifies the rotation angle.
  • FIG. 6A shows a side view of the relationship between the camera 210 and the target face.
  • the camera 210 exists at the actual position immediately above the display 101. In this example, it is assumed that the camera 210, while lying in the front direction of the target face when considered in the horizontal direction, is located above the target face.
  • the camera 210 therefore images the target face from above at the angle θ, and the target face appearing in the moving image based on the moving image data generated by the camera 210, or in the still images based on the still image data included in that moving image data, is captured from above at the angle θ.
  • FIG. 6B shows an example in which an image based on such moving image data is displayed on the display 101 included in the communication system 10 of the other party.
  • the target face included in the moving image is directed downward by the angle θ.
  • the three-dimensional model generation unit 123B generates a three-dimensional model of the face part of the target face included in the still image specified by the still image data.
  • the three-dimensional model generation unit 123B first extracts the face portion F of the target face from the image included in the still image.
  • the method for extracting the face portion F may be any method, but a general image recognition technique may be used.
  • the area surrounded by the broken line in FIG. 7A is the face portion F.
  • the face portion in this embodiment means a portion of the human head (target face) that is generally in front of the ears and below the forehead.
  • the range of the face part may be narrower at least in the range including eyes, nose, and mouth, or may be wider up to the entire head.
  • the three-dimensional model generation unit 123B generates a three-dimensional model for the above-mentioned face portion F.
  • the three-dimensional model generation unit 123B generates the three-dimensional model using a conversion algorithm, obtained by machine learning over many faces, that estimates a three-dimensional model of a human face. The algorithm automatically creates a three-dimensional model of the face portion of a face appearing in a still image from a single ordinary two-dimensional still image (in other words, from the data of a single facial photograph).
  • the technology is disclosed in detail in the paper "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression / Accepted to ICCV 2017" (URL: http://aaronsplace.co.uk/papers/jackson2017recon/).
  • the conversion algorithm described above is generated by machine learning performed by a computer using a large number of two-dimensional still images of human faces, obtained by capturing various human faces from various angles.
  • the three-dimensional model generation unit 123B automatically generates a three-dimensional model of the face portion F of the target face reflected in the still image specified by the still image data, using the conversion algorithm.
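  • The following is a minimal sketch of this reconstruction step in the spirit of the cited paper, which regresses a volumetric (voxel) representation of the face directly from a single RGB image; the predict interface and the threshold are assumptions for illustration and do not reproduce the published code's actual API.

```python
import numpy as np

def reconstruct_face_volume(face_crop: np.ndarray, model) -> np.ndarray:
    """Regress a voxel occupancy volume from one 2D face crop (sketch).

    face_crop: HxWx3 RGB image of the face portion F.
    model:     a pretrained volumetric-regression CNN (hypothetical interface).
    Returns a 3D boolean occupancy grid; a surface mesh (e.g. a wireframe model)
    can then be extracted from it with a marching-cubes implementation.
    """
    volume = model.predict(face_crop[np.newaxis, ...])[0]  # soft occupancies in [0, 1]
    return volume > 0.5                                    # threshold to a solid volume
```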
  • the three-dimensional model generated in that case is, for example, as shown in FIG. 7 (B). What is shown in FIG. 7B (1) is a three-dimensional model of the face portion F of the target face viewed from the front.
  • the three-dimensional model is a wire frame model, but is not limited to this.
  • (2) is a side view of the three-dimensional model of the face portion F, in which the wire frame is omitted.
  • the face portion F faces downward by the angle θ shown in FIG.
  • the three-dimensional model generation unit 123B also generates the data of the portion of the still image excluding the face portion F, that is, the data of a still image of the portion around the face portion F in FIG. 7A, and sends it to the two-dimensional image generation unit 123D.
  • the three-dimensional model, which is in a state of facing downward by the angle θ, naturally comes to face the front if it is rotated upward by the angle θ.
  • the angle θ can be easily obtained by using a and b shown in FIG.
  • a is the horizontal distance from the virtual position X of the camera 210 to the target face, and b is the vertical distance from the virtual position X of the camera 210 to the actual position of the camera 210.
  • the virtual position X of the camera 210 is the position just before the display 101 in the front direction of the target face. That is, the virtual position X is located on a virtual straight line extending in the front direction of the target face of the user who takes a natural posture. As long as the condition is satisfied, the relative positional relationship between the virtual position X and the display 101 does not matter.
  • the virtual position X may be located inside the display 101 or behind the display 101.
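  • With a and b defined as above, the rotation angle follows from elementary trigonometry; this merely restates the geometry of FIG. 6, with no additional assumption:

```latex
\theta = \arctan\!\left(\frac{b}{a}\right)
```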
  • the three-dimensional model rotation unit 123C rotates the three-dimensional model shown in FIG. 7B upward by the angle θ in the vertical plane. Then, the three-dimensional model faces the front as shown in FIG. 7C. FIG. 7C (1) shows the three-dimensional model of the face portion F of the target face viewed from the front.
  • (2) is a side view of the three-dimensional model of the face portion F, in which the wire frame is omitted.
  • the three-dimensional model rotation unit 123C in this embodiment rotates the three-dimensional model about a predetermined point.
  • it is also possible to rotate the three-dimensional model around a certain axis of the model (for example, a horizontal straight line passing through both ears, or a straight line passing vertically through the center of the skull in plan view), or around both of them. However, to perform such processing, it is necessary to detect the position of the ears, or of the center of the skull in plan view, in the three-dimensional model and to specify their coordinates.
  • if, instead, the three-dimensional model is rotated about a point in the virtual space in which the three-dimensional model exists (the point being a virtual point, regardless of whether or not it is located inside the three-dimensional model), the complicated processing described above can be omitted.
  • the predetermined point is the lens position of the camera, and is the origin of the virtual space in which the three-dimensional model exists.
  • the rotation of the three-dimensional model is executed as a transformation of spatial coordinates with a predetermined point as the origin.
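  • Concretely, rotating the three-dimensional model about the predetermined point reduces to multiplying each vertex coordinate by a rotation matrix with that point as the origin. The sketch below assumes the model is given as an N×3 array of vertices in a coordinate system whose origin is the predetermined point; the combined vertical / horizontal rotation mentioned later is obtained by composing the two matrices.

```python
import numpy as np

def rotate_vertices(vertices: np.ndarray, pitch: float, yaw: float = 0.0) -> np.ndarray:
    """Rotate Nx3 vertices about the origin: pitch around X, yaw around Y (radians)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # vertical rotation
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # horizontal rotation
    return vertices @ (rot_y @ rot_x).T   # one combined rotation, applied to every vertex
```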
  • the two-dimensional image generation unit 123D again generates the two-dimensional image data using the three-dimensional model shown in FIG. 7C after being rotated by the three-dimensional model rotation unit 123C.
  • the two-dimensional image so generated is pasted into the region corresponding to the excluded face portion F in the data, sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, of the portion of the still image excluding the face portion F.
  • the still image thus obtained is the converted still image, and the data of the converted still image is the converted still image data.
  • the target face included in the obtained converted still image basically faces the front as shown in FIG.
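  • A minimal sketch of this pasting step is shown below, assuming the region of the face portion F is represented as a boolean mask over the still image; this embodiment does not prescribe a particular representation of that region.

```python
import numpy as np

def composite_converted_still(background: np.ndarray,
                              rendered_face: np.ndarray,
                              face_mask: np.ndarray) -> np.ndarray:
    """Paste the re-rendered (front-facing) face into the region of face portion F.

    background:    HxWx3 still image with the original face portion excluded.
    rendered_face: HxWx3 image rendered from the rotated 3D model, aligned to it.
    face_mask:     HxW boolean array marking the region of face portion F.
    """
    converted = background.copy()
    converted[face_mask] = rendered_face[face_mask]   # fill the excluded region
    return converted
```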
  • the data of the portion of the still image excluding the face portion F, which is sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, may be that data itself, or may be data that has undergone some processing.
  • the range of the face portion F in FIG. 7 (D) is the same as the face portion F in FIG. 7 (B), but the edge of the two-dimensional image that is generated using the rotated three-dimensional model and pasted into that range may not completely coincide with the edge of the range of the face portion F. If it is desired to reduce the unnaturalness caused by this, the above-mentioned processing may be performed.
  • any method may be used for this processing as long as the edge of the two-dimensional image generated from the rotated three-dimensional model is made to coincide with the edge of the face portion F; the processing may be, for example, scaling of the two-dimensional image in one direction, scaling in two directions, rotation, and the like. For example, when the three-dimensional model of the face portion F of a downward-facing target face is rotated to face the front, its apparent length in, for example, the vertical direction becomes shorter.
  • in that case, the three-dimensional model generation unit 123B can perform a process of reducing the vertical length of the still image of the portion other than the face portion F of the target face. Then, the edge of the image of the face generated from the three-dimensional model matches the range of the face portion F well. If the actual position of the camera 210 also deviates from the front direction of the face in the horizontal direction, rotation of the three-dimensional model in the horizontal direction is naturally required as well, in the same manner as the rotation in the vertical direction in the above example, but the description thereof is omitted.
  • the three-dimensional model rotation unit 123C need not perform the two processes of vertical rotation and horizontal rotation separately; it can of course perform a single rotation that combines both rotations.
  • each of the still image data extracted by the frame dropping unit 123A is converted into converted still image data.
  • the converted still image data generated as a result is sequentially output from the two-dimensional image generation unit 123D to the output unit 125.
  • This set of a large number of converted still image data is the converted moving image data. That is, the converted moving image data is output from the image processing unit 123 to the output unit 125.
  • in the first conversion method, a common or typical rotation angle θ (14 degrees or 9.5 degrees in the above example) is used by the three-dimensional model rotation unit 123C as the angle for rotating the three-dimensional model.
  • this rotation angle may be selectable from a plurality of rotation angles, but it is basically fixed. Therefore, the numerical values of a and b in the above example may not match the actual relationship between the real position and the virtual position of the camera 210.
  • since the virtual position of the camera 210 can be freely determined by the computer program, such a situation essentially occurs when the actual position of the camera 210 is not the position planned at the time the computer program was designed.
  • the first conversion method is particularly effective when the actual position of the camera 210 exists at a planned position or a position not far from it.
  • when the computer device 100 is a laptop personal computer, a smartphone, a tablet, or the like, the actual position of the camera is fixed with respect to the housing.
  • if the virtual position of the camera is then determined to be an appropriate position immediately in front of or behind the display of the laptop personal computer, smartphone, or tablet, the relationship between the actual position and the virtual position of the camera is uniquely determined for each model. Moreover, if the specifications of the devices constituting the image processing apparatus are clear, the distance between the target face and the virtual position of the camera 210, or between the target face and the display 101, can be predicted to some extent according to the size of the display 101.
  • the computer program for causing the computer device 100 to function as the image processing device of the present application can therefore include data specifying the rotation angle that can be derived from the relationship between the virtual position (or the real position and the virtual position) of the camera in each of various laptop personal computers, smartphones, tablets, and the like (that is, a large number of data sets, each being a pair of a device model and the camera's virtual position for that model).
  • the computer program may implement either a function of automatically identifying the model of the computer device 100 after the computer program has been installed in it, or a function of receiving, after installation, an input made by the user specifying the model of the computer device 100 in which the program is installed.
  • alternatively, the user may be instructed, for example, to place the camera directly above the center of the display and to use the image processing apparatus at a certain distance from the virtual position of the camera immediately in front of the center of the display. By having the user adopt a positional relationship between the display 101 and the camera 210 that is set in advance in this way, the rotation angle can be determined in advance in consideration of the relationship between the virtual position determined as described above and the actual position of the camera 210 that the user will have set accordingly.
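  • In such an implementation, the data pairing device models with rotation angles could take a form like the following; the model names and angle values are placeholders, since the actual pairs depend on the devices the program supports.

```python
# Hypothetical model-to-rotation-angle table (degrees); values are placeholders.
ROTATION_ANGLE_BY_MODEL = {
    "laptop-model-A": 14.0,   # camera above the display, typical viewing distance
    "tablet-model-B": 9.5,
}

def predetermined_rotation_angle(model_name: str, default: float = 10.0) -> float:
    """Look up the rotation angle for a detected or user-specified device model."""
    return ROTATION_ANGLE_BY_MODEL.get(model_name, default)
```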
  • when the image processing unit 123 executes the second conversion method, the image processing unit 123 is configured as shown in FIG.
  • like the image processing unit 123 that executes the first conversion method, this image processing unit 123 includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the case of the first conversion method, except that the three-dimensional model rotation unit 123C used for the second conversion method does not record in advance the data specifying the rotation angle.
  • the image processing unit 123 that executes the second conversion method includes the angle detection unit 123E.
  • the angle detection unit 123E determines the above-mentioned rotation angle by performing a predetermined calculation based on the moving image data sent from the main control unit 122. Note that although in FIG. 9 the moving image data is input directly from the main control unit 122 to the angle detection unit 123E, the angle detection unit 123E may instead determine the rotation angle θ based on the still image data extracted by the frame dropping unit 123A. If such an angle detection unit 123E is used, it is not necessary to pay attention to the relative positional relationship between the actual position and the virtual position of the camera 210.
  • in order for the angle detection unit 123E to automatically obtain the rotation angle from the moving image data, it is conceivable to have the angle detection unit 123E perform machine learning. If the angle detection unit 123E learns images of faces taken from various angles together with the angle at which each image was taken, the angle detection unit 123E can detect from what angle the face appearing in a still image based on the still image data included in the moving image data was imaged. If that is possible, the angle detection unit 123E can naturally determine the magnitude of the rotation angle θ, including the direction of rotation. In the case of using the second conversion method, it is preferable, for example, to present the user with an instruction such as "keep facing the front for a few seconds until the rotation angle is determined", and to have the user follow that instruction.
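  • A minimal sketch of such an angle detection unit 123E is shown below, assuming a hypothetical pose-regression model trained, as described above, on face images labeled with the angle from which they were captured; averaging over a short calibration period reflects the instruction given to the user.

```python
import numpy as np

def detect_rotation_angle(calibration_stills, pose_model) -> float:
    """Estimate the fixed rotation angle while the user keeps facing the front.

    pose_model.predict(still) is assumed to return the pitch (degrees) from
    which the face in the still image was captured; positive means from above.
    """
    pitches = [pose_model.predict(still) for still in calibration_stills]
    return -float(np.mean(pitches))   # rotate by the opposite angle to face the camera
```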
  • when the image processing unit 123 executes the third conversion method, the image processing unit 123 is configured as shown in FIG.
  • the third conversion method does not determine the rotation angle in advance, but also performs the process of determining the rotation angle, like the second conversion method.
  • the image processing unit 123 used for the third conversion method is similar to the image processing unit 123 used for the second conversion method: like it, the image processing unit 123 that executes the third conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • the image processing unit 123 that executes the third conversion method includes a rotation angle determination unit 123F instead of the angle detection unit 123E in the image processing unit 123 that executes the second conversion method.
  • the rotation angle determination unit 123F has a function of determining the rotation angle, like the angle detection unit 123E described above.
  • the angle detection unit 123E determines the rotation angle by performing a predetermined calculation based on the moving image data, whereas the rotation angle determination unit 123F determines the rotation angle by performing a predetermined calculation based on data other than the moving image data.
  • the data used by the rotation angle determination unit 123F to determine the rotation angle is parameter data input from the input device 102, parameter data input from a sensor (not shown), or both of them.
  • the parameters input from the input device 102 are, for example, information specifying the shape of the display 101 (for example, whether its aspect ratio is 3:4 or 9:16), information specifying the size of the display 101 (for example, how many inches it measures), information specifying where the actual position of the camera is (for example, directly above the display 101 at the center in its width direction, or at the upper right corner of the display 101), and information specifying the distance from the display 101 to the target face.
  • the sensor may measure any parameter useful for obtaining the relative positional relationship between the real position and the virtual position of the camera 210 and the relative positional relationship between the virtual position of the camera 210 and the target face. For example, a known or well-known distance measuring device may be used as the sensor, and the distance of the target face from the sensor may be measured as a parameter.
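  • As an illustration, the rotation angle determination unit 123F might derive the rotation angle from the parameters listed above as in the following sketch, which assumes the camera sits directly above the display and the virtual position X lies at the center of the display; the geometry is the same arctangent relationship as in the first conversion method.

```python
import math

def rotation_angle_from_parameters(display_height_cm: float,
                                   camera_offset_above_display_cm: float,
                                   face_to_display_cm: float) -> float:
    """Rotation angle (degrees), camera assumed directly above the display center."""
    # b: vertical distance from the virtual position X (display center) to the camera
    b = display_height_cm / 2 + camera_offset_above_display_cm
    # a: horizontal distance from the target face to the virtual position X
    a = face_to_display_cm
    return math.degrees(math.atan2(b, a))
```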
  • the data specifying the rotation angle determined by the rotation angle determination unit 123F is sent from the rotation angle determination unit 123F to the three-dimensional model rotation unit 123C.
  • the three-dimensional model rotation unit 123C rotates each three-dimensional model in the same angle and in the same direction with the rotation angle specified by the data, as in the case of the first conversion method.
  • the converted moving image data is output from the image processing unit 123 to the output unit 125.
  • data for executing the mode for determining the rotation angle can be input from the input device 102, and it is preferable to execute the mode for determining the rotation angle before, for example, the start data is input.
  • the fourth conversion method does not determine the rotation angle in advance, but also performs the processing of determining the rotation angle, like the second and third conversion methods.
  • the image processing unit 123 that executes the fourth conversion method includes the same functional blocks as the image processing unit 123 when executing the first conversion method.
  • the image processing unit 123 that executes the fourth conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • everything is the same as in the case of the first conversion method, except that the three-dimensional model rotation unit 123C used for the fourth conversion method does not record in advance the data specifying the rotation angle; instead, rotation angle change data for changing the rotation angle is input from the main control unit 122 to the three-dimensional model rotation unit 123C, and each time the three-dimensional model rotation unit 123C receives rotation angle change data, it changes, based on that data, the rotation angle by which it rotates the three-dimensional model of the target face. Even when the fourth conversion method is executed, the converted moving image data generated by the image processing unit 123 is sent to the output unit 125, as in the case where the first conversion method is executed.
  • This data is sent from the output unit 125 to the display 101. Then, on the display 101, a moving image based on the converted moving image data will be displayed, as will be described later.
  • This display is performed in substantially real time after the image is captured by the camera 210, preferably within 0.5 seconds.
  • the user inputs the rotation angle change data while looking at his or her own face (the target face) displayed on the display 101, rotating the target face little by little, for example, so that the target face displayed on the display 101 basically faces the front.
  • the rotation angle change data is input using the input device 102.
  • the rotation angle change data reaches the main control unit 122 in the same manner as other data input by the input device 102, and is sent from the main control unit 122 to the three-dimensional model rotation unit 123C.
  • although not limited to this, the directions in which the three-dimensional model is rotated may be only the vertical direction (around the X axis) and the horizontal direction (around the Y axis). The rotation angle change data for these can, of course, be input using the input device 102.
  • the angle by which the three-dimensional model rotation unit 123C is rotating the three-dimensional model at the moment the target face displayed on the display 101 comes to basically face the front is determined as the rotation angle by which the three-dimensional model rotation unit 123C thereafter rotates the three-dimensional model of the target face at a uniform angle.
  • thereafter, the converted moving image data is output from the image processing unit 123 to the output unit 125. Even when the fourth conversion method is used, data for executing the mode for determining the rotation angle can be input from the input device 102, and it is preferable to execute the mode for determining the rotation angle before, for example, the start data is input.
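  • The adjustment loop of the fourth conversion method can be sketched as follows; the key names, the step size, and the preview / input helpers are arbitrary illustrative assumptions.

```python
STEP_DEG = 1.0   # arbitrary increment per key press

def adjust_rotation_angle(preview, read_key) -> tuple[float, float]:
    """Let the user nudge the 3D model until the previewed face looks frontal."""
    pitch, yaw = 0.0, 0.0
    while True:
        preview.show(pitch, yaw)            # re-render the converted still image
        key = read_key()                    # rotation angle change data from input device 102
        if key == "up":
            pitch += STEP_DEG
        elif key == "down":
            pitch -= STEP_DEG
        elif key == "left":
            yaw -= STEP_DEG
        elif key == "right":
            yaw += STEP_DEG
        elif key == "enter":                # the angle is fixed and used from now on
            return pitch, yaw
```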
  • the output unit 125 receives the converted moving image data from the image processing unit 123 as described above.
  • the output unit 125 sends it to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism sends the converted moving image data to the computer device 100 specified by the above-mentioned designated data, that is, the computer device 100 included in the second communication system 10-2.
  • the transmission / reception mechanism in the computer device 100 included in the second communication system 10-2 receives the converted moving image data sent from the first communication system 10-1.
  • the converted moving image data is sent from the transmission / reception mechanism to the input unit 121 via the interface 114, and then sent from the input unit 121 to the main control unit 122.
  • the main control unit 122 sends this converted moving image data to the display 101 via the output unit 125 and the interface 114.
  • a moving image based on the converted moving image data sent from the first communication system 10-1 is displayed on the display 101 in the second communication system 10-2.
  • the face displayed on the display 101 basically faces the front as shown in FIG. As already mentioned several times, "basically" means the case where the user takes a natural posture.
  • FIG. 13A shows a state in which the user of the first communication system 10-1 faces downward from the horizontal direction by the angle φ.
  • in this state, a deviation of angle θ + angle φ occurs between the camera 210 and the front direction of the target face. Therefore, if no image processing were performed, the target face included in the moving image displayed on the display 101 of the second communication system 10-2 would appear as the target face shown in FIG. 13B viewed from the front.
  • in this embodiment, however, the target face is displayed on the display 101 after being rotated upward by the angle θ.
  • the target face included in the moving image displayed on the display 101 of the second communication system 10-2 is therefore the target face shown in FIG. 13C viewed from the front. That is, the target face of the user of the first communication system 10-1, facing downward from the horizontal direction by the angle φ, is displayed as such on the display 101 included in the second communication system 10-2. This is a natural state and does not give a feeling of strangeness to the user of the second communication system 10-2.
  • the video conference system according to the modified example includes a first communication system 10-1 and a second communication system 10-2, like the video conference system of the first embodiment.
  • Both communication systems 10 include a computer device 100, a display 101, and a camera 210.
  • the computer device 100 in both communication systems 10 in the first embodiment has a function of converting moving image data into converted moving image data, but the computer device 100 in both communication systems 10 in the modified example does not have that function. That is, the computer device 100 in both communication systems 10 in the modification is not the image processing device of the present invention.
  • the computer device 100 in both communication systems 10 in the modification basically has only the same functions as those in the conventional video conference system except for the data exchange with the conversion server described later.
  • instead, the conversion server 20-1 and the conversion server 20-2 have the function, which the image processing apparatus according to the present invention should perform, of converting moving image data into converted moving image data. That is, the conversion server 20-1 and the conversion server 20-2 in the modified example can be said to use cloud computing technology to provide the first communication system 10-1 and the second communication system 10-2 with a function of converting moving image data into converted moving image data.
  • the video conference system includes a first communication system 10-1, a second communication system 10-2, a conversion server 20-1, and a conversion server 20-2.
  • the first communication system 10-1, the second communication system 10-2, the conversion server 20-1, and the conversion server 20-2 are all connectable to the network 400.
  • the computer device 100 in the first communication system 10-1 is adapted to receive the moving image data from the camera 210 in the actual position.
  • the moving image data is sent from the computer device 100 in the first communication system 10-1 to the conversion server 20-1.
  • the conversion server 20-1 converts the received moving image data into converted moving image data.
  • the conversion server 20-1 returns the converted moving image data to the computer device 100 in the first communication system 10-1.
  • the converted moving image data is sent from the computer device 100 of the first communication system 10-1 to the computer device 100 of the second communication system 10-2, as in the case of the first embodiment.
  • the converted moving image data generated by the conversion server 20-1 is directly sent to the computer device 100 in the second communication system 10-2 without being sent to the computer device 100 in the first communication system 10-1. It may be sent.
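  • The round trip through the conversion server can be sketched as follows; the endpoint URL and the HTTP transport are assumptions introduced for illustration, since the modification only specifies that moving image data goes to the conversion server and converted moving image data comes back (or is forwarded directly to the other party).

```python
import requests  # third-party HTTP client, used here purely for illustration

CONVERT_URL = "https://conversion-server.example/convert"  # hypothetical endpoint

def convert_via_server(moving_image_bytes: bytes, forward_to_peer: bool = False) -> bytes:
    """Send moving image data to the conversion server; get converted data back."""
    response = requests.post(
        CONVERT_URL,
        data=moving_image_bytes,
        params={"forward": "1" if forward_to_peer else "0"},
        timeout=10,
    )
    response.raise_for_status()
    return response.content   # converted moving image data (empty if forwarded)
```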
  • the hardware configuration of the conversion server 20-1 for realizing the above-described functions may be basically the same as the hardware configuration of the computer device 100 according to the first embodiment, and the functional blocks generated in it may be the same as the functional blocks in the computer device 100 according to the first embodiment.
  • in the first embodiment, the computer device 100 receives the moving image data from the camera 210, the moving image data reaching the input unit 121 in the order of the camera 210, the interface 114, and the input unit 121. In contrast, the conversion server 20-1 in the modification receives the moving image data from the computer device 100 in the first communication system 10-1 via the network 400, the moving image data being received by its transmission / reception mechanism.
  • likewise, in the first embodiment the computer device 100 receives the input from the input device 102 via the interface 114, whereas the conversion server 20-1 in the modified example receives the input made on the input device 102 from the computer device 100 in the first communication system 10-1 via the network 400.
  • in the first embodiment, the converted moving image data generated by the image processing unit 123 is sent to the second communication system 10-2 via the output unit 125, the interface 114, and the transmission / reception mechanism. In the modification, the converted moving image data generated by the image processing unit 123 is instead returned to the first communication system 10-1 via the output unit 125, the interface 114, and the transmission / reception mechanism.
  • the conversion server 20-1 may send the converted moving image data to the second communication system 10-2.
  • the conversion server 20-2 has the same configuration and function as the conversion server 20-1, and has the same function as the conversion server 20-1 provides to the computer device 100 in the first communication system 10-1. Are provided to the computer device 100 in the second communication system 10-2.
  • the first communication system 10-1 and the second communication system 10-2 can send the converted moving image data to each other, as in the case of the first embodiment.
  • one conversion server may provide both communication systems 10 with a function of converting moving image data into converted moving image data.
  • the appearance of the image processing apparatus in the second embodiment is like a webcam.
  • the image processing apparatus according to the second embodiment has the appearance as shown in FIG. 2, FIG. 8, FIG.
  • the image processing apparatus according to the second embodiment can be used by being connected to a computer device that constitutes a conventional video conference system.
  • a computer device has a function of transmitting / receiving moving image data to / from another computer device, and may be publicly known or well known.
  • the image processing apparatus according to the second embodiment is integrated with a camera; it includes hardware identical to the hardware configuration of the computer device 100 according to the first embodiment, and the same computer program as that described in the first embodiment is installed in that hardware.
  • the hardware configuration of the image processing apparatus according to the second embodiment corresponds to the configuration of FIG. with a camera connected to the interface 114; the image processing apparatus according to the invention of the present application has that configuration without the camera.
  • the image processing apparatus according to the second embodiment has a function of converting moving image data generated by a camera integrated with the image processing apparatus into converted moving image data.
  • the image processing apparatus according to the second embodiment can be used in the same manner as a normal webcam.
  • however, the data output by this image processing apparatus is not general moving image data but converted moving image data. Therefore, the computer devices in both communication systems can send the converted moving image data to each other without themselves having the function of converting moving image data into converted moving image data as in the first embodiment.

Abstract

The present invention provides technology with which it is possible to reduce discomfort felt in relation to the direction of a face or the line of sight in a moving image shown on a display. When performing a teleconference using a teleconference system, a subject face in a moving image captured by a typical web camera tends, for example, to be an image captured from slightly above, with neither the line of sight nor the direction of the face basically facing the front. This image processing device generates a three-dimensional model of the face included in a still image of the moving image. The image processing device then turns the three-dimensional model, which faces downward by an angle θ, upward by the angle θ. Next, the image processing device generates data of a two-dimensional image again from the three-dimensional model after it has been turned. Thus, the subject face in the moving image basically faces the front.

Description

Image processing apparatus, method, and computer program
The present invention relates to an image processing technique that can be applied to, for example, a video conference.
It has been a long time since networks such as the Internet became widespread, and in recent years the speed of network communication has increased remarkably. Along with this, it has become easy to send and receive moving images between remote places, and video conferencing (video calls) between remote places has become extremely familiar.
Video conferencing may be realized using an expensive dedicated device (dedicated system), or using a simple general-purpose device (system) together with software for transmitting and receiving video, such as Skype (trademark) provided by Microsoft (trademark) Corporation.
Whether it is realized by a dedicated device or a general-purpose device, the general principle of the video conference remains unchanged. For example, in a one-to-one videoconference, both participants prepare a computer connected to the network. A display and a camera are connected to each of these computers. The camera is a digital camera capable of capturing moving images, and captures the participants of the video conference. The moving image data of the moving image in which the face of one participant is reflected by one camera is sent to the other computer via the one computer and the network. As a result, a moving image in which the face of one participant is reflected is displayed on the other display connected to the other computer. The other participant can thereby see the face of one participant. By performing such processing bidirectionally, both participants can hold a conference while looking at the other party's face.
Of course, voice and text can also be exchanged between the two computers (or both participants), and at least one of them is usually required; however, since the exchange of voice and text is unrelated to the present application, its description is basically omitted hereafter.
Patent Document 1: Japanese Patent Laid-Open No. 2018-056907
Patent Document 2: International Publication No. 2016/158014
Patent Document 3: Japanese Patent Laid-Open No. 2016-085579
Patent Document 4: Japanese Patent Laid-Open No. H6-90445
There is a well-known problem in the video conference performed as described above.
As described above, when a video conference is held, moving image data of a moving image in which the face of one participant is captured by one camera is sent to the other computer via the one computer and the network, whereby a moving image showing the face of the one participant is displayed on the other display connected to the other computer.
The other participant holds the video conference while looking at the face of the one participant displayed on the other display. At that time, the line of sight of the one participant shown on the other display does not face the direction of the other participant, and in some cases not only the line of sight but also the direction of the one participant's face does not face the direction of the other participant. Such a situation gives a strong sense of discomfort to the other participant. As a result, both participants of the video conference end up holding the conference with that discomfort.
Such a problem occurs because there is a problem in the above-described moving image data created by one participant or in the position of one camera that creates moving image data by one participant. For example, assume that one display is in front of one participant's face. In that case, the face of one participant is basically facing the other display (in other words, when one participant has a natural posture). In this case, for example, one camera is arranged above the center of one display in the width direction. Then, one of the cameras basically captures the face of one participant facing the one display, basically from obliquely above. In such a case, the line of sight or face of one participant displayed in the moving image displayed on the other display in front of the other participant receiving the moving image data generated by one camera is It seems that the other participant does not face the direction of the other participant and is looking downward.
The phenomenon described above, in which one participant's line of sight or face is projected in the moving image displayed on the other display in front of the other participant, is one It occurs not only when it is above the widthwise center of the display, but also where one camera is anywhere around one display. However, depending on the position where one of the cameras is arranged, the line of sight or face of one participant displayed in the moving image displayed on the other display in front of the other participant is different.
Since the problems described above are widely known, several methods for solving them have already been proposed.
For example, there is a known technique in which at least a part of the display is made of a transparent member and the camera is placed inside or behind the display, so that the face of the participant in front of the display is captured essentially from the front. However, modifying a display in this way is very costly, so this technique has hardly spread at all; moreover, it cannot be retrofitted to the ordinary displays already on the market.
There is also a known technique in which, when the face captured in the moving image data created by a camera placed around the display is displaced from, for example, the center of the moving image, the amount of displacement is detected and the moving image data is corrected so that the face shown in the moving image based on that data is translated vertically or horizontally. However, even if the face shown in the moving image is translated vertically or horizontally, the orientation of the face is not corrected. Furthermore, because such a technique continuously detects the displacement and continuously translates the face shown in the moving image, the computation required for the image processing tends to become complicated and the moving image tends to be delayed.
There is also a known technique in which the direction of the line of sight is further detected within the face captured in the moving image data created by a camera placed around the display, and the moving image data is corrected so as to fix the direction of the eyes in the face shown in the moving image. Correcting the direction of the eyes may in some cases make it possible to match the line of sight of the one participant in the moving image shown on the display in front of the other participant with the other participant's own line of sight. In the example above, however, the eyes in the downward-facing face of the one participant shown on the other display would essentially be in an upturned, glancing-up state, which can actually increase the unnaturalness. In addition, whether or not the orientation of the whole face is also corrected, once the image is corrected on the basis of the direction of the line of sight, merely moving the eyes can change the orientation of the one participant's face in the moving image shown on the display in front of the other participant even though the one participant has not turned their face, again increasing the unnaturalness. This technique, too, tends to delay the moving image, for the same reason as above.
There is also a known technique, described in the above-mentioned Patent Document 4, in which a three-dimensional model of the face captured in the moving image data created by a camera is generated, the generated three-dimensional model is rotated by a predetermined angle, and a two-dimensional image is then obtained again. However, generating a three-dimensional model of a face from moving image data generally requires either so-called stereo imaging with two cameras or, if a single camera is used, that the many still images making up the moving image data captured by that camera include depth data. Neither kind of camera is common, and a technique that forces users to prepare such uncommon hardware is extremely difficult to spread. Modern laptop personal computers and computers such as smartphones and tablets have built-in cameras, and web cameras and other cameras used in combination with desktop personal computers are also in wide use; none of these are stereo cameras, and none can create moving image data containing depth data. A technique that cannot be applied to these widespread cameras is, at the least, unsuited to practical use and wide adoption.
An object of the invention of the present application is to provide an inexpensive technique, usable mainly in combination with the ordinary cameras found in video conference systems and unlikely to cause delay, that can reduce the discomfort felt about the orientation of, or the line of sight of, a face shown in a moving image displayed on the display in front of the viewer.
To solve the problems described above, the inventor of the present application carried out repeated research and obtained the following findings.
As described above, the reason the two remotely located participants in a video conference feel discomfort about the line of sight or the orientation of the other participant's face in the moving image shown on the display in front of them is that there is a problem with the moving image data created on the one participant's side described above, or with the position of the one camera that creates that moving image data.
Incidentally, if, in the example above, the whole of the one display in front of the one participant were transparent and the one camera were located behind it, the one camera would capture, essentially from the front, the face of the one participant, who looks at the one display from the front during the video conference. If that were so, the line of sight, or the face, of the one participant shown in the moving image displayed on the other display in front of the other participant who receives the moving image data generated by the one camera would point toward the other participant. In practice, however, a display is usually not transparent in its entirety, so the one camera ends up being placed somewhere around the one display.
Nevertheless, it is at least theoretically possible to correct the moving image data created by the one camera so that the moving image based on that data looks as if it had been captured by a virtual camera located at a virtual position behind the display (including inside the display; the same applies hereinafter). Since the face image of the one participant's face contained in a moving image based on moving image data corrected in this way basically faces the front, the discomfort given to the other participant watching the other display can be kept small.
The invention of the present application is based on these findings.
The invention of the present application is an image processing device comprising: a moving image data reception unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by capturing a target face, which is the face of one imaged person, with one predetermined camera that can capture moving images and exists at a real position, which is a predetermined position; a converted moving image data generation unit that generates converted moving image data, which is data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of a converted still image, that is, the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face facing the front; and a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit.
The converted moving image data generation unit of this image processing device comprises: a three-dimensional model generation unit that generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data, using a conversion algorithm, obtained by machine learning on a large number of faces, that estimates a three-dimensional model of a face; a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
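Purely as an illustration of the flow through these three units, a sketch in Python follows. The helper functions estimate_face_mesh and render_mesh are hypothetical placeholders for the machine-learned conversion algorithm and for the re-projection to two dimensions; neither name comes from this document or from any particular library.
import numpy as np
def convert_frame(frame, rotation, estimate_face_mesh, render_mesh):
    # frame: one still image (H x W x 3 array) from the moving image data.
    # rotation: the fixed 3 x 3 rotation matrix for the rotation angle,
    # derived once from the camera's real position and virtual position.
    vertices, triangles, texture = estimate_face_mesh(frame)  # 3D model generation unit
    rotated = vertices @ rotation.T                           # 3D model rotation unit
    return render_mesh(rotated, triangles, texture)           # 2D image generation unit
def convert_stream(frames, rotation, estimate_face_mesh, render_mesh):
    # Converted moving image data: the succession of converted still images.
    for frame in frames:
        yield convert_frame(frame, rotation, estimate_face_mesh, render_mesh)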
The present invention uses a single camera. The camera in the present invention is, moreover, an ordinary one: the still image data making up the moving image data contain no depth data. The camera may be integrated with the image processing device or may be a separate body. For example, when the image processing device is configured from a computer as described in the background art (for example, a desktop computer without a camera), the camera is separate from the image processing device. The camera in that case may be, for example, a known or well-known web camera itself. A camera of this kind, separate from the computer serving as the image processing device, is connected to that computer by wire or wirelessly. On the other hand, many known or well-known laptop personal computers and computers such as smartphones and tablets have an integrated camera. When the image processing device is configured from such a camera-integrated computer, the camera is included in the image processing device; strictly speaking, in that case the part of the computer excluding the camera is the image processing device referred to in the present invention. It is also possible to mount the image processing device of the present invention on a conventional web camera, in which case the part of the web camera other than the camera proper is the image processing device referred to in the present invention.
The camera exists at a real position, which is a predetermined position. If, for example, a display is connected to the computer serving as the image processing device, the real position is generally a predetermined place around the display. When the computer serving as the image processing device is, for example, a laptop personal computer, a smartphone, or a tablet, the camera is generally mounted at a predetermined position above the display integrated with the computer, and that position is the camera's real position. If the image processing device of the present invention has the appearance of a web camera, the position where it is mounted is the camera's real position. In any case, the camera at the real position captures the target face, which is the face of one imaged person. The camera can capture moving images and generates moving image data of those moving images. The moving image data generated by the camera is of an ordinary kind, for example MJPEG data. The moving image data in the present invention is data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, which is entirely ordinary moving image data.
The image processing device includes a moving image data reception unit that receives, from the camera, the moving image data the camera generates. When the image processing device and the camera are separate and connected by wire, the moving image data reception unit will generally be an input terminal, provided on the image processing device, that realizes the wired connection with the camera. When they are separate and connected wirelessly, it will generally be a receiver, provided on the image processing device, that realizes wireless communication with the camera. When the image processing device and the camera are integrated, it will generally be an interface, provided inside the image processing device, that realizes the connection with the camera.
The image processing device of the present invention includes a converted moving image data generation unit. The converted moving image data generation unit converts each of at least a plurality of the still image data included in the moving image data into converted still image data. As described above, the moving image data the image processing device receives from the camera, and the still image data included in it, are generated by the camera at the real position, so the moving images or still images based on them contain the target face as photographed from the real position. The converted still image data, by contrast, is generated from, or by converting, the still image data, and is data of a converted still image: the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (when the user takes a natural posture). In other words, the target face contained in the converted still image is the target face as it would be photographed from the virtual position in front of the user's face. Here, the camera's virtual position is fixed and the relative positional relationship between the camera's real position and virtual position is constant, so the processing by which the image processing device converts the still image data received from the camera into converted still image data is basically the same for every still image datum subject to the conversion. The conversion of still image data into converted still image data is therefore "lighter" than if different processing were performed for each image, and the conversion is unlikely to cause the moving image to be delayed. The converted moving image data is then the succession of converted still image data generated one after another by the converted moving image data generation unit.
The still image data is the data of the still images (so-called frames) that make up a moving image. The image processing device may generate converted still image data from every still image datum it receives from the camera, but doing so may delay the moving image. If importance is placed on avoiding delay, therefore, the still image data subjected to conversion can be, for example, every second or every third still image datum included in the moving image data (every two or three frames). The frame count, or frame rate, of the converted moving image data (the number of converted still image data contained in the converted moving image data per second) then becomes smaller than the frame count of the moving image data (the number of still image data contained in the moving image data per second), but as long as the converted moving image data has a frame rate of at least about 10 fps, the moving image based on it still passes as a moving image. Of course, the still image data subjected to conversion need not be spaced at a fixed interval such as every second or every third image.
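A minimal sketch of the frame thinning just described, assuming a 30 fps input and conversion of every third frame (the figures are examples consistent with the 10 fps lower bound mentioned above):
def thin_frames(frames, keep_every=3):
    # Pass only every keep_every-th still image on to the conversion step.
    # A 30 fps input with keep_every=3 yields 10 fps of converted frames.
    for index, frame in enumerate(frames):
        if index % keep_every == 0:
            yield frame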
The image processing device further includes a moving image data output unit. The moving image data output unit has the function of outputting the converted moving image data generated by the converted moving image data generation unit. The converted moving image data is output, for example, from the image processing device to another device. The other device serving as the output destination may be a device directly connected to the image processing device by wire or wirelessly (for example, a display), or a device connected to the image processing device via a network (for example, a display connected to another image processing device). If the image processing device itself includes a display, the output destination of the converted moving image data may be that display. Furthermore, if the image processing device is integrated with the camera and has the appearance of an ordinary web camera, then using that web camera in the same way as the web camera of a conventional video conference system allows the moving image data input to the computer of the video conference system to be the converted moving image data from the outset.
As described above, the target face in each converted still image based on the converted still image data contained in the converted moving image data generated in this way has the same orientation as the target face imaged by a camera at the virtual position in front of the target face. Accordingly, when a moving image based on the converted moving image data generated by the image processing device described above is shown on some display, the target face shown on that display basically faces the front. The use of the image processing device of the present application is not limited to video conferencing, but when it is applied to a video conference, for example, it can reduce the discomfort the other party feels about the line of sight in, or the orientation of, the target face when a moving image based on the converted moving image data is shown on the other party's display. Moreover, this technique requires no special contrivance in hardware such as the camera or display and can be realized, for example, merely by combining an ordinary computer with software, so it is comparatively inexpensive. And because, as described above, it repeats uniform image processing and need not necessarily process every still image datum, the problem of moving image delay is unlikely to arise.
In addition, when the target face is displayed on a display, the target face shown in the moving image based on the converted moving image data obtained by this invention faces the front, line of sight included, whenever the owner of the target face takes a natural posture; if the owner turns the target face or moves their eyes, the target face shown on the display turns, or its eyes move, accordingly. Because the present invention merely displays the converted moving image, that is, the moving image that would be captured were the camera at the virtual position, no unnaturalness arises in the target face shown on the display, in contrast to methods that transform the image of the target face according to movements of the face or of the line of sight.
The image processing devices may be made communicable via a predetermined network and used as a pair, with the converted moving image data generated by each of the two image processing devices sent bidirectionally over the network to the other. This makes it possible to realize a video conference in the conventional manner.
Note that the use of the image processing device of the present invention is not limited to video conference systems. For example, there is also a known problem that when you watch a moving image of your own face captured as a selfie on the display of your own smartphone, tablet, or desktop or laptop computer, your face, or your line of sight, does not point toward the front, which feels strange. Such a problem can also be solved by the image processing device of the present invention. In this case, naturally, the converted moving image data created from the moving image data by the image processing device need not be sent to anyone else's computer.
As described above, the image processing device of the present invention also includes the converted moving image data generation unit, and as also described above, that converted moving image data generation unit comprises: a three-dimensional model generation unit that generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data, using a conversion algorithm, obtained by machine learning on a large number of faces, that estimates a three-dimensional model of a face; a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
The three-dimensional model generation unit generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data. The three-dimensional model is generated using a conversion algorithm that estimates a three-dimensional model of a face and was obtained by machine learning on a large number of faces. In recent years, a technique has been developed that automatically creates, from a single ordinary two-dimensional still image in which a face appears (in other words, from the data of one facial photograph), a three-dimensional model of the facial part of the face in that still image. In this technique, a conversion algorithm, that is, an algorithm for generating a three-dimensional model of a human face from a still image, is produced by having a computer machine-learn from a large number of sample two-dimensional still images of human faces captured from various angles. Using that conversion algorithm, the three-dimensional model of the facial part of the target face captured in the still image specified by the still image data is generated automatically. Here, the facial part means, roughly, the part of the human head forward of the ears and below the forehead.
The recently developed technique described above, which automatically creates a three-dimensional model of the facial part of a face from a single ordinary two-dimensional still image, has been recognized by the public as an intriguing technology. Interesting though it is recognized to be, however, it has so far had almost no practical use; the present invention proposes a practical use for it. The conversion algorithm described above generates a three-dimensional model of at least the facial part of the target face, but the source two-dimensional still image used to generate the three-dimensional model need be neither data captured by a stereo camera nor data containing depth data. In other words, the camera used in combination with the image processing device of the present invention may be an entirely ordinary one.
The three-dimensional model may be anything created by the method described above and is, for example, a wireframe model. The three-dimensional model generation unit generates a three-dimensional model based on at least a plurality of the still image data making up the moving image data; this "at least a plurality of still image data" is the still image data subject to the conversion described above.
The three-dimensional model rotation unit performs the process of rotating each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle. This corresponds to the process of turning the orientation of the face specified by the three-dimensional model toward the camera at the virtual position.
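For illustration, such a fixed rotation can be precomputed once and reused for every three-dimensional model. The sketch below assumes the common case of a camera above the display, so the model is tilted back about the horizontal (X) axis; the 15-degree figure is only an example consistent with the range mentioned later:
import numpy as np
def pitch_rotation(degrees):
    # 3 x 3 matrix for a rotation about the X axis (the horizontal axis).
    a = np.radians(degrees)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a), np.cos(a)]])
R = pitch_rotation(15.0)  # computed once; the same R is applied to every model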
The two-dimensional image generation unit generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. That is, the two-dimensional image generation unit generates the converted still image data of the converted still image by creating data of a two-dimensional still image again from the three-dimensional model.
Because the relative positional relationship between the camera's real position and virtual position is constant, the angle by which the three-dimensional model rotation unit rotates the three-dimensional model (including, of course, the direction of rotation) is constant no matter which still image datum the processing is based on. The processing performed by the three-dimensional model generation unit, the three-dimensional model rotation unit, and the two-dimensional image generation unit on each still image datum subject to image processing is therefore the same whichever still image datum it is based on. This, too, is one of the reasons the problem of moving image delay is unlikely to arise.
The three-dimensional model generation unit may be configured to extract the facial part of the target face captured in the still image specified by the still image data and generate the three-dimensional model from it, and also to generate background image data, which is data of a two-dimensional still image of the part of the still image other than the facial part of the target face; the two-dimensional image generation unit may then generate the converted still image data by pasting facial image data, which is data obtained by flattening the three-dimensional model rotated by the three-dimensional model rotation unit into two dimensions, onto the facial part of the target face in the background image data.
This means that, of the still image specified by the still image data from which the converted still image data is generated, only the data of the facial part of the target face is treated three-dimensionally, while everything other than the facial part is left two-dimensional as it is. That is, the three-dimensional model generation unit recognizes the facial part of the target face in the still image, extracts that part, and generates the three-dimensional model, leaving the other parts (for example, the ears and hair of the target face, or the background behind the owner of the target face) as a two-dimensional still image. The three-dimensional model rotation unit then rotates the three-dimensional model, the two-dimensional image generation unit converts the rotated three-dimensional model into a two-dimensional image, and that image is pasted into the region of the still image from which the facial part of the target face was extracted. Generating the converted still image data by such simple processing makes the problem of moving image delay even less likely to arise. Admittedly, when such processing is performed, the two-dimensional still image of the face generated by the two-dimensional image generation unit from the rotated three-dimensional model does not necessarily exactly match the still image from which the facial part of the target face was extracted, which suggests that some slight unnaturalness may arise in the target face contained in the still image specified by the converted still image data. According to the research of the inventor of the present application, however, the discomfort felt by someone watching a moving image based on converted moving image data made up of such converted still image data was far smaller than when the target face in the moving image points in the wrong direction. Although the mechanism is not known in detail, this is thought to be because, when a person recognizes a face, the brain recognizes it centering on the eyes of the person being recognized, and as long as the eyes point correctly at the viewer, other unnaturalness goes unnoticed. Thanks to this function of the brain, the effect of the present invention is sufficient even when the converted still images are generated in the manner described above. At least when the rotation angle of the target face is around 15 degrees or less, the discomfort felt by someone watching a moving image based on the converted moving image data is small enough to pose no practical problem.
That said, the three-dimensional model generation unit may be configured to perform predetermined two-dimensional image processing on the still image of the part other than the facial part of the target face before generating the background image data of that still image, so that, when the two-dimensional image generation unit pastes the facial image data onto the facial part of the target face in the background image data, the edges of the facial image data and of the facial part of the target face match more closely. Two-dimensional image processing here means image processing that does not involve three-dimensional modeling of the subject in the still image. For example, when the three-dimensional model of the facial part of the target face is rotated, its apparent length in, say, the vertical direction may change. To match such an apparent change in length, the three-dimensional model generation unit can change (enlarge or reduce) the vertical length of the still image of the part other than the facial part of the target face. Besides scaling the image in one direction as just described, examples of two-dimensional image processing include scaling in two directions, rotation, and so on. This makes it possible to further reduce the above-mentioned unnaturalness, barely noticed by the brain, that can arise in the target face in the converted still image. It is, however, not essential to apply such processing to the still image of the part other than the facial part of the target face.
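A minimal sketch of the paste-in step under stated assumptions: the renderer returns, besides the two-dimensional image of the rotated facial part, a mask marking the pixels it covers, and the optional two-dimensional processing of the background is a simple vertical resize done with OpenCV:
import cv2
import numpy as np
def composite(face_render, face_mask, background, v_scale=1.0):
    # Paste the flattened image of the rotated facial part onto the
    # background image data (the still image minus the facial part).
    if v_scale != 1.0:
        # Optional 2D processing so the edges of the pasted facial part
        # and the background match more closely.
        h, w = background.shape[:2]
        resized = cv2.resize(background, (w, int(round(h * v_scale))))
        if resized.shape[0] >= h:
            background = resized[:h]
        else:
            background = cv2.copyMakeBorder(resized, 0, h - resized.shape[0],
                                            0, 0, cv2.BORDER_REPLICATE)
    out = background.copy()
    out[face_mask > 0] = face_render[face_mask > 0]
    return out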
The three-dimensional model rotation unit may rotate the three-dimensional model about a predetermined point. As described above, the three-dimensional model rotation unit rotates the three-dimensional model. One possible process for rotating the three-dimensional model is to rotate it about some axis (for example, a horizontal straight line passing through both ears, a straight line passing vertically through the center of the skull in plan view, or both); such processes are, in effect, roll, yaw, and pitch rotations. To perform roll, yaw, and pitch rotations, however, the three rotation axes and the origin at which they intersect must be found, which requires processing that detects the ears, or the center of the skull in plan view, within the three-dimensional model and specifies their coordinates. By instead rotating the three-dimensional model about some point in the virtual space in which the model exists (a virtual point, which may or may not lie inside the model), the model can be treated as a mere solid mass having the three-dimensional shape of a face, and such processing on the three-dimensional model, or on the target face in the still image, can be omitted; that is, there is no longer any need to detect which part of the three-dimensional model or still image is an eye and which is a nose. Rotation of the three-dimensional model about such a point can be executed by a transformation of spatial coordinates, and can also be regarded as a rotation of the space itself in which the model exists. The predetermined point can be, for example, the lens position of the single camera. Whether or not the camera is integrated with the image processing device, if the position of the camera relative to the image processing device is fixed, taking the camera's lens position as the predetermined point makes the position of that point easy to determine. Whether or not the predetermined point is the camera's lens position, taking it as the origin of the virtual space in which the three-dimensional model exists simplifies the spatial-coordinate computation.
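The coordinate transformation described here can be sketched in a few lines; the model is handled as a bare N x 3 array of vertex coordinates, with no detection of eyes, ears, or nose:
import numpy as np
def rotate_about_point(vertices, R, c):
    # p' = R (p - c) + c : shift the predetermined point c to the origin,
    # rotate by the fixed matrix R, and shift back.
    return (vertices - c) @ R.T + c
# If the predetermined point (for example, the camera lens position) is taken
# as the origin of the virtual space, c is the zero vector and the transform
# reduces to an ordinary rotation of the space.
c = np.zeros(3)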
As described above, the three-dimensional model rotation unit of the image processing device of the present invention rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle. The fixed rotation angle by which the three-dimensional model should be rotated can be determined as follows.
First, the rotation angle may be determined in advance. In that case the rotation angle is recorded in the image processing device. The rotation angle is determined by the relative positional relationship between the camera's real position and virtual position. When the image processing device is, for example, a laptop personal computer, a smartphone, or a tablet and the camera is, for example, fixed to its housing, the camera's real position is fixed relative to the image processing device. In this case, if the camera's virtual position is set at a suitable position such as behind the display of the laptop personal computer, smartphone, or tablet, the camera's real position and virtual position can be determined uniquely. When the specifications of the equipment making up the image processing device are clear from the outset in this way, the rotation angle can be determined in advance by also taking into account how far users ordinarily hold the display from their faces when using the laptop personal computer, smartphone, or tablet serving as the image processing device. For example, a computer program for making a computer such as a laptop personal computer, smartphone, or tablet function as the image processing device of the present invention may hold data on the camera's virtual position for each of the wide variety of such computers (or data specifying the above rotation angle, which can be derived from the relationship between the real position and the virtual position), that is, a large set of data pairing each model with its camera's virtual position, and may have a function whereby, after the computer program is installed on a computer, that computer's model is identified automatically by the program, or the user can enter input identifying the model. In that way, when the computer program makes the computer function as the image processing device of the present invention, the rotation angle appropriate to that image processing device can be determined automatically from the relationship between the model and the virtual position.
On the other hand, even when the image processing device is configured from, for example, a desktop computer, or is integrated with the camera and has the appearance of a web camera, if the position at which the camera is placed (the camera's real position) is determined at least to some extent, the relative positional relationship between the camera's real position and a virtual position set, for example, behind the display is determined uniquely. For example, if it is known in advance that the camera's real position will be directly above the center of the display in the width direction and that the camera will be used in that position, the relative relationship between the camera's real position and virtual position is determined uniquely. If, in this case, one further considers how far users hold their faces from the display when using the image processing device (a distance that is often predictable from the size of the display), the rotation angle can be determined in advance. That said, adopting a means such as informing the user of an instruction like "place the camera a certain number of centimeters above the center of the display in its vertical and width directions, and use this image processing device with the target face a certain number of centimeters from the camera's virtual position", and determining the rotation angle in advance with that position as the virtual position, will more reliably achieve the effect that the target face in the moving image based on the converted moving image data generated by the image processing device faces squarely forward.
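As a purely illustrative calculation (the centimeter figures below are assumptions, not values taken from this document): if the camera's real position is a known height above the virtual position set behind the display, and the target face is used at a known distance, the pitch component of the rotation angle follows from simple trigonometry:
import math
def rotation_angle_deg(camera_offset_cm, viewing_distance_cm):
    # Angle between the direction from the target face to the real camera
    # position and the direction to the virtual position straight ahead.
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))
# Assumed example: camera 12 cm above the virtual position, face 50 cm
# from the display -> about 13.5 degrees of downward tilt to undo.
print(rotation_angle_deg(12.0, 50.0))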
The rotation angle thus need not be determined in advance; it may instead be determined by the image processing device at the time the device is used. For example, the image processing device may determine the rotation angle before it starts generating the converted moving image data.
The image processing device may, for example, determine the rotation angle by performing a predetermined computation based on the moving image data received by the moving image data reception unit. The image processing device receives moving image data from the camera and, with its three-dimensional model generation unit, can generate a three-dimensional model from that data. It can therefore determine by computation how far the three-dimensional model must be rotated for the target face of a user squarely facing the camera at the virtual position to face the front in the still image based on the converted still image data; taking that angle as the rotation angle is one aspect of this invention.
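One conceivable way to carry out such a computation, offered as a sketch rather than as the procedure this document prescribes: assuming the conversion algorithm returns mesh vertices in a fixed canonical ordering, the rigid rotation between the observed mesh and a frontal reference mesh can be estimated with the Kabsch (SVD) method, and the rotation angle then chosen to undo it:
import numpy as np
def estimate_rotation(observed, reference):
    # observed, reference: N x 3 vertex arrays in the same vertex ordering.
    # Returns the rotation matrix best mapping observed onto reference.
    p = observed - observed.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T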
The image processing device may also include an input device reception unit for receiving, from an input device used to enter predetermined parameters needed to determine the rotation angle, data about those parameters, and may determine the rotation angle by performing a predetermined computation based on the parameter data received by the input device reception unit. A computer of the kind that typically constitutes the image processing device usually has an input device (for example, a keyboard, mouse, or touch panel) connected to it or built into it, so the parameters can be entered with that input device; determining the rotation angle by computation from parameters entered in this way is another aspect of this invention. The parameters are, for example, information specifying the shape and size of the display, information specifying where the camera's real position is (for example, directly above the display at the center of its width direction, or at the display's upper-right corner), and information specifying the distance from the display to the target face.
The image processing device may also include a sensor reception unit that receives, from a sensor that detects predetermined parameters needed to determine the rotation angle, data about those parameters, and may determine the rotation angle by performing a predetermined computation based on the parameter data received by the sensor reception unit. For example, the sensor is a known or well-known distance measuring device connected to the image processing device and provided at one end of the display in the width direction; determining an appropriate rotation angle using a parameter obtained by the distance measuring device (for example, the distance from the display to the target face) is another aspect of this invention. The parameters the sensor measures are not limited to distance: the sensor may measure any parameter useful for finding the relative positional relationship between the camera's real position and virtual position, or the relationship between the camera's virtual position and the target face.
The moving image data output unit of the image processing device may be connected to a predetermined display that shows the moving image based on the converted moving image data. The image processing device in this case includes a rotation angle change data reception unit that receives rotation angle change data, which is data for changing the rotation angle, and each time the rotation angle change data reception unit receives rotation angle change data, the three-dimensional model rotation unit may change the rotation angle by which it rotates the three-dimensional model, based on the rotation angle change data received. In this case the moving image based on the converted moving image data is shown on the display in substantially real time. The user can enter rotation angle change data while watching their own face (the target face) on the display, rotating the target face little by little, for example, and thereby adjust the target face shown on the display until it basically faces the front. The angle through which the three-dimensional model has been rotated when the target face shown on the display basically faces the front is determined as the rotation angle. The rotation directions of the three-dimensional model are not limited to, but may be only, the vertical direction (about the X axis) and the horizontal direction (about the Y axis). The user can enter the rotation angle change data using an input device of the kind described above.
Note that, of course, the above-mentioned four ideas for determining the rotation angle when the rotation angle is not determined in advance can be used in combination as required.
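As a rough illustration of the parameter-based calculation (the second and third techniques above), the following is a minimal sketch assuming a simplified geometry in which the camera sits a known height above the user's eye line and the face is a known distance away; the function name and the reduction to a single vertical offset are illustrative assumptions, not part of the patent.
```python
import math

def rotation_angle_deg(camera_offset_above_eyes_m: float,
                       face_to_camera_distance_m: float) -> float:
    """Angle (degrees, about the X axis) through which the 3D face model
    would be rotated so that a face looking at the display appears to
    look into a camera mounted above the display.

    camera_offset_above_eyes_m: vertical distance between the camera's
        real position and the user's eye line.
    face_to_camera_distance_m: distance from the target face to the camera.
    """
    return math.degrees(math.atan2(camera_offset_above_eyes_m,
                                   face_to_camera_distance_m))

# Example: camera 0.15 m above the eye line, face 0.60 m away,
# giving roughly a 14-degree downward gaze to compensate for.
print(rotation_angle_deg(0.15, 0.60))  # ~14.0
```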
The moving image data receiving unit may receive the moving image data directly from the camera (that is, without passing through any other device or apparatus). Alternatively, the moving image data receiving unit may receive the moving image data from the camera via a predetermined network. In the latter case, the image processing device makes use of so-called cloud computing. That is, a computer near the user, for example, receives the moving image data from the camera and sends it over a network (for example, the Internet) to an image processing device at a remote location. The converted moving image data, generated by performing the image processing already described, is then returned from the image processing device to the user's computer via the network. The computer near the user can treat the converted moving image data received from the image processing device just as if it were moving image data received from the camera; for example, it can send the converted moving image data over the network to the computer of the other party in the video conference.
If the image processing device is built using cloud computing, the computer used by the user need not have high specifications for image processing.
When the above-described cloud-based image processing device is applied to a video conference system, the destination to which the image processing device sends the converted moving image data, generated by converting the moving image data received over the network from one participant's computer, may be the other participant's computer rather than the computer of the participant who sent the data.
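The round trip described above could look roughly like the following sketch, assuming a hypothetical HTTP endpoint (`/convert`) on the remote image processing server that accepts one JPEG frame and returns the converted frame; the URL, the endpoint, and the per-frame JPEG encoding are assumptions made only for illustration.
```python
import cv2               # OpenCV, used here only to grab and encode frames
import numpy as np
import requests          # third-party HTTP client

SERVER = "https://example.invalid/convert"  # hypothetical cloud endpoint

cap = cv2.VideoCapture(0)      # camera beside the user
ok, frame = cap.read()         # one 2D still image from the camera
if ok:
    _, jpeg = cv2.imencode(".jpg", frame)
    # Send the still image over the network to the remote image
    # processing device ...
    reply = requests.post(SERVER, data=jpeg.tobytes(),
                          headers={"Content-Type": "image/jpeg"})
    # ... and receive the converted still image back. The local computer
    # can now forward it to the other conference participant's computer.
    converted = cv2.imdecode(np.frombuffer(reply.content, np.uint8),
                             cv2.IMREAD_COLOR)
cap.release()
```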
The inventor of the present application also proposes, as one aspect of the present invention, a method executed by an image processing device. The effects of this method are equal to those of the image processing device according to the present invention.
One example of such a method is a method executed by a computer that includes a moving image data receiving unit for receiving moving image data, that is, data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by imaging a target face, the face of a single imaged person, with a single predetermined camera that can capture moving images and that is located at a predetermined real position.
The method includes a converted moving image data generation step of generating converted moving image data, that is, data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, the data of a converted still image, which is the two-dimensional still image that would be captured by the camera if the camera were located at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. The converted moving image data generation step includes: a three-dimensional model generation step of generating, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and that was obtained by machine learning over a large number of faces; a three-dimensional model rotation step of rotating each of the three-dimensional models of the target face generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
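Read as a per-frame pipeline, the claimed steps might be sketched as follows; `estimate_face_mesh` and `render_to_image` stand in for the machine-learned 3D reconstruction and the 2D rendering, which the patent does not tie to any particular library, so both names are hypothetical placeholders.
```python
from typing import Iterable, Iterator
import numpy as np

def estimate_face_mesh(still: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: a model trained on many faces that returns
    an (N, 3) array of 3D vertices for the facial part of the image."""
    raise NotImplementedError

def render_to_image(mesh: np.ndarray, still: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: projects the rotated mesh back into a
    two-dimensional converted still image."""
    raise NotImplementedError

def rotate(mesh: np.ndarray, rot: np.ndarray) -> np.ndarray:
    """Apply the same fixed 3x3 rotation to every vertex of the model."""
    return mesh @ rot.T

def convert_moving_image(stills: Iterable[np.ndarray],
                         rot: np.ndarray) -> Iterator[np.ndarray]:
    """Converted moving image data = the stream of converted stills."""
    for still in stills:                    # moving image data, frame by frame
        mesh = estimate_face_mesh(still)    # 3D model generation step
        mesh = rotate(mesh, rot)            # 3D model rotation step
        yield render_to_image(mesh, still)  # 2D image generation step
```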
The inventor of the present application also proposes, as one aspect of the present invention, a computer program for causing a predetermined, for example general-purpose, computer to function as the image processing device. The effects of this computer program are equal to those of the image processing device according to the present invention; a further effect is that it makes it possible for a predetermined computer to function as the image processing device according to the present application.
One example of such a computer program causes a computer, which includes a moving image data receiving unit for receiving moving image data, that is, data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by imaging a target face, the face of a single imaged person, with a single predetermined camera that can capture moving images and that is located at a predetermined real position, to execute: a converted moving image data generation step of generating converted moving image data, that is, data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, the data of a converted still image, which is the two-dimensional still image that would be captured by the camera if the camera were located at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. In the converted moving image data generation step, the computer program causes the computer to execute: a three-dimensional model generation step of generating, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and that was obtained by machine learning over a large number of faces; a three-dimensional model rotation step of rotating each of the three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
FIG. 1 shows the overall configuration of a video conference system according to a first embodiment.
FIG. 2 is a perspective view showing the appearance of a communication system of the video conference system shown in FIG. 1.
FIG. 3 shows the hardware configuration of the computer device shown in FIG. 2.
FIG. 4 is a block diagram showing the functional blocks generated inside the computer device shown in FIG. 2.
FIG. 5 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 6 shows the contents of the moving image data generated by the camera of the first communication system.
FIG. 7 shows an example of a face image before conversion, for explaining the principle of converting moving image data into converted moving image data in the first embodiment.
FIG. 8 shows an example of a three-dimensional model before rotation, for explaining the same conversion principle.
FIG. 9 shows an example of the three-dimensional model after rotation, for explaining the same conversion principle.
FIG. 10 shows an example of the face image after conversion, for explaining the same conversion principle.
FIG. 11 is another diagram for explaining the principle of converting moving image data into converted moving image data in the first embodiment.
FIG. 12 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 13 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 14 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 15 shows an example of a moving image displayed on the display included in the second communication system of the video conference system shown in FIG. 1.
FIG. 16 shows another example of a moving image displayed on the display included in the second communication system of the video conference system shown in FIG. 1.
FIG. 17 shows the overall configuration of a video conference system according to a modification.
Preferred first and second embodiments of the present invention, and modifications thereof, are described below with reference to the drawings.
In the description of both embodiments and the modifications, the same reference numerals denote the same objects, and duplicate descriptions are omitted where appropriate. Unless a particular contradiction arises, the technical contents described in the embodiments and the modifications can be combined with one another.
«First Embodiment»
FIG. 1 schematically shows the overall configuration of a preferred embodiment of a system including the image processing device of the present invention.
The system according to the first embodiment is a video conference system. As already noted, however, the application of the present invention is not limited to video conference systems.
The video conference system includes a first communication system 10-1 and a second communication system 10-2, both of which can connect to a network 400.
The network 400 is, although not limited to this, the Internet in this embodiment.
In this embodiment, the first communication system 10-1 is used by one user participating in the video conference, and the second communication system 10-2 is used by the other user participating in the video conference.
The first communication system 10-1 and the second communication system 10-2 have substantially the same configuration as far as the present invention is concerned, and their functions and effects are also common, so in the following description the two are sometimes referred to collectively as the communication system 10.
As shown in FIG. 2, a perspective view of its appearance, the communication system 10 in this embodiment includes a computer device 100 serving as the image processing device, a display 101, and a camera 210. In this embodiment the computer device 100, the display 101, and the camera 210 are all separate bodies, although they need not be.
As will be described in detail later, the computer device 100 in this embodiment is a general-purpose computer; a commercially available product is sufficient. More specifically, the computer device 100 in this embodiment is a known or well-known desktop personal computer.
The computer device 100 can communicate via the network 400. The counterparts with which the computer device 100 communicates via the network 400 include at least the computer device 100 of the communication system 10 paired with the communication system 10 to which this computer device 100 belongs.
The display 101 described above is connected to the computer device 100. The display 101 is for displaying still or moving images, and a known or well-known display can be used. In this embodiment, the display must be capable of showing moving images. A commercially available, known or well-known display suffices, for example a liquid crystal display. The display 101 in this embodiment is connected to the computer device 100 by a cable, but it may instead be connected wirelessly; the technique used for this connection may also be known or well known.
The computer device 100 also includes an input device 102, with which the user makes desired inputs to the computer device 100. A known or well-known input device can be used. In this embodiment the input device 102 is a keyboard, but it is not limited to this: a numeric keypad, a trackball, a mouse, or known or well-known voice input through a microphone terminal can also be used. When the display 101 is a touch panel, the display 101 doubles as the input device 102.
One camera 210, described above, is connected to the computer device 100. The camera 210 is a digital camera capable of capturing moving images and can output moving image data, that is, data about the captured moving images. The moving image data generated by the camera 210 is composed of a large number of continuous still image data, each being data of a two-dimensional still image. Cameras with such functions are known or well known and commercially available. The still image data is, for example, MJPEG data and contains no depth data. The camera 210 in this embodiment may be of this kind; for example, a commercially available webcam can be used. The camera 210 outputs moving image data to the computer device 100; to make this possible, the camera 210 is connected to the computer device 100 by wire, for example, though the connection may also be wireless. The technique used for this connection may likewise be known or well known.
The camera 210 is fixed at a predetermined position. The predetermined position may basically be anywhere, as long as the target face, the face of the user of the communication system 10 shown in FIG. 2, appears in the moving image captured by the camera 210. In this embodiment, the camera 210 is fixed above the display 101 at approximately the center of the display 101 in its width direction. The position shown in FIG. 2, where the camera 210 is actually located, is the real position of the camera in the present invention.
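As a concrete illustration of the kind of data stream such a webcam delivers, the following minimal sketch, using OpenCV (an assumption; the patent does not prescribe any library), reads the camera as a sequence of two-dimensional stills with no depth channel.
```python
import cv2  # OpenCV: a common way to read a commercially available webcam

cap = cv2.VideoCapture(0)  # the single camera 210 at its real position
while True:
    ok, still = cap.read()         # one 2D still image (one frame)
    if not ok:
        break
    # `still` is an H x W x 3 array of color samples, with no depth
    # data: exactly the kind of still image data the text assumes.
    cv2.imshow("camera 210", still)
    if cv2.waitKey(1) == 27:       # Esc quits
        break
cap.release()
```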
Next, the configuration of the computer device 100 constituting the image processing device will be described. FIG. 3 shows the hardware configuration of the computer device 100.
The hardware includes a CPU (central processing unit) 111, a ROM (read only memory) 112, a RAM (random access memory) 113, and an interface 114, which are interconnected by a bus 116.
The CPU 111 is an arithmetic unit that performs computation. The CPU 111 executes the processing described below, for example by running a computer program recorded in the ROM 112 or the RAM 113. Although not shown, the hardware may include an HDD (hard disk drive) or another large-capacity recording device, and the above computer program may be recorded there.
The computer program referred to here includes at least a computer program for causing the computer device 100 to execute the later-described processing of generating converted moving image data by converting moving image data. This computer program may have been preinstalled on the computer device 100 or installed afterwards. It may be installed on the computer device 100 via a predetermined recording medium (not shown) such as a memory card, or via a network such as a LAN or the Internet.
The ROM 112 records the computer programs and data that the CPU 111 needs to execute the processing described below. The computer programs recorded in the ROM 112 are not limited to these and may naturally include other programs such as an OS, a web browser for viewing web pages over the Internet, and a mailer for handling e-mail.
The RAM 113 provides the work area the CPU 111 needs for its processing. In some cases, at least part of the computer programs and data described above may be recorded there as well.
The interface 114 exchanges data between the outside and the CPU 111, the RAM 113, and the other components connected by the bus 116. The display 101, the input device 102, and the camera 210 described above are connected to the interface 114.
Operation input made on the input device 102 is passed from the interface 114 to the bus 116. The moving image data sent from the camera 210 is likewise input from the interface 114 to the bus 116.
As is well known, data for displaying images on the display 101 is sent from the bus 116 to the interface 114 and output from the interface 114 to the display 101.
The interface 114 is also connected to a transmission/reception mechanism (not shown), a known means for communicating with the outside via the network 400, that is, the Internet. Through it, the computer device 100 can both send and receive data via the network 400. Such transmission and reception may be wired or wireless, and the configuration of the transmission/reception mechanism may be known or well known. Data that the transmission/reception mechanism receives from the network 400 is received by the interface 114, and data handed from the interface 114 to the transmission/reception mechanism is sent by the transmission/reception mechanism via the network 400 to the outside, for example, in the context of this embodiment, to the computer device 100 of the other party's communication system 10.
When the CPU 111 executes the computer program, the functional blocks shown in FIG. 4 are generated inside the computer device 100. The functional blocks described below may be generated by the above computer program alone, which causes the computer device 100 to execute the processing described below, or they may be generated by that computer program in cooperation with the OS or other computer programs installed on the computer device 100.
In relation to the functions of the present invention, an input unit 121, a main control unit 122, an image processing unit 123, and an output unit 125 are generated in the computer device 100.
The input unit 121 receives input from the interface 114.
Input from the interface 114 to the input unit 121 includes input from the input device 102. Input from the input device 102 includes, for example, designation data and start data, both described in detail later. When designation data, start data, or the like is entered from the input device 102, all such data is sent from the input unit 121 to the main control unit 122.
The data input from the interface 114 to the input unit 121 also includes data sent from the computer device 100 of the communication system 10 of the other party of the video conference and received by the transmission/reception mechanism, for example the converted moving image data described later. When the input unit 121 receives converted moving image data via the transmission/reception mechanism and the interface 114, it sends the data to the main control unit 122.
The data input from the interface 114 to the input unit 121 also includes the moving image data sent from the camera 210. When it receives moving image data, the input unit 121 sends it to the main control unit 122.
The main control unit 122 controls all the functional blocks generated in the computer device 100. For example, the main control unit 122 controls the communication between the communication systems 10 that realizes the video conference.
The main control unit 122 may receive designation data and start data from the input unit 121. On receiving them, the main control unit 122 executes the processing described later for each. The main control unit 122 also forwards received designation data to the output unit 125.
The main control unit 122 may receive from the input unit 121 the converted moving image data sent from the computer device 100 of the other party's communication system 10 and received by the transmission/reception mechanism. On receiving it, the main control unit 122 sends the converted moving image data to the output unit 125.
The main control unit 122 may also receive from the input unit 121 the moving image data sent from the camera 210. On receiving it, the main control unit 122 sends the moving image data to the image processing unit 123 when the conditions described later are satisfied.
The image processing unit 123 performs image processing.
As described above, the image processing unit 123 may receive moving image data from the main control unit 122. When it does, the image processing unit 123 performs image processing on the moving image data and converts it into converted moving image data.
As described above, the moving image data is composed of a large number of continuous still image data, each being data of a two-dimensional still image, and the target face appears in the still image based on each still image data. The image processing unit 123 converts such moving image data into converted moving image data. The specific details of this conversion are described later; in brief, the image processing unit 123 converts a plurality of the still image data included in the moving image data into converted still image data and strings the converted still image data together to form converted moving image data. In other words, the converted moving image data is a sequence of converted still image data, each being the data of a two-dimensional converted still image. The converted moving image data is ordinary moving image data, for example MJPEG data.
As described above, the moving image data, and the still image data it contains, is generated by the camera 210 at the real position, so the moving image or still images based on it show the target face as seen from the real position. The converted still image data, by contrast, is the data of a converted still image, generated based on, or by converting, the still image data. The converted still image is the two-dimensional still image that would be captured if the camera were at the virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the user faces the front in a natural posture. The target face in the converted still image specified by the converted still image data is therefore the target face as seen from the virtual position in front of the user's face, and is basically in a front-facing state. The virtual position of the camera 210 is described in detail later.
The still image data is the data of the still images (so-called frames) that make up the moving image. The image processing device may generate converted still image data from every still image data it receives from the camera, but doing so may delay the moving image. If avoiding delay is the priority, the still image data to be converted can therefore be, for example, every second or every third still image data (every two or three frames) of the still image data included in the moving image data. The frame rate of the converted moving image data (the number of converted still image data per second) then becomes lower than that of the moving image data (the number of still image data per second), but as long as the converted moving image data has a frame rate of at least about 6 to 8 fps, the resulting video still passes as a moving image. Of course, the still image data selected for conversion need not be taken at a fixed interval such as every second or every third frame.
In any case, the image processing unit 123 sends the generated converted moving image data to the output unit 125.
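A minimal sketch of this frame-thinning idea follows, assuming the stream arrives as an iterable of frames; the function name and the default interval are illustrative, but keeping one frame out of every `step` is exactly the every-Nth-frame selection described above (for example, step=6 turns 60 fps input into 10 fps output).
```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")

def thin_frames(frames: Iterable[Frame], step: int = 6) -> Iterator[Frame]:
    """Yield every `step`-th still image so that the costly conversion
    only has to keep up with a reduced frame rate (e.g. 60 fps -> 10 fps),
    which stays above the ~6-8 fps the text treats as the lower bound."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```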
The output unit 125 outputs data generated by the functional blocks in the computer device 100 to the interface 114.
As described above, the output unit 125 may receive designation data from the main control unit 122. On receiving it, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism. The designation data is information specifying the computer device 100 of the other party's communication system 10 for the video conference.
As described above, the output unit 125 may receive converted moving image data from the main control unit 122. This converted moving image data has been sent from the computer device 100 of the other party's communication system 10. On receiving it, the output unit 125 sends it via the interface 114 to the display 101 connected to the computer device 100, and a moving image based on the converted moving image data is displayed on the display 101.
As described above, the output unit 125 may also receive converted moving image data from the image processing unit 123. This converted moving image data was generated inside the computer device 100 that contains the output unit 125. On receiving it, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism, which sends the converted moving image data to the computer device 100 specified by the designation data described above.
Next, the method of using and the operation of the video conference system described above will be explained, in particular the method of using and the operation of the computer device 100 in the communication system 10, which functions as the image processing device of the present invention.
As described above, the video conference system includes the first communication system 10-1 used by one user participating in the video conference and the second communication system 10-2 used by the other user participating in the video conference.
Both users prepare for the video conference.
As when using a known or well-known video conference system, one user conducts the conference while watching the display 101 of the first communication system 10-1 and the other while watching the display 101 of the second communication system 10-2. Accordingly, each user moves to an appropriate position, for example by sitting in front of the display 101 of the first communication system 10-1 or of the second communication system 10-2, respectively.
The participants also specify the two users who will hold the video conference. The two users can be specified using a known or well-known technique; for example, at least one of the two users participating in the video conference designates the other party. Of course, both users may designate each other. In this embodiment, one user designates the other party for the video conference, and the designated user accepts, whereby the two users holding the video conference are specified.
The description proceeds with the example in which the user of the first communication system 10-1 designates the other party. First, the user of the first communication system 10-1 operates the input device 102 of the first communication system 10-1 to generate designation data. The designation data is information specifying the user of the other party of the video conference. For example, every user who may participate in a video conference is given an ID, an identifier unique among users. By entering this ID with the input device 102, or selecting it from pre-registered IDs, the user of the first communication system 10-1 can enter the designation data. In this example, the designation data designates the ID of the user of the second communication system 10-2. The entered designation data passes from the input device 102 through the interface 114 to the input unit 121. The input unit 121 further attaches the ID of the first communication system 10-1 itself to the designation data and sends them via the main control unit 122 to the output unit 125. The designation data and the ID of the first communication system 10-1 are sent from the output unit 125 through the interface 114 to the transmission/reception mechanism. The transmission/reception mechanism sends the ID of the first communication system 10-1 over the network 400 to the communication system 10 operated by the user with the ID specified by the designation data, that is, to the computer device 100 of the second communication system 10-2.
The above process of sending the ID from the first communication system 10-1 to the second communication system 10-2 serves both as the first communication system 10-1 user's designation of the second communication system 10-2 user as the other party of the video conference and as that user's request to the second communication system 10-2 user for a video conference.
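The designation step amounts to a small signaling message; the following sketch shows one hypothetical shape such a message could take (the JSON field names, the port, and indeed the use of JSON at all are illustrative assumptions, not something the patent specifies).
```python
import json
import socket

def send_conference_request(own_id: str, peer_id: str,
                            peer_host: str, port: int = 5000) -> None:
    """Send the caller's ID to the computer device identified by the
    designation data, doubling as the video conference request."""
    message = json.dumps({"type": "conference_request",
                          "from_id": own_id,    # first system's own ID
                          "to_id": peer_id})    # ID from the designation data
    with socket.create_connection((peer_host, port)) as conn:
        conn.sendall(message.encode("utf-8"))

# Example (hypothetical host):
# send_conference_request("user-A", "user-B", "peer.example.invalid")
```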
The computer device 100 of the second communication system 10-2 receives, through its transmission/reception mechanism, the ID of the first communication system 10-1 sent over the network 400 from the computer device 100 of the first communication system 10-1. Inside the computer device 100 of the second communication system 10-2, the ID passes from the transmission/reception mechanism through the interface 114 to the input unit 121 and on to the main control unit 122. On receiving it, the main control unit 122 generates an image indicating that the user of the first communication system 10-1 has requested a video conference, for example an image containing the ID of that user sent from the first communication system 10-1, and sends the image data to the output unit 125. The output unit 125 sends the image data via the interface 114 to the display 101. As a result, the display 101 of the second communication system 10-2 shows an image indicating that the user of the first communication system 10-1 has requested a video conference.
If the user of the second communication system 10-2 agrees to hold the video conference with the user of the first communication system 10-1, he or she makes an input indicating that consent using the input device 102; this corresponds to the designation data on the computer device 100 of the second communication system 10-2. If the user does not agree, he or she either makes no input indicating consent or makes an input declining the video conference with the user of the first communication system 10-1; in that case the video conference does not take place. When the user of the second communication system 10-2 indicates consent to the video conference and the designation data representing that consent is entered from the input device 102 of the computer device 100 of the second communication system 10-2, the designation data is sent through the interface 114 and the input unit 121 to the main control unit 122.
On receiving it, the main control unit 122 generates data indicating that it is ready to hold the video conference and sends it to the output unit 125. That data goes from the output unit 125 through the interface 114 to the transmission/reception mechanism, and from there over the network 400 to the first communication system 10-1.
The transmission/reception mechanism of the computer device 100 of the first communication system 10-1 receives the data sent from the second communication system 10-2. The data is sent from the transmission/reception mechanism through the interface 114 and the input unit 121 to the main control unit 122 of the computer device 100 of the first communication system 10-1.
With this, the computer device 100 of the first communication system 10-1 and the computer device 100 of the second communication system 10-2 are ready to exchange the converted moving image data, the data about the moving images needed for the video conference.
Before the video conference, both participants also adjust their postures or the positions and angles of the cameras 210 as needed, so that each user's target face lies within the imaging range of the camera 210 of the communication system 10 at that user's side.
This completes the preparation for the video conference.
Next, the video conference is started.
Although not limited to this, in this embodiment, when the user of the first communication system 10-1 enters start data, the converted moving image data generated by the first communication system 10-1 is transmitted to the second communication system 10-2 and a moving image based on it is displayed on the display 101 of the second communication system 10-2; likewise, when the user of the second communication system 10-2 enters start data, the converted moving image data generated by the second communication system 10-2 is transmitted to the first communication system 10-1 and a moving image based on it is displayed on the display 101 of the first communication system 10-1. Since these two processes are effectively identical, the following description considers only the case in which the first communication system 10-1 generates converted moving image data, sends it to the second communication system 10-2, and a moving image based on that converted moving image data is displayed on the display 101 of the second communication system 10-2.
The user of the first communication system 10-1 enters start data using the input device 102. When the start data is entered, it is sent, like the designation data, from the input device 102 to the main control unit 122 in the computer device 100 of the first communication system 10-1. On receiving it, the main control unit 122 starts the processing for transmitting converted moving image data to the computer device 100 of the second communication system 10-2.
Although not limited to this, in this embodiment the camera 210 connected to the computer device 100 sends moving image data to the computer device 100 regardless of whether start data has been entered, and the moving image data constantly reaches the main control unit 122 through the interface 114 and the input unit 121. Until start data is entered, the main control unit 122 does nothing with the moving image data it receives; once start data has been entered, it sends the received moving image data on to the image processing unit 123.
The image processing unit 123 that receives the moving image data performs the processing of converting it into converted moving image data. The moving image data and the converted moving image data are each as already described, and the conversion may be performed in any way. This embodiment proposes four conversion methods, referred to as the first to fourth conversion methods.
(Points common to the first to fourth conversion methods)
The image processing unit 123 includes a frame dropping unit that extracts at least a plurality of still image data, as the targets of the image processing (conversion), from the still image data included in the moving image data. As described later, however, the frame dropping unit is not essential.
The image processing unit 123 also includes a three-dimensional model generation unit that generates, from each of the at least plural still image data extracted by the frame dropping unit, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data.
The image processing unit 123 also includes a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by the rotation angle, a fixed angle.
The image processing unit 123 also includes a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
These functions are the same in all of the first to fourth conversion methods. What differs among the first to fourth conversion methods is, in essence, only the method of determining the rotation angle (including the rotation direction) of the three-dimensional model when the three-dimensional model rotation unit rotates the target face.
(First conversion method)
When the image processing unit 123 executes the first conversion method, it is configured as shown in FIG. 5.
In this case, the image processing unit 123 includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
As described above, the frame dropping unit 123A extracts at least a plurality of still image data, from the still image data contained in the moving image data, as the targets of the image processing (conversion). Only the extracted still image data are converted from still image data into converted still image data. The reason not all of the still image data contained in the moving image data are converted is that the computing power of the computer device 100 may be insufficient to perform the conversion of moving image data into converted moving image data (or of still image data into converted still image data) with the immediacy that moving images require. Accordingly, if the computing power of the computer device 100 is sufficient, the frame dropping unit 123A is unnecessary.
Although not limited to this, the frame dropping unit 123A in this embodiment extracts one still image datum out of every six (skipping five in between) from the 60 fps moving image data sent from the camera 210, thereby extracting ten still image data per second. The frame dropping unit 123A need not always extract still image data at a fixed interval, however, nor need the number extracted per second be ten; it may be, for example, around six to eight, or more. A sketch of this decimation follows.
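The decimation just described is simple to state precisely. The following is a minimal sketch of the frame dropping performed by a unit such as 123A; the function name and the keep-one-in-six default are illustrative, not taken from the embodiment.

```python
# Minimal sketch of frame dropping: keep one still image out of every
# `interval` frames, so a 60 fps stream with interval=6 yields 10
# still images per second, as in the embodiment's example.
def drop_frames(frames, interval=6):
    """Yield every `interval`-th frame of an iterable of still images."""
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield frame

# One second of a 60 fps stream reduces to 10 frames.
one_second = list(range(60))
assert len(list(drop_frames(one_second))) == 10
```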
As described above, the three-dimensional model generation unit 123B generates, from each of the at least plural still image data extracted by the frame dropping unit 123A, a three-dimensional model of the face appearing in the still image specified by that still image data. The three-dimensional model is, for example, a wire frame model, but is not limited to this.
The three-dimensional model rotation unit 123C rotates each of the three-dimensional models generated by the three-dimensional model generation unit 123B by the rotation angle, which is a fixed angle; the direction and angle of rotation are the same for all of the three-dimensional models. The two-dimensional image generation unit 123D then generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit 123C.
Here, the rotation angle by which the three-dimensional model rotation unit 123C rotates a three-dimensional model is determined so that, when a two-dimensional image is generated from the rotated three-dimensional model (that is, when the model is returned to a two-dimensional image), the target face contained in that two-dimensional image (more precisely, the face portion of the target face) is the same as the target face as it would appear if imaged by a camera at the virtual position. The virtual position is a predetermined position on a virtual straight line extending in the front direction from the target face when it faces front (when the user takes a natural posture). In other words, the three-dimensional model rotation unit 123C rotates the three-dimensional model of the face portion of the target face so that, as far as the target face is concerned, the moving image data (or still image data) captured by the camera 210 at the real position becomes the same as what a virtual camera at the virtual position would have captured.
In the first conversion method, the rotation angle is determined in advance. Data specifying the rotation angle is, for example, recorded beforehand in the three-dimensional model rotation unit 123C, and the three-dimensional model rotation unit 123C rotates each three-dimensional model by the rotation angle specified by that data.
The contents of the processes performed by the three-dimensional model generation unit 123B, the three-dimensional model rotation unit 123C, and the two-dimensional image generation unit 123D, and the principle of the present invention will be conceptually described with reference to FIGS. 6 to 8.
FIG. 6(A) shows, in side view, the relationship between the camera 210 and the target face. The camera 210 is at a real position directly above the display 101. In this example, the camera 210 is assumed to lie in the front direction of the target face in the horizontal sense, but above the target face. The camera 210 therefore images the target face from above at an angle θ, and the target face appearing in the moving image based on the moving image data generated by the camera 210, or in the still images based on the still image data contained in that moving image data, is one imaged from the angle θ above. FIG. 6(B) shows an example in which an image based on such moving image data is displayed on the display 101 included in the other party's communication system 10. As is clear from this example, when a moving image based on the moving image data as-is is displayed on the display 101, the target face contained in the moving image faces downward by the angle θ.
Here, the three-dimensional model generation unit 123B generates a three-dimensional model of the face portion of the target face contained in the still image specified by the still image data.
The three-dimensional model generation unit 123B first extracts the face portion F of the target face from the still image. The face portion F may be extracted by any method; a general image recognition technique will suffice. In FIG. 7(A), the area enclosed by the broken line is the face portion F. Although not limited to this, the face portion in this embodiment means, roughly, the part of the human head (the target face) in front of the ears and below the forehead. The face portion may, however, be narrower, as long as it at least includes the eyes, nose, and mouth, or wider, up to the entire head.
The three-dimensional model generation unit 123B generates a three-dimensional model of the face portion F described above. It does so using a conversion algorithm that estimates a three-dimensional model of a human face and was obtained by machine learning on a large number of faces. A technique for automatically building, from a single ordinary two-dimensional still image in which a face appears (in other words, from the data of a single facial photograph), a three-dimensional model of the face portion of that face is disclosed in detail in the paper "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression", accepted to ICCV 2017 (URL: http://aaronsplace.co.uk/papers/jackson2017recon/). The conversion algorithm was generated by having a computer machine-learn from a large number of sample two-dimensional still images of human faces, produced by imaging various human faces from various angles. Using this conversion algorithm, the three-dimensional model generation unit 123B automatically generates a three-dimensional model of the face portion F of the target face appearing in the still image specified by the still image data.
The three-dimensional model generated in this way is, for example, as shown in FIG. 7(B). FIG. 7(B)(1) shows the three-dimensional model of the face portion F of the target face viewed from the front; the three-dimensional model is, though not limited to this, a wire frame model. FIG. 7(B)(2) is a side view of the three-dimensional model of the face portion F with the wire frame omitted. The face portion F faces downward by the angle θ shown in FIG. 6(A).
The three-dimensional model generation unit 123B also generates data for the part of the still image data excluding the face portion F, that is, data for the still image of the part surrounding the face portion F in FIG. 7(A), and sends it to the two-dimensional image generation unit 123D.
A three-dimensional model facing downward by the angle θ will, naturally, face front if rotated upward by the angle θ. The angle θ is easily obtained from a and b shown in FIG. 8 by the very simple formula θ = atan(b / a), where a is the horizontal distance from the virtual position X of the camera to the target face and b is the vertical distance from the virtual position X of the camera 210 to the real position of the camera 210. In this example, the virtual position X of the camera 210 is the position immediately in front of the display 101 in the front direction of the target face; that is, the virtual position X lies on a virtual straight line extending in the front direction of the target face of a user taking a natural posture. As long as that condition is satisfied, the relative positional relationship between the virtual position X and the display 101 does not matter; the virtual position X may, for example, lie within the display 101 or behind it. For example, if a is 40 cm and b is 10 cm, θ is about 14 degrees; if a is 30 cm and b is 5 cm, θ is about 9.5 degrees. The former value is commonly seen in a communication system 10 built around a desktop computer device 100, the latter in a communication system 10 built around a smartphone.
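As a quick numeric check of the two figures just quoted, the formula can be evaluated directly; nothing in this sketch goes beyond the formula and the two worked examples in the text.

```python
import math

def rotation_angle_deg(a_cm, b_cm):
    """Rotation angle θ = atan(b / a), in degrees, from the horizontal
    distance a (virtual camera position to face) and the vertical
    distance b (virtual to real camera position)."""
    return math.degrees(math.atan2(b_cm, a_cm))

print(rotation_angle_deg(40, 10))  # ~14.0 degrees (desktop example)
print(rotation_angle_deg(30, 5))   # ~9.5 degrees (smartphone example)
```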
The three-dimensional model rotation unit 123C rotates the three-dimensional model shown in FIG. 7(B) upward by the angle θ within the vertical plane, whereupon the three-dimensional model faces front as shown in FIG. 7(C). FIG. 7(C)(1) shows the three-dimensional model of the face portion F of the target face viewed from the front; FIG. 7(C)(2) is a side view of that model with the wire frame omitted. Although not limited to this, the three-dimensional model rotation unit 123C in this embodiment rotates the three-dimensional model about a predetermined point. As an alternative, the three-dimensional model could be rotated about an axis (for example, a horizontal straight line passing through both ears, or a straight line passing vertically through the center of the skull in plan view, or both); such processing, however, requires detecting the positions of the ears, or of the center of the skull in plan view, within the three-dimensional model and specifying their coordinates. By instead rotating the three-dimensional model about some point in the virtual space in which it exists (a virtual point, which may or may not lie inside the three-dimensional model; for example, the origin that defines that virtual space), such cumbersome processing can be omitted. Although not limited to this, in this embodiment the predetermined point is the lens position of the camera, which is the origin of the virtual space in which the three-dimensional model exists, and the rotation of the three-dimensional model is executed as a transformation of the spatial coordinates about that point as origin. In this way there is no need to detect where the eyes or the nose are in the three-dimensional model or the still image, and the three-dimensional model can be treated as a mere solid having the shape of the face portion of the target face.
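The coordinate-transformation view of this rotation can be made concrete as follows. This is a minimal sketch, assuming the three-dimensional model is held as an N x 3 array of vertex coordinates in a space whose origin is the camera lens position; the pitch/yaw decomposition shown here is an illustration, not the embodiment's code.

```python
import numpy as np

def rotate_model(vertices, pitch_deg=0.0, yaw_deg=0.0):
    """Rotate an (N, 3) array of model vertices about the origin of the
    model's coordinate space (here, the camera lens position).
    pitch_deg rotates about the X axis (up/down) and yaw_deg about the
    Y axis (left/right); no facial landmarks are needed, since the
    model is treated as a plain cloud of points."""
    p = np.radians(pitch_deg)
    y = np.radians(yaw_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(p), -np.sin(p)],
                      [0, np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(y), 0, np.sin(y)],
                      [0, 1, 0],
                      [-np.sin(y), 0, np.cos(y)]])
    # One combined rotation covers both the vertical and the
    # horizontal correction in a single transformation.
    return vertices @ (rot_y @ rot_x).T
```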
The two-dimensional image generation unit 123D then generates two-dimensional image data again, using the three-dimensional model shown in FIG. 7(C) after rotation by the three-dimensional model rotation unit 123C. This two-dimensional image is pasted into the region corresponding to the excluded face portion F in the data, sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, of the part of the still image data excluding the face portion F. The still image so obtained is the converted still image, and its data is the converted still image data. The target face contained in the resulting converted still image basically faces front, as shown in FIG. 7(D). The data of the part of the still image data excluding the face portion F that is sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D may be that data as-is, or it may have undergone some processing. The region of the face portion F in FIG. 7(D) coincides with the face portion F in FIG. 7(B), but the edges of the two-dimensional image generated from the rotated three-dimensional model and pasted into that region may not coincide perfectly with the edges of that region. If the resulting unnaturalness is to be reduced, the processing just mentioned can be applied. That processing may be anything that brings the edges of the two-dimensional image generated from the rotated three-dimensional model into agreement with the edges of the face portion F; it is two-dimensional image processing, such as scaling of the image in one direction, scaling in two directions, rotation, and the like. For example, when the three-dimensional model of the face portion F of a downward-facing target face is rotated to face front, its apparent length in, say, the vertical direction becomes shorter. In response to such an apparent change of length, the three-dimensional model generation unit 123B can apply processing that reduces the vertical length of the still image of the part other than the face portion F of the target face, so that the edges of the facial image generated from the three-dimensional model agree well with the region of the face portion F.
If the real position of the camera 210 deviates horizontally from the front direction of the face, the model must of course also be rotated laterally within the horizontal plane, in the same way as the three-dimensional model was rotated vertically in the example above; that description is omitted. Naturally, the three-dimensional model rotation unit 123C need not perform the vertical rotation and the horizontal rotation as two separate operations; it can of course perform a single rotation that combines both. A sketch of the paste-back step follows.
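A minimal sketch of the paste-back step, assuming the frame and the re-rendered facial region are image arrays and the face portion F is tracked as a boolean mask; the array names are illustrative, not from the embodiment.

```python
import numpy as np

def paste_face(frame, rendered_face, face_mask):
    """Write the 2D image re-rendered from the rotated model back into
    the region of the original frame occupied by the face portion F.
    frame: (H, W, 3) image; rendered_face: (H, W, 3) image aligned to
    frame; face_mask: (H, W) boolean mask of the face portion F."""
    composite = frame.copy()
    composite[face_mask] = rendered_face[face_mask]
    return composite
```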
In this way, each of the still image data extracted by the frame dropping unit 123A is converted into converted still image data.
The converted still image data generated as a result is sequentially output from the two-dimensional image generation unit 123D to the output unit 125. This set of a large number of converted still image data is the converted moving image data. That is, the converted moving image data is output from the image processing unit 123 to the output unit 125.
When the first conversion method is executed, a common or typical rotation angle θ (14 degrees or 9.5 degrees in the above examples) is, as described above, used as the angle by which the three-dimensional model rotation unit 123C rotates the three-dimensional models. This rotation angle could be made selectable from among several rotation angles, but it is basically fixed. The values of a and b in the above example may therefore fail to match the actual relationship between the real position and the virtual position of the camera 210. Given that the virtual position of the camera 210 can be decided freely in relation to the computer program, such a situation arises, in short, when the real position of the camera 210 is not the position anticipated when the computer program was designed.
The first conversion method is therefore particularly effective when the real position of the camera 210 is at the anticipated position, or not far from it. For example, when the computer device 100 is a laptop personal computer, a smartphone, a tablet, or the like, the real position of the camera is fixed with respect to the housing. In such cases, if the virtual position of the camera is decided to be a suitable position such as immediately in front of, or behind, the display of the laptop personal computer, smartphone, or tablet, the real and virtual positions of the camera can be determined uniquely. When the specifications of the devices making up the image processing apparatus are clear from the outset in this way, the distance between the target face and the virtual position of the camera 210, or between the target face and the display 101, can be predicted to some extent from the size of the display 101, so that, taking these factors together, the rotation angle θ can be determined in advance with reasonably reliable accuracy. For example, the computer program for making the computer device 100 function as the image processing apparatus of the present application can include data about the virtual camera positions of a wide variety of laptop personal computers, smartphones, tablets, and so on (or data specifying the above rotation angle, which can be grasped from the relationship between the real and virtual positions), that is, many paired sets of device model and virtual camera position. In that case, the computer program may implement either a function that automatically identifies the model of the computer after the program is installed on the computer device 100, or a function that accepts input made by the user to specify the model of the computer device 100 on which the program is installed. In this way, when the computer program makes the computer device 100 function as the image processing apparatus of the present invention, the rotation angle appropriate to that image processing apparatus can be determined automatically from the relationship between the model and the virtual position, as the lookup sketch below illustrates.
Even when the computer device 100 is of the desktop type described in this embodiment, so that the positional relationship between the display 101 and the camera 210 can be decided with some freedom, the rotation angle can still be determined in advance. In that case, the user can be given an instruction such as "place the camera a certain number of centimeters above the center of the display in its vertical and width directions, and use this image processing apparatus with the target face a certain number of centimeters away from the virtual camera position immediately in front of the center of the display", so that the user sets up the positional relationship between the display 101 and the camera 210 as prescribed; the rotation angle can then be determined in advance in view of the relationship between the virtual position decided as described above and the real position at which the user will thus have placed the camera 210.
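Where the program ships with per-model data as described, the lookup can be as simple as the following sketch; the model identifiers and angles here are hypothetical placeholders, not data for any actual device.

```python
# Hypothetical table pairing device models with predetermined rotation
# angles (pitch, yaw, in degrees), derived at design time from each
# model's fixed real camera position and its chosen virtual position.
PREDETERMINED_ANGLES = {
    "laptop-model-a": (14.0, 0.0),
    "phone-model-b": (9.5, 0.0),
}

def rotation_angle_for(model_id, default=(12.0, 0.0)):
    """Return the predetermined rotation angle for a device model,
    falling back to a generic default for unknown models."""
    return PREDETERMINED_ANGLES.get(model_id, default)
```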
(Second conversion method)
When the image processing unit 123 executes the second conversion method, the image processing unit 123 is configured as shown in FIG.
In this case the image processing unit 123, like the image processing unit 123 that executes the first conversion method, includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the first conversion method, except that the three-dimensional model rotation unit 123C used for the second conversion method does not record data specifying the rotation angle in advance.
On the other hand, the image processing unit 123 that executes the second conversion method includes an angle detection unit 123E. The angle detection unit 123E determines the above-mentioned rotation angle by performing a predetermined calculation based on the moving image data sent from the main control unit 122. Although FIG. 9 shows the moving image data being input directly from the main control unit 122 to the angle detection unit 123E, the angle detection unit 123E may instead determine the rotation angle θ based on the still image data extracted by the frame dropping unit 123A.
If such an angle detection unit 123E is used, it is not necessary to pay attention to the relative positional relationship between the actual position and the virtual position of the camera 210.
For the angle detection unit 123E to obtain the rotation angle automatically from the moving image data, it is conceivable to have the angle detection unit 123E undergo machine learning. If the angle detection unit 123E is made to learn images of faces captured from various angles, together with the angle from which each image was captured, it becomes possible to have the angle detection unit 123E detect from what angle the face appearing in a still image based on the still image data contained in the moving image data was imaged. If that is possible, the angle detection unit 123E can naturally determine the magnitude of the rotation angle θ, including, of course, the direction of rotation.
When the second conversion method is used, it is desirable to inform the user of an instruction such as "keep facing front for, say, a few seconds until the rotation angle is determined", and to have the user follow that instruction. Alternatively, it is conceivable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input.
Data specifying the rotation angle determined by the angle detection unit 123E is sent from the angle detection unit 123E to the three-dimensional model rotation unit 123C. Using the rotation angle specified by that data, the three-dimensional model rotation unit 123C rotates each three-dimensional model by the same angle and in the same direction, as in the first conversion method.
Even when the second conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
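A minimal sketch of how a learned detector such as 123E might be wired in: `estimate_head_pose` stands for the machine-learned regressor described above and is assumed, not an actual library call. While the user holds a frontal pose, the apparent pitch and yaw of the face in the captured frames is exactly the offset introduced by the camera position, so its negation serves as the rotation angle.

```python
def estimate_head_pose(image):
    """Assumed machine-learned regressor: returns the apparent
    (pitch_deg, yaw_deg) of the face in a still image."""
    raise NotImplementedError  # stands in for the learned model

def detect_rotation_angle(calibration_frames):
    """Average the apparent pose over frames captured while the user
    faces front; the negated average is the correction to apply to
    every subsequent three-dimensional model."""
    poses = [estimate_head_pose(frame) for frame in calibration_frames]
    mean_pitch = sum(p for p, _ in poses) / len(poses)
    mean_yaw = sum(y for _, y in poses) / len(poses)
    return (-mean_pitch, -mean_yaw)
```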
(Third conversion method)
When the image processing unit 123 executes the third conversion method, the image processing unit 123 is configured as shown in FIG.
In the third conversion method the rotation angle is not determined in advance; as in the second conversion method, processing to determine the rotation angle is also performed. The image processing unit 123 that executes the third conversion method resembles the image processing unit 123 that executes the second conversion method.
Like the image processing unit 123 that executes the second conversion method, the image processing unit 123 that executes the third conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. On the other hand, in place of the angle detection unit 123E of the image processing unit 123 that executes the second conversion method, it includes a rotation angle determination unit 123F.
The rotation angle determination unit 123F has the function of determining the rotation angle, like the angle detection unit 123E described above. Whereas the angle detection unit 123E determines the rotation angle by performing a predetermined calculation based on the moving image data, the rotation angle determination unit 123F determines the rotation angle by performing a predetermined calculation based not on the moving image data but on other data.
The data the rotation angle determination unit 123F uses to determine the rotation angle is parameter data input from the input device 102, parameter data input from a sensor (not shown), or both. Any of these parameters input from the input device 102 or the sensor may be of any kind, as long as it is useful for determining the rotation angle.
The parameters input from the input device 102 are, for example, information specifying the shape of the display 101 (for example, whether its aspect ratio is 3:4 or 9:16), information specifying the size of the display 101 (for example, how many inches it measures), information specifying where the real position of the camera is (for example, directly above the display 101 at the center of its width direction, or at the upper right corner of the display 101), information specifying the distance from the display 101 to the target face, and the like.
The sensor may measure parameters useful for obtaining the relative positional relationship between the real position and the virtual position of the camera 210, or between the virtual position of the camera 210 and the target face. For example, a known or well-known distance measuring device may be used as the sensor, with the distance from the sensor to the target face measured as a parameter.
Data specifying the rotation angle determined by the rotation angle determination unit 123F is sent from the rotation angle determination unit 123F to the three-dimensional model rotation unit 123C. Using the rotation angle specified by that data, the three-dimensional model rotation unit 123C rotates each three-dimensional model by the same angle and in the same direction, as in the first conversion method.
Even when the third conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
Also when the third conversion method is used, it is preferable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input. A sketch of a parameter-based determination follows.
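Under the geometry of FIG. 8, the parameter-based determination reduces to the same arctangent applied once per axis. A minimal sketch, assuming the user or sensor supplies the camera's offset from the virtual position and the face distance; the parameter names are illustrative.

```python
import math

def rotation_angle_from_parameters(face_distance_cm,
                                   camera_offset_up_cm=0.0,
                                   camera_offset_right_cm=0.0):
    """Compute (pitch_deg, yaw_deg) from the face-to-virtual-position
    distance and the real camera's vertical/horizontal offset from the
    virtual position, per theta = atan(offset / distance)."""
    pitch = math.degrees(math.atan2(camera_offset_up_cm, face_distance_cm))
    yaw = math.degrees(math.atan2(camera_offset_right_cm, face_distance_cm))
    return pitch, yaw

# A camera 10 cm above the virtual position, face 40 cm away:
print(rotation_angle_from_parameters(40, camera_offset_up_cm=10))
```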
(Fourth conversion method)
When the image processing unit 123 executes the fourth conversion method, the image processing unit 123 is configured as shown in FIG.
In the fourth conversion method the rotation angle is not determined in advance; as in the second and third conversion methods, processing to determine the rotation angle is also performed.
The image processing unit 123 that executes the fourth conversion method has the same functional blocks as the image processing unit 123 that executes the first conversion method: a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the first conversion method, except that the three-dimensional model rotation unit 123C used for the fourth conversion method does not record data specifying the rotation angle in advance, that rotation angle change data for changing the rotation angle is input from the main control unit 122 to the three-dimensional model rotation unit 123C, and that each time the three-dimensional model rotation unit 123C accepts rotation angle change data it changes, based on the accepted data, the rotation angle by which it rotates the three-dimensional model of the target face.
Even when the fourth conversion method is executed, the converted moving image data generated by the image processing unit 123 is sent to the output unit 125, as in the case where the first conversion method is executed. This data is sent from the output unit 125 to the display 101. Then, on the display 101, a moving image based on the converted moving image data will be displayed, as will be described later. This display is performed in substantially real time after the image is captured by the camera 210, preferably within 0.5 seconds.
While watching his or her own face (the target face) displayed on the display 101, the user inputs rotation angle change data, rotating the target face little by little, for example, so as to adjust the target face displayed on the display 101 until it basically faces front. The rotation angle change data is input using the input device 102; like other data input from the input device 102, it reaches the main control unit 122 and is sent from the main control unit 122 to the three-dimensional model rotation unit 123C. The rotation directions of the three-dimensional model, though not limited to these, need only be the vertical direction (about the X axis) and the lateral direction (about the Y axis), both of which can of course be input using the input device 102. The angle through which the three-dimensional model rotation unit 123C has rotated the three-dimensional model at the moment the target face displayed on the display 101 basically faces front is determined as the rotation angle by which the three-dimensional model rotation unit 123C thereafter rotates the three-dimensional model of the target face uniformly.
Even when the fourth conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
Also when the fourth conversion method is used, it is preferable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input. A sketch of the adjustment loop follows.
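A minimal sketch of the adjustment loop of the fourth method: each piece of rotation angle change data nudges the current (pitch, yaw), which is applied to every subsequent frame until the user is satisfied; the key names and step size are illustrative assumptions.

```python
STEP_DEG = 0.5  # illustrative increment per piece of change data

def apply_change(angle, change):
    """Accumulate one piece of rotation angle change data into the
    current (pitch_deg, yaw_deg) rotation angle."""
    pitch, yaw = angle
    dpitch, dyaw = {
        "up": (STEP_DEG, 0.0), "down": (-STEP_DEG, 0.0),
        "left": (0.0, -STEP_DEG), "right": (0.0, STEP_DEG),
    }[change]
    return (pitch + dpitch, yaw + dyaw)

# The angle settled on when the displayed face looks frontal becomes
# the fixed rotation angle used for all later frames.
angle = (0.0, 0.0)
for change in ["up", "up", "left"]:
    angle = apply_change(angle, change)
print(angle)  # (1.0, -0.5)
```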
Whichever of the first through fourth conversion methods the image processing unit 123 executes, the output unit 125 receives the converted moving image data from the image processing unit 123 as described above. On receiving the converted moving image data, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism, which sends the converted moving image data to the computer device 100 specified by the above-mentioned designation data, that is, to the computer device 100 included in the second communication system 10-2.
The transmission / reception mechanism in the computer device 100 included in the second communication system 10-2 receives the converted moving image data sent from the first communication system 10-1. The converted moving image data is sent from the transmission / reception mechanism to the input unit 121 via the interface 114, and then sent from the input unit 121 to the main control unit 122.
The main control unit 122 sends this converted moving image data to the display 101 via the output unit 125 and the interface 114. As a result, a moving image based on the converted moving image data sent from the first communication system 10-1 is displayed on the display 101 in the second communication system 10-2.
The face image displayed on the display 101 basically faces the front as shown in FIG.
The word "basically" has been said several times to mean "when the user takes a natural posture". Here, the moving image displayed on the display 101 included in the second communication system 10-2 when the user of the first communication system 10-1 nods will also be described.
FIG. 13(A) shows a state in which the user of the first communication system 10-1 faces downward from the horizontal by an angle α. In this case, a deviation of angle θ plus angle α arises between the camera 210 and the front direction of the target face. If no image processing were performed, therefore, the target face contained in the moving image displayed on the display 101 included in the second communication system 10-2 would be the target face shown in FIG. 13(B) as seen from the right side of the drawing. According to the present invention, however, the target face is displayed on the display 101 after being rotated upward by the angle θ, so the target face contained in the moving image displayed on the display 101 included in the second communication system 10-2 is the target face shown in FIG. 13(C) as seen from the front. That is, the target face of the user of the first communication system 10-1, facing downward from the horizontal by the angle α, is displayed on the display 101 included in the second communication system 10-2. This is a natural state, and gives no sense of strangeness to the user of the second communication system 10-2.
<Modification>
A video conference system according to a modified example will be described.
The video conference system according to the modified example includes a first communication system 10-1 and a second communication system 10-2, like the video conference system of the first embodiment. When viewed as hardware, both the first communication system 10-1 and the second communication system 10-2 in the modified example are the same as those in the first embodiment. Both communication systems 10 include a computer device 100, a display 101, and a camera 210.
However, whereas the computer device 100 in both communication systems 10 in the first embodiment had the function of converting moving image data into converted moving image data, the computer device 100 in both communication systems 10 in the modification does not have that function. That is, the computer device 100 in both communication systems 10 in the modification is not the image processing apparatus of the present invention; apart from exchanging data with the conversion servers described below, it basically has only the same functions as its counterpart in a conventional video conference system.
In the video conference system of the modification, the function of converting moving image data into converted moving image data, which the image processing apparatus of the present invention is to perform, is borne by the conversion server 20-1 and the conversion server 20-2. In other words, the conversion server 20-1 and the conversion server 20-2 in the modification can be said to use cloud computing technology to provide the first communication system 10-1 and the second communication system 10-2 with the function of converting moving image data into converted moving image data.
A modified example will be described with reference to FIG.
As shown in FIG. 14, the video conference system according to the modification includes a first communication system 10-1, a second communication system 10-2, a conversion server 20-1, and a conversion server 20-2. The first communication system 10-1, the second communication system 10-2, the conversion server 20-1, and the conversion server 20-2 are all connectable to the network 400.
As described above, the computer device 100 in the first communication system 10-1 receives moving image data from the camera 210 at its real position. The moving image data is sent from the computer device 100 in the first communication system 10-1 to the conversion server 20-1, which converts the received moving image data into converted moving image data and returns the converted moving image data to the computer device 100 in the first communication system 10-1. As in the first embodiment, the converted moving image data is then sent from the computer device 100 of the first communication system 10-1 to the computer device 100 of the second communication system 10-2. Alternatively, the converted moving image data generated by the conversion server 20-1 may be sent directly to the computer device 100 in the second communication system 10-2, without first being sent to the computer device 100 in the first communication system 10-1.
The hardware configuration of the conversion server 20-1 that enables it to perform the functions described above may be basically the same as the hardware configuration of the computer device 100 in the first embodiment, and the functional blocks generated within it may likewise be the same as those in the computer device 100 of the first embodiment.
In the first embodiment, the computer device 100 accepted moving image data from the camera 210, the data reaching the input unit 121 by way of the camera 210, the interface 114, and the input unit 121 in that order. The conversion server 20-1 in the modification, by contrast, accepts moving image data from the computer device 100 in the first communication system 10-1 via the network 400, the data reaching the input unit 121 by way of its transmission/reception mechanism, the interface 114, and the input unit 121 in that order.
In the first embodiment, the computer device 100 accepted input from the input device 102 via the interface 114. The conversion server 20-1 in the modification, by contrast, accepts the input from the input device 102 by way of the computer device 100 in the first communication system 10-1, via the network 400.
In the first embodiment, the converted moving image data generated by the image processing unit 123 of the computer device 100 was sent to the second communication system 10-2 via the output unit 125, the interface 114, and the transmission/reception mechanism. In the conversion server 20-1 of the modification, by contrast, the converted moving image data generated by the image processing unit 123 is returned to the first communication system 10-1 via the output unit 125, the interface 114, and the transmission/reception mechanism; as noted above, though, the conversion server 20-1 may instead send the converted moving image data to the second communication system 10-2.
The conversion server 20-2 has the same configuration and functions as the conversion server 20-1, and provides the computer device 100 in the second communication system 10-2 with the same functions that the conversion server 20-1 provides to the computer device 100 in the first communication system 10-1. The first communication system 10-1 and the second communication system 10-2 can thereby send converted moving image data to each other, as in the first embodiment.
In addition, one conversion server may provide both communication systems 10 with a function of converting moving image data into converted moving image data.
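The division of labor in the modification can be summarized as a request/response loop. The sketch below is only a schematic of the data flow described above, with `convert` standing for the first embodiment's image-processing pipeline; it is not an implementation of the servers.

```python
def conversion_server(receive, convert, send_back):
    """Schematic loop of a conversion server 20-1/20-2: receive moving
    image data from a communication system over the network, convert
    it to converted moving image data, and return it (or, as noted
    above, forward it directly to the other party's system)."""
    while True:
        moving_image_data, reply_to = receive()
        converted = convert(moving_image_data)
        send_back(converted, reply_to)
```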
«Second embodiment»
An image processing apparatus according to the second embodiment will be described.
The image processing apparatus of the second embodiment has the outward appearance of a webcam; for example, it presents an appearance like those shown in FIG. 2, FIG. 8, FIG. 12, and so on.
The image processing apparatus according to the second embodiment can be used by being connected to a computer device that constitutes a conventional video conference system. Such a computer device has a function of transmitting / receiving moving image data to / from another computer device, and may be publicly known or well known.
The image processing apparatus of the second embodiment is integrated with a camera: hardware equivalent to the hardware configuration of the computer device 100 of the first embodiment is built into the camera, and a computer program like the one described in the first embodiment is installed on that hardware. Accordingly, even though its appearance is that of a webcam, the image processing apparatus of the second embodiment generates within itself the same functional blocks as those shown in FIG. 4. To supplement this, the hardware configuration of the image processing apparatus of the second embodiment is that of FIG. 3 with a camera connected to the interface 114; the image processing apparatus of the present invention, however, is that configuration with the camera excluded.
The image processing apparatus according to the second embodiment has a function of converting moving image data generated by a camera integrated with the image processing apparatus into converted moving image data.
The image processing apparatus according to the second embodiment can be used in the same manner as a normal webcam. However, the data output by this image processing apparatus is not general moving image data, but converted moving image data. Therefore, the computer devices in both communication systems can send the converted moving image data to each other without having the function of converting the moving image data into the converted moving image data as in the first embodiment. Become.
10-1 first communication system
10-2 second communication system
100 computer device
101 display
102 input device
121 input unit
122 main control unit
123 image processing unit
123A frame dropping unit
123B three-dimensional model generation unit
123C three-dimensional model rotation unit
123D two-dimensional image generation unit
20-1 conversion server
20-2 conversion server

Claims (14)

1. An image processing apparatus comprising:
a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position;
a converted moving image data generation unit that generates converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit,
wherein the converted moving image data generation unit comprises:
a three-dimensional model generation unit that generates, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation unit that rotates each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and
a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
2. The image processing apparatus according to claim 1, wherein the rotation angle is determined in advance and recorded in the image processing apparatus.
3. The image processing apparatus according to claim 1, wherein the rotation angle is determined by performing a predetermined calculation based on the moving image data received by the moving image data receiving unit.
4. The image processing apparatus according to claim 1, comprising an input device receiving unit for receiving data about a predetermined parameter, necessary for determining the rotation angle, from an input device for inputting the parameter, wherein the rotation angle is determined by performing a predetermined calculation based on the data about the parameter received by the input device receiving unit.
5. The image processing apparatus according to claim 1, comprising a sensor receiving unit that receives data about a predetermined parameter, necessary for determining the rotation angle, from a sensor that detects the parameter, wherein the rotation angle is determined by performing a predetermined calculation based on the data about the parameter received by the sensor receiving unit.
6. The image processing apparatus according to claim 1, wherein the moving image data output unit is adapted to be connected to a predetermined display that displays a moving image based on the converted moving image data, the image processing apparatus comprises a rotation angle change data receiving unit that receives rotation angle change data, which is data for changing the rotation angle, and the three-dimensional model rotation unit changes the rotation angle by which the three-dimensional models are rotated, based on the received rotation angle change data, each time the rotation angle change data receiving unit receives rotation angle change data.
7. The image processing apparatus according to claim 1, wherein the three-dimensional model generation unit extracts the facial part of the target face appearing in the still image specified by the still image data to generate the three-dimensional model, and also generates background image data, which is data of a two-dimensional still image of the part of the still image other than the facial part of the target face, and the two-dimensional image generation unit generates the converted still image data by pasting facial image data, which is data obtained by flattening into two dimensions the three-dimensional model rotated by the three-dimensional model rotation unit, onto the facial part of the target face in the background image data.
8. The image processing apparatus according to claim 7, wherein the three-dimensional model generation unit performs predetermined two-dimensional image processing on the part of the still image other than the facial part of the target face before generating the background image data for that still image, whereby, when the two-dimensional image generation unit pastes the facial image data onto the facial part of the target face in the background image data, the edge of the facial image data and the edge of the facial part of the target face coincide more closely.
9. The image processing apparatus according to claim 1, wherein the three-dimensional model rotation unit rotates the three-dimensional model about a predetermined point.
10. The image processing apparatus according to claim 1, wherein the image processing apparatus is integrated with the camera.
11. The image processing apparatus according to claim 1, wherein the moving image data receiving unit receives the moving image data from the camera via a predetermined network.
12. The image processing apparatus according to claim 1, wherein the image processing apparatus is capable of communicating via a predetermined network and is used as one of a pair of such apparatuses, and the converted moving image data generated by each of the paired image processing apparatuses is sent bidirectionally over the network to the other.
13. An image processing method executed by a computer comprising a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position, the method comprising:
a converted moving image data generation step of generating converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step,
wherein the converted moving image data generation step includes:
a three-dimensional model generation step of generating, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation step of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and
a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
14. A computer program for causing a computer comprising a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position, to execute:
a converted moving image data generation step of generating converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step,
wherein, in the converted moving image data generation step, the computer program causes the computer to execute:
a three-dimensional model generation step of generating, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation step of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and
a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
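For orientation only (the claims above, not this code, define the invention), the following sketch shows one way the pipeline of claims 1, 7, and 9 might be realized, with the rotation-angle helper illustrating one plausible "predetermined calculation" for claims 3 to 5. The model_fn regressor stands in for the claimed machine-learned conversion algorithm, of which the volumetric CNN regression of Jackson et al., cited in the non-patent citations below, is one published example; the point-splat renderer, the centroid pivot, the x-axis rotation, and all names are assumptions of this sketch.

# Schematic sketch, not the claimed implementation.
import numpy as np

def rotation_about_point(points, angle_rad, pivot):
    """Rotate Nx3 points about the x-axis through `pivot` (claim 9's
    'predetermined point'); positive angles tilt the face upward."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,  -s],
                  [0.0,   s,   c]])
    return (points - pivot) @ R.T + pivot

def estimate_rotation_angle(camera_offset_m, face_distance_m):
    """One plausible 'predetermined calculation' for claims 3 to 5: the
    angle subtended at the face by the camera's offset from the display."""
    return float(np.arctan2(camera_offset_m, face_distance_m))

def project_orthographic(verts, colors, shape):
    """Crude point-splat orthographic projection (drop z, paint vertex
    colors at pixel positions), standing in for a real renderer."""
    img = np.zeros(shape, dtype=np.uint8)
    xs = np.clip(verts[:, 0].astype(int), 0, shape[1] - 1)
    ys = np.clip(verts[:, 1].astype(int), 0, shape[0] - 1)
    img[ys, xs] = colors
    return img

def convert_still_image(frame, model_fn, angle_rad):
    """One still image -> one converted still image (claim 1's conversion).

    model_fn(frame) is assumed to return (verts, colors, mask): the 3D
    facial model, per-vertex colors, and a boolean face-region mask."""
    verts, colors, mask = model_fn(frame)
    pivot = verts.mean(axis=0)                 # assumed pivot: model centroid
    rotated = rotation_about_point(verts, angle_rad, pivot)
    face_2d = project_orthographic(rotated, colors, frame.shape)
    converted = frame.copy()                   # claim 7's background image data
    converted[mask] = face_2d[mask]            # paste the facial image back in
    return converted

Applied frame by frame to the still images making up the moving image data, convert_still_image would yield the converted moving image data of claim 1; the variants of claims 2 to 6 would only change how angle_rad is obtained.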
PCT/JP2019/004530 2018-10-29 2019-02-08 Image processing device, method, and computer program WO2020090128A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019507886A JP6516316B1 (en) 2018-10-29 2019-02-08 Image processing apparatus, method, computer program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2018/040130 2018-10-29
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program

Publications (1)

Publication Number Publication Date
WO2020090128A1

Family

ID=70463023

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program
PCT/JP2019/004530 WO2020090128A1 (en) 2018-10-29 2019-02-08 Image processing device, method, and computer program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program

Country Status (2)

Country Link
JP (1) JPWO2020089971A1 (en)
WO (2) WO2020089971A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014097465A1 (en) * 2012-12-21 2014-06-26 日立マクセル株式会社 Video processor and video p rocessing method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0690445A (en) * 1992-09-09 1994-03-29 Hitachi Ltd Video input/output device
JPH08237629A (en) * 1994-10-25 1996-09-13 At & T Corp System and method for video conference that provides parallax correction and feeling of presence
JP2005503726A (en) * 2001-09-20 2005-02-03 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Quality adaptation of real-time multimedia content transmission based on user's attention
JP2004326179A (en) * 2003-04-21 2004-11-18 Sharp Corp Image processing device, image processing method, image processing program, and recording medium storing it
JP2015513833A (en) * 2012-02-27 2015-05-14 エー・テー・ハー・チューリッヒEth Zuerich Method and system for image processing for gaze correction in video conferencing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JACKSON, AARON S. ET AL.: "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression", ICCV 2017, 22 March 2017 (2017-03-22), pages 1 - 9, XP080759051, Retrieved from the Internet <URL:https://arxiv.org/abs/1703.07834> *

Also Published As

Publication number Publication date
JPWO2020089971A1 (en) 2021-02-15
WO2020089971A1 (en) 2020-05-07

Similar Documents

Publication Publication Date Title
JP7200439B1 (en) Avatar display device, avatar generation device and program
JP5208810B2 (en) Information processing apparatus, information processing method, information processing program, and network conference system
Kasahara et al. JackIn head: immersive visual telepresence system with omnidirectional wearable camera for remote collaboration
Zhang et al. Viewport: A distributed, immersive teleconferencing system with infrared dot pattern
US9424463B2 (en) System and method for eye alignment in video
JP2014233035A (en) Information processor, display control method and program
JP2020065229A (en) Video communication method, video communication device, and video communication program
JP2009089324A (en) Video conference system and program, and recoding medium
JPWO2017141584A1 (en) Information processing apparatus, information processing system, information processing method, and program
WO2015139562A1 (en) Method for implementing video conference, synthesis device, and system
US20230231983A1 (en) System and method for determining directionality of imagery using head tracking
JP6516316B1 (en) Image processing apparatus, method, computer program
JP2011113206A (en) System and method for video image communication
CN113170075B (en) Information processing device, information processing method, and program
WO2020090128A1 (en) Image processing device, method, and computer program
JP5759439B2 (en) Video communication system and video communication method
WO2016182504A1 (en) A virtual reality headset
EP2355500A1 (en) Method and system for conducting a video conference with a consistent viewing angle
JP7420585B2 (en) AR display control device, its program, and AR display system
EP4145397A1 (en) Communication terminal device, communication method, and software program
US20230247383A1 (en) Information processing apparatus, operating method of information processing apparatus, and non-transitory computer readable medium
JP2022092747A (en) Display image processing system and display image processor
Yem et al. Dual Body: Method of Tele-Cooperative Avatar Robot with Passive Sensation Feedback to Reduce Latency Perception
Brick et al. High-presence, low-bandwidth, apparent 3D video-conferencing with a single camera
Liu et al. HoloChat: 3D avatars on mobile light field displays

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019507886

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19878647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.07.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19878647

Country of ref document: EP

Kind code of ref document: A1