WO2020090128A1 - Image processing device, method, and computer program - Google Patents

Image processing device, method, and computer program

Info

Publication number
WO2020090128A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
moving image
data
still image
face
Prior art date
Application number
PCT/JP2019/004530
Other languages
French (fr)
Japanese (ja)
Inventor
健志 加畑
Original Assignee
有限会社アドリブ
西村 昇
山本 慎也
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 有限会社アドリブ, 西村 昇, 山本 慎也
Priority to JP2019507886A (JP6516316B1)
Publication of WO2020090128A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working

Definitions

  • the present invention relates to an image processing technique that can be applied to, for example, a video conference.
  • Video conferencing may be realized using an expensive dedicated device (dedicated system), or it may be realized using a simple general-purpose device (system) together with software for sending and receiving video, such as Skype (trademark) provided by Microsoft (trademark) Corporation. Whether it is realized by a dedicated device or by a general-purpose device, the general principle of the video conference remains the same. For example, in a one-to-one video conference, both participants prepare a computer connected to the network.
  • a display and a camera are connected to each of these computers.
  • the camera is a digital camera capable of capturing moving images, and captures the participants of the video conference.
  • moving image data of the moving image in which the face of one participant is captured by one camera is sent to the other computer via the one participant's computer and the network.
  • a moving image in which the face of one participant is reflected is displayed on the other display connected to the other computer.
  • the other participant can thereby see the face of one participant.
  • voice and text can also be exchanged between the two computers (or both participants), and at least one of them is usually required; however, since the exchange of voice and text is unrelated to this application, its description is basically omitted hereafter.
  • to the other participant receiving the moving image data generated by one camera, the line of sight or the face of one participant shown in the moving image displayed on the other display in front of them appears not to face the direction of the other participant but to be looking downward, for example.
  • the phenomenon described above, in which one participant's line of sight or face appears averted in the moving image displayed on the other display in front of the other participant, occurs not only when one camera is above the widthwise center of one display but wherever one camera is placed around one display. However, the direction in which the line of sight or face of one participant appears to be averted differs depending on the position at which one camera is arranged.
  • conventionally, to correct such a moving image, stereo imaging is performed using two cameras, or, if imaging is performed by one camera, the large number of still images forming the moving image data generated by that camera must include depth data.
  • however, stereo cameras and cameras that generate depth data are not common as cameras, and a technology that compels the user to prepare such uncommon hardware is extremely difficult to spread.
  • on the other hand, modern laptop personal computers, smartphones, tablets, and other computers have built-in cameras, and webcams and other cameras used in combination with desktop personal computers have also become widespread.
  • such general cameras cannot create moving image data including depth data, and a technology is not suitable for practical use or widespread adoption unless it can be applied to such widespread cameras.
  • it is an object of the present invention to provide a technology that can be used mainly in combination with a general camera in a video conference system, that reduces the sense of discomfort felt about the direction of a face or a line of sight in a moving image displayed on a display in front of a viewer, that is inexpensive, and that does not easily cause delay.
  • in other words, the aim is that the line of sight or the face of one participant shown in the moving image displayed on the other display in front of the other participant receiving the moving image data generated by one camera should face the direction of the other participant.
  • one display is usually not entirely transparent, so one camera would be placed somewhere around one display.
  • it is possible, at least in theory, to correct the moving image data created by one camera so that the moving image based on that data appears as if it had been captured by a virtual camera existing at a virtual position behind the display (including inside the display; the same applies below). Since the face of one participant included in the video based on moving image data corrected in such a manner basically faces the front, the feeling of strangeness given to the other participant viewing the other display can be suppressed. The present invention is based on such knowledge.
  • the present invention is an image processing device comprising: a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image; a converted moving image data generation unit that converts each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of a converted still image, i.e. the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face facing front, and that generates converted moving image data, which is moving image data composed of a large number of continuous converted still image data; and a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit.
  • the converted moving image data generation unit in this image processing device includes: a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of a large number of faces; a three-dimensional model rotation unit that performs a process of rotating each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. A sketch of this per-frame pipeline is shown below.
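Purely as an illustration of the data flow just described, the following sketch strings the three units together. None of the function signatures come from the patent; the estimator and renderer are injected stand-ins for the machine-learned conversion algorithm and the two-dimensional image generation unit.

```python
from typing import Callable, Iterable, Iterator
import numpy as np

# Hypothetical signatures; the patent names the units but prescribes no API.
EstimateMesh = Callable[[np.ndarray], np.ndarray]            # frame -> Nx3 vertices
RenderMesh = Callable[[np.ndarray, np.ndarray], np.ndarray]  # vertices, frame -> frame

def convert_moving_image(
    frames: Iterable[np.ndarray],   # continuous 2D still images, no depth data
    estimate_mesh: EstimateMesh,    # three-dimensional model generation unit
    render_mesh: RenderMesh,        # two-dimensional image generation unit
    rotation: np.ndarray,           # constant 3x3 rotation (actual -> virtual camera)
) -> Iterator[np.ndarray]:
    """Per-frame pipeline: 3D model generation -> rotation by a constant
    angle -> 2D image generation (one converted still image per frame)."""
    for frame in frames:
        vertices = estimate_mesh(frame)    # 3D model of the face portion
        rotated = vertices @ rotation.T    # three-dimensional model rotation unit
        yield render_mesh(rotated, frame)  # converted still image data
```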
  • the number of cameras in the present invention is one.
  • the camera according to the present invention is a general camera, and the still image data forming the moving image data does not include depth data.
  • the camera may be integrated with the image processing apparatus or may be a separate body.
  • when the image processing device is configured by a computer as described in the background art (for example, a desktop computer without a camera), the camera is separate from the image processing device.
  • the camera in that case may be, for example, a publicly known or well-known webcam itself.
  • the camera in this case, which is separate from the computer serving as the image processing apparatus, is connected to that computer by wire or wirelessly.
  • many known or well-known laptop personal computers and computers such as smartphones and tablets have an integrated camera.
  • in such a case, the camera is integrated with the image processing apparatus, and the part of the computer excluding the camera is the image processing device according to the present invention.
  • likewise, if the present invention takes the form of a web camera, the part of the web camera excluding the camera itself is the image processing device according to the present invention.
  • the camera exists at the actual position, which is a predetermined position.
  • for example, if a display is connected to a computer serving as the image processing apparatus, the actual position is generally a predetermined location around the display.
  • in the case of a computer with an integrated display, the camera is generally attached at a predetermined position above that display, and that position is the actual position of the camera in that case. If the image processing apparatus of the present invention has a webcam-like appearance, the position where it is attached is the actual position of the camera. In any case, the camera at the actual position captures the target face, which is the face of one imaged person. The camera can capture a moving image and generates moving image data of that moving image.
  • the moving image data generated by the camera is general data, for example, MJPEG data.
  • the moving image data in the invention of the present application is data of a moving image composed of a large number of continuous still image data which is data about a two-dimensional still image, and this is very general moving image data.
  • the image processing apparatus includes a moving image data receiving unit that receives moving image data generated by the camera from the camera.
  • when the image processing device and the camera are connected by wire, the moving image data receiving unit will generally be the input terminal provided on the image processing device for realizing the wired connection with the camera.
  • when the image processing device and the camera communicate wirelessly, the moving image data receiving unit will generally be the receiving device provided in the image processing device for realizing the wireless communication with the camera.
  • when the image processing device and the camera are integrated, the moving image data receiving unit will generally be an interface provided in the image processing device for realizing the connection with the camera.
  • the image processing apparatus includes a conversion moving image data generation unit.
  • the converted moving image data generation unit converts at least a plurality of still image data included in the moving image data into converted still image data.
  • the moving image data received from the camera by the image processing apparatus, and the still image data included in the moving image data, are generated by the camera at the actual position, and the moving image or still images based on them show the target face as captured from the actual position.
  • the converted still image data is generated based on the still image data, or by converting the still image data. It is data of a converted still image, i.e. the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the user faces front (taking a natural posture). That is, the target face included in the converted still image is the target face as imaged from the virtual position in front of the user's face.
  • since the virtual position of the camera is fixed and the relative positional relationship between the actual position and the virtual position of the camera is constant, the process of converting the still image data received from the camera into converted still image data is basically the same for every still image data subject to such conversion. The process is therefore "lighter" than if a different process had to be performed for each image, and such conversion is less likely to cause a delay in the moving image.
  • the converted moving image data is a series of converted still image data generated one after another by the converted moving image data generation unit.
  • the still image data is data of a still image (so-called frame) forming a moving image.
  • the image processing apparatus may generate converted still image data from all the still image data received from the camera, but doing so may cause a delay in the moving image.
  • therefore, if emphasis is placed on avoiding delay, the still image data to be converted into converted still image data can be, for example, every second or every third still image data included in the moving image data (every two frames or every three frames). The frame rate of the converted moving image data (the number of converted still image data included in the converted moving image data per second) then becomes smaller than the frame rate of the moving image data (the number of still image data included in the moving image data per second), but if the frame rate of the converted moving image data is at least about 10 fps, the moving image based on the converted moving image data can still be used as a moving image. A sketch of this subsampling is shown below.
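As a concrete illustration of the stride just described (the stride of two or three and the roughly 10 fps floor are the figures given in the text; everything else is an assumption of the sketch):

```python
def subsample_frames(frames, source_fps=30, stride=3):
    """Pass only every `stride`-th still image on to conversion.

    With a 30 fps source and stride 3, the converted moving image data
    runs at 10 fps, approximately the floor the text gives for the
    result to still read as a moving image.
    """
    assert source_fps / stride >= 10, "converted frame rate would fall below ~10 fps"
    for i, frame in enumerate(frames):
        if i % stride == 0:
            yield frame  # this frame proceeds to 3D model generation
```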
  • the image processing device includes a moving image data output unit.
  • the moving image data output unit has a function of outputting the converted moving image data generated by the converted moving image data generation unit.
  • the converted moving image data is output from the image processing device to another device, for example.
  • the other device serving as the output destination of the converted moving image data may be a device (such as a display) directly connected to the image processing device by wire or wirelessly, or a device connected to the image processing device via the network (for example, a display connected to another image processing device). If the image processing device includes a display, the output destination of the converted moving image data may be the display included in the image processing device.
  • if the image processing device takes the form of a web camera, it is used in the same manner as the web camera in a conventional video conference system.
  • in that case, the moving image data input to the computer in the video conference system is the converted moving image data from the beginning.
  • the target face in the converted still image based on each of the converted still image data included in the converted moving image data generated as described above has the same orientation as if it had been captured by the camera at the virtual position in front of the target face. Therefore, when a moving image based on the converted moving image data generated by the above-described image processing apparatus is displayed on some display, the target face displayed on the display is basically in a front-facing state.
  • the application of the image processing device of the present application is not limited to a video conference; when it is applied to a video conference, for example, a moving image based on the converted moving image data generated by the image processing device described above is displayed to the other party.
  • the target face reflected in the moving image based on the converted moving image data obtained by the present invention faces the front, including its line of sight, when the imaged person takes a natural posture while the target face is displayed on the display.
  • if the imaged person rotates their face or moves their line of sight, the target face displayed on the display also rotates, or its line of sight moves, accordingly.
  • this is because the converted moving image is equivalent to a moving image captured with the camera located at the virtual position: an image reflecting the movement of the target face or of the line of sight is displayed. Compared with a simple image conversion, the target face displayed on the display is therefore not unnatural.
  • the image processing devices may be configured to be communicable via a predetermined network and used in pairs, with the converted moving image data generated by one of the image processing devices sent bidirectionally to the other image processing device via the network. By doing so, a video conference similar to the conventional one can be realized.
  • the application of the image processing device in the present invention is not limited to the video conference system. For example, it is also known that when you watch a moving image of your own face taken as a selfie on the display of your own smartphone, tablet, or desktop or laptop computer, there is a sense of discomfort because the face is not facing you or its line of sight is not facing the front. Such a problem can also be solved by the image processing device according to the present invention. In this case, naturally, the converted moving image data created from the moving image data by the image processing device need not be sent to a computer or the like owned by another person.
  • as described above, the converted moving image data generation unit included in the image processing device comprises: a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of a large number of faces; a three-dimensional model rotation unit that performs a process of rotating each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
  • the three-dimensional model generation unit generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data.
  • the generation of the three-dimensional model is performed using a conversion algorithm that estimates a three-dimensional model of a face and is obtained by machine learning of many faces.
  • in recent years, technology has been developed that automatically generates a three-dimensional model of the face portion of a face reflected in a still image from one general two-dimensional still image (in other words, from the data of a single face photograph).
  • in this technology, a large number of two-dimensional still images including human faces, generated by imaging various human faces from various angles, are machine-learned as samples by a computer, yielding a conversion algorithm, that is, an algorithm for generating a three-dimensional model of a human face from a still image.
  • in the present invention, this conversion algorithm is used to automatically generate a three-dimensional model of the face portion of the target face reflected in the still image specified by the still image data.
  • the face portion means a portion of the human head, which is generally in front of the ears and below the forehead.
  • the above-mentioned technology developed in recent years, which automatically creates a three-dimensional model of the face portion reflected in a still image from one general two-dimensional still image in which a face is reflected, is recognized worldwide as an interesting technology. However, although it has been recognized as interesting, it has had few practical uses so far.
  • the present invention proposes a practical application of such a technique.
  • the conversion algorithm described above is for generating a three-dimensional model of at least the face portion of the target face, and the two-dimensional still image used as the source when generating the three-dimensional model need not be captured by a stereo camera and need not include depth data. That is, the camera used in combination with the image processing apparatus of the present invention may be a general camera.
  • the three-dimensional model may be any model as long as it is created by the above method; it is, for example, a wire frame model.
  • the three-dimensional model generation unit generates a three-dimensional model based on at least a plurality of still image data forming the moving image data. This "at least a plurality of still image data" is still image data that is the target of the above-mentioned conversion.
  • the three-dimensional model rotation unit performs a process of rotating a plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a constant angle. This corresponds to the process of directing the face orientation specified by the three-dimensional model toward the camera at the virtual position.
  • the two-dimensional image generation unit generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. That is, the two-dimensional image generation unit generates converted still image data about the converted still image by creating data about the two-dimensional still image again from the three-dimensional model.
  • because the relative positional relationship between the actual position and the virtual position of the camera is constant, the angle (including, of course, the direction of rotation) by which the three-dimensional model rotation unit rotates the 3D model is the same regardless of which still image data the processing is based on.
  • the processing performed by the three-dimensional model generation unit, the three-dimensional model rotation unit, and the two-dimensional image generation unit is likewise the same for every still image data subject to image processing, regardless of which still image data it is. This is another reason why the problem of moving image delay is unlikely to occur. A sketch of such a constant rotation follows.
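A sketch of the constant rotation, assuming the common convention that the pitch axis runs horizontally through the head (an assumption, since the patent leaves the axis open); the point is that the matrix is computed once and reused unchanged for every frame:

```python
import numpy as np

def pitch_rotation_matrix(theta_deg: float) -> np.ndarray:
    """3x3 rotation about the horizontal (X) axis by a constant angle.

    Because the actual/virtual camera geometry does not change during a
    session, this matrix is built once and applied identically to the
    vertices of every frame's 3D face model.
    """
    t = np.radians(theta_deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(t), -np.sin(t)],
                     [0.0, np.sin(t), np.cos(t)]])

R = pitch_rotation_matrix(10.0)  # e.g. a camera mounted above the display
# rotated_vertices = vertices @ R.T  # the identical operation per frame
```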
  • the three-dimensional model generation unit may extract the face portion of the target face reflected in the still image specified by the still image data to generate the three-dimensional model, and may also generate background image data, which is data about a two-dimensional still image of the portion of the still image other than the face portion of the target face. The two-dimensional image generation unit may then generate the converted still image data by pasting face image data, which is data obtained by converting the three-dimensional model rotated by the three-dimensional model rotation unit back into two-dimensional data, onto the face portion of the target face in the background image data.
  • in other words, the three-dimensional model generation unit recognizes the face portion of the target face reflected in the still image and extracts that portion to generate the three-dimensional model, while the other portions (for example, the ears and hair of the target face, or the background behind the owner of the target face) are left as they are as a two-dimensional still image.
  • the three-dimensional model rotation unit then rotates the three-dimensional model, the two-dimensional image generation unit converts the rotated three-dimensional model into a two-dimensional image, and that image is pasted onto the portion of the still image from which the face portion of the target face was extracted.
  • the rotated face image and the remainder of the still image generated by the three-dimensional model generation unit do not necessarily match exactly, which suggests that some unnaturalness may occur in the target face included in the still image specified by the converted still image data.
  • nevertheless, the discomfort felt by a person who sees a moving image based on the converted moving image data in which such converted still image data are connected is much smaller than the discomfort caused when the target face in the moving image does not face the viewer. Although the mechanism is not known in detail, it is considered that when a person recognizes a face, the brain mainly recognizes the eyes of the person being recognized, and that if the eyes face the front, other unnaturalness is largely not recognized.
  • therefore, the effect of the present invention is sufficient even if the method of generating a converted still image described above is adopted. At least when the rotation angle of the target face is about 15 degrees or less, the discomfort felt by a person who views the moving image based on the converted moving image data is small enough not to be a practical problem.
  • the three-dimensional model generation unit may perform predetermined two-dimensional image processing on the still image of the portion other than the face portion of the target face before producing the background image data of the still image.
  • two-dimensional image processing here means image processing that does not involve three-dimensional modeling of a subject in the still image. For example, when the three-dimensional model of the face portion of the target face is rotated, its apparent length in, say, the vertical direction may change. In response to such an apparent change in length, the three-dimensional model generation unit can perform a process of changing the vertical length (enlargement or reduction) of the still image of the portion other than the face portion of the target face.
  • examples of two-dimensional image processing include scaling of the image in one direction as described above, scaling in two directions, rotation, and the like. Doing so can further reduce the above-mentioned unnaturalness, hardly recognized by the brain, that may occur in the target face in the converted still image. However, adding such processing to the still image of the portion other than the face portion of the target face is not essential. A sketch of the paste-back step follows.
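A sketch of the paste-back step, under the assumption that the extracted face portion is tracked as a boolean mask (the mask representation is an assumption of this sketch, not something the patent specifies):

```python
import numpy as np

def composite_converted_frame(background: np.ndarray,
                              face_mask: np.ndarray,
                              rendered_face: np.ndarray) -> np.ndarray:
    """Paste the re-rendered (rotated) face back over the background.

    background    -- HxWx3 still image, everything except the face left as-is
    face_mask     -- HxW boolean array marking the extracted face portion
    rendered_face -- HxWx3 two-dimensional rendering of the rotated 3D model
    """
    out = background.copy()
    out[face_mask] = rendered_face[face_mask]  # seams may not match exactly;
    return out                                 # the text argues the eyes dominate perception
```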
  • as described above, the three-dimensional model rotation unit rotates the three-dimensional model; it may do so about a predetermined point.
  • the process of rotating the three-dimensional model can be a process of rotating it about a certain axis of the model (for example, a horizontal straight line penetrating both ears, or a straight line vertically penetrating the center of the skull when viewed in plan view, or both of these straight lines can serve as the axis); effectively, these are roll, yaw, and pitch rotation processes.
  • the rotation of the three-dimensional model around a certain point can be executed by transforming the spatial coordinates, and can be regarded as the rotation of the space itself in which the three-dimensional model exists.
  • the predetermined point can be, for example, the lens position of one camera.
  • setting the lens position of the camera as the predetermined point makes the position of the predetermined point easy to decide. Regardless of whether the predetermined point is the lens position of the camera, if the predetermined point is taken as the origin of the virtual space in which the three-dimensional model exists, the calculation of the spatial coordinates becomes easy, as in the sketch below.
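A sketch of rotation about an arbitrary pivot by translating the pivot to the origin, rotating, and translating back; this is the standard construction, not code from the patent:

```python
import numpy as np

def rotate_about_point(vertices: np.ndarray, R: np.ndarray, pivot: np.ndarray) -> np.ndarray:
    """Rotate an Nx3 vertex array by a 3x3 matrix R about an arbitrary pivot.

    Choosing the pivot (for example, the camera lens position) as the
    origin of the virtual space reduces this to a plain matrix product.
    """
    return (vertices - pivot) @ R.T + pivot
```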
  • the three-dimensional model rotation unit included in the image processing apparatus of the present invention performs a process of rotating the plurality of three-dimensional models generated by the three-dimensional model generation unit by rotation angles that are constant angles.
  • the fixed rotation angle for rotating the three-dimensional model can be determined as follows. First, the rotation angle may be determined in advance; in that case, the rotation angle is recorded in the image processing apparatus. The rotation angle is determined by the relative positional relationship between the actual position and the virtual position of the camera. If the image processing apparatus is, for example, a laptop personal computer, a smartphone, or a tablet, and the camera is fixedly attached to the housing, the actual position of the camera is fixed relative to the image processing apparatus.
  • if the virtual position of the camera is determined to be an appropriate position, such as behind the display included in the laptop personal computer, smartphone, or tablet, the relative positional relationship between the actual position and the virtual position of the camera is uniquely determined.
  • moreover, a laptop personal computer, smartphone, or tablet serving as the image processing apparatus is usually used with the user's face at a roughly predictable distance from the display, so the rotation angle can be determined in advance in consideration of how the device is used.
  • for example, a computer program for causing a computer such as a laptop personal computer, a smartphone, or a tablet to function as the image processing apparatus of the present invention may hold data specifying, for each of various computer models, the rotation angle that can be derived from the virtual position of the camera in that model (or from the relationship between its actual position and virtual position); that is, a large number of pairs of computer models and camera positions.
  • the computer model may be identified automatically by a function of the computer program after the computer program is installed on the computer, or the computer program may have a function that allows the user to input the computer model after installation.
  • this makes it possible to automatically determine the rotation angle suitable for the image processing apparatus from the relationship between the model and the virtual position. On the other hand, even when the image processing device is configured by a desktop computer, or when the image processing device is integrated with the camera and has the appearance of a webcam, if the position of the camera (its actual position) is determined at least to some extent, the relative positional relationship between the actual position of the camera and a virtual position set, for example, behind the display is uniquely determined.
  • in such a case also, once the relative positional relationship between the actual position and the virtual position of the camera is uniquely determined, it is possible to predetermine the rotation angle.
  • for example, if the user is notified by some means of an instruction such as "place the camera a specified number of centimeters above the center of the display in the width direction, and use the device with the target face a specified number of centimeters away from the virtual position of the camera", and the rotation angle is determined in advance on the basis of the virtual position so defined, the effect that the target face in the moving image based on the converted moving image data generated by the image processing device correctly faces the front can be obtained more reliably. A sketch of the geometry is shown below.
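As a sketch of the underlying geometry: if the real camera sits some height above the virtual camera position and the target face is some distance from the display, the pitch correction is roughly the angle between the two camera rays at the face. This simplified model is an illustration, not a formula stated in the patent:

```python
import math

def pitch_correction_deg(camera_height_cm: float, face_distance_cm: float) -> float:
    """Approximate constant rotation angle for a camera mounted directly
    above the display, with the virtual camera behind the display at the
    height of the face (simplified geometry assumed for illustration).
    """
    return math.degrees(math.atan2(camera_height_cm, face_distance_cm))

# e.g. camera 15 cm above the virtual position, face 60 cm from the display:
# pitch_correction_deg(15, 60) -> about 14 degrees, within the roughly
# 15-degree range the text says remains practically comfortable
```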
  • the rotation angle may not be determined in advance and may be determined by the image processing device when the image processing device is used.
  • the image processing device may be configured to determine the rotation angle before starting the generation of the converted moving image data.
  • the image processing apparatus may be configured to determine the rotation angle by performing a predetermined calculation based on the moving image data received by the moving image data receiving unit.
  • since the image processing device receives moving image data from the camera, the image processing apparatus can generate a three-dimensional model from that moving image data with its three-dimensional model generation unit. It can therefore determine by calculation how far the 3D model must be rotated so that, in the still image based on the converted still image data, the target face of a user facing the camera faces the camera at the virtual position.
  • alternatively, the image processing apparatus may include an input device reception unit that receives, from an input device used to input predetermined parameters necessary for determining the rotation angle, data about those parameters, and the rotation angle may be determined by performing a predetermined calculation based on the parameter data accepted by the input device reception unit.
  • a computer constituting an image processing apparatus is generally connected to, or integrated with, an input device (for example, a keyboard, a mouse, or a touch panel), from which the user can enter such parameters. Determining the rotation angle by calculation based on the parameters input from such an input device is also within the present invention.
  • the parameters are, for example, information specifying the shape and size of the display, information specifying where the actual position of the camera is (for example, immediately above the display at the center of its width direction, or at the upper right corner of the display), and information specifying the distance from the display to the target face.
  • the image processing device may also include a sensor reception unit that receives data about the parameters from a sensor that detects predetermined parameters required to determine the rotation angle, and the rotation angle may be determined by performing a predetermined calculation based on the parameter data received by the sensor reception unit.
  • the sensor is, for example, a publicly known or well-known distance measuring device connected to the image processing device and provided at either end of the display in the width direction.
  • in this case, the present invention determines an appropriate rotation angle using a parameter (for example, the distance from the display to the target face) obtained by the distance measuring device.
  • the parameter to be measured by the sensor is not limited to the distance.
  • the sensor may measure a parameter useful for obtaining a relative positional relationship between the real position and the virtual position of the camera and a relationship between the virtual position of the camera and the target face.
  • the moving image data output unit in the image processing device may be connected to a predetermined display that displays a moving image based on the converted moving image data.
  • the image processing apparatus in this case includes a rotation angle change data reception unit that receives rotation angle change data, which is data for changing the rotation angle, and the three-dimensional model rotation unit changes the rotation angle by which it rotates the three-dimensional model based on the rotation angle change data received by the rotation angle change data reception unit.
  • a moving image based on the converted moving image data is displayed on the display in substantially real time.
  • the user inputs rotation angle change data while looking at his or her own face (the target face) displayed on the display and, for example, by rotating the target face little by little, can adjust the target face displayed on the display until it basically faces the front.
  • the angle by which the three-dimensional model is rotated when the target face displayed on the display basically faces the front is then fixed as the rotation angle.
  • the rotation directions of the three-dimensional model in this case may be, for example, only the vertical direction (around the X axis) and the horizontal direction (around the Y axis), although they are not limited to these.
  • the user can input the rotation angle change data using the input device described above (a sketch of such an adjustment loop follows). Note that, of course, the above-mentioned four approaches for determining the rotation angle when it is not determined in advance can be used in combination as required.
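A sketch of the adjustment loop, with injected stand-ins for the image processing unit, the display output, and the input device; the arrow-key binding is an assumption, since the patent only requires that some input device supply rotation angle change data:

```python
def calibrate_rotation(frames, convert_frame, show_preview, poll_key, step_deg=1.0):
    """Let the user nudge pitch/yaw while watching their own converted
    face, then fix the converged angles as the constant rotation angle.
    """
    pitch = yaw = 0.0
    for frame in frames:
        show_preview(convert_frame(frame, pitch, yaw))
        key = poll_key()
        if key == "up":
            pitch += step_deg      # rotate a little around the X axis
        elif key == "down":
            pitch -= step_deg
        elif key == "left":
            yaw -= step_deg        # rotate a little around the Y axis
        elif key == "right":
            yaw += step_deg
        elif key == "enter":
            break                  # face looks frontal; fix the angle
    return pitch, yaw
```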
  • the moving image data receiving unit may directly receive the moving image data from the camera (eg, without passing through another device or device).
  • the moving image data receiving unit may receive the moving image data from the camera via a predetermined network.
  • in this case, the image processing device uses a so-called cloud computing technique. That is, for example, a computer near the user receives the moving image data from the camera and sends it to the image processing device at a remote place via a network (for example, the Internet).
  • the converted moving image data generated by performing the image processing as already described in the image processing device is returned from the image processing device to the user's computer via the network.
  • a computer near the user can use the converted moving image data received from the image processing apparatus as moving image data received from the camera.
  • the computer can send the converted moving image data to a computer on the other end of the video conference via a network.
  • when the image processing apparatus is configured by using the technology of cloud computing, the computer used by the user is not required to have high specifications regarding image processing.
  • alternatively, the destination to which the image processing apparatus transmits the converted moving image data, generated by converting the moving image data received via the network from the computer of one participant, may be not the computer of one participant but the computer of the other participant.
  • the inventor of the present application also proposes a method executed by an image processing apparatus as one aspect of the present invention.
  • the effect of this method is equal to the effect of the image processing apparatus according to the present invention.
  • the method is executed by a computer including a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image.
  • the method comprises: a converted moving image data generation step of converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of the converted still image, i.e. the two-dimensional still image captured by the camera when the camera is present at a virtual position that is a predetermined position on a virtual straight line extending in the front direction from the target face facing front, thereby generating converted moving image data, which is moving image data formed by a large number of continuous converted still image data; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. The converted moving image data generation step includes: a three-dimensional model generation process of generating, from each of the at least a plurality of still image data included in the moving image data, a three-dimensional model of the face portion of the target face reflected in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face obtained by machine learning of many faces; a three-dimensional model rotation process of rotating each of the three-dimensional models of the target face so generated by a rotation angle that is a constant angle; and a two-dimensional image generation process of generating the converted still image data based on each of the rotated three-dimensional models.
  • the present inventor also proposes, as one aspect of the present invention, a computer program for causing a predetermined, for example general-purpose, computer to function as the image processing apparatus.
  • the effect of such a computer program is equal to the effect of the image processing apparatus according to the present invention, with the additional effect that a predetermined computer can be made to function as the image processing apparatus according to the present application.
  • the computer program, as one example, causes a computer provided with a moving image data receiving unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data (data about two-dimensional still images) obtained by capturing a target face, i.e. the face of one imaged person, with a predetermined camera that exists at a predetermined actual position and is capable of capturing a moving image, to execute:
  • a converted moving image data generation process of converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of the two-dimensional still image captured by the camera when the camera is present at a virtual position that is a predetermined position on a virtual straight line extending in the front direction from the target face facing front, thereby generating converted moving image data that is moving image data composed of a large number of continuous converted still image data; and a moving image data output process of outputting the converted moving image data generated by the converted moving image data generation process.
  • the converted moving image data generation process includes: a three-dimensional model generation process of generating the three-dimensional model of the face portion of the target face from each of the at least a plurality of still image data included in the moving image data; a three-dimensional model rotation process of performing a process of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation process by a rotation angle that is a constant angle; and a two-dimensional image generation process of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation process.
  • the computer program causes the computer to execute these processes.
  • FIG. 2 is a perspective view showing an appearance of a communication system of the video conference system shown in FIG. 1.
  • FIG. 3 is a diagram showing a hardware configuration of the computer device shown in FIG. 2.
  • FIG. 3 is a block diagram showing functional blocks generated inside the computer device shown in FIG. 2.
  • FIG. 5 is a block diagram showing an example of functional blocks generated inside the image processing unit shown in FIG. 4.
  • FIG. 1 schematically shows the overall configuration of a preferred embodiment of a system including an image processing device of the present invention.
  • the system according to the first embodiment is a video conference system.
  • the video conference system includes a first communication system 10-1 and a second communication system 10-2. All of these are connectable to the network 400.
  • Network 400 is, but is not limited to, the Internet in this embodiment.
  • the first communication system 10-1 in this embodiment is used by one user who participates in a video conference, and the second communication system 10-2 is used by the other user who participates in the video conference.
  • the first communication system 10-1 and the second communication system 10-2 have substantially the same configuration in relation to the invention of the present application and have the same functions and effects; therefore, both are hereinafter collectively referred to as the communication system 10 where appropriate.
  • as shown in FIG. 2, which is a perspective view of the external appearance of the communication system 10, the communication system 10 in this embodiment includes a computer device 100 as an image processing device, a display 101, and a camera 210.
  • the computer device 100, the display 101, and the camera 210 in this embodiment are all separate bodies, although not limited thereto.
  • the computer device 100 in this embodiment is configured by a general-purpose computer.
  • the computer device 100 may be a commercially available product. More specifically, the computer device 100 in this embodiment is a known or well-known desktop personal computer.
  • the computer device 100 is capable of communication via the network 400.
  • the counterpart of the communication performed by the computer apparatus 100 via the network 400 includes at least the computer apparatus 100 included in the communication system 10 paired with the communication system 10 including the computer apparatus 100.
  • the above-described display 101 is connected to the computer device 100.
  • the display 101 is for displaying a still image or a moving image, and a known or known one can be used.
  • the display 101 in this embodiment is required to be able to display a moving image.
  • the display 101 may be a commercially available product and may be publicly known or publicly known, and is, for example, a liquid crystal display.
  • the display 101 in this embodiment is connected to the computer apparatus 100 by a cable, but may instead be wirelessly connected to the computer apparatus 100.
  • the technique used for connecting the computer device 100 and the display 101 may be publicly known or well known.
  • the computer device 100 also includes an input device 102.
  • the input device 102 is used by the user to make a desired input to the computer device 100.
  • a known or well-known input device 102 can be used.
  • although the input device 102 of the computer device 100 in this embodiment is a keyboard, the input device 102 is not limited to this; a numeric keypad, a trackball, a mouse, or publicly known or well-known voice input using a microphone can also be used.
  • if the display 101 is a touch panel, the display 101 also functions as the input device 102.
  • The single camera 210 described above is connected to the computer device 100.
  • the camera 210 is a digital camera capable of capturing a moving image, and is capable of outputting moving image data that is data regarding the captured moving image.
  • the moving image data generated by the camera 210 is composed of a large number of continuous still image data which are data about a two-dimensional still image.
  • the camera 210 having such a function is publicly known or well known, and is commercially available.
  • the still image data is, for example, MJPEG format data, and the still image data does not include depth data.
  • the camera 210 in this embodiment may be such, and for example, a commercially available webcam can be used as the camera 210 in this embodiment.
  • the camera 210 outputs moving image data to the computer device 100. To enable this, the camera 210 is connected to the computer device 100, for example, by wire. Such connection may be wireless.
  • the technique used for connecting the computer device 100 and the camera 210 may be publicly known or well known.
  • the camera 210 is fixedly arranged at a predetermined position.
  • the predetermined position may basically be anywhere, but is a position where the target face, which is the face of the user who uses the communication system 10 shown in FIG. 2, is reflected in the moving image captured by the camera 210.
  • the camera 210 is fixed to the upper side of the display 101 at approximately the center in the width direction of the display 101.
  • the actual position of the camera 210 shown in FIG. 2 is the actual position of the camera in the present invention.
  • the hardware configuration of the computer device 100 is shown in FIG.
  • the hardware includes a CPU (central processing unit) 111, a ROM (read only memory) 112, a RAM (random access memory) 113, and an interface 114, which are interconnected by a bus 116.
  • the CPU 111 is a computing device that performs computation.
  • the CPU 111 executes the processing described below by executing a computer program recorded in the ROM 112 or the RAM 113, for example.
  • the hardware may include an HDD (hard disk drive) or other large-capacity recording device, and the computer program described above may be recorded in the large-capacity recording device.
  • the computer program mentioned here includes at least a computer program for causing the computer apparatus 100 to execute a process, which will be described later, for generating converted moving image data by converting moving image data.
  • This computer program may be pre-installed in the computer device 100 or may be installed afterwards.
  • the computer program may be installed in the computer device 100 via a predetermined recording medium (not shown) such as a memory card, or via a network such as a LAN or the Internet.
  • the ROM 112 stores computer programs and data necessary for the CPU 111 to execute the processing described below.
  • the computer program recorded in the ROM 112 is not limited to this, and may include other programs such as an OS, a web browser for browsing web pages via the Internet, and a mailer for handling electronic mail.
  • the RAM 113 provides a work area necessary for the CPU 111 to perform processing. In some cases, (at least a part of) the computer program and data described above may be recorded.
  • the interface 114 is for exchanging data between the CPU 111, the RAM 113, etc. connected by the bus 116 and the outside.
  • the above-described display 101, input device 102, and camera 210 are connected to the interface 114.
  • the operation content input from the input device 102 is input to the bus 116 from the interface 114.
  • the moving image data sent from the camera 210 is also input to the bus 116 from the interface 114. Further, as is well known, data for displaying an image on the display 101 is sent from the bus 116 to the interface 114 and output from the interface 114 to the display 101.
  • the interface 114 is also connected to a transmission / reception mechanism (not shown), which is a known means for communicating with the outside via the network 400, i.e. the Internet, making it possible to send and receive data via the network 400.
  • the data transmission / reception via the network 400 may be performed by wire or wirelessly.
  • the configuration of the transmission / reception mechanism may be publicly known or well known.
  • the data received by the transmission / reception mechanism from the network 400 is received by the interface 114, and the data passed from the interface 114 to the transmission / reception mechanism is transmitted by the transmission / reception mechanism to the outside via the network 400; in this embodiment, for example, it is sent to the computer device 100 included in the communication system 10 of the other party.
  • the functional blocks shown in FIG. 4 are generated inside the computer device 100.
  • the following functional blocks may be generated by the above-mentioned computer program alone, which causes the computer apparatus 100 to execute the processing described below, or they may be generated by that computer program in cooperation with the OS or other computer programs installed on the computer apparatus 100.
  • An input unit 121, a main control unit 122, an image processing unit 123, and an output unit 125 are generated in the computer device 100 in relation to the functions of the present invention.
  • the input unit 121 receives an input from the interface 114.
  • Input from the interface 114 to the input unit 121 includes input from the input device 102.
  • the input from the input device 102 includes, for example, designation data and start data.
  • input data such as the designation data and the start data are entered from the input device 102.
  • all the data from the input device 102 are sent from the input unit 121 to the main control unit 122.
  • the data input from the interface 114 to the input unit 121 also includes data sent from the computer device 100 included in the communication system 10 that is a counterpart of the video conference and received by the transmission / reception mechanism. Such data is, for example, converted moving image data described later.
  • when converted moving image data is received by the input unit 121 via the transmission / reception mechanism and the interface 114, the input unit 121 sends it to the main control unit 122.
  • the data input from the interface 114 to the input unit 121 also includes moving image data sent from the camera 210.
  • the input unit 121 sends it to the main control unit 122.
  • the main controller 122 controls the entire functional blocks generated in the computer device 100.
  • the main control unit 122 controls communication between the communication systems 10 for realizing a video conference.
  • the main control unit 122 may receive designated data and start data from the input unit 121. When receiving the designated data and the start data, the main control unit 122 is configured to execute the processes described below.
  • the main control unit 122, having received the designation data, sends it to the output unit 125.
  • the main control unit 122 may receive, from the input unit 121, the converted moving image data transmitted from the computer device 100 included in the communication system 10 that is the other party of the video conference and received by the transmission / reception mechanism. Upon receiving this, the main control unit 122 sends the converted moving image data to the output unit 125.
  • the main control unit 122 may receive the moving image data sent from the camera 210 from the input unit 121.
  • the main control unit 122 which has received this, sends the moving image data to the image processing unit 123 when the conditions described later are satisfied.
  • the image processing unit 123 performs image processing.
  • the image processing unit 123 may receive the moving image data from the main control unit 122 as described above. When the moving image data is received, the image processing unit 123 performs image processing on the moving image data and converts the moving image data into converted moving image data.
  • the moving image data is composed of a large number of continuous still image data which are data about a two-dimensional still image. Then, the target face is reflected in the still image based on each still image data. The image processing unit 123 converts such moving image data into converted moving image data.
  • the image processing unit 123 converts a plurality of still image data included in the moving image data into converted still image data, and the converted still image data are made continuous to form the converted moving image data. That is, the converted moving image data is a series of converted still image data.
  • the converted still image data is data of a converted still image that is a two-dimensional still image.
  • the converted moving image data is general moving image data, for example, data in the MJPEG format.
  • the moving image data, and the still image data included in it, are generated by the camera 210 at the actual position, and the moving image or still images based on them show the target face as captured from the actual position.
  • the converted still image data is data of a converted still image, which is data generated based on the still image data or by converting the still image data.
  • the converted still image is a two-dimensional still image as it would be captured by the camera if the camera were present at the virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (that is, when the user takes a natural posture). That is, the target face included in the converted still image specified by the converted still image data is the target face as imaged from the virtual position in front of the user's face, and is basically in a front-facing state.
  • the virtual position of the camera 210 will be described later in detail.
  • the still image data is data of a still image (so-called frame) that constitutes a moving image.
  • the image processing apparatus may generate the converted still image data from all the still image data received from the camera, but doing so may cause a delay in the moving image. Therefore, if emphasis is placed on avoiding delay, the still image data to be converted into converted still image data can be limited to, for example, one in every two or one in every three (every second frame or every third frame) of the still image data included in the moving image data. The frame rate of the converted moving image data (the number of converted still image data included in the converted moving image data per second) then becomes smaller than the frame rate of the moving image data (the number of still image data included in the moving image data per second), but the moving image based on the converted moving image data can still be used as a moving image.
  • the still image data to be converted need not be extracted at a fixed interval such as every second or every third frame.
  • the image processing unit 123 sends the generated converted moving image data to the output unit 125.
  • the output unit 125 outputs the data generated by the functional blocks in the computer device 100 to the interface 114.
  • the output unit 125 may receive the designated data from the main control unit 122. When the designated data is received, the output unit 125 sends it to the transmission / reception mechanism via the interface 114.
  • the designated data is information that identifies the computer device 100 included in the communication system 10 of the other party when the video conference is held.
  • the output unit 125 may receive the converted moving image data from the main control unit 122.
  • the converted moving image data is sent from the computer device 100 included in the communication system 10 of the other party.
  • the output unit 125 sends it via the interface 114 to the display 101 connected to the computer apparatus 100.
  • a moving image based on the converted moving image data is displayed on the display 101.
  • the output unit 125 may receive the converted moving image data from the image processing unit 123.
  • the converted moving image data is generated in the computer device 100 in which the output unit 125 is located.
  • the output unit 125 sends it to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism is configured to send the converted moving image data to the computer device 100 specified by the above-mentioned designated data.
  • the video conference system includes the first communication system 10-1 used by one user participating in the video conference, and the second communication system 10-2 used by the other user participating in the video conference.
  • Both users prepare for the video conference.
  • one user holds the video conference while watching the display 101 in the first communication system 10-1, and the other user does so while watching the display 101 in the second communication system 10-2. Therefore, one user moves to a position in front of the display 101 in the first communication system 10-1, and the other user moves to a position in front of the display 101 in the second communication system 10-2.
  • prior to the video conference, the two users who will hold it are specified.
  • the identification of the two users can be realized by using a known or well-known technique.
  • the two users can be specified by at least one of the two users participating in the video conference designating the other party with whom the video conference is to be held. Of course, both users may designate each other.
  • in this embodiment, one user designates the other party with whom the video conference is to be held, and the designated user approves it, whereby the two users who will hold the conference are specified. The case where the other party is identified from the side of one user who uses the first communication system 10-1 will be described as an example. First, the user who uses the first communication system 10-1 operates the input device 102 included in the first communication system 10-1 to generate the designated data.
  • the designated data is information that identifies the user of the other party who holds the video conference. For example, each of the users who may participate in the video conference is given an ID that is a unique identifier.
  • the user using the first communication system 10-1 can input the designated data by inputting this ID using the input device 102 or by selecting from the IDs registered in advance. In this example, it is assumed that the designation data designates the ID of the user who uses the second communication system 10-2.
  • the input designated data reaches the input unit 121 from the input device 102 via the interface 114.
  • the input unit 121 further attaches the ID of the first communication system 10-1 itself to the designated data and sends them to the output unit 125 via the main control unit 122.
  • the designated data and the ID of the first communication system 10-1 are sent from the output unit 125 to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism transmits the ID of the first communication system 10-1, via the network 400, to the communication system 10 operated by the user having the ID specified by the designated data, that is, to the computer device 100 of the second communication system 10-2.
  • the user of the first communication system 10-1 identifies the user of the second communication system 10-2 as the other party of the video conference.
  • the user of the first communication system 10-1 applies for a video conference with the user of the second communication system 10-2.
  • the computer device 100 of the second communication system 10-2 receives the ID of the first communication system 10-1 transmitted from the computer device 100 of the first communication system 10-1 via the network 400 by the transmission / reception mechanism.
  • the ID reaches the input unit 121 from the transmission / reception mechanism via the interface 114, and is further transmitted to the main control unit 122.
  • the main control unit 122 generates an image indicating that the user of the first communication system 10-1 has applied for the video conference, for example an image including the ID, sent from the first communication system 10-1, of the user of the first communication system 10-1, and sends the data of the image to the output unit 125.
  • the output unit 125 sends the image data to the display 101 via the interface 114.
  • an image indicating that the user of the first communication system 10-1 has applied for the video conference is displayed on the display 101 included in the second communication system 10-2.
  • if the user of the second communication system 10-2 agrees to hold the video conference, he or she uses the input device 102 to make an input indicating the intention to approve it. This corresponds to the designated data in the computer device 100 included in the second communication system 10-2. If the user of the second communication system 10-2 does not agree to hold the video conference, he or she either makes no input indicating approval or makes an input indicating the intention not to accept the video conference with the user of the first communication system 10-1.
  • here it is assumed that approval is given, and the designated data, which is data indicating that approval, is input to the computer device 100 included in the second communication system 10-2.
  • the designated data is sent to the main control unit 122 via the interface 114 and the input unit 121.
  • when the main control unit 122 receives it, the main control unit 122 generates data indicating that the video conference is ready to be conducted, and sends it to the output unit 125.
  • the data is sent from the output unit 125 to the transmitting / receiving mechanism via the interface 114, and is then sent from the transmitting / receiving mechanism to the first communication system 10-1 via the network 400.
  • the transmission / reception mechanism of the computer device 100 in the first communication system 10-1 receives the data sent from the second communication system 10-2.
  • the data is sent from the transmission / reception mechanism to the main control unit 122 of the computer apparatus 100 of the first communication system 10-1 via the interface 114 and the input unit 121.
  • the computer device 100 in the first communication system 10-1 and the computer device 100 in the second communication system 10-2 are now ready to exchange with each other the converted moving image data, which is the data about the moving images necessary for the video conference.
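  • For illustration only, the designation-and-approval exchange described above can be sketched as follows. The message names and the send / receive interface are assumptions introduced here, not part of this embodiment; the sketch only fixes the order of the exchange.

```python
from dataclasses import dataclass

@dataclass
class ConferenceRequest:      # designated data plus the caller's own ID (hypothetical names)
    sender_id: str            # ID of the first communication system 10-1
    target_id: str            # ID chosen via the input device 102

@dataclass
class ConferenceReply:
    accepted: bool            # True if the designated user approved the conference

def initiate_conference(network, my_id: str, peer_id: str) -> bool:
    """Runs on the calling side (10-1): send the request, wait for approval."""
    network.send(peer_id, ConferenceRequest(sender_id=my_id, target_id=peer_id))
    reply: ConferenceReply = network.receive(peer_id)
    return reply.accepted     # both sides may now exchange converted moving image data
```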
  • both users participating in the video conference position themselves so that the target faces, which are the faces of both users, are located within the imaging range of the camera 210 included in the communication system 10 near each user.
  • the user for example, adjusts his / her own posture or adjusts the position and angle of the camera 210 as necessary. This completes the preparation for the video conference.
  • the video conference is started.
  • the converted moving image data generated by the first communication system 10-1 is transmitted to the second communication system 10-2, and a moving image based on the converted moving image data is displayed on the display 101 included in the second communication system 10-2 and viewed by the user of the second communication system 10-2.
  • similarly, the converted moving image data generated by the second communication system 10-2 is transmitted to the first communication system 10-1, and a moving image based on it is displayed on the display 101 included in the first communication system 10-1.
  • since both processes are equivalent, the following description focuses only on the processing in which the converted moving image data is generated in the first communication system 10-1, sent to the second communication system 10-2, and displayed as a moving image on the display 101 included in the second communication system 10-2.
  • the user of the first communication system 10-1 uses the input device 102 to input start data.
  • the start data is sent from the input device 102 to the main control unit 122 in the computer device 100 of the first communication system 10-1 as in the case of the designated data.
  • the main control unit 122 which has received it starts the process for transmitting the converted moving image data to the computer device 100 in the second communication system 10-2.
  • moving image data is sent from the camera 210 connected to the computer apparatus 100 to the computer apparatus 100 regardless of whether or not start data is input.
  • the moving image data is constantly sent to the main control unit 122 via the interface 114 and the input unit 121.
  • until the start data is input, the main control unit 122 does not perform any processing even if it receives the moving image data. When the start data has been input, the main control unit 122 sends the received moving image data to the image processing unit 123.
  • the image processing unit 123 that has received the moving image data performs a process of converting the moving image data into converted moving image data.
  • the moving image data and the converted moving image data are as described above, and the conversion may be performed in any way.
  • four types of conversion methods, i.e., the first to fourth conversion methods, are proposed.
  • the image processing unit 123 includes a frame dropping unit that extracts at least a plurality of still image data from the still image data included in the moving image data as a target of image processing (conversion).
  • the frame dropping unit is not essential as described later.
  • the image processing unit 123 also includes a three-dimensional model generation unit that generates, from each of the at least a plurality of still image data extracted by the frame dropping unit, a three-dimensional model of the face portion of the target face appearing in the still image specified by that still image data.
  • the image processing unit 123 also includes a three-dimensional model rotation unit that performs a process of rotating the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle that is a fixed angle.
  • the image processing unit 123 also includes a two-dimensional image generation unit that generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. These functions are the same from the first conversion method to the fourth conversion method.
  • the difference between the first to fourth conversion methods lies, in general, only in how the rotation angle (including the rotation direction) by which the three-dimensional model rotation unit rotates the three-dimensional model of the target face is determined.
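  • The shared structure of the four conversion methods can be summarized in the following sketch. The helper callables stand in for the four units described above and are hypothetical; only the order of the stages comes from this description.

```python
from typing import Any, Callable, Iterable, List, Tuple

def convert_moving_image(
    frames: Iterable[Any],
    build_face_model: Callable[[Any], Tuple[Any, Any]],   # still -> (3D face model, background)
    rotate_model: Callable[[Any], Any],                   # applies the fixed rotation angle
    render_frame: Callable[[Any, Any], Any],              # (rotated model, background) -> 2D image
    keep_every: int = 6,
) -> List[Any]:
    """Four-stage conversion pipeline shared by the first to fourth conversion methods."""
    converted = []
    for index, still in enumerate(frames):
        if index % keep_every != 0:                       # frame dropping unit 123A
            continue
        model, background = build_face_model(still)       # 3D model generation unit 123B
        rotated = rotate_model(model)                     # 3D model rotation unit 123C
        converted.append(render_frame(rotated, background))  # 2D image generation unit 123D
    return converted                                      # a series of converted still images
```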
  • when the image processing unit 123 executes the first conversion method, the image processing unit 123 is configured as shown in FIG.
  • the image processing unit 123 in this case includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • the frame dropping unit 123A extracts at least a plurality of still image data as image processing (conversion) targets from the still image data included in the moving image data. Only the extracted still image data is converted from still image data to converted still image data.
  • not all of the still image data included in the moving image data are subject to conversion into converted still image data, because the computing power of the computer device 100 may be insufficient to convert the moving image data into the converted moving image data (or each still image data into converted still image data) with the immediacy required. Therefore, if the computing power of the computer device 100 is sufficient, the frame dropping unit 123A is unnecessary. Although not limited to this, the frame dropping unit 123A in this embodiment extracts every sixth still image data (skipping five frames in between) from the 60 fps moving image data sent from the camera 210, that is, ten still image data per second.
  • the frame dropping unit 123A need not always extract still image data at a fixed interval, and the number of still image data extracted per second need not be 10; it can be, for example, about 6 to 8, or more.
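  • As one illustration of the point that extraction need not occur at a fixed interval, the following sketch drops frames adaptively using a hypothetical per-frame time budget; the parameter name and the strategy are assumptions, not part of this embodiment.

```python
import time

def drop_frames_adaptive(stills, budget_per_frame: float = 0.1):
    """Keep the next frame only once the converter has caught up (non-uniform dropping).

    budget_per_frame is a hypothetical processing-time budget in seconds; the
    embodiment only requires that extraction need not be at a fixed interval.
    """
    next_allowed = 0.0
    for still in stills:
        now = time.monotonic()
        if now >= next_allowed:
            next_allowed = now + budget_per_frame
            yield still
```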
  • the three-dimensional model generation unit 123B generates, from each of the at least a plurality of still image data extracted by the frame dropping unit 123A, a three-dimensional model of the face portion of the target face appearing in the still image specified by that still image data.
  • the three-dimensional model is, for example, a wire frame model, but is not limited to this.
  • the three-dimensional model rotation unit 123C performs a process of rotating each of the three-dimensional models generated by the three-dimensional model generation unit 123B by a certain rotation angle.
  • the orientation and angle in which each of the 3D models is rotated is constant for all 3D models.
  • the two-dimensional image generation unit 123D also generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit 123C.
  • the rotation angle by which the three-dimensional model rotation unit 123C rotates the three-dimensional model is determined so that, when a two-dimensional image is generated based on the rotated three-dimensional model (that is, when the model is returned to a two-dimensional image), the target face (more precisely, the face portion of the target face) included in that two-dimensional image is the same as the target face as it would appear if captured by a camera at the virtual position.
  • the virtual position is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (when the user takes a natural posture). That is, the three-dimensional model rotation unit 123C rotates the three-dimensional model of the face portion of the target face so that, as far as the target face is concerned, the moving image data (or still image data) captured by the camera 210 at the actual position becomes the same as data captured by a virtual camera at the virtual position. In the first conversion method, the rotation angle is predetermined.
  • the data that specifies the rotation angle is, for example, recorded in advance in the three-dimensional model rotation unit 123C, and the three-dimensional model rotation unit 123C rotates the three-dimensional model by the rotation angle specified by the data that specifies the rotation angle.
  • FIG. 6A shows a side view of the relationship between the camera 210 and the target face.
  • the camera 210 exists at the actual position immediately above the display 101. In this example, it is assumed that the camera 210, while lying in the front direction of the target face when considered in the horizontal direction, is located above the target face.
  • the camera 210 therefore images the target face from above at the angle θ, and the target face appearing in the moving image based on the moving image data generated by the camera 210, or in the still images based on the still image data included in that moving image data, is captured from above at the angle θ.
  • FIG. 6B shows an example in which an image based on such moving image data is displayed on the display 101 included in the communication system 10 of the other party.
  • the target face included in the moving image is directed downward by the angle θ.
  • the three-dimensional model generation unit 123B generates a three-dimensional model of the face part of the target face included in the still image specified by the still image data.
  • the three-dimensional model generation unit 123B first extracts the face portion F of the target face from the image included in the still image.
  • the method for extracting the face portion F may be any method, but a general image recognition technique may be used.
  • the area surrounded by the broken line in FIG. 7A is the face portion F.
  • the face portion in this embodiment means a portion of the human head (target face) that is generally in front of the ears and below the forehead.
  • the range of the face part may be narrower at least in the range including eyes, nose, and mouth, or may be wider up to the entire head.
  • the three-dimensional model generation unit 123B generates a three-dimensional model for the above-mentioned face portion F.
  • the three-dimensional model generation unit 123B generates the three-dimensional model using a conversion algorithm, obtained by machine learning over many faces, that estimates a three-dimensional model of a human face. The algorithm automatically creates a three-dimensional model of the face portion of a face appearing in a still image from a single ordinary two-dimensional still image (in other words, from the data of a single facial photograph).
  • the technology is disclosed in detail in the paper "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression / Accepted to ICCV 2017" (URL: http://aaronsplace.co.uk/papers/jackson2017recon/).
  • the conversion algorithm described above is generated by machine learning performed by a computer using a large number of two-dimensional still images of human faces, obtained by capturing various human faces from various angles.
  • the three-dimensional model generation unit 123B automatically generates a three-dimensional model of the face portion F of the target face reflected in the still image specified by the still image data, using the conversion algorithm.
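  • The following is a minimal sketch of this reconstruction step in the spirit of the cited paper, which regresses a volumetric (voxel) representation of the face directly from a single RGB image; the predict interface and the threshold are assumptions for illustration and do not reproduce the published code's actual API.

```python
import numpy as np

def reconstruct_face_volume(face_crop: np.ndarray, model) -> np.ndarray:
    """Regress a voxel occupancy volume from one 2D face crop (sketch).

    face_crop: HxWx3 RGB image of the face portion F.
    model:     a pretrained volumetric-regression CNN (hypothetical interface).
    Returns a 3D boolean occupancy grid; a surface mesh (e.g. a wireframe model)
    can then be extracted from it with a marching-cubes implementation.
    """
    volume = model.predict(face_crop[np.newaxis, ...])[0]  # soft occupancies in [0, 1]
    return volume > 0.5                                    # threshold to a solid volume
```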
  • the three-dimensional model generated in that case is, for example, as shown in FIG. 7 (B). What is shown in FIG. 7B (1) is a three-dimensional model of the face portion F of the target face viewed from the front.
  • the three-dimensional model is a wire frame model, but is not limited to this.
  • (2) is a side view of the three-dimensional model of the face portion F, in which the wire frame is omitted.
  • the face portion F faces downward by the angle θ shown in FIG.
  • the three-dimensional model generation unit 123B also generates the data of the portion of the still image excluding the face portion F, that is, the data of a still image of the portion around the face portion F in FIG. 7A, and sends it to the two-dimensional image generation unit 123D.
  • the three-dimensional model, which is in a state of facing downward by the angle θ, naturally comes to face the front if it is rotated upward by the angle θ.
  • the angle θ can be easily obtained by using a and b shown in FIG.
  • a is the horizontal distance from the virtual position X of the camera 210 to the target face, and b is the vertical distance from the virtual position X of the camera 210 to the actual position of the camera 210.
  • the virtual position X of the camera 210 is the position just before the display 101 in the front direction of the target face. That is, the virtual position X is located on a virtual straight line extending in the front direction of the target face of the user who takes a natural posture. As long as the condition is satisfied, the relative positional relationship between the virtual position X and the display 101 does not matter.
  • the virtual position X may be located inside the display 101 or behind the display 101.
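  • With a and b defined as above, the rotation angle follows from elementary trigonometry; this merely restates the geometry of FIG. 6, with no additional assumption:

```latex
\theta = \arctan\!\left(\frac{b}{a}\right)
```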
  • the three-dimensional model rotation unit 123C rotates the three-dimensional model shown in FIG. 7B upward by the angle θ in the vertical plane. Then, the three-dimensional model faces the front as shown in FIG. 7C. FIG. 7C (1) shows the three-dimensional model of the face portion F of the target face viewed from the front.
  • (2) is a side view of the three-dimensional model of the face portion F, in which the wire frame is omitted.
  • the three-dimensional model rotation unit 123C in this embodiment rotates the three-dimensional model about a predetermined point.
  • it is also possible to rotate the three-dimensional model around a certain axis of the model (for example, a horizontal straight line passing through both ears, or a straight line passing vertically through the center of the skull in plan view), or around both of them. However, to perform such processing, it is necessary to detect the position of the ears, or of the center of the skull in plan view, in the three-dimensional model and to specify their coordinates.
  • if, instead, the three-dimensional model is rotated about a point in the virtual space in which the three-dimensional model exists (the point being a virtual point, regardless of whether or not it is located inside the three-dimensional model), the complicated processing described above can be omitted.
  • the predetermined point is the lens position of the camera, and is the origin of the virtual space in which the three-dimensional model exists.
  • the rotation of the three-dimensional model is executed as a transformation of spatial coordinates with a predetermined point as the origin.
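  • Concretely, rotating the three-dimensional model about the predetermined point reduces to multiplying each vertex coordinate by a rotation matrix with that point as the origin. The sketch below assumes the model is given as an N×3 array of vertices in a coordinate system whose origin is the predetermined point; the combined vertical / horizontal rotation mentioned later is obtained by composing the two matrices.

```python
import numpy as np

def rotate_vertices(vertices: np.ndarray, pitch: float, yaw: float = 0.0) -> np.ndarray:
    """Rotate Nx3 vertices about the origin: pitch around X, yaw around Y (radians)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # vertical rotation
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # horizontal rotation
    return vertices @ (rot_y @ rot_x).T   # one combined rotation, applied to every vertex
```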
  • the two-dimensional image generation unit 123D again generates the two-dimensional image data using the three-dimensional model shown in FIG. 7C after being rotated by the three-dimensional model rotation unit 123C.
  • the two-dimensional image so generated is pasted into the region corresponding to the excluded face portion F in the data, sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, of the portion of the still image excluding the face portion F.
  • the still image thus obtained is the converted still image, and the data of the converted still image is the converted still image data.
  • the target face included in the obtained converted still image basically faces the front as shown in FIG.
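  • A minimal sketch of this pasting step is shown below, assuming the region of the face portion F is represented as a boolean mask over the still image; this embodiment does not prescribe a particular representation of that region.

```python
import numpy as np

def composite_converted_still(background: np.ndarray,
                              rendered_face: np.ndarray,
                              face_mask: np.ndarray) -> np.ndarray:
    """Paste the re-rendered (front-facing) face into the region of face portion F.

    background:    HxWx3 still image with the original face portion excluded.
    rendered_face: HxWx3 image rendered from the rotated 3D model, aligned to it.
    face_mask:     HxW boolean array marking the region of face portion F.
    """
    converted = background.copy()
    converted[face_mask] = rendered_face[face_mask]   # fill the excluded region
    return converted
```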
  • the data of the portion of the still image excluding the face portion F, which is sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, may be that data itself, or may be data that has undergone some processing.
  • the range of the face portion F in FIG. 7 (D) is the same as the face portion F in FIG. 7 (B), but the edge of the two-dimensional image that is generated using the rotated three-dimensional model and pasted into that range may not completely coincide with the edge of the range of the face portion F. If it is desired to reduce the unnaturalness caused by this, the above-mentioned processing may be performed.
  • any method may be used for this processing as long as the edge of the two-dimensional image generated from the rotated three-dimensional model is made to coincide with the edge of the face portion F; the processing may be, for example, scaling of the two-dimensional image in one direction, scaling in two directions, rotation, and the like. For example, when the three-dimensional model of the face portion F of a downward-facing target face is rotated to face the front, its apparent length in, for example, the vertical direction becomes shorter.
  • in that case, the three-dimensional model generation unit 123B can perform a process of reducing the vertical length of the still image of the portion other than the face portion F of the target face. Then, the edge of the image of the face generated from the three-dimensional model matches the range of the face portion F well. If the actual position of the camera 210 also deviates from the front direction of the face in the horizontal direction, rotation of the three-dimensional model in the horizontal direction is naturally required as well, in the same manner as the rotation in the vertical direction in the above example, but the description thereof is omitted.
  • the three-dimensional model rotation unit 123C need not perform the two processes of vertical rotation and horizontal rotation separately; it can of course perform a single rotation that combines both rotations.
  • each of the still image data extracted by the frame dropping unit 123A is converted into converted still image data.
  • the converted still image data generated as a result is sequentially output from the two-dimensional image generation unit 123D to the output unit 125.
  • This set of a large number of converted still image data is the converted moving image data. That is, the converted moving image data is output from the image processing unit 123 to the output unit 125.
  • in the first conversion method, a common or typical rotation angle θ (14 degrees or 9.5 degrees in the above example) is used by the three-dimensional model rotation unit 123C as the angle for rotating the three-dimensional model.
  • this rotation angle may be selectable from a plurality of rotation angles, but it is basically fixed. Therefore, the numerical values of a and b in the above example may not match the actual relationship between the real position and the virtual position of the camera 210.
  • since the virtual position of the camera 210 can be freely determined by the computer program, such a situation essentially occurs when the actual position of the camera 210 is not the position planned at the time the computer program was designed.
  • the first conversion method is particularly effective when the actual position of the camera 210 exists at a planned position or a position not far from it.
  • when the computer device 100 is a laptop personal computer, a smartphone, a tablet, or the like, the actual position of the camera is fixed with respect to the housing.
  • if the virtual position of the camera is then determined to be an appropriate position immediately in front of or behind the display of the laptop personal computer, smartphone, or tablet, the relationship between the actual position and the virtual position of the camera is uniquely determined for each model. Moreover, if the specifications of the devices constituting the image processing apparatus are clear, the distance between the target face and the virtual position of the camera 210, or between the target face and the display 101, can be predicted to some extent according to the size of the display 101.
  • the computer program for causing the computer device 100 to function as the image processing device of the present application can therefore include data specifying the rotation angle that can be derived from the relationship between the virtual position (or the real position and the virtual position) of the camera in each of various laptop personal computers, smartphones, tablets, and the like (that is, a large number of data sets, each being a pair of a device model and the camera's virtual position for that model).
  • the computer program may implement either a function of automatically identifying the model of the computer device 100 after the computer program has been installed in it, or a function of receiving, after installation, an input made by the user specifying the model of the computer device 100 in which the program is installed.
  • alternatively, the user may be instructed, for example, to place the camera directly above the center of the display and to use the image processing apparatus at a certain distance from the virtual position of the camera immediately in front of the center of the display. By having the user adopt a positional relationship between the display 101 and the camera 210 that is set in advance in this way, the rotation angle can be determined in advance in consideration of the relationship between the virtual position determined as described above and the actual position of the camera 210 that the user will have set accordingly.
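  • In such an implementation, the data pairing device models with rotation angles could take a form like the following; the model names and angle values are placeholders, since the actual pairs depend on the devices the program supports.

```python
# Hypothetical model-to-rotation-angle table (degrees); values are placeholders.
ROTATION_ANGLE_BY_MODEL = {
    "laptop-model-A": 14.0,   # camera above the display, typical viewing distance
    "tablet-model-B": 9.5,
}

def predetermined_rotation_angle(model_name: str, default: float = 10.0) -> float:
    """Look up the rotation angle for a detected or user-specified device model."""
    return ROTATION_ANGLE_BY_MODEL.get(model_name, default)
```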
  • when the image processing unit 123 executes the second conversion method, the image processing unit 123 is configured as shown in FIG.
  • like the image processing unit 123 that executes the first conversion method, this image processing unit 123 includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the case of the first conversion method, except that the three-dimensional model rotation unit 123C used for the second conversion method does not record in advance the data specifying the rotation angle.
  • the image processing unit 123 that executes the second conversion method includes the angle detection unit 123E.
  • the angle detection unit 123E determines the above-mentioned rotation angle by performing a predetermined calculation based on the moving image data sent from the main control unit 122. Note that although in FIG. 9 the moving image data is input directly from the main control unit 122 to the angle detection unit 123E, the angle detection unit 123E may instead determine the rotation angle θ based on the still image data extracted by the frame dropping unit 123A. If such an angle detection unit 123E is used, it is not necessary to pay attention to the relative positional relationship between the actual position and the virtual position of the camera 210.
  • in order for the angle detection unit 123E to automatically obtain the rotation angle from the moving image data, it is conceivable to have the angle detection unit 123E perform machine learning. If the angle detection unit 123E learns images of faces taken from various angles together with the angle at which each image was taken, the angle detection unit 123E can detect from what angle the face appearing in a still image based on the still image data included in the moving image data was imaged. If that is possible, the angle detection unit 123E can naturally determine the magnitude of the rotation angle θ, including the direction of rotation. In the case of using the second conversion method, it is preferable, for example, to present the user with an instruction such as "keep facing the front for a few seconds until the rotation angle is determined", and to have the user follow that instruction.
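  • A minimal sketch of such an angle detection unit 123E is shown below, assuming a hypothetical pose-regression model trained, as described above, on face images labeled with the angle from which they were captured; averaging over a short calibration period reflects the instruction given to the user.

```python
import numpy as np

def detect_rotation_angle(calibration_stills, pose_model) -> float:
    """Estimate the fixed rotation angle while the user keeps facing the front.

    pose_model.predict(still) is assumed to return the pitch (degrees) from
    which the face in the still image was captured; positive means from above.
    """
    pitches = [pose_model.predict(still) for still in calibration_stills]
    return -float(np.mean(pitches))   # rotate by the opposite angle to face the camera
```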
  • when the image processing unit 123 executes the third conversion method, the image processing unit 123 is configured as shown in FIG.
  • the third conversion method does not determine the rotation angle in advance, but also performs the process of determining the rotation angle, like the second conversion method.
  • the image processing unit 123 used for the third conversion method is similar to the image processing unit 123 used for the second conversion method: like it, the image processing unit 123 that executes the third conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • the image processing unit 123 that executes the third conversion method includes a rotation angle determination unit 123F instead of the angle detection unit 123E in the image processing unit 123 that executes the second conversion method.
  • the rotation angle determination unit 123F has a function of determining the rotation angle, like the angle detection unit 123E described above.
  • the angle detection unit 123E determines the rotation angle by performing a predetermined calculation based on the moving image data, whereas the rotation angle determination unit 123F determines the rotation angle by performing a predetermined calculation based on data other than the moving image data.
  • the data used by the rotation angle determination unit 123F to determine the rotation angle is parameter data input from the input device 102, parameter data input from a sensor (not shown), or both of them.
  • the parameters input from the input device 102 are, for example, information specifying the shape of the display 101 (for example, whether its aspect ratio is 3:4 or 9:16), information specifying the size of the display 101 (for example, how many inches it measures), information specifying where the actual position of the camera is (for example, directly above the display 101 at the center in its width direction, or at the upper right corner of the display 101), and information specifying the distance from the display 101 to the target face.
  • the sensor may measure any parameter useful for obtaining the relative positional relationship between the real position and the virtual position of the camera 210 and the relative positional relationship between the virtual position of the camera 210 and the target face. For example, a known or well-known distance measuring device may be used as the sensor, and the distance of the target face from the sensor may be measured as a parameter.
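  • As an illustration, the rotation angle determination unit 123F might derive the rotation angle from the parameters listed above as in the following sketch, which assumes the camera sits directly above the display and the virtual position X lies at the center of the display; the geometry is the same arctangent relationship as in the first conversion method.

```python
import math

def rotation_angle_from_parameters(display_height_cm: float,
                                   camera_offset_above_display_cm: float,
                                   face_to_display_cm: float) -> float:
    """Rotation angle (degrees), camera assumed directly above the display center."""
    # b: vertical distance from the virtual position X (display center) to the camera
    b = display_height_cm / 2 + camera_offset_above_display_cm
    # a: horizontal distance from the target face to the virtual position X
    a = face_to_display_cm
    return math.degrees(math.atan2(b, a))
```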
  • the data specifying the rotation angle determined by the rotation angle determination unit 123F is sent from the rotation angle determination unit 123F to the three-dimensional model rotation unit 123C.
  • the three-dimensional model rotation unit 123C rotates each three-dimensional model in the same angle and in the same direction with the rotation angle specified by the data, as in the case of the first conversion method.
  • the converted moving image data is output from the image processing unit 123 to the output unit 125.
  • data for executing the mode for determining the rotation angle can be input from the input device 102, and it is preferable to execute the mode for determining the rotation angle before, for example, the start data is input.
  • the fourth conversion method does not determine the rotation angle in advance, but also performs the processing of determining the rotation angle, like the second and third conversion methods.
  • the image processing unit 123 that executes the fourth conversion method includes the same functional blocks as the image processing unit 123 when executing the first conversion method.
  • the image processing unit 123 that executes the fourth conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
  • everything is the same as in the case of the first conversion method, except that the three-dimensional model rotation unit 123C used for the fourth conversion method does not record in advance the data specifying the rotation angle; instead, rotation angle change data for changing the rotation angle is input from the main control unit 122 to the three-dimensional model rotation unit 123C, and each time the three-dimensional model rotation unit 123C receives rotation angle change data, it changes, based on that data, the rotation angle by which it rotates the three-dimensional model of the target face. Even when the fourth conversion method is executed, the converted moving image data generated by the image processing unit 123 is sent to the output unit 125, as in the case where the first conversion method is executed.
  • This data is sent from the output unit 125 to the display 101. Then, on the display 101, a moving image based on the converted moving image data will be displayed, as will be described later.
  • This display is performed in substantially real time after the image is captured by the camera 210, preferably within 0.5 seconds.
  • the user inputs the rotation angle change data while looking at his or her own face (the target face) displayed on the display 101, rotating the target face little by little, for example, so that the target face displayed on the display 101 basically faces the front.
  • the rotation angle change data is input using the input device 102.
  • the rotation angle change data reaches the main control unit 122 in the same manner as other data input by the input device 102, and is sent from the main control unit 122 to the three-dimensional model rotation unit 123C.
  • although not limited to this, the directions in which the three-dimensional model is rotated may be only the vertical direction (around the X axis) and the horizontal direction (around the Y axis). The rotation angle change data for these can, of course, be input using the input device 102.
  • the angle by which the three-dimensional model rotation unit 123C is rotating the three-dimensional model at the moment the target face displayed on the display 101 comes to basically face the front is determined as the rotation angle by which the three-dimensional model rotation unit 123C thereafter rotates the three-dimensional model of the target face at a uniform angle.
  • thereafter, the converted moving image data is output from the image processing unit 123 to the output unit 125. Even when the fourth conversion method is used, data for executing the mode for determining the rotation angle can be input from the input device 102, and it is preferable to execute the mode for determining the rotation angle before, for example, the start data is input.
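  • The adjustment loop of the fourth conversion method can be sketched as follows; the key names, the step size, and the preview / input helpers are arbitrary illustrative assumptions.

```python
STEP_DEG = 1.0   # arbitrary increment per key press

def adjust_rotation_angle(preview, read_key) -> tuple[float, float]:
    """Let the user nudge the 3D model until the previewed face looks frontal."""
    pitch, yaw = 0.0, 0.0
    while True:
        preview.show(pitch, yaw)            # re-render the converted still image
        key = read_key()                    # rotation angle change data from input device 102
        if key == "up":
            pitch += STEP_DEG
        elif key == "down":
            pitch -= STEP_DEG
        elif key == "left":
            yaw -= STEP_DEG
        elif key == "right":
            yaw += STEP_DEG
        elif key == "enter":                # the angle is fixed and used from now on
            return pitch, yaw
```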
  • the output unit 125 receives the converted moving image data from the image processing unit 123 as described above.
  • the output unit 125 sends it to the transmitting / receiving mechanism via the interface 114.
  • the transmission / reception mechanism sends the converted moving image data to the computer device 100 specified by the above-mentioned designated data, that is, the computer device 100 included in the second communication system 10-2.
  • the transmission / reception mechanism in the computer device 100 included in the second communication system 10-2 receives the converted moving image data sent from the first communication system 10-1.
  • the converted moving image data is sent from the transmission / reception mechanism to the input unit 121 via the interface 114, and then sent from the input unit 121 to the main control unit 122.
  • the main control unit 122 sends this converted moving image data to the display 101 via the output unit 125 and the interface 114.
  • a moving image based on the converted moving image data sent from the first communication system 10-1 is displayed on the display 101 in the second communication system 10-2.
  • the face displayed on the display 101 basically faces the front as shown in FIG. As already mentioned several times, "basically" means the case where the user takes a natural posture.
  • FIG. 13A shows a state in which the user of the first communication system 10-1 faces downward from the horizontal direction by the angle φ.
  • in this state, a deviation of angle θ + angle φ occurs between the camera 210 and the front direction of the target face. Therefore, if no image processing were performed, the target face included in the moving image displayed on the display 101 of the second communication system 10-2 would appear as the target face shown in FIG. 13B viewed from the front.
  • in this embodiment, however, the target face is displayed on the display 101 after being rotated upward by the angle θ.
  • the target face included in the moving image displayed on the display 101 of the second communication system 10-2 is therefore the target face shown in FIG. 13C viewed from the front. That is, the target face of the user of the first communication system 10-1, facing downward from the horizontal direction by the angle φ, is displayed as such on the display 101 included in the second communication system 10-2. This is a natural state and does not give a feeling of strangeness to the user of the second communication system 10-2.
  • the video conference system according to the modified example includes a first communication system 10-1 and a second communication system 10-2, like the video conference system of the first embodiment.
  • Both communication systems 10 include a computer device 100, a display 101, and a camera 210.
  • the computer device 100 in both communication systems 10 in the first embodiment has a function of converting moving image data into converted moving image data, but the computer device 100 in both communication systems 10 in the modified example does not have that function. That is, the computer device 100 in both communication systems 10 in the modification is not the image processing device of the present invention.
  • the computer device 100 in both communication systems 10 in the modification basically has only the same functions as those in the conventional video conference system except for the data exchange with the conversion server described later.
  • instead, the conversion server 20-1 and the conversion server 20-2 have the function, which the image processing apparatus according to the present invention should perform, of converting moving image data into converted moving image data. That is, the conversion server 20-1 and the conversion server 20-2 in the modified example can be said to use cloud computing technology to provide the first communication system 10-1 and the second communication system 10-2 with a function of converting moving image data into converted moving image data.
  • the video conference system includes a first communication system 10-1, a second communication system 10-2, a conversion server 20-1, and a conversion server 20-2.
  • the first communication system 10-1, the second communication system 10-2, the conversion server 20-1, and the conversion server 20-2 are all connectable to the network 400.
  • the computer device 100 in the first communication system 10-1 is adapted to receive the moving image data from the camera 210 in the actual position.
  • the moving image data is sent from the computer device 100 in the first communication system 10-1 to the conversion server 20-1.
  • the conversion server 20-1 converts the received moving image data into converted moving image data.
  • the conversion server 20-1 returns the converted moving image data to the computer device 100 in the first communication system 10-1.
  • the converted moving image data is sent from the computer device 100 of the first communication system 10-1 to the computer device 100 of the second communication system 10-2, as in the case of the first embodiment.
  • the converted moving image data generated by the conversion server 20-1 is directly sent to the computer device 100 in the second communication system 10-2 without being sent to the computer device 100 in the first communication system 10-1. It may be sent.
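  • The round trip through the conversion server can be sketched as follows; the endpoint URL and the HTTP transport are assumptions introduced for illustration, since the modification only specifies that moving image data goes to the conversion server and converted moving image data comes back (or is forwarded directly to the other party).

```python
import requests  # third-party HTTP client, used here purely for illustration

CONVERT_URL = "https://conversion-server.example/convert"  # hypothetical endpoint

def convert_via_server(moving_image_bytes: bytes, forward_to_peer: bool = False) -> bytes:
    """Send moving image data to the conversion server; get converted data back."""
    response = requests.post(
        CONVERT_URL,
        data=moving_image_bytes,
        params={"forward": "1" if forward_to_peer else "0"},
        timeout=10,
    )
    response.raise_for_status()
    return response.content   # converted moving image data (empty if forwarded)
```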
  • the hardware configuration of the conversion server 20-1 for realizing the above-described functions may be basically the same as the hardware configuration of the computer device 100 according to the first embodiment, and the functional blocks generated in it may be the same as the functional blocks in the computer device 100 according to the first embodiment.
  • in the first embodiment, the computer device 100 receives the moving image data from the camera 210, the moving image data reaching the input unit 121 in the order of the camera 210, the interface 114, and the input unit 121. In contrast, the conversion server 20-1 in the modification receives the moving image data from the computer device 100 in the first communication system 10-1 via the network 400, the moving image data being received by its transmission / reception mechanism.
  • likewise, in the first embodiment the computer device 100 receives the input from the input device 102 via the interface 114, whereas the conversion server 20-1 in the modified example receives the input made on the input device 102 from the computer device 100 in the first communication system 10-1 via the network 400.
  • in the first embodiment, the converted moving image data generated by the image processing unit 123 is sent to the second communication system 10-2 via the output unit 125, the interface 114, and the transmission / reception mechanism. In the modification, the converted moving image data generated by the image processing unit 123 is instead returned to the first communication system 10-1 via the output unit 125, the interface 114, and the transmission / reception mechanism.
  • the conversion server 20-1 may send the converted moving image data to the second communication system 10-2.
  • the conversion server 20-2 has the same configuration and function as the conversion server 20-1, and has the same function as the conversion server 20-1 provides to the computer device 100 in the first communication system 10-1. Are provided to the computer device 100 in the second communication system 10-2.
  • the first communication system 10-1 and the second communication system 10-2 can send the converted moving image data to each other, as in the case of the first embodiment.
  • one conversion server may provide both communication systems 10 with a function of converting moving image data into converted moving image data.
  • the appearance of the image processing apparatus in the second embodiment is like a webcam.
  • the image processing apparatus according to the second embodiment has the appearance as shown in FIG. 2, FIG. 8, FIG.
  • the image processing apparatus according to the second embodiment can be used by being connected to a computer device that constitutes a conventional video conference system.
  • a computer device has a function of transmitting / receiving moving image data to / from another computer device, and may be publicly known or well known.
  • the image processing apparatus according to the second embodiment is integrated with a camera; it includes hardware identical to the hardware configuration of the computer device 100 according to the first embodiment, and the same computer program as that described in the first embodiment is installed in that hardware.
  • the hardware configuration of the image processing apparatus according to the second embodiment corresponds to the configuration of FIG. with a camera connected to the interface 114; the image processing apparatus according to the invention of the present application has that configuration without the camera.
  • the image processing apparatus according to the second embodiment has a function of converting moving image data generated by a camera integrated with the image processing apparatus into converted moving image data.
  • the image processing apparatus according to the second embodiment can be used in the same manner as a normal webcam.
  • however, the data output by this image processing apparatus is not general moving image data but converted moving image data. Therefore, the computer devices in both communication systems can send the converted moving image data to each other without themselves having the function of converting moving image data into converted moving image data as in the first embodiment.

Abstract

The present invention provides technology with which it is possible to reduce discomfort felt in relation to the direction of a face or the line of sight in a moving image shown on a display. When performing a teleconference using a teleconference system, a subject face in a moving image captured by a typical web camera tends, for example, to be an image captured from slightly above, with neither the line of sight nor the direction of the face basically facing the front. This image processing device generates a three-dimensional model of the face included in a still image of the moving image. The image processing device then turns the three-dimensional model, which faces downward by an angle θ, upward by the angle θ. Next, the image processing device generates data of a two-dimensional image again from the three-dimensional model after it has been turned. Thus, the subject face in the moving image basically faces the front.

Description

Image processing apparatus, method, and computer program
The present invention relates to an image processing technique that can be applied to, for example, a video conference.
It has been a long time since networks such as the Internet became widespread, and in recent years the speed of network communication has increased remarkably. Along with this, it has become easy to send and receive moving images between remote places, and video conferencing (video calls) between remote places has become extremely familiar.
Video conferencing may be realized using an expensive dedicated device (dedicated system), or using a simple general-purpose device (system) together with software for transmitting and receiving video, such as Skype (trademark) provided by Microsoft (trademark) Corporation.
Whether it is realized by a dedicated device or a general-purpose device, the general principle of the video conference remains unchanged. For example, in a one-to-one videoconference, both participants prepare a computer connected to the network. A display and a camera are connected to each of these computers. The camera is a digital camera capable of capturing moving images, and captures the participants of the video conference. The moving image data of the moving image in which the face of one participant is reflected by one camera is sent to the other computer via the one computer and the network. As a result, a moving image in which the face of one participant is reflected is displayed on the other display connected to the other computer. The other participant can thereby see the face of one participant. By performing such processing bidirectionally, both participants can hold a conference while looking at the other party's face.
Of course, voice and text can also be exchanged between the two computers (or both participants), and at least one of them is usually required; however, since the exchange of voice and text is unrelated to the present application, its description is basically omitted hereafter.
Patent Document 1: Japanese Patent Laid-Open No. 2018-056907
Patent Document 2: International Publication No. 2016/158014
Patent Document 3: Japanese Patent Laid-Open No. 2016-085579
Patent Document 4: Japanese Patent Laid-Open No. H6-90445
There is a well-known problem in the video conference performed as described above.
As described above, when a video conference is held, moving image data of a moving image in which the face of one participant is captured by one camera is sent to the other computer via the one computer and the network, whereby a moving image showing the face of the one participant is displayed on the other display connected to the other computer.
The other participant holds the video conference while looking at the face of the one participant displayed on the other display. At that time, the line of sight of the one participant shown on the other display does not face the direction of the other participant, and in some cases not only the line of sight but also the direction of the one participant's face does not face the direction of the other participant. Such a situation gives a strong sense of discomfort to the other participant. As a result, both participants of the video conference end up holding the conference with that discomfort.
Such a problem occurs because there is a problem in the above-described moving image data created by one participant or in the position of one camera that creates moving image data by one participant. For example, assume that one display is in front of one participant's face. In that case, the face of one participant is basically facing the other display (in other words, when one participant has a natural posture). In this case, for example, one camera is arranged above the center of one display in the width direction. Then, one of the cameras basically captures the face of one participant facing the one display, basically from obliquely above. In such a case, the line of sight or face of one participant displayed in the moving image displayed on the other display in front of the other participant receiving the moving image data generated by one camera is It seems that the other participant does not face the direction of the other participant and is looking downward.
The phenomenon described above, in which one participant's line of sight or face is projected in the moving image displayed on the other display in front of the other participant, is one It occurs not only when it is above the widthwise center of the display, but also where one camera is anywhere around one display. However, depending on the position where one of the cameras is arranged, the line of sight or face of one participant displayed in the moving image displayed on the other display in front of the other participant is different.
Since the problems described above are widely known, several methods for solving them have already been proposed.
For example, there is a known technique in which at least a part of the display is made of a transparent member and the camera is placed inside or behind the display, so that the face of the participant in front of the display is captured essentially from the front. However, modifying a display in this way is very costly, so this technique has hardly spread at all; moreover, it cannot be retrofitted to the ordinary displays already on the market.
There is also a known technique in which, when the face captured in the moving image data created by a camera placed around the display is displaced from, for example, the center of the moving image, the amount of displacement is detected and the moving image data is corrected so that the face shown in the moving image based on that data is translated vertically or horizontally. However, even if the face shown in the moving image is translated vertically or horizontally, the orientation of the face is not corrected. Furthermore, because such a technique continuously detects the displacement and continuously translates the face shown in the moving image, the computation required for the image processing tends to become complicated and the moving image tends to be delayed.
There is also a known technique in which the direction of the line of sight is further detected within the face captured in the moving image data created by a camera placed around the display, and the moving image data is corrected so as to fix the direction of the eyes in the face shown in the moving image. Correcting the direction of the eyes may in some cases make it possible to match the line of sight of the one participant in the moving image shown on the display in front of the other participant with the other participant's own line of sight. In the example above, however, the eyes in the downward-facing face of the one participant shown on the other display would essentially be in an upturned, glancing-up state, which can actually increase the unnaturalness. In addition, whether or not the orientation of the whole face is also corrected, once the image is corrected on the basis of the direction of the line of sight, merely moving the eyes can change the orientation of the one participant's face in the moving image shown on the display in front of the other participant even though the one participant has not turned their face, again increasing the unnaturalness. This technique, too, tends to delay the moving image, for the same reason as above.
There is also a known technique, described in the above-mentioned Patent Document 4, in which a three-dimensional model of the face captured in the moving image data created by a camera is generated, the generated three-dimensional model is rotated by a predetermined angle, and a two-dimensional image is then obtained again. However, generating a three-dimensional model of a face from moving image data generally requires either so-called stereo imaging with two cameras or, if a single camera is used, that the many still images making up the moving image data captured by that camera include depth data. Neither kind of camera is common, and a technique that forces users to prepare such uncommon hardware is extremely difficult to spread. Modern laptop personal computers and computers such as smartphones and tablets have built-in cameras, and web cameras and other cameras used in combination with desktop personal computers are also in wide use; none of these are stereo cameras, and none can create moving image data containing depth data. A technique that cannot be applied to these widespread cameras is, at the least, unsuited to practical use and wide adoption.
An object of the invention of the present application is to provide an inexpensive technique, usable mainly in combination with the ordinary cameras found in video conference systems and unlikely to cause delay, that can reduce the discomfort felt about the orientation of, or the line of sight of, a face shown in a moving image displayed on the display in front of the viewer.
To solve the problems described above, the inventor of the present application carried out repeated research and obtained the following findings.
As described above, the reason the two remotely located participants in a video conference feel discomfort about the line of sight or the orientation of the other participant's face in the moving image shown on the display in front of them is that there is a problem with the moving image data created on the one participant's side described above, or with the position of the one camera that creates that moving image data.
Incidentally, if, in the example above, the whole of the one display in front of the one participant were transparent and the one camera were located behind it, the one camera would capture, essentially from the front, the face of the one participant, who looks at the one display from the front during the video conference. If that were so, the line of sight, or the face, of the one participant shown in the moving image displayed on the other display in front of the other participant who receives the moving image data generated by the one camera would point toward the other participant. In practice, however, a display is usually not transparent in its entirety, so the one camera ends up being placed somewhere around the one display.
Nevertheless, it is at least theoretically possible to correct the moving image data created by the one camera so that the moving image based on that data looks as if it had been captured by a virtual camera located at a virtual position behind the display (including inside the display; the same applies hereinafter). Since the face image of the one participant's face contained in a moving image based on moving image data corrected in this way basically faces the front, the discomfort given to the other participant watching the other display can be kept small.
The invention of the present application is based on these findings.
The invention of the present application is an image processing device comprising: a moving image data reception unit that receives moving image data, which is data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by capturing a target face, which is the face of one imaged person, with one predetermined camera that can capture moving images and exists at a real position, which is a predetermined position; a converted moving image data generation unit that generates converted moving image data, which is data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, which is data of a converted still image, that is, the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face facing the front; and a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit.
The converted moving image data generation unit of this image processing device comprises: a three-dimensional model generation unit that generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data, using a conversion algorithm, obtained by machine learning on a large number of faces, that estimates a three-dimensional model of a face; a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
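Purely as an illustration of the flow through these three units, a sketch in Python follows. The helper functions estimate_face_mesh and render_mesh are hypothetical placeholders for the machine-learned conversion algorithm and for the re-projection to two dimensions; neither name comes from this document or from any particular library.
import numpy as np
def convert_frame(frame, rotation, estimate_face_mesh, render_mesh):
    # frame: one still image (H x W x 3 array) from the moving image data.
    # rotation: the fixed 3 x 3 rotation matrix for the rotation angle,
    # derived once from the camera's real position and virtual position.
    vertices, triangles, texture = estimate_face_mesh(frame)  # 3D model generation unit
    rotated = vertices @ rotation.T                           # 3D model rotation unit
    return render_mesh(rotated, triangles, texture)           # 2D image generation unit
def convert_stream(frames, rotation, estimate_face_mesh, render_mesh):
    # Converted moving image data: the succession of converted still images.
    for frame in frames:
        yield convert_frame(frame, rotation, estimate_face_mesh, render_mesh)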
The present invention uses a single camera. The camera in the present invention is, moreover, an ordinary one: the still image data making up the moving image data contain no depth data. The camera may be integrated with the image processing device or may be a separate body. For example, when the image processing device is configured from a computer as described in the background art (for example, a desktop computer without a camera), the camera is separate from the image processing device. The camera in that case may be, for example, a known or well-known web camera itself. A camera of this kind, separate from the computer serving as the image processing device, is connected to that computer by wire or wirelessly. On the other hand, many known or well-known laptop personal computers and computers such as smartphones and tablets have an integrated camera. When the image processing device is configured from such a camera-integrated computer, the camera is included in the image processing device; strictly speaking, in that case the part of the computer excluding the camera is the image processing device referred to in the present invention. It is also possible to mount the image processing device of the present invention on a conventional web camera, in which case the part of the web camera other than the camera proper is the image processing device referred to in the present invention.
The camera exists at a real position, which is a predetermined position. If, for example, a display is connected to the computer serving as the image processing device, the real position is generally a predetermined place around the display. When the computer serving as the image processing device is, for example, a laptop personal computer, a smartphone, or a tablet, the camera is generally mounted at a predetermined position above the display integrated with the computer, and that position is the camera's real position. If the image processing device of the present invention has the appearance of a web camera, the position where it is mounted is the camera's real position. In any case, the camera at the real position captures the target face, which is the face of one imaged person. The camera can capture moving images and generates moving image data of those moving images. The moving image data generated by the camera is of an ordinary kind, for example MJPEG data. The moving image data in the present invention is data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, which is entirely ordinary moving image data.
The image processing device includes a moving image data reception unit that receives, from the camera, the moving image data the camera generates. When the image processing device and the camera are separate and connected by wire, the moving image data reception unit will generally be an input terminal, provided on the image processing device, that realizes the wired connection with the camera. When they are separate and connected wirelessly, it will generally be a receiver, provided on the image processing device, that realizes wireless communication with the camera. When the image processing device and the camera are integrated, it will generally be an interface, provided inside the image processing device, that realizes the connection with the camera.
The image processing device of the present invention includes a converted moving image data generation unit. The converted moving image data generation unit converts each of at least a plurality of the still image data included in the moving image data into converted still image data. As described above, the moving image data the image processing device receives from the camera, and the still image data included in it, are generated by the camera at the real position, so the moving images or still images based on them contain the target face as photographed from the real position. The converted still image data, by contrast, is generated from, or by converting, the still image data, and is data of a converted still image: the two-dimensional still image that would be captured by the camera if the camera existed at a virtual position, which is a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front (when the user takes a natural posture). In other words, the target face contained in the converted still image is the target face as it would be photographed from the virtual position in front of the user's face. Here, the camera's virtual position is fixed and the relative positional relationship between the camera's real position and virtual position is constant, so the processing by which the image processing device converts the still image data received from the camera into converted still image data is basically the same for every still image datum subject to the conversion. The conversion of still image data into converted still image data is therefore "lighter" than if different processing were performed for each image, and the conversion is unlikely to cause the moving image to be delayed. The converted moving image data is then the succession of converted still image data generated one after another by the converted moving image data generation unit.
The still image data is the data of the still images (so-called frames) that make up a moving image. The image processing device may generate converted still image data from every still image datum it receives from the camera, but doing so may delay the moving image. If importance is placed on avoiding delay, therefore, the still image data subjected to conversion can be, for example, every second or every third still image datum included in the moving image data (every two or three frames). The frame count, or frame rate, of the converted moving image data (the number of converted still image data contained in the converted moving image data per second) then becomes smaller than the frame count of the moving image data (the number of still image data contained in the moving image data per second), but as long as the converted moving image data has a frame rate of at least about 10 fps, the moving image based on it still passes as a moving image. Of course, the still image data subjected to conversion need not be spaced at a fixed interval such as every second or every third image.
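A minimal sketch of the frame thinning just described, assuming a 30 fps input and conversion of every third frame (the figures are examples consistent with the 10 fps lower bound mentioned above):
def thin_frames(frames, keep_every=3):
    # Pass only every keep_every-th still image on to the conversion step.
    # A 30 fps input with keep_every=3 yields 10 fps of converted frames.
    for index, frame in enumerate(frames):
        if index % keep_every == 0:
            yield frame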
The image processing device further includes a moving image data output unit. The moving image data output unit has the function of outputting the converted moving image data generated by the converted moving image data generation unit. The converted moving image data is output, for example, from the image processing device to another device. The other device serving as the output destination may be a device directly connected to the image processing device by wire or wirelessly (for example, a display), or a device connected to the image processing device via a network (for example, a display connected to another image processing device). If the image processing device itself includes a display, the output destination of the converted moving image data may be that display. Furthermore, if the image processing device is integrated with the camera and has the appearance of an ordinary web camera, then using that web camera in the same way as the web camera of a conventional video conference system allows the moving image data input to the computer of the video conference system to be the converted moving image data from the outset.
As described above, the target face in each converted still image based on the converted still image data contained in the converted moving image data generated in this way has the same orientation as the target face imaged by a camera at the virtual position in front of the target face. Accordingly, when a moving image based on the converted moving image data generated by the image processing device described above is shown on some display, the target face shown on that display basically faces the front. The use of the image processing device of the present application is not limited to video conferencing, but when it is applied to a video conference, for example, it can reduce the discomfort the other party feels about the line of sight in, or the orientation of, the target face when a moving image based on the converted moving image data is shown on the other party's display. Moreover, this technique requires no special contrivance in hardware such as the camera or display and can be realized, for example, merely by combining an ordinary computer with software, so it is comparatively inexpensive. And because, as described above, it repeats uniform image processing and need not necessarily process every still image datum, the problem of moving image delay is unlikely to arise.
In addition, when the target face is displayed on a display, the target face shown in the moving image based on the converted moving image data obtained by this invention faces the front, line of sight included, whenever the owner of the target face takes a natural posture; if the owner turns the target face or moves their eyes, the target face shown on the display turns, or its eyes move, accordingly. Because the present invention merely displays the converted moving image, that is, the moving image that would be captured were the camera at the virtual position, no unnaturalness arises in the target face shown on the display, in contrast to methods that transform the image of the target face according to movements of the face or of the line of sight.
The image processing devices may be made communicable via a predetermined network and used as a pair, with the converted moving image data generated by each of the two image processing devices sent bidirectionally over the network to the other. This makes it possible to realize a video conference in the conventional manner.
Note that the use of the image processing device of the present invention is not limited to video conference systems. For example, there is also a known problem that when you watch a moving image of your own face captured as a selfie on the display of your own smartphone, tablet, or desktop or laptop computer, your face, or your line of sight, does not point toward the front, which feels strange. Such a problem can also be solved by the image processing device of the present invention. In this case, naturally, the converted moving image data created from the moving image data by the image processing device need not be sent to anyone else's computer.
As described above, the image processing device of the present invention also includes the converted moving image data generation unit, and as also described above, that converted moving image data generation unit comprises: a three-dimensional model generation unit that generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data, using a conversion algorithm, obtained by machine learning on a large number of faces, that estimates a three-dimensional model of a face; a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
The three-dimensional model generation unit generates, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face captured in the still image specified by that still image data. The three-dimensional model is generated using a conversion algorithm that estimates a three-dimensional model of a face and was obtained by machine learning on a large number of faces. In recent years, a technique has been developed that automatically creates, from a single ordinary two-dimensional still image in which a face appears (in other words, from the data of one facial photograph), a three-dimensional model of the facial part of the face in that still image. In this technique, a conversion algorithm, that is, an algorithm for generating a three-dimensional model of a human face from a still image, is produced by having a computer machine-learn from a large number of sample two-dimensional still images of human faces captured from various angles. Using that conversion algorithm, the three-dimensional model of the facial part of the target face captured in the still image specified by the still image data is generated automatically. Here, the facial part means, roughly, the part of the human head forward of the ears and below the forehead.
The recently developed technique described above, which automatically creates a three-dimensional model of the facial part of a face from a single ordinary two-dimensional still image, has been recognized by the public as an intriguing technology. Interesting though it is recognized to be, however, it has so far had almost no practical use; the present invention proposes a practical use for it. The conversion algorithm described above generates a three-dimensional model of at least the facial part of the target face, but the source two-dimensional still image used to generate the three-dimensional model need be neither data captured by a stereo camera nor data containing depth data. In other words, the camera used in combination with the image processing device of the present invention may be an entirely ordinary one.
The three-dimensional model may be anything created by the method described above and is, for example, a wireframe model. The three-dimensional model generation unit generates a three-dimensional model based on at least a plurality of the still image data making up the moving image data; this "at least a plurality of still image data" is the still image data subject to the conversion described above.
The three-dimensional model rotation unit performs the process of rotating each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle. This corresponds to the process of turning the orientation of the face specified by the three-dimensional model toward the camera at the virtual position.
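For illustration, such a fixed rotation can be precomputed once and reused for every three-dimensional model. The sketch below assumes the common case of a camera above the display, so the model is tilted back about the horizontal (X) axis; the 15-degree figure is only an example consistent with the range mentioned later:
import numpy as np
def pitch_rotation(degrees):
    # 3 x 3 matrix for a rotation about the X axis (the horizontal axis).
    a = np.radians(degrees)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a), np.cos(a)]])
R = pitch_rotation(15.0)  # computed once; the same R is applied to every model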
The two-dimensional image generation unit generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit. That is, the two-dimensional image generation unit generates the converted still image data of the converted still image by creating data of a two-dimensional still image again from the three-dimensional model.
Because the relative positional relationship between the camera's real position and virtual position is constant, the angle by which the three-dimensional model rotation unit rotates the three-dimensional model (including, of course, the direction of rotation) is constant no matter which still image datum the processing is based on. The processing performed by the three-dimensional model generation unit, the three-dimensional model rotation unit, and the two-dimensional image generation unit on each still image datum subject to image processing is therefore the same whichever still image datum it is based on. This, too, is one of the reasons the problem of moving image delay is unlikely to arise.
The three-dimensional model generation unit may be configured to extract the facial part of the target face captured in the still image specified by the still image data and generate the three-dimensional model from it, and also to generate background image data, which is data of a two-dimensional still image of the part of the still image other than the facial part of the target face; the two-dimensional image generation unit may then generate the converted still image data by pasting facial image data, which is data obtained by flattening the three-dimensional model rotated by the three-dimensional model rotation unit into two dimensions, onto the facial part of the target face in the background image data.
This means that, of the still image specified by the still image data from which the converted still image data is generated, only the data of the facial part of the target face is treated three-dimensionally, while everything other than the facial part is left two-dimensional as it is. That is, the three-dimensional model generation unit recognizes the facial part of the target face in the still image, extracts that part, and generates the three-dimensional model, leaving the other parts (for example, the ears and hair of the target face, or the background behind the owner of the target face) as a two-dimensional still image. The three-dimensional model rotation unit then rotates the three-dimensional model, the two-dimensional image generation unit converts the rotated three-dimensional model into a two-dimensional image, and that image is pasted into the region of the still image from which the facial part of the target face was extracted. Generating the converted still image data by such simple processing makes the problem of moving image delay even less likely to arise. Admittedly, when such processing is performed, the two-dimensional still image of the face generated by the two-dimensional image generation unit from the rotated three-dimensional model does not necessarily exactly match the still image from which the facial part of the target face was extracted, which suggests that some slight unnaturalness may arise in the target face contained in the still image specified by the converted still image data. According to the research of the inventor of the present application, however, the discomfort felt by someone watching a moving image based on converted moving image data made up of such converted still image data was far smaller than when the target face in the moving image points in the wrong direction. Although the mechanism is not known in detail, this is thought to be because, when a person recognizes a face, the brain recognizes it centering on the eyes of the person being recognized, and as long as the eyes point correctly at the viewer, other unnaturalness goes unnoticed. Thanks to this function of the brain, the effect of the present invention is sufficient even when the converted still images are generated in the manner described above. At least when the rotation angle of the target face is around 15 degrees or less, the discomfort felt by someone watching a moving image based on the converted moving image data is small enough to pose no practical problem.
That said, the three-dimensional model generation unit may be configured to perform predetermined two-dimensional image processing on the still image of the part other than the facial part of the target face before generating the background image data of that still image, so that, when the two-dimensional image generation unit pastes the facial image data onto the facial part of the target face in the background image data, the edges of the facial image data and of the facial part of the target face match more closely. Two-dimensional image processing here means image processing that does not involve three-dimensional modeling of the subject in the still image. For example, when the three-dimensional model of the facial part of the target face is rotated, its apparent length in, say, the vertical direction may change. To match such an apparent change in length, the three-dimensional model generation unit can change (enlarge or reduce) the vertical length of the still image of the part other than the facial part of the target face. Besides scaling the image in one direction as just described, examples of two-dimensional image processing include scaling in two directions, rotation, and so on. This makes it possible to further reduce the above-mentioned unnaturalness, barely noticed by the brain, that can arise in the target face in the converted still image. It is, however, not essential to apply such processing to the still image of the part other than the facial part of the target face.
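A minimal sketch of the paste-in step under stated assumptions: the renderer returns, besides the two-dimensional image of the rotated facial part, a mask marking the pixels it covers, and the optional two-dimensional processing of the background is a simple vertical resize done with OpenCV:
import cv2
import numpy as np
def composite(face_render, face_mask, background, v_scale=1.0):
    # Paste the flattened image of the rotated facial part onto the
    # background image data (the still image minus the facial part).
    if v_scale != 1.0:
        # Optional 2D processing so the edges of the pasted facial part
        # and the background match more closely.
        h, w = background.shape[:2]
        resized = cv2.resize(background, (w, int(round(h * v_scale))))
        if resized.shape[0] >= h:
            background = resized[:h]
        else:
            background = cv2.copyMakeBorder(resized, 0, h - resized.shape[0],
                                            0, 0, cv2.BORDER_REPLICATE)
    out = background.copy()
    out[face_mask > 0] = face_render[face_mask > 0]
    return out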
The three-dimensional model rotation unit may rotate the three-dimensional model about a predetermined point. As described above, the three-dimensional model rotation unit rotates the three-dimensional model. One possible process for rotating the three-dimensional model is to rotate it about some axis (for example, a horizontal straight line passing through both ears, a straight line passing vertically through the center of the skull in plan view, or both); such processes are, in effect, roll, yaw, and pitch rotations. To perform roll, yaw, and pitch rotations, however, the three rotation axes and the origin at which they intersect must be found, which requires processing that detects the ears, or the center of the skull in plan view, within the three-dimensional model and specifies their coordinates. By instead rotating the three-dimensional model about some point in the virtual space in which the model exists (a virtual point, which may or may not lie inside the model), the model can be treated as a mere solid mass having the three-dimensional shape of a face, and such processing on the three-dimensional model, or on the target face in the still image, can be omitted; that is, there is no longer any need to detect which part of the three-dimensional model or still image is an eye and which is a nose. Rotation of the three-dimensional model about such a point can be executed by a transformation of spatial coordinates, and can also be regarded as a rotation of the space itself in which the model exists. The predetermined point can be, for example, the lens position of the single camera. Whether or not the camera is integrated with the image processing device, if the position of the camera relative to the image processing device is fixed, taking the camera's lens position as the predetermined point makes the position of that point easy to determine. Whether or not the predetermined point is the camera's lens position, taking it as the origin of the virtual space in which the three-dimensional model exists simplifies the spatial-coordinate computation.
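The coordinate transformation described here can be sketched in a few lines; the model is handled as a bare N x 3 array of vertex coordinates, with no detection of eyes, ears, or nose:
import numpy as np
def rotate_about_point(vertices, R, c):
    # p' = R (p - c) + c : shift the predetermined point c to the origin,
    # rotate by the fixed matrix R, and shift back.
    return (vertices - c) @ R.T + c
# If the predetermined point (for example, the camera lens position) is taken
# as the origin of the virtual space, c is the zero vector and the transform
# reduces to an ordinary rotation of the space.
c = np.zeros(3)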
As described above, the three-dimensional model rotation unit of the image processing device of the present invention rotates each of the three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle. The fixed rotation angle by which the three-dimensional model should be rotated can be determined as follows.
First, the rotation angle may be determined in advance. In that case the rotation angle is recorded in the image processing device. The rotation angle is determined by the relative positional relationship between the camera's real position and virtual position. When the image processing device is, for example, a laptop personal computer, a smartphone, or a tablet and the camera is, for example, fixed to its housing, the camera's real position is fixed relative to the image processing device. In this case, if the camera's virtual position is set at a suitable position such as behind the display of the laptop personal computer, smartphone, or tablet, the camera's real position and virtual position can be determined uniquely. When the specifications of the equipment making up the image processing device are clear from the outset in this way, the rotation angle can be determined in advance by also taking into account how far users ordinarily hold the display from their faces when using the laptop personal computer, smartphone, or tablet serving as the image processing device. For example, a computer program for making a computer such as a laptop personal computer, smartphone, or tablet function as the image processing device of the present invention may hold data on the camera's virtual position for each of the wide variety of such computers (or data specifying the above rotation angle, which can be derived from the relationship between the real position and the virtual position), that is, a large set of data pairing each model with its camera's virtual position, and may have a function whereby, after the computer program is installed on a computer, that computer's model is identified automatically by the program, or the user can enter input identifying the model. In that way, when the computer program makes the computer function as the image processing device of the present invention, the rotation angle appropriate to that image processing device can be determined automatically from the relationship between the model and the virtual position.
On the other hand, even when the image processing device is configured from, for example, a desktop computer, or is integrated with the camera and has the appearance of a web camera, if the position at which the camera is placed (the camera's real position) is determined at least to some extent, the relative positional relationship between the camera's real position and a virtual position set, for example, behind the display is determined uniquely. For example, if it is known in advance that the camera's real position will be directly above the center of the display in the width direction and that the camera will be used in that position, the relative relationship between the camera's real position and virtual position is determined uniquely. If, in this case, one further considers how far users hold their faces from the display when using the image processing device (a distance that is often predictable from the size of the display), the rotation angle can be determined in advance. That said, adopting a means such as informing the user of an instruction like "place the camera a certain number of centimeters above the center of the display in its vertical and width directions, and use this image processing device with the target face a certain number of centimeters from the camera's virtual position", and determining the rotation angle in advance with that position as the virtual position, will more reliably achieve the effect that the target face in the moving image based on the converted moving image data generated by the image processing device faces squarely forward.
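As a purely illustrative calculation (the centimeter figures below are assumptions, not values taken from this document): if the camera's real position is a known height above the virtual position set behind the display, and the target face is used at a known distance, the pitch component of the rotation angle follows from simple trigonometry:
import math
def rotation_angle_deg(camera_offset_cm, viewing_distance_cm):
    # Angle between the direction from the target face to the real camera
    # position and the direction to the virtual position straight ahead.
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))
# Assumed example: camera 12 cm above the virtual position, face 50 cm
# from the display -> about 13.5 degrees of downward tilt to undo.
print(rotation_angle_deg(12.0, 50.0))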
The rotation angle thus need not be determined in advance; it may instead be determined by the image processing device at the time the device is used. For example, the image processing device may determine the rotation angle before it starts generating the converted moving image data.
The image processing device may, for example, determine the rotation angle by performing a predetermined computation based on the moving image data received by the moving image data reception unit. The image processing device receives moving image data from the camera and, with its three-dimensional model generation unit, can generate a three-dimensional model from that data. It can therefore determine by computation how far the three-dimensional model must be rotated for the target face of a user squarely facing the camera at the virtual position to face the front in the still image based on the converted still image data; taking that angle as the rotation angle is one aspect of this invention.
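One conceivable way to carry out such a computation, offered as a sketch rather than as the procedure this document prescribes: assuming the conversion algorithm returns mesh vertices in a fixed canonical ordering, the rigid rotation between the observed mesh and a frontal reference mesh can be estimated with the Kabsch (SVD) method, and the rotation angle then chosen to undo it:
import numpy as np
def estimate_rotation(observed, reference):
    # observed, reference: N x 3 vertex arrays in the same vertex ordering.
    # Returns the rotation matrix best mapping observed onto reference.
    p = observed - observed.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T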
The image processing device may also include an input device reception unit for receiving, from an input device used to enter predetermined parameters needed to determine the rotation angle, data about those parameters, and may determine the rotation angle by performing a predetermined computation based on the parameter data received by the input device reception unit. A computer of the kind that typically constitutes the image processing device usually has an input device (for example, a keyboard, mouse, or touch panel) connected to it or built into it, so the parameters can be entered with that input device; determining the rotation angle by computation from parameters entered in this way is another aspect of this invention. The parameters are, for example, information specifying the shape and size of the display, information specifying where the camera's real position is (for example, directly above the display at the center of its width direction, or at the display's upper-right corner), and information specifying the distance from the display to the target face.
The image processing device may also include a sensor reception unit that receives, from a sensor that detects predetermined parameters needed to determine the rotation angle, data about those parameters, and may determine the rotation angle by performing a predetermined computation based on the parameter data received by the sensor reception unit. For example, the sensor is a known or well-known distance measuring device connected to the image processing device and provided at one end of the display in the width direction; determining an appropriate rotation angle using a parameter obtained by the distance measuring device (for example, the distance from the display to the target face) is another aspect of this invention. The parameters the sensor measures are not limited to distance: the sensor may measure any parameter useful for finding the relative positional relationship between the camera's real position and virtual position, or the relationship between the camera's virtual position and the target face.
The moving image data output unit of the image processing device may be connected to a predetermined display that shows the moving image based on the converted moving image data. The image processing device in this case includes a rotation angle change data reception unit that receives rotation angle change data, which is data for changing the rotation angle, and each time the rotation angle change data reception unit receives rotation angle change data, the three-dimensional model rotation unit may change the rotation angle by which it rotates the three-dimensional model, based on the rotation angle change data received. In this case the moving image based on the converted moving image data is shown on the display in substantially real time. The user can enter rotation angle change data while watching their own face (the target face) on the display, rotating the target face little by little, for example, and thereby adjust the target face shown on the display until it basically faces the front. The angle through which the three-dimensional model has been rotated when the target face shown on the display basically faces the front is determined as the rotation angle. The rotation directions of the three-dimensional model are not limited to, but may be only, the vertical direction (about the X axis) and the horizontal direction (about the Y axis). The user can enter the rotation angle change data using an input device of the kind described above.
Note that, of course, the above-mentioned four ideas for determining the rotation angle when the rotation angle is not determined in advance can be used in combination as required.
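As a rough illustration of the parameter-based calculation (the second and third techniques above), the following is a minimal sketch assuming a simplified geometry in which the camera sits a known height above the user's eye line and the face is a known distance away; the function name and the reduction to a single vertical offset are illustrative assumptions, not part of the patent.
```python
import math

def rotation_angle_deg(camera_offset_above_eyes_m: float,
                       face_to_camera_distance_m: float) -> float:
    """Angle (degrees, about the X axis) through which the 3D face model
    would be rotated so that a face looking at the display appears to
    look into a camera mounted above the display.

    camera_offset_above_eyes_m: vertical distance between the camera's
        real position and the user's eye line.
    face_to_camera_distance_m: distance from the target face to the camera.
    """
    return math.degrees(math.atan2(camera_offset_above_eyes_m,
                                   face_to_camera_distance_m))

# Example: camera 0.15 m above the eye line, face 0.60 m away,
# giving roughly a 14-degree downward gaze to compensate for.
print(rotation_angle_deg(0.15, 0.60))  # ~14.0
```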
The moving image data receiving unit may receive the moving image data directly from the camera (that is, without passing through any other device or apparatus). Alternatively, the moving image data receiving unit may receive the moving image data from the camera via a predetermined network. In the latter case, the image processing device makes use of so-called cloud computing. That is, a computer near the user, for example, receives the moving image data from the camera and sends it over a network (for example, the Internet) to an image processing device at a remote location. The converted moving image data, generated by performing the image processing already described, is then returned from the image processing device to the user's computer via the network. The computer near the user can treat the converted moving image data received from the image processing device just as if it were moving image data received from the camera; for example, it can send the converted moving image data over the network to the computer of the other party in the video conference.
If the image processing device is built using cloud computing, the computer used by the user need not have high specifications for image processing.
When the above-described cloud-based image processing device is applied to a video conference system, the destination to which the image processing device sends the converted moving image data, generated by converting the moving image data received over the network from one participant's computer, may be the other participant's computer rather than the computer of the participant who sent the data.
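The round trip described above could look roughly like the following sketch, assuming a hypothetical HTTP endpoint (`/convert`) on the remote image processing server that accepts one JPEG frame and returns the converted frame; the URL, the endpoint, and the per-frame JPEG encoding are assumptions made only for illustration.
```python
import cv2               # OpenCV, used here only to grab and encode frames
import numpy as np
import requests          # third-party HTTP client

SERVER = "https://example.invalid/convert"  # hypothetical cloud endpoint

cap = cv2.VideoCapture(0)      # camera beside the user
ok, frame = cap.read()         # one 2D still image from the camera
if ok:
    _, jpeg = cv2.imencode(".jpg", frame)
    # Send the still image over the network to the remote image
    # processing device ...
    reply = requests.post(SERVER, data=jpeg.tobytes(),
                          headers={"Content-Type": "image/jpeg"})
    # ... and receive the converted still image back. The local computer
    # can now forward it to the other conference participant's computer.
    converted = cv2.imdecode(np.frombuffer(reply.content, np.uint8),
                             cv2.IMREAD_COLOR)
cap.release()
```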
The inventor of the present application also proposes, as one aspect of the present invention, a method executed by an image processing device. The effects of this method are equal to those of the image processing device according to the present invention.
One example of such a method is a method executed by a computer that includes a moving image data receiving unit for receiving moving image data, that is, data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by imaging a target face, the face of a single imaged person, with a single predetermined camera that can capture moving images and that is located at a predetermined real position.
The method includes a converted moving image data generation step of generating converted moving image data, that is, data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, the data of a converted still image, which is the two-dimensional still image that would be captured by the camera if the camera were located at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. The converted moving image data generation step includes: a three-dimensional model generation step of generating, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and that was obtained by machine learning over a large number of faces; a three-dimensional model rotation step of rotating each of the three-dimensional models of the target face generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
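Read as a per-frame pipeline, the claimed steps might be sketched as follows; `estimate_face_mesh` and `render_to_image` stand in for the machine-learned 3D reconstruction and the 2D rendering, which the patent does not tie to any particular library, so both names are hypothetical placeholders.
```python
from typing import Iterable, Iterator
import numpy as np

def estimate_face_mesh(still: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: a model trained on many faces that returns
    an (N, 3) array of 3D vertices for the facial part of the image."""
    raise NotImplementedError

def render_to_image(mesh: np.ndarray, still: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: projects the rotated mesh back into a
    two-dimensional converted still image."""
    raise NotImplementedError

def rotate(mesh: np.ndarray, rot: np.ndarray) -> np.ndarray:
    """Apply the same fixed 3x3 rotation to every vertex of the model."""
    return mesh @ rot.T

def convert_moving_image(stills: Iterable[np.ndarray],
                         rot: np.ndarray) -> Iterator[np.ndarray]:
    """Converted moving image data = the stream of converted stills."""
    for still in stills:                    # moving image data, frame by frame
        mesh = estimate_face_mesh(still)    # 3D model generation step
        mesh = rotate(mesh, rot)            # 3D model rotation step
        yield render_to_image(mesh, still)  # 2D image generation step
```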
The inventor of the present application also proposes, as one aspect of the present invention, a computer program for causing a predetermined, for example general-purpose, computer to function as the image processing device. The effects of this computer program are equal to those of the image processing device according to the present invention; a further effect is that it makes it possible for a predetermined computer to function as the image processing device according to the present application.
One example of such a computer program causes a computer, which includes a moving image data receiving unit for receiving moving image data, that is, data of a moving image composed of a large number of continuous still image data, each being data of a two-dimensional still image, obtained by imaging a target face, the face of a single imaged person, with a single predetermined camera that can capture moving images and that is located at a predetermined real position, to execute: a converted moving image data generation step of generating converted moving image data, that is, data of a moving image composed of a large number of continuous converted still image data, by converting each of at least a plurality of the still image data included in the moving image data into converted still image data, the data of a converted still image, which is the two-dimensional still image that would be captured by the camera if the camera were located at a virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the target face faces the front; and a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step. In the converted moving image data generation step, the computer program causes the computer to execute: a three-dimensional model generation step of generating, from each of at least a plurality of the still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data, using a conversion algorithm that estimates a three-dimensional model of a face and that was obtained by machine learning over a large number of faces; a three-dimensional model rotation step of rotating each of the three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
FIG. 1 shows the overall configuration of a video conference system according to a first embodiment.
FIG. 2 is a perspective view showing the appearance of a communication system of the video conference system shown in FIG. 1.
FIG. 3 shows the hardware configuration of the computer device shown in FIG. 2.
FIG. 4 is a block diagram showing the functional blocks generated inside the computer device shown in FIG. 2.
FIG. 5 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 6 shows the contents of the moving image data generated by the camera of the first communication system.
FIG. 7 shows an example of a face image before conversion, for explaining the principle of converting moving image data into converted moving image data in the first embodiment.
FIG. 8 shows an example of a three-dimensional model before rotation, for explaining the same conversion principle.
FIG. 9 shows an example of the three-dimensional model after rotation, for explaining the same conversion principle.
FIG. 10 shows an example of the face image after conversion, for explaining the same conversion principle.
FIG. 11 is another diagram for explaining the principle of converting moving image data into converted moving image data in the first embodiment.
FIG. 12 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 13 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 14 is a block diagram showing an example of the functional blocks generated inside the image processing unit shown in FIG. 4.
FIG. 15 shows an example of a moving image displayed on the display included in the second communication system of the video conference system shown in FIG. 1.
FIG. 16 shows another example of a moving image displayed on the display included in the second communication system of the video conference system shown in FIG. 1.
FIG. 17 shows the overall configuration of a video conference system according to a modification.
Preferred first and second embodiments of the present invention, and modifications thereof, are described below with reference to the drawings.
In the description of both embodiments and the modifications, the same reference numerals denote the same objects, and duplicate descriptions are omitted where appropriate. Unless a particular contradiction arises, the technical contents described in the embodiments and the modifications can be combined with one another.
«First Embodiment»
FIG. 1 schematically shows the overall configuration of a preferred embodiment of a system including the image processing device of the present invention.
The system according to the first embodiment is a video conference system. As already noted, however, the application of the present invention is not limited to video conference systems.
The video conference system includes a first communication system 10-1 and a second communication system 10-2, both of which can connect to a network 400.
The network 400 is, although not limited to this, the Internet in this embodiment.
In this embodiment, the first communication system 10-1 is used by one user participating in the video conference, and the second communication system 10-2 is used by the other user participating in the video conference.
The first communication system 10-1 and the second communication system 10-2 have substantially the same configuration as far as the present invention is concerned, and their functions and effects are also common, so in the following description the two are sometimes referred to collectively as the communication system 10.
As shown in FIG. 2, a perspective view of its appearance, the communication system 10 in this embodiment includes a computer device 100 serving as the image processing device, a display 101, and a camera 210. In this embodiment the computer device 100, the display 101, and the camera 210 are all separate bodies, although they need not be.
As will be described in detail later, the computer device 100 in this embodiment is a general-purpose computer; a commercially available product is sufficient. More specifically, the computer device 100 in this embodiment is a known or well-known desktop personal computer.
The computer device 100 can communicate via the network 400. The counterparts with which the computer device 100 communicates via the network 400 include at least the computer device 100 of the communication system 10 paired with the communication system 10 to which this computer device 100 belongs.
The display 101 described above is connected to the computer device 100. The display 101 is for displaying still or moving images, and a known or well-known display can be used. In this embodiment, the display must be capable of showing moving images. A commercially available, known or well-known display suffices, for example a liquid crystal display. The display 101 in this embodiment is connected to the computer device 100 by a cable, but it may instead be connected wirelessly; the technique used for this connection may also be known or well known.
The computer device 100 also includes an input device 102, with which the user makes desired inputs to the computer device 100. A known or well-known input device can be used. In this embodiment the input device 102 is a keyboard, but it is not limited to this: a numeric keypad, a trackball, a mouse, or known or well-known voice input through a microphone terminal can also be used. When the display 101 is a touch panel, the display 101 doubles as the input device 102.
One camera 210, described above, is connected to the computer device 100. The camera 210 is a digital camera capable of capturing moving images and can output moving image data, that is, data about the captured moving images. The moving image data generated by the camera 210 is composed of a large number of continuous still image data, each being data of a two-dimensional still image. Cameras with such functions are known or well known and commercially available. The still image data is, for example, MJPEG data and contains no depth data. The camera 210 in this embodiment may be of this kind; for example, a commercially available webcam can be used. The camera 210 outputs moving image data to the computer device 100; to make this possible, the camera 210 is connected to the computer device 100 by wire, for example, though the connection may also be wireless. The technique used for this connection may likewise be known or well known.
The camera 210 is fixed at a predetermined position. The predetermined position may basically be anywhere, as long as the target face, the face of the user of the communication system 10 shown in FIG. 2, appears in the moving image captured by the camera 210. In this embodiment, the camera 210 is fixed above the display 101 at approximately the center of the display 101 in its width direction. The position shown in FIG. 2, where the camera 210 is actually located, is the real position of the camera in the present invention.
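As a concrete illustration of the kind of data stream such a webcam delivers, the following minimal sketch, using OpenCV (an assumption; the patent does not prescribe any library), reads the camera as a sequence of two-dimensional stills with no depth channel.
```python
import cv2  # OpenCV: a common way to read a commercially available webcam

cap = cv2.VideoCapture(0)  # the single camera 210 at its real position
while True:
    ok, still = cap.read()         # one 2D still image (one frame)
    if not ok:
        break
    # `still` is an H x W x 3 array of color samples, with no depth
    # data: exactly the kind of still image data the text assumes.
    cv2.imshow("camera 210", still)
    if cv2.waitKey(1) == 27:       # Esc quits
        break
cap.release()
```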
Next, the configuration of the computer device 100 constituting the image processing device will be described. FIG. 3 shows the hardware configuration of the computer device 100.
The hardware includes a CPU (central processing unit) 111, a ROM (read only memory) 112, a RAM (random access memory) 113, and an interface 114, which are interconnected by a bus 116.
The CPU 111 is an arithmetic unit that performs computation. The CPU 111 executes the processing described below, for example by running a computer program recorded in the ROM 112 or the RAM 113. Although not shown, the hardware may include an HDD (hard disk drive) or another large-capacity recording device, and the above computer program may be recorded there.
The computer program referred to here includes at least a computer program for causing the computer device 100 to execute the later-described processing of generating converted moving image data by converting moving image data. This computer program may have been preinstalled on the computer device 100 or installed afterwards. It may be installed on the computer device 100 via a predetermined recording medium (not shown) such as a memory card, or via a network such as a LAN or the Internet.
The ROM 112 records the computer programs and data that the CPU 111 needs to execute the processing described below. The computer programs recorded in the ROM 112 are not limited to these and may naturally include other programs such as an OS, a web browser for viewing web pages over the Internet, and a mailer for handling e-mail.
The RAM 113 provides the work area the CPU 111 needs for its processing. In some cases, at least part of the computer programs and data described above may be recorded there as well.
The interface 114 exchanges data between the outside and the CPU 111, the RAM 113, and the other components connected by the bus 116. The display 101, the input device 102, and the camera 210 described above are connected to the interface 114.
Operation input made on the input device 102 is passed from the interface 114 to the bus 116. The moving image data sent from the camera 210 is likewise input from the interface 114 to the bus 116.
As is well known, data for displaying images on the display 101 is sent from the bus 116 to the interface 114 and output from the interface 114 to the display 101.
The interface 114 is also connected to a transmission/reception mechanism (not shown), a known means for communicating with the outside via the network 400, that is, the Internet. Through it, the computer device 100 can both send and receive data via the network 400. Such transmission and reception may be wired or wireless, and the configuration of the transmission/reception mechanism may be known or well known. Data that the transmission/reception mechanism receives from the network 400 is received by the interface 114, and data handed from the interface 114 to the transmission/reception mechanism is sent by the transmission/reception mechanism via the network 400 to the outside, for example, in the context of this embodiment, to the computer device 100 of the other party's communication system 10.
When the CPU 111 executes the computer program, the functional blocks shown in FIG. 4 are generated inside the computer device 100. The functional blocks described below may be generated by the above computer program alone, which causes the computer device 100 to execute the processing described below, or they may be generated by that computer program in cooperation with the OS or other computer programs installed on the computer device 100.
In relation to the functions of the present invention, an input unit 121, a main control unit 122, an image processing unit 123, and an output unit 125 are generated in the computer device 100.
The input unit 121 receives input from the interface 114.
Input from the interface 114 to the input unit 121 includes input from the input device 102. Input from the input device 102 includes, for example, designation data and start data, both described in detail later. When designation data, start data, or the like is entered from the input device 102, all such data is sent from the input unit 121 to the main control unit 122.
The data input from the interface 114 to the input unit 121 also includes data sent from the computer device 100 of the communication system 10 of the other party of the video conference and received by the transmission/reception mechanism, for example the converted moving image data described later. When the input unit 121 receives converted moving image data via the transmission/reception mechanism and the interface 114, it sends the data to the main control unit 122.
The data input from the interface 114 to the input unit 121 also includes the moving image data sent from the camera 210. When it receives moving image data, the input unit 121 sends it to the main control unit 122.
The main control unit 122 controls all the functional blocks generated in the computer device 100. For example, the main control unit 122 controls the communication between the communication systems 10 that realizes the video conference.
The main control unit 122 may receive designation data and start data from the input unit 121. On receiving them, the main control unit 122 executes the processing described later for each. The main control unit 122 also forwards received designation data to the output unit 125.
The main control unit 122 may receive from the input unit 121 the converted moving image data sent from the computer device 100 of the other party's communication system 10 and received by the transmission/reception mechanism. On receiving it, the main control unit 122 sends the converted moving image data to the output unit 125.
The main control unit 122 may also receive from the input unit 121 the moving image data sent from the camera 210. On receiving it, the main control unit 122 sends the moving image data to the image processing unit 123 when the conditions described later are satisfied.
The image processing unit 123 performs image processing.
As described above, the image processing unit 123 may receive moving image data from the main control unit 122. When it does, the image processing unit 123 performs image processing on the moving image data and converts it into converted moving image data.
As described above, the moving image data is composed of a large number of continuous still image data, each being data of a two-dimensional still image, and the target face appears in the still image based on each still image data. The image processing unit 123 converts such moving image data into converted moving image data. The specific details of this conversion are described later; in brief, the image processing unit 123 converts a plurality of the still image data included in the moving image data into converted still image data and strings the converted still image data together to form converted moving image data. In other words, the converted moving image data is a sequence of converted still image data, each being the data of a two-dimensional converted still image. The converted moving image data is ordinary moving image data, for example MJPEG data.
As described above, the moving image data, and the still image data it contains, is generated by the camera 210 at the real position, so the moving image or still images based on it show the target face as seen from the real position. The converted still image data, by contrast, is the data of a converted still image, generated based on, or by converting, the still image data. The converted still image is the two-dimensional still image that would be captured if the camera were at the virtual position, a predetermined position on a virtual straight line extending in the front direction from the target face when the user faces the front in a natural posture. The target face in the converted still image specified by the converted still image data is therefore the target face as seen from the virtual position in front of the user's face, and is basically in a front-facing state. The virtual position of the camera 210 is described in detail later.
The still image data is the data of the still images (so-called frames) that make up the moving image. The image processing device may generate converted still image data from every still image data it receives from the camera, but doing so may delay the moving image. If avoiding delay is the priority, the still image data to be converted can therefore be, for example, every second or every third still image data (every two or three frames) of the still image data included in the moving image data. The frame rate of the converted moving image data (the number of converted still image data per second) then becomes lower than that of the moving image data (the number of still image data per second), but as long as the converted moving image data has a frame rate of at least about 6 to 8 fps, the resulting video still passes as a moving image. Of course, the still image data selected for conversion need not be taken at a fixed interval such as every second or every third frame.
In any case, the image processing unit 123 sends the generated converted moving image data to the output unit 125.
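A minimal sketch of this frame-thinning idea follows, assuming the stream arrives as an iterable of frames; the function name and the default interval are illustrative, but keeping one frame out of every `step` is exactly the every-Nth-frame selection described above (for example, step=6 turns 60 fps input into 10 fps output).
```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")

def thin_frames(frames: Iterable[Frame], step: int = 6) -> Iterator[Frame]:
    """Yield every `step`-th still image so that the costly conversion
    only has to keep up with a reduced frame rate (e.g. 60 fps -> 10 fps),
    which stays above the ~6-8 fps the text treats as the lower bound."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```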
The output unit 125 outputs data generated by the functional blocks in the computer device 100 to the interface 114.
As described above, the output unit 125 may receive designation data from the main control unit 122. On receiving it, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism. The designation data is information specifying the computer device 100 of the other party's communication system 10 for the video conference.
As described above, the output unit 125 may receive converted moving image data from the main control unit 122. This converted moving image data has been sent from the computer device 100 of the other party's communication system 10. On receiving it, the output unit 125 sends it via the interface 114 to the display 101 connected to the computer device 100, and a moving image based on the converted moving image data is displayed on the display 101.
As described above, the output unit 125 may also receive converted moving image data from the image processing unit 123. This converted moving image data was generated inside the computer device 100 that contains the output unit 125. On receiving it, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism, which sends the converted moving image data to the computer device 100 specified by the designation data described above.
Next, the method of using and the operation of the video conference system described above will be explained, in particular the method of using and the operation of the computer device 100 in the communication system 10, which functions as the image processing device of the present invention.
As described above, the video conference system includes the first communication system 10-1 used by one user participating in the video conference and the second communication system 10-2 used by the other user participating in the video conference.
Both users prepare for the video conference.
As when using a known or well-known video conference system, one user conducts the conference while watching the display 101 of the first communication system 10-1 and the other while watching the display 101 of the second communication system 10-2. Accordingly, each user moves to an appropriate position, for example by sitting in front of the display 101 of the first communication system 10-1 or of the second communication system 10-2, respectively.
The participants also specify the two users who will hold the video conference. The two users can be specified using a known or well-known technique; for example, at least one of the two users participating in the video conference designates the other party. Of course, both users may designate each other. In this embodiment, one user designates the other party for the video conference, and the designated user accepts, whereby the two users holding the video conference are specified.
The description proceeds with the example in which the user of the first communication system 10-1 designates the other party. First, the user of the first communication system 10-1 operates the input device 102 of the first communication system 10-1 to generate designation data. The designation data is information specifying the user of the other party of the video conference. For example, every user who may participate in a video conference is given an ID, an identifier unique among users. By entering this ID with the input device 102, or selecting it from pre-registered IDs, the user of the first communication system 10-1 can enter the designation data. In this example, the designation data designates the ID of the user of the second communication system 10-2. The entered designation data passes from the input device 102 through the interface 114 to the input unit 121. The input unit 121 further attaches the ID of the first communication system 10-1 itself to the designation data and sends them via the main control unit 122 to the output unit 125. The designation data and the ID of the first communication system 10-1 are sent from the output unit 125 through the interface 114 to the transmission/reception mechanism. The transmission/reception mechanism sends the ID of the first communication system 10-1 over the network 400 to the communication system 10 operated by the user with the ID specified by the designation data, that is, to the computer device 100 of the second communication system 10-2.
The above process of sending the ID from the first communication system 10-1 to the second communication system 10-2 serves both as the first communication system 10-1 user's designation of the second communication system 10-2 user as the other party of the video conference and as that user's request to the second communication system 10-2 user for a video conference.
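The designation step amounts to a small signaling message; the following sketch shows one hypothetical shape such a message could take (the JSON field names, the port, and indeed the use of JSON at all are illustrative assumptions, not something the patent specifies).
```python
import json
import socket

def send_conference_request(own_id: str, peer_id: str,
                            peer_host: str, port: int = 5000) -> None:
    """Send the caller's ID to the computer device identified by the
    designation data, doubling as the video conference request."""
    message = json.dumps({"type": "conference_request",
                          "from_id": own_id,    # first system's own ID
                          "to_id": peer_id})    # ID from the designation data
    with socket.create_connection((peer_host, port)) as conn:
        conn.sendall(message.encode("utf-8"))

# Example (hypothetical host):
# send_conference_request("user-A", "user-B", "peer.example.invalid")
```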
The computer device 100 of the second communication system 10-2 receives, through its transmission/reception mechanism, the ID of the first communication system 10-1 sent over the network 400 from the computer device 100 of the first communication system 10-1. Inside the computer device 100 of the second communication system 10-2, the ID passes from the transmission/reception mechanism through the interface 114 to the input unit 121 and on to the main control unit 122. On receiving it, the main control unit 122 generates an image indicating that the user of the first communication system 10-1 has requested a video conference, for example an image containing the ID of that user sent from the first communication system 10-1, and sends the image data to the output unit 125. The output unit 125 sends the image data via the interface 114 to the display 101. As a result, the display 101 of the second communication system 10-2 shows an image indicating that the user of the first communication system 10-1 has requested a video conference.
If the user of the second communication system 10-2 agrees to hold the video conference with the user of the first communication system 10-1, he or she makes an input indicating that consent using the input device 102; this corresponds to the designation data on the computer device 100 of the second communication system 10-2. If the user does not agree, he or she either makes no input indicating consent or makes an input declining the video conference with the user of the first communication system 10-1; in that case the video conference does not take place. When the user of the second communication system 10-2 indicates consent to the video conference and the designation data representing that consent is entered from the input device 102 of the computer device 100 of the second communication system 10-2, the designation data is sent through the interface 114 and the input unit 121 to the main control unit 122.
On receiving it, the main control unit 122 generates data indicating that it is ready to hold the video conference and sends it to the output unit 125. That data goes from the output unit 125 through the interface 114 to the transmission/reception mechanism, and from there over the network 400 to the first communication system 10-1.
The transmission/reception mechanism of the computer device 100 of the first communication system 10-1 receives the data sent from the second communication system 10-2. The data is sent from the transmission/reception mechanism through the interface 114 and the input unit 121 to the main control unit 122 of the computer device 100 of the first communication system 10-1.
With this, the computer device 100 of the first communication system 10-1 and the computer device 100 of the second communication system 10-2 are ready to exchange the converted moving image data, the data about the moving images needed for the video conference.
Before the video conference, both participants also adjust their postures or the positions and angles of the cameras 210 as needed, so that each user's target face lies within the imaging range of the camera 210 of the communication system 10 at that user's side.
This completes the preparation for the video conference.
Next, the video conference is started.
Although not limited to this, in this embodiment, when the user of the first communication system 10-1 enters start data, the converted moving image data generated by the first communication system 10-1 is transmitted to the second communication system 10-2 and a moving image based on it is displayed on the display 101 of the second communication system 10-2; likewise, when the user of the second communication system 10-2 enters start data, the converted moving image data generated by the second communication system 10-2 is transmitted to the first communication system 10-1 and a moving image based on it is displayed on the display 101 of the first communication system 10-1. Since these two processes are effectively identical, the following description considers only the case in which the first communication system 10-1 generates converted moving image data, sends it to the second communication system 10-2, and a moving image based on that converted moving image data is displayed on the display 101 of the second communication system 10-2.
The user of the first communication system 10-1 enters start data using the input device 102. When the start data is entered, it is sent, like the designation data, from the input device 102 to the main control unit 122 in the computer device 100 of the first communication system 10-1. On receiving it, the main control unit 122 starts the processing for transmitting converted moving image data to the computer device 100 of the second communication system 10-2.
Although not limited to this, in this embodiment the camera 210 connected to the computer device 100 sends moving image data to the computer device 100 regardless of whether start data has been entered, and the moving image data constantly reaches the main control unit 122 through the interface 114 and the input unit 121. Until start data is entered, the main control unit 122 does nothing with the moving image data it receives; once start data has been entered, it sends the received moving image data on to the image processing unit 123.
The image processing unit 123 that receives the moving image data performs the processing of converting it into converted moving image data. The moving image data and the converted moving image data are each as already described, and the conversion may be performed in any way. This embodiment proposes four conversion methods, referred to as the first to fourth conversion methods.
(Points common to the first to fourth conversion methods)
The image processing unit 123 includes a frame dropping unit that extracts at least a plurality of still image data, as the targets of the image processing (conversion), from the still image data included in the moving image data. As described later, however, the frame dropping unit is not essential.
The image processing unit 123 also includes a three-dimensional model generation unit that generates, from each of the at least plural still image data extracted by the frame dropping unit, a three-dimensional model of the facial part of the target face appearing in the still image specified by that still image data.
The image processing unit 123 also includes a three-dimensional model rotation unit that rotates each of the three-dimensional models generated by the three-dimensional model generation unit by the rotation angle, a fixed angle.
The image processing unit 123 also includes a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
These functions are the same in all of the first to fourth conversion methods. What differs among the first to fourth conversion methods is, in essence, only the method of determining the rotation angle (including the rotation direction) of the three-dimensional model when the three-dimensional model rotation unit rotates the target face.
(First conversion method)
When the image processing unit 123 executes the first conversion method, it is configured as shown in FIG. 5.
In this case, the image processing unit 123 includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D.
As described above, the frame dropping unit 123A extracts at least a plurality of still image data, from the still image data contained in the moving image data, as the targets of the image processing (conversion). Only the extracted still image data are converted from still image data into converted still image data. The reason not all of the still image data contained in the moving image data are converted is that the computing power of the computer device 100 may be insufficient to perform the conversion of moving image data into converted moving image data (or of still image data into converted still image data) with the immediacy that moving images require. Accordingly, if the computing power of the computer device 100 is sufficient, the frame dropping unit 123A is unnecessary.
Although not limited to this, the frame dropping unit 123A in this embodiment extracts one still image datum out of every six (skipping five in between) from the 60 fps moving image data sent from the camera 210, thereby extracting ten still image data per second. The frame dropping unit 123A need not always extract still image data at a fixed interval, however, nor need the number extracted per second be ten; it may be, for example, around six to eight, or more. A sketch of this decimation follows.
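The decimation just described is simple to state precisely. The following is a minimal sketch of the frame dropping performed by a unit such as 123A; the function name and the keep-one-in-six default are illustrative, not taken from the embodiment.

```python
# Minimal sketch of frame dropping: keep one still image out of every
# `interval` frames, so a 60 fps stream with interval=6 yields 10
# still images per second, as in the embodiment's example.
def drop_frames(frames, interval=6):
    """Yield every `interval`-th frame of an iterable of still images."""
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield frame

# One second of a 60 fps stream reduces to 10 frames.
one_second = list(range(60))
assert len(list(drop_frames(one_second))) == 10
```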
As described above, the three-dimensional model generation unit 123B generates, from each of the at least plural still image data extracted by the frame dropping unit 123A, a three-dimensional model of the face appearing in the still image specified by that still image data. The three-dimensional model is, for example, a wire frame model, but is not limited to this.
The three-dimensional model rotation unit 123C rotates each of the three-dimensional models generated by the three-dimensional model generation unit 123B by the rotation angle, which is a fixed angle; the direction and angle of rotation are the same for all of the three-dimensional models. The two-dimensional image generation unit 123D then generates converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit 123C.
Here, the rotation angle by which the three-dimensional model rotation unit 123C rotates a three-dimensional model is determined so that, when a two-dimensional image is generated from the rotated three-dimensional model (that is, when the model is returned to a two-dimensional image), the target face contained in that two-dimensional image (more precisely, the face portion of the target face) is the same as the target face as it would appear if imaged by a camera at the virtual position. The virtual position is a predetermined position on a virtual straight line extending in the front direction from the target face when it faces front (when the user takes a natural posture). In other words, the three-dimensional model rotation unit 123C rotates the three-dimensional model of the face portion of the target face so that, as far as the target face is concerned, the moving image data (or still image data) captured by the camera 210 at the real position becomes the same as what a virtual camera at the virtual position would have captured.
In the first conversion method, the rotation angle is determined in advance. Data specifying the rotation angle is, for example, recorded beforehand in the three-dimensional model rotation unit 123C, and the three-dimensional model rotation unit 123C rotates each three-dimensional model by the rotation angle specified by that data.
The contents of the processes performed by the three-dimensional model generation unit 123B, the three-dimensional model rotation unit 123C, and the two-dimensional image generation unit 123D, and the principle of the present invention will be conceptually described with reference to FIGS. 6 to 8.
FIG. 6(A) shows, in side view, the relationship between the camera 210 and the target face. The camera 210 is at a real position directly above the display 101. In this example, the camera 210 is assumed to lie in the front direction of the target face in the horizontal sense, but above the target face. The camera 210 therefore images the target face from above at an angle θ, and the target face appearing in the moving image based on the moving image data generated by the camera 210, or in the still images based on the still image data contained in that moving image data, is one imaged from the angle θ above. FIG. 6(B) shows an example in which an image based on such moving image data is displayed on the display 101 included in the other party's communication system 10. As is clear from this example, when a moving image based on the moving image data as-is is displayed on the display 101, the target face contained in the moving image faces downward by the angle θ.
Here, the three-dimensional model generation unit 123B generates a three-dimensional model of the face portion of the target face contained in the still image specified by the still image data.
The three-dimensional model generation unit 123B first extracts the face portion F of the target face from the still image. The face portion F may be extracted by any method; a general image recognition technique will suffice. In FIG. 7(A), the area enclosed by the broken line is the face portion F. Although not limited to this, the face portion in this embodiment means, roughly, the part of the human head (the target face) in front of the ears and below the forehead. The face portion may, however, be narrower, as long as it at least includes the eyes, nose, and mouth, or wider, up to the entire head.
The three-dimensional model generation unit 123B generates a three-dimensional model of the face portion F described above. It does so using a conversion algorithm that estimates a three-dimensional model of a human face and was obtained by machine learning on a large number of faces. A technique for automatically building, from a single ordinary two-dimensional still image in which a face appears (in other words, from the data of a single facial photograph), a three-dimensional model of the face portion of that face is disclosed in detail in the paper "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression", accepted to ICCV 2017 (URL: http://aaronsplace.co.uk/papers/jackson2017recon/). The conversion algorithm was generated by having a computer machine-learn from a large number of sample two-dimensional still images of human faces, produced by imaging various human faces from various angles. Using this conversion algorithm, the three-dimensional model generation unit 123B automatically generates a three-dimensional model of the face portion F of the target face appearing in the still image specified by the still image data.
The three-dimensional model generated in this way is, for example, as shown in FIG. 7(B). FIG. 7(B)(1) shows the three-dimensional model of the face portion F of the target face viewed from the front; the three-dimensional model is, though not limited to this, a wire frame model. FIG. 7(B)(2) is a side view of the three-dimensional model of the face portion F with the wire frame omitted. The face portion F faces downward by the angle θ shown in FIG. 6(A).
The three-dimensional model generation unit 123B also generates data for the part of the still image data excluding the face portion F, that is, data for the still image of the part surrounding the face portion F in FIG. 7(A), and sends it to the two-dimensional image generation unit 123D.
A three-dimensional model facing downward by the angle θ will, naturally, face front if rotated upward by the angle θ. The angle θ is easily obtained from a and b shown in FIG. 8 by the very simple formula θ = atan(b / a), where a is the horizontal distance from the virtual position X of the camera to the target face and b is the vertical distance from the virtual position X of the camera 210 to the real position of the camera 210. In this example, the virtual position X of the camera 210 is the position immediately in front of the display 101 in the front direction of the target face; that is, the virtual position X lies on a virtual straight line extending in the front direction of the target face of a user taking a natural posture. As long as that condition is satisfied, the relative positional relationship between the virtual position X and the display 101 does not matter; the virtual position X may, for example, lie within the display 101 or behind it. For example, if a is 40 cm and b is 10 cm, θ is about 14 degrees; if a is 30 cm and b is 5 cm, θ is about 9.5 degrees. The former value is commonly seen in a communication system 10 built around a desktop computer device 100, the latter in a communication system 10 built around a smartphone.
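As a quick numeric check of the two figures just quoted, the formula can be evaluated directly; nothing in this sketch goes beyond the formula and the two worked examples in the text.

```python
import math

def rotation_angle_deg(a_cm, b_cm):
    """Rotation angle θ = atan(b / a), in degrees, from the horizontal
    distance a (virtual camera position to face) and the vertical
    distance b (virtual to real camera position)."""
    return math.degrees(math.atan2(b_cm, a_cm))

print(rotation_angle_deg(40, 10))  # ~14.0 degrees (desktop example)
print(rotation_angle_deg(30, 5))   # ~9.5 degrees (smartphone example)
```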
The three-dimensional model rotation unit 123C rotates the three-dimensional model shown in FIG. 7(B) upward by the angle θ within the vertical plane, whereupon the three-dimensional model faces front as shown in FIG. 7(C). FIG. 7(C)(1) shows the three-dimensional model of the face portion F of the target face viewed from the front; FIG. 7(C)(2) is a side view of that model with the wire frame omitted. Although not limited to this, the three-dimensional model rotation unit 123C in this embodiment rotates the three-dimensional model about a predetermined point. As an alternative, the three-dimensional model could be rotated about an axis (for example, a horizontal straight line passing through both ears, or a straight line passing vertically through the center of the skull in plan view, or both); such processing, however, requires detecting the positions of the ears, or of the center of the skull in plan view, within the three-dimensional model and specifying their coordinates. By instead rotating the three-dimensional model about some point in the virtual space in which it exists (a virtual point, which may or may not lie inside the three-dimensional model; for example, the origin that defines that virtual space), such cumbersome processing can be omitted. Although not limited to this, in this embodiment the predetermined point is the lens position of the camera, which is the origin of the virtual space in which the three-dimensional model exists, and the rotation of the three-dimensional model is executed as a transformation of the spatial coordinates about that point as origin. In this way there is no need to detect where the eyes or the nose are in the three-dimensional model or the still image, and the three-dimensional model can be treated as a mere solid having the shape of the face portion of the target face.
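The coordinate-transformation view of this rotation can be made concrete as follows. This is a minimal sketch, assuming the three-dimensional model is held as an N x 3 array of vertex coordinates in a space whose origin is the camera lens position; the pitch/yaw decomposition shown here is an illustration, not the embodiment's code.

```python
import numpy as np

def rotate_model(vertices, pitch_deg=0.0, yaw_deg=0.0):
    """Rotate an (N, 3) array of model vertices about the origin of the
    model's coordinate space (here, the camera lens position).
    pitch_deg rotates about the X axis (up/down) and yaw_deg about the
    Y axis (left/right); no facial landmarks are needed, since the
    model is treated as a plain cloud of points."""
    p = np.radians(pitch_deg)
    y = np.radians(yaw_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(p), -np.sin(p)],
                      [0, np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(y), 0, np.sin(y)],
                      [0, 1, 0],
                      [-np.sin(y), 0, np.cos(y)]])
    # One combined rotation covers both the vertical and the
    # horizontal correction in a single transformation.
    return vertices @ (rot_y @ rot_x).T
```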
The two-dimensional image generation unit 123D then generates two-dimensional image data again, using the three-dimensional model shown in FIG. 7(C) after rotation by the three-dimensional model rotation unit 123C. This two-dimensional image is pasted into the region corresponding to the excluded face portion F in the data, sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D, of the part of the still image data excluding the face portion F. The still image so obtained is the converted still image, and its data is the converted still image data. The target face contained in the resulting converted still image basically faces front, as shown in FIG. 7(D). The data of the part of the still image data excluding the face portion F that is sent from the three-dimensional model generation unit 123B to the two-dimensional image generation unit 123D may be that data as-is, or it may have undergone some processing. The region of the face portion F in FIG. 7(D) coincides with the face portion F in FIG. 7(B), but the edges of the two-dimensional image generated from the rotated three-dimensional model and pasted into that region may not coincide perfectly with the edges of that region. If the resulting unnaturalness is to be reduced, the processing just mentioned can be applied. That processing may be anything that brings the edges of the two-dimensional image generated from the rotated three-dimensional model into agreement with the edges of the face portion F; it is two-dimensional image processing, such as scaling of the image in one direction, scaling in two directions, rotation, and the like. For example, when the three-dimensional model of the face portion F of a downward-facing target face is rotated to face front, its apparent length in, say, the vertical direction becomes shorter. In response to such an apparent change of length, the three-dimensional model generation unit 123B can apply processing that reduces the vertical length of the still image of the part other than the face portion F of the target face, so that the edges of the facial image generated from the three-dimensional model agree well with the region of the face portion F.
If the real position of the camera 210 deviates horizontally from the front direction of the face, the model must of course also be rotated laterally within the horizontal plane, in the same way as the three-dimensional model was rotated vertically in the example above; that description is omitted. Naturally, the three-dimensional model rotation unit 123C need not perform the vertical rotation and the horizontal rotation as two separate operations; it can of course perform a single rotation that combines both. A sketch of the paste-back step follows.
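A minimal sketch of the paste-back step, assuming the frame and the re-rendered facial region are image arrays and the face portion F is tracked as a boolean mask; the array names are illustrative, not from the embodiment.

```python
import numpy as np

def paste_face(frame, rendered_face, face_mask):
    """Write the 2D image re-rendered from the rotated model back into
    the region of the original frame occupied by the face portion F.
    frame: (H, W, 3) image; rendered_face: (H, W, 3) image aligned to
    frame; face_mask: (H, W) boolean mask of the face portion F."""
    composite = frame.copy()
    composite[face_mask] = rendered_face[face_mask]
    return composite
```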
In this way, each of the still image data extracted by the frame dropping unit 123A is converted into converted still image data.
The converted still image data generated as a result is sequentially output from the two-dimensional image generation unit 123D to the output unit 125. This set of a large number of converted still image data is the converted moving image data. That is, the converted moving image data is output from the image processing unit 123 to the output unit 125.
When the first conversion method is executed, a common or typical rotation angle θ (14 degrees or 9.5 degrees in the above examples) is, as described above, used as the angle by which the three-dimensional model rotation unit 123C rotates the three-dimensional models. This rotation angle could be made selectable from among several rotation angles, but it is basically fixed. The values of a and b in the above example may therefore fail to match the actual relationship between the real position and the virtual position of the camera 210. Given that the virtual position of the camera 210 can be decided freely in relation to the computer program, such a situation arises, in short, when the real position of the camera 210 is not the position anticipated when the computer program was designed.
The first conversion method is therefore particularly effective when the real position of the camera 210 is at the anticipated position, or not far from it. For example, when the computer device 100 is a laptop personal computer, a smartphone, a tablet, or the like, the real position of the camera is fixed with respect to the housing. In such cases, if the virtual position of the camera is decided to be a suitable position such as immediately in front of, or behind, the display of the laptop personal computer, smartphone, or tablet, the real and virtual positions of the camera can be determined uniquely. When the specifications of the devices making up the image processing apparatus are clear from the outset in this way, the distance between the target face and the virtual position of the camera 210, or between the target face and the display 101, can be predicted to some extent from the size of the display 101, so that, taking these factors together, the rotation angle θ can be determined in advance with reasonably reliable accuracy. For example, the computer program for making the computer device 100 function as the image processing apparatus of the present application can include data about the virtual camera positions of a wide variety of laptop personal computers, smartphones, tablets, and so on (or data specifying the above rotation angle, which can be grasped from the relationship between the real and virtual positions), that is, many paired sets of device model and virtual camera position. In that case, the computer program may implement either a function that automatically identifies the model of the computer after the program is installed on the computer device 100, or a function that accepts input made by the user to specify the model of the computer device 100 on which the program is installed. In this way, when the computer program makes the computer device 100 function as the image processing apparatus of the present invention, the rotation angle appropriate to that image processing apparatus can be determined automatically from the relationship between the model and the virtual position, as the lookup sketch below illustrates.
Even when the computer device 100 is of the desktop type described in this embodiment, so that the positional relationship between the display 101 and the camera 210 can be decided with some freedom, the rotation angle can still be determined in advance. In that case, the user can be given an instruction such as "place the camera a certain number of centimeters above the center of the display in its vertical and width directions, and use this image processing apparatus with the target face a certain number of centimeters away from the virtual camera position immediately in front of the center of the display", so that the user sets up the positional relationship between the display 101 and the camera 210 as prescribed; the rotation angle can then be determined in advance in view of the relationship between the virtual position decided as described above and the real position at which the user will thus have placed the camera 210.
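Where the program ships with per-model data as described, the lookup can be as simple as the following sketch; the model identifiers and angles here are hypothetical placeholders, not data for any actual device.

```python
# Hypothetical table pairing device models with predetermined rotation
# angles (pitch, yaw, in degrees), derived at design time from each
# model's fixed real camera position and its chosen virtual position.
PREDETERMINED_ANGLES = {
    "laptop-model-a": (14.0, 0.0),
    "phone-model-b": (9.5, 0.0),
}

def rotation_angle_for(model_id, default=(12.0, 0.0)):
    """Return the predetermined rotation angle for a device model,
    falling back to a generic default for unknown models."""
    return PREDETERMINED_ANGLES.get(model_id, default)
```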
(Second conversion method)
When the image processing unit 123 executes the second conversion method, the image processing unit 123 is configured as shown in FIG.
In this case the image processing unit 123, like the image processing unit 123 that executes the first conversion method, includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the first conversion method, except that the three-dimensional model rotation unit 123C used for the second conversion method does not record data specifying the rotation angle in advance.
On the other hand, the image processing unit 123 that executes the second conversion method includes an angle detection unit 123E. The angle detection unit 123E determines the above-mentioned rotation angle by performing a predetermined calculation based on the moving image data sent from the main control unit 122. Although FIG. 9 shows the moving image data being input directly from the main control unit 122 to the angle detection unit 123E, the angle detection unit 123E may instead determine the rotation angle θ based on the still image data extracted by the frame dropping unit 123A.
If such an angle detection unit 123E is used, it is not necessary to pay attention to the relative positional relationship between the actual position and the virtual position of the camera 210.
For the angle detection unit 123E to obtain the rotation angle automatically from the moving image data, it is conceivable to have the angle detection unit 123E undergo machine learning. If the angle detection unit 123E is made to learn images of faces captured from various angles, together with the angle from which each image was captured, it becomes possible to have the angle detection unit 123E detect from what angle the face appearing in a still image based on the still image data contained in the moving image data was imaged. If that is possible, the angle detection unit 123E can naturally determine the magnitude of the rotation angle θ, including, of course, the direction of rotation.
When the second conversion method is used, it is desirable to inform the user of an instruction such as "keep facing front for, say, a few seconds until the rotation angle is determined", and to have the user follow that instruction. Alternatively, it is conceivable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input.
Data specifying the rotation angle determined by the angle detection unit 123E is sent from the angle detection unit 123E to the three-dimensional model rotation unit 123C. Using the rotation angle specified by that data, the three-dimensional model rotation unit 123C rotates each three-dimensional model by the same angle and in the same direction, as in the first conversion method.
Even when the second conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
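A minimal sketch of how a learned detector such as 123E might be wired in: `estimate_head_pose` stands for the machine-learned regressor described above and is assumed, not an actual library call. While the user holds a frontal pose, the apparent pitch and yaw of the face in the captured frames is exactly the offset introduced by the camera position, so its negation serves as the rotation angle.

```python
def estimate_head_pose(image):
    """Assumed machine-learned regressor: returns the apparent
    (pitch_deg, yaw_deg) of the face in a still image."""
    raise NotImplementedError  # stands in for the learned model

def detect_rotation_angle(calibration_frames):
    """Average the apparent pose over frames captured while the user
    faces front; the negated average is the correction to apply to
    every subsequent three-dimensional model."""
    poses = [estimate_head_pose(frame) for frame in calibration_frames]
    mean_pitch = sum(p for p, _ in poses) / len(poses)
    mean_yaw = sum(y for _, y in poses) / len(poses)
    return (-mean_pitch, -mean_yaw)
```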
(Third conversion method)
When the image processing unit 123 executes the third conversion method, the image processing unit 123 is configured as shown in FIG.
In the third conversion method the rotation angle is not determined in advance; as in the second conversion method, processing to determine the rotation angle is also performed. The image processing unit 123 that executes the third conversion method resembles the image processing unit 123 that executes the second conversion method.
Like the image processing unit 123 that executes the second conversion method, the image processing unit 123 that executes the third conversion method includes a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. On the other hand, in place of the angle detection unit 123E of the image processing unit 123 that executes the second conversion method, it includes a rotation angle determination unit 123F.
The rotation angle determination unit 123F has the function of determining the rotation angle, like the angle detection unit 123E described above. Whereas the angle detection unit 123E determines the rotation angle by performing a predetermined calculation based on the moving image data, the rotation angle determination unit 123F determines the rotation angle by performing a predetermined calculation based not on the moving image data but on other data.
The data the rotation angle determination unit 123F uses to determine the rotation angle is parameter data input from the input device 102, parameter data input from a sensor (not shown), or both. Any of these parameters input from the input device 102 or the sensor may be of any kind, as long as it is useful for determining the rotation angle.
The parameters input from the input device 102 are, for example, information specifying the shape of the display 101 (for example, whether its aspect ratio is 3:4 or 9:16), information specifying the size of the display 101 (for example, how many inches it measures), information specifying where the real position of the camera is (for example, directly above the display 101 at the center of its width direction, or at the upper right corner of the display 101), information specifying the distance from the display 101 to the target face, and the like.
The sensor may measure parameters useful for obtaining the relative positional relationship between the real position and the virtual position of the camera 210, or between the virtual position of the camera 210 and the target face. For example, a known or well-known distance measuring device may be used as the sensor, with the distance from the sensor to the target face measured as a parameter.
Data specifying the rotation angle determined by the rotation angle determination unit 123F is sent from the rotation angle determination unit 123F to the three-dimensional model rotation unit 123C. Using the rotation angle specified by that data, the three-dimensional model rotation unit 123C rotates each three-dimensional model by the same angle and in the same direction, as in the first conversion method.
Even when the third conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
Also when the third conversion method is used, it is preferable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input. A sketch of a parameter-based determination follows.
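Under the geometry of FIG. 8, the parameter-based determination reduces to the same arctangent applied once per axis. A minimal sketch, assuming the user or sensor supplies the camera's offset from the virtual position and the face distance; the parameter names are illustrative.

```python
import math

def rotation_angle_from_parameters(face_distance_cm,
                                   camera_offset_up_cm=0.0,
                                   camera_offset_right_cm=0.0):
    """Compute (pitch_deg, yaw_deg) from the face-to-virtual-position
    distance and the real camera's vertical/horizontal offset from the
    virtual position, per theta = atan(offset / distance)."""
    pitch = math.degrees(math.atan2(camera_offset_up_cm, face_distance_cm))
    yaw = math.degrees(math.atan2(camera_offset_right_cm, face_distance_cm))
    return pitch, yaw

# A camera 10 cm above the virtual position, face 40 cm away:
print(rotation_angle_from_parameters(40, camera_offset_up_cm=10))
```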
(Fourth conversion method)
When the image processing unit 123 executes the fourth conversion method, the image processing unit 123 is configured as shown in FIG.
In the fourth conversion method the rotation angle is not determined in advance; as in the second and third conversion methods, processing to determine the rotation angle is also performed.
The image processing unit 123 that executes the fourth conversion method has the same functional blocks as the image processing unit 123 that executes the first conversion method: a frame dropping unit 123A, a three-dimensional model generation unit 123B, a three-dimensional model rotation unit 123C, and a two-dimensional image generation unit 123D. Their configurations and functions are all the same as in the first conversion method, except that the three-dimensional model rotation unit 123C used for the fourth conversion method does not record data specifying the rotation angle in advance, that rotation angle change data for changing the rotation angle is input from the main control unit 122 to the three-dimensional model rotation unit 123C, and that each time the three-dimensional model rotation unit 123C accepts rotation angle change data it changes, based on the accepted data, the rotation angle by which it rotates the three-dimensional model of the target face.
Even when the fourth conversion method is executed, the converted moving image data generated by the image processing unit 123 is sent to the output unit 125, as in the case where the first conversion method is executed. This data is sent from the output unit 125 to the display 101. Then, on the display 101, a moving image based on the converted moving image data will be displayed, as will be described later. This display is performed in substantially real time after the image is captured by the camera 210, preferably within 0.5 seconds.
While watching his or her own face (the target face) displayed on the display 101, the user inputs rotation angle change data, rotating the target face little by little, for example, so as to adjust the target face displayed on the display 101 until it basically faces front. The rotation angle change data is input using the input device 102; like other data input from the input device 102, it reaches the main control unit 122 and is sent from the main control unit 122 to the three-dimensional model rotation unit 123C. The rotation directions of the three-dimensional model, though not limited to these, need only be the vertical direction (about the X axis) and the lateral direction (about the Y axis), both of which can of course be input using the input device 102. The angle through which the three-dimensional model rotation unit 123C has rotated the three-dimensional model at the moment the target face displayed on the display 101 basically faces front is determined as the rotation angle by which the three-dimensional model rotation unit 123C thereafter rotates the three-dimensional model of the target face uniformly.
Even when the fourth conversion method is executed, the converted moving image data is output from the image processing unit 123 to the output unit 125.
Also when the fourth conversion method is used, it is preferable to allow data for executing a rotation angle determination mode to be input from the input device 102, and to run that mode in advance, for example before the start data is input. A sketch of the adjustment loop follows.
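A minimal sketch of the adjustment loop of the fourth method: each piece of rotation angle change data nudges the current (pitch, yaw), which is applied to every subsequent frame until the user is satisfied; the key names and step size are illustrative assumptions.

```python
STEP_DEG = 0.5  # illustrative increment per piece of change data

def apply_change(angle, change):
    """Accumulate one piece of rotation angle change data into the
    current (pitch_deg, yaw_deg) rotation angle."""
    pitch, yaw = angle
    dpitch, dyaw = {
        "up": (STEP_DEG, 0.0), "down": (-STEP_DEG, 0.0),
        "left": (0.0, -STEP_DEG), "right": (0.0, STEP_DEG),
    }[change]
    return (pitch + dpitch, yaw + dyaw)

# The angle settled on when the displayed face looks frontal becomes
# the fixed rotation angle used for all later frames.
angle = (0.0, 0.0)
for change in ["up", "up", "left"]:
    angle = apply_change(angle, change)
print(angle)  # (1.0, -0.5)
```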
Whichever of the first through fourth conversion methods the image processing unit 123 executes, the output unit 125 receives the converted moving image data from the image processing unit 123 as described above. On receiving the converted moving image data, the output unit 125 sends it via the interface 114 to the transmission/reception mechanism, which sends the converted moving image data to the computer device 100 specified by the above-mentioned designation data, that is, to the computer device 100 included in the second communication system 10-2.
The transmission / reception mechanism in the computer device 100 included in the second communication system 10-2 receives the converted moving image data sent from the first communication system 10-1. The converted moving image data is sent from the transmission / reception mechanism to the input unit 121 via the interface 114, and then sent from the input unit 121 to the main control unit 122.
The main control unit 122 sends this converted moving image data to the display 101 via the output unit 125 and the interface 114. As a result, a moving image based on the converted moving image data sent from the first communication system 10-1 is displayed on the display 101 in the second communication system 10-2.
The face image displayed on the display 101 basically faces the front as shown in FIG.
The word "basically" has been said several times to mean "when the user takes a natural posture". Here, the moving image displayed on the display 101 included in the second communication system 10-2 when the user of the first communication system 10-1 nods will also be described.
FIG. 13(A) shows a state in which the user of the first communication system 10-1 faces downward from the horizontal by an angle α. In this case, a deviation of angle θ plus angle α arises between the camera 210 and the front direction of the target face. If no image processing were performed, therefore, the target face contained in the moving image displayed on the display 101 included in the second communication system 10-2 would be the target face shown in FIG. 13(B) as seen from the right side of the drawing. According to the present invention, however, the target face is displayed on the display 101 after being rotated upward by the angle θ, so the target face contained in the moving image displayed on the display 101 included in the second communication system 10-2 is the target face shown in FIG. 13(C) as seen from the front. That is, the target face of the user of the first communication system 10-1, facing downward from the horizontal by the angle α, is displayed on the display 101 included in the second communication system 10-2. This is a natural state, and gives no sense of strangeness to the user of the second communication system 10-2.
<Modification>
A video conference system according to a modified example will be described.
The video conference system according to the modified example includes a first communication system 10-1 and a second communication system 10-2, like the video conference system of the first embodiment. When viewed as hardware, both the first communication system 10-1 and the second communication system 10-2 in the modified example are the same as those in the first embodiment. Both communication systems 10 include a computer device 100, a display 101, and a camera 210.
However, whereas the computer device 100 in both communication systems 10 in the first embodiment had the function of converting moving image data into converted moving image data, the computer device 100 in both communication systems 10 in the modification does not have that function. That is, the computer device 100 in both communication systems 10 in the modification is not the image processing apparatus of the present invention; apart from exchanging data with the conversion servers described below, it basically has only the same functions as its counterpart in a conventional video conference system.
In the video conference system of the modification, the function of converting moving image data into converted moving image data, which the image processing apparatus of the present invention is to perform, is borne by the conversion server 20-1 and the conversion server 20-2. In other words, the conversion server 20-1 and the conversion server 20-2 in the modification can be said to use cloud computing technology to provide the first communication system 10-1 and the second communication system 10-2 with the function of converting moving image data into converted moving image data.
A modified example will be described with reference to FIG.
As shown in FIG. 14, the video conference system according to the modification includes a first communication system 10-1, a second communication system 10-2, a conversion server 20-1, and a conversion server 20-2. The first communication system 10-1, the second communication system 10-2, the conversion server 20-1, and the conversion server 20-2 are all connectable to the network 400.
As described above, the computer device 100 in the first communication system 10-1 receives moving image data from the camera 210 at its real position. The moving image data is sent from the computer device 100 in the first communication system 10-1 to the conversion server 20-1, which converts the received moving image data into converted moving image data and returns the converted moving image data to the computer device 100 in the first communication system 10-1. As in the first embodiment, the converted moving image data is then sent from the computer device 100 of the first communication system 10-1 to the computer device 100 of the second communication system 10-2. Alternatively, the converted moving image data generated by the conversion server 20-1 may be sent directly to the computer device 100 in the second communication system 10-2, without first being sent to the computer device 100 in the first communication system 10-1.
The hardware configuration of the conversion server 20-1 that enables it to perform the functions described above may be basically the same as the hardware configuration of the computer device 100 in the first embodiment, and the functional blocks generated within it may likewise be the same as those in the computer device 100 of the first embodiment.
In the first embodiment, the computer device 100 accepted moving image data from the camera 210, the data reaching the input unit 121 by way of the camera 210, the interface 114, and the input unit 121 in that order. The conversion server 20-1 in the modification, by contrast, accepts moving image data from the computer device 100 in the first communication system 10-1 via the network 400, the data reaching the input unit 121 by way of its transmission/reception mechanism, the interface 114, and the input unit 121 in that order.
In the first embodiment, the computer device 100 accepted input from the input device 102 via the interface 114. The conversion server 20-1 in the modification, by contrast, accepts the input from the input device 102 by way of the computer device 100 in the first communication system 10-1, via the network 400.
In the first embodiment, the converted moving image data generated by the image processing unit 123 of the computer device 100 was sent to the second communication system 10-2 via the output unit 125, the interface 114, and the transmission/reception mechanism. In the conversion server 20-1 of the modification, by contrast, the converted moving image data generated by the image processing unit 123 is returned to the first communication system 10-1 via the output unit 125, the interface 114, and the transmission/reception mechanism; as noted above, though, the conversion server 20-1 may instead send the converted moving image data to the second communication system 10-2.
The conversion server 20-2 has the same configuration and functions as the conversion server 20-1, and provides the computer device 100 in the second communication system 10-2 with the same functions that the conversion server 20-1 provides to the computer device 100 in the first communication system 10-1. The first communication system 10-1 and the second communication system 10-2 can thereby send converted moving image data to each other, as in the first embodiment.
In addition, one conversion server may provide both communication systems 10 with a function of converting moving image data into converted moving image data.
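The division of labor in the modification can be summarized as a request/response loop. The sketch below is only a schematic of the data flow described above, with `convert` standing for the first embodiment's image-processing pipeline; it is not an implementation of the servers.

```python
def conversion_server(receive, convert, send_back):
    """Schematic loop of a conversion server 20-1/20-2: receive moving
    image data from a communication system over the network, convert
    it to converted moving image data, and return it (or, as noted
    above, forward it directly to the other party's system)."""
    while True:
        moving_image_data, reply_to = receive()
        converted = convert(moving_image_data)
        send_back(converted, reply_to)
```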
«Second embodiment»
An image processing apparatus according to the second embodiment will be described.
The image processing apparatus of the second embodiment has the outward appearance of a webcam; for example, it presents an appearance like those shown in FIG. 2, FIG. 8, FIG. 12, and so on.
The image processing apparatus according to the second embodiment can be used by being connected to a computer device that constitutes a conventional video conference system. Such a computer device has a function of transmitting / receiving moving image data to / from another computer device, and may be publicly known or well known.
The image processing apparatus of the second embodiment is integrated with a camera: hardware equivalent to the hardware configuration of the computer device 100 of the first embodiment is built into the camera, and a computer program like the one described in the first embodiment is installed on that hardware. Accordingly, even though its appearance is that of a webcam, the image processing apparatus of the second embodiment generates within itself the same functional blocks as those shown in FIG. 4. To supplement this, the hardware configuration of the image processing apparatus of the second embodiment is that of FIG. 3 with a camera connected to the interface 114; the image processing apparatus of the present invention, however, is that configuration with the camera excluded.
The image processing apparatus according to the second embodiment has a function of converting moving image data generated by a camera integrated with the image processing apparatus into converted moving image data.
The image processing apparatus according to the second embodiment can be used in the same manner as a normal webcam. However, the data output by this image processing apparatus is not general moving image data, but converted moving image data. Therefore, the computer devices in both communication systems can send the converted moving image data to each other without having the function of converting the moving image data into the converted moving image data as in the first embodiment. Become.
10-1 first communication system
10-2 second communication system
100 computer device
101 display
102 input device
121 input unit
122 main control unit
123 image processing unit
123A frame dropping unit
123B three-dimensional model generation unit
123C three-dimensional model rotation unit
123D two-dimensional image generation unit
20-1 conversion server
20-2 conversion server

Claims (14)

1. An image processing apparatus comprising:
a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position;
a converted moving image data generation unit that generates converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output unit that outputs the converted moving image data generated by the converted moving image data generation unit,
wherein the converted moving image data generation unit comprises:
a three-dimensional model generation unit that generates, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation unit that rotates each of the plurality of three-dimensional models generated by the three-dimensional model generation unit by a rotation angle, which is a fixed angle; and
a two-dimensional image generation unit that generates the converted still image data based on each of the three-dimensional models rotated by the three-dimensional model rotation unit.
2. The image processing apparatus according to claim 1, wherein the rotation angle is determined in advance and recorded in the image processing apparatus.
3. The image processing apparatus according to claim 1, wherein the rotation angle is determined by performing a predetermined calculation based on the moving image data received by the moving image data receiving unit.
4. The image processing apparatus according to claim 1, comprising an input device receiving unit for receiving data about a predetermined parameter, necessary for determining the rotation angle, from an input device for inputting the parameter, wherein the rotation angle is determined by performing a predetermined calculation based on the data about the parameter received by the input device receiving unit.
5. The image processing apparatus according to claim 1, comprising a sensor receiving unit that receives data about a predetermined parameter, necessary for determining the rotation angle, from a sensor that detects the parameter, wherein the rotation angle is determined by performing a predetermined calculation based on the data about the parameter received by the sensor receiving unit.
6. The image processing apparatus according to claim 1, wherein the moving image data output unit is adapted to be connected to a predetermined display that displays a moving image based on the converted moving image data, the image processing apparatus comprises a rotation angle change data receiving unit that receives rotation angle change data, which is data for changing the rotation angle, and the three-dimensional model rotation unit changes the rotation angle by which the three-dimensional models are rotated, based on the received rotation angle change data, each time the rotation angle change data receiving unit receives rotation angle change data.
7. The image processing apparatus according to claim 1, wherein the three-dimensional model generation unit extracts the facial part of the target face appearing in the still image specified by the still image data to generate the three-dimensional model, and also generates background image data, which is data of a two-dimensional still image of the part of the still image other than the facial part of the target face, and the two-dimensional image generation unit generates the converted still image data by pasting facial image data, which is data obtained by flattening into two dimensions the three-dimensional model rotated by the three-dimensional model rotation unit, onto the facial part of the target face in the background image data.
8. The image processing apparatus according to claim 7, wherein the three-dimensional model generation unit performs predetermined two-dimensional image processing on the part of the still image other than the facial part of the target face before generating the background image data for that still image, whereby, when the two-dimensional image generation unit pastes the facial image data onto the facial part of the target face in the background image data, the edge of the facial image data and the edge of the facial part of the target face coincide more closely.
9. The image processing apparatus according to claim 1, wherein the three-dimensional model rotation unit rotates the three-dimensional model about a predetermined point.
10. The image processing apparatus according to claim 1, wherein the image processing apparatus is integrated with the camera.
11. The image processing apparatus according to claim 1, wherein the moving image data receiving unit receives the moving image data from the camera via a predetermined network.
12. The image processing apparatus according to claim 1, wherein the image processing apparatus is capable of communicating via a predetermined network and is used as one of a pair of such apparatuses, and the converted moving image data generated by each of the paired image processing apparatuses is sent bidirectionally over the network to the other.
13. An image processing method executed by a computer comprising a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position, the method comprising:
a converted moving image data generation step of generating converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step,
wherein the converted moving image data generation step includes:
a three-dimensional model generation step of generating, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation step of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and
a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
14. A computer program for causing a computer comprising a moving image data receiving unit that receives moving image data, the moving image data being data of a moving image composed of a large number of consecutive items of still image data, each being data of a two-dimensional still image, obtained by imaging a target face, which is the face of a single imaged person, with a single predetermined camera that is capable of capturing moving images and is located at an actual position, which is a predetermined position, to execute:
a converted moving image data generation step of generating converted moving image data, which is data of a moving image composed of a large number of consecutive items of converted still image data, by converting each of at least a plurality of the items of still image data included in the moving image data into converted still image data, which is data of a converted still image, namely the two-dimensional still image that the camera would capture if it were located at a virtual position, which is a predetermined position on a virtual straight line extending in the frontal direction from the target face when the target face faces forward; and
a moving image data output step of outputting the converted moving image data generated in the converted moving image data generation step,
wherein, in the converted moving image data generation step, the computer program causes the computer to execute:
a three-dimensional model generation step of generating, from each of at least a plurality of the items of still image data included in the moving image data, a three-dimensional model of the facial part of the target face appearing in the still image specified by that item of still image data, using a conversion algorithm that estimates a three-dimensional model of a face and has been obtained by machine learning on a large number of faces;
a three-dimensional model rotation step of rotating each of the plurality of three-dimensional models generated in the three-dimensional model generation step by a rotation angle, which is a fixed angle; and
a two-dimensional image generation step of generating the converted still image data based on each of the three-dimensional models rotated in the three-dimensional model rotation step.
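For orientation only (the claims above, not this code, define the invention), the following sketch shows one way the pipeline of claims 1, 7, and 9 might be realized, with the rotation-angle helper illustrating one plausible "predetermined calculation" for claims 3 to 5. The model_fn regressor stands in for the claimed machine-learned conversion algorithm, of which the volumetric CNN regression of Jackson et al., cited in the non-patent citations below, is one published example; the point-splat renderer, the centroid pivot, the x-axis rotation, and all names are assumptions of this sketch.

# Schematic sketch, not the claimed implementation.
import numpy as np

def rotation_about_point(points, angle_rad, pivot):
    """Rotate Nx3 points about the x-axis through `pivot` (claim 9's
    'predetermined point'); positive angles tilt the face upward."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,  -s],
                  [0.0,   s,   c]])
    return (points - pivot) @ R.T + pivot

def estimate_rotation_angle(camera_offset_m, face_distance_m):
    """One plausible 'predetermined calculation' for claims 3 to 5: the
    angle subtended at the face by the camera's offset from the display."""
    return float(np.arctan2(camera_offset_m, face_distance_m))

def project_orthographic(verts, colors, shape):
    """Crude point-splat orthographic projection (drop z, paint vertex
    colors at pixel positions), standing in for a real renderer."""
    img = np.zeros(shape, dtype=np.uint8)
    xs = np.clip(verts[:, 0].astype(int), 0, shape[1] - 1)
    ys = np.clip(verts[:, 1].astype(int), 0, shape[0] - 1)
    img[ys, xs] = colors
    return img

def convert_still_image(frame, model_fn, angle_rad):
    """One still image -> one converted still image (claim 1's conversion).

    model_fn(frame) is assumed to return (verts, colors, mask): the 3D
    facial model, per-vertex colors, and a boolean face-region mask."""
    verts, colors, mask = model_fn(frame)
    pivot = verts.mean(axis=0)                 # assumed pivot: model centroid
    rotated = rotation_about_point(verts, angle_rad, pivot)
    face_2d = project_orthographic(rotated, colors, frame.shape)
    converted = frame.copy()                   # claim 7's background image data
    converted[mask] = face_2d[mask]            # paste the facial image back in
    return converted

Applied frame by frame to the still images making up the moving image data, convert_still_image would yield the converted moving image data of claim 1; the variants of claims 2 to 6 would only change how angle_rad is obtained.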
PCT/JP2019/004530 2018-10-29 2019-02-08 Image processing device, method, and computer program WO2020090128A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019507886A JP6516316B1 (en) 2018-10-29 2019-02-08 Image processing apparatus, method, computer program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2018/040130 2018-10-29
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program

Publications (1)

Publication Number Publication Date
WO2020090128A1

Family

ID=70463023

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program
PCT/JP2019/004530 WO2020090128A1 (en) 2018-10-29 2019-02-08 Image processing device, method, and computer program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/040130 WO2020089971A1 (en) 2018-10-29 2018-10-29 Image processing apparatus, method, and computer program

Country Status (2)

Country Link
JP (1) JPWO2020089971A1 (en)
WO (2) WO2020089971A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014097465A1 (en) * 2012-12-21 2014-06-26 日立マクセル株式会社 Video processor and video p rocessing method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0690445A (en) * 1992-09-09 1994-03-29 Hitachi Ltd Video input/output device
JPH08237629A (en) * 1994-10-25 1996-09-13 At & T Corp System and method for video conference that provides parallax correction and feeling of presence
JP2005503726A (en) * 2001-09-20 2005-02-03 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Quality adaptation of real-time multimedia content transmission based on user's attention
JP2004326179A (en) * 2003-04-21 2004-11-18 Sharp Corp Image processing device, image processing method, image processing program, and recording medium storing it
JP2015513833A (en) * 2012-02-27 2015-05-14 エー・テー・ハー・チューリッヒEth Zuerich Method and system for image processing for gaze correction in video conferencing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JACKSON, AARON S. ET AL.: "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression", ICCV 2017, 22 March 2017 (2017-03-22), pages 1 - 9, XP080759051, Retrieved from the Internet <URL:https://arxiv.org/abs/1703.07834> *

Also Published As

Publication number Publication date
JPWO2020089971A1 (en) 2021-02-15
WO2020089971A1 (en) 2020-05-07

Similar Documents

Publication Publication Date Title
JP7200439B1 (en) Avatar display device, avatar generation device and program
JP5208810B2 (en) Information processing apparatus, information processing method, information processing program, and network conference system
Kasahara et al. JackIn head: immersive visual telepresence system with omnidirectional wearable camera for remote collaboration
Zhang et al. Viewport: A distributed, immersive teleconferencing system with infrared dot pattern
US9424463B2 (en) System and method for eye alignment in video
JP2014233035A (en) Information processor, display control method and program
JP2020065229A (en) Video communication method, video communication device, and video communication program
JP2009089324A (en) Video conference system and program, and recoding medium
JPWO2017141584A1 (en) Information processing apparatus, information processing system, information processing method, and program
WO2015139562A1 (en) Method for implementing video conference, synthesis device, and system
US20230231983A1 (en) System and method for determining directionality of imagery using head tracking
JP6516316B1 (en) Image processing apparatus, method, computer program
JP2011113206A (en) System and method for video image communication
CN113170075B (en) Information processing device, information processing method, and program
WO2020090128A1 (en) Image processing device, method, and computer program
JP5759439B2 (en) Video communication system and video communication method
WO2016182504A1 (en) A virtual reality headset
EP2355500A1 (en) Method and system for conducting a video conference with a consistent viewing angle
JP7420585B2 (en) AR display control device, its program, and AR display system
EP4145397A1 (en) Communication terminal device, communication method, and software program
US20230247383A1 (en) Information processing apparatus, operating method of information processing apparatus, and non-transitory computer readable medium
JP2022092747A (en) Display image processing system and display image processor
Yem et al. Dual Body: Method of Tele-Cooperative Avatar Robot with Passive Sensation Feedback to Reduce Latency Perception
Brick et al. High-presence, low-bandwidth, apparent 3D video-conferencing with a single camera
Liu et al. HoloChat: 3D avatars on mobile light field displays

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019507886

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19878647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.07.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19878647

Country of ref document: EP

Kind code of ref document: A1