WO2014097465A1 - Video processor and video processing method - Google Patents

Video processor and video processing method

Info

Publication number
WO2014097465A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
processing
face
correction
unit
Prior art date
Application number
PCT/JP2012/083190
Other languages
French (fr)
Japanese (ja)
Inventor
佐々木 昭
恭一 中熊
溝添 博樹
Original Assignee
日立マクセル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日立マクセル株式会社
Priority to PCT/JP2012/083190
Publication of WO2014097465A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/45 - Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 - Control of cameras or camera modules
    • H04N23/61 - Control of cameras or camera modules based on recognised objects
    • H04N23/611 - Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 - Camera processing pipelines; Components thereof

Definitions

  • the technical field relates to a video processing apparatus and a video processing method.
  • Patent Document 1 identifies as a problem that "in a conventional system, a dialogue can be realized between users who actually view the display with their lines of sight matched, but special devices such as a half mirror, a hologram screen, or a projector must be used, so a simple and inexpensive system cannot be configured" (see Patent Document 1 [0011]).
  • as a solution, it describes "capturing a subject with at least two cameras from mutually different angles, separating from each captured image the foreground image region containing the subject and its background image, associating pixel positions between the separated foreground image regions in relation to the subject, generating relative position information that indicates the relative positional relationship of the subject with respect to each camera, obtaining, from the mutually associated pixel positions and their luminance components, the pixel positions and luminance components that constitute a virtual viewpoint image to be newly generated in accordance with the generated relative position information, and transmitting the virtual viewpoint image composed of the obtained pixel positions and luminance components to the outside" (see Patent Document 1 [0015]).
  • the present application includes a plurality of means for solving the above-mentioned problems.
  • as one example, the present application provides a video processing apparatus having a video input unit to which video information captured by a camera is input, a video processing unit that corrects the image of a face included in the video information input to the video input unit, and an output unit that outputs the video information processed by the video processing unit.
  • the video processing apparatus is characterized in that the video processing unit determines, according to a predetermined condition, whether or not the image of the face included in the video information needs to be corrected.
  • correction processing can be performed without reducing the number of video frames processed per second even when there are a large number of subjects in the shooting range.
  • a video conference terminal will be described as an example of a video processing device.
  • FIG. 1 is a system configuration diagram showing an embodiment of a video conference communication system.
  • 1 and 2 are video processing devices
  • 3 is a network
  • 4 is a camera
  • 5 is a monitor
  • 6 is a microphone
  • 7 is a speaker
  • 8 is a remote control.
  • a video processing apparatus 1 is connected to a camera 4, a monitor 5, a microphone 6, and a speaker 7, and can input and output video and audio.
  • the video processing device 2 has the same device and configuration as the video processing device 1.
  • the video processing apparatus 1 can hold a video conference with the video processing apparatus 2 via the network 3.
  • the video processing apparatus can be operated with the remote controller 8.
  • the camera 4 is connected to the video processing device, and outputs the captured video signal to the video processing device.
  • the monitor 5 is connected to the video processing device, and receives and displays the video signal output from the video processing device.
  • the microphone 6 is connected to the video processing device and outputs the collected sound as an audio signal to the video processing device.
  • Speaker 7 is connected to a video processing device, inputs an audio signal output from the video processing device, and outputs audio.
  • the remote controller 8 transmits a remote control signal to the video processing device and transmits an operation instruction from the user to the video processing device.
  • FIG. 2 is a block diagram showing a specific example of the internal configuration of the video processing apparatus in the video conference communication system shown in FIG.
  • 101 is a control unit
  • 102 is a memory
  • 103-1 is a video encoder
  • 103-2 is an audio encoder
  • 104-1 is a video decoder
  • 104-2 is an audio decoder
  • 118 is a multiplexing unit
  • 119 is a separation unit.
  • 105 is a stream processing unit, 106 is a storage unit, 107-1 is a video processing unit for video output to the monitor 5, 107-2 is a video processing unit for video input from the camera 4, 108-1 is an audio processing unit for audio output to the speaker 7, 108-2 is an audio processing unit for audio input from the microphone 6, 109 is a remote control processing unit, 110 is a network connection unit, 112 is a video input terminal, 113 is a video output terminal, 114 is an audio input terminal, 115 is an audio output terminal, 116 is a remote control input terminal, and 117 is a network connection terminal.
  • as described later, the control unit 101 loads a program stored in the storage unit 106 into the memory 102 and executes the loaded program to realize the functions of the various programs. It also controls the programs in accordance with operation information input from the remote control processing unit 109.
  • the encoder 103-1 receives the video signal from the video processing unit 107-2 and the encoder 103-2 receives the audio signal from the audio processing unit 108-2; each compresses and encodes the input signal and outputs the result, as video data and audio data respectively, to the multiplexing unit 118 or the stream processing unit 105 described later.
  • the decoder 104-1 and the decoder 104-2 receive the compression-encoded video data and audio data output from the separation unit 119 or the stream processing unit 105, which will be described later, and decompress them into a video signal and an audio signal, respectively.
  • the multiplexing unit 118 multiplexes the compression-encoded video data and compression-encoded audio data input from the encoders 103, which are defined as Elementary Streams (ES) in the MPEG-2 Systems standard, and outputs packetized video/audio data called a Transport Stream (TS).
  • the separation unit 119 separates the video / audio data output from the stream processing unit 105 into video data and audio data.
  • the stream processing unit 105 generates a network packet that is a packet of a network protocol to be transmitted to another video processing apparatus 2 from the input video and audio data, and outputs a video and audio stream that is a continuous network packet. Also, the video / audio stream received from another video processing apparatus is changed to video / audio data of TS which is a format for processing by the separation unit 119.
  • the video / audio stream is obtained by adding header data such as time information generated by the stream processing unit 105 and video / audio format information to the video / audio data.
  • the storage unit 106 is for storing a program to be executed by the control unit 101.
  • the video processing units 107-1 and 107-2 control the video input terminal 112 and the video output terminal 113; the video signal input from the video input terminal 112 is output to the encoder 103-1, and the video signal input from the decoder 104-1 is output to the video output terminal 113.
  • the video processing unit 107-2 can also process a plurality of video signals from the video input terminal 112 simultaneously.
  • the audio processing units 108-1 and 108-2 control the audio input terminal 114 and the audio output terminal 115; the audio signal input from the audio input terminal 114 is output to the encoder 103-2, and the audio signal input from the decoder 104-2 is output to the audio output terminal 115.
  • the remote control processing unit 109 is for outputting a remote control signal input from the remote control input terminal 116 to the control unit 101 as operation information.
  • the network connection unit 110 transmits and receives, through the network connection terminal 117, the video/audio streams and connection information needed to hold a video conference with other video conference communication apparatuses connected via the network 3.
  • the video input terminal 112 is connected to the camera 4 and outputs the video signal input from the camera 4 to the video processing unit 107-2.
  • the video input terminal 112 can also connect a plurality of cameras and simultaneously output a plurality of video signals to the video processing unit 107-2.
  • the video output terminal 113 is connected to the monitor 5 and outputs the video signal input from the video processing unit 107-1 to the monitor 5.
  • the audio input terminal 114 is connected to the microphone 6 and outputs an audio signal input from the microphone 6 to the audio processing unit 108-2.
  • the audio output terminal 115 is connected to the speaker 7 and outputs the audio signal input from the audio processing unit 108-1 to the speaker 7.
  • FIG. 3 is a diagram showing a specific example of a program read from the storage unit 106 of the video processing apparatus in FIG. 2 and developed in the memory 102, where 301 is a face detection unit, 302 is a virtual viewpoint calculation unit, 303 is an angle calculation unit, 304 is a face area calculation unit, 305 is a correction processing amount calculation unit, 306 is a correction necessity calculation unit, 307 is a correction processing unit, and 309 is a processing load management unit.
  • the face detection unit 301 is a program that controls units such as the video processing unit 107-2 to detect human faces and facial organs in the video signal captured by the camera 4, extract the coordinates of facial feature points, and identify the face of the person who is speaking.
  • the virtual viewpoint calculation unit 302 is a program for calculating the coordinates of the virtual viewpoint used to correct the video signal of the camera 4 so as to generate video that appears to have been shot from a viewpoint different from that of the camera 4.
  • the angle calculation unit 303 is a program that, from the coordinates of the person's face obtained by the face detection unit 301 and the coordinates of the virtual viewpoint obtained by the virtual viewpoint calculation unit 302, calculates the angle formed by the line segment connecting the face and the camera 4 and the line segment connecting the face and the virtual viewpoint.
  • the face area calculation unit 304 is a program for calculating the area ratio of the face in the captured video from the coordinates of the feature points of the person's face obtained by the face detection unit 301.
  • the correction processing amount calculation unit 305 is a program for estimating the amount of computation needed to correct the face obtained by the face detection unit 301 into a face image that appears to have been captured from the virtual viewpoint.
  • the correction necessity calculation unit 306 is a program that calculates, from the information obtained by the face detection unit 301, the angle calculation unit 303, and the face area calculation unit 304, the degree to which a face needs to be corrected in order to obtain the effect of matching the lines of sight.
  • the correction processing unit 307 is a program that, from the information obtained by the face detection unit 301, the virtual viewpoint calculation unit 302, the angle calculation unit 303, and so on, corrects the face of a person in the video signal captured by the camera 4 so that it appears to have been shot from the virtual viewpoint.
  • the processing load management unit 309 is a program for managing the processing load status of the video processing device 1 in real time and calculating the amount of processing available when the correction processing unit 307 and other units perform their processing.
  • FIG. 4 is a diagram of a determination list for determining whether or not the line-of-sight shift is within an allowable range when the video processing device 1 performs the line-of-sight process.
  • the determination list is generated on the memory 102 by the control unit 101.
  • 700 is a determination list
  • 701 is an ID for identifying the face detected by the face detection unit 301
  • 702 is a shift angle that expresses the shift of the line of sight as an angle
  • 703 is a correction necessity that quantifies the need to correct the detected face
  • 704 is an estimated processing amount, an estimate of the amount of computation required to execute the face correction processing.
  • the ID 701 is a number assigned when the face detection unit 301 detects a face.
  • the deviation angle 702 is a value calculated by the angle calculation unit 303.
  • the correction necessity 703 is a value calculated by the correction necessity calculation unit 306.
  • the estimated processing amount 704 is a value calculated by the correction processing amount calculation unit 305.
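  • as a rough illustration (not part of the patent), the determination list of FIG. 4 can be represented as a list of records like the following Python sketch; all names are illustrative, and the selection sketch shown later in this section operates on such entries.

```python
from dataclasses import dataclass

@dataclass
class FaceEntry:
    face_id: int        # ID 701 assigned by the face detection unit 301
    shift_angle: float  # shift angle 702 in degrees (angle calculation unit 303)
    necessity: float    # correction necessity 703 (correction necessity calculation unit 306)
    est_cost: float     # estimated processing amount 704 (correction processing amount calculation unit 305)

# example content of a determination list with three detected faces (values are made up)
determination_list = [
    FaceEntry(face_id=1, shift_angle=15.0, necessity=15.0, est_cost=12.0),
    FaceEntry(face_id=2, shift_angle=11.0, necessity=11.0, est_cost=10.0),
    FaceEntry(face_id=3, shift_angle=5.0,  necessity=0.0,  est_cost=6.0),
]
```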
  • FIG. 5 is a flowchart showing a specific example of the main process of the series of line-of-sight processing. This process is executed by the control unit 101 every time the video processing apparatus 1 inputs a video signal. The subsequent programs handle the input video signal in units of video frames, which are one still image.
  • the face detection unit 301 detects a face included in the video frame (S901).
  • the face detection unit 301 assigns a numerical value as an ID to the detected face and writes the ID in the determination list 700.
  • S902 and S905 indicate processing loop ends.
  • one face that has not yet been processed is selected from the detected faces (S903).
  • next, a subroutine for determining the necessity of correcting the face is executed (S904).
  • the deviation angle 702, the correction necessity 703, and the estimated processing amount 704 of the determination list 700 are calculated by a subroutine process.
  • the estimated processing amounts 704 in the determination list 700 are added together (S906), and it is determined whether the combined estimated processing amount exceeds the limit processing amount that the control unit 101 can currently process in real time (S907).
  • the limit processing amount is a value calculated by the processing load management unit 309 every time the flowchart of FIG. 5 is executed. This value is, for example, a processing amount necessary for performing the line-of-sight processing at a speed of 29.97 video frames per second. When processing exceeding this value is performed, the number of video frames that can be processed per second is reduced, resulting in dropped frames. In a state where the processing load of the control unit 101 is large, the limit processing amount becomes much smaller.
  • the number of video frames to be processed per second, which serves as the reference for calculating the limit processing amount, can be set to other values such as 30 or 60 according to the number of video frames the video processing device 1 displays per second.
  • the processing amount obtained by subtracting a predetermined processing amount from the maximum processing amount of the control unit 101 may be used as the limit processing amount, or the limit processing amount may be set using another criterion.
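  • as a hedged sketch (the patent does not give a concrete formula), the limit processing amount checked in S907 could be computed along these lines, for instance as the processing budget left in one frame interval after the control unit's other load and a safety margin are subtracted:

```python
def limit_processing_amount(per_frame_capacity: float,
                            other_load: float,
                            margin: float = 0.0) -> float:
    """Illustrative only: per_frame_capacity is the processing amount the
    control unit 101 can spend in one frame interval (e.g. 1/29.97 s at
    29.97 frames per second), other_load is its current load as tracked by
    the processing load management unit 309, and margin is an optional
    predetermined amount to subtract, as mentioned in the text above.
    Units are the same abstract "processing amount" units as in the
    determination list."""
    return max(0.0, per_frame_capacity - other_load - margin)
```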
  • if the limit is exceeded (Y in S907), the faces to be corrected are selected using the determination list 700 (S908); if it is not exceeded (N in S907), correction processing is performed on all faces whose correction necessity 703 is other than 0 in the determination list 700 (S909).
  • in S908, the estimated processing amounts are added together in descending order of correction necessity in the determination list 700, and faces are selected up to the point where the total does not exceed the limit processing amount calculated by the processing load management unit 309.
  • in the determination list 700 of FIG. 4, for example, when the limit processing amount is 25, the faces whose IDs are 1 and 2 are selected so that the total of their estimated processing amounts 704 does not exceed 25.
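  • a minimal sketch of the selection in S908, using the FaceEntry records from the earlier sketch (illustrative, not the patent's implementation):

```python
def select_faces(entries, limit):
    """Take faces in descending order of correction necessity 703 and
    accumulate their estimated processing amounts 704, stopping before the
    total would exceed the limit processing amount from the processing load
    management unit 309. Faces with necessity 0 are never corrected."""
    selected, total = [], 0.0
    for e in sorted(entries, key=lambda e: e.necessity, reverse=True):
        if e.necessity <= 0:
            break
        if total + e.est_cost > limit:
            break  # adding this face would exceed the limit
        selected.append(e)
        total += e.est_cost
    return selected

# with the example list above and a limit of 25, the faces with IDs 1 and 2
# (estimated amounts 12 + 10 = 22) are selected, matching the example in the text
print([e.face_id for e in select_faces(determination_list, 25)])  # [1, 2]
```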
  • the selected face is corrected (S909).
  • the face correction process is performed by processing the face image so that the line of sight matches.
  • here, the cameras 4 are respectively installed at the upper and lower portions of the monitor 5, and a method of correcting a face so that it appears to have been shot from an arbitrary position between the two cameras, using face images taken from two different angles, will be described.
  • in step S901, face detection is performed for each of the video frames input from the two cameras, and feature points of facial organs such as the eyes, nose, and mouth are extracted. Using the extracted feature points, the faces in the two video frames are matched. The two matched faces are extracted from the video frames and synthesized by morphing. At this time, the morphing ratio is adjusted so that the face appears at an angle at which the lines of sight meet.
  • the morphing target need not be the angle at which the lines of sight meet; it may instead be the maximum angle within the allowable range of line-of-sight deviation.
  • depending on the correction, the sense of incongruity caused by the correction may be greater than the sense of incongruity caused by the lines of sight not meeting. When the amount of facial deformation can be reduced by targeting the maximum angle within the allowable range instead of the angle at which the lines of sight meet, the morphing ratio, that is, the correction amount of the face, may be determined using that maximum allowable angle.
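  • the patent does not specify how the deformation is evaluated; purely as an assumed illustration, the choice between the two target angles could look like this, where deformation() is a hypothetical caller-supplied estimate:

```python
def choose_correction_target(shift_angle: float, allowable_max: float,
                             deformation) -> float:
    """Return the residual gaze deviation to aim for: either 0 (the lines of
    sight meet exactly) or the largest deviation still inside the allowable
    range, whichever the (hypothetical) deformation estimate says deforms the
    face less."""
    full = 0.0                                 # gaze meets exactly
    partial = min(shift_angle, allowable_max)  # stay just inside the allowable range
    return partial if deformation(partial) < deformation(full) else full
```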
  • as another correction method, a TOF (Time of Flight) camera can be used to generate a 3D model of the subject's face, and the 3D model can be rotated so that the face is at an angle at which the lines of sight meet.
  • the video frame on which the control unit 101 has performed line-of-sight processing is output to the video encoder 103-1, and is transmitted to the video processing device 2 as a video / audio stream by the multiplexing unit 118 and the stream processing unit 105.
  • FIG. 6 is a diagram showing a specific example of the positional relationship between the camera, the monitor, and the video conference participants.
  • 4-1 and 4-2 are cameras installed above and below the monitor 5
  • 401, 403, and 406 are participants of the video conference
  • 407 is a conference table
  • 501 is a virtual viewpoint at which the line-of-sight shift is eliminated
  • 502 is a line segment connecting the virtual viewpoint 501 and the center of the lens of the camera 4-1
  • 503 is a line segment connecting the center of the lens of the camera 4-1 and the eyes of the participant 401
  • 504 is a line segment connecting the center of the lens of the camera 4-2 and the eyes of the participant 401
  • 505 is the angle formed by the line segment 502 and the line segment 503
  • 506 is the angle formed by the line segment connecting the center of the lens of the camera 4-1 and the center of the lens of the camera 4-2 and the line segment 504
  • 507 is a line segment connecting the virtual viewpoint 501 and the eyes of the participant 401
  • 508 is the angle formed by the line segment 503 and the line segment 507, which is the angle indicating the line-of-sight shift.
  • whether the line-of-sight shift is within the allowable range can be determined by calculating the angle 508. There are several methods for calculating this angle, and any of them can be used to obtain the effect of this embodiment, so not all of them are described; as an example, a method using two cameras will be described here. The subsequent processing for obtaining the angle 508 is performed by the angle calculation unit 303.
  • the coordinate system of FIG. 6 is a two-dimensional plane perpendicular to the ground and the screen of the monitor 5, and the origin is the center on the screen of the monitor 5.
  • the length of the line segment 502 can be obtained if the distance between the virtual viewpoint 501 and the center of the camera 4-1 lens is known.
  • the position of the virtual viewpoint 501 can be rephrased as the position that the participant should see in order to match the line of sight.
  • the position to be viewed may be, for example, a position where the eyes of a participant on the other side of the video conference are displayed. When one of the participants on the other side is speaking, the position where that participant's eyes are displayed may be used as the position to be viewed.
  • the position to be viewed may not be one place.
  • for simplicity, the position to be viewed here may be taken as the center of the screen of the monitor 5, but the display position of the participant's eyes may also be calculated by the virtual viewpoint calculation unit 302 and used as the position to be viewed.
  • a value measured in advance may be used.
  • the value is stored in the storage unit 106.
  • if the position of the camera 4-1 is variable, the value may be changed according to the position.
  • the length of the line segment 502 is determined from the distance between the virtual viewpoint 501 and the center of the lens of the camera 4-1.
  • the length of the line segment 503 can be obtained from the distance between the center of the lens of the camera 4-1 and the center of the lens of the camera 4-2, and the angle 505 and the angle 506 in the manner of triangulation. If the distance between the lens centers of the camera 4-1 and the camera 4-2 does not change during use of the video processing device, a value measured in advance may be stored in the storage unit of the video processing device. When the positions of the cameras 4-1 and 4-2 are variable, the values may be changed according to the positions.
  • the angle 505 can be obtained from the depression angle and angle of view of the camera 4-1, and the position of the participant's eyes on the video frame.
  • the coordinate system of the eye position is a two-dimensional plane on the video frame.
  • the depression angle of the camera is 0 degrees when the camera faces horizontally and -90 degrees when the camera faces directly downward.
  • the values measured in advance may be stored in the storage unit of the video processing device 1.
  • the position of the participant's eyes on the video frame can be obtained by the face detection unit 301.
  • for example, the value of the angle 505 is as follows: if the participant's eyes are at the center of the video frame, the angle 505 is 60 degrees, and if the participant's eyes are at the top of the video frame, the angle 505 is 83 degrees.
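  • the text does not state the depression angle or angle of view behind these example values; under a simple pinhole reading (an assumption, not part of the patent), a depression angle of -30 degrees and a vertical angle of view of 46 degrees reproduce them:

```python
def angle_505(depression_deg: float, fov_v_deg: float, eye_v: float) -> float:
    """Assumed model: the ray toward the participant's eyes has an elevation of
    depression + eye_v * fov_v/2 relative to horizontal, where eye_v is the
    eyes' vertical position in the frame (-1 bottom edge, 0 center, +1 top
    edge); the line segment 502 points straight down toward the virtual
    viewpoint, so the angle 505 between them is 90 degrees plus that elevation."""
    return 90.0 + depression_deg + eye_v * (fov_v_deg / 2.0)

assert angle_505(-30.0, 46.0, 0.0) == 60.0  # eyes at the center of the frame
assert angle_505(-30.0, 46.0, 1.0) == 83.0  # eyes at the top of the frame
```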
  • the angle 506 can be obtained from the elevation angle and angle of view of the camera 4-2 and the position of the participant's face in the same way as the angle 505, so its description is omitted. The length of the line segment 503 is thus obtained.
  • the angle 508 is obtained from the triangle formed by the line segment 502, the line segment 503, and the angle 505 obtained so far.
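  • a minimal numerical sketch of the two-camera method just described, assuming that the camera 4-1, the virtual viewpoint 501, and the camera 4-2 lie on one vertical line in the plane of the monitor and that the angle 508 is acute (assumptions; the patent gives no formulas):

```python
import math

def shift_angle_508(seg_502: float, baseline: float,
                    angle_505_deg: float, angle_506_deg: float) -> float:
    """seg_502: distance from the virtual viewpoint 501 to the lens center of
    the camera 4-1; baseline: distance between the lens centers of the cameras
    4-1 and 4-2; angle_505_deg / angle_506_deg: the angles 505 and 506 in
    degrees. Returns the line-of-sight shift angle 508 in degrees."""
    a505 = math.radians(angle_505_deg)
    a506 = math.radians(angle_506_deg)
    # triangle camera4-1 / camera4-2 / eye: angles 505 and 506 sit at the two
    # cameras, so the angle at the eye is 180 - 505 - 506; law of sines gives 503
    seg_503 = baseline * math.sin(a506) / math.sin(math.pi - a505 - a506)
    # triangle 501 / camera4-1 / eye: sides 502 and 503 with included angle 505;
    # law of cosines gives the side 507 (eye to virtual viewpoint)
    seg_507 = math.sqrt(seg_502**2 + seg_503**2
                        - 2.0 * seg_502 * seg_503 * math.cos(a505))
    # law of sines in the same triangle gives the angle 508 at the eye
    return math.degrees(math.asin(seg_502 * math.sin(a505) / seg_507))

# example: virtual viewpoint 30 cm below camera 4-1, 60 cm between the cameras
print(round(shift_angle_508(0.3, 0.6, 60.0, 80.0), 1))  # about 18.7 degrees
```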
  • Another method for obtaining the angle 508 is to use a TOF camera for distance measurement in addition to the camera for shooting.
  • the installation position and the depression angle of the TOF camera are set to be as close as possible to those of the camera 4-1.
  • the method for obtaining the line segment 502 and the method for obtaining the angle 505 are the same as those already described.
  • the line segment 503 is obtained by measuring the distance to the participant 401 with a TOF camera. Since the distance measurement method using the TOF camera is a well-known technique, the description thereof is omitted.
  • the angle 508 is obtained from the triangle formed by the line segment 502, the line segment 503, and the angle 505 obtained so far.
  • FIG. 7 is a flowchart of the subroutine processing executed in S904 of FIG. In the subroutine, processing for calculating the degree of necessity for face correction is executed.
  • a face shift angle is first calculated (S1001).
  • the deviation angle is obtained by the angle calculation unit 303 calculating the angle 508 of FIG. 6.
  • next, it is determined whether the calculated deviation angle is greater than or equal to a designated angle (S1002).
  • the designated angle is stored in the storage unit 106 and is information set in advance in the video processing apparatus 1 for the determination process in S1002.
  • the value of the designated angle is, for example, the maximum angle in the allowable range of gaze deviation already described.
  • in this example, the designated angle is 9 degrees.
  • a correction necessity degree is calculated by quantifying whether or not to correct (S1003).
  • the correction necessity is a value calculated by the correction necessity calculation unit 306, and the calculated value is written in the correction necessity 703 of the determination list 700; a larger value indicates that the face should be corrected.
  • the correction necessity is calculated as 0 when the deviation angle is less than the threshold, and as the value of the deviation angle when the deviation angle is greater than or equal to the threshold.
  • an estimated processing amount, which is an estimate of the processing amount needed to perform the face correction processing, is calculated (S1004).
  • the estimated processing amount is a value calculated by the correction processing amount calculation unit 305, and the calculated value is written in the estimated processing amount 704 of the determination list 700.
  • the estimated processing amount is calculated by multiplying the face area calculated by the face area calculation unit 304 by a constant.
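  • the two values written by this subroutine can be sketched as follows (the designated angle of 9 degrees comes from the text above; the cost constant is illustrative, since the patent only says the face area is multiplied by a constant):

```python
DESIGNATED_ANGLE = 9.0  # degrees; designated angle stored in the storage unit 106
COST_PER_AREA = 1.0     # illustrative constant for the estimated processing amount

def necessity_and_cost(shift_angle: float, face_area: float):
    """S1002/S1003: the correction necessity is 0 when the deviation angle is
    below the designated angle, otherwise the deviation angle itself.
    S1004: the estimated processing amount is the face area (from the face
    area calculation unit 304) multiplied by a constant."""
    necessity = 0.0 if shift_angle < DESIGNATED_ANGLE else shift_angle
    est_cost = face_area * COST_PER_AREA
    return necessity, est_cost
```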
  • FIG. 8 is a diagram illustrating a specific example of a video frame shot by the camera 4 connected to the video processing device.
  • 400 is a video frame
  • 401 to 406 are participants of a video conference
  • 407 is a conference table. Of the participants, only the participant 402 is speaking.
  • the camera 4 is installed at a position to capture the entire meeting scene, and the larger the participant in the front of the video, the smaller the participant in the back.
  • faces that do not require correction are not selected for correction, and faces are selected and corrected in descending order of correction necessity; this makes it possible to perform line-of-sight processing while reducing the processing load and avoiding dropped frames.
  • in the first embodiment, the necessity of correction is determined based on the shift angle; in the second embodiment, the necessity of correction is determined based on the face area without using the shift angle.
  • the problem of the lines of sight not matching is also related to the apparent size of the participant's face shown on the monitor 5. If the face of a participant photographed by the camera 4 connected to the video processing device appears small when displayed on the monitor 5 connected to the video processing device at another site, it becomes difficult to tell whether the lines of sight meet, so the problem of the lines of sight not matching does not arise.
  • FIG. 13 is a determination list in the second embodiment. 1300 is a determination list and 1301 is an area ratio; the other items are the same as those in FIG. 4.
  • the area ratio 1301 is a ratio of the face area to the entire screen, and is calculated by the face area calculation unit 304.
  • the correction necessity degree 703 of the determination list 1300 is calculated based on the area ratio 1301 by the correction necessity degree calculation unit 306.
  • if the area ratio is less than a threshold area ratio, the correction necessity is set to 0; if it is greater than or equal to the threshold, the value of the area ratio is used. Note that, instead of the face area ratio, whether to set the correction necessity to 0 may be determined according to the face area itself.
  • FIG. 14 is a flowchart of the subroutine processing in the second embodiment, which is executed from S904 in FIG. 5. In the subroutine, the degree of necessity for face correction is calculated.
  • the face area ratio is first calculated (S1401). Next, it is determined whether or not the calculated area ratio is greater than or equal to the specified threshold area ratio (S1402).
  • the area ratio of the threshold is stored in the storage unit 106 and is information set in advance in the video processing device 1 for the determination process in S1402.
  • the threshold value is 1.0%.
  • the degree of necessity for correction is calculated by quantifying whether to correct (S1403).
  • the correction necessity is calculated using the area ratio, and the calculated value is written in the determination list 1300.
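  • a minimal sketch of this second-embodiment subroutine (the 1.0% threshold comes from the text above; everything else is illustrative):

```python
THRESHOLD_AREA_RATIO = 1.0  # percent of the screen; threshold stored in the storage unit 106

def necessity_by_area(area_ratio: float) -> float:
    """S1401 computes the face's area ratio (passed in here), S1402 compares it
    with the threshold, and S1403 uses the area ratio itself as the correction
    necessity when the face is large enough; otherwise the necessity is 0."""
    return 0.0 if area_ratio < THRESHOLD_AREA_RATIO else area_ratio
```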
  • faces with a low necessity for correction are not corrected, and faces are selected and corrected in descending order of correction necessity, which makes it possible to perform line-of-sight processing that suppresses the processing load and does not cause dropped frames.
  • as another method of determining whether face correction is necessary, a method of correcting only the face of the speaker detected by the face detection unit can be used. In that case only the speaker's face needs to be corrected, so the load of the line-of-sight processing can be reduced further.
  • in the first embodiment, the correction necessity is calculated using the deviation angle alone, so when several faces have deviation angles that are close or identical, their correction necessities are also close or identical. Therefore, in the third embodiment, the area ratio and other factors are used in addition to the deviation angle to calculate the correction necessity. As a result, even when there are a plurality of faces with the same shift angle, their correction necessities differ, and the faces that need correction more can be selected with higher accuracy.
  • FIG. 9 is a determination list in the case where the correction necessity is calculated by combining a plurality of conditions.
  • Reference numeral 800 denotes a determination list
  • reference numeral 801 denotes a speaker flag indicating whether the person is the speaker
  • reference numeral 802 denotes an area ratio.
  • the other items are the same as those in FIG. 4. In FIG. 4 the correction necessity is calculated only from the deviation angle, but here it is calculated from the speaker flag 801, the area ratio 802, and the deviation angle 702.
  • FIG. 10 is obtained by adding some processes to the flowchart of FIG. 7.
  • the processing to be added is S1101 and S1102.
  • the area ratio is calculated (S1101), and the calculated value is written in the area ratio 802 of the determination list 800. It is determined whether the calculated area ratio is greater than or equal to the specified area ratio (S1102). If the determination is true (Y in S1102), a shift angle detection process (S1001) is performed. If the determination in S1102 is false (N in S1102), the correction necessity level is set to 0 (S1005).
  • the other processes in FIG. 10 are the same as those described in FIG. 7.
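  • a hedged sketch of the combined determination of FIG. 10; the order of the checks follows the flowchart description above, but the way the three factors are finally combined is an assumption, since the patent does not give a concrete formula:

```python
def necessity_combined(is_speaker: bool, area_ratio: float, shift_angle: float,
                       min_area_ratio: float = 1.0,
                       designated_angle: float = 9.0) -> float:
    """S1101/S1102: faces whose area ratio is below the threshold get
    necessity 0 (S1005) and their deviation angle is never evaluated.
    S1001/S1002: otherwise the deviation angle is computed and compared with
    the designated angle. The speaker flag 801, area ratio 802 and deviation
    angle 702 are then combined; the multiplicative weighting is illustrative."""
    if area_ratio < min_area_ratio:
        return 0.0
    if shift_angle < designated_angle:
        return 0.0
    speaker_weight = 2.0 if is_speaker else 1.0  # assumed weighting for the speaker flag
    return speaker_weight * area_ratio * shift_angle
```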
  • here, the correction necessity is calculated based on three factors: whether the face to be corrected belongs to the speaker, the area ratio of the face to be corrected, and the angle of the gaze shift of the correction target. It may also be calculated based on four or more factors, or using factors other than whether the face belongs to the speaker, the area ratio of the face, and the angle of the gaze shift.
  • the face that needs to be corrected can be selected with higher accuracy.
  • in the above description, the processes of S906 to S908 are then performed and the selected faces are corrected.
  • alternatively, the processes of S906 to S908 may be omitted and the correction processing may simply be performed on every face whose correction necessity is other than 0.
  • the video processing apparatus has been described as a video conference apparatus.
  • the video processing apparatus may have a system configuration in which a video conference is performed in combination with another video conference apparatus.
  • FIG. 11 shows a configuration in which a video processing device and another video conference device are combined.
  • Reference numerals 1111 and 1112 denote video processing apparatuses
  • 1201 and 1202 denote video conference apparatuses.
  • a video processing device 1111 is connected to the camera 4 and the video conference device 1201 and can input and output video.
  • the video processing apparatus 1112 has the same apparatus and configuration as the video processing apparatus 1111.
  • the video processing device 1111 inputs a video signal captured by the camera 4.
  • the video processing device 1111 performs line-of-sight processing on the input video signal and outputs the processed video signal to the video conference device 1201. Since the video processing device 1111 only needs to perform line-of-sight processing, a connection function to the network 3 related to a video conference and a function of encoding and decoding a video signal are not necessary.
  • the video processing device can be connected to the input terminal of the video conference device to which a camera is normally connected; therefore, the effect of line-of-sight alignment can be obtained even with a video conference device that has no line-of-sight alignment function of its own.
  • FIG. 12 shows a configuration in which the video processing apparatus is connected to another video conference apparatus via the network 3.
  • 1200 is a video processing apparatus.
  • the video processing apparatus 1200 in this embodiment receives a video/audio stream from the video conference apparatus 1201 via the network 3, separates it into video data and audio data, decodes the video data into a video signal, and performs the line-of-sight processing on that video signal. Next, the video signal that has undergone the line-of-sight processing is encoded into video data, the video data and the audio data are multiplexed and packetized into a video/audio stream, and the stream is output to the video conference apparatus 1202.
  • similarly, the video processing apparatus receives the video/audio stream from the video conference apparatus 1202 via the network 3, performs the same processing including the line-of-sight processing, and outputs the resulting video/audio stream to the video conference apparatus 1201.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

As the processing amount of video shot by a camera increases, distortion of the video results, making it inconvenient for a user to handle the video. The present invention provides a video processor having a video input unit to which video information shot by a camera is inputted, a video processing unit for correcting video of a human face included in the video information inputted to the video input unit, and an output unit for outputting video information processed by the video processing unit, wherein the video processor is characterized in that the video processing unit determines according to a prescribed condition whether the video of the human face included in the video information needs to be corrected.

Description

Video processing apparatus and video processing method

The technical field relates to a video processing apparatus and a video processing method.

Patent Document 1 identifies as a problem that "in a conventional system, a dialogue can be realized between users who actually view the display with their lines of sight matched, but special devices such as a half mirror, a hologram screen, or a projector must be used, so a simple and inexpensive system cannot be configured" (see Patent Document 1 [0011]). As a solution it describes "capturing a subject with at least two cameras from mutually different angles, separating from each captured image the foreground image region containing the subject and its background image, associating pixel positions between the separated foreground image regions in relation to the subject, generating relative position information that indicates the relative positional relationship of the subject with respect to each camera, obtaining, from the mutually associated pixel positions and their luminance components, the pixel positions and luminance components that constitute a virtual viewpoint image to be newly generated in accordance with the generated relative position information, and transmitting the virtual viewpoint image composed of the obtained pixel positions and luminance components to the outside" (see Patent Document 1 [0015]).
Japanese Patent Laid-Open No. 2005-065051

However, if the amount of processing of the video shot by the camera increases, the processing of the captured images cannot keep up, the video is disturbed, and usability suffers for the user.

In order to solve the above problems, for example, the configuration described in the claims is adopted.

The present application includes a plurality of means for solving the above problems. As one example, in a video processing apparatus having a video input unit to which video information captured by a camera is input, a video processing unit that corrects the image of a face included in the video information input to the video input unit, and an output unit that outputs the video information processed by the video processing unit, the video processing unit determines, according to a predetermined condition, whether or not the image of the face included in the video information needs to be corrected.

According to the above means, disturbance of the video can be suppressed.
FIG. 1 is a system configuration diagram showing an embodiment of a video conference communication system.
FIG. 2 is a block diagram showing a specific example of the video processing apparatus in FIG. 1.
FIG. 3 is a diagram showing a specific example of a program read from the storage unit in FIG. 2 and developed in the memory.
FIG. 4 is a diagram showing a specific example of a determination list generated when determining whether face correction processing is necessary.
FIG. 5 is a flowchart showing a specific example of the main processing of the series of line-of-sight alignment processes.
FIG. 6 is a diagram showing a specific example of the positional relationship between two cameras, a monitor, and video conference participants.
FIG. 7 is a flowchart showing a specific example of the line-of-sight alignment subroutine processing.
FIG. 8 is a diagram showing a specific example of video shot in the video conference communication system.
FIG. 9 is a diagram showing a specific example of a determination list generated when determining the necessity of face correction processing under a plurality of conditions.
FIG. 10 is a flowchart showing a specific example of the line-of-sight alignment subroutine processing that performs determination under a plurality of conditions.
FIG. 11 is a diagram showing a configuration example in which the video processing apparatus is combined with another video conference apparatus.
FIG. 12 is a diagram showing a configuration example in which the video processing apparatus is connected to another video conference apparatus via a network.
FIG. 13 is a diagram showing a specific example of a determination list generated when determining the necessity of face correction processing using the face area ratio.
FIG. 14 is a flowchart showing a specific example of the line-of-sight alignment subroutine processing using the face area ratio.
First, an outline of the embodiments will be described. In a videophone or video conference, even if a user is looking at the eyes of the other party displayed on the monitor, the user's line of sight and the installation position of the camera are apart, so the problem arises that the users' lines of sight do not meet each other.

Video conference systems that perform line-of-sight alignment by correcting each face shot by the camera have been studied, but when a plurality of people are within the shooting range, the amount of processing increases with the number of people. When there are many people, the processing time grows accordingly, so if the whole series of processing is performed for every face, a system that would normally process, for example, 29.97 frames per second processes fewer frames than that. The video then loses its smoothness, resulting in so-called dropped frames.

In contrast, according to the embodiments described below, correction processing can be performed without reducing the number of video frames processed per second even when there are many subjects in the shooting range. The embodiments are described below with reference to the drawings.

In the following embodiments, a video conference terminal is described as an example of the video processing apparatus.
FIG. 1 is a system configuration diagram showing an embodiment of a video conference communication system. In FIG. 1, 1 and 2 are video processing apparatuses, 3 is a network, 4 is a camera, 5 is a monitor, 6 is a microphone, 7 is a speaker, and 8 is a remote controller. The video processing apparatus 1 is connected to the camera 4, the monitor 5, the microphone 6, and the speaker 7, and can input and output video and audio. The video processing apparatus 2 has the same configuration as the video processing apparatus 1.

The video processing apparatus 1 can hold a video conference with the video processing apparatus 2 via the network 3. The video processing apparatus can be operated with the remote controller 8.

The camera 4 is connected to the video processing apparatus and outputs the captured video signal to the video processing apparatus.

The monitor 5 is connected to the video processing apparatus, receives the video signal output from the video processing apparatus, and displays it.

The microphone 6 is connected to the video processing apparatus and outputs the collected sound to the video processing apparatus as an audio signal.

The speaker 7 is connected to the video processing apparatus, receives the audio signal output from the video processing apparatus, and outputs the audio.

The remote controller 8 transmits remote control signals to the video processing apparatus and conveys the user's operation instructions to the video processing apparatus.
FIG. 2 is a block diagram showing a specific example of the internal configuration of the video processing apparatus in the video conference communication system shown in FIG. 1. In FIG. 2, 101 is a control unit, 102 is a memory, 103-1 is a video encoder, 103-2 is an audio encoder, 104-1 is a video decoder, 104-2 is an audio decoder, 118 is a multiplexing unit, 119 is a separation unit, 105 is a stream processing unit, 106 is a storage unit, 107-1 is a video processing unit for video output to the monitor 5, 107-2 is a video processing unit for video input from the camera 4, 108-1 is an audio processing unit for audio output to the speaker 7, 108-2 is an audio processing unit for audio input from the microphone 6, 109 is a remote control processing unit, 110 is a network connection unit, 112 is a video input terminal, 113 is a video output terminal, 114 is an audio input terminal, 115 is an audio output terminal, 116 is a remote control input terminal, and 117 is a network connection terminal.

As described later, the control unit 101 loads a program stored in the storage unit 106 into the memory 102 and executes the loaded program to realize the functions of the various programs. It also controls the programs in accordance with operation information input from the remote control processing unit 109.

The encoder 103-1 receives the video signal from the video processing unit 107-2 and the encoder 103-2 receives the audio signal from the audio processing unit 108-2; each compresses and encodes the input signal and outputs the result, as video data and audio data respectively, to the multiplexing unit 118 or the stream processing unit 105 described later.

The decoder 104-1 and the decoder 104-2 receive the compression-encoded video data and audio data output from the separation unit 119 or the stream processing unit 105, which will be described later, and decompress them into a video signal and an audio signal, respectively.

The multiplexing unit 118 multiplexes the compression-encoded video data and compression-encoded audio data input from the encoders 103, which are defined as Elementary Streams (ES) in the MPEG-2 Systems standard, and outputs packetized video/audio data called a Transport Stream (TS).

The separation unit 119 separates the video/audio data output from the stream processing unit 105 into video data and audio data.

The stream processing unit 105 generates, from the input video/audio data, network packets of the network protocol used to transmit to the other video processing apparatus 2, and outputs a video/audio stream consisting of consecutive network packets. It also converts video/audio streams received from other video processing apparatuses into TS video/audio data, the format processed by the separation unit 119.

The video/audio stream is obtained by adding header data, such as time information generated by the stream processing unit 105 and video/audio format information, to the video/audio data.

The storage unit 106 stores the programs to be executed by the control unit 101.
The video processing units 107-1 and 107-2 control the video input terminal 112 and the video output terminal 113; the video signal input from the video input terminal 112 is output to the encoder 103-1, and the video signal input from the decoder 104-1 is output to the video output terminal 113. The video processing unit 107-2 can also process a plurality of video signals from the video input terminal 112 simultaneously.

The audio processing units 108-1 and 108-2 control the audio input terminal 114 and the audio output terminal 115; the audio signal input from the audio input terminal 114 is output to the encoder 103-2, and the audio signal input from the decoder 104-2 is output to the audio output terminal 115.

The remote control processing unit 109 outputs remote control signals input from the remote control input terminal 116 to the control unit 101 as operation information.

The network connection unit 110 transmits and receives, through the network connection terminal 117, the video/audio streams and connection information needed to hold a video conference with other video conference communication apparatuses connected via the network 3.

The video input terminal 112 is connected to the camera 4 and outputs the video signal input from the camera 4 to the video processing unit 107-2. The video input terminal 112 can also connect a plurality of cameras and simultaneously output a plurality of video signals to the video processing unit 107-2.

The video output terminal 113 is connected to the monitor 5 and outputs the video signal input from the video processing unit 107-1 to the monitor 5.

The audio input terminal 114 is connected to the microphone 6 and outputs the audio signal input from the microphone 6 to the audio processing unit 108-2.

The audio output terminal 115 is connected to the speaker 7 and outputs the audio signal input from the audio processing unit 108-1 to the speaker 7.
FIG. 3 is a diagram showing a specific example of a program read from the storage unit 106 of the video processing apparatus in FIG. 2 and developed in the memory 102, where 301 is a face detection unit, 302 is a virtual viewpoint calculation unit, 303 is an angle calculation unit, 304 is a face area calculation unit, 305 is a correction processing amount calculation unit, 306 is a correction necessity calculation unit, 307 is a correction processing unit, and 309 is a processing load management unit.

The face detection unit 301 is a program that controls units such as the video processing unit 107-2 to detect human faces and facial organs in the video signal captured by the camera 4, extract the coordinates of facial feature points, and identify the face of the person who is speaking.

The virtual viewpoint calculation unit 302 is a program for calculating the coordinates of the virtual viewpoint used to correct the video signal of the camera 4 so as to generate video that appears to have been shot from a viewpoint different from that of the camera 4.

The angle calculation unit 303 is a program that, from the coordinates of the person's face obtained by the face detection unit 301 and the coordinates of the virtual viewpoint obtained by the virtual viewpoint calculation unit 302, calculates the angle formed by the line segment connecting the face and the camera 4 and the line segment connecting the face and the virtual viewpoint.

The face area calculation unit 304 is a program for calculating the area ratio of the face in the captured video from the coordinates of the feature points of the person's face obtained by the face detection unit 301.

The correction processing amount calculation unit 305 is a program for estimating the amount of computation needed to correct the face obtained by the face detection unit 301 into a face image that appears to have been captured from the virtual viewpoint.

The correction necessity calculation unit 306 is a program that calculates, from the information obtained by the face detection unit 301, the angle calculation unit 303, and the face area calculation unit 304, the degree to which a face needs to be corrected in order to obtain the effect of matching the lines of sight.

The correction processing unit 307 is a program that, from the information obtained by the face detection unit 301, the virtual viewpoint calculation unit 302, the angle calculation unit 303, and so on, corrects the face of a person in the video signal captured by the camera 4 so that it appears to have been shot from the virtual viewpoint.

The processing load management unit 309 is a program for managing the processing load status of the video processing device 1 in real time and calculating the amount of processing available when the correction processing unit 307 and other units perform their processing.
 The gaze-alignment processing method of the video processing apparatus 1 will now be described with reference to FIGS. 4, 5, 6, and 7.
 In a videophone call or video conference, gaze mismatch arises because the position of the camera differs from the position the participant is looking at. How large an offset between the camera position and the line of sight is tolerable is discussed by Sato, Miura, and Nagata in "A study on the position of the image pickup tube in videophones", Showa 42 Joint Convention, No. 1998 (1967), which gives the limit angle of the tolerable range.
 Therefore, by determining whether the gaze deviation is within the tolerable range and skipping face correction when it is, the processing load can be reduced without creating a noticeable sense of gaze mismatch.
 FIG. 4 shows a determination list used to decide whether the gaze deviation is within the tolerable range when the video processing apparatus 1 performs gaze-alignment processing. The determination list is generated in the memory 102 by the control unit 101. 700 is the determination list, 701 is an ID for identifying a face detected by the face detection unit 301, 702 is the deviation angle expressing the gaze deviation as an angle, 703 is the correction necessity, which quantifies the need to correct the detected face, and 704 is the estimated processing amount, an estimate of the computation required to perform the face correction.
 The ID 701 is a number assigned by the face detection unit 301 when it detects a face. The deviation angle 702 is the value calculated by the angle calculation unit 303. The correction necessity 703 is the value calculated by the correction necessity calculation unit 306. The estimated processing amount 704 is the value calculated by the correction processing amount calculation unit 305.
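 As an illustrative sketch only (not part of the original disclosure), one entry of the determination list 700 could be held in a structure such as the following Python dataclass; all field names are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class FaceEntry:
        face_id: int          # 701: ID assigned by the face detection unit 301
        deviation_deg: float  # 702: gaze deviation angle from the angle calculation unit 303
        necessity: float      # 703: correction necessity from the correction necessity calculation unit 306
        est_load: float       # 704: estimated processing amount from the correction processing amount calculation unit 305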
 FIG. 5 is a flowchart showing a specific example of the main routine of the gaze-alignment processing. The control unit 101 executes this routine every time the video processing apparatus 1 receives a video signal. Each of the programs described below handles the input video signal in units of video frames, each of which is a single still image.
 First, the face detection unit 301 detects the faces contained in the video frame (S901). The face detection unit 301 assigns a numerical ID to each detected face and writes the ID into the determination list 700.
 Next, the loop from S902 to S905 is repeated for the number of detected faces. S902 and S905 mark the ends of the loop. In the loop, one face that has not yet been processed is selected from the detected faces (S903), and a subroutine that determines the necessity of correcting that face is executed (S904).
 The subroutine is described later with reference to FIG. 7. The deviation angle 702, the correction necessity 703, and the estimated processing amount 704 in the determination list 700 are calculated in this subroutine.
 When the loop from S902 to S905 has processed all the faces, the estimated processing amounts 704 in the determination list 700 are summed (S906), and it is determined whether the total exceeds the limit processing amount that the control unit 101 can currently handle in real time (S907).
 The limit processing amount is a value calculated by the processing load management unit 309 each time the flowchart of FIG. 5 is executed. It is, for example, the processing amount available for performing gaze-alignment processing at a rate of 29.97 video frames per second. If more processing than this is performed, the number of video frames that can be processed per second decreases and frames are dropped. When the processing load on the control unit 101 is already high, the limit processing amount becomes correspondingly smaller.
 The number of video frames processed per second used as the basis for the limit processing amount can be set to other values, such as 30 or 60, according to the number of video frames that the video processing apparatus 1 displays per second. Alternatively, the limit processing amount may be obtained by subtracting a predetermined processing amount from the maximum processing capacity of the control unit 101, or may be set using some other criterion.
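 A minimal Python sketch of one way such a per-frame budget could be derived is shown below; it is an illustration under assumed names and units, not the implementation of the processing load management unit 309.

    def limit_processing_amount(capacity_per_sec: float,
                                other_load_per_sec: float,
                                fps: float = 29.97) -> float:
        """Processing amount left for gaze correction in one video frame."""
        available_per_sec = max(capacity_per_sec - other_load_per_sec, 0.0)
        return available_per_sec / fps

    # Example: with 1000 units/s of capacity and 400 units/s already in use,
    # roughly 20 units remain per frame at 29.97 fps.
    print(limit_processing_amount(1000.0, 400.0))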
 If the estimated processing amount summed in S906 exceeds the limit processing amount (Y in S907), the faces to be corrected are selected using the determination list 700 (S908). If it does not (N in S907), correction is performed for all faces whose correction necessity in the determination list 700 is non-zero (S909).
 In S908 the following is done. The estimated processing amounts are accumulated in descending order of correction necessity in the determination list 700, and faces are selected up to the point where the limit processing amount calculated by the processing load management unit 309 would be exceeded. In the determination list 700 of FIG. 4, for example, if the limit processing amount is 25, the faces with IDs 1 and 2 are selected so that the total of the estimated processing amounts 704 does not exceed 25.
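 A minimal Python sketch of this selection step follows; the numbers are only an example resembling the situation described above, and whether to stop at the first face that does not fit or to keep trying smaller faces is a design choice left open here.

    def select_faces(entries, limit):
        """entries: iterable of (face_id, necessity, estimated_load) tuples."""
        selected, total = [], 0.0
        for face_id, necessity, est_load in sorted(
                entries, key=lambda e: e[1], reverse=True):
            if necessity <= 0:           # necessity 0 means no correction needed
                continue
            if total + est_load > limit:
                break                    # or `continue` to try smaller faces
            selected.append(face_id)
            total += est_load
        return selected

    entries = [(1, 12.0, 15.0), (2, 10.0, 9.0), (3, 9.5, 8.0)]
    print(select_faces(entries, limit=25.0))  # -> [1, 2]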
 When S907 is N, or when S908 has finished, the selected faces are corrected (S909). Face correction is performed by processing the face image so that the gaze appears aligned. There are several face correction methods, and the effect of this embodiment can be obtained with any of them, so not all of them are described here. As an example, a method is described in which cameras 4 are installed above and below the monitor 5 and face images captured from the two different angles are used to correct the face so that it appears to have been captured from an arbitrary position between the two cameras.
 In S901, face detection is performed on the video frames received from the two cameras, and feature points of facial organs such as the eyes, nose, and mouth are extracted. Using the extracted feature points, the faces in the two video frames are matched. The two matched faces are cut out of their video frames and combined by morphing. The morphing ratio is adjusted so that the resulting gaze angle is aligned.
 Alternatively, the morphing ratio may be set not to the angle at which the gaze is fully aligned but to the maximum angle of the tolerable range of gaze deviation. In face correction, if the face is deformed too much from the one captured by the camera, the unnaturalness caused by the correction can outweigh the unnaturalness caused by the gaze mismatch. When setting the correction to the maximum tolerable deviation angle rather than to the fully aligned angle reduces the amount of facial deformation, the maximum tolerable deviation angle may be used to determine the morphing ratio, that is, the amount of face correction.
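 The following Python sketch only illustrates the idea of the preceding paragraph under an assumed linear relation between correction angle and morphing ratio; the function name, the 9-degree tolerance, and the linearity are all assumptions, not part of the original disclosure.

    def morphing_ratio(deviation_deg: float,
                       camera_span_deg: float,
                       allowable_deg: float = 9.0) -> float:
        """Fraction of the way from the upper-camera view toward the
        lower-camera view needed to bring the gaze deviation down to the
        maximum tolerable angle instead of all the way to zero."""
        correction = max(deviation_deg - allowable_deg, 0.0)
        return min(correction / camera_span_deg, 1.0)

    print(morphing_ratio(deviation_deg=15.0, camera_span_deg=30.0))  # 0.2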
 Finally, the corrected face is pasted over the face in the video frame captured by the upper camera.
 As another correction method, a TOF (Time of Flight) camera can be used to generate a three-dimensional model of the subject's face, and the three-dimensional model can be rotated so that the face is turned to the gaze-aligned angle.
 A video frame on which the control unit 101 has performed gaze-alignment processing is output to the video encoder 103-1 and transmitted to the video processing apparatus 2 as a video/audio stream by the multiplexing unit 118 and the stream processing unit 105.
 Next, the method of obtaining the gaze deviation angle is described.
 FIG. 6 shows a specific example of the positional relationship between the cameras, the monitor, and the video conference participants. 4-1 and 4-2 are cameras installed above and below the monitor 5; 401, 403, and 406 are conference participants; 407 is the conference table; 501 is the virtual viewpoint at which the gaze deviation disappears; 502 is the line segment connecting the center of the lens of camera 4-1 and the virtual viewpoint 501; 503 is the line segment connecting the center of the lens of camera 4-1 and the eyes of participant 401; 504 is the line segment connecting the center of the lens of camera 4-2 and the eyes of participant 401; 505 is the angle between line segments 502 and 503; 506 is the angle between line segment 504 and the line segment connecting the centers of the lenses of cameras 4-1 and 4-2; 507 is the line segment connecting the virtual viewpoint 501 and the eyes of participant 401; and 508 is the angle between line segments 503 and 507, which represents the gaze deviation.
 Whether the gaze deviation is within the tolerable range can be determined by calculating the angle 508. There are several ways of calculating this angle, and the effect of this embodiment is obtained whichever method is used, so not all of them are described; here a method using two cameras is described as an example. The subsequent processing for obtaining the angle 508 is performed by the angle calculation unit 303.
 To obtain the angle 508, it suffices to determine the triangle formed by line segments 502, 503, and 507. This triangle can be determined once the lengths of line segments 502 and 503 and the angle 505 are known. The coordinate system of FIG. 6 is a two-dimensional plane perpendicular to the ground and to the screen of the monitor 5, with its origin at the center of the screen of the monitor 5.
 The length of line segment 502 can be obtained if the distance between the virtual viewpoint 501 and the center of the lens of camera 4-1 is known.
 The position of the virtual viewpoint 501 can be restated as the position the participant should look at in order to align gazes. The position to look at may be, for example, the position where the eyes of the participant on the other side of the video conference are displayed. If one of the remote participants is speaking, the display position of that participant's eyes may be used as the position to look at.
 On the other hand, when the remote participants are listeners, there may not be a single position to look at. The position to look at may in that case simply be taken as the center of the screen of the monitor 5, or the display positions of the participants' eyes may be calculated by the virtual viewpoint calculation unit 302 and used to set the position to look at.
 If the position of the lens of camera 4-1 does not change while the video processing apparatus is in use, a value measured in advance may be used; the value is stored in the storage unit 106. If the position of camera 4-1 is variable, the value is changed according to the position.
 From the above, the length of line segment 502 is determined from the distance between the virtual viewpoint 501 and the center of the lens of camera 4-1.
 The length of line segment 503 can be obtained, in the manner of triangulation, from the distance between the centers of the lenses of cameras 4-1 and 4-2 and from the angles 505 and 506. If the distance between the lens centers of cameras 4-1 and 4-2 does not change while the video processing apparatus is in use, a value measured in advance may be stored in the storage unit of the video processing apparatus. If the positions of cameras 4-1 and 4-2 are variable, the value is changed according to the positions.
 The angle 505 can be obtained from the depression angle and the angle of view of camera 4-1 and from the position of the participant's eyes in the video frame. The coordinate system for the eye position is the two-dimensional plane of the video frame. The depression angle of a camera is 0 when the camera faces horizontally and -90 degrees when the camera faces straight down.
 If the position of camera 4-1 does not change while the video processing apparatus is in use, the depression angle and angle of view of camera 4-1 may be measured in advance and stored in the storage unit of the video processing apparatus 1. The position of the participant's eyes in the video frame can be obtained by the face detection unit 301.
 As an example, when the depression angle of camera 4-1 is -30 degrees and its angle of view is 46 degrees, the angle 505 takes the following values: if the participant's eyes are at the center of the video frame, the angle 505 is 60 degrees; if the participant's eyes are at the top edge of the video frame, the angle 505 is 83 degrees.
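 The worked example above can be reproduced with the Python sketch below, which assumes that line segment 502 is vertical and that the vertical angle of view maps linearly onto the frame; both are simplifications made only for illustration and are not stated in the original disclosure.

    def angle_505(depression_deg: float, fov_deg: float, y_norm: float) -> float:
        """y_norm: vertical eye position in the frame, 0.0 = top edge, 1.0 = bottom edge."""
        offset_from_axis = (0.5 - y_norm) * fov_deg   # positive above the optical axis
        return 90.0 + depression_deg + offset_from_axis

    print(angle_505(-30.0, 46.0, 0.5))  # 60.0 (eyes at the frame center)
    print(angle_505(-30.0, 46.0, 0.0))  # 83.0 (eyes at the top edge)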
 The angle 506 can be obtained in the same way as the angle 505, from the elevation angle and angle of view of camera 4-2 and the position of the participant's face, so its description is omitted. The length of line segment 503 is thus obtained.
 From the triangle determined by the line segment 502, the line segment 503, and the angle 505 obtained so far, the angle 508 is obtained.
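 As an illustration of this triangle solution (assuming, as above, that the angle 508 is the angle at the participant's eyes between line segments 503 and 507), the following Python sketch solves the triangle with the law of cosines; the distances in the example are made up.

    import math

    def angle_508(len_502: float, len_503: float, angle_505_deg: float) -> float:
        a505 = math.radians(angle_505_deg)
        # Law of cosines gives the opposite side, line segment 507.
        len_507 = math.sqrt(len_502**2 + len_503**2
                            - 2.0 * len_502 * len_503 * math.cos(a505))
        # Law of cosines again for the angle at the eyes, between 503 and 507.
        cos_508 = (len_503**2 + len_507**2 - len_502**2) / (2.0 * len_503 * len_507)
        return math.degrees(math.acos(cos_508))

    # Example: virtual viewpoint 0.3 m from the lens, participant 1.5 m away,
    # 20 degrees between the two directions -> roughly 4.8 degrees of deviation.
    print(round(angle_508(0.3, 1.5, 20.0), 1))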
 As another way of obtaining the angle 508, a TOF camera may be used for distance measurement in addition to the camera used for shooting. The TOF camera is installed so that its position and depression angle match those of camera 4-1 as closely as possible. Line segment 502 and angle 505 are obtained by the methods already described. The length of line segment 503 is obtained by measuring the distance to participant 401 with the TOF camera. Distance measurement with a TOF camera is a well-known technique, so its description is omitted. From the triangle determined by the line segment 502, the line segment 503, and the angle 505 obtained in this way, the angle 508 is obtained.
 FIG. 7 is a flowchart of the subroutine executed in S904 of FIG. 5. The subroutine calculates the necessity of correcting a face.
 When the subroutine is called from the main routine, the deviation angle of the face is calculated first (S1001). The deviation angle is obtained by the angle calculation unit 303 calculating the angle 508 of FIG. 6 as already described.
 Next, it is determined whether the calculated deviation angle is equal to or greater than a specified angle (S1002). The specified angle is stored in the storage unit 106 and is information set in the video processing apparatus 1 in advance for the determination in S1002. The specified angle is, for example, the maximum angle of the tolerable range of gaze deviation already described; here, as an example, the specified angle is 9 degrees.
 If the deviation angle is equal to or greater than the specified angle (Y in S1002), the correction necessity, which quantifies whether correction should be performed, is calculated (S1003). The correction necessity is a value calculated by the correction necessity calculation unit 306, and the calculated value is written into the correction necessity 703 of the determination list 700. A larger correction necessity indicates a stronger need for correction. The correction necessity is calculated as 0 when the deviation angle is below the threshold, and as the value of the deviation angle when it is equal to or greater than the threshold.
 Next, the estimated processing amount, an estimate of the amount of processing required to perform the face correction, is calculated (S1004). The estimated processing amount is a value calculated by the correction processing amount calculation unit 305, and the calculated value is written into the estimated processing amount 704 of the determination list 700. The estimated processing amount is calculated by multiplying the face area calculated by the face area calculation unit 304 by a constant.
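 A minimal Python sketch of S1003 and S1004 as described above follows; the cost constant is an assumption chosen only for illustration.

    def correction_necessity(deviation_deg: float, threshold_deg: float = 9.0) -> float:
        # 0 below the specified angle, otherwise the deviation angle itself.
        return 0.0 if deviation_deg < threshold_deg else deviation_deg

    def estimated_processing_amount(face_area_px: float, cost_per_px: float = 1e-3) -> float:
        # Face area multiplied by a constant.
        return face_area_px * cost_per_px

    print(correction_necessity(12.0))            # 12.0  (corrected)
    print(correction_necessity(5.0))             # 0.0   (within tolerance)
    print(estimated_processing_amount(20000.0))  # 20.0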
 The face area is explained here. FIG. 8 shows a specific example of a video frame captured by the camera 4 connected to the video processing apparatus. 400 is the video frame, 401 to 406 are conference participants, and 407 is the conference table. Of the participants, only participant 402 is speaking.
 As shown in FIG. 8, the camera 4 is installed at a position from which the whole conference scene is captured, so participants near the camera appear larger and participants farther away appear smaller. The larger the apparent size, the larger the amount of processing required for image processing; the smaller the apparent size, the smaller the amount of processing required.
 When the subroutine of FIG. 7 finishes, processing returns to S905 of FIG. 5 already described, and the main gaze-alignment routine continues.
 As described above, faces with a low need for correction are not corrected, and faces are selected and corrected in descending order of correction necessity, so that gaze-alignment processing can be performed while the processing load is kept down and no frames are dropped.
 In the first embodiment the need for correction was determined based on the deviation angle; in the second embodiment it is determined based on the face area without using the deviation angle.
 The gaze-mismatch problem is also related to the apparent size of the participant's face shown on the monitor 5. If the apparent size of a participant's face, captured by the camera 4 connected to the video processing apparatus and displayed on the monitor 5 connected to the video processing apparatus at the other site, is small, it is hard to tell whether the gazes meet, so the gaze-mismatch problem does not arise.
 For this reason, to reduce the load of the gaze-alignment processing, the video processing apparatus excludes participants with a small apparent size from the face correction processing.
 The gaze-alignment processing is largely the same as in the first embodiment, but part of the determination list and the flowchart differ, so it is described with reference to FIGS. 13 and 14.
 FIG. 13 is the determination list in the second embodiment. 1300 is the determination list and 1301 is the area ratio. The other items are the same as in FIG. 4, so their description is omitted. The area ratio 1301 is the ratio of the face area to the whole screen and is calculated by the face area calculation unit 304.
 The correction necessity 703 in the determination list 1300 is calculated by the correction necessity calculation unit 306 based on the area ratio 1301. If the area ratio of a face is less than a predetermined area ratio, the correction necessity is set to 0; if it is equal to or greater than the predetermined area ratio, the value of the area ratio is used. Instead of the face area ratio, whether to set the correction necessity to 0 may simply be decided according to the face area.
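 A minimal Python sketch of this area-ratio criterion follows; the 1.0% threshold matches the example given later in this embodiment, and other values are possible.

    def necessity_from_area_ratio(area_ratio_percent: float,
                                  threshold_percent: float = 1.0) -> float:
        # 0 below the threshold, otherwise the area ratio itself.
        return 0.0 if area_ratio_percent < threshold_percent else area_ratio_percent

    print(necessity_from_area_ratio(2.4))  # 2.4 -> corrected
    print(necessity_from_area_ratio(0.6))  # 0.0 -> skipped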
 FIG. 14 is a flowchart of the subroutine processing in the second embodiment, executed from S904 of FIG. 5 as in the first embodiment. The subroutine calculates the necessity of correcting a face.
 When the subroutine is called from the main routine, the area ratio of the face is calculated first (S1401). Next, it is determined whether the calculated area ratio is equal to or greater than a specified threshold area ratio (S1402). The threshold area ratio is stored in the storage unit 106 and is information set in the video processing apparatus 1 in advance for the determination in S1402. Here, as an example, the threshold is 1.0%.
 If the area ratio is equal to or greater than the threshold (Y in S1402), the correction necessity, which quantifies whether correction should be performed, is calculated (S1403). In S1403, as already described, the correction necessity is calculated using the area ratio. The calculated value is written into the determination list 1300.
 The processing after S1403 and the processing after N in S1402 are the same as in the first embodiment, so their description is omitted.
 As described above, by using the area of each face in the video frame, faces with a low need for correction are not corrected and faces are selected and corrected in descending order of correction necessity, so that gaze-alignment processing can be performed while the processing load is kept down and no frames are dropped.
 As another way of determining whether face correction is needed, only the face of the currently speaking person detected by the face detection unit may be corrected. In that case only the speaker's face has to be corrected, and the gaze-alignment processing can be reduced further.
 In the first embodiment the correction necessity was calculated using the deviation angle, but if the deviation angles of two faces are close or identical, their correction necessities also become close or identical. In the third embodiment, therefore, the area ratio and other factors are used in addition to the deviation angle to calculate the correction necessity. As a result, even when several faces have the same deviation angle, their correction necessities differ, and the faces that need correction most can be selected with higher accuracy.
 A method of calculating the correction necessity by combining the deviation angle, the area ratio, and whether the face belongs to the speaker is therefore described.
 FIG. 9 is the determination list used when the correction necessity is calculated by combining several conditions. 800 is the determination list, 801 is a speaker flag indicating whether the face belongs to the speaker, and 802 is the area ratio. The other items are the same as in FIG. 4, so their description is omitted. In FIG. 4 the correction necessity is calculated from the deviation angle alone, but here it is calculated from the speaker flag 801, the area ratio 802, and the deviation angle 702.
 The way the correction necessity is calculated can be changed depending on which condition is emphasised. Expressed simply as a formula, let the speaker flag be x, the area ratio y, and the deviation angle z, and let the respective weights be the constants a, b, and c; the correction necessity is then r = ax + by + cz. The formula and the values of the constants can be changed depending on which condition is emphasised.
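 The weighted combination above can be sketched in Python as follows; the weight values are assumptions chosen only for illustration.

    def combined_necessity(is_speaker: bool, area_ratio: float, deviation_deg: float,
                           a: float = 10.0, b: float = 1.0, c: float = 0.5) -> float:
        x = 1.0 if is_speaker else 0.0
        return a * x + b * area_ratio + c * deviation_deg   # r = ax + by + cz

    print(combined_necessity(True, 2.4, 12.0))   # 10 + 2.4 + 6.0 = 18.4
    print(combined_necessity(False, 0.8, 12.0))  # 0 + 0.8 + 6.0 = 6.8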
 The selection of correction targets and the correction processing are realised by the flowcharts of FIGS. 5 and 10. FIG. 5 has already been described in the first embodiment and the processing is the same, so its description is omitted.
 FIG. 10 is the flowchart of FIG. 7 with some processing added. The added steps are S1101 and S1102. First, the area ratio is calculated (S1101) and the calculated value is written into the area ratio 802 of the determination list 800. It is then determined whether the calculated area ratio is equal to or greater than a specified area ratio (S1102). If the determination is true (Y in S1102), the deviation angle calculation (S1001) is performed. If the determination in S1102 is false (N in S1102), the correction necessity is set to 0 (S1005). The other processing in FIG. 10 is the same as described for FIG. 7 in the first embodiment, so its description is omitted.
 In the third embodiment the correction necessity is calculated from three factors: whether the face to be corrected belongs to the speaker, the area ratio of the face to be corrected, and the gaze deviation angle of the face to be corrected. It may instead be calculated from two factors, or from four or more. The correction necessity may also be calculated using factors other than these three.
 In this way, the faces that need correction can be selected with higher accuracy.
 In the first to third embodiments, after the correction necessity is calculated, the processing of S906 to S908 is further performed before the face correction; however, S906 to S908 may be omitted, and correction may be performed on every face whose correction necessity is non-zero.
 This makes it possible, with a simpler method, to skip correction for faces judged not to need it, reducing the processing in the video processing apparatus and further reducing disturbance of the video.
 In the first to fourth embodiments the video processing apparatus has been described as a video conference apparatus, but in another form the video processing apparatus may be part of a system configuration in which it is combined with a separate video conference apparatus to hold a video conference.
 FIG. 11 shows a configuration in which the video processing apparatus is combined with a separate video conference apparatus. 1111 and 1112 are video processing apparatuses, and 1201 and 1202 are video conference apparatuses. In the figure, the video processing apparatus 1111 is connected to the camera 4 and to the video conference apparatus 1201 and can input and output video. The video processing apparatus 1112 has the same configuration as the video processing apparatus 1111.
 In this configuration, the video processing apparatus 1111 receives the video signal captured by the camera 4, performs gaze-alignment processing on it, and outputs the processed video signal to the video conference apparatus 1201. Since the video processing apparatus 1111 only has to perform the gaze-alignment processing, it does not need a function for connecting to the network 3 used for the video conference or functions for encoding and decoding video signals.
 With this configuration, the video conference apparatus simply has the video processing apparatus connected to the input terminal to which a camera is normally connected, so even a video conference apparatus without a gaze-alignment function can obtain the gaze-alignment effect by connecting the video processing apparatus.
 In the first to fifth embodiments, one video processing apparatus is required at each video conference site; in another form, the gaze-alignment processing may be performed by a video processing apparatus on the network.
 FIG. 12 shows a configuration in which the video processing apparatus is connected to other video conference apparatuses via the network 3. 1200 is the video processing apparatus. The video processing apparatus 1200 of this embodiment receives a video/audio stream from the video conference apparatus 1201 via the network 3, converts the stream and separates it into video data and audio data, decodes the video data into a video signal, and performs gaze-alignment processing on the video signal. It then encodes the processed video signal into video data, multiplexes and packetises the video data and the audio data into a video stream, and outputs it to the video conference apparatus 1202.
 Similarly, the video processing apparatus receives a video/audio stream from the video conference apparatus 1202 via the network 3, performs the same processing including gaze alignment, and outputs the resulting video/audio stream to the video conference apparatus 1201.
 With this configuration, there is no need to install a video processing apparatus at each site, and the gaze-alignment effect can be obtained even at sites that have no video processing apparatus.
1, 2 Video conference communication device
3 Network
4, 4-1, 4-2 Camera
5 Monitor
6 Microphone
7 Speaker
8 Remote control
101 Control unit
102 Memory
103-1, 103-2 Encoder
104-1, 104-2 Decoder
118 Multiplexing unit
119 Separation unit
105 Stream processing unit
106 Storage unit
107-1, 107-2 Video processing unit
108-1, 108-2 Audio processing unit
109 Remote control processing unit
110 Network connection unit
112 Video input terminal
113 Video output terminal
114 Audio input terminal
115 Audio output terminal
116 Remote control input terminal
117 Network connection terminal
301 Face detection unit
302 Virtual viewpoint calculation unit
303 Angle calculation unit
304 Face area calculation unit
305 Correction processing amount calculation unit
306 Correction necessity calculation unit
307 Correction processing unit
309 Processing load management unit
400 Video frame
401, 402, ..., 406 Person
407 Conference table
501 Virtual viewpoint
700 Determination list
701 ID
702 Deviation angle
703 Correction necessity
704 Estimated processing amount
800 Determination list
801 Speaker flag
802 Area ratio
1111, 1112 Video processing device
1200 Video processing device
1201 Video conference device
1202 Video conference device

Claims (10)

  1.  A video processing apparatus comprising:
     a video input unit to which video information captured by a camera is input;
     a video processing unit that corrects a video of a face included in the video information input to the video input unit; and
     an output unit that outputs the video information processed by the video processing unit,
     wherein the video processing unit determines, according to a predetermined condition, whether the video of the face included in the video information needs to be corrected.
  2.  The video processing apparatus according to claim 1, wherein the predetermined condition is the gaze deviation angle of a face included in the video information.
  3.  The video processing apparatus according to claim 1, wherein the predetermined condition is the area or area ratio of a face included in the video information.
  4.  The video processing apparatus according to claim 1, wherein the predetermined condition is whether a face included in the video information is that of the speaker.
  5.  The video processing apparatus according to any one of claims 1 to 4, wherein, when the processing amount in the video processing unit exceeds a predetermined processing amount, the face videos to be corrected are selected based on the predetermined condition.
  6.  A video processing method in a video processing apparatus that processes video information, comprising:
     a video input step of inputting video information captured by a camera to the video processing apparatus;
     a video processing step of correcting a video of a face included in the input video information; and
     an output step of outputting the video information processed in the video processing step,
     wherein, in the video processing step, whether the video of the face included in the video information needs to be corrected is determined according to a predetermined condition.
  7.  The video processing method according to claim 6, wherein the predetermined condition is the gaze deviation angle of a face included in the video information.
  8.  The video processing method according to claim 6, wherein the predetermined condition is the area or area ratio of a face included in the video information.
  9.  The video processing method according to claim 6, wherein the predetermined condition is whether a face included in the video information is that of the speaker.
  10.  The video processing method according to any one of claims 6 to 9, wherein, when the processing amount in the video processing step exceeds a predetermined processing amount, the face videos to be corrected are selected based on the predetermined condition.
PCT/JP2012/083190 2012-12-21 2012-12-21 Video processor and video processing method WO2014097465A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/083190 WO2014097465A1 (en) 2012-12-21 2012-12-21 Video processor and video processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/083190 WO2014097465A1 (en) 2012-12-21 2012-12-21 Video processor and video processing method

Publications (1)

Publication Number Publication Date
WO2014097465A1

Family

ID=50977843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/083190 WO2014097465A1 (en) 2012-12-21 2012-12-21 Video processor and video processing method

Country Status (1)

Country Link
WO (1) WO2014097465A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019201360A (en) * 2018-05-17 2019-11-21 住友電気工業株式会社 Image processing apparatus, computer program, video call system, and image processing method
WO2020054605A1 (en) * 2018-09-12 2020-03-19 シャープ株式会社 Image display device and image processing device
WO2020089971A1 (en) * 2018-10-29 2020-05-07 有限会社アドリブ Image processing apparatus, method, and computer program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH066786A (en) * 1992-06-22 1994-01-14 A T R Tsushin Syst Kenkyusho:Kk Line of sight coincidence correcting device
JPH11266443A (en) * 1998-03-17 1999-09-28 Toshiba Corp Picture and sound transmission-reception equipment
JP2005340974A (en) * 2004-05-24 2005-12-08 Fuji Xerox Co Ltd Image-transmission control program and image display program
JP2007189624A (en) * 2006-01-16 2007-07-26 Mitsubishi Electric Corp Video telephone terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP