WO2014097465A1 - Video processor and video processing method - Google Patents

Video processor and video processing method

Info

Publication number
WO2014097465A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
processing
face
correction
unit
Prior art date
Application number
PCT/JP2012/083190
Other languages
English (en)
Japanese (ja)
Inventor
佐々木 昭
恭一 中熊
溝添 博樹
Original Assignee
日立マクセル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日立マクセル株式会社 filed Critical 日立マクセル株式会社
Priority to PCT/JP2012/083190
Publication of WO2014097465A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/45 - Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 - Control of cameras or camera modules
    • H04N23/61 - Control of cameras or camera modules based on recognised objects
    • H04N23/611 - Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 - Camera processing pipelines; Components thereof

Definitions

  • The technical field relates to a video processing apparatus and a video processing method.
  • Patent Document 1 states that in a conventional system, realizing a dialogue in which the users actually viewing the displays have their lines of sight matched requires special equipment such as a half mirror, a hologram screen, or a projector, so a simple and inexpensive system cannot be configured (see Patent Document 1 [0011]). To address this, images are taken from different angles, each captured image is separated into a foreground image region containing the photographed subject and a background image, the pixel positions of the subject are associated between the separated foreground image regions, and relative position information indicating the relative positional relationship of the subject with respect to each camera is generated.
  • The pixel positions and luminance components constituting a newly generated virtual viewpoint image are then obtained according to the generated relative position information, and the virtual viewpoint image is transmitted to the outside (see Patent Document 1 [0015]).
  • The present application includes a plurality of means for solving the above-mentioned problems.
  • As one example, the present application includes a video input unit to which video information captured by a camera is input, a video processing unit that corrects the video of a human face included in the video information input to the video input unit, and an output unit that outputs the video information processed by the video processing unit.
  • The video processing unit is characterized in that it determines, according to a predetermined condition, whether or not to correct the video of the face included in the video information.
  • With this configuration, correction processing can be performed without reducing the number of video frames processed per second, even when there are many subjects in the shooting range.
  • In the following, a video conference terminal is described as an example of a video processing device.
  • FIG. 1 is a system configuration diagram showing an embodiment of a video conference communication system.
  • 1 and 2 are video processing devices
  • 3 is a network
  • 4 is a camera
  • 5 is a monitor
  • 6 is a microphone
  • 7 is a speaker
  • 8 is a remote control.
  • The video processing apparatus 1 is connected to a camera 4, a monitor 5, a microphone 6, and a speaker 7, and can input and output video and audio.
  • The video processing device 2 has the same configuration as the video processing device 1.
  • The video processing apparatus 1 can hold a video conference with the video processing apparatus 2 via the network 3.
  • The video processing apparatus can be operated with the remote controller 8.
  • The camera 4 is connected to the video processing device and outputs the captured video signal to the video processing device.
  • The monitor 5 is connected to the video processing device and displays the video signal output from the video processing device.
  • The microphone 6 is connected to the video processing device and outputs the collected sound to the video processing device as an audio signal.
  • The speaker 7 is connected to the video processing device and outputs as sound the audio signal received from the video processing device.
  • The remote controller 8 transmits remote control signals carrying the user's operation instructions to the video processing device.
  • FIG. 2 is a block diagram showing a specific example of the internal configuration of the video processing apparatus in the video conference communication system shown in FIG.
  • 101 is a control unit
  • 102 is a memory
  • 103-1 is a video encoder
  • 103-2 is an audio encoder
  • 104-1 is a video decoder
  • 104-2 is an audio decoder
  • 118 is a multiplexing unit
  • 119 is a separation unit
  • 105 is a stream processing unit
  • 106 is a storage unit
  • 107-1 is a video processing unit for video output to the monitor 5
  • 107-2 is a video processing unit for video input from the camera 4
  • 108-1 is an audio processing unit for audio output to the speaker 7
  • 108-2 is an audio processing unit for audio input from the microphone 6
  • 109 is a remote control processing unit
  • 110 is a network connection unit
  • 112 is a video input terminal
  • 113 is a video output terminal
  • 114 is an audio input terminal
  • 115 is an audio output terminal
  • 116 is a remote control input terminal
  • 117 is a network connection terminal
  • The control unit 101 loads a program stored in the storage unit 106 into the memory 102 and executes it, thereby realizing the functions of the various programs. The control unit 101 also controls the programs in accordance with operation information input from the remote control processing unit 109.
  • The encoder 103-1 receives the video signal from the video processing unit 107-2, and the encoder 103-2 receives the audio signal from the audio processing unit 108-2; each compresses and encodes its input signal.
  • The compressed video and audio data are output to the multiplexing unit 118 or to the stream processing unit 105, described later.
  • The decoders 104-1 and 104-2 receive the compression-encoded video data and audio data output from the separation unit 119 or the stream processing unit 105, described later, and decode them into a video signal and an audio signal, respectively.
  • The multiplexing unit 118 multiplexes the compression-encoded video data and compression-encoded audio data input from the encoders 103, which are defined as Elementary Streams (ES) in the MPEG-2 system, and outputs packetized video/audio data called a Transport Stream (TS).
  • The separation unit 119 separates the video/audio data output from the stream processing unit 105 into video data and audio data.
  • The stream processing unit 105 generates, from the input video/audio data, network packets in the network protocol used to communicate with the other video processing apparatus 2, and outputs a video/audio stream, that is, a sequence of such network packets. It also converts a video/audio stream received from another video processing apparatus into TS video/audio data, the format processed by the separation unit 119.
  • The video/audio stream is formed by adding header data, such as time information generated by the stream processing unit 105 and video/audio format information, to the video/audio data.
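  • As a concrete illustration of this header-plus-payload framing, here is a minimal Python sketch (the packet layout, field names, and chunk size are assumptions for illustration, not taken from this description) that splits TS data into network packets, each prefixed with time information and format information:

    import struct
    import time

    HEADER_FMT = "!dHH"  # timestamp (float64), format id, payload length

    def packetize(ts_data: bytes, format_id: int, chunk_size: int = 1316):
        """Split TS data into chunks and prepend an assumed header to each."""
        for off in range(0, len(ts_data), chunk_size):
            payload = ts_data[off:off + chunk_size]
            header = struct.pack(HEADER_FMT, time.time(), format_id, len(payload))
            yield header + payload

    def depacketize(packet: bytes):
        """Recover (timestamp, format_id, payload) from one network packet."""
        hdr_size = struct.calcsize(HEADER_FMT)
        timestamp, format_id, length = struct.unpack(HEADER_FMT, packet[:hdr_size])
        return timestamp, format_id, packet[hdr_size:hdr_size + length]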
  • The storage unit 106 stores the programs executed by the control unit 101.
  • The video processing units 107-2 and 107-1 control the video input terminal 112 and the video output terminal 113, respectively: the video processing unit 107-2 outputs the video signal input from the video input terminal 112 to the encoder 103-1, and the video processing unit 107-1 outputs the video signal input from the decoder 104-1 to the video output terminal 113.
  • The video processing unit 107-2 can process a plurality of video signals from the video input terminal 112 simultaneously.
  • The audio processing units 108-2 and 108-1 control the audio input terminal 114 and the audio output terminal 115, respectively: the audio processing unit 108-2 outputs the audio signal input from the audio input terminal 114 to the encoder 103-2, and the audio processing unit 108-1 outputs the audio signal input from the decoder 104-2 to the audio output terminal 115.
  • The remote control processing unit 109 outputs the remote control signal input from the remote control input terminal 116 to the control unit 101 as operation information.
  • The network connection unit 110 uses the network connection terminal 117 to transmit and receive, to and from the other video conference communication apparatuses connected via the network 3, the video/audio stream and the connection information necessary for a video conference.
  • The video input terminal 112 is connected to the camera 4 and outputs the video signal input from the camera 4 to the video processing unit 107-2.
  • The video input terminal 112 can also connect a plurality of cameras and output a plurality of video signals to the video processing unit 107-2 simultaneously.
  • The video output terminal 113 is connected to the monitor 5 and outputs the video signal input from the video processing unit 107-1 to the monitor 5.
  • The audio input terminal 114 is connected to the microphone 6 and outputs the audio signal input from the microphone 6 to the audio processing unit 108-2.
  • The audio output terminal 115 is connected to the speaker 7 and outputs the audio signal input from the audio processing unit 108-1 to the speaker 7.
  • FIG. 3 is a diagram showing a specific example of the programs read from the storage unit 106 of the video processing apparatus in FIG. 2 and loaded into the memory 102, where 301 is a face detection unit, 302 is a virtual viewpoint calculation unit, 303 is an angle calculation unit, 304 is a face area calculation unit, 305 is a correction processing amount calculation unit, 306 is a correction necessity calculation unit, 307 is a correction processing unit, and 309 is a processing load management unit.
  • The face detection unit 301 is a program that controls units such as the video processing unit 107-2, detects human faces and facial organs in the video signal captured by the camera 4, extracts the coordinates of facial feature points, and identifies the face of the person who is speaking.
  • The virtual viewpoint calculation unit 302 is a program that calculates the coordinates of a virtual viewpoint used to correct the video signal of the camera 4 so as to generate video as if shot from a viewpoint different from that of the camera 4.
  • The angle calculation unit 303 is a program that calculates, from the coordinates of the person's face obtained by the face detection unit 301 and the coordinates of the virtual viewpoint obtained by the virtual viewpoint calculation unit 302, the angle formed by the line segment connecting the face and the camera 4 and the line segment connecting the face and the virtual viewpoint.
  • The face area calculation unit 304 is a program that calculates the proportion of the captured video occupied by the face from the coordinates of the facial feature points obtained by the face detection unit 301.
  • The correction processing amount calculation unit 305 is a program that estimates the amount of calculation required to correct the face obtained by the face detection unit 301 into a face image as if captured from the virtual viewpoint.
  • The correction necessity calculation unit 306 is a program that calculates, from the information obtained by the face detection unit 301, the angle calculation unit 303, and the face area calculation unit 304, the degree to which face correction is needed to obtain the effect of matching lines of sight.
  • The correction processing unit 307 is a program that, using the information obtained by the face detection unit 301, the virtual viewpoint calculation unit 302, the angle calculation unit 303, and so on, corrects the face of a person in the video signal captured by the camera 4 so that it appears captured from the virtual viewpoint.
  • The processing load management unit 309 is a program that manages the processing load status of the video processing device 1 in real time and calculates the processing amount available when the correction processing unit 307 performs its processing.
  • FIG. 4 is a diagram of the determination list used to determine whether the line-of-sight shift is within an allowable range when the video processing device 1 performs line-of-sight processing.
  • The determination list is generated in the memory 102 by the control unit 101.
  • 700 is the determination list
  • 701 is an ID identifying a face detected by the face detection unit 301
  • 702 is the shift angle, which expresses the line-of-sight shift as an angle
  • 703 is the correction necessity, which quantifies how much the detected face needs to be corrected
  • 704 is the estimated processing amount, an estimate of the amount of calculation required to execute the face correction processing
  • The ID 701 is a number assigned when the face detection unit 301 detects a face.
  • The shift angle 702 is a value calculated by the angle calculation unit 303.
  • The correction necessity 703 is a value calculated by the correction necessity calculation unit 306.
  • The estimated processing amount 704 is a value calculated by the correction processing amount calculation unit 305.
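  • As an illustration, here is a minimal Python sketch of one entry of the determination list (the class and field names are assumptions based on the description of FIG. 4):

    from dataclasses import dataclass

    @dataclass
    class DeterminationEntry:
        face_id: int                 # 701: ID assigned by the face detection unit 301
        shift_angle: float           # 702: line-of-sight shift expressed as an angle
        correction_necessity: float  # 703: 0 means the face need not be corrected
        estimated_amount: float      # 704: estimated cost of correcting this face

    # The control unit 101 would rebuild this list in the memory 102 per frame.
    determination_list: list[DeterminationEntry] = []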
  • FIG. 5 is a flowchart showing a specific example of the main flow of the series of line-of-sight processing. This flow is executed by the control unit 101 every time the video processing apparatus 1 receives a video signal. The programs described below handle the input video signal in units of video frames, each of which is a single still image.
  • First, the face detection unit 301 detects the faces included in the video frame (S901).
  • The face detection unit 301 assigns a number as an ID to each detected face and writes the ID in the determination list 700.
  • S902 and S905 indicate the ends of the processing loop.
  • Within the loop, one face that has not yet been processed is selected from the detected faces (S903).
  • A subroutine that determines the necessity of correcting the selected face is then executed (S904).
  • The subroutine calculates the shift angle 702, the correction necessity 703, and the estimated processing amount 704 of the determination list 700.
  • After the loop, the estimated processing amounts 704 in the determination list 700 are added together (S906), and it is determined whether the combined estimated processing amount exceeds the limit processing amount that the control unit 101 can currently process in real time (S907).
  • The limit processing amount is a value calculated by the processing load management unit 309 each time the flowchart of FIG. 5 is executed. This value is, for example, the processing amount needed to perform the line-of-sight processing at a rate of 29.97 video frames per second; if more processing than this is attempted, the number of video frames that can be processed per second falls, resulting in dropped frames. When the processing load on the control unit 101 is high, the limit processing amount becomes correspondingly smaller.
  • The number of video frames to be processed per second, which is the reference for calculating the limit processing amount, can be set to other values such as 30 or 60 according to the number of video frames the video processing device 1 displays per second.
  • Alternatively, the amount obtained by subtracting a predetermined processing amount from the maximum processing amount of the control unit 101 may be used as the limit processing amount, or the limit processing amount may be set by some other criterion.
  • If the total exceeds the limit (Y in S907), the faces to be corrected are selected using the determination list 700 (S908); if it does not (N in S907), correction processing is performed on all faces whose correction necessity 703 in the determination list 700 is other than 0 (S909).
  • In the selection of S908, the estimated processing amounts are added together in descending order of correction necessity, and faces are selected up to the point where the limit processing amount calculated by the processing load management unit 309 would be exceeded, as sketched below.
  • In the determination list 700 of FIG. 4, for example, when the limit processing amount is 25, the faces with IDs 1 and 2 are selected so that the total of their estimated processing amounts 704 does not exceed 25.
  • The selected faces are then corrected (S909).
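  • Here is a minimal Python sketch of the selection in S906 to S908, reusing the DeterminationEntry sketch above (the function name and the greedy cut-off are assumptions consistent with the FIG. 4 example):

    def select_faces(entries: list[DeterminationEntry],
                     limit: float) -> list[DeterminationEntry]:
        """S906-S908: pick the faces to correct within the limit processing amount."""
        candidates = [e for e in entries if e.correction_necessity != 0]
        if sum(e.estimated_amount for e in candidates) <= limit:
            return candidates  # N in S907: correct every face that needs it (S909)
        selected, used = [], 0.0  # Y in S907: greedy selection (S908)
        for e in sorted(candidates, key=lambda c: c.correction_necessity, reverse=True):
            if used + e.estimated_amount > limit:
                break  # stop before the limit processing amount would be exceeded
            selected.append(e)
            used += e.estimated_amount
        return selected

  • With the FIG. 4 example, a limit of 25 would select the faces with IDs 1 and 2, keeping the total estimated amount within 25.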
  • The face correction is performed by processing the face image so that the lines of sight match.
  • In this example, the cameras 4 are installed above and below the monitor 5, and a method is described for correcting a face, using face images taken from the two different angles, so that it appears taken from an arbitrary position between the two cameras.
  • In step S901, face detection is performed on each of the video frames input from the two cameras, and feature points of facial organs such as the eyes, nose, and mouth are extracted. Using the extracted feature points, the faces in the two video frames are matched; the two matched faces are then extracted from the video frames and synthesized by morphing. At this time, the morphing ratio is adjusted so that the line of sight meets at the required angle.
  • The morphing ratio need not correspond to the exact angle at which the lines of sight meet; it may instead correspond to the maximum angle within the allowable range of line-of-sight deviation.
  • The sense of incongruity caused by the correction can be greater than the sense of incongruity caused by the line-of-sight mismatch. If the amount of facial deformation can be reduced by targeting the maximum angle within the allowable range rather than the exact gaze-matching angle, that maximum angle should be used.
  • The ratio, that is, the correction amount of the face, may be determined in this way (a sketch follows below).
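  • A minimal Python sketch follows (the linear-interpolation model and function name are assumptions; the text says only that the morphing ratio is adjusted toward a target angle): ratio 0.0 reproduces the upper camera's view, 1.0 the lower camera's, and intermediate values approximate viewpoints in between.

    def morphing_ratio(angle_upper: float, angle_lower: float,
                       target_angle: float) -> float:
        """Interpolation weight for a virtual viewpoint at target_angle (degrees).

        target_angle may be the exact gaze-matching angle, or the maximum angle
        within the allowable deviation range when that deforms the face less.
        """
        # Assumes the two camera angles differ; clamp to the physical range.
        ratio = (target_angle - angle_upper) / (angle_lower - angle_upper)
        return min(max(ratio, 0.0), 1.0)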
  • As another correction method, a TOF (Time of Flight) camera may be used to generate a 3D model of the subject's face, and the 3D model may be rotated so that the face is at an angle at which the lines of sight match.
  • The video frame on which the control unit 101 has performed line-of-sight processing is output to the video encoder 103-1 and transmitted to the video processing device 2 as a video/audio stream via the multiplexing unit 118 and the stream processing unit 105.
  • FIG. 6 is a diagram showing a specific example of the positional relationship between the camera, the monitor, and the video conference participants.
  • 4-1 and 4-2 are cameras installed above and below the monitor 5
  • 401, 403, and 406 are participants of the video conference
  • 407 is a conference table
  • 501 is the virtual viewpoint at which the line-of-sight shift is eliminated
  • 502 is a line segment connecting the virtual viewpoint 501 and the center of the lens of the camera 4-1
  • 503 is a line segment connecting the center of the lens of the camera 4-1 and the eyes of the participant 401
  • 504 is a line segment connecting the center of the lens of the camera 4-2 and the eyes of the participant 401
  • 505 is the angle formed by the line segment 502 and the line segment 503
  • 506 is the angle formed by the line segment connecting the centers of the lenses of the cameras 4-1 and 4-2 and the line segment 504
  • 507 is a line segment connecting the virtual viewpoint 501 and the eyes of the participant 401
  • 508 is the angle formed by the line segment 503 and the line segment 507, the angle indicating the line-of-sight shift
  • Whether the line-of-sight shift is within the allowable range can be determined by calculating the angle 508. There are several methods for calculating this angle, and the effect of this embodiment can be obtained with any of them, so not all are described; here, a method using two cameras is described as an example. The subsequent processing for obtaining the angle 508 is performed by the angle calculation unit 303.
  • The coordinate system of FIG. 6 is a two-dimensional plane perpendicular to the ground and to the screen of the monitor 5, with the origin at the center of the screen of the monitor 5.
  • The length of the line segment 502 can be obtained if the distance between the virtual viewpoint 501 and the center of the lens of the camera 4-1 is known.
  • The position of the virtual viewpoint 501 can be restated as the position the participant should look at in order for the lines of sight to match.
  • The position to be looked at may be, for example, the position where the eyes of a participant on the other side of the video conference are displayed; when one of the participants on the other side is speaking, the position where that speaker's eyes are displayed may be used.
  • The position to be looked at need not be a single place.
  • The position to be looked at may be simplified to the center of the screen of the monitor 5, or the virtual viewpoint calculation unit 302 may calculate the display position of the other participant's eyes and set the position to be looked at from it.
  • For the distance between the virtual viewpoint 501 and the center of the lens of the camera 4-1, a value measured in advance may be used; this value is stored in the storage unit 106.
  • If the position of the camera 4-1 is variable, the value may be changed according to the position.
  • The length of the line segment 502 is thus determined from the distance between the virtual viewpoint 501 and the center of the lens of the camera 4-1.
  • The length of the line segment 503 can be obtained, in the manner of triangulation, from the distance between the centers of the lenses of the cameras 4-1 and 4-2 and from the angles 505 and 506. If the distance between the lens centers of the cameras 4-1 and 4-2 does not change while the video processing device is in use, a value measured in advance may be stored in the storage unit of the video processing device; if the positions of the cameras 4-1 and 4-2 are variable, the value may be changed according to their positions.
  • The angle 505 can be obtained from the depression angle and angle of view of the camera 4-1 and the position of the participant's eyes on the video frame.
  • The coordinate system for the eye position is the two-dimensional plane of the video frame.
  • The depression angle of the camera is 0 degrees when the camera faces horizontally and -90 degrees when the camera faces straight down.
  • Values measured in advance may be stored in the storage unit of the video processing device 1.
  • The position of the participant's eyes on the video frame is obtained by the face detection unit 301.
  • For example, if the participant's eyes are at the center of the video frame, the angle 505 is 60 degrees; if the participant's eyes are at the top of the video frame, the angle 505 is 83 degrees.
  • Since the angle 506 can be obtained in the same way as the angle 505, from the elevation angle and angle of view of the camera 4-2 and the position of the participant's face, its description is omitted. The length of the line segment 503 is thus obtained.
  • The angle 508 is then obtained from the triangle formed by the line segment 502, the line segment 503, and the angle 505 obtained so far.
  • Another method for obtaining the angle 508 is to use a TOF camera for distance measurement in addition to the camera for shooting.
  • The installation position and depression angle of the TOF camera are set to match those of the camera 4-1 as closely as possible.
  • The method for obtaining the length of the line segment 502 and the method for obtaining the angle 505 are the same as those already described.
  • The length of the line segment 503 is obtained by measuring the distance to the participant 401 with the TOF camera; since distance measurement with a TOF camera is a well-known technique, its description is omitted.
  • The angle 508 is again obtained from the triangle formed by the line segment 502, the line segment 503, and the angle 505, as sketched below.
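  • A minimal Python sketch of this triangle solution follows (the function name is an assumption): given the lengths of the segments 502 and 503 and the included angle 505 at the camera lens, the law of cosines yields the length of the segment 507, and the law of sines yields the angle 508 at the participant's eyes:

    import math

    def gaze_shift_angle(len_502: float, len_503: float, angle_505_deg: float) -> float:
        """Angle 508 (degrees) between segments 503 and 507 at the participant's eyes."""
        a505 = math.radians(angle_505_deg)
        # Law of cosines: |507|^2 = |502|^2 + |503|^2 - 2 |502| |503| cos(505)
        len_507 = math.sqrt(len_502**2 + len_503**2
                            - 2.0 * len_502 * len_503 * math.cos(a505))
        # Law of sines: sin(508) / |502| = sin(505) / |507|
        return math.degrees(math.asin(len_502 * math.sin(a505) / len_507))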
  • FIG. 7 is a flowchart of the subroutine executed in S904 of FIG. 5. The subroutine calculates the degree of necessity of face correction.
  • First, the shift angle of the face is calculated (S1001).
  • The shift angle is obtained by the angle calculation unit 303 calculating the angle 508 of FIG. 6.
  • Next, it is determined whether the shift angle is greater than or equal to a designated angle (S1002).
  • The designated angle is stored in the storage unit 106 and is information set in the video processing apparatus 1 in advance for the determination in S1002.
  • The value of the designated angle is, for example, the maximum angle within the allowable range of line-of-sight deviation described above; here the designated angle is 9 degrees.
  • If the shift angle is greater than or equal to the designated angle, the correction necessity, which quantifies whether the face should be corrected, is calculated (S1003).
  • The correction necessity is a value calculated by the correction necessity calculation unit 306, and the calculated value is written in the correction necessity 703 of the determination list 700; the larger the value, the more the face should be corrected.
  • The correction necessity is calculated as 0 when the shift angle is less than the threshold, and as the value of the shift angle when the shift angle is greater than or equal to the threshold.
  • Finally, the estimated processing amount, an estimate of the amount of processing required to perform the face correction, is calculated (S1004).
  • The estimated processing amount is a value calculated by the correction processing amount calculation unit 305, and the calculated value is written in the estimated processing amount 704 of the determination list 700.
  • The estimated processing amount is calculated by multiplying the face area calculated by the face area calculation unit 304 by a constant (see the sketch below).
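  • A minimal Python sketch of this subroutine, reusing DeterminationEntry from above (AREA_COST_FACTOR and its value are assumptions; the 9-degree designated angle is the example given in the text):

    DESIGNATED_ANGLE_DEG = 9.0  # maximum allowable line-of-sight deviation (example)
    AREA_COST_FACTOR = 1.0      # constant multiplied by the face area (assumed value)

    def evaluate_face(face_id: int, shift_angle: float,
                      face_area: float) -> DeterminationEntry:
        """S1001-S1004: fill in one determination-list entry for a detected face."""
        # S1002-S1003: necessity is 0 below the designated angle, else the angle itself
        necessity = shift_angle if shift_angle >= DESIGNATED_ANGLE_DEG else 0.0
        # S1004: estimated processing amount = face area x constant
        return DeterminationEntry(face_id, shift_angle, necessity,
                                  face_area * AREA_COST_FACTOR)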
  • FIG. 8 is a diagram illustrating a specific example of a video frame shot by the camera 4 connected to the video processing device.
  • 400 is a video frame
  • 401 to 406 are participants of a video conference
  • 407 is a conference table. Of the participants, only the participant 402 is speaking.
  • The camera 4 is installed at a position that captures the entire meeting scene, so participants at the front of the video appear large and participants at the back appear small.
  • By not selecting faces that do not need correction and by selecting and correcting faces in descending order of correction necessity, the line-of-sight processing can be performed with a reduced processing load and without dropped frames.
  • In the first embodiment, the necessity of correction is determined based on the shift angle; in the second embodiment, it is determined based on the face area, without using the shift angle.
  • The problem of unmatched lines of sight is also related to the apparent size of the participant's face shown on the monitor 5. If the face of a participant photographed by the camera 4 connected to the video processing device appears small on the monitor 5 connected to the video processing device at the other site, it becomes difficult to tell where the line of sight is directed, so a line-of-sight mismatch does not arise as a problem.
  • FIG. 13 shows the determination list in the second embodiment, where 1300 is the determination list and 1301 is an area ratio; the other items are the same as those in FIG. 4.
  • The area ratio 1301 is the ratio of the face area to the entire screen and is calculated by the face area calculation unit 304.
  • The correction necessity 703 of the determination list 1300 is calculated from the area ratio 1301 by the correction necessity calculation unit 306.
  • When the area ratio is less than a threshold, the correction necessity is set to 0; when the area ratio is greater than or equal to the threshold, the value of the area ratio is used. Note that, instead of the face area ratio, the decision to set the correction necessity to 0 may be made according to the face area.
  • FIG. 14 is a flowchart of the subroutine processing in the second embodiment, executed in S904 of FIG. 5. The subroutine calculates the degree of necessity of face correction.
  • First, the area ratio of the face is calculated (S1401). Next, it is determined whether the calculated area ratio is greater than or equal to a threshold area ratio (S1402).
  • The threshold area ratio is stored in the storage unit 106 and is information set in the video processing device 1 in advance for the determination in S1402; here the threshold is, for example, 1.0%.
  • If so, the correction necessity, which quantifies whether the face should be corrected, is calculated (S1403).
  • The correction necessity is calculated using the area ratio, and the calculated value is written in the determination list 1300.
  • In this way, faces with a low necessity of correction are not corrected, and faces are selected and corrected in descending order of correction necessity, so line-of-sight processing can be performed with a suppressed processing load and without dropped frames.
  • As another method of determining whether face correction is necessary, only the face of the speaker detected by the face detection unit may be corrected; then only the speaker's face needs correction, and the line-of-sight processing load is reduced further.
  • In the first embodiment, the correction necessity is calculated using the shift angle alone.
  • When a plurality of faces have close or identical shift angles, their correction necessities are therefore also close or identical. In the third embodiment, the area ratio and other factors are used in addition to the shift angle to calculate the correction necessity. As a result, even when there are a plurality of faces with the same shift angle, differences arise in the correction necessity, and the faces that most need correction can be selected with high accuracy.
  • FIG. 9 shows the determination list for the case where the correction necessity is calculated by combining a plurality of conditions.
  • 800 is the determination list
  • 801 is a speaker flag indicating whether the detected face is that of the speaker
  • 802 is an area ratio
  • The other items are the same as those in FIG. 4. In FIG. 4 the correction necessity is calculated only from the shift angle, but here it is calculated from the speaker flag 801, the area ratio 802, and the shift angle 702.
  • FIG. 10 is obtained by adding some processes to the flowchart of FIG. 7.
  • The added processes are S1101 and S1102.
  • First, the area ratio is calculated (S1101) and the calculated value is written in the area ratio 802 of the determination list 800. It is then determined whether the calculated area ratio is greater than or equal to a designated area ratio (S1102). If the determination is true (Y in S1102), the shift angle calculation (S1001) is performed; if it is false (N in S1102), the correction necessity is set to 0 (S1005).
  • The other processes in FIG. 10 are the same as those described for FIG. 7.
  • Here the correction necessity is calculated based on three factors: whether the face to be corrected is the speaker's, the area ratio of the face to be corrected, and the angle of the line-of-sight shift of the correction target. It may instead be calculated based on four or more factors, and factors other than these three may also be used (a sketch follows below).
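  • A minimal Python sketch of such a combined calculation (the multiplicative weighting and the SPEAKER_WEIGHT value are assumptions; the text specifies only that the factors are combined), reusing DESIGNATED_ANGLE_DEG from the sketch above and the 1.0% area-ratio threshold of the second embodiment:

    MIN_AREA_RATIO = 0.01  # the 1.0% threshold from the second embodiment
    SPEAKER_WEIGHT = 2.0   # assumed boost for the current speaker's face

    def combined_necessity(shift_angle: float, area_ratio: float,
                           is_speaker: bool) -> float:
        """FIG. 10 style necessity from area ratio, shift angle, and speaker flag."""
        if area_ratio < MIN_AREA_RATIO:         # S1102: face too small to matter
            return 0.0
        if shift_angle < DESIGNATED_ANGLE_DEG:  # shift within the allowable range
            return 0.0
        weight = SPEAKER_WEIGHT if is_speaker else 1.0
        return shift_angle * area_ratio * weight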
  • Thereby, the faces that need correction can be selected with higher accuracy.
  • After the subroutine, the processes of S906 to S908 are performed and the selected faces are corrected.
  • Alternatively, the processes of S906 to S908 may be omitted and correction processing may be performed on every face whose correction necessity is other than 0.
  • In the above embodiments, the video processing apparatus has been described as a video conference apparatus.
  • Alternatively, the video processing apparatus may be combined with a separate video conference apparatus in a system configuration for holding video conferences.
  • FIG. 11 shows a configuration in which a video processing device and another video conference device are combined.
  • Reference numerals 1111 and 1112 denote video processing apparatuses
  • 1201 and 1202 denote video conference apparatuses.
  • The video processing device 1111 is connected to the camera 4 and the video conference device 1201 and can input and output video.
  • The video processing apparatus 1112 has the same configuration as the video processing apparatus 1111.
  • The video processing device 1111 receives the video signal captured by the camera 4.
  • The video processing device 1111 performs line-of-sight processing on the input video signal and outputs the processed video signal to the video conference device 1201. Since the video processing device 1111 only needs to perform line-of-sight processing, it requires neither a connection to the network 3 for the video conference nor functions for encoding and decoding the video signal.
  • The video processing device can be connected to the input terminal of the video conference device to which the camera is normally connected, so the effect of line-of-sight alignment can be obtained even with a video conference device that has no line-of-sight function.
  • FIG. 12 shows a configuration in which the video processing apparatus is connected to another video conference apparatus via the network 3.
  • 1200 is a video processing apparatus.
  • The video processing apparatus 1200 in this embodiment receives a video/audio stream from the video conference apparatus 1201 via the network 3, separates the stream into video data and audio data, decodes the video data into a video signal, and performs line-of-sight processing on the video signal. The processed video signal is then encoded into video data, the video data and audio data are multiplexed and packetized into a video/audio stream, and the stream is output to the video conference device 1202.
  • Similarly, the video processing apparatus receives the video/audio stream from the video conference apparatus 1202 via the network 3, performs the same processing including line-of-sight processing, and outputs the resulting video/audio stream to the video conference apparatus 1201.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

As the amount of video processing applied to the information captured by a camera increases, the video may become disturbed, preventing the user from making use of it. The present invention provides a video processor comprising: a video input unit into which information captured by a camera is input; a video processing unit for correcting the video of a human face included in the video information input into the video input unit; and an output unit for outputting the video information processed by the video processing unit. The video processor is characterized in that the video processing unit determines, on the basis of a prescribed condition, whether or not the video of the human face included in the video information is to be corrected.
PCT/JP2012/083190 2012-12-21 2012-12-21 Video processor and video processing method WO2014097465A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/083190 WO2014097465A1 (fr) 2012-12-21 2012-12-21 Video processor and video processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/083190 WO2014097465A1 (fr) 2012-12-21 2012-12-21 Video processor and video processing method

Publications (1)

Publication Number Publication Date
WO2014097465A1 (fr) 2014-06-26

Family

ID=50977843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/083190 WO2014097465A1 (fr) 2012-12-21 2012-12-21 Video processor and video processing method

Country Status (1)

Country Link
WO (1) WO2014097465A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH066786A * 1992-06-22 1994-01-14 A T R Tsushin Syst Kenkyusho:Kk Line-of-sight matching correction device
JPH11266443A * 1998-03-17 1999-09-28 Toshiba Corp Image and audio transmitting/receiving device
JP2005340974A * 2004-05-24 2005-12-08 Fuji Xerox Co Ltd Image transmission control program and image display program
JP2007189624A * 2006-01-16 2007-07-26 Mitsubishi Electric Corp Videophone terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019201360A (ja) * 2018-05-17 2019-11-21 住友電気工業株式会社 Image processing device, computer program, video call system, and image processing method
WO2020054605A1 (fr) * 2018-09-12 2020-03-19 シャープ株式会社 Image display device and image processing device
WO2020089971A1 (fr) * 2018-10-29 2020-05-07 有限会社アドリブ Image processing device, method, and computer program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP