WO2010106743A1 - Video and audio communication apparatus and communication method thereof - Google Patents
Video and audio communication apparatus and communication method thereof
- Publication number
- WO2010106743A1 (application PCT/JP2010/001362)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pts
- image
- audio
- unit
- timing
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/24—Systems for the transmission of television signals using pulse code modulation
- H04N7/52—Systems for transmission of a pulse code modulated video signal with one or more other pulse code modulated signals, e.g. an audio signal or a synchronizing signal
- H04N7/54—Systems for transmission of a pulse code modulated video signal with one or more other pulse code modulated signals, e.g. an audio signal or a synchronizing signal the signals being synchronous
- H04N7/56—Synchronising systems therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4305—Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44004—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44209—Monitoring of downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/4425—Monitoring of client processing errors or hardware failure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
Definitions
- the present invention relates to a video and audio communication apparatus and a communication method thereof, and more particularly to a video and audio communication apparatus for performing a video conference and a communication method thereof.
- In recent years, ADSL (Asymmetric Digital Subscriber Line) and optical fiber networks have rapidly spread, and low-cost, high-speed Internet connections have become widely available.
- In AV (Audio Video) reproduction, playback is performed according to the time stamp given to each packet by the TV conference apparatus serving as the transmitting terminal; however, because the system clocks of the transmitting terminal and the receiving terminal are not synchronized, a shift occurs in the playback time.
- When the system clock in the TV conference apparatus on the receiving terminal side runs faster than that of the transmitting terminal, the AV playback timing arrives earlier than on the transmitting side, so the reproduction data runs short (underflow). Conversely, when the system clock on the receiving terminal side runs slower (is delayed) than that of the transmitting terminal, the reproduction timing lags behind the transmitting side, so reproduction data accumulates (overflow).
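The drift mechanism described above can be sketched numerically. The frame rate, drift magnitude, and initial buffer depth below are illustrative assumptions, not values from the patent.

```python
# Sketch (not from the patent): how a clock-rate mismatch between sender
# and receiver drains or fills the receiver's playback buffer over time.

def buffer_level_after(seconds, fps=30.0, drift_ppm=100, start_frames=15):
    """Frames left in the receive buffer after `seconds` of playback.

    A receiver clock running `drift_ppm` parts-per-million fast consumes
    frames slightly faster than the sender produces them (underflow);
    a negative drift_ppm models a slow receiver clock (overflow).
    """
    produced = seconds * fps                          # frames arriving from sender
    consumed = seconds * fps * (1 + drift_ppm / 1e6)  # frames played out locally
    return start_frames + produced - consumed

# At 100 ppm drift and 30 fps, about 3 ms of media (0.09 frames) is lost
# every 30 seconds, so a 15-frame buffer slowly drains toward underflow.
print(buffer_level_after(3600))                   # fast receiver clock: buffer shrinks
print(buffer_level_after(3600, drift_ppm=-100))   # slow receiver clock: buffer grows
```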
- To cope with this, a video conference apparatus has been disclosed (see Patent Document 1 below).
- In Patent Document 1, the accuracy of the input time stamp is calculated from the time difference between the input time stamp and a free-running time stamp, and if the calculated error is out of range, the image data is controlled to skip or repeat. The image data to be reproduced is thereby corrected; that is, control such as frame skip or frame repeat is performed on the image data.
- As a result, the image may appear to stop or be momentarily interrupted during a conversation. That is, although the video conference terminal disclosed in Patent Document 1 can suppress the difference in reproduction time between the transmitting terminal and the receiving terminal, quality deterioration such as video interruption occurs.
- The present invention therefore aims to provide a video and audio communication apparatus, and a communication method thereof, capable of eliminating the system clock shift without causing the user to feel discomfort in the images and sounds.
- A video and audio communication apparatus according to the present invention comprises: a transmitting/receiving unit that transmits and receives images and audio through a network; a timing determination unit that determines, based on the content of the image or audio received by the transmitting/receiving unit, the timing at which the correction amount of the PTS (Presentation Time Stamp) in the received image or audio should be updated; a PTS correction unit that corrects the PTS by updating the correction amount of the PTS in the received image or audio at the determined timing; and an image/audio output unit that outputs the received image and audio at the current time indicated by the corrected PTS.
- The video and audio communication apparatus may further include a user input unit to which user operation information is input by a user operation, and the timing determination unit may determine the timing of a user operation accompanied by a screen layout change, indicated by the user operation information input to the user input unit, as the timing at which the correction amount should be updated.
- The timing determination unit may also determine, as the timing at which the correction amount should be updated, the timing at which the received image is output by the image/audio output unit when the correlation value between the image received by the transmitting/receiving unit and the temporally preceding image is higher than a preset threshold.
- In this way, PTS correction of the image is performed at a timing when the temporal correlation of the displayed images is high and the motion in the screen is small, so that PTS correction such as frame skipping or frame repeating can be performed without the user being aware of it. As a result, the system clock shift can be eliminated without causing the user to feel discomfort in the image and audio.
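One way to realize this correlation check is a frame-difference test. This is a minimal sketch assuming grayscale frames as 2-D uint8 NumPy arrays; the threshold value is an illustrative assumption, not taken from the patent.

```python
# Sketch: decide whether the current frame is "still" enough that a
# frame skip/repeat applied now would be hard for the viewer to notice.
import numpy as np

STILLNESS_THRESHOLD = 2.0  # mean absolute pixel difference (assumed value)

def is_good_correction_timing(prev_frame: np.ndarray, cur_frame: np.ndarray) -> bool:
    """True when temporal correlation is high (little on-screen motion),
    i.e. a PTS correction applied now is unlikely to be noticed."""
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) < STILLNESS_THRESHOLD

# A static scene qualifies; a frame with large motion does not.
a = np.zeros((120, 160), dtype=np.uint8)
b = a.copy()
b[:, :80] = 255                          # half of the screen changed
print(is_good_correction_timing(a, a))   # no motion -> good timing
print(is_good_correction_timing(a, b))   # large difference -> bad timing
```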
- Also, when the data amount of the image received by the transmitting/receiving unit is smaller than a preset threshold, the timing determination unit may determine the timing at which the received image is output by the image/audio output unit as the timing at which the correction amount should be updated.
- In this way, PTS correction of the image is performed at a timing when the data amount of the image to be output is small and the motion in the screen is expected to be small, so that PTS correction such as frame skip or frame repeat can be made unnoticeable to the user. As a result, the system clock shift can be eliminated without causing the user to feel discomfort in the image and audio.
- Also, when the level of the audio received by the transmitting/receiving unit is lower than a preset threshold, the timing determination unit may determine the timing at which the received audio is output by the image/audio output unit as the timing at which the correction amount should be updated.
- The video and audio communication apparatus may further include an audio input unit that collects, with a microphone, the audio to be transmitted by the transmitting/receiving unit, and when the level of the audio input to the audio input unit is higher than a preset threshold, the timing determination unit may determine the timing at which the input audio is output by the image/audio output unit as the timing at which the correction amount should be updated.
- In this way, PTS correction is performed at a timing when the level of the input audio is high, such as when the surrounding sound is loud or when a speaker such as the user is talking, so that the PTS correction can be made unnoticeable to the user.
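The two audio-based conditions above can be sketched as a single check on RMS levels. This assumes 16-bit PCM sample sequences; both threshold values are illustrative assumptions.

```python
# Sketch: schedule an audio PTS correction either when the far-end
# (received) signal is near-silent, or when the locally captured
# microphone signal is loud enough to mask the correction.
import math

SILENCE_THRESHOLD = 200.0   # RMS level treated as "quiet" (assumed)
MASKING_THRESHOLD = 5000.0  # RMS level loud enough to mask a glitch (assumed)

def rms(samples):
    """Root-mean-square level of a block of PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def audio_correction_timing(received, mic_input) -> bool:
    """True when correcting the audio PTS now should be unnoticeable."""
    return rms(received) < SILENCE_THRESHOLD or rms(mic_input) > MASKING_THRESHOLD

quiet = [0] * 160
loud = [8000] * 160
print(audio_correction_timing(quiet, quiet))  # quiet far end: correct now
print(audio_correction_timing(loud, quiet))   # loud far end, silent mic: wait
print(audio_correction_timing(loud, loud))    # near-end speech masks it: correct now
```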
- The video and audio communication apparatus may further include a buffer that temporarily stores the image or audio received by the transmitting/receiving unit, and a PTS correction amount calculation unit that monitors the remaining capacity of the buffer and calculates the PTS correction amount based on that remaining capacity. The PTS correction unit may then correct the PTS in the image or audio by adding, at the timing determined by the timing determination unit, the PTS correction amount calculated by the PTS correction amount calculation unit to the PTS in the image or audio.
- Note that the present invention can be realized not only as an apparatus but also as an integrated circuit comprising the processing means included in such an apparatus, as a method whose steps are the processing means constituting the apparatus, or as a program that causes a computer to execute those steps. Such a program may be distributed via a recording medium such as a CD-ROM or a communication medium such as the Internet.
- The present invention thus provides a video and audio communication apparatus, and a communication method thereof, that can eliminate the system clock shift without causing the user to feel discomfort in the image and audio.
- That is, the timing at which the PTS correction amount can be updated with the user hardly noticing is determined, and the PTS of the image or audio is corrected at that timing, so that the system clock shift can be eliminated without the user feeling the discomfort, such as frame skipping, that may otherwise accompany PTS correction.
- FIG. 1 is a diagram showing an example of the configuration of a TV conference system provided with the video and audio communication apparatus of the present invention.
- FIG. 2 is a block diagram showing the configuration of the video and audio communication apparatus according to the present invention.
- FIG. 3 is a flowchart for explaining the transmission side process of the video and audio communication apparatus according to the present invention.
- FIG. 4 is a flowchart for explaining the reception process of the video and audio communication apparatus according to the present invention.
- FIG. 5 is a flowchart for explaining an example of PTS correction amount determination processing according to the present invention.
- FIG. 6 is a flowchart for explaining the image difference value calculation process according to the present invention.
- FIG. 7 is a flowchart for explaining the screen layout determination process according to the present invention.
- FIG. 8 is a flowchart for explaining the input speech level detection process according to the present invention.
- FIG. 9 is a flowchart for explaining the received speech level detection process according to the present invention.
- FIG. 10 is a flowchart for explaining PTS correction timing determination processing of an image of the audio and video communications apparatus according to the present invention.
- FIG. 11 is a flowchart for explaining a PTS correction timing determination process of audio of the audio and video communications apparatus according to the present invention.
- FIG. 12 is a block diagram showing the minimum configuration of the video and audio communication apparatus according to the present invention.
- FIG. 1 is a diagram showing an example of the configuration of a TV conference system provided with the video and audio communication apparatus of the present invention.
- As shown in FIG. 1, the video and audio communication apparatus 100 exchanges video and audio data bidirectionally with another video and audio communication apparatus 300 via the network 207.
- The video and audio communication apparatus 100 transmits the image and audio picked up by the camera/microphone 101 to the other video and audio communication apparatus 300 via the network 207, and receives image and audio data from that apparatus. The apparatus 100 also performs PTS correction on the received image and audio data, thereby preventing the overflow and underflow caused by the clock shift between the apparatuses, and outputs the image and audio data to the monitor/speaker 103.
- Similarly, the other video and audio communication apparatus 300 transmits the video and audio captured by its camera/microphone 301 to the video and audio communication apparatus 100 via the network 207, receives video and audio data from the apparatus 100, and outputs the received data to the monitor/speaker 303.
- FIG. 2 is a block diagram showing the configuration of the video and audio communication apparatus according to the present invention.
- As shown in FIG. 2, the video and audio communication apparatus 100 includes an image/audio input unit 104, an encoding unit 105, a transmitting unit 106, a receiving unit 108, a PTS correction amount calculation unit 109, a decoding unit 110, a PTS correction timing determination unit 111, a PTS correction unit 112, an image/audio output unit 113, a reception buffer 114, and an output buffer 115.
- A camera/microphone 101 that captures images and sound is externally connected to the apparatus, as is a user input unit 102, a user interface through which GUI operations from the user are input. The apparatus 100 is also connected to a monitor/speaker 103 that reproduces image and audio data.
- The image/audio input unit 104 is an interface that receives uncompressed image and audio data from the camera/microphone 101. It outputs the image and audio data input in frame units (hereinafter, own-device image/audio data) to the encoding unit 105, the image/audio output unit 113, and the PTS correction timing determination unit 111.
- The encoding unit 105 compression-encodes the own-device image/audio data input from the image/audio input unit 104 and outputs the encoded data to the transmitting unit 106.
- The encoding unit 105 may, for example, compress the image data with a compression coding method such as H.264 and the audio data with a method such as MPEG-4 AAC.
- The transmitting unit 106 outputs the encoded image and audio data input from the encoding unit 105 to the network 207. To do so, it packetizes the own-device image/audio data using, for example, RTP (Realtime Transport Protocol): it writes the PTS (Presentation Time Stamp), which is the output time, into the time stamp field of the RTP header, and outputs the RTP-packetized own-device image/audio data to the other video and audio communication apparatus 300 via the network 207.
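The packetization step can be sketched as a minimal RFC 3550 fixed header with the PTS carried in the 32-bit timestamp field. The 90 kHz clock, payload type, and SSRC values below are illustrative assumptions.

```python
# Sketch: pack a PTS into the timestamp field of a minimal 12-byte
# RTP fixed header (RFC 3550 layout: flags, seq, timestamp, SSRC).
import struct

def make_rtp_header(seq, pts_90khz, payload_type=96, ssrc=0x12345678):
    """12-byte RTP fixed header with V=2, P=0, X=0, CC=0, M=0."""
    byte0 = 2 << 6                           # version 2, no padding/extension/CSRC
    byte1 = payload_type & 0x7F              # marker bit clear
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,
                       pts_90khz & 0xFFFFFFFF,  # PTS goes in the timestamp field
                       ssrc)

hdr = make_rtp_header(seq=1, pts_90khz=3003)   # one frame period at 29.97 fps
assert len(hdr) == 12
# The receiver recovers the PTS from bytes 4..8 of the header.
print(struct.unpack("!I", hdr[4:8])[0])  # 3003
```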
- The receiving unit 108 includes the reception buffer 114. It receives the image and audio data transmitted from the other video and audio communication apparatus 300 via the network 207 (hereinafter, partner-device image/audio data) and temporarily stores the received RTP packets in the reception buffer 114. The receiving unit 108 outputs the reception time and the received data amount extracted from the stored RTP packets to the PTS correction amount calculation unit 109, and outputs the RTP packets of the partner-device image/audio data stored in the reception buffer 114 to the decoding unit 110.
- That is, the reception buffer 114 temporarily stores the RTP packets of the partner-device image/audio data received by the receiving unit 108, and these packets are output to the decoding unit 110 via the receiving unit 108.
- The PTS correction amount calculation unit 109 observes the received data amount and calculates the PTS correction amount based on it. Specifically, it calculates the PTS correction amount using the received data amount input from the receiving unit 108 and the remaining capacity of the reception buffer 114 input from the decoding unit 110, and outputs the calculated PTS correction amount to the PTS correction timing determination unit 111.
- The decoding unit 110 decodes the partner-device image/audio data input from the receiving unit 108 and outputs the decoded data to the PTS correction unit 112 and the PTS correction timing determination unit 111. The decoding unit 110 also checks the remaining capacity of the reception buffer 114 and outputs it to the PTS correction amount calculation unit 109, while checking whether the output buffer 115 has free space. When the output buffer 115 has free space, i.e. when decoding is possible, the decoding unit 110 receives the RTP packets of the partner-device image/audio data from the reception buffer 114 and performs the decoding process.
- In decoding, the decoding unit 110 extracts the encoded image data and encoded audio data from the RTP packets and obtains the PTS, which is the output time. It decodes the encoded image data with, for example, H.264 and the encoded audio data with MPEG-4 AAC, and outputs the decoded image and audio data (hereinafter, partner-device decoded image/audio data) to the PTS correction timing determination unit 111. The decoding unit 110 also associates the PTS with the partner-device decoded image/audio data and stores them in the output buffer 115.
- The PTS correction timing determination unit 111 determines the PTS correction timing, i.e. the timing at which the PTS correction amount should be updated, based on the content of the audio transmitted by the transmitting unit 106 or of the image or audio received by the receiving unit 108. It uses at least one of the own-device image/audio data input from the image/audio input unit 104, the user operation information input from the user input unit 102, and the decoded image/audio data input from the decoding unit 110, and outputs a PTS correction request to the PTS correction unit 112 together with the PTS correction amount calculated by the PTS correction amount calculation unit 109. In other words, the PTS correction timing determination unit 111 selects a timing the user is unlikely to notice as the timing for correcting the clock shift, and notifies the PTS correction unit 112 of it by a PTS correction request.
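The determination unit combines several cues (user operation, image stillness, audio levels) into one decision. A minimal sketch, assuming boolean cue flags computed elsewhere; treating the cues as an OR is an illustrative assumption, since the patent text only requires at least one cue to be used.

```python
# Sketch: fire a PTS-correction request when any "unnoticeable" cue holds.

def should_update_correction(layout_changed, frame_still,
                             audio_quiet, mic_loud) -> bool:
    """True at a moment when updating the PTS correction amount is
    unlikely to be noticed by the user:
      layout_changed - a GUI operation just changed the screen layout
      frame_still    - high temporal correlation between frames
      audio_quiet    - received audio level below threshold
      mic_loud       - near-end speech masks the correction
    """
    return layout_changed or frame_still or audio_quiet or mic_loud

print(should_update_correction(False, True, False, False))   # still frame: update
print(should_update_correction(False, False, False, False))  # no cue: wait
```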
- The PTS correction unit 112 corrects the PTS associated with the partner-device decoded image/audio data. Specifically, it corrects the PTS information associated with the decoded data stored in the output buffer 115 using the PTS correction amount output from the PTS correction timing determination unit 111, and outputs the corrected PTS information to the image/audio output unit 113.
- The image/audio output unit 113 outputs the partner-device decoded image/audio data stored in the output buffer 115 to the monitor/speaker 103 in accordance with the corrected PTS information input from the PTS correction unit 112. That is, it compares the corrected PTS values with the system clock (current time) of the video and audio communication apparatus 100 and outputs, from the output buffer 115 to the monitor/speaker 103, the partner-device decoded image and audio data whose PTS is closest to the system clock.
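The output step amounts to a nearest-PTS selection over the buffered frames. A minimal sketch; the (corrected_pts_ms, frame) pair representation of buffer entries is an illustrative assumption.

```python
# Sketch: among buffered frames, emit the one whose corrected PTS is
# closest to the device's current system clock reading.

def pick_frame_to_output(output_buffer, now_ms):
    """Return the (corrected_pts, frame) entry with PTS nearest now_ms."""
    return min(output_buffer, key=lambda entry: abs(entry[0] - now_ms))

buffer_entries = [(1000, "frame A"), (1033, "frame B"), (1066, "frame C")]
pts, frame = pick_frame_to_output(buffer_entries, now_ms=1040)
print(frame)  # frame B: PTS 1033 is closest to t=1040
```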
- The video and audio communication apparatus 100 is configured as described above. The operations described below are stored as a control program in a storage device (not shown) of the apparatus 100, such as a ROM or flash memory, and are executed under control of the CPU.
- FIG. 3 is a flowchart for explaining the transmission process of the video and audio communication apparatus according to the present invention.
- First, the video and audio communication apparatus 100 performs image/audio input processing (S201). Specifically, the image/audio input unit 104 receives uncompressed image and audio data in frame units from the externally connected camera/microphone 101, and outputs the input own-device image/audio data to the encoding unit 105, the PTS correction timing determination unit 111, and the image/audio output unit 113.
- Next, the video and audio communication apparatus 100 performs image/audio encoding processing (S202). Specifically, the encoding unit 105 compression-encodes the uncompressed image/audio data input from the image/audio input unit 104 using, for example, H.264 and MPEG-4 AAC, and outputs the encoded image/audio data to the transmitting unit 106.
- Next, the video and audio communication apparatus 100 performs transmission processing (S203). Specifically, the transmitting unit 106 packetizes the encoded image/audio data input from the encoding unit 105 using, for example, RTP (Realtime Transport Protocol): it writes the PTS (Presentation Time Stamp), which is the output time, into the time stamp field of the RTP header, and outputs the RTP-packetized own-device image/audio data to the other video and audio communication apparatus 300 via the network 207. In this way, the video and audio communication apparatus 100 outputs its own image and audio data to the other video and audio communication apparatus 300 via the network 207.
- FIG. 4 is a flow chart for explaining the reception side process of the video and audio communication apparatus according to the present invention.
- the audio and video communications apparatus 100 performs a packet reception process (S301). Specifically, the receiving unit 108 receives the RTP packets of the partner-apparatus image / audio data transmitted from the other video / audio communication apparatus 300 via the network 207 and temporarily stores them in the reception buffer 114. The receiving unit 108 then outputs the reception time and the received data amount extracted from the stored partner-apparatus image / audio data to the PTS correction amount calculation unit 109, and, when the decoding unit 110 is in a decodable state, outputs the received RTP packets of the partner-apparatus image / audio data to the decoding unit 110.
- the video and audio communication apparatus 100 performs a packet decoding process (S302). Specifically, the decoding unit 110 checks the remaining capacity of the reception buffer 114 and outputs it to the PTS correction amount calculation unit 109, while also confirming whether there is free space in the output buffer 115. Then, when it is in a decodable state, the decoding unit 110 decodes the RTP packets of the partner-apparatus image / audio data received from the reception buffer 114, calculates the PTS, which is the output time, and outputs the decoded partner-apparatus image / audio data to the PTS correction timing determination unit 111. In addition, the decoding unit 110 associates the decoded image data and the decoded audio data with their PTS values and stores them in the output buffer 115.
- the audio and video communications apparatus 100 performs PTS correction amount calculation processing (S303). Specifically, the PTS correction amount calculation unit 109 calculates the PTS correction amount using the received data amount input from the receiving unit 108 and the remaining capacity of the reception buffer 114 input from the decoding unit 110, and outputs the calculated PTS correction amount to the PTS correction timing determination unit 111. The details of the PTS correction amount calculation process are described later.
- the audio and video communications apparatus 100 performs PTS correction timing determination processing (S304). Specifically, the PTS correction timing determination unit 111 determines the PTS correction timing using at least one of the own-apparatus image / audio data input from the image / audio input unit 104, the user operation information input from the user input unit 102, and the decoded image / audio data input from the decoding unit 110. The details of the PTS correction timing determination are described later.
- when the PTS correction timing determination unit 111 determines in S304 that it is the PTS correction timing (YES in S304), it outputs a PTS correction request to the PTS correction unit 112 together with the PTS correction amount calculated by the PTS correction amount calculation unit 109.
- the audio and video communications apparatus 100 then performs the PTS offset change (S305) and performs the PTS correction (S306).
- specifically, the PTS correction unit 112 corrects the PTS information associated with the partner-apparatus decoded image / audio data stored in the output buffer 115 by the decoding unit 110, using the PTS correction amount received from the PTS correction timing determination unit 111, and outputs the corrected PTS information to the image / sound output unit 113.
- the PTS correction unit 112 corrects the PTS based on (Expression 1) to (Expression 4) below.
- Offset_V and Offset_A indicate PTS offset values of image data and audio data, respectively
- Offset_V_prev and Offset_A_prev indicate previous values of PTS offset values of image data and audio data, respectively.
- Correct_V and Correct_A indicate PTS correction values of image data and audio data, respectively.
- PTS_V′(t) and PTS_A′(t) indicate the PTS values after PTS correction of the image and audio of frame t, respectively.
- PTS_V(t) and PTS_A(t) indicate the PTS values of the image and audio of frame t, respectively.
- based on (Expression 1) to (Expression 4), the PTS correction unit 112 always corrects the PTS values (PTS_V(t) and PTS_A(t)) of the image and audio data of frame t stored in the output buffer 115 by adding the offset values (Offset_V and Offset_A).
- the PTS correction unit 112 uses the PTS correction amounts (Correct_V and Correct_A) output from the PTS correction timing determination unit 111 to update offset values (Offset_V and Offset_A) used for PTS correction.
- the PTS correction unit 112 can change the PTS discontinuously by updating the offset using the PTS correction amount at the timing determined by the PTS correction timing determination unit 111.
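The bodies of (Expression 1) to (Expression 4) are not reproduced in this text; the sketch below is a hedged reconstruction from the variable definitions above, in which each frame's PTS is always shifted by a running offset (Expressions 1 and 2) and the offset itself is advanced by the correction amount only at a determined correction timing (Expressions 3 and 4).

```python
def correct_pts(pts_v, pts_a, state, correct_v=0, correct_a=0, update=False):
    """Return corrected (PTS_V', PTS_A') for one frame.

    state holds the running offsets (Offset_V, Offset_A). When `update`
    is True the offsets are first advanced by the correction amounts
    (Offset = Offset_prev + Correct); the corrected PTS values are then
    obtained by adding the offsets (PTS' = PTS + Offset).
    """
    if update:
        state["Offset_V"] += correct_v
        state["Offset_A"] += correct_a
    return pts_v + state["Offset_V"], pts_a + state["Offset_A"]

state = {"Offset_V": 0, "Offset_A": 0}
print(correct_pts(9000, 9000, state))                      # no correction yet
print(correct_pts(12000, 12000, state, -300, -300, True))  # offset updated once
print(correct_pts(15000, 15000, state))                    # offset persists
```

Because the offset changes only when `update` is True, the PTS jumps discontinuously at the determined timing, as described above.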
- the audio and video communications apparatus 100 performs audio and video output processing (S307). Specifically, the video / audio output unit 113 compares the corrected PTS value input from the PTS correction unit 112 with the system clock (current time) of the video / audio communication device 100, and outputs, from the output buffer 115 to the monitor / speaker 103, the partner-apparatus decoded video / audio data whose PTS is closest to the system clock (current time).
- when the PTS correction timing determination unit 111 determines in S304 that it is not the PTS correction timing (NO in S304), it outputs nothing to the PTS correction unit 112, and the PTS correction unit 112 does not change the PTS offset (S306). Then, the video / audio output unit 113 compares the PTS value input from the PTS correction unit 112 with the system clock (current time) of the video / audio communication device 100, and outputs, from the output buffer 115 to the monitor / speaker 103, the partner-apparatus decoded audio / video data whose PTS is closest to the system clock (current time).
- in this way, the video and audio communication apparatus 100 performs the reception-side processing.
- FIG. 5 is a flowchart for explaining an example of the PTS correction amount calculation process of the video and audio communication apparatus according to the present invention.
- the video and audio communication apparatus 100 performs an average reception rate calculation process (S3031). Specifically, the PTS correction amount calculation unit 109 calculates an average reception rate (AverageBps) using the amount of received data input from the reception unit 108.
- while the formula for calculating the average reception rate is shown in (Equation 5), the method of calculating the reception rate is not limited to the formula shown in (Equation 5).
- AverageBps indicates the average reception rate (bit / s)
- RecvBits indicates the amount of received data (bit).
- N indicates a preset statistical section N (seconds)
- ΣRecvBits indicates the total value of the amount of received data (bit) received by the receiving unit 108 in the statistical section N (seconds).
- in this way, the PTS correction amount calculation unit 109 calculates the average reception rate from the amount of data received by the receiving unit 108 in the statistical section N (seconds), as shown in (Expression 5).
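A minimal sketch of (Expression 5): the average reception rate over the statistics window of N seconds is the total number of received bits divided by N. The sample values are illustrative only.

```python
def average_bps(recv_bits_per_interval, n_seconds):
    # AverageBps = (sum of RecvBits over the statistical section) / N
    return sum(recv_bits_per_interval) / n_seconds

# e.g. five one-second samples of received bits over a 5-second window
print(average_bps([1_000_000, 900_000, 1_100_000, 1_000_000, 1_000_000], 5))  # → 1000000.0
```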
- the audio and video communications apparatus 100 performs statistical processing of the remaining amount of the reception buffer 114 (S3032). Specifically, the PTS correction amount calculation unit 109 statistically processes the remaining capacity of the reception buffer 114 input from the decoding unit 110, and determines whether the remaining buffer capacity tends to increase or decrease.
- as an evaluation value of this increase / decrease tendency, the delay time, which is one of the effects exerted by the increase or decrease of the remaining buffer capacity, will be described.
- a formula for calculating the current delay time (CurrDelay), which is an evaluation value of the increase / decrease tendency, is shown in (Expression 6).
- CurrDelay indicates the current delay time
- BufferLevel indicates the current remaining capacity (bit) of the reception buffer 114.
- AverageBps indicates an average reception rate (bit / s)
- INIT_DELAY indicates a preset initial delay time.
- the PTS correction amount calculation unit 109 calculates the time required to consume the buffer by dividing the remaining buffer capacity by the average reception rate, as shown in (Expression 6), and obtains the current delay time by subtracting the initial delay time from the calculated buffer consumption time. That is, by observing the tendency of the current delay time, the effect that the remaining capacity of the reception buffer 114 exerts on the delay time is observed.
- note that the current delay time (CurrDelay) may simply be calculated at a constant interval by (Expression 6) without statistically processing its increase / decrease tendency. Since the average reception rate is equivalent to the average coding rate, it is used to calculate the buffer consumption time.
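A minimal sketch of (Expression 6): the current delay is the time needed to drain the reception buffer at the average reception rate, minus the preset initial delay. The value of INIT_DELAY here is an assumed example, not taken from the source.

```python
INIT_DELAY = 0.5  # preset initial delay in seconds (example value)

def current_delay(buffer_level_bits, average_bps):
    # CurrDelay = BufferLevel / AverageBps - INIT_DELAY
    return buffer_level_bits / average_bps - INIT_DELAY

# 600 kbit queued at 1 Mbit/s drains in 0.6 s, i.e. 0.1 s over the initial delay
print(round(current_delay(600_000, 1_000_000), 3))  # → 0.1
```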
- the audio and video communications apparatus 100 performs PTS correction amount determination processing (S3033). Specifically, the PTS correction amount calculation unit 109 calculates the PTS correction amount using the average reception rate (AverageBps) and the remaining capacity of the reception buffer 114. A formula for calculating the PTS correction amount is shown in (Expression 7).
- CurrDelay indicates a current delay time
- Correct_A indicates a PTS correction amount of audio
- Correct_V indicates a PTS correction amount of an image.
- TH_H and TH_L indicate predetermined thresholds (however, TH_L < INIT_DELAY < TH_H), and SCALE indicates a constant for converting seconds into 90 kHz PTS units.
- (Expression 7) expresses the following three cases: 1. when the current delay time is positive and its absolute value is larger than the threshold (TH_H), the PTS correction amount is set to a negative value; 2. when the current delay time is negative and its absolute value is larger than the threshold (TH_L), the PTS correction amount is set to a positive value; 3. in cases other than 1 and 2 above, the PTS correction amount is 0.
- the PTS correction amount calculation unit 109 determines the PTS correction amount by the threshold determination of the current delay time as shown in (Expression 7).
- while the PTS correction amount calculation unit 109 calculates the same PTS correction amount for the image and the audio according to (Expression 7), the method of calculating the PTS correction amount is not limited to (Expression 7). For example, by treating the current delay time and the average reception rate separately for the image and the audio, the PTS correction amount may be calculated separately for each.
- the video and audio communication apparatus 100 performs PTS correction amount calculation processing.
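A hedged sketch of the threshold decision of (Expression 7). The source only gives the sign rules, so the correction magnitude used here (proportional to the current delay) and the threshold values are assumptions; SCALE converts seconds to 90 kHz PTS ticks.

```python
SCALE = 90_000          # 90 kHz PTS ticks per second
TH_H, TH_L = 0.2, 0.2   # example thresholds in seconds (assumed values)

def pts_correction(curr_delay):
    if curr_delay > TH_H:        # too much delay -> shift PTS earlier
        correct = -round(curr_delay * SCALE)
    elif curr_delay < -TH_L:     # buffer running dry -> shift PTS later
        correct = round(-curr_delay * SCALE)
    else:
        correct = 0
    return correct, correct      # Correct_V and Correct_A share the value here

print(pts_correction(0.3))   # → (-27000, -27000)
print(pts_correction(-0.3))  # → (27000, 27000)
print(pts_correction(0.1))   # → (0, 0)
```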
- FIG. 6 is a flowchart for explaining the image difference value calculation process according to the present invention.
- the PTS correction timing determination unit 111 performs difference processing between successive images using the decoded image data input from the decoding unit 110, and calculates a sum of absolute differences (SAD) as the image difference value (S401).
- the PTS correction timing determination unit 111 determines whether the calculated image difference value is smaller than a predetermined threshold (S402).
- when the calculated image difference value is smaller than the predetermined threshold (YES in S402), the PTS correction timing determination unit 111 determines that it is the PTS correction timing (S403). In this way, the PTS correction timing determination unit 111 determines that a timing at which the motion of the displayed image is small, and at which the user is therefore unlikely to notice skip or repeat control such as a frame skip, is the PTS correction timing at which the PTS correction amount should be updated.
- otherwise (NO in S402), the PTS correction timing determination unit 111 determines that it is not the PTS correction timing (S404).
- the PTS correction timing determination unit 111 uses the decoded image data input from the decoding unit 110 to determine PTS correction timing.
- the sum of absolute differences (SAD) used as the image difference value described above is calculated, for example, by (Expression 8).
- SAD (i) indicates the absolute difference sum of the ith image
- Y (x, y, i) indicates the luminance value of the pixel at the x and y coordinates of the ith image
- W indicates the number of horizontal pixels of the image
- H indicates the number of vertical pixels of the image.
- the image difference value is the total of the absolute differences between successive images, so the smaller the image difference value is, the smaller the temporal motion. Therefore, when the image difference value calculated in this way is smaller than a predetermined threshold, it is determined to be a timing at which the user is unlikely to notice, and hence a timing at which the PTS correction amount for correcting the clock shift should be updated.
- the method of calculating the image difference value is not limited to (Expression 8); any method capable of detecting motion in the image may be used.
- for example, the data amount of the received image may be monitored, and when the data amount of the received image is small, it may be determined that the image has little motion. This is because image coding often predicts and encodes the difference between frames, so an image with little motion yields a small difference value and a correspondingly small amount of encoded data.
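A minimal sketch of the SAD of (Expression 8): the image difference value is the sum of absolute luminance differences between consecutive frames over all W × H pixels. The tiny 2 × 2 frames and the threshold are illustrative examples only.

```python
def sad(frame_prev, frame_curr):
    # frames are H x W grids of luminance values Y(x, y)
    return sum(
        abs(curr - prev)
        for row_prev, row_curr in zip(frame_prev, frame_curr)
        for prev, curr in zip(row_prev, row_curr)
    )

THRESHOLD = 10  # example threshold (assumed value)
a = [[100, 100], [100, 100]]
b = [[101, 99], [100, 100]]
print(sad(a, b))              # → 2
print(sad(a, b) < THRESHOLD)  # → True: low motion, so PTS correction timing
```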
- FIG. 7 is a flowchart for explaining the screen layout determination process according to the present invention.
- the PTS correction timing determination unit 111 analyzes user operation information, for example, a user request input from the user input unit 102 (S411), and determines whether there is a screen layout change (S412).
- the PTS correction timing determination unit 111 analyzes the user operation information and determines that there is a screen layout change when one of the screen transitions shown in 1 to 3 below is performed; in that case (YES in S412), it determines that it is the PTS correction timing (S413).
- in this way, the PTS correction timing determination unit 111 determines that a timing at which the screen layout changes significantly, and at which the user is therefore unlikely to notice PTS correction of the image such as a frame skip, is the PTS correction timing at which the PTS correction amount should be updated.
- when the PTS correction timing determination unit 111 determines that the screen layout has not changed (NO in S412), it determines that it is not the PTS correction timing (S414).
- the PTS correction timing determination unit 111 uses the user operation information input from the user input unit 102 to determine PTS correction timing.
- the screen transitions determined to be screen layout changes are not limited to cases 1 to 3 above, in which the screen display changes largely. For example, when a GUI is always displayed on the screen viewed by the user, it may also be determined that the screen layout has changed when the user performs a menu operation on the GUI.
- FIG. 8 is a flowchart for explaining the input speech level detection process according to the present invention.
- the PTS correction timing determination unit 111 detects an input audio level (AudioInLevel) using the input audio data input from the video and audio input unit 104 (S421).
- the input sound level to be detected is, for example, an average volume of a certain section.
- the PTS correction timing determination unit 111 determines whether the detected input audio level is larger than a predetermined threshold (S422).
- when the detected input audio level is larger than the predetermined threshold (YES in S422), the PTS correction timing determination unit 111 determines that it is the PTS correction timing (S423). This is because, when the detected input audio level is large, the surrounding sound is loud or the user (speaker) is speaking, so sound skips in the received audio are hard to notice, and it can therefore be determined to be the PTS correction timing at which the PTS correction amount should be updated.
- the PTS correction timing determination unit 111 determines that it is not the PTS correction timing when the detected input audio level is smaller than a predetermined threshold (in the case of NO in S422) (S424).
- the PTS correction timing determination unit 111 determines the PTS correction timing using the input audio data of the own device input from the video and audio input unit 104.
- FIG. 9 is a flowchart for explaining the received speech level detection process according to the present invention.
- the PTS correction timing determination unit 111 detects a received audio level (AudioOutLevel) using the decoded audio data input from the decoding unit 110 (S431).
- the reception sound level to be detected is, for example, an average volume of a certain section.
- the PTS correction timing determination unit 111 determines whether the detected received audio level is smaller than a predetermined threshold (S432).
- when the detected received audio level is smaller than the predetermined threshold (YES in S432), the PTS correction timing determination unit 111 determines that it is the PTS correction timing (S433). This is because, when the detected received audio level is smaller than the predetermined threshold, sound skips in the received audio are hard to notice, so it can be determined to be the PTS correction timing at which the PTS correction amount should be updated.
- when the detected received audio level is equal to or larger than the predetermined threshold (NO in S432), the PTS correction timing determination unit 111 determines that it is not the PTS correction timing (S434).
- the PTS correction timing determination unit 111 determines the PTS correction timing using the decoded audio data input from the decoding unit 110.
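A hedged sketch of the audio-level checks of FIGS. 8 and 9: a loud input (the user is speaking) or a quiet received signal both make sound skips hard to notice, so either condition yields a PTS correction timing. The threshold values are examples, not taken from the source.

```python
TH_IN, TH_OUT = -30.0, -50.0  # example thresholds (assumed dBFS-style values)

def is_audio_pts_correction_timing(audio_in_level, audio_out_level):
    loud_input = audio_in_level > TH_IN      # FIG. 8: input audio level large
    quiet_output = audio_out_level < TH_OUT  # FIG. 9: received audio level small
    return loud_input or quiet_output

print(is_audio_pts_correction_timing(-20.0, -40.0))  # → True  (speaker talking)
print(is_audio_pts_correction_timing(-40.0, -60.0))  # → True  (received quiet)
print(is_audio_pts_correction_timing(-40.0, -40.0))  # → False
```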
- note that the PTS correction timing determination unit 111 may determine the PTS correction timing using at least one of the processes of FIGS. 6 to 9 described above. For example, the PTS correction timing may be determined only for the image, or only for the audio.
- FIG. 10 is a flowchart for explaining PTS correction timing determination processing of an image of the audio and video communications apparatus according to the present invention.
- the PTS correction timing determination unit 111 performs image difference value calculation processing (S400).
- the PTS correction timing determination unit 111 performs screen layout determination processing (S410).
- the image difference value calculation process of S400 consists of the processes of S401 to S404 described above, and the screen layout determination process of S410 consists of the processes of S411 to S414 described above, so their description is omitted.
- the PTS correction timing determination unit 111 confirms whether it has been determined to be the PTS correction timing in at least one of the processes S400 and S410 (S452).
- when it has been determined to be the PTS correction timing in at least one of the processes S400 and S410 (YES in S452), the PTS correction timing determination unit 111 determines that it is the PTS correction timing (S453).
- in other words, it is determined that a timing at which the motion of the screen is small or the screen layout changes significantly, and thus a timing at which the user is unlikely to notice skip or repeat control such as a frame skip, is the PTS correction timing at which the PTS correction amount should be updated.
- when it has been determined not to be the PTS correction timing in both of the processes S400 and S410 (NO in S452), the PTS correction timing determination unit 111 determines that it is not the PTS correction timing (S454).
- the PTS correction timing determination unit 111 determines the PTS correction timing of the image.
- FIG. 11 is a flowchart for explaining a PTS correction timing determination process of audio of the audio and video communications apparatus according to the present invention.
- the PTS correction timing determination unit 111 performs an input sound level detection process (S420).
- the PTS correction timing determination unit 111 performs a received audio level detection process (S430).
- the input audio level detection process of S420 consists of the processes of S421 to S424 described above, and the received audio level detection process of S430 consists of the processes of S431 to S434 described above, so their description is omitted.
- the PTS correction timing determination unit 111 confirms whether or not the PTS correction timing is determined in at least one process of S420 and S430 (S452).
- when it has been determined to be the PTS correction timing in at least one of the processes S420 and S430 (YES in S452), the PTS correction timing determination unit 111 determines that it is the PTS correction timing (S453).
- in other words, it is determined that a timing at which the input audio level is large or the received audio level is small, and thus a timing at which the user is unlikely to notice sound skips, is the PTS correction timing at which the PTS correction amount should be updated.
- when it has been determined not to be the PTS correction timing in both of the processes S420 and S430 (NO in S452), the PTS correction timing determination unit 111 determines that it is not the PTS correction timing (S454).
- the PTS correction timing determination unit 111 determines the PTS correction timing of audio.
- the PTS correction timing determination unit 111 may determine the PTS correction timings for the image and the audio simultaneously as shown in FIGS. 10 and 11, and the PTS correction timing may be determined by freely combining S400, S410, S420 and S430.
- as described above, the PTS correction amount calculation unit 109 monitors the increase / decrease tendency of the remaining capacity of the reception buffer 114 and calculates a PTS correction amount in the direction that cancels the system clock shift, while the PTS correction timing determination unit 111 determines, as the PTS correction timing, a timing at which the user is unlikely to notice the correction of the image or audio. Then, in accordance with the PTS correction request determined by the PTS correction timing determination unit 111, the PTS correction unit 112 corrects the PTS of the image or audio using the PTS correction amount, and the image / audio output unit 113 outputs the image and audio according to the corrected PTS.
- as a result, the output time correction (cancellation of the system clock shift) made necessary by the system clock difference between transmitting and receiving terminals, such as the video and audio communication apparatus 100 and the other video and audio communication apparatus 300, can be performed without causing the user to feel any incongruity in the image or audio.
- the present invention can thus prevent the deterioration of subjective quality caused by frame skips and sound skips, which reduce the sense of facing each other, particularly in an audiovisual communication apparatus used as a highly realistic TV conference apparatus with a large screen, and is therefore useful.
- while the video and audio communication apparatus 100 includes the video / audio input unit 104, the encoding unit 105, the transmitting unit 106, the receiving unit 108, the PTS correction amount calculation unit 109, the decoding unit 110, the PTS correction timing determination unit 111, the PTS correction unit 112, the video / audio output unit 113, the reception buffer 114, and the output buffer 115, the present invention is not limited thereto. As shown in FIG. 12, the minimum configuration of the video / audio communication device 100 need only include the transmitting / receiving unit 106/108, the PTS correction timing determination unit 111, the PTS correction unit 112, and the video / audio output unit 113.
- that is, the video and audio communication apparatus 100 in its minimum configuration includes: the transmitting / receiving unit 106/108, which transmits and receives images and audio via the network; the PTS correction timing determination unit 111, which determines the timing at which the PTS correction amount in a received image or audio should be updated based on the content of the audio transmitted by the transmitting / receiving unit 106/108, the content of the image received by the transmitting / receiving unit 106/108, or the content of the audio received by the transmitting / receiving unit 106/108; the PTS correction unit 112, which corrects the PTS by updating the PTS correction amount in the received image or audio at the timing determined by the PTS correction timing determination unit 111; and the image / audio output unit 113, which outputs the image and audio.
- the transmitting / receiving unit 106/108 integrates the functions of the transmitting unit 106 and the receiving unit 108 described above.
- although the audio-video communication apparatus and communication method of the present invention have been described based on the embodiment, the present invention is not limited to this embodiment. Various modifications that may occur to those skilled in the art, and forms constructed by combining components of different embodiments, are also included in the scope of the present invention without departing from its spirit.
- the present invention can be used for a video and audio communication apparatus and its communication method, and in particular for a highly realistic video and audio communication apparatus using a large screen.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
FIG. 1 is a diagram showing a configuration example of a TV conference system including the video and audio communication apparatus of the present invention.
2. Screen transition between the partner-image display and the GUI display
3. Screen transition between the GUI display and the self-image display
In this way, the PTS correction timing determination unit 111 determines that a timing at which the screen layout changes significantly, and at which the user is therefore unlikely to notice PTS correction of the image such as a frame skip, is the PTS correction timing at which the PTS correction amount should be updated.
According to this minimum configuration, a timing that the user is unlikely to notice can be determined, and an image or audio whose PTS has been corrected at the determined timing can be output. This makes it possible to cancel the system clock shift without causing the user to feel any incongruity in the image or audio. That is, a video and audio communication apparatus that can cancel the system clock shift without causing the user to feel any incongruity in the image or audio can be realized.
101, 301 Camera / microphone
102 User input unit
103, 303 Monitor / speaker
104 Image / audio input unit
105 Encoding unit
106 Transmitting unit
108 Receiving unit
109 PTS correction amount calculation unit
110 Decoding unit
111 PTS correction timing determination unit
112 PTS correction unit
113 Image / audio output unit
114 Reception buffer
115 Output buffer
207 Network
300 Other video and audio communication apparatus
Claims (11)
- A video and audio communication apparatus comprising:
a transmitting/receiving unit that transmits and receives images and audio via a network;
a timing determination unit that determines, based on the content of audio transmitted by the transmitting/receiving unit, the content of an image received by the transmitting/receiving unit, or the content of audio received by the transmitting/receiving unit, a timing at which a correction amount of a PTS (Presentation Time Stamp) in the received image or audio should be updated;
a PTS correction unit that corrects the PTS by updating the correction amount of the PTS in the received image or audio at the timing determined by the timing determination unit; and
an image/audio output unit that outputs the received image and audio corresponding to the corrected PTS at the current time indicated by the video and audio communication apparatus.
- The video and audio communication apparatus according to Claim 1, further comprising
a user input unit to which user operation information is input by a user operation,
wherein, when the user operation information input to the user input unit indicates a user operation involving a screen layout change of the received image, the timing determination unit determines the timing of the user operation involving the screen layout change as the timing at which the correction amount should be updated.
- The video and audio communication apparatus according to Claim 1,
wherein, when a correlation value between an image received by the transmitting/receiving unit and the image temporally preceding it is higher than a preset threshold, the timing determination unit determines the timing at which the received image is output by the image/audio output unit as the timing at which the correction amount should be updated.
- The video and audio communication apparatus according to Claim 1,
wherein, when the data amount of an image received by the transmitting/receiving unit is smaller than a preset threshold, the timing determination unit determines the timing at which the received image is output by the image/audio output unit as the timing at which the correction amount should be updated.
- The video and audio communication apparatus according to Claim 1,
wherein, when the level of audio received by the transmitting/receiving unit is smaller than a preset threshold, the timing determination unit determines the timing at which the received audio is output by the image/audio output unit as the timing at which the correction amount should be updated.
- The video and audio communication apparatus according to Claim 1, further comprising
an audio input unit to which audio to be transmitted by the transmitting/receiving unit is input after being picked up with a microphone,
wherein, when the level of the audio input to the audio input unit is larger than a preset threshold, the timing determination unit determines the timing at which the input audio is output by the image/audio output unit as the timing at which the correction amount should be updated.
- The video and audio communication apparatus according to Claim 1, further comprising
a buffer that temporarily stores the image or audio received by the transmitting/receiving unit, and
a PTS correction amount calculation unit that monitors the remaining capacity of the buffer and calculates a PTS correction amount based on the remaining capacity,
wherein the PTS correction unit corrects the PTS in the image or audio at the timing determined by the timing determination unit by adding the PTS correction amount calculated by the PTS correction amount calculation unit to that PTS.
- The video and audio communication apparatus according to Claim 7,
wherein the PTS correction amount calculation unit calculates a negative PTS correction amount when the remaining capacity monotonically increases, and calculates a positive PTS correction amount when the remaining capacity monotonically decreases.
- A communication method for a video and audio communication apparatus, comprising:
a transmitting/receiving step of transmitting and receiving images and audio via a network;
a timing determination step of determining, based on the audio transmitted in the transmitting/receiving step or the content of the image or audio received by the transmitting/receiving unit, a timing at which the correction amount of the PTS in the received image or audio should be updated;
a PTS correction step of correcting the PTS by updating the correction amount of the PTS in the received image or audio at the timing determined in the timing determination step; and
an image/audio output step of outputting the received image and audio corresponding to the corrected PTS at the current time indicated by the video and audio communication apparatus.
- A program for causing a computer to execute communication by a video and audio communication apparatus, the program causing the computer to execute:
a transmitting/receiving step of transmitting and receiving images and audio via a network;
a timing determination step of determining, based on the audio transmitted in the transmitting/receiving step or the content of the image or audio received by the transmitting/receiving unit, a timing at which the correction amount of the PTS in the received image or audio should be updated;
a PTS correction step of correcting the PTS by updating the correction amount of the PTS in the received image or audio at the timing determined in the timing determination step; and
an image/audio output step of outputting the received image and audio corresponding to the corrected PTS at the current time indicated by the video and audio communication apparatus.
- An integrated circuit of a video and audio communication apparatus, comprising:
a transmitting/receiving unit that transmits and receives images and audio via a network;
a timing determination unit that determines, based on the audio transmitted by the transmitting/receiving unit or the content of the image or audio received by the transmitting/receiving unit, a timing at which the correction amount of the PTS in the received image or audio should be updated;
a PTS correction unit that corrects the PTS by updating the correction amount of the PTS in the received image or audio at the timing determined by the timing determination unit; and
an image/audio output unit that outputs the received image and audio corresponding to the corrected PTS at the current time indicated by the video and audio communication apparatus.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010800015685A CN102067595B (zh) | 2009-03-16 | 2010-03-01 | 图像声音通信装置以及其通信方法 |
JP2011504727A JP5490782B2 (ja) | 2009-03-16 | 2010-03-01 | 画像音声通信装置およびその通信方法 |
US12/992,703 US9007525B2 (en) | 2009-03-16 | 2010-03-01 | Audio and video communications apparatus and communications method thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-063498 | 2009-03-16 | ||
JP2009063498 | 2009-03-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010106743A1 true WO2010106743A1 (ja) | 2010-09-23 |
Family
ID=42739411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/001362 WO2010106743A1 (ja) | 2009-03-16 | 2010-03-01 | 画像音声通信装置およびその通信方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US9007525B2 (ja) |
JP (1) | JP5490782B2 (ja) |
CN (1) | CN102067595B (ja) |
WO (1) | WO2010106743A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021241264A1 (ja) * | 2020-05-27 | 2021-12-02 | ソニーグループ株式会社 | 放送コンテンツ制作システムおよび放送コンテンツ制作方法、並びにプログラム |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102740131B (zh) * | 2012-07-09 | 2015-12-02 | 深圳市香江文化传播有限公司 | 基于实时传输协议的网络电视直播方法及系统 |
US10158927B1 (en) * | 2012-09-05 | 2018-12-18 | Google Llc | Systems and methods for detecting audio-video synchronization using timestamps |
US9531921B2 (en) * | 2013-08-30 | 2016-12-27 | Audionow Ip Holdings, Llc | System and method for video and secondary audio source synchronization |
CN106507217B (zh) * | 2016-10-27 | 2019-07-02 | 腾讯科技(北京)有限公司 | 视频流的时间戳的处理方法和装置 |
CN113573119B (zh) * | 2021-06-15 | 2022-11-29 | 荣耀终端有限公司 | 多媒体数据的时间戳生成方法及装置 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004015553A (ja) * | 2002-06-07 | 2004-01-15 | Sanyo Electric Co Ltd | 同期制御方法と装置およびそれを用いた同期再生装置およびテレビジョン受信装置 |
JP2007049460A (ja) * | 2005-08-10 | 2007-02-22 | Hitachi Ltd | ディジタル放送受信装置 |
JP2008258665A (ja) * | 2007-03-30 | 2008-10-23 | Toshiba Corp | ストリーム再生装置 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5594467A (en) * | 1989-12-06 | 1997-01-14 | Video Logic Ltd. | Computer based display system allowing mixing and windowing of graphics and video |
US6081299A (en) * | 1998-02-20 | 2000-06-27 | International Business Machines Corporation | Methods and systems for encoding real time multimedia data |
US6760749B1 (en) * | 2000-05-10 | 2004-07-06 | Polycom, Inc. | Interactive conference content distribution device and methods of use thereof |
JP4182437B2 (ja) * | 2004-10-04 | 2008-11-19 | ソニー株式会社 | Audio-video synchronization system and monitor device |
CN100362864C (zh) * | 2005-07-13 | 2008-01-16 | 浙江大学 | Single-chip-based network videophone system |
US7657668B2 (en) * | 2006-08-16 | 2010-02-02 | Qnx Software Systems (Wavemakers), Inc. | Clock synchronization of data streams |
EP2081373A1 (en) * | 2008-01-15 | 2009-07-22 | Hitachi, Ltd. | Video/audio reproducing apparatus |
US8279945B2 (en) * | 2008-01-28 | 2012-10-02 | Mediatek Inc. | Method for compensating timing mismatch in A/V data stream |
CN102177726B (zh) * | 2008-08-21 | 2014-12-03 | 杜比实验室特许公司 | Feature optimization and reliability estimation for audio and video signature generation and detection |
US8428145B2 (en) * | 2008-12-31 | 2013-04-23 | Entropic Communications, Inc. | System and method for providing fast trick modes |
2010
- 2010-03-01 CN CN2010800015685A patent/CN102067595B/zh not_active Expired - Fee Related
- 2010-03-01 WO PCT/JP2010/001362 patent/WO2010106743A1/ja active Application Filing
- 2010-03-01 US US12/992,703 patent/US9007525B2/en active Active
- 2010-03-01 JP JP2011504727A patent/JP5490782B2/ja not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
US9007525B2 (en) | 2015-04-14 |
CN102067595B (zh) | 2013-07-24 |
JP5490782B2 (ja) | 2014-05-14 |
JPWO2010106743A1 (ja) | 2012-09-20 |
US20110063504A1 (en) | 2011-03-17 |
CN102067595A (zh) | 2011-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7843974B2 (en) | Audio and video synchronization | |
JP5490782B2 (ja) | Video and audio communication device and communication method thereof | |
JP5718292B2 (ja) | Picture-in-picture processing for video telephony | |
CA2552769A1 (en) | Synchronization watermarking in multimedia streams | |
CN101710997A (zh) | Method and system for achieving video and audio synchronization based on the MPEG-2 system | |
US20070116113A1 (en) | System and method for decreasing end-to-end delay during video conferencing session | |
KR101841313B1 (ko) | Multimedia stream processing method and corresponding device | |
JP2007259142A (ja) | Video signal encoding system and video signal encoding method for network transmission, video output device, and signal conversion device | |
JP4768250B2 (ja) | Transmitting device, receiving device, transmitting/receiving device, transmission method, and transmission system | |
US20150312294A1 (en) | Content Message for Video Conferencing | |
KR20060096044A (ko) | Method and apparatus for transmitting, receiving, and transceiving media signals | |
US20130166769A1 (en) | Receiving device, screen frame transmission system and method | |
JP4662085B2 (ja) | Moving image storage system, moving image storage method, and moving image storage program | |
CN114554277A (zh) | Multimedia processing method, device, server, and computer-readable storage medium | |
JP2008131591A (ja) | Lip-sync control device and lip-sync control method | |
Bertoglio et al. | Intermedia synchronization for videoconference over IP | |
JP3913726B2 (ja) | Multipoint videoconference control device and multipoint videoconference system | |
US8872971B2 (en) | Video display apparatus, video processing method, and video display system | |
JP2012141787A (ja) | Video display device and display method thereof | |
US20130136191A1 (en) | Image processing apparatus and control method thereof | |
KR20090010385A (ko) | Method and apparatus for recording video calls in a video communication terminal | |
KR20160111662A (ko) | Image processing system and method | |
JP2004180190A (ja) | Camera control device and program for executing its control steps | |
JP4348238B2 (ja) | Remote communication method and device | |
KR20030057505A (ko) | Multimedia data transmission system using the Real-time Transport Protocol |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080001568.5 Country of ref document: CN |
WWE | Wipo information: entry into national phase |
Ref document number: 2011504727 Country of ref document: JP |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10753240 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 12992703 Country of ref document: US |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 10753240 Country of ref document: EP Kind code of ref document: A1 |