WO2011030680A1 - Encoding device, encoding method, and encoding program - Google Patents
- Publication number
- WO2011030680A1 (PCT/JP2010/064603)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- picture
- encoding
- reference picture
- pictures
- network
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/115—Selection of the code volume for a coding unit prior to coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/164—Feedback from the receiver or from the transmission channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Definitions
- the present invention relates to an encoding device, an encoding method, and an encoding program for compressing and encoding image data and outputting them to a network.
- An object of the present invention is to provide an encoding device, an encoding method, and an encoding program that can change the bit rate of data output to a network by quickly following changes in the bandwidth of the network.
- the encoding device encodes image data input by an image input means into reference pictures, which are pictures referred to when other pictures are decoded, and non-reference pictures, which are pictures not referred to when any other picture is decoded, and outputs the result to a network. The encoding device comprises: detection means for detecting information on the bandwidth of the network; determining means for determining the generation conditions of the non-reference pictures based on the band information detected by the detection means; encoding means for encoding the image data under the generation conditions determined by the determining means; deletion means for deleting non-reference pictures from among the pictures encoded by the encoding means according to the band information detected by the detection means; and output means for outputting to the network the pictures that remain after the deletion means has deleted the non-reference pictures.
- the encoding apparatus can determine the generation condition of the non-reference picture that has little influence on the video quality even if it is deleted according to the information of the network bandwidth. Then, the generated non-reference picture can be deleted before being output to the network according to the information of the network bandwidth. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
- the detection means may include first detection means for detecting a fluctuation amount of the network bandwidth as the bandwidth information.
- the determining means may determine, as the generation condition, the ratio of the number of non-reference pictures to the number of pictures (reference pictures and non-reference pictures) generated within a predetermined period, based on the fluctuation amount detected by the first detection means.
- the determining means may determine, as the generation condition, the ratio of non-reference pictures generated by inter-frame coding, in which a prediction error with respect to another picture is encoded, to the number of pictures generated within the predetermined period.
- a non-reference picture may be an I picture (Intra-coded Picture) generated by intra-frame coding, or a P picture (Predictive-coded Picture) or B picture (Bidirectional-coded Picture) generated by inter-frame coding.
- the data size of a picture generated by interframe coding is smaller than the data size of an I picture generated by intraframe coding. Therefore, when the determining unit determines the generation ratio of the non-reference picture generated by the interframe coding, the data size of the generated picture does not increase rapidly even when the generation ratio is changed. Therefore, the load applied to the network bandwidth is not increased.
- the detection means may include second detection means for detecting the bandwidth of the network as the bandwidth information.
- the deletion unit may delete the non-reference picture when the band detected by the second detection unit is lowered. In this case, the encoding device can quickly reduce the bit rate of data output when the band is reduced. Therefore, it is possible to appropriately prevent the bandwidth from becoming congested.
- the encoding device may include storage control means for storing the picture encoded by the encoding means in a buffer for temporarily storing data.
- the deletion unit may delete the non-reference picture stored in the buffer. In this case, since the encoding apparatus can appropriately delete the non-reference picture stored in the buffer, the deletion process can be easily performed.
- the detection means may include third detection means for detecting a fluctuation amount of the free capacity of the buffer as the band information.
- the determination unit may determine the generation condition based on the fluctuation amount of the free space of the buffer detected by the third detection unit.
- the encoding apparatus can measure network bandwidth information including internal factors in the apparatus from the buffer, and determine the generation condition of the non-reference picture. Therefore, a non-reference picture can be generated more appropriately.
- the determination means minimizes a deviation in the number of reference pictures generated between a non-reference picture and a next non-reference picture generated among a plurality of repeatedly generated pictures. It is desirable to determine the generation conditions. In this case, non-reference pictures in consecutive pictures are generated in a distributed manner without being generated in a biased manner. Therefore, even when the encoding apparatus deletes the non-reference picture, it is possible to reduce the possibility that the video is reproduced so as to be interrupted.
- the deleting means deletes a non-reference picture that is an I picture in preference to a non-reference picture that is a P picture.
- the encoding method encodes the image data input by the image input means, and a reference picture that is a picture to be referred to when other pictures are decoded, and any other pictures
- a non-reference picture is detected by the detection step.
- with the encoding method, the generation condition of the non-reference picture, which has little influence on the video quality even if it is deleted, can be determined according to the network bandwidth information. The generated non-reference picture can then be deleted before output according to the network bandwidth information. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
- the encoding program encodes image data input by the image input means, and a reference picture that is a picture that is referenced when other pictures are decoded, and any other pictures
- with the encoding program, the generation condition of the non-reference picture, which has little influence on the video quality even if it is deleted, can be determined according to the network bandwidth information. The generated non-reference picture can then be deleted before output according to the network bandwidth information. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
- FIG. 1 is a block diagram showing an electrical configuration of the video conference apparatus 1.
- FIG. 2 is a functional block diagram of the video conference apparatus 1 according to the first embodiment.
- FIG. 3 is a flowchart of the main process performed by the video conference apparatus 1.
- FIG. 4 is an example of a generation condition determined when the rate of variation is 0% and the number of pictures in a GOP is 10.
- Hereinafter, a video conference apparatus 1 according to a first embodiment, which embodies the encoding apparatus of the present invention, will be described with reference to the drawings.
- the drawings to be referred to are used for explaining technical features that can be adopted by the present invention.
- the configuration of the apparatus, the flowcharts of the various processes, and the like shown in the drawings are merely illustrative examples and are not intended to limit the invention.
- the video conference apparatus 1 is connected to another video conference apparatus 1 via the network 8 (see FIG. 1).
- Each video conference device 1 inputs and outputs image data and audio data.
- as a result, users at multiple sites can share video and audio. Even when the users are not all at the same site, they can hold a conference smoothly.
- the video conference apparatus 1 includes a CPU 10 that controls the video conference apparatus 1.
- a ROM 11, a RAM 12, a hard disk drive (hereinafter referred to as “HDD”) 13, and an input / output interface 19 are connected to the CPU 10 via a bus 18.
- the ROM 11 stores a program for operating the video conference device 1 and initial values.
- the RAM 12 temporarily stores various information used in the control program.
- the HDD 13 is a non-volatile storage device that stores various types of information. Instead of the HDD 13, a storage device such as an EEPROM or a memory card may be used.
- the input / output interface 19 is connected to an audio input processing unit 21, an audio output processing unit 22, a video input processing unit 23, a video output processing unit 24, an operation unit 25, and an external communication I / F 26.
- the voice input processing unit 21 processes input of voice data from the microphone 31 that inputs voice.
- the audio output processing unit 22 processes the operation of the speaker 32 that outputs audio.
- the video input processing unit 23 processes input of video data (moving image data) from the camera 33 that captures video.
- the video output processing unit 24 processes the operation of the display device 34 that displays video.
- the operation unit 25 is used for a user to input various instructions to the video conference apparatus 1.
- the external communication I / F 26 connects the video conference device 1 to the network 8.
- the RAM 12 will be described in detail.
- the RAM 12 is provided with various storage areas such as a work area 121 and a FIFO buffer area 122 (hereinafter also referred to as “FIFO buffer 122”).
- the work area 121 stores various data such as flags necessary for processing.
- in the FIFO buffer area 122, encoded data (that is, encoded image data) is temporarily stored before being output to the network 8.
- the FIFO buffer is a buffer that outputs stored data in the order in which they are stored.
- the video conference apparatus 1 performs image compression encoding on the image data input from the camera 33 based on the H.264 standard to generate encoded data.
- the video conference apparatus 1 outputs the generated encoded data to another video conference apparatus 1 via the network 8. Note that the video conference device 1 decodes the encoded data input from the other video conference device 1 via the network 8 and causes the display device 34 to display the decoded data.
- Image compression coding includes intra-frame coding and inter-frame coding.
- the intra-frame coding is coding performed by intra-screen prediction within one frame of image data of a plurality of consecutive frames input by the camera.
- An I picture (Intra-coded Picture) that is encoded data generated by intra-frame encoding is decoded independently without referring to other pictures.
- in inter-frame coding, prediction data is calculated by referring to the data of a frame different from the frame to be encoded among the continuous frame data, and the calculated prediction error is encoded.
- Coded data generated by interframe coding includes a P picture (Predictive-coded Picture) and a B picture (Bidirectional-coded Picture).
- P pictures generated by referring to past pictures are mainly used.
- To decode a P picture, the picture referenced at the time of encoding is required. However, the data amount of a P picture is smaller than that of an I picture, which is decoded on its own.
- a picture that is referred to when any other P picture is decoded is referred to as a “reference picture”.
- a picture that is not referred to when any other P picture is decoded is referred to as a “non-reference picture”.
- the I picture is often used as a reference picture.
- depending on which pictures the P pictures refer to, however, the first I picture may become a non-reference picture.
- when decoding a P picture, it is possible to refer to either an I picture or a P picture.
- a P picture can be either a reference picture or a non-reference picture.
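- As a minimal sketch of the definitions above (not part of the patent text), the following Python function classifies pictures as reference or non-reference from a list giving, for each picture, the index of the picture it refers to. The function name and the `refs` list representation are assumptions made for illustration only.

```python
from typing import List, Optional


def non_reference_pictures(refs: List[Optional[int]]) -> List[int]:
    """Return the indices of pictures that no other picture refers to.

    refs[i] is the index of the picture that picture i refers to,
    or None for an I picture, which is decoded on its own.
    """
    referenced = {r for r in refs if r is not None}
    return [i for i in range(len(refs)) if i not in referenced]


# Hypothetical sequence I, I, P, I, I, P in which each P picture
# refers to the I picture immediately before it.
refs = [None, None, 1, None, None, 4]
print(non_reference_pictures(refs))
```

- In this hypothetical sequence only pictures 1 and 4 are referred to, so the first I picture of each group and both P pictures are non-reference pictures.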
- DCT / quantization 42 is performed on the input image 41 from the camera 33.
- the coefficient transformed by DCT is quantized according to the quantization parameter.
- inverse quantization / inverse DCT 43 is performed on a part of the quantized data.
- the data subjected to the inverse quantization / inverse DCT 43 is subjected to the deblocking filter 44 and stored in the frame memory 45.
- intra-frame prediction 46 is performed on the data stored in the frame memory 45, and further DCT / quantization 42 is performed.
- Entropy encoding 47 is performed on the quantized data.
- the encoded data 48 generated by the entropy encoding 47 is input to the FIFO buffer 122.
- of the encoded data input to the FIFO buffer 122, only the encoded data that has not been deleted by the non-reference picture deleting means 49 (described later) is subjected to the network output 50.
- motion prediction 51 is performed using the input image 41, and motion compensation 52 based on the previous predicted image in the frame memory 45 is performed.
- the prediction error calculated by the motion compensation 52 is subjected to weighted prediction 53 using a weighting factor related to brightness, and further DCT / quantization 42 is performed.
- Entropy encoding 47 is performed on the quantized data, and encoded data 48 is generated.
- the subsequent flow is the same as in the case of intra-frame coding.
- the bit rate cannot be changed by quickly following the fluctuation of the band. As a result, for example, when a video conference is being performed, reproduction of the video is delayed, and the user cannot smoothly execute the conference.
- the video conference device 1 performs available bandwidth measurement 55 of the network 8 and FIFO buffer monitoring 56. The non-reference picture deleting unit 49 then deletes a part of the encoded data in the FIFO buffer 122 when the available bandwidth of the network 8 is reduced. By deleting a part of the encoded data to be output, the bit rate of output to the network 8 can be quickly reduced when the available bandwidth is reduced.
- the device that receives the encoded data cannot decode not only the deleted picture but also the P picture that needs to refer to the deleted picture at the time of decoding. Therefore, non-reference pictures should be deleted in order to prevent significant degradation of video quality. If the non-reference picture that can be deleted is not stored in the FIFO buffer 122 when the available bandwidth decreases, the bit rate cannot be changed in accordance with the decrease in the available bandwidth. Therefore, the video conference apparatus 1 measures the fluctuation amount of the usable bandwidth of the network 8 and the fluctuation amount of the free capacity of the FIFO buffer 122. The picture mode control block 58 changes the P picture generation ratio for all the pictures based on the measured variation.
- the reference picture control block 59 selects the reference picture of the P picture based on the measured variation, thereby changing the ratio of the non-reference picture to all the pictures. When the available bandwidth decreases, the non-reference picture is deleted. Details of the above processing will be described below.
- a main process performed by the video conference apparatus 1 will be described with reference to FIGS.
- the main process is executed by the CPU 10 in accordance with a program stored in the ROM 11.
- the main process is started when an instruction to execute transmission / reception of image data is input.
- the usable bandwidth (W) of the network 8 and the variation (A) of the usable bandwidth are measured (S2).
- to measure the available bandwidth, a known bandwidth measurement technique may be used. Examples include pathload, which uses a packet train transfer method for probe packet transfer and estimates the available bandwidth from the increasing tendency of the one-way transfer delay between probe packets, and cprobe, which continuously transmits ICMP (Internet Control Message Protocol) ECHO REQUEST packets and obtains the available bandwidth by observing the packet interval of the response packets.
- the fluctuation amount includes both an increase amount and a decrease amount.
- the non-reference picture generation condition mainly indicates a ratio of the number of non-reference pictures to a preset number of pictures generated within a predetermined time (for example, 1 second).
- the video conference apparatus 1 determines the ratio of non-reference pictures by determining the number of pictures in the GOP (Group of Pictures) and the reference picture that is referred to when each P picture is encoded and decoded.
- a GOP is a set of a preset number of pictures generated within a predetermined period in order to efficiently manage a plurality of data.
- the ratio of the number of non-reference pictures to the number of pictures in the GOP can be increased. Further, as will be described below, the proportion of non-reference pictures can be determined by appropriately determining which reference picture of the P picture is used.
- the reference picture control block 59 (see FIG. 2) described above controls the motion compensation 52 based on the determined relationship between the P picture and the reference picture.
- FIG. 4 is an example of non-reference picture generation conditions that are finally determined when the rate of change in the available bandwidth is 0% and the number of pictures in the GOP is 10. If the rate of change in the usable bandwidth is 0%, the possibility that the usable bandwidth is suddenly reduced is low. Therefore, it is rare to rapidly reduce the bit rate by deleting a large number of pictures. Therefore, it is not necessary to increase the ratio of non-reference pictures. If it is not necessary to increase the number of non-reference pictures, the reference picture of a P picture is preferably the picture immediately before the P picture. This is because the prediction error is reduced and the amount of data is reduced by using the immediately preceding picture as a reference picture. Therefore, as shown in FIG. 4, all the reference pictures of the P picture are pictures immediately before the P picture. As a result, the non-reference picture is only the last picture in the GOP.
- FIG. 5 is an example of non-reference picture generation conditions determined when the rate of change in the available bandwidth is 50% and the number of pictures in the GOP is 10.
- the video conference apparatus 1 determines the generation condition of the non-reference picture so that the ratio of the change amount of the available band and the ratio of the non-reference picture in the picture in the GOP are closest. Therefore, in the example shown in FIG. 5, since the ratio of the amount of change in the usable bandwidth is 50%, the generation condition is determined so that 5 out of 10 pictures are non-reference pictures.
- the video conference apparatus 1 determines the generation condition so that the reference picture and the non-reference picture are arranged equally. In other words, the generation condition is determined so that the deviation in the number of reference pictures located between the two most recent non-reference pictures is minimized. When this deviation is large, the reference picture and the non-reference picture are not evenly arranged. Therefore, there is a high possibility that problems such as video interruption occur when non-reference pictures are deleted. Therefore, in the example illustrated in FIG. 5, the video conference apparatus 1 alternately arranges reference pictures and non-reference pictures. As a result, there is no bias in the number of reference pictures located between non-reference pictures.
- the closest (newest) picture among the reference pictures before the P picture is determined as the reference picture of the P picture. That is, by making the reference pictures of a plurality of P pictures the same picture, the proportion of non-reference pictures that are P pictures can be increased.
- FIG. 6 shows an example of generation conditions determined when the rate of change in the available bandwidth is 40% and the number of pictures in the GOP is 10. If the rate of change in the usable bandwidth is 40%, the video conference apparatus 1 determines the generation condition so that four out of ten pictures are non-reference pictures. Then, the arrangement of the reference picture and the non-reference picture is determined so that the number of reference pictures located between the non-reference pictures is “2”, “1”, “2”, and “1” in order.
- FIG. 7 shows an example of generation conditions determined when the rate of change in the available bandwidth is 90% and the number of pictures in the GOP is 10. If the rate of change in the available bandwidth is 90%, the video conference apparatus 1 determines the generation condition so that nine out of ten pictures are non-reference pictures. In this case, the reference pictures of all P pictures are the first I picture in the GOP. In this way, by setting the reference pictures of all P pictures in the GOP as I pictures, the ratio of non-reference pictures that are P pictures can be maximized.
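- The generation conditions described for FIGS. 4 to 7 can be sketched as follows. This is only an illustrative reading of the text, not the patented implementation: the function name `plan_gop`, the integer-percent input, the rounding rule, and the even-spacing formula are all assumptions chosen so that the four described examples are reproduced.

```python
from typing import List, Tuple


def plan_gop(change_percent: int, gop_size: int) -> Tuple[List[int], List[int]]:
    """Return (non_reference_indices, reference_of) for one GOP.

    reference_of[i] is the picture that P picture i refers to;
    picture 0 is the I picture and has the placeholder value -1.
    """
    if gop_size < 2:
        raise ValueError("gop_size must be at least 2")
    # Number of non-reference pictures whose share of the GOP is closest
    # to the bandwidth change ratio (integer rounding, half rounds up).
    count = (change_percent * gop_size + 50) // 100
    # The last picture is never referenced inside the GOP, and the leading
    # I picture is kept as a reference picture, hence the clamp.
    count = max(1, min(gop_size - 1, count))
    # Spread the non-reference pictures evenly: the i-th one sits at
    # ceil(i * gop_size / count) - 1, so the last one closes the GOP.
    non_ref = [-(-i * gop_size // count) - 1 for i in range(1, count + 1)]
    non_ref_set = set(non_ref)
    # Each P picture refers to the newest reference picture before it.
    reference_of = [-1]
    latest_reference = 0
    for i in range(1, gop_size):
        reference_of.append(latest_reference)
        if i not in non_ref_set:
            latest_reference = i
    return non_ref, reference_of


for percent in (0, 40, 50, 90):
    print(percent, plan_gop(percent, 10))
```

- With a GOP of 10 pictures, a 40% change yields non-reference pictures at positions 2, 4, 7, and 9 (counting from 0), so the numbers of reference pictures between them are 2, 1, 2, and 1, as in the FIG. 6 example.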
- the video conference apparatus 1 can determine the rate at which non-reference pictures that are P pictures are generated in the process of S5. The ratio of non-reference pictures can also be increased by increasing the ratio of I pictures to all pictures. However, the data size of an I picture is larger than the data size of a P picture, so the load applied to the network 8 would increase. In contrast, by changing the ratio of non-reference pictures that are P pictures, the video conference apparatus 1 can change the generation ratio of non-reference pictures without rapidly increasing the load applied to the network 8.
- the process for each frame is performed (S6).
- in the process for each frame, the image data is encoded according to the determined generation condition, and the non-reference picture is deleted based on the information of the network bandwidth.
- the amount of change (C) in the free capacity of the FIFO buffer area 122 is measured (S16).
- the fluctuation amount (C) is the difference between the previous free capacity of the FIFO buffer area 122 and the current free capacity of the FIFO buffer area 122.
- the number of non-reference pictures corresponding to the amount of decrease in available bandwidth (W) is deleted.
- an I picture having a larger data amount is preferentially deleted.
- for example, when I pictures (I) and P pictures (P) are generated in the order I / I / P / I / I / P and each P picture refers to the immediately preceding I picture, the I picture two frames before each P picture is preferentially deleted.
- since the H.264 standard is adopted, an access unit (the unit of a picture defined in H.264) composed only of non-reference pictures is deleted.
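- The deletion step described above can be sketched in Python as follows. This is a hedged illustration only: the `Picture` record, the byte-based `bytes_to_shed` target, and the tie-breaking by oldest picture are assumptions, since the text states only that non-reference pictures are deleted according to the decrease in available bandwidth and that I pictures are deleted preferentially.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass(frozen=True)
class Picture:
    index: int          # position in encoding order
    kind: str           # "I" or "P"
    size: int           # encoded size in bytes
    is_reference: bool  # True if another picture refers to this one


def drop_non_reference(fifo: Deque[Picture],
                       bytes_to_shed: int) -> Tuple[Deque[Picture], int]:
    """Delete non-reference pictures until bytes_to_shed is covered.

    Non-reference I pictures are deleted before non-reference P pictures.
    Reference pictures are never deleted, so the remaining stream stays
    decodable. Returns the remaining FIFO and the bytes actually removed.
    """
    candidates = [p for p in fifo if not p.is_reference]
    # I pictures first (False sorts before True), then oldest first.
    candidates.sort(key=lambda p: (p.kind != "I", p.index))
    doomed = set()
    shed = 0
    for picture in candidates:
        if shed >= bytes_to_shed:
            break
        doomed.add(picture.index)
        shed += picture.size
    # Rebuild the queue in the original order without the deleted pictures.
    kept = deque(p for p in fifo if p.index not in doomed)
    return kept, shed
```

- Because only pictures that nothing refers to are removed, the FIFO order of the remaining pictures is unchanged and every remaining P picture still has its reference picture.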
- in order to determine the generation condition of the non-reference picture, the difference between the fluctuation amount (A) of the available bandwidth measured last time in S2 (see FIG. 3) and the fluctuation amount (C) of the free capacity of the FIFO buffer area 122 measured this time in S16 is calculated (S19).
- the remaining encoded data that has not been deleted in the FIFO buffer area 122 is sequentially output to the network 8 so that the bit rate of data output follows the available bandwidth (W) of the network 8 (S21).
- it is then determined whether the absolute value of the difference in fluctuation amount calculated in S19 is equal to or larger than the absolute value of the difference in the changeable value of the output bit rate that results when the generation condition is updated (S22). For example, when processing is performed with 15 pictures in a GOP, each additional non-reference picture in one GOP that is deleted, or whose deletion is stopped, changes the average output bit rate by about 6.7% (1/15). Therefore, in this case, updating the generation condition so that the number of non-reference pictures in the GOP is increased by one increases the changeable value of the output bit rate by about 6.7%.
- conversely, updating the generation condition so that the number of non-reference pictures in the GOP is decreased by one reduces the changeable value of the output bit rate by about 6.7%.
- the video conference apparatus 1 does not update the generation condition when the change in the amount of change is small, and updates only when the change in the amount of change is large. Therefore, in S22, the absolute value of the difference between the changeable value of the output bit rate before the generation condition is updated and the changeable value of the output bit rate after the update is calculated. If the absolute value of the difference in variation calculated in S19 is smaller than the absolute value of the difference in output bit rate changeable value (S22: NO), the process directly returns to S11.
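- if the absolute value of the difference in fluctuation amount is equal to or larger than the absolute value of the difference in the changeable value (S22: YES), the process returns to the main process (see FIG. 3) in order to update the generation condition.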
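- The S22 decision can be illustrated with a small Python sketch. The function names are hypothetical, and expressing both fluctuation amounts (A) and (C) as percentages on a common scale is an assumption made here; the text does not state their units.

```python
def bitrate_step_percent(gop_size: int) -> float:
    """Change in the changeable output bit rate, in percent, when the number
    of non-reference pictures in a GOP is increased or decreased by one."""
    return 100.0 / gop_size


def should_update_condition(bandwidth_change_pct: float,
                            buffer_change_pct: float,
                            gop_size: int) -> bool:
    """Return True (S22: YES) when |A - C| is at least one bit rate step."""
    difference = abs(bandwidth_change_pct - buffer_change_pct)
    return difference >= bitrate_step_percent(gop_size)


# With 15 pictures in a GOP, one step is about 6.7 percent.
print(round(bitrate_step_percent(15), 1))
print(should_update_condition(10.0, 2.0, 15))
print(should_update_condition(10.0, 5.0, 15))
```

- Comparing against one step avoids recomputing the generation condition for differences too small to change the number of non-reference pictures in the GOP.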
- in the main process, when the frame-by-frame process (S6) is completed, the process returns to S2, and the non-reference picture generation conditions are updated (S2 to S5).
- the video conference apparatus 1 determines the non-reference picture generation condition based on the band information of the network 8.
- the image data is encoded with the determined generation conditions.
- the non-reference picture is deleted from the encoded picture, and the remaining picture is output to the network. Therefore, the video conference apparatus 1 determines the generation condition of the non-reference picture that has little influence on the video quality even if it is deleted according to the band information of the network 8, and outputs the generated non-reference picture to the network. Can be deleted before. Therefore, the video conference apparatus 1 can quickly follow the fluctuation of the bandwidth of the network 8 and change the output bit rate without causing a time delay due to encoding processing, buffering, or the like.
- the video conference apparatus 1 can rapidly reduce the output bit rate by increasing the ratio of non-reference pictures.
- when the proportion of non-reference pictures is reduced, the reference picture of a P picture can be made as close as possible to that P picture, and the encoding process can be performed efficiently.
- the video conference apparatus 1 determines the ratio of non-reference pictures to all pictures as a generation condition from the fluctuation amount of the available bandwidth of the network 8. Therefore, it is possible to appropriately generate a non-reference picture that has little influence on the video quality even if it is deleted in accordance with the amount of change in the available bandwidth. When the available bandwidth actually decreases, the output bit rate can be quickly decreased by deleting the generated non-reference picture.
- there are cases where an I picture generated by intra-frame coding is a non-reference picture and cases where a P picture or a B picture generated by inter-frame coding is a non-reference picture.
- the data size of a picture generated by interframe coding is smaller than the data size of an I picture generated by intraframe coding.
- the video conference apparatus 1 can appropriately determine the generation ratio of non-reference pictures (non-reference pictures that are P pictures) generated by interframe coding. As a result, even when the generation ratio is changed, the total amount of data size of the generated picture does not increase rapidly. Therefore, the load applied to the network bandwidth is not increased.
- the video conference apparatus 1 temporarily stores the generated encoded data in the FIFO buffer area 122. Accordingly, the deletion process can be easily performed by appropriately deleting the non-reference pictures stored in the FIFO buffer area 122. It is also possible to delete a plurality of non-reference pictures at once.
- the video conference apparatus 1 acquires network bandwidth information including internal factors of the video conference apparatus 1 by measuring the amount of change in the free capacity of the FIFO buffer area 122. When the obtained amount of variation becomes large, the non-reference picture generation conditions can be updated. Therefore, a non-reference picture can be generated by appropriately reflecting internal factors of the video conference apparatus 1 itself.
- the video conference apparatus 1 can generate a plurality of non-reference pictures by distributing them without bias. Therefore, even when a plurality of non-reference pictures are deleted, it is possible to reduce the possibility that the reproduced video appears interrupted. Further, when deleting non-reference pictures, the video conference apparatus 1 can reduce the output bit rate more quickly and efficiently by preferentially deleting I pictures, which have a larger data amount than P pictures.
- the video conference device 1 corresponds to the “encoding device” of the present invention.
- the camera 33 corresponds to “image input means”.
- the usable bandwidth (W) of the network 8, the usable bandwidth variation (A), and the free capacity variation (C) of the FIFO buffer area 122 correspond to “network bandwidth information”.
- the CPU 10 that detects band information in S2 of FIG. 3 and S15 and S16 of FIG. 8 functions as a “detection unit”.
- the CPU 10 that determines the non-reference picture generation conditions in S5 of FIG. 3 functions as a “determination unit”.
- the CPU 10 that encodes the image data in S11 of FIG. 8 functions as an “encoding unit”.
- the CPU 10 that deletes the non-reference picture in S18 of FIG. 8 functions as a “deleting unit”.
- the CPU 10 that outputs the encoded data in S21 of FIG. 8 functions as an “output unit”.
- the CPU 10 that measures the available bandwidth (W) of the network 8 in S2 of FIG. 3 and S15 of FIG. 8 functions as a “second detection unit”.
- the FIFO buffer area 122 of the RAM 12 corresponds to a “buffer”.
- the CPU 10 that inputs the encoded data to the FIFO buffer area 122 in S12 of FIG. 8 functions as a “storage control unit”.
- the CPU 10 that measures the fluctuation amount of the free capacity of the FIFO buffer area 122 in S16 of FIG. 8 functions as a “third detection unit”.
- the process of detecting band information in S2 of FIG. 3 and S15 and S16 of FIG. 8 corresponds to the “detection step” of the present invention.
- the process of determining the non-reference picture generation condition in S5 of FIG. 3 corresponds to the “determination step”.
- the process of encoding the image data in S11 of FIG. 8 corresponds to the “encoding step”.
- the process of deleting the non-reference picture in S18 of FIG. 8 corresponds to a “deletion step”.
- the process of outputting the encoded data in S21 of FIG. 8 corresponds to an “output step”.
- the video conference apparatus 101 according to the second embodiment is different from the video conference apparatus 1 according to the first embodiment only in that the encoded data is not buffered. Therefore, the same number is attached
- the video conference apparatus 101 does not include the FIFO buffer 122 (see FIG. 2).
- the picture mode control block 58 and the reference picture control block 59 generate a non-reference picture using information obtained by the available bandwidth measurement 55 of the network 8.
- the present invention can be implemented without using a buffer for buffering encoded data. Details of the processing will be described below.
- the CPU 10 of the video conference apparatus 101 starts a main process when an instruction to execute transmission / reception of image data is input.
- the main process is the same as the main process (see FIG. 3) performed by the video conference apparatus 1 of the first embodiment except for the frame-by-frame process described below. Therefore, the description of the main process is omitted.
- the image data is encoded one frame at a time according to the determined quantization parameter (S3, see FIG. 3) and the non-reference picture generation conditions (S5, see FIG. 3) (S11). It is determined whether there is a non-reference picture in the encoded data to be output to the network 8 (S113). If there is no non-reference picture (S113: NO), the process proceeds to S121.
- the available bandwidth (W) of the network 8 is measured (S15).
- the fluctuation amount (B) of the available bandwidth of the network 8 is measured (S116). It is determined whether or not the available bandwidth (W) measured this time in S15 is lower than the available bandwidth (W) measured last time in S2 (see FIG. 3) or S15 (S17). If the current value is not lower (S17: NO), the process proceeds directly to S119. If the available bandwidth (W) measured in the current processing of S15 is lower (S17: YES), the non-reference picture is deleted before output to the network (S118).
- the difference between the fluctuation amount (A) of the available bandwidth measured last time in S2 (see FIG. 3) and the fluctuation amount (B) of the available bandwidth measured this time in S116 is calculated (S119).
- the encoded data that has not been deleted is output to the network 8 (S121). It is determined whether or not the absolute value of the difference in the available bandwidth fluctuation amount calculated in S119 is equal to or greater than the absolute value of the difference in the changeable value of the output bit rate when the generation condition is updated (S22). If the absolute value of the difference in the fluctuation amount of the available bandwidth is smaller than the absolute value of the difference in the changeable value of the output bit rate (S22: NO), the process returns directly to S11. If it is equal to or greater than the absolute value of the difference in the changeable value (S22: YES), the process returns to the main process (see FIG. 3) in order to update the generation condition.
- the video conference apparatus 101 determines a non-reference picture generation condition in accordance with the fluctuation amount of the available bandwidth of the network 8, and can delete non-reference pictures when the available bandwidth decreases. Accordingly, the output bit rate to the network 8 can be changed by quickly following the change in the available bandwidth.
- the present invention can be implemented without using a buffer for buffering encoded data.
- the available bandwidth (W) of the network 8 and the fluctuation amount (A, B) of the available bandwidth correspond to “network bandwidth information” of the present invention.
- the CPU 10 that detects band information in S2 of FIG. 3 and S15 and S116 of FIG. 10 functions as a “detection unit”.
- the CPU 10 that encodes the image data in S11 of FIG. 10 functions as an “encoding unit”.
- the CPU 10 that deletes the non-reference picture in S118 in FIG. 10 functions as a “deleting unit”.
- the CPU 10 that outputs the encoded data in S121 of FIG. 10 functions as an “output unit”.
- the present invention is not limited to the above-described embodiment, and various modifications are possible.
- the present invention is not limited to a video conference apparatus.
- the present invention can be applied to any device that outputs encoded data via a network, such as a server that distributes video.
- encoding is performed based on the H.264 standard, but other standards may be adopted.
- the video conference apparatuses 1 and 101 of the above embodiment perform processing such as deletion of non-reference pictures based on the value of the available bandwidth of the network 8.
- the video conference apparatuses 1 and 101 may measure the bandwidth actually used in the network 8 and perform processing.
- a condition other than the condition determined in the above embodiment may be determined.
- the frame rate, resolution, and the like may be determined as generation conditions.
- the video conference apparatus 101 may delete a non-reference picture when the amount of decrease in available bandwidth exceeds a threshold value.
- the non-reference picture generation condition is updated when the change in the fluctuation amount of the free capacity of the FIFO buffer 122, or the change in the fluctuation amount of the available bandwidth, becomes large (see S22 in FIG. 8 and S22 in FIG. 10).
- the trigger for updating the non-reference picture generation condition can also be changed.
- the generation condition may be repeatedly updated every predetermined time or every time a predetermined number of pictures are output.
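The update trigger checked in S22 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a 15-picture GOP as in the example given earlier, pictures of roughly equal size, and fluctuation amounts expressed as fractions of the output bit rate. The names `rate_step` and `should_update` are hypothetical.

```python
def rate_step(gop_size: int) -> float:
    """Fraction of the average output bit rate that one picture accounts for,
    assuming (for illustration) that all pictures in the GOP are similar in size."""
    return 1.0 / gop_size


def should_update(prev_fluctuation: float, curr_fluctuation: float,
                  gop_size: int = 15) -> bool:
    """S22-style check: update the generation condition only when the change
    in the fluctuation amount is at least one adjustable step."""
    return abs(curr_fluctuation - prev_fluctuation) >= rate_step(gop_size)
```

With a 15-picture GOP the step is 1/15, or about 6.7%, so small changes in the fluctuation amount leave the generation condition untouched.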
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Provided are an encoding device, an encoding method, and an encoding program, which will enable the bit rate of data output to the network to be changed in quick response to the fluctuation in the network band. A video conference device detects information about the network band and determines the generating condition of non-reference pictures on the basis of the detected information. An input image (41) is encoded with the determined generating condition to generate encoded data (48). From among the generated encoded data, the video conference device deletes non-reference pictures, which are not referred to when decoding any of the other pictures, when the network's available band drops. Only the remaining pictures that were not deleted are output to the network.
Description
本発明は、画像データを画像圧縮符号化してネットワークに出力する符号化装置、符号化方法、および符号化プログラムに関する。
The present invention relates to an encoding device, an encoding method, and an encoding program that compress and encode image data and output the result to a network.
従来、動画像のデータをネットワークに出力するための様々な技術が知られている。データをネットワークに出力する場合、出力のビットレートをネットワークの帯域に応じて制御する技術が知られている。例えば、特許文献1に記載の通信サービスユニットは、バッファにデータを一時的に保存することで、帯域に応じたデータ送信を行っている。この場合、バッファのオーバーフローの発生を防止する必要があるため、エンコーダのビットレートを減少させることも行われている。
Conventionally, various techniques for outputting moving image data to a network are known. In the case of outputting data to a network, a technique for controlling an output bit rate according to a network bandwidth is known. For example, the communication service unit described in Patent Literature 1 performs data transmission according to a band by temporarily storing data in a buffer. In this case, since it is necessary to prevent the occurrence of buffer overflow, the bit rate of the encoder is also reduced.
しかしながら、エンコーダのビットレートを減少させる従来の技術では、ビットレートを減少させてから、減少させたビットレートで実際にデータが出力されるまでには、符号化処理等を経る必要があるため時間遅延が生じる。従って、帯域の変動に素早く追従してネットワークへの出力のビットレートを変更することはできなかった。
However, in the conventional technique of reducing the bit rate of the encoder, a time delay occurs because an encoding process or the like must be performed between the time the bit rate is reduced and the time data is actually output at the reduced bit rate. Therefore, it has been impossible to change the bit rate of the output to the network by quickly following the fluctuation of the bandwidth.
本発明は、ネットワークへのデータ出力のビットレートを、ネットワークの帯域の変動に素早く追従して変更することができる符号化装置、符号化方法、および符号化プログラムを提供することを目的とする。
An object of the present invention is to provide an encoding device, an encoding method, and an encoding program that can change the bit rate of data output to a network by quickly following changes in the bandwidth of the network.
本発明の第一の態様に係る符号化装置は、画像入力手段によって入力される画像データを符号化して、他のピクチャの復号化時に参照されるピクチャである参照ピクチャ、および他のいずれのピクチャの復号化時にも参照されないピクチャである非参照ピクチャからなる連続するピクチャを生成し、ネットワークに出力する符号化装置であって、前記ネットワークの帯域の情報を検出する検出手段と、前記非参照ピクチャの生成条件を、前記検出手段によって検出された帯域の情報に基づいて決定する決定手段と、前記決定手段によって決定された生成条件で画像データを符号化する符号化手段と、前記符号化手段によって符号化されたピクチャのうち非参照ピクチャを、前記検出手段によって検出された帯域の情報に応じて削除する削除手段と、前記削除手段によって前記非参照ピクチャが削除された残りのピクチャを前記ネットワークへ出力する出力手段とを備えている。
The encoding device according to the first aspect of the present invention is an encoding device that encodes image data input by an image input means, generates consecutive pictures consisting of reference pictures, which are pictures referred to when other pictures are decoded, and non-reference pictures, which are pictures not referred to when any other picture is decoded, and outputs them to a network. The encoding device includes: detection means for detecting information on the bandwidth of the network; determination means for determining a generation condition of the non-reference pictures based on the bandwidth information detected by the detection means; encoding means for encoding image data under the generation condition determined by the determination means; deletion means for deleting non-reference pictures from among the pictures encoded by the encoding means according to the bandwidth information detected by the detection means; and output means for outputting to the network the pictures remaining after the non-reference pictures have been deleted by the deletion means.
第一の態様に係る符号化装置は、削除しても映像品質への影響が少ない非参照ピクチャの生成条件を、ネットワークの帯域の情報に応じて決定することができる。そして、生成した非参照ピクチャを、ネットワークの帯域の情報に応じて、ネットワークへの出力前に削除することができる。よって、符号化処理等を経ることによる時間遅延を生じさせずに、ネットワークの帯域の変動に素早く追従して、データ出力のビットレートを変更することができる。
The encoding apparatus according to the first aspect can determine the generation condition of the non-reference picture that has little influence on the video quality even if it is deleted according to the information of the network bandwidth. Then, the generated non-reference picture can be deleted before being output to the network according to the information of the network bandwidth. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
前記検出手段は、前記ネットワークの帯域の変動量を前記帯域の情報として検出する第一検出手段を備えてもよい。前記決定手段は、前記第一検出手段によって検出された前記変動量に基づいて、参照ピクチャと非参照ピクチャとからなる所定期間内で生成されるピクチャの数に対する非参照ピクチャの数の割合を前記生成条件として決定すればよい。符号化装置は、非参照ピクチャの割合を増やせば、多くの非参照ピクチャを削除することができるため、ネットワークへの出力のビットレートを急激に減少させることもできる。一方、符号化装置は、非参照ピクチャの割合を減らせば、効率よく符号化処理を行うことができる。従って、符号化装置は、削除しても映像品質への影響が少ない非参照ピクチャの割合を、帯域の変動量に合わせて生成することで、適切な処理を行うことができる。
The detection means may include first detection means for detecting a fluctuation amount of the network bandwidth as the bandwidth information. The determination means may determine, as the generation condition, the ratio of the number of non-reference pictures to the number of pictures (reference pictures and non-reference pictures) generated within a predetermined period, based on the fluctuation amount detected by the first detection means. If the encoding device increases the ratio of non-reference pictures, it can delete many non-reference pictures, so the bit rate of output to the network can be reduced sharply. On the other hand, if the encoding device reduces the ratio of non-reference pictures, it can perform the encoding process efficiently. Therefore, the encoding device can perform appropriate processing by generating non-reference pictures, which have little effect on video quality even if deleted, in a proportion matched to the fluctuation amount of the bandwidth.
前記決定手段は、フレーム間符号化によって他のピクチャとの間の予測誤差が符号化されることで生成される非参照ピクチャについての、前記所定期間内で生成されるピクチャの数に対する割合を、前記生成条件として決定してもよい。画像圧縮符号化では、フレーム内符号化によって生成されるIピクチャ(Intra-coded Picture)が非参照ピクチャとなる場合と、フレーム間符号化によって生成されるPピクチャ(Predictive-coded Picture)およびBピクチャ(Bidirectional-coded Picture)が非参照ピクチャとなる場合とがある。フレーム間符号化によって生成されるピクチャのデータサイズは、フレーム内符号化によって生成されるIピクチャのデータサイズよりも小さい。従って、決定手段が、フレーム間符号化によって生成される非参照ピクチャの生成割合を決定すると、生成割合を変更した場合でも、生成されるピクチャのデータサイズが急激に増大することはない。よって、ネットワークの帯域に与える負荷を増大させることがない。
The determination means may determine, as the generation condition, the ratio of non-reference pictures generated by inter-frame coding, in which the prediction error relative to another picture is encoded, to the number of pictures generated within the predetermined period. In image compression coding, there are cases where an I picture (Intra-coded Picture) generated by intra-frame coding becomes a non-reference picture, and cases where a P picture (Predictive-coded Picture) or a B picture (Bidirectional-coded Picture) generated by inter-frame coding becomes a non-reference picture. The data size of a picture generated by inter-frame coding is smaller than the data size of an I picture generated by intra-frame coding. Therefore, when the determination means determines the generation ratio of non-reference pictures generated by inter-frame coding, the data size of the generated pictures does not increase rapidly even when the generation ratio is changed. Hence, the load on the network bandwidth does not increase.
前記検出手段は、前記ネットワークの帯域を前記帯域の情報として検出する第二検出手段を備えてもよい。前記削除手段は、前記第二検出手段によって検出された帯域が低下した場合に非参照ピクチャを削除すればよい。この場合、符号化装置は、帯域が低下した場合に、データ出力のビットレートを素早く減少させることができる。よって、帯域が輻輳状態となることを適切に防止することができる。
The detection means may include second detection means for detecting the bandwidth of the network as the bandwidth information. The deletion unit may delete the non-reference picture when the band detected by the second detection unit is lowered. In this case, the encoding device can quickly reduce the bit rate of data output when the band is reduced. Therefore, it is possible to appropriately prevent the bandwidth from becoming congested.
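The deletion rule described here can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Picture` record and the function `pictures_to_output` are hypothetical names, and bandwidth is compared as two plain numbers (previous and current measurement).

```python
from dataclasses import dataclass


@dataclass
class Picture:
    kind: str           # "I" or "P"
    is_reference: bool  # True if another picture needs it for decoding
    size: int           # encoded size in bytes


def pictures_to_output(pending, prev_bandwidth, curr_bandwidth):
    """Drop non-reference pictures only when the measured bandwidth has fallen;
    otherwise pass every pending picture through unchanged."""
    if curr_bandwidth < prev_bandwidth:
        return [p for p in pending if p.is_reference]
    return list(pending)
```

Because only non-reference pictures are removed, every picture that remains can still be decoded by the receiver.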
前記符号化装置は、データを一時的に保存するバッファに、前記符号化手段によって符号化されたピクチャを記憶させる記憶制御手段を備えてもよい。前記削除手段は、前記バッファに保存されている非参照ピクチャを削除すればよい。この場合、符号化装置は、バッファ内に保存されている非参照ピクチャを適宜削除することができるため、削除処理を容易に行うことができる。
The encoding device may include storage control means for storing the picture encoded by the encoding means in a buffer for temporarily storing data. The deletion unit may delete the non-reference picture stored in the buffer. In this case, since the encoding apparatus can appropriately delete the non-reference picture stored in the buffer, the deletion process can be easily performed.
前記検出手段は、前記バッファの空き容量の変動量を前記帯域の情報として検出する第三検出手段を備えてもよい。前記決定手段は、前記第三検出手段によって検出された前記バッファの空き容量の変動量に基づいて前記生成条件を決定すればよい。この場合、符号化装置は、装置内の内部的要因を含めたネットワークの帯域の情報をバッファから計測し、非参照ピクチャの生成条件を決定することができる。よって、より適切に非参照ピクチャを生成することができる。
The detection means may include third detection means for detecting a fluctuation amount of the free capacity of the buffer as the band information. The determination unit may determine the generation condition based on the fluctuation amount of the free space of the buffer detected by the third detection unit. In this case, the encoding apparatus can measure network bandwidth information including internal factors in the apparatus from the buffer, and determine the generation condition of the non-reference picture. Therefore, a non-reference picture can be generated more appropriately.
前記決定手段は、繰り返し生成される複数のピクチャの中で、非参照ピクチャが生成されてから次に非参照ピクチャが生成されるまでの間に生成される参照ピクチャの数の偏りを最小とする前記生成条件を決定することが望ましい。この場合、連続するピクチャの中の非参照ピクチャが偏って生成されることなく、分散して生成される。よって、符号化装置が非参照ピクチャを削除した場合でも、映像が途切れるように再生されるおそれを低下させることができる。
The determination means desirably determines the generation condition so as to minimize the deviation in the number of reference pictures generated between one non-reference picture and the next, among a plurality of repeatedly generated pictures. In this case, the non-reference pictures among the consecutive pictures are generated in a distributed manner rather than clustered together. Therefore, even when the encoding device deletes non-reference pictures, it is possible to reduce the possibility that the reproduced video appears interrupted.
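One way to spread non-reference pictures evenly is an integer-stepping rule similar to Bresenham's line algorithm. The sketch below is an assumption for illustration, not the method stated in the specification; `nonref_positions` is a hypothetical name, `gop_size` is the number of pictures per period, and `count` is the number of non-reference pictures to place.

```python
def nonref_positions(gop_size: int, count: int) -> list:
    """Return 0-based positions of non-reference pictures within one period,
    spaced so that the runs of reference pictures between them differ in
    length by at most one."""
    return [i for i in range(gop_size)
            if (i + 1) * count // gop_size > i * count // gop_size]
```

For example, with 15 pictures and 3 non-reference pictures the positions are 4, 9, and 14, leaving four reference pictures before each one. Position 0, where a period would normally start with an I picture, is selected only if every picture is non-reference.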
前記削除手段は、Pピクチャである非参照ピクチャよりも、Iピクチャである非参照ピクチャを優先して削除することが望ましい。データ量がPピクチャよりも大きいIピクチャを優先して削除することで、ネットワークの帯域の変動により素早く追従し、削除するピクチャの数を減少させることができる。
It is desirable that the deleting means deletes a non-reference picture that is an I picture in preference to a non-reference picture that is a P picture. By preferentially deleting an I picture whose data amount is larger than that of a P picture, it is possible to quickly follow a change in network bandwidth and reduce the number of pictures to be deleted.
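The priority rule can be sketched as follows. This is a minimal illustration: the notion of a byte target (`bytes_to_shed`) and the function name `choose_deletions` are assumptions, since the text only states that I pictures are deleted in preference to P pictures.

```python
def choose_deletions(nonref, bytes_to_shed):
    """nonref is a list of (kind, size) tuples for pending non-reference
    pictures. Returns the indices to delete, taking I pictures first and
    stopping as soon as the requested number of bytes has been shed."""
    # I pictures sort ahead of P pictures; ties keep their original order.
    order = sorted(range(len(nonref)), key=lambda i: (nonref[i][0] != "I", i))
    chosen, shed = [], 0
    for i in order:
        if shed >= bytes_to_shed:
            break
        chosen.append(i)
        shed += nonref[i][1]
    return sorted(chosen)
```

Deleting the larger I picture first reaches the target with fewer deleted pictures than deleting P pictures alone would.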
本発明の第二の態様に係る符号化方法は、画像入力手段によって入力された画像データを符号化して、他のピクチャの復号化時に参照されるピクチャである参照ピクチャ、および他のいずれのピクチャの復号化時にも参照されないピクチャである非参照ピクチャからなる連続するピクチャを生成し、ネットワークに出力する符号化方法であって、前記ネットワークの帯域の情報を検出する検出ステップと、前記非参照ピクチャの生成条件を、前記検出ステップによって検出された帯域の情報に基づいて決定する決定ステップと、前記決定ステップによって決定された生成条件で画像データを符号化する符号化ステップと、前記符号化ステップによって符号化されたピクチャのうち非参照ピクチャを、前記検出ステップによって検出された帯域の情報に応じて削除する削除ステップと、前記削除ステップによって前記非参照ピクチャが削除された残りのピクチャを前記ネットワークへ出力する出力ステップとを備えている。
The encoding method according to the second aspect of the present invention is an encoding method that encodes image data input by an image input means, generates consecutive pictures consisting of reference pictures, which are pictures referred to when other pictures are decoded, and non-reference pictures, which are pictures not referred to when any other picture is decoded, and outputs them to a network. The encoding method includes: a detection step of detecting information on the bandwidth of the network; a determination step of determining a generation condition of the non-reference pictures based on the bandwidth information detected in the detection step; an encoding step of encoding image data under the generation condition determined in the determination step; a deletion step of deleting non-reference pictures from among the pictures encoded in the encoding step according to the bandwidth information detected in the detection step; and an output step of outputting to the network the pictures remaining after the non-reference pictures have been deleted in the deletion step.
第二の態様に係る符号化方法によると、削除しても映像品質への影響が少ない非参照ピクチャの生成条件を、ネットワークの帯域の情報に応じて決定することができる。そして、生成した非参照ピクチャを、ネットワークの帯域の情報に応じて、出力前に削除することができる。よって、符号化処理等を経ることによる時間遅延を生じさせずに、ネットワークの帯域の変動に素早く追従して、データ出力のビットレートを変更することができる。
According to the encoding method according to the second aspect, it is possible to determine the generation condition of the non-reference picture that has little influence on the video quality even if it is deleted, according to the network bandwidth information. Then, the generated non-reference picture can be deleted before output according to network bandwidth information. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
本発明の第三の態様に係る符号化プログラムは、画像入力手段によって入力された画像データを符号化して、他のピクチャの復号化時に参照されるピクチャである参照ピクチャ、および他のいずれのピクチャの復号化時にも参照されないピクチャである非参照ピクチャからなる連続するピクチャを生成し、ネットワークに出力する符号化プログラムであって、コンピュータに、前記ネットワークの帯域の情報を検出する検出ステップと、前記非参照ピクチャの生成条件を、前記検出ステップによって検出された帯域の情報に基づいて決定する決定ステップと、前記決定ステップによって決定された生成条件で画像データを符号化する符号化ステップと、前記符号化ステップによって符号化されたピクチャのうち非参照ピクチャを、前記検出ステップによって検出された帯域の情報に応じて削除する削除ステップと、前記削除ステップによって前記非参照ピクチャが削除された残りのピクチャを前記ネットワークへ出力する出力ステップとを実行させることを特徴とする。
The encoding program according to the third aspect of the present invention is an encoding program that encodes image data input by an image input means, generates consecutive pictures consisting of reference pictures, which are pictures referred to when other pictures are decoded, and non-reference pictures, which are pictures not referred to when any other picture is decoded, and outputs them to a network. The encoding program causes a computer to execute: a detection step of detecting information on the bandwidth of the network; a determination step of determining a generation condition of the non-reference pictures based on the bandwidth information detected in the detection step; an encoding step of encoding image data under the generation condition determined in the determination step; a deletion step of deleting non-reference pictures from among the pictures encoded in the encoding step according to the bandwidth information detected in the detection step; and an output step of outputting to the network the pictures remaining after the non-reference pictures have been deleted in the deletion step.
本発明の第三の態様に係る符号化プログラムによると、削除しても映像品質への影響が少ない非参照ピクチャの生成条件を、ネットワークの帯域の情報に応じて決定することができる。そして、生成した非参照ピクチャを、ネットワークの帯域の情報に応じて、出力前に削除することができる。よって、符号化処理等を経ることによる時間遅延を生じさせずに、ネットワークの帯域の変動に素早く追従して、データ出力のビットレートを変更することができる。
According to the encoding program according to the third aspect of the present invention, it is possible to determine the generation condition of the non-reference picture that has little influence on the video quality even if it is deleted according to the information of the network bandwidth. Then, the generated non-reference picture can be deleted before output according to network bandwidth information. Therefore, the bit rate of data output can be changed by quickly following the fluctuation of the network bandwidth without causing a time delay due to the encoding process or the like.
以下、本発明の符号化装置を具現化した第一の実施形態であるテレビ会議装置1について、図面を参照して説明する。なお、参照する図面は、本発明が採用し得る技術的特徴を説明するために用いられるものである。図面に記載されている装置の構成、各種処理のフローチャート等は、それのみに限定する趣旨ではなく、単なる説明例である。
Hereinafter, a video conference apparatus 1 according to a first embodiment that embodies the encoding apparatus of the present invention will be described with reference to the drawings. The drawings to be referred to are used for explaining technical features that can be adopted by the present invention. The configuration of the apparatus, the flowcharts of various processes, and the like described in the drawings are not intended to be limited to these, but are merely illustrative examples.
テレビ会議装置1は、ネットワーク8(図1参照)を介して他のテレビ会議装置1と接続する。各テレビ会議装置1は、画像データおよび音声データを互いに入出力する。その結果、複数の拠点のユーザが映像および音声を共有することができる。よって、全てのユーザが同一拠点にいない場合でも、ユーザは円滑に会議を実行することができる。
The video conference apparatus 1 is connected to another video conference apparatus 1 via the network 8 (see FIG. 1). Each video conference device 1 inputs and outputs image data and audio data. As a result, users at multiple bases can share video and audio. Therefore, even when all the users are not at the same base, the users can smoothly execute the conference.
図1を参照して、テレビ会議装置1の電気的構成について説明する。テレビ会議装置1は、テレビ会議装置1の制御を司るCPU10を備えている。CPU10には、ROM11、RAM12、ハードディスクドライブ(以下、「HDD」という。)13、および入出力インターフェース19が、バス18を介して接続されている。
The electrical configuration of the video conference device 1 will be described with reference to FIG. The video conference apparatus 1 includes a CPU 10 that controls the video conference apparatus 1. A ROM 11, a RAM 12, a hard disk drive (hereinafter referred to as “HDD”) 13, and an input / output interface 19 are connected to the CPU 10 via a bus 18.
ROM11は、テレビ会議装置1を動作させるためのプログラムおよび初期値等を記憶している。RAM12は、制御プログラムで使用される各種の情報を一時的に記憶する。HDD13は、各種の情報を記憶する不揮発性の記憶装置である。HDD13の代わりに、EEPROMまたはメモリカード等の記憶装置を用いてもよい。
The ROM 11 stores a program for operating the video conference device 1 and initial values. The RAM 12 temporarily stores various information used in the control program. The HDD 13 is a non-volatile storage device that stores various types of information. Instead of the HDD 13, a storage device such as an EEPROM or a memory card may be used.
入出力インターフェース19には、音声入力処理部21、音声出力処理部22、映像入力処理部23、映像出力処理部24、操作部25、および外部通信I/F26が接続されている。音声入力処理部21は、音声を入力するマイク31からの音声データの入力を処理する。音声出力処理部22は、音声を出力するスピーカ32の動作を処理する。映像入力処理部23は、映像を撮像するカメラ33からの映像データ(動画像データ)の入力を処理する。映像出力処理部24は、映像を表示する表示装置34の動作を処理する。操作部25は、ユーザがテレビ会議装置1に各種指示を入力するために用いられる。外部通信I/F26は、テレビ会議装置1をネットワーク8に接続する。
The input / output interface 19 is connected to an audio input processing unit 21, an audio output processing unit 22, a video input processing unit 23, a video output processing unit 24, an operation unit 25, and an external communication I / F 26. The voice input processing unit 21 processes input of voice data from the microphone 31 that inputs voice. The audio output processing unit 22 processes the operation of the speaker 32 that outputs audio. The video input processing unit 23 processes input of video data (moving image data) from the camera 33 that captures video. The video output processing unit 24 processes the operation of the display device 34 that displays video. The operation unit 25 is used for a user to input various instructions to the video conference apparatus 1. The external communication I / F 26 connects the video conference device 1 to the network 8.
RAM12について詳細に説明する。RAM12には、ワークエリア121、およびFIFOバッファエリア122(以下、「FIFOバッファ122」とも言う。)等の各種記憶エリアが設けられている。ワークエリア121には、処理に必要なフラグ等の各種データが記憶される。FIFOバッファエリア122には、符号化された画像のデータである符号化データが、ネットワーク8に出力される前に一時的に記憶される。なお、FIFOバッファとは、格納したデータを先に格納した順に出力する方式のバッファである。
The RAM 12 will be described in detail. The RAM 12 is provided with various storage areas such as a work area 121 and a FIFO buffer area 122 (hereinafter also referred to as “FIFO buffer 122”). The work area 121 stores various data such as flags necessary for processing. In the FIFO buffer area 122, encoded data that is encoded image data is temporarily stored before being output to the network 8. The FIFO buffer is a buffer that outputs stored data in the order in which they are stored.
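A FIFO buffer that also allows non-reference pictures to be removed before output can be sketched with Python's `collections.deque`. This is an illustrative sketch only; the class `FifoBuffer` and its method names are hypothetical, and capacity is counted in arbitrary size units.

```python
from collections import deque


class FifoBuffer:
    """Holds encoded pictures in arrival order until they are output."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def push(self, name, size, is_reference):
        self._items.append((name, size, is_reference))

    def pop(self):
        """Output the oldest stored picture (first in, first out)."""
        return self._items.popleft()[0]

    def drop_non_reference(self):
        """Remove all stored non-reference pictures; return how many were removed."""
        before = len(self._items)
        self._items = deque(item for item in self._items if item[2])
        return before - len(self._items)

    @property
    def free_capacity(self):
        return self.capacity - sum(size for _, size, _ in self._items)
```

Removing entries does not disturb the order of the remaining pictures, and the change in `free_capacity` over time is the kind of quantity the free-capacity measurement would observe.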
次に、テレビ会議装置1における画像データの処理の概要について説明する。テレビ会議装置1は、カメラ33から入力された画像データを、H.264の規格に基づいて画像圧縮符号化し、符号化データを生成する。テレビ会議装置1は、生成した符号化データを、ネットワーク8を介して他のテレビ会議装置1に出力する。なお、テレビ会議装置1は、他のテレビ会議装置1からネットワーク8を介して入力された符号化データを復号して表示装置34に表示させる。しかし、この処理は本発明の要部ではないため、以下では画像圧縮符号化および符号化データの出力の処理について説明を行う。画像圧縮符号化には、フレーム内符号化とフレーム間符号化とがある。
Next, an outline of image data processing in the video conference apparatus 1 will be described. The video conference apparatus 1 compresses and encodes the image data input from the camera 33 based on the H.264 standard to generate encoded data. The video conference apparatus 1 outputs the generated encoded data to another video conference apparatus 1 via the network 8. Note that the video conference apparatus 1 also decodes encoded data input from another video conference apparatus 1 via the network 8 and displays it on the display device 34. However, since that process is not the main part of the present invention, the following describes only image compression encoding and the output of encoded data. Image compression coding includes intra-frame coding and inter-frame coding.
フレーム内符号化とは、カメラによって入力された連続する複数フレーム分の画像データのうちの1フレーム分の画像データ内で、画面内予測によって行われる符号化である。フレーム内符号化によって生成される符号化データであるIピクチャ(Intra-coded Picture)は、他のピクチャを参照することなく単独で復号化される。
The intra-frame coding is coding performed by intra-screen prediction within one frame of image data of a plurality of consecutive frames input by the camera. An I picture (Intra-coded Picture) that is encoded data generated by intra-frame encoding is decoded independently without referring to other pictures.
In inter-frame coding, on the other hand, a prediction error is calculated by referring to the data of a frame, among the data of consecutive frames, that differs from the frame to be encoded, and the calculated prediction error is encoded. The encoded data generated by inter-frame coding includes P pictures (Predictive-coded Pictures) and B pictures (Bidirectional-coded Pictures). This embodiment mainly uses P pictures, which are generated by referring to past pictures. Decoding a P picture requires the picture that was referenced at the time of encoding. However, the data amount of a P picture is smaller than that of an I picture, which is decoded on its own.
A picture that is referenced when any other P picture is decoded is called a “reference picture”. A picture that is not referenced when any other P picture is decoded is called a “non-reference picture”. In general, when I pictures and P pictures are generated in succession, an I picture is often a reference picture. However, when encoded data is generated in the order I picture, I picture, P picture, and the P picture references the I picture immediately preceding it, the first I picture is a non-reference picture. A P picture may reference either an I picture or a P picture when it is decoded. A P picture can itself be either a reference picture or a non-reference picture.
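The distinction between reference and non-reference pictures can be illustrated with a minimal sketch. The function name `classify_pictures` and the list-of-indices representation are assumptions made for illustration only; they do not appear in the specification.

```python
def classify_pictures(refs):
    """Label each picture as a reference or non-reference picture.

    refs[i] is the index of the picture that picture i references when
    it is decoded, or None if picture i is an I picture.
    (Hypothetical representation, chosen for this sketch.)
    """
    referenced = {r for r in refs if r is not None}
    return ["reference" if i in referenced else "non-reference"
            for i in range(len(refs))]


# Sequence I, I, P in which the P picture references the I picture
# immediately preceding it: the first I picture is never referenced.
print(classify_pictures([None, None, 1]))
```

In this I/I/P example, the first I picture and the P picture are both labelled non-reference, and only the second I picture is a reference picture.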
As shown in FIG. 2, in the video conference apparatus 1, DCT/quantization 42 is first performed on the input image 41 from the camera 33. Here, the coefficients obtained by the DCT (Discrete Cosine Transform) are quantized according to a quantization parameter. Next, inverse quantization/inverse DCT 43 is performed on part of the quantized data. The deblocking filter 44 is applied to the data that has undergone inverse quantization/inverse DCT 43, and the result is stored in the frame memory 45.
When intra-frame coding is performed, intra-picture prediction 46 is performed on the data stored in the frame memory 45, followed by DCT/quantization 42. Entropy encoding 47 is performed on the quantized data. The encoded data 48 generated by the entropy encoding 47 is input to the FIFO buffer 122. The encoded data input to the FIFO buffer 122 passes through the non-reference picture deleting means 49 described later, and only the encoded data that has not been deleted is subject to the network output 50.
When inter-frame coding is performed, motion prediction 51 is performed using the input image 41, and motion compensation 52 is performed based on an earlier predicted image in the frame memory 45. Weighted prediction 53, using a weighting factor related to brightness, is performed on the prediction error calculated by the motion compensation 52, followed by DCT/quantization 42. Entropy encoding 47 is performed on the quantized data to generate the encoded data 48. The subsequent flow is the same as for intra-frame coding.
The processing performed when the bandwidth of the network 8 decreases (narrows) will now be described. If data is output at a high bit rate while the bandwidth is reduced, problems such as congestion and packet loss may occur. As a result, the video quality deteriorates.
Conventionally, there are, for example, apparatuses that output encoded data in accordance with the bandwidth by providing a buffer that temporarily stores the encoded data (corresponding to the “FIFO buffer 122” of this embodiment). Such an apparatus must prevent the FIFO buffer 122 from overflowing. It therefore monitors the free capacity of the FIFO buffer 122 and, when the free capacity decreases, adjusts the amount of encoded data by feeding back to the quantization parameter used in the DCT/quantization 42. However, a time delay arises between the change of the quantization parameter and the actual generation of encoded data based on the changed parameter, because the data must first pass through the encoding process. The bit rate therefore cannot be changed quickly, and the risk of overflow remains. Furthermore, when encoded data is buffered in the FIFO buffer 122, a time delay arises before the buffered encoded data is output to the network 8. Consequently, the bit rate could not be changed so as to follow bandwidth fluctuations quickly. As a result, during a video conference, for example, video playback was delayed and users could not conduct the conference smoothly.
The video conference apparatus 1 of the first embodiment performs available bandwidth measurement 55 for the network 8 and FIFO buffer monitoring 56. The non-reference picture deleting means 49 deletes part of the encoded data in the FIFO buffer 122 when the available bandwidth of the network 8 decreases. Deleting part of the encoded data to be output allows the bit rate of the output to the network 8 to be reduced quickly when the available bandwidth decreases.
However, if a reference picture is deleted, the apparatus receiving the encoded data can decode neither the deleted picture nor any P picture that must reference the deleted picture at decoding time. To prevent significant degradation of video quality, non-reference pictures should therefore be the ones deleted. If no deletable non-reference picture is stored in the FIFO buffer 122 when the available bandwidth decreases, the bit rate cannot be changed to match the decrease. The video conference apparatus 1 therefore measures the amount of variation in the available bandwidth of the network 8 and the amount of variation in the free capacity of the FIFO buffer 122. The picture mode control block 58 changes the proportion of P pictures generated relative to all pictures based on the measured variation. The reference picture control block 59 selects the reference picture of each P picture based on the measured variation, thereby changing the proportion of non-reference pictures relative to all pictures. When the available bandwidth decreases, non-reference pictures are deleted. The details of this processing are described below.
The main process performed by the video conference apparatus 1 will be described with reference to FIGS. 3 to 8. The main process is executed by the CPU 10 in accordance with a program stored in the ROM 11. The main process starts when an instruction to execute transmission and reception of image data is input.
As shown in FIG. 3, when the main process starts, various data are initialized (S1). The available bandwidth (W) of the network 8 and the amount of variation (A) in the available bandwidth are measured (S2). The available bandwidth may be measured using a known bandwidth measurement technique. Examples include pathload, which transfers probe packets as a packet train and estimates the available bandwidth from the increasing trend in the one-way transfer delay between probe packets, and cprobe, which continuously transmits ICMP (Internet Control Message Protocol) ECHO REQUEST packets and obtains the available bandwidth by observing the intervals between the response packets. The amount of variation in the available bandwidth may be the difference between the value of the available bandwidth (W) detected this time and the value detected last time. The amount of variation covers both increases and decreases.
Next, feedback is applied to the quantization parameter so that the encoded data is output within the measured available bandwidth (W) (S3). The ratio of the amount of variation (A) in the available bandwidth to the available bandwidth (W) is calculated (S4). The non-reference picture generation condition is determined from the calculated variation ratio (S5).
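As a rough illustration of S2 and S4, the variation ratio could be computed as follows. This is only a sketch: the text does not state which measurement of W serves as the denominator or how out-of-range values are treated, so the use of the current measurement, the absolute value, and the clamp to 1.0 are all assumptions, as is the function name `variation_ratio`.

```python
def variation_ratio(previous_w, current_w):
    """Ratio of the bandwidth variation (A) to the available bandwidth (W).

    previous_w and current_w are two successive measurements of the
    available bandwidth in the same unit (for example kbit/s).
    Assumptions of this sketch: A is taken as an absolute difference
    (it covers both increases and decreases), the denominator is the
    current measurement, and the result is clamped to at most 1.0.
    """
    a = abs(current_w - previous_w)
    if current_w <= 0:
        return 1.0
    return min(a / current_w, 1.0)


print(variation_ratio(1000, 2000))  # bandwidth doubled
print(variation_ratio(1000, 1000))  # no change
```

With these assumptions, a rise from 1000 to 2000 gives a ratio of 0.5 and an unchanged bandwidth gives 0.0.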
The method for determining the non-reference picture generation condition will be described in detail with reference to FIGS. 4 to 7. The non-reference picture generation condition mainly indicates the ratio of the number of non-reference pictures to the preset number of pictures generated within a predetermined time (for example, one second). Specifically, the video conference apparatus 1 determines the proportion of non-reference pictures by determining the number of pictures in a GOP (Group of Pictures) and the number of reference pictures referenced when P pictures are encoded and decoded. A GOP is a group of a preset number of pictures generated within a predetermined period, used to manage a plurality of data items efficiently. Reducing the number of reference pictures in the GOP increases the ratio of non-reference pictures to the number of pictures in the GOP. In addition, as described below, the proportion of non-reference pictures can also be determined by appropriately choosing which picture serves as the reference picture of each P picture. The reference picture control block 59 described above (see FIG. 2) controls the motion compensation 52 based on the determined relationship between P pictures and reference pictures.
FIG. 4 shows an example of the non-reference picture generation condition finally determined when the variation ratio of the available bandwidth is 0% and the number of pictures in the GOP is 10. When the variation ratio is 0%, the available bandwidth is unlikely to drop sharply, so it is rarely necessary to delete a large number of pictures to reduce the bit rate abruptly. There is therefore little need to raise the proportion of non-reference pictures. When the number of non-reference pictures need not be increased, the reference picture of a P picture is preferably the picture immediately preceding that P picture, because using the immediately preceding picture as the reference picture reduces the prediction error and hence the amount of data. Accordingly, as shown in FIG. 4, the reference picture of every P picture is the picture immediately preceding it. As a result, the only non-reference picture is the last picture in the GOP.
FIG. 5 shows an example of the non-reference picture generation condition determined when the variation ratio of the available bandwidth is 50% and the number of pictures in the GOP is 10. The video conference apparatus 1 determines the generation condition so that the proportion of non-reference pictures among the pictures in the GOP is as close as possible to the variation ratio of the available bandwidth. In the example shown in FIG. 5, the variation ratio is 50%, so the generation condition is determined such that 5 of the 10 pictures are non-reference pictures.
Furthermore, the video conference apparatus 1 determines the generation condition so that reference pictures and non-reference pictures are evenly distributed. In other words, the generation condition is determined so as to minimize the unevenness in the number of reference pictures located between each pair of adjacent non-reference pictures. If this unevenness is large, reference pictures and non-reference pictures are not evenly distributed, and deleting non-reference pictures is then likely to cause problems such as interrupted video. In the example shown in FIG. 5, the video conference apparatus 1 therefore arranges reference pictures and non-reference pictures alternately, which eliminates any unevenness in the number of reference pictures located between non-reference pictures. When the reference picture of a P picture is determined, the nearest (most recent) of the reference pictures preceding that P picture is chosen. That is, by having a plurality of P pictures share the same reference picture, the proportion of non-reference pictures that are P pictures can be increased.
FIG. 6 shows an example of the generation condition determined when the variation ratio of the available bandwidth is 40% and the number of pictures in the GOP is 10. When the variation ratio is 40%, the video conference apparatus 1 determines the generation condition such that 4 of the 10 pictures are non-reference pictures. The arrangement of reference pictures and non-reference pictures is then determined so that the numbers of reference pictures located between non-reference pictures are, in order, “2”, “1”, “2”, and “1”.
FIG. 7 shows an example of the generation condition determined when the variation ratio of the available bandwidth is 90% and the number of pictures in the GOP is 10. When the variation ratio is 90%, the video conference apparatus 1 determines the generation condition such that 9 of the 10 pictures are non-reference pictures. In this case, the reference picture of every P picture is the I picture at the head of the GOP. Making the I picture the reference picture of all P pictures in the GOP in this way maximizes the proportion of non-reference pictures that are P pictures.
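One possible way to derive arrangements like those described for FIGS. 4 to 7 is sketched below. It is not taken from the specification: the rounding rule, the clamping of the non-reference count to between 1 and the GOP size minus 1, the even-spacing formula, and the names `ceil_div` and `plan_gop` are all assumptions. The sketch only follows the stated rules that the non-reference proportion should be closest to the variation ratio, that reference pictures should be spread evenly between non-reference pictures, and that each P picture references the nearest preceding reference picture.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return (a + b - 1) // b


def plan_gop(gop_size, variation_ratio):
    """Return (layout, refs) for one GOP of at least two pictures.

    layout is a string with 'R' for a reference picture and 'N' for a
    non-reference picture; refs[i] is the index of the picture that
    picture i references (None for the leading I picture).
    """
    # Number of non-reference pictures closest to the variation ratio.
    # Assumption: at least one (the last picture of the GOP) and at most
    # gop_size - 1 (the leading I picture stays a reference picture).
    n_non_ref = int(variation_ratio * gop_size + 0.5)
    n_non_ref = max(1, min(gop_size - 1, n_non_ref))
    n_ref = gop_size - n_non_ref

    # Place a run of reference pictures before each non-reference
    # picture, keeping the run lengths as even as possible.
    layout = ""
    for i in range(n_non_ref):
        run = (ceil_div((i + 1) * n_ref, n_non_ref)
               - ceil_div(i * n_ref, n_non_ref))
        layout += "R" * run + "N"

    # Each P picture references the nearest preceding reference picture.
    refs = [None]
    last_ref = 0
    for i in range(1, gop_size):
        refs.append(last_ref)
        if layout[i] == "R":
            last_ref = i
    return layout, refs


for ratio in (0.0, 0.4, 0.5, 0.9):
    print(ratio, plan_gop(10, ratio)[0])
```

For a GOP of 10 pictures and a ratio of 40%, this sketch yields the layout `RRNRNRRNRN`, whose runs of reference pictures have lengths 2, 1, 2, and 1, consistent with the description of FIG. 6. For 90%, every P picture references picture 0, the leading I picture.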
As shown in FIGS. 4 to 7, the video conference apparatus 1 can determine, in the process of S5, the proportion in which non-reference pictures that are P pictures are generated. The proportion of non-reference pictures could also be raised by increasing the proportion of I pictures among all pictures. However, the data size of an I picture is larger than that of a P picture, so the load on the network 8 would increase. In contrast, by varying the proportion of non-reference pictures that are P pictures, the video conference apparatus 1 can change the proportion of non-reference pictures generated without sharply increasing the load on the network 8.
Returning to FIG. 3, once the non-reference picture generation condition has been determined (S5), per-frame processing is performed (S6). In the per-frame processing, image data is encoded according to the determined generation condition, and non-reference pictures are deleted based on the network bandwidth information.
As shown in FIG. 8, when the per-frame processing starts, the image data is first encoded one frame at a time, in order, according to the determined quantization parameter (S3, see FIG. 3) and the non-reference picture generation condition (S5, see FIG. 3) (S11). The resulting encoded data is input to the FIFO buffer area 122 (S12). It is determined whether there is a non-reference picture in the FIFO buffer area 122 (S13). If there is none (S13: NO), the process proceeds to S21. If there is a non-reference picture (S13: YES), the available bandwidth (W) of the network 8 is measured (S15), and the amount of variation (C) in the free capacity of the FIFO buffer area 122 is measured (S16). The amount of variation (C) is the difference between the previous free capacity of the FIFO buffer area 122 and its current free capacity.
It is determined whether the available bandwidth (W) measured this time in S15 is lower than the available bandwidth (W) measured last time in S2 (see FIG. 3) or S15 (S17). If the current value is not lower (S17: NO), the process proceeds directly to S19. If the current value is lower (S17: YES), non-reference pictures in the FIFO buffer area 122 are deleted (S18).
In the video conference apparatus 1, when a plurality of non-reference pictures exist in the FIFO buffer area 122, a number of non-reference pictures corresponding to the amount of decrease in the available bandwidth (W) is deleted. Furthermore, when both non-reference pictures that are P pictures and non-reference pictures that are I pictures are stored in the FIFO buffer area 122, the I pictures, which have the larger data amount, are deleted preferentially. For example, when I pictures (I) and P pictures (P) are generated in the order I/I/P/I/I/P and each P picture references the I picture immediately preceding it, the I picture two frames before each P picture is deleted preferentially. Since this embodiment adopts the H.264 standard, what is deleted is an access unit (the unit of a picture defined in H.264) consisting only of non-reference pictures.
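The deletion priority described above can be sketched as follows. The text does not specify how the amount of bandwidth decrease maps to a number of pictures, so this sketch simply takes the count as an input. The `Picture` class and the function `delete_non_reference` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    index: int          # position in encoding order
    kind: str           # "I" or "P"
    is_reference: bool  # True if some other picture references it


def delete_non_reference(fifo, count):
    """Return the FIFO contents after deleting up to `count` pictures.

    Only non-reference pictures are candidates. Non-reference I pictures
    are deleted before non-reference P pictures; within each kind the
    oldest picture goes first (an assumption of this sketch).
    The surviving pictures keep their FIFO order.
    """
    candidates = [p for p in fifo if not p.is_reference]
    # sorted() is stable: False (I picture) sorts before True (P picture).
    candidates = sorted(candidates, key=lambda p: p.kind != "I")
    doomed = {p.index for p in candidates[:count]}
    return [p for p in fifo if p.index not in doomed]


# I/I/P where the P picture references the second I picture.
fifo = [Picture(0, "I", False), Picture(1, "I", True), Picture(2, "P", False)]
remaining = delete_non_reference(fifo, 1)
print([p.index for p in remaining])
```

In the I/I/P example, a request to delete one picture removes the unreferenced first I picture and leaves pictures 1 and 2 in the buffer.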
Next, the difference is calculated between the amount of variation (A) in the available bandwidth, measured last time in S2 (see FIG. 3) to determine the non-reference picture generation condition, and the amount of variation (C) in the free capacity of the FIFO buffer area 122 measured this time in S16 (S19). The remaining encoded data that was not deleted from the FIFO buffer area 122 is output to the network 8 in order, so that the bit rate of the data output follows the available bandwidth (W) of the network 8 (S21).
Next, it is determined whether the absolute value of the variation difference calculated in S19 is greater than or equal to the absolute value of the difference in the amount by which the output bit rate can be changed that would result from updating the generation condition (S22). For example, when processing is performed with 15 pictures per GOP, each additional non-reference picture deleted from one GOP, or each deletion withheld, changes the average output bit rate by about 6.7%. In this case, updating the generation condition so as to increase the number of non-reference pictures in the GOP by one increases the amount by which the output bit rate can be changed by about 6.7%. Conversely, changing the generation condition so as to decrease the number of non-reference pictures in the GOP by one decreases that amount by about 6.7%. The video conference apparatus 1 does not update the generation condition when the change in the variation is small, and updates it only when the change is large. In S22, therefore, the absolute value of the difference between the amount by which the output bit rate can be changed before the update and that amount after the update is calculated. If the absolute value of the variation difference calculated in S19 is smaller than this value (S22: NO), the process returns directly to S11. If it is greater than or equal to this value (S22: YES), the process returns to the main process (see FIG. 3) to update the generation condition. In the main process, when the per-frame processing (S6) ends, the process returns to S2 and the non-reference picture generation condition is updated (S2 to S5).
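The 6.7% figure follows from one picture out of 15 being one fifteenth of the GOP. A minimal sketch of the S22 comparison is given below, assuming that the variation difference from S19 has been normalized to a fraction of the output bit rate and that all pictures contribute roughly equally to the bit rate. The text does not state the units or the normalization, and the function names are hypothetical.

```python
def bitrate_step(gop_size):
    """Fraction of the average output bit rate that one picture per GOP
    represents, assuming pictures of roughly equal size."""
    return 1.0 / gop_size


def should_update_condition(variation_diff, gop_size):
    """S22 as sketched here: update the generation condition only when
    the variation difference is at least one picture's worth of bit rate.

    variation_diff is assumed to be expressed as a fraction of the
    output bit rate (an assumption, not stated in the text).
    """
    return abs(variation_diff) >= bitrate_step(gop_size)


print(round(bitrate_step(15) * 100, 1))     # 6.7 (percent)
print(should_update_condition(0.05, 15))    # False: below one step
print(should_update_condition(-0.10, 15))   # True: at least one step
```

Comparing against a one-picture step keeps the generation condition stable under small fluctuations, since a smaller change could not be expressed by adding or removing a whole non-reference picture anyway.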
As described above, the video conference apparatus 1 of the first embodiment determines the non-reference picture generation condition based on the bandwidth information of the network 8, and encodes image data under the determined generation condition. In accordance with the bandwidth information, it deletes non-reference pictures from the encoded pictures and outputs the remaining pictures to the network. The video conference apparatus 1 can thus determine, in accordance with the bandwidth information of the network 8, the generation condition for non-reference pictures, whose deletion has little effect on video quality, and can delete the generated non-reference pictures before they are output to the network. The video conference apparatus 1 can therefore change the output bit rate so as to follow fluctuations in the bandwidth of the network 8 quickly, without the time delay that would result from passing through encoding, buffering, and the like.
Specifically, by increasing the proportion of non-reference pictures in advance, the video conference apparatus 1 can reduce the output bit rate quickly and sharply. Conversely, reducing the proportion of non-reference pictures allows the reference picture of each P picture to be as close as possible to that P picture, so that encoding can be performed efficiently. The video conference apparatus 1 determines the proportion of non-reference pictures among all pictures, as the generation condition, from the amount of variation in the available bandwidth of the network 8. Non-reference pictures, whose deletion has little effect on video quality, can therefore be generated in a quantity suited to the amount of variation in the available bandwidth. When the available bandwidth actually decreases, the output bit rate can be lowered quickly by deleting the non-reference pictures that have been generated.
In image compression encoding, an I picture generated by intra-frame coding may be a non-reference picture, and a P picture or B picture generated by inter-frame coding may be a non-reference picture. The data size of a picture generated by inter-frame coding is smaller than that of an I picture generated by intra-frame coding. The video conference apparatus 1 can determine, as appropriate, the proportion of non-reference pictures generated by inter-frame coding (non-reference pictures that are P pictures). As a result, even when this proportion is changed, the total data size of the generated pictures does not increase sharply, and the load on the network bandwidth is not increased.
The video conference apparatus 1 temporarily stores the generated encoded data in the FIFO buffer area 122. The deletion process can therefore be performed easily by deleting, as appropriate, the non-reference pictures stored in the FIFO buffer area 122. A plurality of non-reference pictures can also be deleted at once. In addition, by measuring the amount of variation in the free capacity of the FIFO buffer area 122, the video conference apparatus 1 acquires network bandwidth information that includes factors internal to the video conference apparatus 1 itself. When the acquired amount of variation becomes large, the non-reference picture generation condition can be updated. Non-reference pictures can therefore be generated in a way that appropriately reflects the internal factors of the video conference apparatus 1 as well.
The video conference apparatus 1 can generate a plurality of non-reference pictures so that they are distributed without bias. Even when a plurality of non-reference pictures are deleted, the risk that the video is played back with interruptions is therefore reduced. In addition, when deleting non-reference pictures, the video conference apparatus 1 preferentially deletes I pictures, whose data amount is larger than that of P pictures, and can thereby reduce the output bit rate more quickly and efficiently.
In the first embodiment described above, the video conference apparatus 1 corresponds to the “encoding device” of the present invention. The camera 33 corresponds to the “image input means”. The available bandwidth (W) of the network 8, the amount of variation (A) in the available bandwidth, and the amount of variation (C) in the free capacity of the FIFO buffer area 122 correspond to the “network bandwidth information”. The CPU 10 that detects the bandwidth information in S2 of FIG. 3 and in S15 and S16 of FIG. 8 functions as the “detection means”. The CPU 10 that determines the non-reference picture generation condition in S5 of FIG. 3 functions as the “determination means”. The CPU 10 that encodes the image data in S11 of FIG. 8 functions as the “encoding means”. The CPU 10 that deletes non-reference pictures in S18 of FIG. 8 functions as the “deleting means”. The CPU 10 that outputs the encoded data in S21 of FIG. 8 functions as the “output means”.
The CPU 10 that measures the variation (A) in the available bandwidth of the network 8 in S2 of FIG. 3 functions as the “first detection means”. The CPU 10 that measures the available bandwidth (W) of the network 8 in S2 of FIG. 3 and S15 of FIG. 8 functions as the “second detection means”. The FIFO buffer area 122 of the RAM 12 corresponds to the “buffer”. The CPU 10 that inputs the encoded data to the FIFO buffer area 122 in S12 of FIG. 8 functions as the “storage control means”. The CPU 10 that measures the variation in the free capacity of the FIFO buffer area 122 in S16 of FIG. 8 functions as the “third detection means”.
The process of detecting the bandwidth information in S2 of FIG. 3 and in S15 and S16 of FIG. 8 corresponds to the “detection step” of the present invention. The process of determining the non-reference picture generation conditions in S5 of FIG. 3 corresponds to the “determination step”. The process of encoding the image data in S11 of FIG. 8 corresponds to the “encoding step”. The process of deleting the non-reference pictures in S18 of FIG. 8 corresponds to the “deletion step”. The process of outputting the encoded data in S21 of FIG. 8 corresponds to the “output step”.
Next, a video conference apparatus 101 according to a second embodiment of the present invention will be described with reference to FIG. 9 and FIG. 10. The video conference apparatus 101 according to the second embodiment differs from the video conference apparatus 1 according to the first embodiment only in that the encoded data is not buffered. Therefore, the same components and processes are given the same reference numbers, and their description is omitted or simplified.
An outline of the image data processing in the video conference apparatus 101 will be described with reference to FIG. 9. As shown in FIG. 9, the video conference apparatus 101 does not include the FIFO buffer 122 (see FIG. 2). The picture mode control block 58 and the reference picture control block 59 generate non-reference pictures using the information obtained by the available bandwidth measurement 55 of the network 8. Thus, the present invention can be implemented without a buffer for buffering the encoded data. The details of the processing are described below.
The CPU 10 of the video conference apparatus 101 starts the main process when an instruction to execute transmission and reception of image data is input. Except for the frame-by-frame process described below, the main process is the same as the main process performed by the video conference apparatus 1 of the first embodiment (see FIG. 3), so its description is omitted. As shown in FIG. 10, when the frame-by-frame process starts, the image data is encoded one frame at a time (S11) in accordance with the determined quantization parameter (S3, see FIG. 3) and the non-reference picture generation conditions (S5, see FIG. 3). It is then determined whether the encoded data to be output to the network 8 contains a non-reference picture (S113). If there is no non-reference picture (S113: NO), the process proceeds to S121.
If there is a non-reference picture (S113: YES), the available bandwidth (W) of the network 8 is measured (S15), and the variation (B) in the available bandwidth of the network 8 is measured (S116). It is determined whether the available bandwidth (W) measured this time in S15 is lower than the available bandwidth (W) measured previously in S2 (see FIG. 3) or in S15 (S17). If the current value is not lower (S17: NO), the process proceeds directly to S119. If the available bandwidth (W) measured in the current S15 is lower (S17: YES), the non-reference picture is deleted before it is output to the network (S118).
Next, the difference between the variation (A) in the available bandwidth measured previously in S2 (see FIG. 3) and the variation (B) in the available bandwidth measured this time in S116 is calculated (S119). The encoded data that has not been deleted is output to the network 8 (S121). It is then determined whether the absolute value of the difference in the available-bandwidth variation calculated in S119 is equal to or greater than the absolute value of the difference in the changeable value of the output bit rate that would result from updating the generation conditions (S22). If the absolute value of the difference in the variation is smaller than the absolute value of the difference in the changeable value of the output bit rate (S22: NO), the process returns directly to S11. If it is equal to or greater than the absolute value of the difference in the changeable value (S22: YES), the process returns to the main process (see S3) in order to update the generation conditions.
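The flow from S113 to S22 can be summarised in a short sketch. The function below is a simplified, hypothetical model: `frame_step`, its parameters, and the `(name, is_reference)` tuples are illustrative names, the two measurement callbacks stand in for S15 and S116, and `changeable_delta` stands in for the difference in the changeable value of the output bit rate.

```python
def frame_step(pictures, prev_w, prev_a, measure_w, measure_b,
               changeable_delta, last_diff=0.0):
    """One pass through S113..S22 for already-encoded pictures (after S11).

    pictures: list of (name, is_reference) tuples (hypothetical layout).
    Returns (sent_pictures, current_w, diff, needs_update).
    """
    w = prev_w
    diff = last_diff  # reused when S119 is skipped (an assumption of this sketch)
    if any(not is_ref for _, is_ref in pictures):           # S113
        w = measure_w()                                     # S15: available bandwidth W
        b = measure_b()                                     # S116: variation B
        if w < prev_w:                                      # S17: bandwidth dropped?
            pictures = [p for p in pictures if p[1]]        # S118: drop non-reference
        diff = abs(prev_a - b)                              # S119: |A - B|
    sent = list(pictures)                                   # S121: output the remainder
    needs_update = diff >= abs(changeable_delta)            # S22
    return sent, w, diff, needs_update
```

When `needs_update` is true, the caller would return to the main process to recompute the generation conditions; otherwise it would encode the next frame.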
As described above, the video conference apparatus 101 of the second embodiment determines the non-reference picture generation conditions in accordance with the variation in the available bandwidth of the network 8, and can delete non-reference pictures when the available bandwidth decreases. Accordingly, the output bit rate to the network 8 can be changed so as to follow changes in the available bandwidth quickly. As shown here, the present invention can be implemented without a buffer for buffering the encoded data.
In the second embodiment, the available bandwidth (W) of the network 8 and the variations (A, B) in the available bandwidth correspond to the “network bandwidth information” of the present invention. The CPU 10 that detects the bandwidth information in S2 of FIG. 3 and in S15 and S116 of FIG. 10 functions as the “detection means”. The CPU 10 that encodes the image data in S11 of FIG. 10 functions as the “encoding means”. The CPU 10 that deletes the non-reference pictures in S118 of FIG. 10 functions as the “deletion means”. The CPU 10 that outputs the encoded data in S121 of FIG. 10 functions as the “output means”.
It goes without saying that the present invention is not limited to the above embodiments and that various modifications are possible. For example, the present invention is not limited to video conference apparatuses. It can be applied to any device that outputs encoded data via a network, such as a server that distributes video. In the above embodiments, encoding is performed in accordance with the H.264 standard, but other standards may be adopted.
The video conference apparatuses 1 and 101 of the above embodiments perform processing such as the deletion of non-reference pictures based on the value of the available bandwidth of the network 8. However, the video conference apparatuses 1 and 101 may instead measure the bandwidth of the network 8 that is actually in use and perform the processing based on that value.
In the determination of the non-reference picture generation conditions performed in S5 of FIG. 3, conditions other than those determined in the above embodiments may be determined. For example, in addition to the ratio of non-reference pictures to all pictures, the frame rate, the resolution, and the like may be determined as generation conditions.
In the processing of S118 of FIG. 10 in the second embodiment, it is not essential to delete a non-reference picture every time the available bandwidth of the network 8 decreases. For example, the video conference apparatus 101 may delete a non-reference picture only when the amount of decrease in the available bandwidth exceeds a threshold.
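A minimal sketch of this threshold variant is shown below. The function name `should_delete` and its parameters are hypothetical; the bandwidth unit is whatever the measurement uses, and a threshold of zero reproduces the behaviour of deleting on any decrease.

```python
def should_delete(prev_w, curr_w, threshold=0.0):
    """Return True when the drop in available bandwidth exceeds the threshold.

    prev_w / curr_w: previous and current available bandwidth (same unit).
    threshold: minimum decrease that triggers deletion (assumed parameter).
    """
    return (prev_w - curr_w) > threshold
```

For instance, with a threshold of 100, a fall from 1000 to 950 would leave the non-reference pictures in place, while a fall from 1000 to 850 would trigger deletion.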
In the above embodiments, the non-reference picture generation conditions are updated when the change in the variation of the free capacity of the FIFO buffer 122, or the change in the variation of the available bandwidth, becomes large (see S22 of FIG. 8 and S22 of FIG. 10). However, the trigger for updating the non-reference picture generation conditions can also be changed. For example, the generation conditions may be updated repeatedly at predetermined time intervals, or each time the output of a predetermined number of pictures is completed.
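The alternative triggers mentioned here could look like the following sketch. `UpdateTrigger` and its parameters are hypothetical names; the current time is passed in explicitly so that the example stays deterministic rather than reading a real clock.

```python
class UpdateTrigger:
    """Signals when the generation conditions should be re-determined.

    every_pictures: update after this many pictures have been output.
    every_seconds:  update after this much time has elapsed.
    Either may be None to disable that criterion (assumed design).
    """

    def __init__(self, every_pictures=None, every_seconds=None, start=0.0):
        self.every_pictures = every_pictures
        self.every_seconds = every_seconds
        self.count = 0
        self.last_update = start

    def picture_output(self, now):
        """Call once per output picture; returns True when an update is due."""
        self.count += 1
        due = False
        if self.every_pictures is not None and self.count >= self.every_pictures:
            due = True
        if self.every_seconds is not None and now - self.last_update >= self.every_seconds:
            due = True
        if due:
            # Restart both counters so the next interval begins here.
            self.count = 0
            self.last_update = now
        return due
```

A caller would invoke `picture_output` after each output step and, on a true result, return to the step that determines the generation conditions.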
1, 101 Video conference apparatus
8 Network
10 CPU
12 RAM
33 Camera
122 FIFO buffer area
Claims (10)
- An encoding device that encodes image data input by image input means, generates a sequence of pictures consisting of reference pictures, which are pictures referenced when other pictures are decoded, and non-reference pictures, which are pictures not referenced when any other picture is decoded, and outputs the pictures to a network, the encoding device comprising:
detection means for detecting information on the bandwidth of the network;
determination means for determining a generation condition for the non-reference pictures based on the bandwidth information detected by the detection means;
encoding means for encoding the image data under the generation condition determined by the determination means;
deletion means for deleting non-reference pictures from among the pictures encoded by the encoding means, in accordance with the bandwidth information detected by the detection means; and
output means for outputting, to the network, the pictures remaining after the non-reference pictures have been deleted by the deletion means.
- The encoding device according to claim 1, wherein the detection means comprises first detection means for detecting a variation in the bandwidth of the network as the bandwidth information, and
the determination means determines, as the generation condition and based on the variation detected by the first detection means, the ratio of the number of non-reference pictures to the number of pictures, consisting of reference pictures and non-reference pictures, generated within a predetermined time.
- The encoding device according to claim 2, wherein the determination means determines, as the generation condition, the ratio, to the number of pictures generated within the predetermined time, of non-reference pictures that are generated by encoding a prediction error relative to another picture through inter-frame encoding.
- The encoding device according to claim 1, wherein the detection means comprises second detection means for detecting the bandwidth of the network as the bandwidth information, and
the deletion means deletes non-reference pictures when the bandwidth detected by the second detection means has decreased.
- The encoding device according to claim 1, further comprising storage control means for storing the pictures encoded by the encoding means in a buffer that temporarily holds data, wherein
the deletion means deletes non-reference pictures stored in the buffer.
- The encoding device according to claim 5, wherein the detection means comprises third detection means for detecting a variation in the free capacity of the buffer as the bandwidth information, and
the determination means determines the generation condition based on the variation in the free capacity of the buffer detected by the third detection means.
- The encoding device according to claim 1, wherein the determination means determines the generation condition that minimizes, among the repeatedly generated pictures, the unevenness in the number of reference pictures generated between the generation of one non-reference picture and the generation of the next non-reference picture.
- The encoding device according to claim 1, wherein the deletion means deletes a non-reference picture that is an I picture generated by intra-frame encoding in preference to a non-reference picture that is a P picture generated by encoding a prediction error relative to another picture through inter-frame encoding.
- An encoding method that encodes image data input by image input means, generates a sequence of pictures consisting of reference pictures, which are pictures referenced when other pictures are decoded, and non-reference pictures, which are pictures not referenced when any other picture is decoded, and outputs the pictures to a network, the encoding method comprising:
a detection step of detecting information on the bandwidth of the network;
a determination step of determining a generation condition for the non-reference pictures based on the bandwidth information detected in the detection step;
an encoding step of encoding the image data under the generation condition determined in the determination step;
a deletion step of deleting non-reference pictures from among the pictures encoded in the encoding step, in accordance with the bandwidth information detected in the detection step; and
an output step of outputting, to the network, the pictures remaining after the non-reference pictures have been deleted in the deletion step.
- An encoding program that encodes image data input by image input means, generates a sequence of pictures consisting of reference pictures, which are pictures referenced when other pictures are decoded, and non-reference pictures, which are pictures not referenced when any other picture is decoded, and outputs the pictures to a network, the encoding program causing a computer to execute:
a detection step of detecting information on the bandwidth of the network;
a determination step of determining a generation condition for the non-reference pictures based on the bandwidth information detected in the detection step;
an encoding step of encoding the image data under the generation condition determined in the determination step;
a deletion step of deleting non-reference pictures from among the pictures encoded in the encoding step, in accordance with the bandwidth information detected in the detection step; and
an output step of outputting, to the network, the pictures remaining after the non-reference pictures have been deleted in the deletion step.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009207117A JP2011061362A (en) | 2009-09-08 | 2009-09-08 | Encoding device, encoding method, and encoding program |
JP2009-207117 | 2009-09-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011030680A1 true WO2011030680A1 (en) | 2011-03-17 |
Family
ID=43732354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/064603 WO2011030680A1 (en) | 2009-09-08 | 2010-08-27 | Encoding device, encoding method, and encoding program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2011061362A (en) |
WO (1) | WO2011030680A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170055001A1 (en) * | 2014-05-08 | 2017-02-23 | Mitsubishi Electric Corporation | Image encoding apparatus and image decoding apparatus |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2687231A4 (en) | 2011-03-18 | 2014-10-22 | Univ Kagoshima | Composition for treatment and diagnosis of pancreatic cancer |
US8793389B2 (en) * | 2011-12-20 | 2014-07-29 | Qualcomm Incorporated | Exchanging a compressed version of previously communicated session information in a communications system |
JP5853757B2 (en) * | 2012-02-21 | 2016-02-09 | 富士通株式会社 | Moving picture coding apparatus and moving picture coding method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06351006A (en) * | 1993-06-10 | 1994-12-22 | Nippon Telegr & Teleph Corp <Ntt> | Variable rate encoding device for picture signal |
JPH06350983A (en) * | 1993-06-10 | 1994-12-22 | Nippon Telegr & Teleph Corp <Ntt> | Variable rate encoder and decoder for picture signal |
JPH08191451A (en) * | 1995-01-10 | 1996-07-23 | Canon Inc | Moving picture transmitter |
JP2008301309A (en) * | 2007-06-01 | 2008-12-11 | Panasonic Corp | Encoding rate control method, transmission apparatus to control encoding rate, program storage medium, and integrated circuit |
JP2009060553A (en) * | 2007-09-04 | 2009-03-19 | Meidensha Corp | Method of transceiving mpeg data |
- 2009-09-08 JP JP2009207117A patent/JP2011061362A/en active Pending
- 2010-08-27 WO PCT/JP2010/064603 patent/WO2011030680A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP2011061362A (en) | 2011-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8731152B2 (en) | Reducing use of periodic key frames in video conferencing | |
US9071841B2 (en) | Video transcoding with dynamically modifiable spatial resolution | |
JP5166021B2 (en) | Method and system for enabling fast channel changes for DSL systems | |
JP4309098B2 (en) | Method for transmitting hierarchical video coding information | |
JP4554927B2 (en) | Rate control method and system in video transcoding | |
JP4753204B2 (en) | Encoding processing apparatus and encoding processing method | |
JP2006087125A (en) | Method of encoding sequence of video frames, encoded bit stream, method of decoding image or sequence of images, use including transmission or reception of data, method of transmitting data, coding and/or decoding apparatus, computer program, system, and computer readable storage medium | |
JP2011050117A (en) | Trickmode and speed transition | |
TWI482500B (en) | Methods and signal processing apparatuses for processing an input bitstream | |
JP3668110B2 (en) | Image transmission system and image transmission method | |
US8812724B2 (en) | Method and device for transmitting variable rate video data | |
JP2012507892A (en) | Method and system for determining the quality value of a video stream | |
WO2011030680A1 (en) | Encoding device, encoding method, and encoding program | |
KR102424258B1 (en) | Method and encoder system for encoding video | |
JP4861371B2 (en) | Video quality estimation apparatus, method, and program | |
JP2009124518A (en) | Image transmission device | |
JP4447443B2 (en) | Image compression processor | |
JP5212319B2 (en) | Encoding apparatus, encoding method, and encoding program | |
JP2008263443A (en) | Information processing apparatus and method, and program | |
JP3836701B2 (en) | Method and apparatus and program for encoding moving picture, and method and apparatus for moving picture audio multiplexing | |
JP5141656B2 (en) | COMMUNICATION CONTROL DEVICE, COMMUNICATION CONTROL METHOD, AND COMMUNICATION CONTROL PROGRAM | |
JP4718736B2 (en) | Video encoding device | |
Yunus et al. | A rate control model of MPEG-4 encoder for video transmission over Wireless Sensor Network | |
JP2003023639A (en) | Data transmitter and method, data transmission program, and recording medium | |
JP2009246489A (en) | Video-signal switching apparatus |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10815277; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 10815277; Country of ref document: EP; Kind code of ref document: A1 |