WO2016151852A1 - Audio reproduction device, image display device, and audio reproduction method therefor - Google Patents

Audio reproduction device, image display device, and audio reproduction method therefor

Info

Publication number
WO2016151852A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
data
time
image
reproduction
Application number
PCT/JP2015/059430
Other languages
English (en)
Japanese (ja)
Inventor
栄作 石井 (Eisaku Ishii)
Original Assignee
NEC Display Solutions, Ltd. (Necディスプレイソリューションズ株式会社)
Application filed by NEC Display Solutions, Ltd.
Priority to PCT/JP2015/059430
Publication of WO2016151852A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L7/00 Arrangements for synchronising receiver with transmitter

Definitions

  • the present invention relates to an audio reproduction device, an image display device, and an audio reproduction method for reproducing audio indicated by audio data received via a network.
  • Packet communication, as described above, is used to transmit and receive various kinds of data.
  • A technology for transmitting moving image data from a computer or the like to a projector via a network and reproducing the moving image data received by the projector has been put into practical use.
  • One usage mode places an image display device such as a projector near an image transmission device such as a computer (for example, within visual range), and the user operates the images and audio reproduced on the image display device from the image transmission device at hand. In that case, with a configuration in which the image display device accumulates several seconds or more of moving image data in its buffer before playback, it takes several seconds or longer for video or audio playback to start on the image display device, which significantly reduces operability from the image transmission device.
  • To address this, the video is transmitted as a sequence of still images (image data), and only the audio is transmitted as audio data separately from the image data, which improves operability from the image transmission device.
  • When image data and audio data are transmitted separately in this way, the image transmission device can, for example, change the transmission rate of the image data according to the transmission speed of the network, thereby changing the image quality and frame rate reproduced by the image display device. This shortens the time until image reproduction (display) starts.
  • The audio delay time is the time from reception of audio data to the start of reproduction of the audio represented by that audio data.
  • Patent Document 1 (Japanese Patent Application Laid-Open No. 2004-104701) describes a method of correcting the clock of the receiving-side device in order to prevent overflow and underflow of a buffer that stores moving image data, caused by the difference between the clocks on the transmitting side and the receiving side.
  • An object of the present invention is to provide an audio reproduction device, an image display device, and an audio reproduction method thereof that suppress the occurrence of audio interruption caused by delay fluctuation.
  • An audio reproduction device of the present invention reproduces audio represented by audio data received via a network and comprises: a data processing device that adjusts the reproduction start timing of the audio represented by the received audio data based on the amount of deviation from the ideal arrival time of each piece of audio data received via the network; and an audio output device that reproduces the audio based on the reproduction start timing adjusted by the data processing device.
  • Another audio reproduction device of the present invention reproduces audio represented by audio data received via a network and comprises: a data processing device that adjusts the reproduction time of the audio represented by the audio data received via the network, extending it when the data transmission rate of the network decreases from the current state and shortening it when the data transmission rate increases from the current state; and an audio output device that reproduces the audio based on the reproduction time adjusted by the data processing device.
  • An image display device of the present invention comprises the audio reproduction device described above and an image output device that displays an image represented by image data received via the network, and receives the audio data transmitted in correspondence with the image data.
  • An audio reproduction method of the present invention is performed by an image display device that reproduces the images and audio represented by image data and audio data received via a network, and comprises: adjusting the reproduction start timing of the audio represented by the received audio data based on the amount of deviation from the ideal arrival time of each piece of audio data received via the network; and reproducing the audio based on the adjusted reproduction start timing.
  • FIG. 1 is a block diagram illustrating a configuration example of an image reproduction system according to the present invention.
  • FIG. 2 is a block diagram showing a configuration example of the image display device shown in FIG. 1.
  • FIG. 3A is a schematic diagram illustrating an ideal transmission example of audio data in a network.
  • FIG. 3B is a schematic diagram illustrating an actual transmission example of audio data in the network.
  • FIG. 4A is a diagram showing an example of frequency distribution information created by the data processing device 12 shown in FIG. 2, and is a histogram showing an example in which the deviation amounts of the audio packets are distributed over positive values.
  • FIG. 4B is a diagram showing an example of frequency distribution information created by the data processing device 12 shown in FIG. 2, and is a histogram showing an example in which the deviation amounts of the audio packets are distributed over negative values.
  • FIG. 5 is a schematic diagram showing an example of the relationship among the maximum duration $T_B$ of audio that can be stored in the audio buffer shown in FIG. 2, the audio delay time $T_P$, and the reproduction time $T_D$ of the audio corresponding to the amount of audio data remaining in the audio buffer.
  • FIG. 6A is a histogram showing an example of frequency distribution information when the network has a stable data transmission rate.
  • FIG. 6B is a histogram showing an example of frequency distribution information when the network 3 has an unstable data transmission rate.
  • FIG. 7 is a flowchart showing an example of the processing procedure of the image display apparatus of the present invention.
  • FIG. 1 is a block diagram illustrating a configuration example of an image reproduction system according to the present invention.
  • The image reproduction system of the present invention includes an image transmission device 2 that transmits image data and audio data, and an image display device 1 that reproduces the images and audio represented by the image data and audio data received from the image transmission device 2; the image transmission device 2 and the image display device 1 are connected via a network 3.
  • The image transmission device 2 transmits the video to be reproduced by the image display device 1 as image data consisting of a sequence of still images, and transmits the audio data corresponding to the image data separately from the image data. The image transmission device 2 also transmits the audio data with priority over the image data. Further, the image transmission device 2 changes the transmission rate of the image data according to the transmission speed of the network, thereby changing the image quality and frame rate of the image reproduced by the image display device 1. The image data and audio data are transmitted as image packets and audio packets containing the respective data.
  • Each piece of image data and audio data may be associated with information such as an identifier of the image transmission device 2 that transmits it, a time stamp corresponding to when it was captured, a time stamp corresponding to when it is to be reproduced or displayed, and a content name corresponding to the image data and audio data.
  • The image packets and audio packets may include such information.
  • The image transmission device 2 also transmits audio-related information, which is the information needed to reproduce the audio represented by the audio data, including the sampling frequency of the audio data, the number of sampling bits, the number of channels (monaural, stereo, etc.), and the audio packet transmission interval (audio transmission unit time) T.
  • The audio packet transmission interval corresponds to the ideal audio packet arrival interval.
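As an illustration of how the audio-related information fits together, the following Python sketch bundles the listed parameters and derives how much audio one packet carries. The class `AudioRelatedInfo`, its field names, and the assumption of uncompressed PCM audio are hypothetical; the source does not specify the audio encoding or any data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioRelatedInfo:
    """Hypothetical container for the audio-related information sent with the audio data."""
    sampling_hz: int      # sampling frequency of the audio data
    bits_per_sample: int  # number of sampling bits
    channels: int         # number of channels: 1 = monaural, 2 = stereo
    unit_time_s: float    # audio transmission unit time T (ideal arrival interval)

    def samples_per_packet(self) -> int:
        """Samples per channel covered by one audio packet of duration T."""
        return round(self.sampling_hz * self.unit_time_s)

    def bytes_per_packet(self) -> int:
        """Payload size of one audio packet, assuming uncompressed PCM."""
        return self.samples_per_packet() * self.channels * self.bits_per_sample // 8


info = AudioRelatedInfo(sampling_hz=48_000, bits_per_sample=16, channels=2, unit_time_s=0.1)
print(info.samples_per_packet(), info.bytes_per_packet())  # 4800 19200
```

With T known in advance, the receiver can also compute the ideal arrival schedule without any per-packet timing information from the sender.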
  • The image display device 1 receives the image data, audio data, and audio-related information that the image transmission device 2 instructs it to reproduce, and reproduces the images and audio represented by the image data and audio data.
  • The image transmission device 2 can be realized by an information processing apparatus (computer) that includes a communication device that transmits image data and audio data via the network 3, a CPU (Central Processing Unit) that executes processing according to a program, and a memory that stores the data and programs processed by the CPU.
  • the communication device may have any known configuration regardless of the wired method or the wireless method as long as data can be transmitted via the network 3.
  • The network 3 is a known data transmission path including network devices (not shown) that relay the packets transmitted and received between the image transmission device 2 and the image display device 1. As is well known, the network 3 forms many data transmission paths through many network devices, so it is drawn as a cloud in FIG. 1.
  • FIG. 2 is a block diagram illustrating a configuration example of the image display device 1 shown in FIG. 1.
  • The image display device 1 includes: a communication device 11 that receives image data and audio data from the image transmission device 2 via the network 3; a data processing device 12 that performs required processing on the received image data and audio data; a storage device 13 that holds information generated by the data processing device 12 and holds the received image data and either the received audio data or the audio data processed by the data processing device 12 (audio correction data); an audio output device 14 that reproduces and outputs the audio represented by the received audio data or the audio correction data; and an image output device 15 that reproduces and displays the image represented by the image data.
  • The audio reproduction device 4 of the present invention includes the communication device 11, the data processing device 12, the storage device 13, and the audio output device 14 shown in FIG. 2.
  • The image display device 1 can be realized by a projector, a display, a monitor, or the like that can reproduce images and audio and that has the functions of the communication device 11, the data processing device 12, the storage device 13, the audio output device 14, and the image output device 15 shown in FIG. 2.
  • the functions of the data processing device 12 and the storage device 13 can be realized by an information processing device (computer) that includes a CPU (Central Processing Unit) that executes processing according to a program and a memory that stores data and programs processed by the CPU.
  • the communication device 11 includes a data receiving unit 111 that sequentially receives image data and audio data from the image transmission device 2 via the network 3 and outputs the received image data and audio data to the image / audio dividing unit 121.
  • the communication device 11 may have any known configuration regardless of the wired method or the wireless method as long as data can be transmitted via the network 3. Data transmission is performed using, for example, well-known packet communication.
  • the storage device 13 includes a video memory 131 that holds image data, an audio correction data storage memory 132 that holds information used for audio data correction, and an audio buffer 133 that holds audio data or audio correction data.
  • The data processing device 12 includes an image / audio dividing unit 121, an audio transmission processing unit 122, and an audio data processing unit 123.
  • The image / audio dividing unit 121 performs required processing such as decoding on the image data received from the communication device 11 and writes the processed image data to the video memory 131 of the storage device 13.
  • the image / audio dividing unit 121 outputs the audio data and the audio related information received from the communication device 11 to the audio transmission processing unit 122.
  • The audio transmission processing unit 122 determines a predetermined reference time from the arrival time of the audio packets containing the audio data received by the communication device 11. Based on the audio-related information, it detects, for each audio packet, the amount of deviation from the ideal arrival time, in other words from the scheduled arrival time determined by the reference time and the predetermined arrival interval, and generates deviation amount information, which is information about the deviation amounts.
  • The deviation amount information includes, for example, information indicating the distribution of the deviation amounts (frequency distribution information), their average, maximum, or minimum value, or the individual deviation values.
  • The following describes an example in which frequency distribution information is mainly used as the deviation amount information.
  • The audio transmission processing unit 122 writes the determined reference time to the audio correction data storage memory 132 of the storage device 13. It also writes the generated frequency distribution information and the audio-related information to the audio correction data storage memory 132, and outputs the audio data received from the communication device 11 to the audio data processing unit 123.
  • The reference time is a reference value for the arrival time of the audio data (audio packets) received from the image transmission device 2. Its initial value is the arrival time of the first audio data, that is, of the audio packet received first among the received audio data (audio packets).
  • the audio data may be audio data corresponding to the received image data. In that case, the audio data and the image data are transmitted in association with each other.
  • The audio transmission processing unit 122 detects, for each piece of audio data (audio packet), the deviation amount relative to the reference time, and starts generating the deviation amount information.
  • The image display device 1 reproduces silent audio from the reference time until a predetermined time.
  • Alternatively, audio playback may be started when a predetermined time has elapsed from the reference time.
  • The reference time also serves as the reference for audio playback. Because the audio packets arrive at the image display device 1 at a substantially constant cycle, the image display device 1 holds the audio data received in each cycle in the audio buffer 133, and sequentially reads and reproduces the audio data held in the audio buffer 133.
  • the audio data processing unit 123 adjusts the reproduction start timing of the audio indicated by the audio data based on the deviation amount of the audio data.
  • The audio reproduction start timing is adjusted, for example, by extending or shortening the reproduced audio.
  • The audio data processing unit 123 reads the frequency distribution information from the audio correction data storage memory 132 and, based on it, processes the audio data so that the reproduced audio is extended or shortened over a predetermined adjustment time.
  • The processed audio data (audio correction data) is written to the audio buffer 133.
  • the audio output device 14 reproduces audio based on the audio reproduction start timing adjusted by the data processing device 12.
  • The audio output device 14 includes an audio output unit 141 that sequentially reads audio data or audio correction data from the audio buffer 133 and reproduces and outputs the audio.
  • the image output device 15 includes an image output unit 151 that sequentially reads image data from the video memory 131 and displays an image.
  • FIG. 3A is a schematic diagram showing an ideal transmission example of audio data in the network 3.
  • FIG. 3B is a schematic diagram showing an actual transmission example of audio data in the network 3.
  • FIG. 4A is a diagram showing an example of frequency distribution information created by the data processing device 12 shown in FIG. 2, and is a histogram showing an example in which the deviation amounts of the audio packets are distributed over positive values.
  • FIG. 4B is a diagram showing an example of frequency distribution information created by the data processing device 12 shown in FIG. 2, and is a histogram showing an example in which the deviation amounts of the audio packets are distributed over negative values.
  • FIG. 3A shows a state in which audio packets are transmitted ideally via the network 3: they arrive at the image display device 1 at regular intervals (audio transmission unit time T).
  • When no problem occurs in the image transmission device 2, the image display device 1, or the network 3, and audio packets are transmitted from the image transmission device 2 at regular intervals (audio transmission unit time T), an audio packet would be expected to arrive at the image display device 1 every audio transmission unit time T, as shown in FIG. 3A. In practice, however, the actual audio packet arrival intervals vary, as indicated by $T_0$ to $T_6$ in FIG. 3B, because of unstable fluctuation in the data transmission time of the network 3 (delay fluctuation).
  • Equation (1) gives the deviation amount $\Delta T_n$ of the n-th audio packet from its ideal arrival time.
  • $\Delta T_n$ takes a positive value when the audio packet arrives later than ideal and a negative value when it arrives earlier than ideal.
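A form of Equation (1) consistent with the surrounding definitions, offered as a reconstruction rather than the patent's exact notation: let $t_n$ be the arrival time of the n-th audio packet, take the first packet's arrival as the reference time $t_{\mathrm{ref}}$, and expect the n-th packet ideally at $t_{\mathrm{ref}} + (n-1)T$, which matches the playback schedule $(n-1)T + T_P$ after the reference time. Assuming further that $T_k$ denotes the interval between the $(k+1)$-th and $(k+2)$-th packets, the same quantity follows from the measured intervals:

```latex
\Delta T_n = t_n - \bigl(t_{\mathrm{ref}} + (n-1)\,T\bigr), \qquad t_{\mathrm{ref}} = t_1,
\qquad
t_n - t_1 = \sum_{k=0}^{n-2} T_k
\;\Longrightarrow\;
\Delta T_n = \sum_{k=0}^{n-2} T_k - (n-1)\,T .
```

For $n = 1$ the sum is empty and $\Delta T_1 = 0$, as expected for the packet that defines the reference time.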
  • The actual audio packet arrival intervals $T_0$ to $T_n$ can be detected from the arrival time of each audio packet.
  • The arrival time is, for example, the timing at which the audio packet arrives.
  • For example, time counting may start with the arrival of the first audio packet at the image display device 1 as the base point (reference time), and the deviation amount of each audio packet may be detected based on the timing at which that packet arrives, the base point (reference time), and the audio transmission unit time T.
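The detection described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the packets are already in transmission order with none lost (a real device would use the position information carried in each packet) and that times are in milliseconds; the function name `deviations` is hypothetical.

```python
def deviations(arrivals_ms: list[float], unit_time_ms: float) -> list[float]:
    """Deviation Delta T_n of each audio packet from its ideal arrival time.

    arrivals_ms[i] is the arrival time of the (i + 1)-th packet. The first
    arrival is the reference time, and packet n (1-based) is ideally
    expected at reference + (n - 1) * unit_time_ms.
    """
    if not arrivals_ms:
        return []
    reference = arrivals_ms[0]
    return [t - (reference + i * unit_time_ms) for i, t in enumerate(arrivals_ms)]


# T = 100 ms; the third packet is 30 ms late, the fourth 10 ms early.
print(deviations([0, 100, 230, 290, 410], 100))  # [0, 0, 30, -10, 10]
```

Positive values mark late packets and negative values early ones, matching the sign convention for $\Delta T_n$.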
  • Each packet includes information indicating which part of the data it carries. The image display device 1 can therefore determine whether a received audio packet contains the first audio data, the last audio data, or the n-th audio data. In addition, the ideal audio packet arrival interval (audio transmission unit time T) is notified in advance from the image transmission device 2 to the image display device 1, so it is known to the image display device 1. In this case, to calculate $\Delta T_n$ for each audio packet using Equation (1), the image display device 1 only needs to know the arrival time of the audio packet that arrived first.
  • Among the audio data newly transmitted for playback from the image transmission device 2, the arrival time of the first audio data is set as the “reference time”.
  • The audio transmission processing unit 122 detects the deviation of the current reference time (initial value) based on the frequency distribution information indicating the distribution of $\Delta T_n$.
  • For example, as shown in FIG. 5, when the image display device 1 receives an audio packet, it stores the audio data contained in the packet in the audio buffer 133 and starts reproducing the audio represented by the n-th audio data $(n-1)T + T_P$ after the reference time.
  • The audio delay time $T_P$ may be set in advance to a value satisfying $T_P \geq T + \Delta T_{n,\max}$, taking into account the maximum value $\Delta T_{n,\max}$ of the deviation amount $\Delta T_n$ from the ideal arrival time caused by delay fluctuation.
  • However, $\Delta T_{n,\max}$ of the network 3 is unknown, and the audio delay time $T_P$ that the user can tolerate differs depending on the type of audio being reproduced.
  • For conversation and the like, the audio delay time $T_P$ should be as short as possible, whereas for BGM (background music) and the like a long $T_P$ often causes no problem. Therefore, an adjustment mechanism for adjusting $T_P$ may be provided in the image display device 1 or the image transmission device 2 so that the user can set $T_P$ arbitrarily.
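The condition $T_P \geq T + \Delta T_{n,\max}$ can be checked against observed deviations, for example to warn when a user-selected $T_P$ is too short. This is a sketch under a stated assumption: because the true $\Delta T_{n,\max}$ of the network 3 is unknown, the maximum observed deviation only estimates it. Both function names are hypothetical.

```python
def min_delay_for_no_gap(deviations_ms: list[float], unit_time_ms: float) -> float:
    """Smallest audio delay time T_P satisfying T_P >= T + max(Delta T_n).

    The maximum is taken over the observed deviations only, so it merely
    estimates Delta T_n,max of the network; later packets may exceed it.
    """
    if not deviations_ms:
        raise ValueError("need at least one observed deviation")
    return unit_time_ms + max(deviations_ms)


def delay_is_sufficient(delay_ms: float, deviations_ms: list[float], unit_time_ms: float) -> bool:
    """Check a user-chosen T_P (e.g. set through an adjustment mechanism)."""
    return delay_ms >= min_delay_for_no_gap(deviations_ms, unit_time_ms)


observed = [0, 0, 30, -10, 10]  # Delta T_n in ms, with T = 100 ms
print(min_delay_for_no_gap(observed, 100))      # 130
print(delay_is_sufficient(150, observed, 100))  # True
print(delay_is_sufficient(120, observed, 100))  # False
```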
  • After the image display device 1 starts playing the audio, if the next audio packet arrives within $T_P$, the next audio data is already stored in the audio buffer 133 when reproduction of the previously received audio data is completed, so the audio is not interrupted.
  • If the next audio packet arrives later than $T_P$, the next audio data is not yet stored in the audio buffer 133 when reproduction of the previously received audio data is completed. In that case, a silent state continues until the next audio data is received.
  • When the audio packets as a whole tend to arrive later than the ideal arrival time, that is, when the current reference time is too early, a silent state in which the image display device 1 cannot reproduce audio is likely to occur.
  • When the audio packets as a whole tend to arrive earlier than the ideal arrival time, that is, when the current reference time is too late, no silent state occurs in the image display device 1. In that case, correcting the reference time to the correct time allows the audio delay time to be made shorter than the current $T_P$. Without such a correction, audio data corresponding to more than $T_P$ remains in the audio buffer 133 even when reproduction of the previously received audio data is completed, so the audio buffer 133 may overflow.
  • The data processing device 12 adjusts the playback start timing of the audio represented by the audio data in each received audio packet based on the deviation amount $\Delta T_n$ of each audio packet.
  • The audio transmission processing unit 122 detects a correction deviation amount based on deviation amount information, that is, information on the deviation amount $\Delta T_n$ of each audio packet, for example the frequency distribution information.
  • The audio data processing unit 123 adjusts the audio reproduction start timing based on the detected correction deviation amount.
  • For example, when the average of $\Delta T_n$ over the audio packets is positive, the audio packets may be regarded as tending to arrive later than the ideal arrival time, and when it is negative, as tending to arrive earlier than the ideal arrival time.
  • The frequency distribution information is, for example, a frequency distribution in which the values of $\Delta T_n$ are classified into predetermined ranges, as shown in FIG. 4A or 4B.
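A frequency distribution like the ones in FIG. 4A and 4B can be built by counting $\Delta T_n$ values per fixed-width range. A minimal sketch, assuming equal-width ranges in milliseconds; the 20 ms width and the name `frequency_distribution` are illustrative, not taken from the source.

```python
from collections import Counter


def frequency_distribution(deviations_ms: list[float], bin_width_ms: float) -> Counter:
    """Count Delta T_n values per range; key k stands for [k * w, (k + 1) * w)."""
    return Counter(int(d // bin_width_ms) for d in deviations_ms)


w = 20  # hypothetical range width in ms
hist = frequency_distribution([0, 0, 30, -10, 10, 35], bin_width_ms=w)
for k, count in sorted(hist.items()):
    print(f"[{k * w}, {(k + 1) * w}) ms: {'#' * count}")
```

Floor division keeps negative deviations in their own ranges (for example, −10 ms falls into [−20, 0)), so early and late arrivals remain distinguishable in the histogram.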
  • Generation of the deviation amount information starts when the reference time is set.
  • The deviation amount information may be generated for every predetermined period, for the period from the first to the last reception of the audio data, or for every period in which the audio reproduction start timing is adjusted.
  • The correction deviation amount is detected based on the deviation amount $\Delta T_n$ of each audio packet. For example, the most frequent deviation amount $\Delta T_n$ is used as the correction deviation amount.
  • Alternatively, the average of the deviation amounts $\Delta T_n$ may be used as the correction deviation amount.
  • The most frequent $\Delta T_n$ and the average of $\Delta T_n$ are only examples of the correction deviation amount.
  • When $\Delta T_n$ is classified into predetermined ranges, the correction deviation amount may be a value representing a range, for example the center, maximum, or minimum value of that range.
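The two choices of correction deviation amount named above, the center of the most frequent range or the plain average, can be sketched as follows. This is an illustrative helper under assumed names and units (`correction_deviation`, milliseconds, equal-width ranges), not the device's actual procedure.

```python
from collections import Counter
from statistics import mean


def correction_deviation(deviations_ms: list[float], bin_width_ms: float, method: str = "mode") -> float:
    """Hypothetical helper deriving the correction deviation amount from Delta T_n.

    "mode": center of the most frequent range [k * w, (k + 1) * w)
    "mean": plain average of all Delta T_n
    """
    if not deviations_ms:
        raise ValueError("no deviations observed")
    if method == "mean":
        return mean(deviations_ms)
    if method != "mode":
        raise ValueError(f"unknown method: {method}")
    bins = Counter(int(d // bin_width_ms) for d in deviations_ms)
    k, _count = bins.most_common(1)[0]  # ties resolve to the range seen first
    return (k + 0.5) * bin_width_ms


samples = [40, 45, 48, 52, 10, -5]  # most packets arrive about 50 ms late
print(correction_deviation(samples, 20))                 # 50.0
print(correction_deviation(samples, 20, method="mean"))  # pulled down by the two outliers
```

The mode ignores the two atypical packets, while the mean is dragged toward them; which behavior is preferable depends on how stable the network is.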
  • The audio playback time may be extended or shortened using audio data corresponding to a predetermined adjustment time.
  • The correction deviation amount is used as the extension time or shortening time of the audio.
  • Methods of extending or shortening the audio playback time include, for example, resampling the received audio data at a sampling frequency different from the one used when the audio data was sampled, and deleting part of the audio data or inserting silent data.
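The resampling approach can be illustrated with plain linear interpolation: a block of N samples is mapped onto N' samples and played at the original sampling frequency, so its duration scales by N'/N. This is a minimal sketch of one simple resampling technique chosen for illustration; the source does not say which resampler is used, and `stretch` is a hypothetical name. No anti-aliasing filter is applied.

```python
def stretch(samples: list[float], new_length: int) -> list[float]:
    """Resample one block of audio to new_length samples by linear interpolation.

    Played back at the original sampling frequency, the block lasts
    new_length / len(samples) times as long, and its pitch shifts by the
    inverse ratio. No anti-aliasing filter is applied.
    """
    n = len(samples)
    if n == 0 or new_length <= 0:
        return []
    if n == 1 or new_length == 1:
        return [float(samples[0])] * new_length
    scale = (n - 1) / (new_length - 1)  # maps output index to input position
    out = []
    for i in range(new_length):
        x = i * scale
        j = int(x)
        if j >= n - 1:
            out.append(float(samples[-1]))
        else:
            frac = x - j
            out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out


# 100 ms at 48 kHz is 4800 samples; shortening it to 99.5 ms leaves 4776 samples.
block = [float(i) for i in range(4800)]
shorter = stretch(block, 4776)
print(len(shorter))  # 4776
```

The alternative named in the text, deleting part of the audio data or inserting silence, avoids the pitch change but can produce audible discontinuities unless the edit points are chosen carefully.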
  • The extension or shortening time is about 0.5% of the playback time of the audio data used for the adjustment, so that the user of the image display device 1 does not find the reproduced audio unnatural.
  • Suppose, for example, that the correction deviation amount is −50 ms (milliseconds).
  • The extension and shortening times may be any values at which the user of the image display device 1 does not find the reproduced audio unnatural; they may be set to 0.5% or less of the playback time of the audio data used for the adjustment.
  • For example, suppose the reproduction time of the audio data included in each audio packet is 100 ms and the correction deviation amount is −50 ms (or +50 ms).
  • At 0.5%, the adjustment time needed to absorb the correction deviation amount of −50 ms (or +50 ms) is 10 seconds (50 ms / 0.005).
  • The number of pieces of audio data corresponding to this adjustment time is 100 (10 s / 100 ms). Therefore, the audio data processing unit 123 shortens each piece of audio data to 99.5 ms (or extends it to 100.5 ms).
  • The shortened (or extended) audio data is stored in the audio buffer 133 as audio correction data.
  • The stored audio correction data is reproduced sequentially. By repeating this process 100 times, the reproduction start timing is adjusted automatically without interrupting the audio.
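The arithmetic of this example generalizes directly: the adjustment time is the magnitude of the correction deviation amount divided by the allowed ratio, and the correction is spread evenly over the packets in that time. A sketch under these assumptions, with the hypothetical name `plan_adjustment` and times in milliseconds:

```python
def plan_adjustment(correction_ms: float, packet_ms: float, ratio: float = 0.005):
    """Spread a correction deviation amount over the packets of an adjustment time.

    ratio is the fraction by which each packet's playback time may change
    (0.5% in the example). A negative correction shortens each packet, a
    positive one extends it. Returns (adjustment time in ms, number of
    packets, new playback time per packet in ms).
    """
    if correction_ms == 0:
        return 0.0, 0, packet_ms
    adjust_time_ms = abs(correction_ms) / ratio
    # A very small correction is applied within a single packet.
    packets = max(1, round(adjust_time_ms / packet_ms))
    return adjust_time_ms, packets, packet_ms + correction_ms / packets


print(plan_adjustment(-50, 100))  # about 10 s, 100 packets, 99.5 ms each
print(plan_adjustment(+50, 100))  # about 10 s, 100 packets, 100.5 ms each
```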
  • The data processing device 12 may also adjust the playback start timing of the audio represented by the audio data and correct the reference time, both based on the deviation amount $\Delta T_n$ of each audio packet.
  • The correction amount of the reference time and the adjustment amount of the reproduction start timing are detected based on the deviation amount $\Delta T_n$ of each audio packet.
  • The audio transmission processing unit 122 corrects the reference time based on the correction deviation amount. For example, as described above, when the reproduction time of the audio data included in each audio packet is 100 ms and the correction deviation amount is −50 ms (or +50 ms), 100 pieces of audio data correspond to the 10-second adjustment time used to adjust the audio reproduction timing.
  • For each piece, the audio data processing unit 123 shortens the audio data to 99.5 ms (or extends it to 100.5 ms), and the audio transmission processing unit 122 shifts the reference time by −0.5 ms (or +0.5 ms).
  • The shortened (or extended) audio data is stored in the audio buffer 133 as audio correction data.
  • The stored audio correction data is reproduced sequentially. By repeating this process 100 times, the audio reproduction start timing is adjusted without interrupting the audio, and the reference time is corrected by the correction amount of −50 ms (or +50 ms).
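The gradual variant, in which the reference time moves by the same step as each packet's playback time, can be sketched as a loop. This is a minimal illustration assuming equal steps and millisecond units; `gradual_correction` is a hypothetical name.

```python
def gradual_correction(reference_ms: float, correction_ms: float, packets: int, packet_ms: float):
    """Shift the reference time and the playback time by equal steps, packet by packet.

    Returns the corrected reference time and the playback time used for each
    piece of audio correction data.
    """
    step = correction_ms / packets  # e.g. -50 ms / 100 = -0.5 ms per packet
    played = []
    for _ in range(packets):
        played.append(packet_ms + step)  # shortened (or extended) audio data
        reference_ms += step             # reference time moves by the same step
    return reference_ms, played


ref, played = gradual_correction(reference_ms=0.0, correction_ms=-50.0, packets=100, packet_ms=100.0)
print(ref, sum(played))  # -50.0 9950.0
```

After 100 packets the reference time has moved by −50 ms and the audio timeline is 50 ms shorter than nominal, so the two stay in step throughout the adjustment.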
  • Since the correction amount of the reference time is determined based on the correction deviation amount, adjusting the audio reproduction start timing based on the correction deviation amount includes adjusting it based on the correction amount of the reference time. Likewise, since the adjustment amount of the audio reproduction start timing is determined based on the correction deviation amount, correcting the reference time based on the correction deviation amount includes correcting it based on the adjustment amount of the audio reproduction start timing.
  • The audio reproduction start timing can thus be set appropriately by adjusting the audio reproduction time.
  • In one case, the data processing device 12 adjusts the playback time of the audio represented by the audio data received via the network 3 so that it is extended over the adjustment period.
  • In the other case, the playback time of the audio represented by the audio data received via the network 3 is adjusted so that it is shortened over the adjustment period.
  • The audio output device 14 reproduces the audio based on the audio reproduction time adjusted by the data processing device 12. Extending or shortening the audio reproduction time thereby adjusts the audio reproduction start timing.
  • In the example above, the correction amount of the reference time and the adjustment amount of the audio reproduction start timing are set to the same amount, and the correction and adjustment are performed gradually.
  • However, the present invention is not limited to such processing.
  • For example, the reference time may be corrected all at once (by −50 ms or +50 ms). In this case, it is desirable to stop detecting deviation amounts during the adjustment time of the audio reproduction start timing. If deviation amounts are to be detected properly during this adjustment period, the elapsed time since the start of the adjustment and the adjustment amount of the reproduction start timing must be taken into account.
  • When the audio packets tend to arrive later than the ideal arrival time (when the correction deviation amount is positive), the current reference time is corrected to a later time and the audio playback time is adjusted to be extended. When the audio packets tend to arrive earlier than the ideal arrival time (when the correction deviation amount is negative), the current reference time is corrected to an earlier time and the audio playback time is adjusted to be shortened.
  • By adjusting the audio playback time (and thereby the audio playback start timing) in this way, the amount of audio data held in the audio buffer 133 is kept at an appropriate level, and audio interruptions and the like can be suppressed. In addition, because overflow of audio data is suppressed, the capacity of the audio buffer 133 can be reduced. Furthermore, because the reference time is also corrected, the audio reproduction can be readjusted to the state of the network even after an adjustment has been made.
  • a threshold T_MAX is set in advance, and only deviations satisfying ΔTn ≤ T_MAX are used, so that a voice packet that arrives extremely late compared with other voice packets is not used in calculating (detecting) the correction deviation amount.
  • the correction deviation amount may be calculated using only the ΔTn values that satisfy ΔTn ≤ T_MAX.
  • the deviation amount information may be generated only from ΔTn values that satisfy ΔTn ≤ T_MAX. In that case, a more appropriate correction deviation amount can be calculated.
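A minimal sketch of the outlier filter and the correction deviation amount, assuming deviations are given in milliseconds. The appendices name both the mean deviation and the most frequent deviation as options. The 1 ms bin used to find the most frequent value is an assumed parameter, and the function name is hypothetical.

```python
from collections import Counter
from statistics import mean


def correction_deviation(deltas_ms, t_max_ms, use_mode=False, bin_ms=1.0):
    """Correction deviation amount from per-packet deviations ΔTn (ms).

    Packets with ΔTn > t_max_ms (extremely late outliers) are ignored.
    use_mode=False: mean of the remaining ΔTn.
    use_mode=True: most frequent value, after rounding each ΔTn to the
    nearest multiple of bin_ms (assumed binning).
    """
    kept = [d for d in deltas_ms if d <= t_max_ms]
    if not kept:
        return 0.0  # nothing usable, so no correction
    if not use_mode:
        return mean(kept)
    bins = Counter(round(d / bin_ms) for d in kept)
    best_bin, _count = bins.most_common(1)[0]
    return best_bin * bin_ms
```

With deviations of 2, 3, 3, 4 and 250 ms and T_MAX = 100 ms, the 250 ms outlier is dropped, and both the mean and the most frequent value come out at 3 ms.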
  • a predetermined threshold T_TH may be set for the correction deviation amount, so that the reference time is corrected and the audio reproduction start timing is adjusted (by processing the audio data) when the absolute value of the correction deviation amount exceeds T_TH.
  • the adjustment of the sound reproduction start timing may be performed every predetermined period. In this case, the processing load on the data processing device 12 is reduced because the reference time and the playback start timing of the audio are not frequently changed.
  • the threshold may be set to different values depending on whether the correction deviation amount is positive or negative. This makes it possible to choose whether to prioritize a short audio delay or the suppression of audio interruptions.
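The trigger conditions above (an asymmetric threshold, or a fixed adjustment period) might look like the following sketch. The function names and the use of two separate threshold magnitudes for late and early packets are illustrative assumptions.

```python
def needs_adjustment(correction_deviation_ms, th_late_ms, th_early_ms):
    """Return +1 (extend playback), -1 (shorten playback) or 0 (leave audio as is).

    th_late_ms applies to positive deviations (packets arriving late) and
    th_early_ms to negative ones (packets arriving early); both are magnitudes.
    """
    if correction_deviation_ms > th_late_ms:
        return 1
    if correction_deviation_ms < -th_early_ms:
        return -1
    return 0


def period_elapsed(now_ms, last_adjust_ms, period_ms):
    """Alternative trigger: allow an adjustment only once every period_ms."""
    return now_ms - last_adjust_ms >= period_ms
```

A smaller late-side threshold reacts sooner when packets start arriving late, which favors fewer audio interruptions. A smaller early-side threshold reacts sooner when audio piles up in the buffer, which favors a shorter delay.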
  • the frequency distribution of ΔTn for each voice packet received thereafter is thereby corrected so that it is centered near zero.
  • even if the audio delay time T_P is not set to a value as large as T + ΔTn_MAX, the occurrence of audio interruptions is suppressed. Therefore, T_P need not be made unnecessarily long in order to suppress audio interruptions caused by delay fluctuation.
  • the audio delay time T_P may be set so as to satisfy T_P ≥ T + ΔTn_MAX.
  • the value of ΔTn_MAX is indicated by the deviation amount information.
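As a worked example of T_P ≥ T + ΔTn_MAX, the sketch below takes ΔTn_MAX as the largest positive deviation and returns the smallest delay that satisfies the inequality. The function name and millisecond units are assumptions. With T = 20 ms and a largest positive deviation of 15 ms, the minimum T_P is 35 ms.

```python
def audio_delay_ms(unit_time_ms, deltas_ms, t_max_ms=None):
    """Smallest audio delay T_P satisfying T_P >= T + ΔTn_MAX.

    ΔTn_MAX is the largest positive deviation (0 if no packet was late).
    If t_max_ms is given, deviations above it are treated as outliers.
    """
    kept = [d for d in deltas_ms if t_max_ms is None or d <= t_max_ms]
    dtn_max = max([d for d in kept if d > 0], default=0.0)
    return unit_time_ms + dtn_max
```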
  • FIG. 6A is a histogram showing an example of frequency distribution information when the network 3 has a stable data transmission rate.
  • FIG. 6B is a histogram showing an example of frequency distribution information when the network 3 has an unstable data transmission rate. FIGS. 6A and 6B show examples of frequency distribution information after the reference time has been corrected.
  • when the data transmission rate is stable, the frequency distribution of ΔTn of the voice packets is relatively narrow and ΔTn_MAX is small, as shown in FIG. 6A.
  • when the data transmission rate is unstable, the frequency distribution of ΔTn of the voice packets is relatively wide and ΔTn_MAX is large, as shown in FIG. 6B.
  • the spread of the frequency distribution along the time axis, that is, the maximum deviation ΔTn of a voice packet from its ideal arrival time (ΔTn_MAX, the maximum value in the positive region), is detected, and the audio delay time T_P is adjusted according to ΔTn_MAX.
  • the deviation amount information in the form of a frequency distribution may also be generated only from ΔTn values that satisfy ΔTn ≤ T_MAX.
  • the same processing can be applied when using the average value and the maximum value of the deviation amounts as the deviation amount information.
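The frequency distribution of FIGS. 6A and 6B can be approximated with a fixed-width histogram. In this sketch, which assumes 5 ms bins and uses hypothetical function names, ΔTn_MAX is read off as the upper edge of the last non-empty bin in the positive region. This is a slightly conservative estimate of the true maximum.

```python
import math
from collections import Counter


def frequency_distribution(deltas_ms, bin_ms):
    """Histogram of ΔTn as {bin_start_ms: count}, with bins [k*bin_ms, (k+1)*bin_ms)."""
    counts = Counter(math.floor(d / bin_ms) for d in deltas_ms)
    return {k * bin_ms: n for k, n in sorted(counts.items())}


def dtn_max_from_histogram(hist, bin_ms):
    """ΔTn_MAX read from the histogram: upper edge of the last non-empty bin
    that reaches into the positive region (0.0 if no packet arrived late)."""
    edges = [start + bin_ms for start in hist if start + bin_ms > 0]
    return max(edges, default=0.0)
```

A narrow distribution such as [-2, -1, 0, 1, 2, 3] ms gives ΔTn_MAX = 5 ms. A wide one such as [-12, -3, 4, 9, 18, 27] ms gives 30 ms. These mirror the stable and unstable cases.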
  • the audio reproduction start timing may be adjusted based on the adjusted audio delay time T_P. For example, to shorten T_P, the audio data for a predetermined adjustment period is shortened by the amount of the reduction; to lengthen T_P, the audio data for a predetermined adjustment period is extended by the amount of the increase.
  • because both the value of ΔTn_MAX and the correction deviation amount can be obtained from the frequency distribution of ΔTn, the reference time may be corrected and the audio delay time T_P adjusted at the same time. In that case, the audio reproduction start timing may be adjusted by processing the audio data based on both the correction deviation amount and the adjusted audio delay time T_P.
  • the audio delay time T_P can thus be set to an optimum value according to the data transmission state of the network 3.
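The text does not say how the audio data is lengthened or shortened. As one simple possibility, the sketch below resamples a block of mono PCM samples with linear interpolation so that it plays change_ms longer or shorter. This also shifts the pitch slightly, and a product would more likely use a pitch-preserving time-stretch. The function name and sample format are assumptions.

```python
def stretch_block(samples, change_ms, sample_rate_hz):
    """Resample one block of mono PCM so that it plays change_ms longer (> 0)
    or shorter (< 0), using plain linear interpolation (assumed method)."""
    n_in = len(samples)
    n_out = n_in + round(change_ms * sample_rate_hz / 1000.0)
    if n_in < 2 or n_out < 2:
        raise ValueError("block too short for the requested change")
    scale = (n_in - 1) / (n_out - 1)  # input position advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * scale
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, n_in - 1)]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out
```

For example, at 48 kHz a 10 s block (480,000 samples) extended by 50 ms gains 2,400 samples.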
  • FIG. 7 is a flowchart showing an example of a processing procedure of the image display apparatus 1 of the present invention.
  • when reception of the image data and audio data starts, the audio transmission processing unit 122 receives the audio-related information from the image transmission device 2 via the data receiving unit 111 and the image/audio dividing unit 121, and stores it in the audio correction data storage memory 132 (step A1).
  • the audio transmission processing unit 122 determines whether the received audio data is the first audio data (step A3). If it is, the audio transmission processing unit 122 stores the arrival time of the voice packet containing that audio data in the audio correction data storage memory 132 as the initial value of the reference time (step A4), writes the audio data to the audio buffer 133, and proceeds to step A10. If the received audio data is not the first audio data, the audio transmission processing unit 122 reads the reference time and the audio transmission unit time T contained in the audio-related information from the audio correction data storage memory 132, and calculates the deviation ΔTn of the audio data from its ideal arrival time (step A5).
  • the audio transmission processing unit 122 generates deviation amount information containing ΔTn and calculates the correction deviation amount based on it (step A6).
  • the audio transmission processing unit 122 determines whether a predetermined adjustment condition is satisfied (step A7). For example, it determines whether the audio data needs to be processed (that is, whether the audio reproduction start timing needs to be adjusted) by comparing the correction deviation amount calculated in step A6 with a preset threshold T_TH. If the correction deviation amount exceeds T_TH, the audio transmission processing unit 122 determines that the audio data needs to be processed, updates the reference time based on the correction deviation amount, and instructs the audio data processing unit 123 to process the audio data.
  • the method of determining whether the audio reproduction start timing needs to be adjusted is not limited to comparison with the threshold T_TH.
  • alternatively, the audio transmission processing unit 122 may determine whether a predetermined time has elapsed since the previous adjustment of the audio reproduction start timing.
  • the audio data processing unit 123 processes the audio data so as to expand the audio to be reproduced when the correction deviation amount is a positive value. If the correction deviation amount is a negative value, the audio data processing unit 123 processes the audio data so as to shorten the audio to be reproduced (step A8). The audio data processing unit 123 writes the processed audio data (audio correction data) in the audio buffer 133 (step A9).
  • if, in step A7, the correction deviation amount does not exceed the threshold T_TH, the audio transmission processing unit 122 determines that processing of the audio data is unnecessary, and the audio data processing unit 123 writes the audio data received in step A2 to the audio buffer 133 without processing it (step A9).
  • the audio transmission processing unit 122 determines whether the audio data received in step A2 is the last audio data (step A10). If it is not, the process returns to step A2, the next audio data is received from the image/audio dividing unit 121, and steps A2 to A10 are repeated. If the audio data received in step A2 is the last audio data, the process ends.
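The loop of FIG. 7 can be sketched as a small receiver class. This is a hypothetical illustration, not the device's implementation, and it rests on several assumptions. Ideal arrival times are taken as reference + n·T. The correction deviation amount is the mean ΔTn over a fixed window of packets. Step A8 is represented only by recording how many milliseconds each block should be stretched or shrunk, rather than by processing the samples.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class AudioReceiver:
    """Hypothetical receiver loosely following steps A1-A10 of FIG. 7 (times in ms)."""
    unit_time_ms: float                  # A1: audio transmission unit time T
    threshold_ms: float                  # T_TH
    window: int = 8                      # packets per correction decision (assumed)
    reference_ms: Optional[float] = None
    index: int = 0                       # n: packets received after the first one
    recent: List[float] = field(default_factory=list)
    # stands in for audio buffer 133: (samples, playback change in ms)
    buffer: List[Tuple[list, float]] = field(default_factory=list)

    def on_packet(self, arrival_ms: float, samples: list) -> None:
        """A2: handle one received packet; the caller stops after the last one (A10)."""
        if self.reference_ms is None:                 # A3: first audio data?
            self.reference_ms = arrival_ms            # A4: initial reference time
            self.buffer.append((samples, 0.0))        # written unprocessed
            return
        self.index += 1
        ideal_ms = self.reference_ms + self.index * self.unit_time_ms
        self.recent.append(arrival_ms - ideal_ms)     # A5: deviation ΔTn
        if len(self.recent) < self.window:
            self.buffer.append((samples, 0.0))        # A9: not enough data yet
            return
        deviation = mean(self.recent)                 # A6: correction deviation amount
        self.recent.clear()                           # later ΔTn use the new reference
        if abs(deviation) > self.threshold_ms:        # A7: adjustment condition
            self.reference_ms += deviation            # later if late, earlier if early
            self.buffer.append((samples, deviation))  # A8/A9: mark block to stretch/shrink
        else:
            self.buffer.append((samples, 0.0))        # A9: written unprocessed
```

In a run where every packet after the first arrives 10 ms late, the first full window yields a correction deviation of 10 ms. The reference time then moves 10 ms later, that block is marked to play 10 ms longer, and later packets show ΔTn near zero.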
  • the flowchart shown in FIG. 7 shows a processing example in which the reference time is corrected and the audio reproduction start timing is adjusted based on the correction deviation amount.
  • the adjusted value of the audio delay time T_P may also be determined in step A6 from the obtained deviation amount information and the correction deviation amount.
  • as described above, the deviation ΔTn of each voice packet from its ideal arrival time is calculated, and the reference time is corrected based on the frequency distribution of ΔTn, so that the ΔTn of each subsequently received voice packet is distributed near zero.
  • the audio delay time T_P can therefore be set appropriately according to the data transmission state of the network 3.
  • to absorb delay fluctuation, audio interruptions can be suppressed by, for example, accumulating about one to several seconds of audio data in the audio buffer 133 before starting audio reproduction.
  • however, this degrades the operability of the image transmission device 2.
  • when the audio data and image data are reproduced (displayed) in correspondence with each other, the audio is reproduced with a significant delay relative to the image.
  • if the playback start timings of the image data and the corresponding audio data are offset, the viewer finds the reproduced audio unnatural.
  • by adjusting the audio delay time T_P based on the frequency distribution of ΔTn of each voice packet, as in the present invention, T_P can be kept short and such discomfort can be reduced.
  • the arrival time of the voice packet is delayed and the distribution of the deviation amount is shifted in the positive direction.
  • in this case, the amount of audio data held in the audio buffer 133 decreases, and if the data transmission rate deteriorates further, the audio buffer 133 may run out of audio data.
  • the reproduction time of the voice indicated by the arrived voice data (voice corresponding to the voice packet) is extended.
  • the arrival time of the voice packet is advanced, and the deviation distribution is shifted in the negative direction. In this case, the audio data stored in the audio buffer increases.
  • the playback time of the voice corresponding to the voice packet that has arrived is shortened.
  • the amount of audio data held in the audio buffer 133 becomes an appropriate value, and audio interruptions or audio data overflows can be suppressed.
  • because the audio reproduction time is adjusted and the reference time is corrected, the distribution of deviation amounts after the adjustment processing is centered near zero, and appropriate audio reproduction is performed. Even if the data transmission speed of the network 3 changes after the adjustment processing, the same processing can be repeated, so appropriate audio reproduction can continue.
  • [Appendix 3] The audio playback device according to appendix 1 or 2, wherein the data processing device adjusts the reproduction start timing of the audio and corrects the reference time based on the deviation amount.
  • [Appendix 4] The audio reproduction device according to appendix 3, wherein, when the average value of the deviation amounts or the deviation amount with the highest frequency of occurrence is taken as a correction deviation amount, the audio reproduction start timing is adjusted based on the correction deviation amount, and the reference time is corrected based on the correction deviation amount.
  • the audio playback device according to appendix 4, wherein the data processing device, when the correction deviation amount is positive, adjusts the audio reproduction time to be extended and corrects the reference time to be later, and, when the correction deviation amount is negative, adjusts the audio reproduction time to be shortened and corrects the reference time to be earlier.
  • the audio playback device according to any one of appendices 1 to 5, wherein the initial value of the reference time is set to the arrival time of the first piece of the received audio data.
  • the audio playback device according to any one of appendices 1 to 6, wherein the data processing device gradually adjusts the audio reproduction start timing over the audio reproduction time used for adjusting the audio reproduction start timing.
  • the audio playback device according to any one of appendices 1 to 7, wherein the ideal arrival time is scheduled based on a predetermined transmission interval of the audio data and a reference time that serves as a reference for reproducing the audio.
  • the audio playback device according to appendix 8, wherein the data processing device adjusts the audio delay time, which is the time from the arrival of the audio data to the start of reproduction of the audio it indicates, based on the transmission interval of the audio data and the maximum value indicated by the deviation amount information, which is information relating to the deviation amount.
  • the audio playback device according to any one of appendices 5 to 9, wherein the data processing device sets the extension or shortening of the audio to 0.5% or less of the audio reproduction time used for adjusting the audio reproduction start timing.
  • an audio reproduction device comprising: a data processing device that adjusts the reproduction time of the audio indicated by audio data received via a network so as to shorten it; and an audio output device that reproduces the audio based on the reproduction time adjusted by the data processing device.
  • [Appendix 12] An image display device comprising: the audio reproduction device according to any one of appendices 1 to 11; and an image output device that displays an image indicated by image data received via the network, wherein the image display device receives audio data transmitted in correspondence with the image data.
  • an image reproduction system comprising: an image transmission device that transmits video as image data composed of continuous still image data and transmits audio data corresponding to the image data separately from the image data; and the image display device according to appendix 12, which is connected to the image transmission device via a network so as to be capable of data transmission and reproduces the image and audio indicated by the image data and audio data received from the image transmission device.
  • [Appendix 14] An audio reproduction method performed by an image display device that reproduces an image and audio indicated by image data and audio data received via a network, the method comprising: adjusting, based on the deviation of each piece of audio data received via the network from its ideal arrival time, the reproduction start timing of the audio indicated by the received audio data; and reproducing the audio based on the reproduction start timing adjusted by the data processing device.
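One of the appendices above bounds the extension or shortening at 0.5% of the playback time used for the adjustment. Rearranged, a change of Δ ms needs an adjustment span of at least Δ / 0.005 ms, so a 50 ms correction must be spread over at least 10 s of audio. The helper names below are hypothetical.

```python
MAX_RATIO = 0.005  # 0.5 % cap on extension/shortening relative to the adjustment span


def min_adjustment_period_ms(change_ms, max_ratio=MAX_RATIO):
    """Shortest playback span over which a change of change_ms stays within max_ratio."""
    return abs(change_ms) / max_ratio


def within_limit(change_ms, period_ms, max_ratio=MAX_RATIO):
    """True if spreading change_ms over period_ms respects the cap."""
    return abs(change_ms) <= max_ratio * period_ms
```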

Abstract

An audio reproduction device that reproduces audio represented by audio data received via a network includes: a data processing device that adjusts a reproduction start timing of the audio represented by the received audio data on the basis of the deviation of each piece of the audio data received via the network from its ideal arrival time; and an audio output device that reproduces the audio according to the reproduction start timing adjusted by the data processing device.
PCT/JP2015/059430 2015-03-26 2015-03-26 Audio reproduction device, image display device, and corresponding audio reproduction method WO2016151852A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/059430 WO2016151852A1 (fr) 2015-03-26 2015-03-26 Audio reproduction device, image display device, and corresponding audio reproduction method

Publications (1)

Publication Number Publication Date
WO2016151852A1 true WO2016151852A1 (fr) 2016-09-29

Family

ID=56978179

Country Status (1)

Country Link
WO (1) WO2016151852A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0630047A (ja) * 1992-07-10 1994-02-04 Matsushita Electric Ind Co Ltd Packet delay fluctuation control circuit
JP2003258894A (ja) * 2002-03-05 2003-09-12 Matsushita Electric Ind Co Ltd Data reception and reproduction method and data communication device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15886404

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15886404

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP