EP2157580A1 - Video editing system - Google Patents
Video editing system
- Publication number
- EP2157580A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio
- parameter
- data
- point
- video editing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/00007—Time or data compression or expansion
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2389—Multiplex stream processing, e.g. multiplex stream encrypting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/433—Content storage operation, e.g. storage operation in response to a pause request, caching operations
- H04N21/4334—Recording operations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/438—Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving encoded video stream packets from an IP network
- H04N21/4385—Multiplex stream processing, e.g. multiplex stream decrypting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8455—Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/00007—Time or data compression or expansion
- G11B2020/00014—Time or data compression or expansion the compressed signal being an audio signal
- G11B2020/00028—Advanced audio coding [AAC]
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
- G11B2020/10555—Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account
- G11B2020/10564—Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account frequency
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
- G11B2020/10555—Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account
- G11B2020/10574—Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account volume or amplitude
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10592—Audio or video recording specifically adapted for recording or reproducing multichannel signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/804—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
- H04N9/806—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components with processing of the sound signal
- H04N9/8063—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components with processing of the sound signal using time division multiplex of the PCM audio and PCM video signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/82—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
- H04N9/8205—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
Definitions
- the present invention relates to a video editing system and more particularly relates to a video editing system that records content editing points.
- CM: commercial message
- the user may start playing back the content anywhere he or she likes by specifying an appropriate editing point on the content with a remote, for example.
- Another technique for sensing a change between a content portion and a non-content portion uses an audio signal. Specifically, according to such a technique, on finding the level of the audio signal lower than a predetermined one, the recorder determines that this is where the content and non-content portions change and puts an editing point there. The recorder then stores data about the editing point along with the content itself. In this manner, editing points can be put on content while it is being recorded.
- the audio signal representing broadcast data is usually subjected to compression and has normally been transformed into frequency based data by discrete cosine transform (DCT), for example. That is why to detect the level of such an audio signal, the audio data should be subjected to an inverse discrete cosine transform (IDCT) or any other appropriate transformation for transforming the frequency based data into time based data. For that reason, if it is determined, by the level of an audio signal, where to put the editing points, it will take a lot of time to get the transformation done, and therefore, the editing points cannot be placed quickly.
- DCT: discrete cosine transform
- a video editing system is designed to write editing point information about a point on a time series where a content portion and a non-content portion of AV data change from one into the other, along with the AV data itself, on a storage medium.
- the audio data of the AV data yet to be decoded includes a parameter representing how loud the audio data will be when decoded.
- the system comprises a detecting section for locating, based on the parameter, a point where the content portion and the non-content portion change, thereby generating the editing point information representing such a change point, and a writing section for writing the editing point information, along with the AV data, on the storage medium.
- the detecting section stores at least one range, in which the parameter has a value that is equal to or smaller than a threshold value, as a candidate range in which the change point could be located, and sets the change point by choosing from the at least one candidate range.
- the detecting section changes the threshold values as the value of the parameter varies.
- the detecting section sets the change point based on the interval between the candidate ranges.
- the detecting section locates the change point by using the parameter in only one of those audio channels, without using the parameter in any other audio channel.
- the detecting section locates the change point by using the parameter of audio data falling within only a particular frequency range, which forms part of the audible range for human beings, without using the parameter of audio data in any other frequency range.
- the parameter is global_gain defined by MPEG (Moving Picture Experts Group)-2 AAC (Advanced Audio Coding).
- the parameter is scalefactor defined by MPEG (Moving Picture Experts Group)-AUDIO.
- a change point between a content portion and a non-content portion can be located quickly and an editing point can be put at that change point instantly.
- FIG. 1 illustrates a video editing system as a specific preferred embodiment of the present invention.
- FIGS. 2A and 2B illustrate arrangements of packets in a TS and a partial TS, respectively, in a preferred embodiment of the present invention.
- FIG. 3 illustrates an AAC encoded stream according to a preferred embodiment of the present invention.
- FIG. 4 illustrates how global_gain changes when an audio gap is found in a preferred embodiment of the present invention.
- FIG. 5 is a flowchart showing the procedure of audio gap finding processing according to a preferred embodiment of the present invention.
- FIG. 6 shows an exemplary data structure for a program map table according to a preferred embodiment of the present invention.
- FIG. 7 is a flowchart illustrating the procedure of calculating an audio gap finding threshold value according to a preferred embodiment of the present invention.
- FIG. 8 illustrates the distribution of audio gaps in a preferred embodiment of the present invention.
- FIG. 9 shows an exemplary audio data structure according to the MPEG-AUDIO Layer-1 standard in a preferred embodiment of the present invention.
- FIG. 10 illustrates how audio data is decoded in a preferred embodiment of the present invention.
- the TV broadcast data is supposed to be compressed and encoded compliant with the MPEG (Moving Picture Experts Group)-2 standard.
- audio is supposed to be encoded compliant with MPEG-2 AAC (Advanced Audio Coding).
- MPEG-2 AAC: Advanced Audio Coding
- FIG. 1 illustrates a video editing system 100 as a specific preferred embodiment of the present invention.
- the video editing system 100 includes an antenna 101, a tuner 102, a demultiplexer 103, a CM detecting section 104, and a writing section 105.
- the CM detecting section 104 includes a memory 104a and a CPU 104b.
- the storage medium 106 on which data is stored may be either a hard disk or any other storage device built in the system 100 or a removable storage medium such as an optical disc or a semiconductor memory card.
- a channel is selected with the tuner 102, thereby outputting a partial TS (transport stream) including video PES (packetized elementary stream) packets and audio PES packets.
- TS: transport stream
- PES: packetized elementary stream
- the demultiplexer 103 receives the partial TS from the tuner 102, extracts only the audio PES packets from it and then outputs them.
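- the packet selection performed by the demultiplexer can be sketched as follows. This is a minimal illustration, not the implementation described here: it assumes standard 188-byte TS packets starting with the sync byte 0x47 and a 13-bit PID in the second and third header bytes, and the PID values and the helper names `packet_pid`, `select_packets` and `make_packet` are hypothetical.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG-2 transport stream packet
SYNC_BYTE = 0x47      # first byte of every TS packet


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID carried in a single TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def select_packets(stream: bytes, wanted_pid: int) -> list:
    """Keep only the packets whose PID equals wanted_pid (e.g. the audio PID)."""
    selected = []
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) == wanted_pid:
            selected.append(packet)
    return selected


def make_packet(pid: int) -> bytes:
    """Build a dummy packet for demonstration (payload is stuffing only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([0xFF]) * (TS_PACKET_SIZE - 4)


# Hypothetical PIDs: 0x100 for video, 0x101 for audio.
demo_stream = make_packet(0x100) + make_packet(0x101) + make_packet(0x101)
audio_packets = select_packets(demo_stream, 0x101)
```

- a real demultiplexer would also reassemble PES packets from the selected TS payloads; that step is omitted from this sketch.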
- the CM detecting section 104 locates, using the audio PES packets supplied from the demultiplexer 103, a point where a content portion and a non-content portion that are continuous with each other on the time series change from one into the other (i.e., a point where an editing point needs to be put), and outputs editing point information about a point where the editing point needs to be placed to the writing section 105.
- the change point is a point on the time series where the content portion and the non-content portion change from one into the other.
- the editing point information may include time information about the change point.
- the time information may be a PTS (presentation time stamp) or a DTS (decoding time stamp), for example. These are just examples, however, and any other sort of editing information may also be used as long as the change point can be located.
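- as an illustration of how a PTS could serve as time information, the sketch below decodes the 33-bit PTS from the 5-byte field of a PES header and converts it to seconds. It assumes the usual MPEG-2 Systems layout (3 + 15 + 15 bits separated by marker bits) and the 90 kHz PTS clock; the function names are hypothetical and `encode_pts` exists only to produce demonstration data.

```python
PTS_CLOCK_HZ = 90_000  # PTS and DTS are counted in 90 kHz ticks


def decode_pts(field: bytes) -> int:
    """Decode a 33-bit PTS from the 5-byte PTS field of a PES header."""
    if len(field) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)  # PTS[32..30]
        | (field[1] << 22)                # PTS[29..22]
        | ((field[2] >> 1) << 15)         # PTS[21..15]
        | (field[3] << 7)                 # PTS[14..7]
        | (field[4] >> 1)                 # PTS[6..0]
    )


def encode_pts(pts: int) -> bytes:
    """Inverse of decode_pts, used here only to build demonstration data."""
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])


def pts_to_seconds(pts: int) -> float:
    """Convert PTS ticks to seconds."""
    return pts / PTS_CLOCK_HZ
```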
- the memory 104a of the CM detecting section 104 stores not only the audio PES data supplied from the demultiplexer 103 but also the results of computations done by the CPU 104b, and outputs editing point information to the writing section 105.
- the CPU 104b reads the data stored in the memory 104a and carries out various kinds of computations. It will be described in detail later exactly how the CM detecting section 104 determines the point where the editing point should be put.
- the writing section 105 writes not only the partial TS supplied from the tuner 102 but also the editing point information provided by the CM detecting section 104 on the storage medium 106.
- the storage medium 106 may be an HDD, a DVD or a BD and stores the partial TS or the editing point information that has been written by the writing section 105.
- FIG. 2A illustrates an arrangement of packets in a TS.
- FIG. 2B illustrates an arrangement of packets in a partial TS.
- each box with PAT, PMT1, V1, or A1 sign corresponds to a single packet and Vn and An (where n is 1, 2, 3 or 4) indicate that the packet includes the video or audio data of a program #n.
- the tuner 102 extracts video and audio packets V1 and A1 associated with the selected program #1 from the TS shown in FIG. 2A and also extracts a PAT (program association table) and a PMT1 (program map table 1), which are tables containing program-related information, and rewrites their contents so that those tables are compatible with the partial TS.
- PAT' and PMT1' are arranged in the partial TS.
- SIT: selection information table
- an audio PES packet such as the packet A1 includes data that has been encoded compliant with the MPEG-2 AAC standard and also includes global_gain as a piece of gain information.
- the change point between a content portion and a non-content portion is located by using that global_gain.
- the AAC encoded stream shown in FIG. 3 is supposed to be compliant with the ADTS (audio data transport stream) format that is used in digital broadcasting.
- An ADTS can be divided into a number of units called "AAUs (audio access units)".
- An AAU can be obtained by extracting data portions from audio PES packets.
- adts_frame corresponding to one AAU includes adts_fixed_header, adts_variable_header, adts_error_check, and raw_data_block.
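- splitting an ADTS byte stream into its adts_frame units can be sketched as follows. The bit positions assume the standard ADTS fixed and variable header layout (12-bit syncword, 13-bit frame length that includes the header itself); the names `AdtsHeader`, `parse_adts_header` and `split_frames` are hypothetical, and the sketch does not parse the raw_data_block.

```python
from dataclasses import dataclass


@dataclass
class AdtsHeader:
    protection_absent: bool        # True means adts_error_check is absent
    profile: int                   # 2-bit profile field
    sampling_frequency_index: int  # 4-bit index into the sampling rate table
    channel_configuration: int     # 3-bit channel configuration
    frame_length: int              # whole adts_frame length in bytes

    @property
    def header_size(self) -> int:
        # 7 bytes of header, plus a 2-byte CRC when protection is present.
        return 7 if self.protection_absent else 9


def parse_adts_header(data: bytes) -> AdtsHeader:
    """Parse adts_fixed_header and adts_variable_header from the frame start."""
    if len(data) < 7 or data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword not found")
    return AdtsHeader(
        protection_absent=bool(data[1] & 0x01),
        profile=data[2] >> 6,
        sampling_frequency_index=(data[2] >> 2) & 0x0F,
        channel_configuration=((data[2] & 0x01) << 2) | (data[3] >> 6),
        frame_length=((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
    )


def split_frames(stream: bytes) -> list:
    """Split a byte stream into complete adts_frame units (one per AAU)."""
    frames, pos = [], 0
    while pos + 7 <= len(stream):
        header = parse_adts_header(stream[pos:])
        end = pos + header.frame_length
        if header.frame_length < header.header_size or end > len(stream):
            break  # truncated or malformed frame: stop rather than guess
        frames.append(stream[pos:end])
        pos = end
    return frames


# Demonstration frame: 7-byte header announcing a 200-byte frame, zero payload.
demo_header = bytes([0xFF, 0xF9, 0x4C, 0x80, 0x19, 0x1F, 0xFC])
demo_frame = demo_header + bytes(193)
```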
- the raw_data_block consists of multiple constituent elements, which are simply called "elements". Examples of those elements that form one raw_data_block include CPE (channel pair element) for L/R channels, FILL (fill element) to insert stuffing bytes, and END (term element) that indicates the end of one AAU.
- CPE: channel pair element
- FILL: fill element
- END: term element
- the raw_data_block has such a structure in a situation where there are two (i.e., L and R) audio channels.
- the CPE includes common_window, which is a piece of information representing a common window function for use in both of L and R channels, and two individual_channel_streams as channel-by-channel information.
- Each individual_channel_stream includes window_sequence, which is a piece of information representing sequence processing on the window function, max_sfb, which is a piece of information about band limitation, global_gain, which is a piece of information representing the overall level of the frequency spectrum, scale_factor_data, which is a piece of information representing upscale and down-scale parameters, and spectral_data, which is a piece of information representing quantization data.
- a frequency conversion is carried out using global_gain, scale_factor_data and spectral_data, thereby obtaining the audio data.
- the global_gain is a piece of information representing the overall level of the frequency spectrum and therefore represents an approximate value of the volume of an audio signal decoded. That is why the global_gain can be used as a parameter representing the volume.
- FIG. 4 illustrates how the global_gain of the audio PES packet changes when an audio gap is found.
- the ordinate represents the global_gain value and the abscissa represents the time.
- An audio gap finding threshold value is a threshold value for finding the audio gap and is determined based on the global_gain value. It will be described in further detail later exactly how to set the threshold value.
- the mute period is a period in which the global_gain value detected by the CM detecting section 104 is relatively small, and this mute period corresponds to the audio gap.
- the point in time when the global_gain value becomes smaller than the audio gap finding threshold value will be referred to herein as an "IN point" and the point in time when the global_gain value becomes equal to or greater than the audio gap finding threshold value will be referred to herein as an "OUT point".
- FIG. 5 is a flowchart showing the procedure of audio gap finding processing.
- In Step S20, the system gets ready for the input of any audio PES packet from the demultiplexer 103 to the memory 104a and determines whether or not any packet has come yet. If the answer is YES (i.e., if any audio PES packet has been stored in the memory 104a), the CPU 104b extracts global_gain in Step S21 from the audio PES packet that is now stored in the memory 104a.
- the global_gain value that has been detected earliest is extracted and not every channel is analyzed. For example, if the broadcast received is a stereo broadcast in which there are two audio channels of R and L, only the global_gain of either the R or L channel needs to be extracted and there is no need to extract the global_gain from the other channel. Likewise, even if there are 5.1 audio channels, the global_gain has only to be extracted from one of those 5.1 channels and there is no need to extract the global_gain from any other channel. By using the global_gain of only one of multiple audio channels without extracting the global_gain from any other channel in this manner, the complexity of the computation processing can be reduced and the audio gap can be found more quickly.
- FIG. 6 shows an exemplary data structure of the program map table PMT1' in the partial TS.
- This program map table includes stream_type, which is a piece of information representing the type of the given stream data. By reference to this stream_type, it can be determined whether the given stream data is a video stream or an audio stream.
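- the stream_type check can be sketched as a simple lookup. The values below are the commonly used MPEG-2 Systems assignments (for example 0x02 for MPEG-2 video and 0x0F for AAC in ADTS); the representation of the table as (stream_type, PID) pairs, the PID values and the helper names are assumptions made for illustration.

```python
# Commonly used stream_type values from MPEG-2 Systems.
VIDEO_STREAM_TYPES = {0x01, 0x02, 0x1B}        # MPEG-1 video, MPEG-2 video, H.264
AUDIO_STREAM_TYPES = {0x03, 0x04, 0x0F, 0x11}  # MPEG-1/2 audio, AAC (ADTS), AAC (LATM)


def classify_stream(stream_type: int) -> str:
    """Return 'video', 'audio' or 'other' for a PMT stream_type value."""
    if stream_type in VIDEO_STREAM_TYPES:
        return "video"
    if stream_type in AUDIO_STREAM_TYPES:
        return "audio"
    return "other"


def first_audio_pid(pmt_entries):
    """Return the PID of the first audio stream listed, or None if there is none.

    pmt_entries is a hypothetical list of (stream_type, elementary_PID) pairs
    in the order in which they appear in the program map table.
    """
    for stream_type, pid in pmt_entries:
        if classify_stream(stream_type) == "audio":
            return pid
    return None


# Hypothetical table: one video stream followed by two audio streams.
demo_entries = [(0x02, 0x111), (0x0F, 0x112), (0x0F, 0x113)]
```

- picking the first audio entry is only one possible selection policy; it is used here because it is the cheapest one to evaluate.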
- the global_gain of one of those audio channels may be used for finding the audio gap.
- since the audio channel used is highly likely to be a main audio channel, the audio gap can be found accurately.
- if the audio channel that has been detected earlier than any other channel is used, the audio gap can be found quickly.
- the global_gain of only one of multiple audio channels is supposed to be used to find the audio gap. If necessary, however, the global_gain values of two or more audio channels may also be used to find the audio gap.
- the audio gap could be found more accurately by using the global_gain of a front audio channel rather than that of a rear audio channel. That is why the global_gain of a front audio channel is preferred to that of a rear audio channel.
- the CPU 104b calculates the audio gap finding threshold value based on the global_gain value extracted. It will be described in detail later exactly how to calculate the audio gap finding threshold value.
- the CPU 104b stores sensing status information, indicating whether an audio gap is being sensed or not, in the memory 104a. If the sensing status information checked in Step S23 indicates that a non-gap portion is now being sensed, then the process advances to Step S24.
- In Step S24, the CPU 104b determines whether or not the global_gain value is less than the audio gap finding threshold value. If the answer is NO (i.e., if the global_gain value is equal to or greater than the audio gap finding threshold value), then the process goes back to Step S20. On the other hand, if the answer is YES (i.e., if the global_gain value is smaller than the audio gap finding threshold value), then the CPU 104b sets the sensing status information to "audio gap is now being sensed". At the same time, the CPU 104b generates audio gap information in Step S25, associating the PTS of the audio PES packet at that timing with the IN point of the audio gap, and then the process goes back to Step S20.
- In Step S26, the CPU 104b determines whether or not the global_gain value is equal to or greater than the audio gap finding threshold value. If the answer is NO (i.e., if the global_gain value is smaller than the audio gap finding threshold value), then the process goes back to Step S20. Meanwhile, if the answer is YES (i.e., if the global_gain value is equal to or greater than the audio gap finding threshold value), then the CPU 104b sets the sensing status information to "non-gap is being sensed".
- the CPU 104b generates audio gap information in Step S27 with the PTS of the audio PES packet at that timing associated with the OUT point of the audio gap.
- the CPU 104b stores audio gap information about the IN and OUT points in the memory 104a in Step S28 and then the process goes back to the processing step S20.
- the audio gap information is added to a list of audio gaps in the memory 104a.
- the list of audio gaps includes a group of audio gaps that have been found as a result of the audio gap finding processing described above. And that list is used in determining whether a given point belongs to a content portion or a non-content portion as will be described later. Each of those audio gaps that have been added to the list of audio gaps represents a period where a change point between the content portion and the non-content portion is potentially located.
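- the audio gap finding procedure of Steps S23 through S28 can be sketched as a two-state machine. This is a minimal illustration under stated assumptions: each input is a (PTS, global_gain, threshold) triple that has already been extracted, and the names `AudioGap` and `GapFinder` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioGap:
    in_pts: int   # PTS at which global_gain fell below the threshold (IN point)
    out_pts: int  # PTS at which global_gain rose back to it (OUT point)


class GapFinder:
    """Tracks the sensing status and collects the list of audio gaps."""

    def __init__(self):
        self.sensing_gap = False  # sensing status information
        self.in_pts = None        # IN point of the gap currently being sensed
        self.gaps = []            # list of audio gaps found so far

    def feed(self, pts: int, global_gain: int, threshold: float) -> None:
        if not self.sensing_gap:
            # Steps S24 and S25: a gap starts when the gain drops below the threshold.
            if global_gain < threshold:
                self.sensing_gap = True
                self.in_pts = pts
        else:
            # Steps S26 to S28: the gap ends when the gain reaches the threshold again.
            if global_gain >= threshold:
                self.sensing_gap = False
                self.gaps.append(AudioGap(self.in_pts, pts))


# Demonstration with a fixed, hypothetical threshold of 120.
finder = GapFinder()
for pts, gain in [(0, 130), (1, 130), (2, 100), (3, 90),
                  (4, 130), (5, 130), (6, 110), (7, 131)]:
    finder.feed(pts, gain, threshold=120)
```

- in this demonstration the PTS values are small integers for readability; real PTS values are 33-bit tick counts.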
- the memory 104a stores the global_gain values of at least the previous 30 seconds.
- the CPU 104b calculates in Step S31 the average of the global_gain values during the previous 30 seconds that are stored in the memory 104a. Next, the CPU 104b multiplies the average global_gain thus calculated by 0.6, thereby calculating an audio gap finding threshold value in Step S32.
- In Step S33, the CPU 104b determines whether or not the audio gap finding threshold value thus calculated is smaller than 128. If the answer is NO (i.e., if the audio gap finding threshold value thus calculated is equal to or greater than 128), the CPU 104b sets the audio gap finding threshold value to be 128 in Step S35. Meanwhile, if the answer to the query of Step S33 is YES, the CPU 104b determines in Step S36 whether or not the audio gap finding threshold value calculated is greater than 116. If the answer is NO (i.e., if the audio gap finding threshold value thus calculated is equal to or smaller than 116), the CPU 104b sets the audio gap finding threshold value to be 116 in Step S37.
- If the audio gap finding threshold value calculated is greater than 116 but smaller than 128 (i.e., if the answers to the queries of Steps S33 and S36 are both YES), then the threshold value calculated is used as it is as the audio gap finding threshold value.
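- the threshold calculation of Steps S31 through S37 amounts to scaling the recent average and clamping the result. The sketch below uses the factor 0.6 and the limits 116 and 128 stated above; the function name `gap_threshold` and the representation of the stored history as a plain list are assumptions.

```python
def gap_threshold(recent_gains, factor=0.6, lower=116, upper=128):
    """Compute the audio gap finding threshold from recent global_gain values.

    recent_gains is assumed to hold the global_gain values of roughly the
    previous 30 seconds.
    """
    if not recent_gains:
        raise ValueError("at least one global_gain value is required")
    average = sum(recent_gains) / len(recent_gains)  # Step S31
    threshold = average * factor                     # Step S32
    if threshold >= upper:                           # Steps S33 and S35
        return upper
    if threshold <= lower:                           # Steps S36 and S37
        return lower
    return threshold                                 # between the limits: use as is
```

- for example, an average global_gain of 250 gives 150, which is limited to 128, while an average of 150 gives 90, which is raised to 116.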
- The average of the global_gain values usually changes according to the channel or program selected. That is why setting the audio gap finding threshold value adaptively based on the average of the global_gain values (i.e., changing the audio gap finding threshold value with a variation in global_gain value), as is done in this preferred embodiment, allows the threshold value to be set appropriately. As a result, the audio gap can be found more accurately based on the audio PES packet.
- Each of those non-content portions will normally last either 15 seconds or a multiple of 15 seconds.
- The change point between the content and non-content portions is detected by paying special attention to that periodicity.
- FIG. 8 illustrates the distribution of audio gaps that have been found while a TV broadcast is being recorded.
- t represents the time.
- Audio gaps A through E to be added to the list of audio gaps in the memory 104a are shown. Each of these audio gaps A through E potentially contains a change point between the content and non-content portions.
- The intervals between the audio gaps A and B, between the audio gaps B and C, between the audio gaps C and D, and between the audio gaps D and E are 40, 15, 30 and 20 seconds, respectively.
- Since 15 and 30 seconds are multiples of 15 seconds whereas 40 and 20 seconds are not, the CPU 104b determines that there should be non-content portions in the interval between the audio gaps B and C and in the interval between the audio gaps C and D, and also determines that there should be content portions in the interval between the audio gaps A and B and in the interval between the audio gaps D and E.
- The CPU 104b concludes that the audio gaps B and D have change points between the content and non-content portions but that the other audio gaps A, C and E have nothing to do with the content/non-content change points.
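- The decision made for FIG. 8 can be sketched in Python as follows, assuming that each audio gap is reduced to a single time stamp in seconds. The function names `is_non_content_interval` and `change_point_gaps` and the 0.5-second tolerance are assumptions made for illustration; the description only states that non-content portions last 15 seconds or a multiple of 15 seconds.

```python
def is_non_content_interval(interval_s, unit_s=15.0, tolerance_s=0.5):
    """Return True if the interval is (close to) a positive multiple of unit_s."""
    if interval_s < unit_s - tolerance_s:
        return False  # shorter than one unit, so it cannot be a multiple
    remainder = interval_s % unit_s
    # Distance to the nearest multiple of unit_s, above or below.
    return min(remainder, unit_s - remainder) <= tolerance_s


def change_point_gaps(names, gap_times_s):
    """Return the names of gaps where content and non-content intervals meet.

    names and gap_times_s are parallel lists sorted by time. The first and
    last gaps have an interval on only one side and are never reported here,
    which is a simplification of this sketch.
    """
    flags = [is_non_content_interval(later - earlier)
             for earlier, later in zip(gap_times_s, gap_times_s[1:])]
    result = []
    for i in range(1, len(names) - 1):
        # flags[i - 1] is the interval before gap i, flags[i] the one after it.
        if flags[i - 1] != flags[i]:
            result.append(names[i])
    return result


if __name__ == "__main__":
    # Intervals of 40, 15, 30 and 20 seconds, as in the FIG. 8 example.
    print(change_point_gaps(["A", "B", "C", "D", "E"],
                            [0.0, 40.0, 55.0, 85.0, 105.0]))
```

- With the intervals of the FIG. 8 example, the sketch reports the gaps B and D, which agrees with the conclusion reached by the CPU 104b.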
- The CPU 104b generates editing point information by defining, as editing points, the midpoint between the respective PTS of the IN and OUT points of the audio gap B and the midpoint between those of the IN and OUT points of the audio gap D, and then outputs that information to the writing section 105 by way of the memory 104a.
- The writing section 105 writes the editing point information on the storage medium 106.
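- As a rough illustration of the midpoint rule, the sketch below assumes that the IN and OUT points are integer PTS values counted on the 90 kHz MPEG-2 system clock and that no PTS wraparound occurs inside a gap. The `AudioGap` class and the `editing_point` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class AudioGap:
    """An audio gap described by the PTS of its IN and OUT points (hypothetical type)."""
    in_pts: int
    out_pts: int


def editing_point(gap):
    """Return the PTS midway between the IN and OUT points of the gap."""
    # Integer division keeps the result on the PTS tick grid.
    return (gap.in_pts + gap.out_pts) // 2


if __name__ == "__main__":
    gap_b = AudioGap(in_pts=900_000, out_pts=945_000)  # a 0.5-second gap
    print(editing_point(gap_b))
```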
- The video editing system 100 of the preferred embodiment described above sets the editing points with the length of the interval between a pair of audio gaps taken into account. As a result, the editing points can be set more accurately.
- The video editing system 100 of this preferred embodiment locates the change point between content and non-content portions by using the global_gain value of an audio PES packet that has not been decoded yet, i.e., without decoding the audio PES packet into an audio signal. Since no decoding process is performed, the content/non-content change point can be located much more quickly.
- The audio gap finding threshold value is defined by calculating the average of the global_gain values during the previous 30 seconds.
- However, the audio gap finding threshold value does not always have to be defined by such a method.
- The audio gap finding threshold value could also be received along with a TV broadcast.
- Alternatively, the video editing system 100 could accumulate the audio gap finding threshold values. In the latter case, the audio gap finding threshold values may be stored on a channel-by-channel basis.
- The content and its editing point information are supposed to be stored on the same storage medium 106. However, they may also be stored on physically different media.
- The content may be stored on an HDD and the editing point information may be stored on a flash memory. In that case, the HDD and the flash memory are equivalent to the storage medium 106.
- The editing point could be set where there is not any change point. Nevertheless, when a content portion and a non-content portion change from one into the other, the mute period usually lasts less than one second. Considering this fact, if the period between the IN and OUT points lasts one second or more, it may be determined that there is no change point within that period. The editing point can then be set more accurately.
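- The one-second rule can be expressed as a simple filter. The sketch below assumes that the IN and OUT points are PTS values on the 90 kHz MPEG-2 system clock; the function name `may_contain_change_point` and the constant names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000      # PTS ticks per second in MPEG-2 systems
MAX_MUTE_PERIOD_S = 1.0    # mute periods of one second or more are rejected


def may_contain_change_point(in_pts, out_pts):
    """Return True if the gap is short enough to hold a content/non-content change point."""
    duration_s = (out_pts - in_pts) / PTS_CLOCK_HZ
    return duration_s < MAX_MUTE_PERIOD_S


if __name__ == "__main__":
    print(may_contain_change_point(0, 45_000))   # 0.5-second gap is kept
    print(may_contain_change_point(0, 180_000))  # 2-second gap is rejected
```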
- The given period is determined to belong to a non-content portion if its duration is a multiple of 15 seconds.
- This decision may naturally be made according to the duration of the non-content portions actually on the air.
- The duration may also be a multiple of 20 seconds or 25 seconds.
- The broadcast video data is supposed to be encoded in compliance with the MPEG-2 standard and the audio data is supposed to be encoded in compliance with the MPEG-2 AAC standard.
- The broadcast video and audio data may also be encoded by any other coding method.
- The audio gap may be found by extracting a parameter, which can be used to calculate the volume of an audio signal or an approximate value thereof, from audio data that has been encoded in compliance with the MPEG-1 standard or the AC-3 (Audio Code number 3) standard.
- The effect of the present invention described above can also be achieved by using such a parameter.
- The scalefactor may be used as a parameter for calculating an approximate value of the volume of an audio signal.
- FIG. 9 shows an exemplary audio data structure according to the MPEG-AUDIO Layer-1 standard.
- FIG. 10 illustrates how audio data is decoded compliant with the MPEG-AUDIO Layer-1 standard.
- A data stream is divided into 32 sub-bands #0 through #31 on a predetermined frequency range basis. Each of those sub-bands includes quantized sample data "sample", the number of bits allocated ("allocation") to that "sample", and a decoding gain coefficient "scalefactor".
- The decoding processing may be performed as follows. First, data that has been dequantized based on "allocation" and "sample[]" is multiplied by "scalefactor" on a sub-band basis, thereby generating intermediate data "sample'[]". Next, synchronization processing is carried out on "sample'[]" of the respective sub-bands, thereby synthesizing those sub-bands together and obtaining PCM data.
- Each "scalefactor" includes the amplitude information of its associated sub-band and can be used to calculate an approximate value of the volume of an audio signal just like the global_gain. That is why the audio gap can also be found just as described above by using the "scalefactor".
- The audio gap can be found without performing any dequantization or synchronization processing. As a result, the audio gap can be found more quickly with the computational complexity reduced significantly. Likewise, even when the global_gain described above is used, the audio gap can also be found without dequantization or synchronization, which would reduce the computational complexity and speed up the audio gap finding significantly.
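- The idea of judging the volume from "scalefactor" alone can be sketched as follows. The sketch assumes MPEG-1 Audio Layer I, in which each scalefactor is transmitted as a 6-bit index i in the range 0 to 62 that maps to the gain 2^(1 - i/3). Taking the largest gain across the sub-bands as the volume estimate, as well as the function names and the threshold argument, are assumptions of this example and are not specified in the description.

```python
def scalefactor_gain(index):
    """Map a Layer I scalefactor index (0..62) to its linear gain."""
    if not 0 <= index <= 62:
        raise ValueError("scalefactor index must be in the range 0..62")
    # Index 0 gives 2.0; every three steps halve the gain.
    return 2.0 ** (1 - index / 3)


def approximate_volume(scalefactor_indices):
    """Estimate the volume of a frame as the largest sub-band gain.

    No dequantization or sub-band synthesis is performed; only the
    scalefactor indices parsed from the frame are inspected.
    """
    return max(scalefactor_gain(i) for i in scalefactor_indices)


def is_quiet(scalefactor_indices, threshold):
    """Return True if the estimated volume falls below the given threshold."""
    return approximate_volume(scalefactor_indices) < threshold


if __name__ == "__main__":
    loud_frame = [3, 10, 20, 40]   # hypothetical indices, one per sub-band
    quiet_frame = [60] * 32
    print(is_quiet(loud_frame, 0.001), is_quiet(quiet_frame, 0.001))
```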
- The audio gap may be found by using the scalefactor of audio data falling within a particular frequency range.
- If the scalefactor is extracted only from sub-bands associated with the frequency range (e.g., from 100 Hz to 10 kHz) of audio that can easily reach a person's ears, and not from sub-bands associated with any other frequency range, the audio gap can be found more quickly, with both the amount of data used and the computational complexity cut down significantly.
- Most of the audio data on the air is distributed within a frequency range that is easily audible to human beings. That is why even if the scalefactor extracted from only sub-bands associated with a particular frequency range is used as described above, the audio gap can still be found accurately.
- The frequency range mentioned above is just an example. The frequency range may be defined anywhere else as long as it forms at least part of the range audible to human ears (20 Hz to 20 kHz).
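- Selecting the sub-bands that cover a given frequency range is simple arithmetic, because the 32 sub-bands of MPEG-1 audio have equal width and together span 0 Hz up to half the sampling frequency. The helper below is a sketch; its name `subbands_in_range` and the 48 kHz sampling frequency used in the example are assumptions.

```python
def subbands_in_range(sample_rate_hz, low_hz, high_hz, num_subbands=32):
    """Return the indices of the sub-bands that overlap [low_hz, high_hz]."""
    width_hz = sample_rate_hz / 2 / num_subbands  # bandwidth of one sub-band
    selected = []
    for k in range(num_subbands):
        band_low = k * width_hz
        band_high = (k + 1) * width_hz
        # Keep the sub-band if any part of it lies inside the requested range.
        if band_low < high_hz and band_high > low_hz:
            selected.append(k)
    return selected


if __name__ == "__main__":
    # At 48 kHz each sub-band is 750 Hz wide.
    print(subbands_in_range(48_000, 100, 10_000))
```

- At a sampling frequency of 48 kHz, the range from 100 Hz to 10 kHz is covered by the sub-bands #0 through #13, so fewer than half of the 32 scalefactors would need to be inspected in this case.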
- The audio gap can be found much more quickly with the computational complexity cut down significantly.
- A video editing system can be used in digital TV sets, recorders, and any other device that can record a TV broadcast.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Signal Processing For Recording (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2008213778 | 2008-08-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP2157580A1 (en) | 2010-02-24 |
Family
ID=41055083
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP09167674A Withdrawn EP2157580A1 (en) | 2008-08-22 | 2009-08-12 | Video editing system |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20100046908A1 |
| EP (1) | EP2157580A1 |
| JP (1) | JP2010074823A |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9191639B2 (en) * | 2010-04-12 | 2015-11-17 | Adobe Systems Incorporated | Method and apparatus for generating video descriptions |
| US9336685B2 (en) * | 2013-08-12 | 2016-05-10 | Curious.Com, Inc. | Video lesson builder system and method |
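| JP6506230B2 (ja) * | 2016-09-28 | 2019-04-24 | NEC Platforms, Ltd. | Voice silence detection device, voice silence detection method, voice silence detection program, and voice silence detection system |
| JP7518681B2 (ja) * | 2020-07-14 | 2024-07-18 | Sharp Corporation | Silent section detection device and silent section detection method |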
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0969399A2 (en) * | 1998-06-30 | 2000-01-05 | International Business Machines Corporation | Multimedia system and method for automatic clip selection |
| EP1076337A1 (en) * | 1998-04-27 | 2001-02-14 | Hitachi, Ltd. | Recorder/reproducer |
| US6600874B1 (en) * | 1997-03-19 | 2003-07-29 | Hitachi, Ltd. | Method and device for detecting starting and ending points of sound segment in video |
| EP1708101A1 (en) * | 2004-01-14 | 2006-10-04 | Mitsubishi Denki Kabushiki Kaisha | Summarizing reproduction device and summarizing reproduction method |
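| WO2007013407A1 (ja) * | 2005-07-27 | 2007-02-01 | Matsushita Electric Industrial Co., Ltd. | Digest generation device, digest generation method, recording medium storing a digest generation program, and integrated circuit for use in a digest generation device |
| JP2007074040A (ja) | 2005-09-02 | 2007-03-22 | Victor Co Of Japan Ltd | Broadcast program recording device |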
| EP1770704A2 (en) * | 2005-09-30 | 2007-04-04 | Sony Corporation | Data recording and reproducing apparatus, method, and program therefor |
| WO2008066114A1 (en) * | 2006-11-30 | 2008-06-05 | Panasonic Corporation | Signal processor |
| EP1954041A1 (en) * | 2005-09-30 | 2008-08-06 | Pioneer Corporation | Digest generating device, and program therefor |
| JP2008213778A (ja) | 2007-03-07 | 2008-09-18 | Toyoda Gosei Co Ltd | Glove box |
| JP2009183742A (ja) | 2003-04-25 | 2009-08-20 | Cxr Ltd | X-ray imaging system |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3840928B2 (ja) * | 2001-07-17 | 2006-11-01 | Sony Corporation | Signal processing device and method, recording medium, and program |
| JP4862136B2 (ja) * | 2006-12-08 | 2012-01-25 | JVC Kenwood Corporation | Audio signal processing device |
2009
- 2009-08-06 JP JP2009183742A patent/JP2010074823A/ja active Pending
- 2009-08-12 EP EP09167674A patent/EP2157580A1/en not_active Withdrawn
- 2009-08-14 US US12/541,297 patent/US20100046908A1/en not_active Abandoned
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6600874B1 (en) * | 1997-03-19 | 2003-07-29 | Hitachi, Ltd. | Method and device for detecting starting and ending points of sound segment in video |
| EP1076337A1 (en) * | 1998-04-27 | 2001-02-14 | Hitachi, Ltd. | Recorder/reproducer |
| EP0969399A2 (en) * | 1998-06-30 | 2000-01-05 | International Business Machines Corporation | Multimedia system and method for automatic clip selection |
| JP2009183742A (ja) | 2003-04-25 | 2009-08-20 | Cxr Ltd | X-ray imaging system |
| EP1708101A1 (en) * | 2004-01-14 | 2006-10-04 | Mitsubishi Denki Kabushiki Kaisha | Summarizing reproduction device and summarizing reproduction method |
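| WO2007013407A1 (ja) * | 2005-07-27 | 2007-02-01 | Matsushita Electric Industrial Co., Ltd. | Digest generation device, digest generation method, recording medium storing a digest generation program, and integrated circuit for use in a digest generation device |
| JP2007074040A (ja) | 2005-09-02 | 2007-03-22 | Victor Co Of Japan Ltd | Broadcast program recording device |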
| EP1770704A2 (en) * | 2005-09-30 | 2007-04-04 | Sony Corporation | Data recording and reproducing apparatus, method, and program therefor |
| EP1954041A1 (en) * | 2005-09-30 | 2008-08-06 | Pioneer Corporation | Digest generating device, and program therefor |
| WO2008066114A1 (en) * | 2006-11-30 | 2008-06-05 | Panasonic Corporation | Signal processor |
| US20090207775A1 (en) * | 2006-11-30 | 2009-08-20 | Shuji Miyasaka | Signal processing apparatus |
| JP2008213778A (ja) | 2007-03-07 | 2008-09-18 | Toyoda Gosei Co Ltd | Glove box |
Also Published As
| Publication number | Publication date |
|---|---|
| US20100046908A1 (en) | 2010-02-25 |
| JP2010074823A (ja) | 2010-04-02 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9653094B2 (en) | Methods and systems for performing signal analysis to identify content types | |
| US8855796B2 (en) | Method and device for detecting music segment, and method and device for recording data | |
| CN100380441C (zh) | Method and device for detecting programs of a given type, silence detector, and receiver | |
| US6842735B1 (en) | Time-scale modification of data-compressed audio information | |
| US6748360B2 (en) | System for selling a product utilizing audio content identification | |
| US8332059B2 (en) | Apparatus and method for synchronizing additional data and base data | |
| US20220189509A1 (en) | Methods and apparatus to perform speed-enhanced playback of recorded media | |
| US8682132B2 (en) | Method and device for detecting music segment, and method and device for recording data | |
| US9767846B2 (en) | Systems and methods for analyzing audio characteristics and generating a uniform soundtrack from multiple sources | |
| JP2017532603A (ja) | Encoding and decoding of audio signals | |
| EP2157580A1 (en) | Video editing system | |
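| JP2003029772A (ja) | Signal processing device and method, recording medium, and program | |
| JP2006126826A (ja) | Audio signal encoding/decoding method and apparatus therefor | |
| US7792681B2 (en) | Time-scale modification of data-compressed audio information | |
| JP4743228B2 (ja) | Digital audio signal analysis method, apparatus therefor, and video/audio recording apparatus | |
| JP2004334160A (ja) | Feature quantity extraction device | |
| JP2009229921A (ja) | Acoustic signal analysis device | |
| JP2008262000A (ja) | Audio signal feature detection device and feature detection method | |
| JP2008154132A (ja) | Audio/video stream compression device and audio/video recording device | |
| JP4862136B2 (ja) | Audio signal processing device | |
| JP2009157278A (ja) | Audio signal feature detection device and feature detection method | |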
| HK1122893B (en) | Musical composition section detecting method and its device, and data recording method and its device | |
| JP2011138123A (ja) | Music detection device and music detection method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR |
| | AX | Request for extension of the european patent | Extension state: AL BA RS |
| | 17P | Request for examination filed | Effective date: 20100819 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Effective date: 20131211 |