WO2010140195A1 - Video editing apparatus - Google Patents
Video editing apparatus
- Publication number
- WO2010140195A1 (application PCT/JP2009/002558)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- key
- similarity
- block
- feature vector
- Prior art date
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
Definitions
- The present invention relates to video editing.
- Patent Document 1 proposes a video editing method that uses acoustic analysis technology. This method automatically detects, as editing points, positions where speech falls silent or where the type of sound changes, and presents the video sections delimited by the editing points to the user as editing segments.
- Japanese Patent Laid-Open No. 2004-23798
- In the method of Patent Document 1, however, when an unknown sound source that cannot be prepared in advance is mixed in, or when a plurality of sound sources are superimposed at the same time, scenes are divided excessively or merged incorrectly, so there is a problem that the user's effort in video editing cannot be reduced.
- The present invention has been made to solve the above problem, and an object of the present invention is to provide a video editing apparatus capable of efficiently extracting partial videos.
- The present invention provides a video editing apparatus comprising: a dividing unit that divides an audio signal included in video data into a plurality of blocks along the time axis; an extraction unit that analyzes the audio signal of each block and extracts a feature vector; a management unit that manages at least one of the feature vectors as a search key; a first calculation unit that collates, for each block, the feature vector extracted by the extraction unit with the search key managed by the management unit and calculates a first similarity between the search key and the feature vector; a key candidate generation unit that acquires from the extraction unit a feature vector whose first similarity is small and generates it as a key candidate; a second calculation unit that collates, for each block, the feature vector extracted by the extraction unit with the key candidate and calculates a second similarity between the key candidate and the feature vector; a storage unit that stores the first similarity and the second similarity for each block; a registration unit that calculates a co-occurrence score from the first similarity and the second similarity, determines based on the co-occurrence score whether to register the key candidate as a search key, and additionally registers in the management unit, as a search key, the key candidate determined to be registered; and a cutout unit that obtains an integrated score for each block from the similarities of the search keys managed by the management unit and cuts out, as one section, the video corresponding to the blocks whose integrated score exceeds an integration threshold.
- According to the present invention, partial videos can be extracted efficiently.
- FIG. 1 is a block diagram of a video editing apparatus according to Embodiment 1.
- (A) is a flowchart showing the flow of the extraction unit
- (b) to (f) are diagrams showing an outline of the extraction unit.
- (A) is a flowchart showing the flow of the calculation unit
- (b) to (f) are diagrams showing an outline of the calculation unit.
- (A) is a flowchart showing the flow of the key candidate generation unit,
- (b) and (c) are diagrams showing an overview of the processing result of the key candidate generation unit.
- FIG. 6 is a block diagram of a video editing apparatus according to a second embodiment.
- (A) is a flowchart showing the flow of the key candidate generation unit,
- (b) is a diagram showing an overview of the processing results of the key candidate generation unit.
- FIG. 9 is a block diagram of a video editing apparatus according to a third embodiment.
- FIG. 1 is a diagram illustrating a hardware configuration of the video editing apparatus 100.
- The video editing apparatus 100 includes a CPU 101, main storage units such as a ROM (Read Only Memory) 104 and a RAM (Random Access Memory) 105 that store various data and programs, an external storage unit 107 such as an HDD (Hard Disk Drive) or a CD (Compact Disk) drive device that stores various data and programs, and a bus 108 that connects these components; it thus has a hardware configuration using an ordinary computer.
- In addition, a display unit 103 that displays information, an operation unit 102 such as a keyboard and a mouse that accepts user instruction input, and a communication unit 106 that controls communication with external devices are connected to the video editing apparatus 100.
- The video editing apparatus 100 according to the first embodiment of the present invention will be described with reference to the drawings.
- The video editing apparatus 100 reduces editing work by efficiently dividing video data containing a plurality of scenes into scenes and efficiently extracting a target scene.
- FIG. 2 is a block diagram of the video editing apparatus 100.
- The video editing apparatus 100 includes an audio acquisition unit 11, a division unit 21, an extraction unit 31, a first calculation unit 41, a second calculation unit 42, a management unit 51, a storage unit 61, a cutout unit 71, a key candidate generation unit 81, and a registration unit 91.
- The sound acquisition unit 11 extracts the acoustic signal to be analyzed from the video data to be edited, and outputs the acoustic signal to the dividing unit 21.
- The input method of the acoustic signal is not particularly limited.
- For example, the audio acquisition unit 11 may include a microphone, an amplifier, an AD converter, and the like, and acquire the acoustic signal in real time; alternatively, an acoustic signal stored as a digital signal in a storage device may be read and acquired.
- When digital video data is acquired from an external digital video camera, a digital broadcast receiving tuner, or another digital recording device, a separation/extraction process that extracts only the audio signal is performed, and the result is output to the dividing unit 21.
- The dividing unit 21 divides the acoustic signal input from the sound acquisition unit 11 into sections of a certain time width along the time axis.
- A section of the acoustic signal divided by the dividing unit 21 is hereinafter referred to as a block.
- The dividing unit 21 outputs the acoustic signal of each block to the extraction unit 31. If the blocks are generated with the same time width as the basic unit used in the search key generation, similarity calculation, and scene division described later, subsequent processing becomes easier. A block may also be set so that it overlaps its temporally adjacent blocks, and the time width of the blocks may be variable; in that case, processing can be made efficient by outputting to the extraction unit 31 only the acoustic signal excluding the overlapping time region.
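The following is a minimal sketch of this block division, assuming a mono PCM signal held in a NumPy array; the block length and overlap are illustrative values, not figures taken from the description.

```python
import numpy as np

def split_into_blocks(signal, sample_rate, block_sec=1.0, hop_sec=0.5):
    """Divide an audio signal into (possibly overlapping) fixed-width blocks."""
    block_len = int(block_sec * sample_rate)
    hop_len = int(hop_sec * sample_rate)
    blocks = []
    for start in range(0, len(signal) - block_len + 1, hop_len):
        blocks.append(signal[start:start + block_len])
    return blocks  # blocks[k] corresponds to block number k

# Example: 10 s of silence at 16 kHz -> 19 overlapping 1 s blocks
blocks = split_into_blocks(np.zeros(160000), 16000)
```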
- The extraction unit 31 analyzes the block-unit acoustic signal input from the dividing unit 21 and converts it into a feature vector. This feature vector is used for comparison and collation with the acoustic signals contained in other blocks.
- The extraction unit 31 outputs the feature vector, together with the block number k, to the first calculation unit 41, the second calculation unit 42, the management unit 51, and the cutout unit 71.
- The first calculation unit 41 collates the feature vector input from the extraction unit 31 with the feature vector corresponding to a search key registered in the management unit 51, measures the similarity between the feature vectors according to a predetermined distance measure, and outputs the similarity to the storage unit 61.
- The second calculation unit 42 collates the feature vector input from the extraction unit 31 with the feature vector corresponding to a key candidate generated by the key candidate generation unit 81, calculates the similarity, and outputs it to the storage unit 61.
- In the management unit 51, one or more search keys used for the collation performed by the first calculation unit 41 are registered.
- The search key to be managed is the feature vector of the corresponding block, which is input from the extraction unit 31.
- However, another registration method may be used, such as retaining only the corresponding time information.
- The management unit 51 also performs additional registration and deletion of search keys, such as adding a key candidate that satisfies a condition as a new search key, or deleting a key candidate that does not satisfy the condition.
- The storage unit 61 stores, for each of the one or more search keys registered in the management unit 51, the block-by-block similarities of the acoustic signal to be analyzed. Likewise, for each key candidate generated by the key candidate generation unit 81, it stores a time-series similarity consisting of the block-by-block similarities. For example, as shown in FIG. 7C, these time-series similarities can be managed as a matrix whose "rows" are the search keys and whose "columns" are the corresponding time blocks.
- That is, the storage unit 61 stores the similarity of each combination as an element of this matrix. Key candidates are stored in the same manner.
- The time-series similarities stored in the storage unit 61 are used for the scene division in the cutout unit 71 and for the registration of new search keys in the registration unit 91.
- The cutout unit 71 refers to the time-series similarities stored in the storage unit 61 and cuts out, as one section, the video segments that can be judged to belong to the same scene based on the similarities corresponding to the search keys registered in the management unit 51.
- To prevent a similar section from being divided excessively, the key candidate generation unit 81 estimates candidate sections from which a plurality of audio signals of the same scene should be added as search keys, and adds them to the management unit 51 as key candidates.
- The registration unit 91 determines whether a key candidate newly generated by the key candidate generation unit 81 originates from the same scene as a search key already registered in the management unit 51, that is, whether the continuity of the scene is maintained. To determine the continuity of the scene, it compares the similarity series of the registered search key and of the key candidate among the similarities stored in the storage unit 61.
- In this way, the video editing apparatus 100 adds search keys for judging scene identity from the analyzed acoustic signal itself while updating the similarity information in the storage unit 61, and cuts out a single scene from the similarities of the plurality of search keys.
- The operation of each component of the video editing apparatus 100 will be described using, as an example, the case where the acoustic signal shown in FIG. 3 is input.
- The acoustic signal to be analyzed contains three sections (scene 1, scene 2, and scene 3), and different music plays in each section. In scene 2, clapping is mixed in from the middle of the scene, so that a plurality of sound sources are superimposed within a single scene.
- FIG. 4A shows a flowchart showing the operation of the extraction unit 31, and FIGS. 4B to 4F show schematic diagrams of algorithms for extracting features from a speech waveform.
- In step S3101, the extraction unit 31 acquires the acoustic signal included in the analysis target section, as illustrated in FIG. 4.
- In step S3102, the extraction unit 31 divides the acquired acoustic signal into frames suitable for feature extraction, as shown in FIG. 4.
- The reason for this division is that the acquired acoustic signal may contain a signal sequence longer than the frame unit suitable for feature extraction.
- An arbitrary time length can be set as the frame unit; here, the frame length is 25 milliseconds and the frame shift is 10 milliseconds.
- In step S3103, the extraction unit 31 converts each frame into a frequency spectrum, as shown in FIG. 4D; that is, it performs spectrum analysis.
- For the spectrum analysis, an FFT (Fast Fourier Transform), for example, can be used.
- FIG. 4D is a conceptual diagram showing the spectrum sequence in units of frames. In this figure, the magnitude of the spectral power value is represented by black, white, and different hatching patterns; the other figures are drawn in the same manner.
- In step S3104, the extraction unit 31 divides the frequency spectrum sequence into sub-blocks along both the time and frequency axes. For example, as shown in FIG. 4E, several adjacent time frames are grouped into one sub-block, and the frequency band is divided in the same way.
- The reason for sub-blocking is as follows: if the spectrum sequence of each frame in the block were used directly as the feature vector, the degree to which noise sources are superimposed would differ between frequency bands of the acoustic signal, and local fluctuations would be reflected directly in the feature vector.
- In step S3105, the extraction unit 31 generates, for each divided sub-block, a representative vector from the vectors contained in that sub-block, and outputs the time series of representative vectors as the feature vector.
- As methods for generating the representative vector, there are, for example, a method that adopts the average value of the vectors, and a method that detects peaks from the differences with adjacent bands and uses the cumulative value of the peaks contained in each band as the vector value.
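The following is a minimal, hedged sketch of this feature extraction flow (framing, FFT, sub-blocking, and a per-sub-block representative value). The frame lengths follow the values above, while the sub-block counts, the Hann window, and the use of the power-spectrum mean as the representative value are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def extract_feature_vector(block, sr, frame_ms=25, hop_ms=10, time_sub=4, freq_sub=8):
    """Feature vector of one block: a (time_sub x freq_sub) grid of sub-block means."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    # Frame the block (25 ms frames, 10 ms shift) and take the power spectrum of each frame
    frames = np.array([block[i:i + frame_len]
                       for i in range(0, len(block) - frame_len + 1, hop_len)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    # Group adjacent frames / frequency bins into sub-blocks; keep one value per sub-block
    t_edges = np.linspace(0, spec.shape[0], time_sub + 1, dtype=int)
    f_edges = np.linspace(0, spec.shape[1], freq_sub + 1, dtype=int)
    feat = np.empty((time_sub, freq_sub))
    for i in range(time_sub):
        for j in range(freq_sub):
            feat[i, j] = spec[t_edges[i]:t_edges[i + 1], f_edges[j]:f_edges[j + 1]].mean()
    return feat

# Example: feature vector of a 1-second block of silence sampled at 16 kHz
feat = extract_feature_vector(np.zeros(16000), sr=16000)
```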
- FIG. 5A shows a flowchart showing the operation of the first calculation unit 41
- FIGS. 5B to 5F show schematic diagrams of algorithms for calculating similarity.
- In step S4101, the first calculation unit 41 takes out an unprocessed search key registered in the management unit 51, as shown in FIG. 5.
- An example of the information registered in the management unit 51 is shown in FIG. 5.
- In the management unit 51, an ID that is the serial number of the search key, the time information of the extracted acoustic signal, a flag indicating whether the key is a key candidate or a registered search key, and the feature vector generated by the extraction unit 31 are registered as related information of the search key.
- Here, the processing will be described on the assumption that the search key with serial number ID 1 (hereinafter, "search key 1") has been taken out.
- In step S4102, the first calculation unit 41 acquires the feature vector of an unprocessed block of the acoustic signal to be analyzed, as shown in FIG. 5.
- Here, the description proceeds assuming that the feature vector at time t has been acquired.
- In step S4103, the first calculation unit 41 collates search key 1 with the feature vector at time t, as shown in FIG. 5.
- As the matching method for feature vectors, matching is performed separately for each sub-block, and the reciprocal of the Euclidean distance between the feature vectors of each sub-block is calculated as the similarity S of that sub-block.
- Here, Key(i, j) denotes the spectral power value of the search key in the i-th time sub-block (the maximum number of time sub-blocks is I) and the j-th frequency band (the maximum number of bands is J), Vec(t)(i, j) denotes the spectral power value of the feature vector at time t in the i-th time sub-block and the j-th frequency band, σ is the normalization factor between feature vectors, and K is the normalization factor of the similarity score.
- In step S4104, the first calculation unit 41 integrates the similarities Sij calculated for the individual sub-blocks and calculates the similarity between search key 1 and the block at time t based on the following equation (2).
- Here, a is the ID number of the search key.
- Equation (2) selects the maximum similarity over the frequency bands within each time sub-block and averages these maxima over the time sub-blocks.
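The following is a hedged sketch of this sub-block matching and integration. Each sub-block is assumed to hold a single representative spectral power value, the reciprocal-distance form K / (σ + |Key − Vec|) is an assumed reading of equation (1), and the integration follows the description of equation (2) above: take the maximum over the frequency sub-blocks within each time sub-block and then average over the time sub-blocks.

```python
import numpy as np

def subblock_similarity(key, vec, sigma=1.0, K=1.0):
    """key, vec: (I, J) arrays of sub-block spectral power values.
    Returns Sij, the per-sub-block similarity (reciprocal of the distance)."""
    return K / (sigma + np.abs(key - vec))

def block_similarity(key, vec):
    """Similarity between one search key and the feature vector of one block."""
    S = subblock_similarity(key, vec)
    return S.max(axis=1).mean()   # max over frequency bands, mean over time sub-blocks

# Example usage with two (4 x 8) feature vectors
sim = block_similarity(np.ones((4, 8)), np.zeros((4, 8)))
```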
- In step S4105, the first calculation unit 41 repeats the processing of steps S4102 to S4104 until the terminal block is reached.
- As a result, a time-series similarity curve is obtained as shown in FIG. 5, in which the vertical axis represents the similarity and the horizontal axis represents the block number, that is, the time axis.
- In step S4106, the first calculation unit 41 repeats steps S4101 to S4105 until no unprocessed search key remains in the management unit 51. That is, when all search keys have been processed, time-series similarities for the plurality of search keys have been calculated (case Y) and the processing ends; if an unprocessed search key remains, the process returns to step S4101 (case N).
- The second calculation unit 42 processes the key candidates in the same manner as the first calculation unit 41 processes the search keys registered in the management unit 51, so that a time-series similarity can also be calculated for each key candidate.
- FIG. 6A shows a flowchart showing the operation of key candidate generation.
- FIGS. 6B and 6C show an outline of the processing result of the key candidate generation unit 81.
- In step S8101, the key candidate generation unit 81 acquires an analysis starting point from which to search for a position at which to generate a key candidate. The generation position of a search key already registered in the management unit 51 is used as the analysis starting point.
- This search key is referred to as the "origin search key". It is assumed that one search key has been registered in the management unit 51 before key candidate generation and that its similarity has been stored in the storage unit 61.
- In step S8102, the key candidate generation unit 81 starts a search from the analysis starting point in the future (positive) direction of the time axis and acquires from the storage unit 61 the similarity of an unprocessed block with respect to the origin search key.
- An unprocessed block is a block after the analysis starting point.
- The key candidate generation unit 81 then calculates a boundary score R with respect to the origin search key.
- The boundary score R is calculated by the following equation (3), which accumulates the difference whenever the similarity Sk of a block falls below the similarity threshold T.
- Rk+1 = Rk + (T − Sk)  if T > Sk   … (3)
- Here, k is the block number, and Sk is the similarity of block number k with respect to the origin search key.
- In step S8104, the key candidate generation unit 81 determines whether the accumulated boundary score Rk+1 exceeds the boundary score threshold RT. If it does, the process proceeds to step S8105 (case Y); if not, it proceeds to step S8106 (case N). In other words, the similarity to the origin search key is examined in time-series order, and a feature vector whose similarity to the origin search key is low becomes a key candidate; such a feature vector is referred to as a feature vector at a dissimilar position. The accumulated boundary score is used to exclude positions that are only momentarily dissimilar to the origin search key and to select a position only when the dissimilar state continues for a certain period of time.
- In step S8105, a new key candidate is generated at the position where the accumulated boundary score Rk+1 exceeds the boundary score threshold RT, as shown in FIG. 6.
- Generating a key candidate means obtaining from the extraction unit 31 the feature vector of the block at the position where the similarity first fell below the similarity threshold T, and using that feature vector as the key candidate.
- In step S8106, if the accumulated boundary score Rk+1 has not exceeded the boundary score threshold RT and the terminal block has not been reached, the key candidate generation unit 81 repeats the processing of steps S8102 to S8104 (case N); if the terminal block has been reached, the processing ends (case Y).
- The boundary score R has been described here as being accumulated continuously, but various modifications are possible, such as resetting the boundary score R when the similarity does not stay below the threshold for a certain interval.
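The following is a minimal sketch of this key candidate search using the cumulative boundary score of equation (3). The threshold values T and RT are placeholders, and treating the block similarities as a plain Python list is an assumption for illustration; returning the first below-threshold block as the candidate position follows the description above.

```python
def find_key_candidate_block(similarities, start_block, T=3.0, RT=5.0):
    """similarities[k]: similarity Sk of block k to the origin search key.
    Returns the block index at which to generate a key candidate, or None."""
    R = 0.0
    first_dissimilar = None
    for k in range(start_block + 1, len(similarities)):
        if similarities[k] < T:
            if first_dissimilar is None:
                first_dissimilar = k            # first block below the similarity threshold
            R += T - similarities[k]            # Rk+1 = Rk + (T - Sk)
            if R > RT:
                return first_dissimilar         # dissimilarity persisted: key candidate here
    return None                                 # dissimilarity never persisted long enough
```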
- FIG. 7A shows a flowchart showing the operation of the registration unit 91
- FIGS. 7B to 7D show specific examples of processing targets of the registration unit 91.
- Here, a case will be described in which search key 1 and search key 2 are already registered in the management unit 51 and it is determined whether the third, newly generated key candidate should be registered as a search key.
- In step S9102, the registration unit 91 acquires from the storage unit 61 the similarity of search key 1 (3) and the similarity of key candidate 3 (0) in block 1, as illustrated in FIG. 7.
- The registration unit 91 then calculates a co-occurrence score using these similarities.
- The "co-occurrence score" is obtained by scoring, for the search key and the key candidate, the similarity of the acoustic signal contained at the same time (the same block).
- Here, a similarity threshold of 3 is set for judging whether the acoustic signal of a block is similar to a key. If both of the two keys being compared exceed the similarity threshold, the co-occurrence score of that block is set to 1; otherwise it is set to 0.
- FIG. 7D shows an example of the co-occurrence score calculated in this way. In block 1, search key 1 exceeds the similarity threshold but key candidate 3 does not, so the co-occurrence score is 0. Note that by accumulating the co-occurrence score over adjacent blocks, the number of consecutive co-occurring blocks can be expressed.
- In step S9104, the registration unit 91 compares the calculated co-occurrence score with the co-occurrence threshold.
- Here, the co-occurrence threshold is set to 2. Since the co-occurrence score of search key 1 and key candidate 3 in block 1 is 0, the process moves to step S9106 (case N). If the co-occurrence score were 2 or more, the process would move to step S9105 (case Y).
- In step S9105, the registration unit 91 registers a key candidate whose co-occurrence score exceeds the co-occurrence threshold as a search key, and ends the processing.
- In step S9106, the registration unit 91 proceeds to step S9107 if the processing has reached the terminal block (case Y), and repeats the processing of steps S9102 to S9105 if it has not (case N).
- For search key 1 and key candidate 3, the co-occurrence score does not exceed the threshold even when this processing is repeated, so the process eventually moves to step S9107.
- In step S9107, the registration unit 91 repeats the processing of steps S9101 to S9106 if the processing has not been completed for all search keys (case N); if it has been completed (case Y), the process proceeds to step S9108.
- In step S9108, the registration unit 91 deletes the key candidate.
- In this example, processing has not yet been completed for search key 2, so returning from step S9107, the registration unit 91 next compares search key 2 with key candidate 3 in the same manner. Since the co-occurrence score of search key 2 and key candidate 3 likewise does not exceed the co-occurrence threshold, the process moves to step S9108 and key candidate 3 is deleted from the management unit 51.
- Note that search key 1 (generated from the music-only portion of scene 2) and search key 2 (generated from the portion of scene 2 in which the music and the clapping are superimposed) are both already registered because, between them, the co-occurrence threshold of 2 is exceeded and they are determined to co-occur.
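The following is a hedged sketch of the registration unit's co-occurrence check, counting the blocks in which both the registered search key and the key candidate exceed the similarity threshold (counting over all blocks follows claim 3). The threshold values 3 and 2 follow the example above; the list-based interface is an illustrative assumption.

```python
def should_register(key_sims, cand_sims, sim_threshold=3.0, cooc_threshold=2):
    """key_sims[k], cand_sims[k]: block-wise similarities of a registered search key
    and of a key candidate. Returns True if the candidate should become a search key."""
    score = 0
    for s_key, s_cand in zip(key_sims, cand_sims):
        if s_key > sim_threshold and s_cand > sim_threshold:
            score += 1                     # both keys are similar to this block: co-occurrence
            if score >= cooc_threshold:
                return True                # register the key candidate as a search key
    return False                           # no sustained co-occurrence: candidate is deleted
```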
- FIG. 8A is a flowchart illustrating the operation of the cutout unit 71
- FIG. 8B is a diagram illustrating the integrated score.
- The operation of the cutout unit 71 will be described using an example in which the third key candidate has been deleted and two search keys are registered in the management unit 51, as shown in FIG. 8B.
- In step S7101, the cutout unit 71 sets block 4, in which search key 1 was generated, as the analysis starting point.
- This search key 1 serves as the origin search key.
- In step S7102, the cutout unit 71 acquires the similarities of search key 1 and search key 2 in block 4. In the example of FIG. 8B, "8" and "1" are acquired, respectively.
- In step S7103, the cutout unit 71 calculates a time-series integrated score by integrating the similarities of the plurality of search keys, as illustrated in FIG. 8B.
- Here, the maximum value in the group of time-series similarities is used as the integrated score; in this case, "8" is used.
- In step S7105, the cutout unit 71 cuts out, as a single scene, the video (block group) corresponding to blocks 4 through 10, whose integrated scores exceed the integration threshold. That is, this section is the scene that the user wants to cut out.
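The following is a minimal sketch of this cut-out step: the integrated score of each block is taken as the maximum similarity over the registered search keys, and the contiguous run of blocks around the analysis starting point whose integrated score exceeds the integration threshold is returned as one scene. The threshold value and the bidirectional expansion from the starting block are illustrative assumptions.

```python
import numpy as np

def cut_out_scene(similarity_matrix, start_block, integration_threshold=3.0):
    """similarity_matrix[a][k]: similarity of search key a in block k.
    Returns (first_block, last_block) of one scene, or None."""
    integrated = np.max(np.asarray(similarity_matrix, dtype=float), axis=0)  # per-block max
    if integrated[start_block] <= integration_threshold:
        return None
    left = right = start_block
    while left > 0 and integrated[left - 1] > integration_threshold:
        left -= 1                                  # extend the scene backwards in time
    while right + 1 < len(integrated) and integrated[right + 1] > integration_threshold:
        right += 1                                 # extend the scene forwards in time
    return left, right
```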
- In this way, the key candidate generation unit 81 can dynamically generate search keys from the analysis target itself, without preparing a dictionary for scene segmentation in advance.
- Moreover, key candidates are also generated, based on the criterion called the boundary score, from positions where a different sound source may have been superimposed partway through a scene, and the registration unit 91 can determine whether a common acoustic signal is included.
- FIG. 9 is a block diagram of the video editing apparatus 100 according to the present embodiment.
- the video editing apparatus 100 includes an audio acquisition unit 11, a division unit 21, an extraction unit 32, a first calculation unit 41, a management unit 51, a storage unit 61, a cutout unit 71, and a key candidate generation unit 82.
- This embodiment is a configuration in which an estimation unit 901 is added to the configuration of the first embodiment, and is different from the first embodiment in that the generation position of key candidates is determined based on the result of sound source estimation.
- The extraction unit 32 analyzes the block-unit acoustic signal input from the dividing unit 21, converts it into a feature vector that can be compared and collated with the acoustic signals contained in other time blocks, and outputs the feature vector to the first calculation unit 41, the management unit 51, the cutout unit 71, and the estimation unit 901.
- the estimation unit 901 analyzes the feature vector input from the extraction unit 32, estimates the sound source included in the block, and outputs the result to the key candidate generation unit 82.
- As a sound source estimation method, for example, a statistical dictionary such as a mixture of normal distributions (a Gaussian mixture model) is prepared for each category specified in advance, and the sound source whose dictionary gives the highest matching score is selected as the representative sound source of the block (hereinafter, the "estimated sound source").
- To prevent a similar section from being divided excessively, the key candidate generation unit 82 estimates, from the wide range of acoustic signals produced by the same scene, sections to be added as search keys, and registers them in the management unit 51 as key candidates.
- The result of the estimation unit 901 is used to estimate the key candidates.
- As in the first embodiment, the acoustic signal to be analyzed contains three sections (scene 1, scene 2, and scene 3), different music plays in each section, and in scene 2 clapping is mixed in from the middle of the scene, so that a plurality of sound sources are superimposed within a single scene.
- FIG. 10A shows a flowchart of the detailed operation of key candidate generation, and FIGS. 10B and 10C show an overview of the estimation result output from the estimation unit 901 and of the processing result of the key candidate generation unit 82.
- The estimation unit 901 is provided in advance with dictionaries for four types of sound sources: speech, music, applause, and bustle.
- The representative sound source of each block is determined by matching the block's feature vector against these dictionaries, and the corresponding label is assigned to the block.
- Key candidates are then generated using these sound source estimation results.
- In step S8201, the key candidate generation unit 82 acquires an analysis starting point from which to search for key candidate generation positions. For example, as shown in FIG. 10C, the generation position of a search key already registered in the management unit 51 is set as the analysis starting point.
- In step S8202, the key candidate generation unit 82 starts a search from the analysis starting point in the future (positive) direction of the time axis and acquires the estimated sound source of an unprocessed block.
- In step S8203, the key candidate generation unit 82 compares the estimated sound source of the block being processed with the estimated sound source of the adjacent block.
- In step S8204, the key candidate generation unit 82 determines whether the estimated sound source has changed; if it has changed, the process proceeds to step S8205 (case Y), and if it has not, the process proceeds to step S8206 (case N).
- In step S8205, as shown in FIG. 10C, the key candidate generation unit 82 acquires from the extraction unit 32 the feature vector at the position where the estimated sound source switches from music to applause, and generates it as a new key candidate.
- In step S8206, the key candidate generation unit 82 ends the processing if the terminal block has been reached (case Y); if it has not been reached (case N), the processing of steps S8202 to S8204 is continued.
- In this way, a key candidate is generated at each position where the estimated sound source changes, and the plurality of added search keys are used to determine a single scene (similar section).
- If the result of the sound source estimation were used directly as the boundary of the similar section, excessive division would result.
- In contrast, by generating a search key at the point where the sound source changes, as in this embodiment, and by using the similarity to the adjacent section and the co-occurrence score, sections in which the same background sound continues can be grouped together, so the scene intended by the user can be cut out and the editing effort can be reduced.
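The following is a minimal sketch of the second embodiment's key candidate positioning: walking forward from the analysis starting point and marking every block whose estimated sound source differs from that of the preceding block. The string labels and the list-based interface are illustrative assumptions.

```python
def key_candidate_positions(estimated_sources, start_block):
    """estimated_sources[k]: estimated sound-source label of block k
    (e.g. 'speech', 'music', 'applause', 'bustle')."""
    positions = []
    for k in range(start_block + 1, len(estimated_sources)):
        if estimated_sources[k] != estimated_sources[k - 1]:
            positions.append(k)    # the sound source switched here: generate a key candidate
    return positions
```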
- FIG. 11 is a schematic configuration diagram of the video editing apparatus 100 according to the third embodiment of the present invention.
- the video editing apparatus 100 includes an audio acquisition unit 11, a division unit 21, an extraction unit 32, a first calculation unit 41, a management unit 51, a storage unit 61, a cutout unit 71, and a key candidate generation unit 82.
- In the present embodiment, an initial key generation unit 911 and a designated point acquisition unit 921 are added to the configuration of the second embodiment.
- the present embodiment is different from the second embodiment in that a similar section including a designated point is searched starting from a time designated by the user.
- The designated point acquisition unit 921 acquires, through a user operation, an arbitrary point within the target section of the acoustic signal to be analyzed.
- As the user operation, for example, an operation using a device such as a mouse or a remote controller is conceivable.
- However, other methods may be used.
- For example, the sound may be played back through a device such as a speaker, and the user may specify the designated point while listening to the audio data.
- Alternatively, video thumbnails cut out from the video signal synchronized with the audio data may be presented to the user, and the time corresponding to the selected thumbnail may be input as the designated point.
- The designated point acquisition unit 921 outputs the detected designated point to the initial key generation unit 911 as information, such as a time, from which the acoustic signal can be accessed.
- Upon receiving the designated point from the designated point acquisition unit 921, the initial key generation unit 911 obtains from the extraction unit 32 the feature vector corresponding to the block containing the designated point, generates this feature vector as an initial key, and outputs it to the management unit 51.
- the management unit 51 registers this initial key as a search key.
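The following is a minimal sketch of this initial-key step: the user-designated time is mapped to the block that contains it, and that block's feature vector is registered as the first search key. Non-overlapping one-second blocks and a plain list standing in for the management unit are illustrative assumptions.

```python
def register_initial_key(designated_sec, feature_vectors, search_keys, block_sec=1.0):
    """feature_vectors[k]: feature vector of block k; search_keys: registered search keys.
    Returns the index of the block containing the designated point."""
    k = int(designated_sec // block_sec)      # block containing the user-designated time
    search_keys.append(feature_vectors[k])    # the initial key becomes the first search key
    return k
```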
- Since this embodiment cuts out only the scene containing the time of interest indicated by the designated point, it can also be applied to uses such as displaying only the thumbnails corresponding to designated points to grasp the whole content roughly, or playing back the corresponding scene to confirm the details.
- the present invention is not limited to the above-described embodiments as they are, and can be embodied by modifying the components without departing from the scope of the invention in the implementation stage.
- Various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiments. Furthermore, constituent elements over different embodiments may be appropriately combined.
- In the above embodiments, the feature vector used for the similarity calculation is also used for the sound source estimation in order to reduce the amount of computation; instead, a different feature vector may be used for the sound source estimation in order to improve its performance.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Television Signal Processing For Recording (AREA)
Description
41, 42: calculation units; 51: management unit; 61: storage unit
71: cutout unit; 81: key candidate generation unit; 91: registration unit
Rk+1 = Rk + (T − Sk)  if T > Sk   … (3)
Here, k is the block number, and Sk is the similarity of block number k with respect to the origin search key.
Claims (5)
- A video editing apparatus comprising: a dividing unit that divides an audio signal included in video data into a plurality of blocks along a time axis; an extraction unit that analyzes the audio signal of each block and extracts a feature vector; a management unit that manages at least one of the feature vectors as a search key; a first calculation unit that collates, for each block, the feature vector extracted by the extraction unit with the search key managed by the management unit and calculates a first similarity between the search key and the feature vector; a key candidate generation unit that acquires from the extraction unit a feature vector whose first similarity is small and generates it as a key candidate; a second calculation unit that collates, for each block, the feature vector extracted by the extraction unit with the key candidate and calculates a second similarity between the key candidate and the feature vector; a storage unit that stores the first similarity and the second similarity for each block; a registration unit that calculates a co-occurrence score from the first similarity and the second similarity, determines based on the co-occurrence score whether to register the key candidate as the search key, and additionally registers in the management unit, as the search key, the key candidate determined to be registered; and a cutout unit that obtains an integrated score for each block from the similarities of the search keys managed by the management unit and cuts out, as one section, the video corresponding to the blocks whose integrated score exceeds an integration threshold.
- The video editing apparatus according to claim 1, wherein the registration unit registers the key candidate as the search key when the co-occurrence score exceeds a co-occurrence threshold.
- The video editing apparatus according to claim 2, wherein the registration unit calculates, as the co-occurrence score, the number of blocks in which both the first similarity and the second similarity exceed a similarity threshold.
- The video editing apparatus according to claim 3, further comprising an estimation unit that collates the audio signal of each block with a dictionary corresponding to a predefined sound source and estimates the sound source contained in the audio signal of each block, wherein the key candidate generation unit compares the sound sources of adjacent blocks, acquires from the extraction unit the feature vector of a block containing a sound source different from that of the adjacent block, and generates the key candidate from this feature vector.
- The video editing apparatus according to claim 4, further comprising: a designated point acquisition unit that acquires, as a designated point, a position at an arbitrary time of the audio signal through an operation by a user; and an initial key generation unit that causes the extraction unit to extract the feature vector corresponding to the block containing the designated point and generates the feature vector as an initial key, wherein the management unit registers the initial key as the search key.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2009/002558 WO2010140195A1 (ja) | 2009-06-05 | 2009-06-05 | 映像編集装置 |
US13/376,274 US8713030B2 (en) | 2009-06-05 | 2009-06-05 | Video editing apparatus |
JP2011518070A JP5337241B2 (ja) | 2009-06-05 | 2009-06-05 | 映像編集装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2009/002558 WO2010140195A1 (ja) | 2009-06-05 | 2009-06-05 | 映像編集装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010140195A1 true WO2010140195A1 (ja) | 2010-12-09 |
Family
ID=43297343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/002558 WO2010140195A1 (ja) | 2009-06-05 | 2009-06-05 | 映像編集装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US8713030B2 (ja) |
JP (1) | JP5337241B2 (ja) |
WO (1) | WO2010140195A1 (ja) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5939705B2 (ja) * | 2012-02-08 | 2016-06-22 | カシオ計算機株式会社 | 被写体判定装置、被写体判定方法及びプログラム |
US11372917B2 (en) * | 2017-12-27 | 2022-06-28 | Meta Platforms, Inc. | Labeling video files using acoustic vectors |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5995153A (en) * | 1995-11-02 | 1999-11-30 | Prime Image, Inc. | Video processing system with real time program duration compression and expansion |
WO2000058863A1 (en) * | 1999-03-31 | 2000-10-05 | Verizon Laboratories Inc. | Techniques for performing a data query in a computer system |
US8238718B2 (en) | 2002-06-19 | 2012-08-07 | Microsoft Corporaton | System and method for automatically generating video cliplets from digital video |
JP4405418B2 (ja) | 2005-03-30 | 2010-01-27 | 株式会社東芝 | 情報処理装置及びその方法 |
US20090132074A1 (en) * | 2005-12-08 | 2009-05-21 | Nec Corporation | Automatic segment extraction system for extracting segment in music piece, automatic segment extraction method, and automatic segment extraction program |
WO2008078736A1 (ja) * | 2006-12-27 | 2008-07-03 | Nec Corporation | 同一性判定装置、同一性判定方法および同一性判定用プログラム |
WO2008146616A1 (ja) * | 2007-05-25 | 2008-12-04 | Nec Corporation | 画像音響区間群対応付け装置と方法およびプログラム |
JP5060224B2 (ja) | 2007-09-12 | 2012-10-31 | 株式会社東芝 | 信号処理装置及びその方法 |
US20110225196A1 (en) * | 2008-03-19 | 2011-09-15 | National University Corporation Hokkaido University | Moving image search device and moving image search program |
US7908300B2 (en) * | 2008-04-02 | 2011-03-15 | Honeywell International Inc. | Guided entry system for individuals for annotating process deviations |
-
2009
- 2009-06-05 JP JP2011518070A patent/JP5337241B2/ja not_active Expired - Fee Related
- 2009-06-05 US US13/376,274 patent/US8713030B2/en not_active Expired - Fee Related
- 2009-06-05 WO PCT/JP2009/002558 patent/WO2010140195A1/ja active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006040085A (ja) * | 2004-07-29 | 2006-02-09 | Sony Corp | 情報処理装置および方法、記録媒体、並びにプログラム |
JP2006164008A (ja) * | 2004-12-09 | 2006-06-22 | Matsushita Electric Ind Co Ltd | 画像検索装置および画像検索方法 |
JP2008022103A (ja) * | 2006-07-11 | 2008-01-31 | Matsushita Electric Ind Co Ltd | テレビ番組動画像ハイライト抽出装置及び方法 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014520287A (ja) * | 2012-05-23 | 2014-08-21 | エンサーズ カンパニー リミテッド | オーディオ信号を用いたコンテンツ認識装置及び方法 |
JP2018109882A (ja) * | 2017-01-05 | 2018-07-12 | 株式会社東芝 | 動作解析装置、動作解析方法およびプログラム |
US11030564B2 (en) | 2017-01-05 | 2021-06-08 | Kabushiki Kaisha Toshiba | Motion analysis apparatus, motion analysis method, and computer program product |
Also Published As
Publication number | Publication date |
---|---|
US8713030B2 (en) | 2014-04-29 |
US20120117087A1 (en) | 2012-05-10 |
JPWO2010140195A1 (ja) | 2012-11-15 |
JP5337241B2 (ja) | 2013-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7529659B2 (en) | Method and apparatus for identifying an unknown work | |
EP1357542B1 (en) | A video retrieval data generation apparatus and a video retrieval apparatus | |
JP2020527248A (ja) | 話者分離モデルの訓練方法、両話者の分離方法及び関連設備 | |
US20140161263A1 (en) | Facilitating recognition of real-time content | |
US20030023852A1 (en) | Method and apparatus for identifying an unkown work | |
WO2019076313A1 (zh) | 音频识别方法、装置和服务器 | |
US9058384B2 (en) | System and method for identification of highly-variable vocalizations | |
US8713008B2 (en) | Apparatus and method for information processing, program, and recording medium | |
JP5337241B2 (ja) | 映像編集装置 | |
CN113347489B (zh) | 视频片段检测方法、装置、设备及存储介质 | |
US9031384B2 (en) | Region of interest identification device, region of interest identification method, region of interest identification program, and region of interest identification integrated circuit | |
JP5257356B2 (ja) | コンテンツ分割位置判定装置、コンテンツ視聴制御装置及びプログラム | |
JP4447602B2 (ja) | 信号検出方法,信号検出システム,信号検出処理プログラム及びそのプログラムを記録した記録媒体 | |
JP2000285242A (ja) | 信号処理方法及び映像音声処理装置 | |
Duong et al. | Movie synchronization by audio landmark matching | |
CN111243618A (zh) | 用于确定音频中的特定人声片段的方法、装置和电子设备 | |
JP4394083B2 (ja) | 信号検出装置、信号検出方法、信号検出プログラム及び記録媒体 | |
JP2007060606A (ja) | ビデオの自動構造抽出・提供方式からなるコンピュータプログラム | |
JP3537727B2 (ja) | 信号検出方法、信号の検索方法及び認識方法並びに記録媒体 | |
Wang et al. | Audio fingerprint based on spectral flux for audio retrieval | |
JP2002171481A (ja) | 映像処理装置 | |
US20170194003A1 (en) | Method for Segmenting Videos and Audios into Clips Using Speaker Recognition | |
JP4579638B2 (ja) | データ検索装置及びデータ検索方法 | |
JP4884163B2 (ja) | 音声分類装置 | |
JP5230567B2 (ja) | 信号検出装置、信号検出方法、信号検出プログラム及び記録媒体 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09845480; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 2011518070; Country of ref document: JP |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | Wipo information: entry into national phase | Ref document number: 13376274; Country of ref document: US |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 09845480; Country of ref document: EP; Kind code of ref document: A1 |