WO2004079718A1 - Information detection device and method, and program - Google Patents
Information detection device and method, and program
- Publication number
- WO2004079718A1 (PCT/JP2004/001397, JP2004001397W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- identification
- information
- type
- section
- frequency
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/046—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
Definitions
- TECHNICAL FIELD The present invention relates to an information detection device and method, and a program, for detecting continuous sections of the same type, such as speech or music, by extracting features from speech, music, an audio signal containing them, or an information source containing such an audio signal.
- Multimedia content and broadcast content include audio signals as well as video signals; the audio is highly useful information for content classification and scene detection.
- Efficient information retrieval and information management therefore become possible if the speech portions and music portions of the audio signal included in such content can be identified and detected.
- One known prior-art study (pp. 1331–1334) uses 13 features, including 4 Hz modulation energy, low-energy frame rate, spectral roll-off point, spectral centroid, spectral variation (flux), and zero-crossing rate,
- together with volume, to discriminate speech from music, and compares and evaluates the performance of each feature.
- A spectrogram represents the spectrum as image information, with frequency on the vertical axis and time on the horizontal axis and the spectra arranged in the time direction. An example of prior work using this feature is the document Minami, Akutsu, Hamada and Tonomura, "Video Indexing Using Sound Information and Its Applications", IEICE Transactions D-II, 1998, Vol. J81-D-II, No. 3, p.
- Music often consists of many instruments, singing voices, sound effects, and percussion rhythms. Consequently, if the audio data is identified over short intervals, even a continuous music section will contain not only parts identifiable as music, but also parts that, in the short term, should be judged as speech or classified into other types.
- Moreover, an obviously music or speech part may be assigned an incorrect type due to an identification error. The same applies to types other than speech and music.
- As a result, a part that should be regarded as one continuous section in the long term may be cut off in the middle, or a temporary noise portion that should not be regarded as a continuous section may be treated as one.
- The present invention has been proposed in view of this conventional situation. Its object is to provide an information detection apparatus and method that, when detecting continuous sections of music, speech, etc. in audio data, correctly detect the sections that should be regarded as the same type in the long term, and a program that causes a computer to execute such information detection processing.
- The information detection apparatus and method according to the present invention analyze a feature amount of the audio signal included in an information source, classify and identify the type of the audio signal for each predetermined time unit, and record the resulting identification information in identification information storage means. The identification information is then read from the storage means, an identification frequency over a predetermined time section longer than the time unit is calculated for each type, and continuous sections are detected by evaluating that frequency.
- In this apparatus and method, for example, when the identification frequency of an arbitrary type becomes equal to or greater than a first threshold and remains so for a first duration or longer, the start of that type is detected; when the identification frequency becomes equal to or less than a second threshold and remains so for a second duration or longer, the end of that type is detected.
- As the identification frequency, a value obtained by averaging, over the time section, the likelihood of identification of an arbitrary type at each time unit, or simply the number of times that type was identified within the time section, can be used.
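As a rough sketch of the two variants just described (the function names, the per-second labels, and the window length are illustrative assumptions, not taken from the patent), the identification frequency could be computed as follows:

```python
# Sketch of the two identification-frequency variants described above.
# All names (identification_frequency, labels, likelihoods, target, window)
# are illustrative; the patent does not specify an API.

def identification_frequency(labels, likelihoods, target, window):
    """Likelihood-averaged frequency of `target` over the last `window` units."""
    recent = list(zip(labels, likelihoods))[-window:]
    return sum(p for label, p in recent if label == target) / window

def identification_count_frequency(labels, target, window):
    """Count-based variant: fraction of the last `window` units labeled `target`."""
    recent = labels[-window:]
    return recent.count(target) / window

# Per-second labels (M = music, S = speech) and their identification likelihoods:
labels = ["M", "M", "S", "M", "S"]
likelihoods = [0.9, 0.8, 0.6, 0.7, 0.5]

print(round(identification_frequency(labels, likelihoods, "M", window=5), 2))  # 0.48
print(round(identification_count_frequency(labels, "M", window=5), 2))         # 0.6
```

The count-based variant needs no probability column, which matters if the recording format of FIG. 2 omits or corrupts the certainty values.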
- a program according to the present invention causes a computer to execute the above-described information detection processing.
- FIG. 1 is a diagram showing a schematic configuration of an information detection device according to the present embodiment.
- FIG. 2 is a diagram illustrating an example of a recording format of identification information.
- FIG. 3 is a diagram showing an example of a time section for calculating the identification frequency.
- FIG. 4 is a diagram showing an example of a recording format of the index information.
- FIG. 5 is a diagram for explaining how to detect the start of a music continuous section.
- FIG. 6 is a diagram for explaining how to detect the end of the music continuous section.
- FIGS. 7A to 7C are flowcharts showing a continuous section detection process in the information detection device.
- BEST MODE FOR CARRYING OUT THE INVENTION: In this embodiment, audio data is classified into several types, such as conversational speech and music, for each predetermined time unit, and section information, such as the start position and end position of each continuous section in which data of the same type continues, is recorded on a storage device or recording medium.
- In the following, audio data is identified as speech or music and continuous speech sections or continuous music sections are detected; however, the invention is not limited to speech and music sections, and may equally detect, for example, cheering sections or silent sections.
- the music genre may be identified and classified, and each continuous section may be detected.
- FIG. 1 shows a schematic configuration of an information detection device according to the present embodiment.
- The information detection device 1 includes: an audio input unit 10 that reads audio data of a predetermined format as block data D10 for each predetermined time unit;
- an audio type identification unit 11 that identifies the type of each block D10 to generate identification information D11; an identification information output unit 12 that converts D11 into a predetermined format and records the converted identification information on a storage device/recording medium 13; an identification information input unit 14 that reads the recorded information back as D14;
- an identification frequency calculation unit 15 that uses the identification information D14 to calculate an identification frequency D15 for each type (speech, music, etc.);
- a section start/end determination unit 16 that evaluates D15 to detect the start position and end position of each continuous section of the same type, and outputs them as section information D16; and
- a section information output unit 17 that converts the section information D16 into a predetermined format and records it on a storage device/recording medium 18 as index information D17.
- As the storage devices/recording media 13 and 18, a storage device such as a memory or magnetic disk, a storage medium such as a semiconductor memory (e.g., a memory card), or a recording medium such as a CD-ROM can be used.
- The audio input unit 10 reads the audio data as block data D10 for each predetermined time unit and supplies it to the audio type identification unit 11.
- The audio type identification unit 11 analyzes the feature amount of the audio to identify and classify the type of each block D10, and supplies the identification information D11 to the identification information output unit 12.
- the unit of time for identification is preferably about one second to several seconds.
- the identification information output unit 12 converts the identification information D 11 supplied from the audio type identification unit 11 into a predetermined format, and records the converted identification information D 12 on the storage device and the storage medium 13.
- FIG. 2 shows an example of the recording format of the identification information D12. In the format shown, each record holds a "time" indicating the position within the entire audio data, a "type code" indicating the type at that time position, and the "probability" of the identification.
- The "probability" is a value indicating the certainty of the identification result: for example, the likelihood obtained by an identification method such as posterior-probability maximization, or the reciprocal of the vector quantization distortion obtained by a vector quantization method, can be used.
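As an illustration of such a record (the field names and the CSV serialization are assumptions; FIG. 2 itself is not reproduced in this text), one D12 entry might be written and read back like this:

```python
# Hypothetical serialization of one identification record (time, type code,
# probability) as described for FIG. 2. The actual on-disk format used by
# the patent is not specified here.
from dataclasses import dataclass

@dataclass
class IdentificationRecord:
    time: float         # position within the entire audio data, in seconds
    type_code: str      # e.g. "M" for music, "S" for speech
    probability: float  # certainty of the identification result

    def to_line(self) -> str:
        return f"{self.time:.1f},{self.type_code},{self.probability:.3f}"

    @classmethod
    def from_line(cls, line: str) -> "IdentificationRecord":
        t, code, p = line.strip().split(",")
        return cls(float(t), code, float(p))

rec = IdentificationRecord(time=12.0, type_code="M", probability=0.87)
line = rec.to_line()
print(line)  # 12.0,M,0.870
assert IdentificationRecord.from_line(line) == rec
```

A plain-text line format like this also makes the real-time reading by the identification information input unit 14 (mentioned below) straightforward to prototype.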
- The identification information input unit 14 reads the identification information D13 recorded on the storage device/recording medium 13 and supplies the read identification information D14 to the identification frequency calculation unit 15. The reading may be performed in real time, as the identification information output unit 12 records D12 on the storage device/recording medium 13.
- The identification frequency calculation unit 15 calculates, for each predetermined time unit, the identification frequency of each type over a predetermined time section, and supplies the identification frequency information D15 to the section start/end determination unit 16.
- FIG. 3 shows an example of the time section over which the identification frequency is calculated. In the figure, the audio data is identified every few seconds as music (M) or speech (S), and the speech identification frequency Ps(t) is calculated over the time section ending at time t.
- the length of the time section Len is preferably, for example, about several seconds to several tens of seconds.
- The identification frequency can be obtained, for example, by averaging, over a predetermined time section, the likelihood at the times identified as the given type.
- For example, the speech identification frequency Ps(t) at time t is obtained as in equation (1).
- Here, p(t−k) denotes the certainty of the identification at time (t−k).
- Alternatively, the identification frequency Ps(t) can be calculated using only the number of identifications, as in equation (2).
- The identification frequency for music or any other type can be calculated in exactly the same way.
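Equations (1) and (2) themselves are missing from this copy. A plausible reconstruction, consistent with the surrounding definitions (averaging the certainty p(t−k) of the target type over the section of length Len, or counting identifications), is:

```latex
% Hedged reconstruction of equations (1) and (2); the original equations
% are not present in this text. \delta(\cdot) is 1 when its condition holds
% and 0 otherwise, and c(t-k) denotes the type identified at time t-k.
P_s(t) = \frac{1}{\mathrm{Len}} \sum_{k=0}^{\mathrm{Len}-1}
         p(t-k)\,\delta\bigl(c(t-k) = S\bigr)            \tag{1}

P_s(t) = \frac{1}{\mathrm{Len}} \sum_{k=0}^{\mathrm{Len}-1}
         \delta\bigl(c(t-k) = S\bigr)                    \tag{2}
```

Equation (2) is equation (1) with every certainty p(t−k) replaced by 1, which matches the statement that it uses only the number of identifications.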
- Using the identification frequency information D15 supplied from the identification frequency calculation unit 15, the section start/end determination unit 16 detects the start position and end position of each continuous section of the same type, and supplies them as section information D16 to the section information output unit 17.
- The section information output unit 17 converts the section information D16 supplied from the section start/end determination unit 16 into a predetermined format and records it as index information D17 on the storage device/recording medium 18.
- An example of the recording format of the index information D17 is shown in FIG. 4. Each entry holds:
- a "section number" indicating the number or identifier of the continuous section;
- a "type code" indicating the type of the continuous section; and
- a "start position" and an "end position" indicating the start time and end time of the continuous section.
- FIG. 5 is a diagram illustrating how the start of a continuous music section is detected by comparing the music identification frequency with a threshold.
- At the top of the figure, the identification type at each time is shown as M (music) or S (speech).
- the vertical axis is the music identification frequency Pm (t) at time t.
- In this example, the threshold P0 of the identification frequency Pm(t) for the start determination is set to 3/5, and the threshold H0 of the number of identifications is set to 6.
- When the identification frequency Pm(t) is calculated for each predetermined time unit, the identification frequency in the time section Len at point A in the figure becomes 3/5, reaching the threshold P0 for the first time. Thereafter, Pm(t) remains at or above the threshold P0 continuously, and the start of music is detected at point B in the figure, where the state of being at or above P0 has been held for H0 consecutive times (seconds).
- Note that the actual start position of the music is slightly before point A, where the identification frequency Pm(t) first reached the threshold P0.
- FIG. 6 is a diagram illustrating a state in which the end of the music continuation section is detected by comparing the music identification frequency with a threshold.
- M indicates that music was identified
- S indicates that speech was identified.
- the vertical axis is the music identification frequency Pm (t) at time t.
- The threshold P1 of the identification frequency Pm(t) for the end determination is set to 2/5, and the threshold H1 of the number of identifications is set to 6.
- the end detection threshold P1 may be the same as the start detection threshold P0.
- The identification frequency Pm(t) in the time section Len at point C in the figure becomes 2/5, falling to or below the threshold P1 for the first time.
- Thereafter, Pm(t) is kept at or below the threshold P1 continuously, and the end of the music is detected at point D in the figure, where the state of being at or below P1 has been held for H1 consecutive times (seconds).
- Note that the actual end position of the music is slightly before point C, where Pm(t) first became equal to or less than the threshold P1; a position offset back from point C accordingly is detected as the music end position.
- In step S1, initial processing is performed. Specifically, the current time t is set to 0; the in-section flag, which indicates being inside a continuous section of a certain type, is set to FALSE (i.e., not in a continuous section); and the count value, which counts the number of consecutive times the identification frequency P(t) has been at or above (or at or below) the threshold, is set to 0.
- step S2 the type at time t is identified. If the information has already been identified, the identification information at time t is read.
- step S3 it is determined whether or not the end of the data has been reached from the result of the identification or reading, and if the end of the data has been reached (Yes), the processing is terminated. On the other hand, if it is not the data end (No), the process proceeds to step S4.
- step S4 the identification frequency P (t) at time t of the type (for example, music) for which a continuous section is to be detected is calculated.
- In step S5, it is determined whether the in-section flag is TRUE, that is, whether processing is currently inside a continuous section. If TRUE (Yes), the process proceeds to step S13; if FALSE (No), the process proceeds to step S6.
- In step S6, it is determined whether the identification frequency P(t) is equal to or greater than the start detection threshold P0.
- If P(t) is less than the threshold P0 (No),
- the count value is reset to 0 in step S20, the time t is incremented by 1 in step S21, and the process returns to step S2.
- If P(t) is equal to or greater than the threshold P0 (Yes), the process proceeds to step S7.
- In step S7, it is determined whether the count value is 0. If it is 0 (Yes), X is stored as a start candidate time in step S8, and the process proceeds to step S9, where the count value is incremented by 1.
- Here, X is, for example, the position described with reference to FIG. 5.
- If the count value is not 0 (No), the process proceeds directly to step S9, and the count value is incremented by 1.
- In step S10, it is determined whether the count value has reached the threshold H0. If it has not (No), the process proceeds to step S21, where the time t is incremented by 1, and returns to step S2. If the threshold H0 has been reached (Yes), the process proceeds to step S11.
- In step S11, the stored start candidate time X is determined as the start time.
- In step S12, the count value is reset to 0 and the in-section flag is set to TRUE; then, in step S21, the time t is incremented by 1 and the process returns to step S2.
- In step S13, it is determined whether the identification frequency P(t) is equal to or less than the end detection threshold P1.
- If P(t) is greater than the threshold P1 (No),
- the count value is reset to 0 in step S20, the time t is incremented by 1 in step S21, and the process returns to step S2.
- If P(t) is equal to or less than the threshold P1 (Yes), the process proceeds to step S14.
- In step S14, it is determined whether the count value is 0.
- If it is 0 (Yes), Y is stored as an end candidate time in step S15, and the process proceeds to step S16, where the count value is incremented by 1. Here, Y is, for example, the position described with reference to FIG. 6. If the count value is not 0 (No), the process proceeds directly to step S16, and the count value is incremented by 1.
- In step S17, it is determined whether the count value has reached the threshold H1. If it has not (No), the process proceeds to step S21, where the time t is incremented by 1, and returns to step S2. If the threshold H1 has been reached (Yes), the process proceeds to step S18.
- step S18 the stored end candidate time Y is determined as the end time.
- In step S19, the count value is reset to 0 and the in-section flag is set to FALSE.
- In step S21, the time t is incremented by 1, and the process returns to step S2.
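The flowchart of steps S1 to S21 can be sketched as a small state machine. This is a minimal illustration, not the patent's implementation: the names are invented, and the identification frequency is reduced to the count-based variant over a sliding window of length Len.

```python
# Sketch of the continuous-section detector of steps S1-S21, assuming a
# count-based identification frequency over a sliding window of `length`
# time units. All names and parameter values are illustrative.

def detect_sections(labels, target, length, p0, h0, p1, h1):
    """Return (start, end) index pairs of continuous sections of `target`."""
    sections = []
    in_section = False                      # in-section flag (step S1)
    count = 0                               # consecutive-threshold counter
    candidate = None                        # start/end candidate time X or Y
    for t in range(len(labels)):            # steps S2-S3: per-time-unit loop
        window = labels[max(0, t - length + 1): t + 1]
        p = window.count(target) / length   # step S4: identification frequency
        if not in_section:
            if p >= p0:                     # step S6: start threshold test
                if count == 0:
                    candidate = t           # step S8: remember start candidate
                count += 1                  # step S9
                if count >= h0:             # steps S10-S12: confirm the start
                    in_section, count = True, 0
                    start = candidate
            else:
                count = 0                   # step S20
        else:
            if p <= p1:                     # step S13: end threshold test
                if count == 0:
                    candidate = t           # step S15: remember end candidate
                count += 1                  # step S16
                if count >= h1:             # steps S17-S19: confirm the end
                    in_section, count = False, 0
                    sections.append((start, candidate))
            else:
                count = 0                   # step S20
    if in_section:                          # data ended inside a section
        sections.append((start, len(labels) - 1))
    return sections
```

With, say, 5 speech units followed by 20 music units and 20 speech units, and P0 = 3/5, H0 = 3, P1 = 2/5, H1 = 3, the sketch reports a single music section whose reported boundaries lag the true transitions by the confirmation delay, mirroring the discussion of points A through D above.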
- As described above, in this embodiment the audio signal in the information source is identified by type (category) for each predetermined time unit, and the identification frequency of each type is evaluated.
- When the identification frequency of a certain type first becomes equal to or greater than a predetermined threshold and remains so for a predetermined time, the start of a continuous section of that type is detected; when it first falls to or below a predetermined threshold and remains so for a predetermined time, the end of the continuous section is detected. This makes it possible to detect the start position and end position of a continuous section accurately even if temporary sounds of other types are mixed in or some identification errors occur.
- In the embodiment above, the configuration has been described in terms of hardware, but the invention is not limited to this: the processing can also be realized by having a CPU (Central Processing Unit) execute a computer program.
- In that case, the computer program can be provided recorded on a storage medium or recording medium, or provided by transmission via the Internet or other transmission media.
- INDUSTRIAL APPLICABILITY According to the present invention described above, the audio signal included in an information source is classified into types (categories) such as music and speech for each predetermined time unit, and the identification frequency of each type is evaluated to detect continuous sections of the same type. Even if temporary noise occurs within a continuous section, or some identification errors occur, the start position and end position of the continuous section can be detected accurately.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04709697A EP1600943B1 (en) | 2003-03-06 | 2004-02-10 | Information detection device, method, and program |
US10/513,549 US8195451B2 (en) | 2003-03-06 | 2004-02-10 | Apparatus and method for detecting speech and music portions of an audio signal |
DE602004023180T DE602004023180D1 (de) | 2003-03-06 | 2004-02-10 | Informationsdetektionseinrichtung, -verfahren und -programm |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003-060382 | 2003-03-06 | ||
JP2003060382A JP4348970B2 (ja) | 2003-03-06 | 2003-03-06 | 情報検出装置及び方法、並びにプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004079718A1 true WO2004079718A1 (ja) | 2004-09-16 |
Family
ID=32958879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/001397 WO2004079718A1 (ja) | 2003-03-06 | 2004-02-10 | 情報検出装置及び方法、並びにプログラム |
Country Status (7)
Country | Link |
---|---|
US (1) | US8195451B2 (ja) |
EP (1) | EP1600943B1 (ja) |
JP (1) | JP4348970B2 (ja) |
KR (1) | KR101022342B1 (ja) |
CN (1) | CN100530354C (ja) |
DE (1) | DE602004023180D1 (ja) |
WO (1) | WO2004079718A1 (ja) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007023660A1 (ja) | 2005-08-24 | 2007-03-01 | Matsushita Electric Industrial Co., Ltd. | 音識別装置 |
ES2354702T3 (es) * | 2005-09-07 | 2011-03-17 | Biloop Tecnologic, S.L. | Método para el reconocimiento de una señal de sonido implementado mediante microcontrolador. |
US8417518B2 (en) | 2007-02-27 | 2013-04-09 | Nec Corporation | Voice recognition system, method, and program |
JP4572218B2 (ja) * | 2007-06-27 | 2010-11-04 | 日本電信電話株式会社 | 音楽区間検出方法、音楽区間検出装置、音楽区間検出プログラム及び記録媒体 |
JP2009192725A (ja) * | 2008-02-13 | 2009-08-27 | Sanyo Electric Co Ltd | 楽曲記録装置 |
MY153562A (en) * | 2008-07-11 | 2015-02-27 | Fraunhofer Ges Forschung | Method and discriminator for classifying different segments of a signal |
US9037474B2 (en) * | 2008-09-06 | 2015-05-19 | Huawei Technologies Co., Ltd. | Method for classifying audio signal into fast signal or slow signal |
US8340964B2 (en) * | 2009-07-02 | 2012-12-25 | Alon Konchitsky | Speech and music discriminator for multi-media application |
US8606569B2 (en) * | 2009-07-02 | 2013-12-10 | Alon Konchitsky | Automatic determination of multimedia and voice signals |
US8712771B2 (en) * | 2009-07-02 | 2014-04-29 | Alon Konchitsky | Automated difference recognition between speaking sounds and music |
DE112009005215T8 (de) * | 2009-08-04 | 2013-01-03 | Nokia Corp. | Verfahren und Vorrichtung zur Audiosignalklassifizierung |
US20110040981A1 (en) * | 2009-08-14 | 2011-02-17 | Apple Inc. | Synchronization of Buffered Audio Data With Live Broadcast |
CN102044246B (zh) * | 2009-10-15 | 2012-05-23 | 华为技术有限公司 | 一种音频信号检测方法和装置 |
CN102044244B (zh) | 2009-10-15 | 2011-11-16 | 华为技术有限公司 | 信号分类方法和装置 |
JP4837123B1 (ja) * | 2010-07-28 | 2011-12-14 | 株式会社東芝 | 音質制御装置及び音質制御方法 |
WO2012020717A1 (ja) * | 2010-08-10 | 2012-02-16 | 日本電気株式会社 | 音声区間判定装置、音声区間判定方法および音声区間判定プログラム |
US9160837B2 (en) | 2011-06-29 | 2015-10-13 | Gracenote, Inc. | Interactive streaming content apparatus, systems and methods |
US20130090926A1 (en) * | 2011-09-16 | 2013-04-11 | Qualcomm Incorporated | Mobile device context information using speech detection |
CN103092854B (zh) * | 2011-10-31 | 2017-02-08 | 深圳光启高等理工研究院 | 一种音乐数据分类方法 |
US20130317821A1 (en) * | 2012-05-24 | 2013-11-28 | Qualcomm Incorporated | Sparse signal detection with mismatched models |
JP6171708B2 (ja) * | 2013-08-08 | 2017-08-02 | 富士通株式会社 | 仮想マシン管理方法、仮想マシン管理プログラム及び仮想マシン管理装置 |
US9817379B2 (en) * | 2014-07-03 | 2017-11-14 | David Krinkel | Musical energy use display |
KR102435933B1 (ko) * | 2020-10-16 | 2022-08-24 | 주식회사 엘지유플러스 | 영상 컨텐츠에서의 음악 구간 검출 방법 및 장치 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4541110A (en) | 1981-01-24 | 1985-09-10 | Blaupunkt-Werke Gmbh | Circuit for automatic selection between speech and music sound signals |
US5298674A (en) * | 1991-04-12 | 1994-03-29 | Samsung Electronics Co., Ltd. | Apparatus for discriminating an audio signal as an ordinary vocal sound or musical sound |
EP0637011A1 (en) | 1993-07-26 | 1995-02-01 | Koninklijke Philips Electronics N.V. | Speech signal discrimination arrangement and audio device including such an arrangement |
WO1998027543A2 (en) | 1996-12-18 | 1998-06-25 | Interval Research Corporation | Multi-feature speech/music discrimination system |
JPH10187182A (ja) * | 1996-12-20 | 1998-07-14 | Nippon Telegr & Teleph Corp <Ntt> | 映像分類方法および装置 |
JP2910417B2 (ja) * | 1992-06-17 | 1999-06-23 | 松下電器産業株式会社 | 音声音楽判別装置 |
US5966690A (en) * | 1995-06-09 | 1999-10-12 | Sony Corporation | Speech recognition and synthesis systems which distinguish speech phonemes from noise |
US6185527B1 (en) * | 1999-01-19 | 2001-02-06 | International Business Machines Corporation | System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2551050B2 (ja) * | 1987-11-13 | 1996-11-06 | ソニー株式会社 | 有音無音判定回路 |
EP0517233B1 (en) * | 1991-06-06 | 1996-10-30 | Matsushita Electric Industrial Co., Ltd. | Music/voice discriminating apparatus |
JPH06332492A (ja) * | 1993-05-19 | 1994-12-02 | Matsushita Electric Ind Co Ltd | 音声検出方法および検出装置 |
DE4422545A1 (de) * | 1994-06-28 | 1996-01-04 | Sel Alcatel Ag | Start-/Endpunkt-Detektion zur Worterkennung |
US5712953A (en) * | 1995-06-28 | 1998-01-27 | Electronic Data Systems Corporation | System and method for classification of audio or audio/video signals based on musical content |
US6711536B2 (en) * | 1998-10-20 | 2004-03-23 | Canon Kabushiki Kaisha | Speech processing apparatus and method |
US6490556B2 (en) * | 1999-05-28 | 2002-12-03 | Intel Corporation | Audio classifier for half duplex communication |
US6349278B1 (en) * | 1999-08-04 | 2002-02-19 | Ericsson Inc. | Soft decision signal estimation |
JP4438144B2 (ja) * | 1999-11-11 | 2010-03-24 | ソニー株式会社 | 信号分類方法及び装置、記述子生成方法及び装置、信号検索方法及び装置 |
US6901362B1 (en) * | 2000-04-19 | 2005-05-31 | Microsoft Corporation | Audio segmentation and classification |
US6640208B1 (en) * | 2000-09-12 | 2003-10-28 | Motorola, Inc. | Voiced/unvoiced speech classifier |
US6694293B2 (en) * | 2001-02-13 | 2004-02-17 | Mindspeed Technologies, Inc. | Speech coding system with a music classifier |
US6785645B2 (en) * | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
JP3826032B2 (ja) * | 2001-12-28 | 2006-09-27 | 株式会社東芝 | 音声認識装置、音声認識方法及び音声認識プログラム |
FR2842014B1 (fr) * | 2002-07-08 | 2006-05-05 | Lyon Ecole Centrale | Procede et appareil pour affecter une classe sonore a un signal sonore |
-
2003
- 2003-03-06 JP JP2003060382A patent/JP4348970B2/ja not_active Expired - Fee Related
-
2004
- 2004-02-10 KR KR1020047017765A patent/KR101022342B1/ko not_active IP Right Cessation
- 2004-02-10 DE DE602004023180T patent/DE602004023180D1/de not_active Expired - Lifetime
- 2004-02-10 US US10/513,549 patent/US8195451B2/en not_active Expired - Fee Related
- 2004-02-10 CN CNB200480000194XA patent/CN100530354C/zh not_active Expired - Fee Related
- 2004-02-10 EP EP04709697A patent/EP1600943B1/en not_active Expired - Lifetime
- 2004-02-10 WO PCT/JP2004/001397 patent/WO2004079718A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
DONGGE LI ET AL: "Classification of general audio data for content-based retrieval", PATTERN RECOGNITION LETTERS, vol. 22, no. 5, April 2001 (2001-04-01), pages 533 - 544, XP004233004 * |
See also references of EP1600943A4 |
Also Published As
Publication number | Publication date |
---|---|
US20050177362A1 (en) | 2005-08-11 |
EP1600943B1 (en) | 2009-09-16 |
EP1600943A1 (en) | 2005-11-30 |
KR101022342B1 (ko) | 2011-03-22 |
US8195451B2 (en) | 2012-06-05 |
CN1698095A (zh) | 2005-11-16 |
DE602004023180D1 (de) | 2009-10-29 |
JP4348970B2 (ja) | 2009-10-21 |
KR20050109403A (ko) | 2005-11-21 |
JP2004271736A (ja) | 2004-09-30 |
EP1600943A4 (en) | 2006-12-06 |
CN100530354C (zh) | 2009-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2004079718A1 (ja) | | Information detection apparatus and method, and program |
JP4442081B2 (ja) | | Audio abstract selection method |
US7263485B2 (en) | | Robust detection and classification of objects in audio using limited training data |
US8838452B2 (en) | | Effective audio segmentation and classification |
JP3913772B2 (ja) | | Sound identification apparatus |
US20050187765A1 (en) | | Method and apparatus for detecting anchorperson shot |
US20050027766A1 (en) | | Content identification system |
US20040143434A1 (en) | | Audio-assisted segmentation and browsing of news videos |
WO2006132596A1 (en) | | Method and apparatus for audio clip classification |
JP2000066691A (ja) | | Audio information classification apparatus |
Wu et al. | | Multiple change-point audio segmentation and classification using an MDL-based Gaussian model |
Vavrek et al. | | Broadcast news audio classification using SVM binary trees |
JP4099576B2 (ja) | | Information identification apparatus and method, program, and recording medium |
Jarina et al. | | Rhythm detection for speech-music discrimination in MPEG compressed domain |
JP4201204B2 (ja) | | Audio information classification apparatus |
JP3475317B2 (ja) | | Video classification method and apparatus |
JP3607450B2 (ja) | | Audio information classification apparatus |
JP4392805B2 (ja) | | Audio information classification apparatus |
Huijbregts et al. | | Filtering the unknown: Speech activity detection in heterogeneous video collections |
Dogan et al. | | Content-based classification and segmentation of mixed-type audio by using MPEG-7 features |
WO2006009035A1 (ja) | | Signal detection method, signal detection system, signal detection processing program, and recording medium recording the program |
CN113178199A (zh) | | Digital audio tampering forensics method based on phase offset detection |
Pikrakis et al. | | An overview of speech/music discrimination techniques in the context of audio recordings |
JP2011085824A (ja) | | Acoustic identification apparatus, processing method thereof, and program |
AU2005252714B2 | | Effective audio segmentation and classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| WWE | Wipo information: entry into national phase | Ref document number: 10513549; Country of ref document: US; Ref document number: 1020047017765; Country of ref document: KR |
| WWE | Wipo information: entry into national phase | Ref document number: 2004709697; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 2004800194X; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| WWP | Wipo information: published in national office | Ref document number: 1020047017765; Country of ref document: KR |
| WWP | Wipo information: published in national office | Ref document number: 2004709697; Country of ref document: EP |