IN2014MN01588A - Google Patents
Info
- Publication number
- IN2014MN01588A
- Authority
- IN
- India
- Prior art keywords
- classification
- speech
- frame
- music
- classified
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/173—Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Improved audio classification is provided for encoding applications. An initial classification is followed by a finer classification to produce speech and music classifications with higher accuracy and less complexity than previously available. Audio is classified as speech or music on a frame-by-frame basis. If a frame is classified as music by the initial classification, that frame undergoes a second, finer classification to confirm that it is music and not speech (e.g., speech that is tonal and/or structured and that may not have been classified as speech by the initial classification). Depending on the implementation, one or more parameters may be used in the finer classification. Example parameters include voicing, modified correlation, signal activity, and long-term pitch gain.
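The two-stage flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the claimed method: the `FrameFeatures` container, the function names, the threshold values, and the rule that all four parameters must exceed their thresholds are assumptions made for the example. The abstract states only that one or more of these parameters may be used in the finer classification, and it does not describe how the initial classifier works, so that stage is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

SPEECH = "speech"
MUSIC = "music"


@dataclass(frozen=True)
class FrameFeatures:
    """Per-frame parameters named in the abstract (the scales are assumed)."""
    voicing: float
    modified_correlation: float
    signal_activity: float
    long_term_pitch_gain: float


# Hypothetical thresholds, chosen only to make the example concrete.
VOICING_MIN = 0.8
CORRELATION_MIN = 0.7
ACTIVITY_MIN = 0.5
PITCH_GAIN_MIN = 0.6


def fine_classify(f: FrameFeatures) -> str:
    """Second, finer pass: decide whether a 'music' frame is really speech.

    Assumed rule: the frame is treated as (tonal or structured) speech only
    if every parameter reaches its threshold; otherwise music is confirmed.
    """
    looks_like_speech = (
        f.voicing >= VOICING_MIN
        and f.modified_correlation >= CORRELATION_MIN
        and f.signal_activity >= ACTIVITY_MIN
        and f.long_term_pitch_gain >= PITCH_GAIN_MIN
    )
    return SPEECH if looks_like_speech else MUSIC


def classify_frame(
    f: FrameFeatures, initial_classify: Callable[[FrameFeatures], str]
) -> str:
    """Run the initial pass, then the finer pass only for 'music' frames."""
    label = initial_classify(f)
    if label == MUSIC:
        return fine_classify(f)
    return label


def classify_stream(
    frames: Iterable[FrameFeatures],
    initial_classify: Callable[[FrameFeatures], str],
) -> List[str]:
    """Classify audio frame by frame."""
    return [classify_frame(f, initial_classify) for f in frames]
```

Only frames that the initial pass labels as music pay for the second pass, which fits the abstract's emphasis on lower complexity: frames already labeled speech are returned unchanged.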
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261586374P | 2012-01-13 | 2012-01-13 | |
US13/722,669 US9111531B2 (en) | 2012-01-13 | 2012-12-20 | Multiple coding mode signal classification |
PCT/US2012/071217 WO2013106192A1 (en) | 2012-01-13 | 2012-12-21 | Multiple coding mode signal classification |
Publications (1)
Publication Number | Publication Date |
---|---|
IN2014MN01588A (en) | 2015-05-08 |
Family
ID=48780608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
IN1588MUN2014 IN2014MN01588A (en) | | 2012-01-13 | 2012-12-21 |
Country Status (12)
Country | Link |
---|---|
US (1) | US9111531B2 (en) |
EP (1) | EP2803068B1 (en) |
JP (1) | JP5964455B2 (en) |
KR (2) | KR20140116487A (en) |
CN (1) | CN104040626B (en) |
BR (1) | BR112014017001B1 (en) |
DK (1) | DK2803068T3 (en) |
ES (1) | ES2576232T3 (en) |
HU (1) | HUE027037T2 (en) |
IN (1) | IN2014MN01588A (en) |
SI (1) | SI2803068T1 (en) |
WO (1) | WO2013106192A1 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9589570B2 (en) * | 2012-09-18 | 2017-03-07 | Huawei Technologies Co., Ltd. | Audio classification based on perceptual quality for low or medium bit rates |
RU2656681C1 (en) * | 2012-11-13 | 2018-06-06 | Самсунг Электроникс Ко., Лтд. | Method and device for determining the coding mode, the method and device for coding of audio signals and the method and device for decoding of audio signals |
CN106409313B (en) * | 2013-08-06 | 2021-04-20 | 华为技术有限公司 | Audio signal classification method and device |
CN104424956B9 (en) | 2013-08-30 | 2022-11-25 | 中兴通讯股份有限公司 | Activation tone detection method and device |
KR102457290B1 (en) * | 2014-02-24 | 2022-10-20 | 삼성전자주식회사 | Signal classifying method and device, and audio encoding method and device using same |
ES2874757T3 (en) * | 2014-05-08 | 2021-11-05 | Ericsson Telefon Ab L M | Audio signal classifier |
CN107424621B (en) * | 2014-06-24 | 2021-10-26 | 华为技术有限公司 | Audio encoding method and apparatus |
CN104143335B (en) | 2014-07-28 | 2017-02-01 | 华为技术有限公司 | audio coding method and related device |
US9886963B2 (en) * | 2015-04-05 | 2018-02-06 | Qualcomm Incorporated | Encoder selection |
CN104867492B (en) * | 2015-05-07 | 2019-09-03 | 科大讯飞股份有限公司 | Intelligent interactive system and method |
KR102398124B1 (en) * | 2015-08-11 | 2022-05-17 | 삼성전자주식회사 | Adaptive processing of audio data |
US10186276B2 (en) * | 2015-09-25 | 2019-01-22 | Qualcomm Incorporated | Adaptive noise suppression for super wideband music |
WO2017117234A1 (en) * | 2016-01-03 | 2017-07-06 | Gracenote, Inc. | Responding to remote media classification queries using classifier models and context parameters |
US10902043B2 (en) * | 2016-01-03 | 2021-01-26 | Gracenote, Inc. | Responding to remote media classification queries using classifier models and context parameters |
JP6996185B2 (en) * | 2017-09-15 | 2022-01-17 | 富士通株式会社 | Utterance section detection device, utterance section detection method, and computer program for utterance section detection |
CN116149499B (en) * | 2023-04-18 | 2023-08-11 | 深圳雷柏科技股份有限公司 | Multi-mode switching control circuit and switching control method for mouse |
Family Cites Families (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2568984C (en) * | 1991-06-11 | 2007-07-10 | Qualcomm Incorporated | Variable rate vocoder |
US5778335A (en) | 1996-02-26 | 1998-07-07 | The Regents Of The University Of California | Method and apparatus for efficient multiband celp wideband speech and music coding and decoding |
US6493665B1 (en) * | 1998-08-24 | 2002-12-10 | Conexant Systems, Inc. | Speech classification and parameter weighting used in codebook search |
US7072832B1 (en) * | 1998-08-24 | 2006-07-04 | Mindspeed Technologies, Inc. | System for speech encoding having an adaptive encoding arrangement |
US7272556B1 (en) * | 1998-09-23 | 2007-09-18 | Lucent Technologies Inc. | Scalable and embedded codec for speech and audio signals |
US6691084B2 (en) * | 1998-12-21 | 2004-02-10 | Qualcomm Incorporated | Multiple mode variable rate speech coding |
JP2000267699A (en) * | 1999-03-19 | 2000-09-29 | Nippon Telegr & Teleph Corp <Ntt> | Acoustic signal coding method and device therefor, program recording medium therefor, and acoustic signal decoding device |
CN1242379C (en) * | 1999-08-23 | 2006-02-15 | 松下电器产业株式会社 | Voice encoder and voice encoding method |
US6604070B1 (en) * | 1999-09-22 | 2003-08-05 | Conexant Systems, Inc. | System of encoding and decoding speech signals |
US6625226B1 (en) * | 1999-12-03 | 2003-09-23 | Allen Gersho | Variable bit rate coder, and associated method, for a communication station operable in a communication system |
US6697776B1 (en) * | 2000-07-31 | 2004-02-24 | Mindspeed Technologies, Inc. | Dynamic signal detector system and method |
US6694293B2 (en) | 2001-02-13 | 2004-02-17 | Mindspeed Technologies, Inc. | Speech coding system with a music classifier |
US6785645B2 (en) | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
US6829579B2 (en) * | 2002-01-08 | 2004-12-07 | Dilithium Networks, Inc. | Transcoding method and system between CELP-based speech codes |
US7657427B2 (en) * | 2002-10-11 | 2010-02-02 | Nokia Corporation | Methods and devices for source controlled variable bit-rate wideband speech coding |
US7363218B2 (en) * | 2002-10-25 | 2008-04-22 | Dilithium Networks Pty. Ltd. | Method and apparatus for fast CELP parameter mapping |
FI118834B (en) * | 2004-02-23 | 2008-03-31 | Nokia Corp | Classification of audio signals |
JP2007538282A (en) * | 2004-05-17 | 2007-12-27 | ノキア コーポレイション | Audio encoding with various encoding frame lengths |
US8010350B2 (en) | 2006-08-03 | 2011-08-30 | Broadcom Corporation | Decimated bisectional pitch refinement |
CN1920947B (en) * | 2006-09-15 | 2011-05-11 | 清华大学 | Voice/music detector for audio frequency coding with low bit ratio |
CN101197130B (en) * | 2006-12-07 | 2011-05-18 | 华为技术有限公司 | Sound activity detecting method and detector thereof |
KR100964402B1 (en) * | 2006-12-14 | 2010-06-17 | 삼성전자주식회사 | Method and Apparatus for determining encoding mode of audio signal, and method and appartus for encoding/decoding audio signal using it |
KR100883656B1 (en) | 2006-12-28 | 2009-02-18 | 삼성전자주식회사 | Method and apparatus for discriminating audio signal, and method and apparatus for encoding/decoding audio signal using it |
CN101226744B (en) * | 2007-01-19 | 2011-04-13 | 华为技术有限公司 | Method and device for implementing voice decode in voice decoder |
KR100925256B1 (en) * | 2007-05-03 | 2009-11-05 | 인하대학교 산학협력단 | A method for discriminating speech and music on real-time |
CN101393741A (en) * | 2007-09-19 | 2009-03-25 | 中兴通讯股份有限公司 | Audio signal classification apparatus and method used in wideband audio encoder and decoder |
CN101399039B (en) * | 2007-09-30 | 2011-05-11 | 华为技术有限公司 | Method and device for determining non-noise audio signal classification |
CN101221766B (en) * | 2008-01-23 | 2011-01-05 | 清华大学 | Method for switching audio encoder |
CN101965612B (en) * | 2008-03-03 | 2012-08-29 | Lg电子株式会社 | Method and apparatus for processing a signal |
CN101236742B (en) * | 2008-03-03 | 2011-08-10 | 中兴通讯股份有限公司 | Music/ non-music real-time detection method and device |
US8768690B2 (en) * | 2008-06-20 | 2014-07-01 | Qualcomm Incorporated | Coding scheme selection for low-bit-rate applications |
EP2144230A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme having cascaded switches |
PL2301011T3 (en) | 2008-07-11 | 2019-03-29 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and discriminator for classifying different segments of an audio signal comprising speech and music segments |
KR101261677B1 (en) * | 2008-07-14 | 2013-05-06 | 광운대학교 산학협력단 | Apparatus for encoding and decoding of integrated voice and music |
CN101751920A (en) * | 2008-12-19 | 2010-06-23 | 数维科技(北京)有限公司 | Audio classification and implementation method based on reclassification |
CN101814289A (en) * | 2009-02-23 | 2010-08-25 | 数维科技(北京)有限公司 | Digital audio multi-channel coding method and system of DRA (Digital Recorder Analyzer) with low bit rate |
JP5519230B2 (en) * | 2009-09-30 | 2014-06-11 | パナソニック株式会社 | Audio encoder and sound signal processing system |
CN102237085B (en) * | 2010-04-26 | 2013-08-14 | 华为技术有限公司 | Method and device for classifying audio signals |
CA2821577C (en) | 2011-02-15 | 2020-03-24 | Voiceage Corporation | Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec |
2012
Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2012-12-20 | US | US13/722,669 | US9111531B2 (en) | Active |
2012-12-21 | HU | HUE12810018A | HUE027037T2 (en) | Unknown |
2012-12-21 | IN | IN1588MUN2014 | IN2014MN01588A (en) | Unknown |
2012-12-21 | SI | SI201230593A | SI2803068T1 (en) | Unknown |
2012-12-21 | KR | KR1020147022400A | KR20140116487A (en) | Active, Application Filing |
2012-12-21 | KR | KR1020177000172A | KR20170005514A (en) | Not active, Application Discontinuation |
2012-12-21 | BR | BR112014017001-0A | BR112014017001B1 (en) | Active, IP Right Grant |
2012-12-21 | EP | EP12810018.7A | EP2803068B1 (en) | Active |
2012-12-21 | JP | JP2014552206A | JP5964455B2 (en) | Active |
2012-12-21 | ES | ES12810018.7T | ES2576232T3 (en) | Active |
2012-12-21 | WO | PCT/US2012/071217 | WO2013106192A1 (en) | Active, Application Filing |
2012-12-21 | CN | CN201280066779.6A | CN104040626B (en) | Active |
2012-12-21 | DK | DK12810018.7T | DK2803068T3 (en) | Active |
Also Published As
Publication number | Publication date |
---|---|
WO2013106192A1 (en) | 2013-07-18 |
EP2803068A1 (en) | 2014-11-19 |
BR112014017001A8 (en) | 2017-07-04 |
CN104040626B (en) | 2017-08-11 |
BR112014017001A2 (en) | 2017-06-13 |
US20130185063A1 (en) | 2013-07-18 |
DK2803068T3 (en) | 2016-05-23 |
JP2015507222A (en) | 2015-03-05 |
BR112014017001B1 (en) | 2020-12-22 |
HUE027037T2 (en) | 2016-08-29 |
KR20140116487A (en) | 2014-10-02 |
ES2576232T3 (en) | 2016-07-06 |
KR20170005514A (en) | 2017-01-13 |
EP2803068B1 (en) | 2016-04-13 |
JP5964455B2 (en) | 2016-08-03 |
US9111531B2 (en) | 2015-08-18 |
CN104040626A (en) | 2014-09-10 |
SI2803068T1 (en) | 2016-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
IN2014MN01588A (en) | | |
PH12015501587B1 (en) | | Signaling audio rendering information in a bitstream |
MX2023001960A (en) | | Multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a residual-signal-based adjustment of a contribution of a decorrelated signal. |
MX351359B (en) | | Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding. |
WO2014168934A3 (en) | | Systems and methods for generating a digital output signal in a digital microphone system |
PH12015501575A1 (en) | | Device and method for reducing quantization noise in a time-domain decoder |
ATE527834T1 (en) | | ECONOMICAL LOUDNESS MEASUREMENT OF CODED AUDIO |
MX2009007412A (en) | | Audio decoder. |
HK1158804A1 (en) | | Method and discriminator for classifying different segments of a signal |
EP2846229A3 (en) | | Systems and methods for generating haptic effects associated with audio signals |
WO2010003109A3 (en) | | Speech recognition with parallel recognition tasks |
MX2016000908A (en) | | Apparatus and method for low delay object metadata coding. |
IN2015MN01766A (en) | | |
MY187728A (en) | | Method and system for encoding audio data with adaptive low frequency compensation |
PH12015501516A1 (en) | | System and methods of performing filtering for gain determination |
MY155997A (en) | | Apparatus and method for processing an audio signal and for providing a higher temporal granularity for a combined unified speech and audio codec (usac) |
MY165327A (en) | | Apparatus,method and computer program for providing one or more adjusted parameters for provision of an upmix signal representation on the basis of a downmix signal representation and a parametric side information associated with the downmix signal representation,using an average value |
MX2016002561A (en) | | Unvoiced/voiced decision for speech processing. |
MX2016004923A (en) | | Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information. |
MX355258B (en) | | Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information. |
TH161848B (en) | | Encoders, decoders, and methods for zoom-based codecs. To code the signal destination Spatial sound |
MX2018003531A (en) | | Encoder, decoder and methods for signal-adaptive switching of the overlap ratio in audio transform coding. |
Tomaru et al. | | B2-3. Production variation of English schwa and Japanese listeners' perceptual assimilation pattern of English schwa (Summaries of Talks at the 26th General Meeting) |
MY197703A (en) | | Audio signal classification method and apparatus |
TH147495A (en) | | Decoders and methods for the parameterization concept of spatial audio object coding, which makes them commonly used in the case of multi-channel mixing. |