CN104079247B - Equalizer controller and control method and audio reproducing system - Google Patents
Equalizer controller and control method and audio reproducing system
- Publication number
- CN104079247B (application CN201310100401.XA)
- Authority
- CN
- China
- Prior art keywords
- audio
- term
- short
- balanced
- music
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G5/00—Tone control or bandwidth control in amplifiers
- H03G5/16—Automatic control
- H03G5/165—Equalizers; Volume or gain control in limited frequency bands
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
- G10H1/06—Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
- G10H1/12—Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour by filtering complex waveforms
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/46—Volume control
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G5/00—Tone control or bandwidth control in amplifiers
- H03G5/005—Tone control or bandwidth control in amplifiers of digital signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/307—Frequency adjustment, e.g. tone control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/036—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/046—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/295—Spatial effects, musical uses of multiple audio channels, e.g. stereo
- G10H2210/301—Soundscape or sound field simulation, reproduction or control for musical purposes, e.g. surround or 3D sound; Granular synthesis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/081—Genre classification, i.e. descriptive metadata for classification or selection of musical pieces according to style
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
Abstract
An equalizer controller, an equalizer control method, and an audio reproducing system are disclosed. In one embodiment, the equalizer controller includes: an audio classifier for identifying the audio type of an audio signal in real time; and an adjustment unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type. The audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjustment unit is configured to take into account at least some of the multiple audio types by weighting their confidence values based on the importance of the multiple audio types.
Description
Technical field
The present application relates generally to audio signal processing. Specifically, embodiments of the application relate to apparatus and methods for audio classification and audio processing, and in particular to the control of a dialog enhancer, a surround sound virtualizer, a volume leveler, and an equalizer.
Background technology
To improve the overall quality of audio, and correspondingly the user experience, various audio improving devices are used to modify audio signals in the time domain or in the spectral domain. Audio improving devices have been developed for various purposes; some typical examples include:
Dialog enhancer: In movies and radio or TV programs, dialog is the most important component for understanding the story. Dialog enhancement methods have been developed to improve the clarity and intelligibility of dialog, especially for the elderly with declining hearing.
Surround sound virtualizer: A surround sound virtualizer makes it possible to render a surround (multi-channel) sound signal over the built-in loudspeakers of a personal computer (PC) or over headphones. That is, with stereo devices such as loudspeakers and headphones, the surround sound virtualizer creates a virtual surround effect for the user and provides a cinematic experience.
Volume leveler: A volume leveler aims to tune the volume of the audio content being played back so that it remains almost consistent over the timeline, based on a target loudness value.
Equalizer: An equalizer provides consistency of spectral balance, known as "tone" or "timbre", and enables users to configure the overall pattern (curve or shape) of the frequency response (gain) on each individual frequency band in order to emphasize some sounds or remove undesired sounds. In a traditional equalizer, different equalizer presets may be provided for different sounds, such as different music genres. Once a preset is selected, or an equalization pattern is set, the same equalization gains are applied to the signal until the pattern is modified manually. In contrast, a dynamic equalizer achieves spectral balance consistency by continuously monitoring the spectral balance of the audio, comparing it with a desired tone, and dynamically adjusting the equalization filters so as to transform the original tone of the audio into the desired tone.
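As a rough illustration of the dynamic equalizer described above, the per-band gains can be derived from the gap between the observed spectral balance and a desired tone, with temporal smoothing so the filter adapts continuously rather than jumping. This is a minimal sketch under assumed band counts, smoothing factor, and tone curve, not the implementation claimed here.

```python
def dynamic_eq_gains(band_energies_db, desired_tone_db, prev_gains_db, alpha=0.9):
    """Pull the observed spectral balance toward the desired tone.
    One-pole smoothing (alpha) keeps the gain trajectory continuous."""
    return [alpha * g + (1.0 - alpha) * (d - e)
            for g, d, e in zip(prev_gains_db, desired_tone_db, band_energies_db)]

# Toy example: five bands, a bass-heavy signal, and a flat desired tone.
current = [6.0, 3.0, 0.0, -2.0, -4.0]
desired = [0.0] * 5
gains = dynamic_eq_gains(current, desired, prev_gains_db=[0.0] * 5)
```

With alpha = 0.9 and zero previous gains, each band moves one tenth of the way toward the full correction, so repeated calls converge gradually toward the desired tone instead of switching abruptly.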
In general, an audio improving device has its own application scenario or context. That is, an audio improving device may be suitable only for a specific set of content and not for all possible audio signals, since different content may need to be processed in different ways. For example, a dialog enhancement method is usually applied to movie content; if it were applied to music containing no dialog, it might falsely boost some frequency sub-bands and introduce heavy timbre change and perceptual inconsistency. Similarly, if a noise suppression method were applied to a music signal, strong distortion would be audible.
However, for an audio processing system that usually comprises a set of audio improving devices, the input may inevitably be an audio signal of any possible type. For example, an audio processing system integrated into a PC will receive audio content from a variety of sources, including movies, music, VoIP, and games. Therefore, in order to apply the better algorithms, or the better parameters of each algorithm, to the corresponding content, it is important to identify or differentiate the content being processed.
To differentiate audio content and correspondingly apply the better parameters or the better audio improving algorithms, traditional systems usually pre-design a set of presets and require users to select a preset for the content to be played. A preset usually encodes a set of audio improving algorithms and/or their best parameters to be applied, such as a "movie" preset and a "music" preset specifically designed for movie or music playback.
However, manual selection is inconvenient for users. Users do not usually switch frequently among the predefined presets, but instead keep using one preset for all content. Moreover, in some automatic solutions, the parameters or the set of algorithms in a preset are usually discrete (for example, a specific algorithm is switched on or off for certain content) and cannot be tuned in a continuous manner based on the content.
Summary of the invention
A first aspect of the present application is to automatically configure audio improving devices in a continuous manner based on the audio content being played back. With such an "automatic" mode, users are freed from the trouble of selecting among different presets and can simply enjoy their content. On the other hand, continuous tuning is important for avoiding audible distortion at transition points.
According to an embodiment of the first aspect, an audio processing apparatus includes: an audio classifier for classifying an audio signal into at least one audio type in real time; an audio improving device for improving the experience of the audience; and an adjustment unit for adjusting at least one parameter of the audio improving device in a continuous manner based on the confidence value of the at least one audio type.
The audio improving device may be any of a dialog enhancer, a surround sound virtualizer, a volume leveler, and an equalizer.
Correspondingly, an audio processing method includes: classifying an audio signal into at least one audio type in real time; and adjusting at least one parameter for audio improvement in a continuous manner based on the confidence value of the at least one audio type.
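The two steps of the method above — classify in real time, then adjust a parameter continuously from the confidence values — can be sketched as a confidence-weighted interpolation between per-type target parameter values. The type names and target values below are illustrative assumptions only, not values prescribed by the application.

```python
def adjust_parameter(confidences, targets, default=0.5):
    """Interpolate an improving-device parameter from audio-type confidences.

    confidences: {audio_type: confidence in [0, 1]}
    targets:     {audio_type: preferred parameter value for that type}
    The result varies continuously with the confidences, so the device
    never jumps between discrete presets.
    """
    total = sum(confidences.values())
    if total == 0.0:
        return default
    return sum(confidences[t] * targets[t] for t in confidences) / total

# Example: a signal that looks mostly like movie, slightly like music.
conf = {"movie": 0.8, "music": 0.2}
tgt = {"movie": 1.0, "music": 0.0}     # e.g. a dialog-enhancement level
level = adjust_parameter(conf, tgt)
```

As the classifier's confidence drifts from "movie" toward "music", the parameter slides smoothly between the two targets rather than snapping at a decision boundary.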
According to another embodiment of the first aspect, a volume leveler controller includes: an audio content classifier for identifying the content type of an audio signal in real time; and an adjustment unit for adjusting a volume leveler in a continuous manner based on the identified content type. The adjustment unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and to negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
An audio processing apparatus comprising the above volume leveler controller is also disclosed.
Correspondingly, a volume leveler control method includes: identifying the content type of an audio signal in real time; and adjusting a volume leveler in a continuous manner based on the identified content type, by positively correlating the dynamic gain of the volume leveler with informative content types of the audio signal and negatively correlating the dynamic gain of the volume leveler with interfering content types of the audio signal.
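A minimal sketch of the correlation rule above: informative content types push the volume leveler's dynamic gain up, while interfering types pull it down. The specific type names and the particular combination formula are assumptions for illustration, not the claimed implementation.

```python
def leveler_dynamic_gain(confidences,
                         informative=("speech", "music"),
                         interfering=("noise", "background")):
    """Dynamic-gain scale in [0, 1]: positively correlated with informative
    content types, negatively correlated with interfering content types."""
    pos = sum(confidences.get(t, 0.0) for t in informative)
    neg = sum(confidences.get(t, 0.0) for t in interfering)
    return max(0.0, min(1.0, pos * (1.0 - neg)))

g_speech = leveler_dynamic_gain({"speech": 1.0})               # full leveling
g_noisy = leveler_dynamic_gain({"speech": 1.0, "noise": 1.0})  # leveling held back
```

With this shape, confident noise suppresses the leveler even when speech is also present, so background noise is not boosted to the target loudness.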
According to a further embodiment of the first aspect, an equalizer controller includes: an audio classifier for identifying the audio type of an audio signal in real time; and an adjustment unit for adjusting an equalizer in a continuous manner based on the identified audio type.
An audio processing apparatus comprising the above equalizer controller is also disclosed.
Correspondingly, an equalizer control method includes: identifying the audio type of an audio signal in real time; and adjusting an equalizer in a continuous manner based on the identified audio type.
The present application also provides a computer-readable medium having computer program instructions recorded thereon which, when executed by a processor, enable the processor to carry out the above audio processing method, volume leveler control method, or equalizer control method.
According to the embodiments of the first aspect, an audio improving device, which may be one of a dialog enhancer, a surround sound virtualizer, a volume leveler, and an equalizer, can be continuously adjusted according to the type of the audio signal and/or the confidence value of the type.
A second aspect of the present application is to develop content identification components for identifying multiple audio types, and to use the detection results to steer or guide the behavior of the various audio improving devices by finding better parameters in a continuous manner.
According to an embodiment of the second aspect, an audio classifier includes: a short-term feature extractor for extracting short-term features from short-term audio segments, each comprising a sequence of audio frames; a short-term classifier for classifying a sequence of short-term audio segments in a long-term audio segment into short-term audio types using the respective short-term features; a statistics extractor for calculating statistics of the results of the short-term classifier with respect to the sequence of short-term segments in the long-term audio segment, as long-term features; and a long-term classifier for classifying the long-term audio segment into long-term audio types using the long-term features.
An audio processing apparatus comprising the above audio classifier is also disclosed.
Correspondingly, an audio classification method includes: extracting short-term features from short-term audio segments, each comprising a sequence of audio frames; classifying a sequence of short-term segments in a long-term audio segment into short-term audio types using the respective short-term features; calculating statistics of the short-term classification results with respect to the sequence of short-term segments in the long-term audio segment, as long-term features; and classifying the long-term audio segment into long-term audio types using the long-term features.
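The pipeline above — short-term classification, then statistics of those results serving as long-term features — can be sketched as follows. Both classifiers here are trivial stand-ins (a threshold on one scalar feature and a majority-style rule), not the trained models the application would use.

```python
def classify_short(feature):
    """Stand-in short-term classifier: confidence that a short segment is
    speech, here simply the thresholded value of one scalar feature."""
    return {"speech": 1.0 if feature > 0.5 else 0.0}

def classify_long_term(short_features):
    """Statistics (here the mean) of the short-term results over the
    long-term segment serve as long-term features for a second classifier."""
    results = [classify_short(f) for f in short_features]
    mean_speech = sum(r["speech"] for r in results) / len(results)
    label = "speech-dominated" if mean_speech > 0.5 else "other"
    return label, mean_speech

# A long-term segment made of four short segments, three of them speech-like.
label, long_feat = classify_long_term([0.9, 0.8, 0.2, 0.7])
```

The point of the two-stage design is that the long-term classifier never sees raw audio; it operates only on statistics of the short-term decisions, which makes the long-term feature cheap to compute and robust to momentary fluctuations.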
According to another embodiment of the second aspect, an audio classifier includes: an audio content classifier for identifying the content type of a short-term segment of an audio signal; and an audio context classifier for identifying the context type of the short-term segment based at least partly on the content type identified by the audio content classifier.
An audio processing apparatus comprising the above audio classifier is also disclosed.
Correspondingly, an audio classification method includes: identifying the content type of a short-term segment of an audio signal; and identifying the context type of the short-term segment based at least partly on the identified content type.
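One way to read this second classifier structure: the content-type confidences of a short segment become (part of) the input features of the context classifier. The hand-set rule below is a hypothetical stand-in for a trained context classifier, and the type names are assumptions.

```python
def classify_context(content_confidences):
    """Identify the context type of a short-term segment based at least
    partly on the content-type confidences produced for that segment."""
    c = content_confidences
    if c.get("speech", 0.0) > 0.6 and c.get("noise", 0.0) > 0.2:
        return "VoIP"        # speech plus channel noise suggests a call
    if c.get("music", 0.0) > 0.6:
        return "long-term music"
    return "movie-like media"

ctx = classify_context({"speech": 0.8, "music": 0.05, "noise": 0.3})
```

A trained model would replace the if-chain, but the interface is the same: content-type confidences in, context type out.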
The present disclosure also provides a computer-readable medium having computer program instructions recorded thereon which, when executed by a processor, enable the processor to carry out the above audio classification method.
According to one embodiment, an equalizer controller is provided, including: an audio classifier for continuously identifying the audio type of an audio signal; and an adjustment unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjustment unit is configured to take into account at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
According to another embodiment, an equalizer controller is provided, including: an audio classifier for continuously identifying the audio type of an audio signal; and an adjustment unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjustment unit is configured to take into account at least some of the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
According to another embodiment, an audio reproducing system is provided, comprising the equalizer controller according to the above embodiments.
According to one embodiment, an equalizer control method is provided, including: identifying the audio type of an audio signal in real time; and adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to take into account at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
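The importance weighting in this method can be sketched as follows: each type's confidence is scaled by an importance weight, and the normalized weights mix per-type equalizer presets into one continuous curve. The importance values and presets below are illustrative assumptions.

```python
def weighted_eq_curve(confidences, importance, presets):
    """Mix per-type equalizer presets using importance-weighted confidences.

    confidences: {type: confidence value}
    importance:  {type: importance weight}
    presets:     {type: list of per-band gains in dB}
    """
    weights = {t: confidences[t] * importance[t] for t in confidences}
    total = sum(weights.values())
    n_bands = len(next(iter(presets.values())))
    if total == 0.0:
        return [0.0] * n_bands          # no confident type: flat curve
    return [sum(weights[t] * presets[t][b] for t in weights) / total
            for b in range(n_bands)]

conf = {"music": 0.5, "speech": 0.5}
imp = {"music": 1.0, "speech": 3.0}     # speech outweighs music when tied
eqs = {"music": [4.0, 0.0, 2.0], "speech": [0.0, 4.0, 0.0]}
curve = weighted_eq_curve(conf, imp, eqs)
```

With equal confidences, the higher-importance "speech" preset dominates the mix, yet both types still contribute, so the resulting curve moves continuously as the confidences change.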
According to another embodiment, an equalizer control method is provided, including: identifying the audio type of an audio signal in real time; and adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to take into account at least some of the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
According to the embodiments of the second aspect, an audio signal may be classified into different long-term types or context types, where the long-term types or context types are different from the short-term types or content types. The type of the audio signal and/or the confidence value of the type may also be used to adjust an audio improving device, such as a dialog enhancer, a surround sound virtualizer, a volume leveler, or an equalizer.
Brief description of the drawings
The present application is illustrated in the accompanying drawings by way of example and not limitation, in which like reference numerals refer to similar elements, and in which:
Fig. 1 illustrates an audio processing apparatus according to an embodiment of the application;
Figs. 2 and 3 illustrate variations of the embodiment shown in Fig. 1;
Figs. 4 to 6 illustrate possible architectures of a classifier for identifying multiple audio types and calculating confidence values;
Figs. 7 to 9 illustrate further embodiments of the audio processing apparatus of the application;
Fig. 10 illustrates the transition delay between different audio types;
Figs. 11 to 14 are flow charts of audio processing methods according to embodiments of the application;
Fig. 15 illustrates a dialog enhancer controller according to an embodiment of the application;
Figs. 16 and 17 are flow charts of using the audio processing method of the application in the control of a dialog enhancer;
Fig. 18 illustrates a surround sound virtualizer controller according to an embodiment of the application;
Fig. 19 is a flow chart of using the audio processing method of the application in the control of a surround sound virtualizer;
Fig. 20 illustrates a volume leveler controller according to an embodiment of the application;
Fig. 21 illustrates the effect of the volume leveler controller of the application;
Fig. 22 illustrates an equalizer controller according to an embodiment of the application;
Fig. 23 shows some examples of desired spectral balance presets;
Fig. 24 illustrates an audio classifier according to an embodiment of the application;
Figs. 25 and 26 illustrate some features used in the audio classifier of the application;
Figs. 27 to 29 illustrate further embodiments of the audio classifier of the application;
Figs. 30 to 33 are flow charts of audio classification methods according to embodiments of the application;
Fig. 34 illustrates an audio classifier according to another embodiment of the application;
Fig. 35 illustrates an audio classifier according to a further embodiment of the application;
Fig. 36 illustrates a heuristic rule used in the audio classifier of the application;
Figs. 37 and 38 illustrate further embodiments of the audio classifier of the application;
Figs. 39 and 40 are flow charts of audio classification methods according to embodiments of the application; and
Fig. 41 is a block diagram of an example system for implementing embodiments of the application.
Detailed description of embodiments
Embodiments of the present application are described below with reference to the accompanying drawings. It should be noted that, for the sake of clarity, representations and descriptions of components and processes that are known to those skilled in the art and are not necessary for understanding the application are omitted from the drawings and the description.
Those skilled in the art will understand that aspects of the present application may be embodied as a system, a device (e.g., a cellular phone, a portable media player, a personal computer, a server, a TV set-top box, a digital video recorder, or any other media player), a method, or a computer program product. Accordingly, aspects of the application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a "circuit", "module", or "system". Furthermore, aspects of the application may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof.
A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied in a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, a wireless link, a wire line, an optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present application are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed thereon so as to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Embodiments of the present application are described in detail below. For the sake of clarity, the description is organized according to the following structure:
Part 1: Audio processing apparatus and methods
Section 1.1 Audio types
Section 1.2 Confidence values of audio types and architectures of classifiers
Section 1.3 Smoothing the confidence values of audio types
Section 1.4 Parameter adjustment
Section 1.5 Parameter smoothing
Section 1.6 Transition of audio types
Section 1.7 Combinations of embodiments and application scenarios
Section 1.8 Audio processing methods
Part 2: Dialog enhancer controller and control methods
Section 2.1 Level of dialog enhancement
Section 2.2 Thresholds for determining the frequency bands to be enhanced
Section 2.3 Adjustment of the background sound level
Section 2.4 Combinations of embodiments and application scenarios
Section 2.5 Dialog enhancer control methods
Part 3: Surround sound virtualizer controller and control methods
Section 3.1 Amount of surround boost
Section 3.2 Start frequency
Section 3.3 Combinations of embodiments and application scenarios
Section 3.4 Surround sound virtualizer control methods
Part 4: Volume leveler controller and control methods
Section 4.1 Informative content types and interfering content types
Section 4.2 Content types in different contexts
Section 4.3 Context types
Section 4.4 Combinations of embodiments and application scenarios
Section 4.5 Volume leveler control methods
Part 5: Equalizer controller and control methods
Section 5.1 Control based on content types
Section 5.2 Dominant sources in music
Section 5.3 Equalizer presets
Section 5.4 Control based on context types
Section 5.5 Combinations of embodiments and application scenarios
Section 5.6 Equalizer control methods
Part 6: Audio classifiers and classification methods
Section 6.1 Context classifier based on content type classification
Section 6.2 Extraction of long-term features
Section 6.3 Extraction of short-term features
Section 6.4 Combinations of embodiments and application scenarios
Section 6.5 Audio classification methods
Part 7: VoIP classifiers and classification methods
Section 7.1 Context quantization based on short-term segments
Section 7.2 Classification using VoIP speech and VoIP noise
Section 7.3 Smoothing the fluctuations
Section 7.4 Combinations of embodiments and application scenarios
Section 7.5 VoIP classification methods
Part 1: Audio processing apparatus and method
Fig. 1 shows the overall framework of a content-adaptive audio processing apparatus 100, which supports automatically configuring at least one audio improving device 400 with improved parameters based on the audio content being played back. The overall framework comprises three main parts: an audio classifier 200, an adjusting unit 300, and an audio improving device 400.
The audio classifier 200 classifies the audio signal into at least one audio type in real time, automatically identifying the audio type of the content being played back. Any audio classification technique, for example one realized through signal processing, machine learning, or pattern recognition, may be applied to identify the audio content. Usually a confidence value is estimated at the same time; the confidence value represents the probability that the audio content belongs to a set of predefined target audio types.
The audio improving device 400 improves the listener's experience by processing the audio signal, and will be described in detail later.
The adjusting unit 300 adjusts at least one parameter of the audio improving device in a continuous manner based on the confidence value of the at least one audio type. It is designed to steer the behavior of the audio improving device 400, estimating the optimal parameters of the respective audio improving device based on the results obtained from the audio classifier 200.
Various audio improving devices may be applied in this apparatus. Fig. 2 shows an exemplary system including four audio improving devices: a dialog enhancer (DE) 402, a surround virtualizer (SV) 404, a volume leveler (VL) 406, and an equalizer (EQ) 408. Each audio improving device can be adjusted automatically in a continuous manner based on the results (audio types and/or confidence values) obtained from the audio classifier 200.
Of course, the audio processing apparatus need not include all kinds of audio improving devices, and may include only one or more of them. On the other hand, the audio improving devices are not limited to those given in this disclosure; further kinds of audio improving devices may be included and are also within the scope of the application. Moreover, the names of the audio improving devices discussed in this disclosure, including the dialog enhancer (DE) 402, the surround virtualizer (SV) 404, the volume leveler (VL) 406, and the equalizer (EQ) 408, shall not be understood as limiting; each of them shall be understood as covering any other device realizing the same or similar function.
1.1 Audio types
To properly control the various kinds of audio improving devices, the present application also provides a novel architecture of audio types, although the audio types of the prior art may be applied here as well.
Specifically, audio types of different semantic levels are modeled, including low-level audio elements representing the basic components of an audio signal, and high-level audio types representing the most common audio contents in users' real-life entertainment applications. The former may also be named "content types", and the basic audio content types may include speech, music (including song), background sound (or sound effects), and noise.
The meanings of speech and music are self-evident. Noise in this application means physical noise, not semantic noise. Physical noise may include noise from, for example, an air conditioner, and noise from technical reasons such as pink noise caused by the signal transmission path. In contrast, "background sounds" in this application are those auditory events occurring around the core target of the listener's attention. For example, in the audio signal of a telephone call, besides the caller's voice there may be some other unintended sounds, such as the voices of some other people unrelated to the call, keyboard sounds, footsteps, and so on. These unwanted sounds are referred to as "background sounds", rather than noise. In other words, a "background sound" may be defined as a sound that is not the target (the core of the listener's attention), or is even undesired, but still carries some semantic meaning; while "noise" may be defined as those unwanted sounds other than the target sound and the background sound.
Sometimes a background sound is not really "unwanted" but is intentionally produced and carries some useful information, such as the background sounds in a movie, a TV program, or a radio broadcast program. So background sounds may also sometimes be referred to as "sound effects". In the remainder of this disclosure, "background sound" is used just for conciseness and may be further abbreviated as "background".
Further, music may be classified into music without a dominant source and music with a dominant source. If one source (a voice or an instrument) in a music clip is much stronger than the other sources, the music is referred to as "music with a dominant source"; otherwise it is referred to as "music without a dominant source". For example, in polyphonic music with singing voices and various instruments, if it is harmonically balanced, or the energies of the several most important sources are comparable to one another, it is considered music without a dominant source; in contrast, if one source (for example, a voice) is much louder while the other sources are much quieter, it is considered to contain a dominant source. As another example, a single or salient instrumental tone is "music with a dominant source".
Music may also be classified into different types based on different criteria. It may be classified based on genre, such as rock, jazz, rap, and folk, but not limited thereto. It may also be classified based on instruments, such as vocal music and instrumental music. Instrumental music may include various music played with different instruments, such as piano music and guitar music. Other exemplary criteria include the rhythm, tempo, timbre, and/or any other musical attributes of the music, so that music can be classified based on the similarity of these attributes. For example, according to timbre, vocal music may be classified into tenor, baritone, bass, soprano, mezzo-soprano, and contralto.
The content type of an audio signal may be classified with respect to short-term audio segments each comprising, for example, multiple frames. Generally, the length of an audio frame is several milliseconds, such as 20 ms, and a short-term audio segment to be classified by the audio classifier may have a length from several hundred milliseconds to several seconds, such as 1 second.
To control the audio improving devices in a content-adaptive manner, the audio signal may be classified in real time. For the content types set forth above, the content type of the current short-term audio segment represents the content type of the current audio signal. Since the length of a short-term audio segment is not very long, the audio signal may be divided into successive non-overlapping short-term audio segments. However, the short-term audio segments may also be sampled continuously/semi-continuously along the timeline of the audio signal. That is, the short-term audio segments may be sampled with a window of a predetermined length (the desired short-term audio segment length) moving along the timeline of the audio signal with a step size of one or more frames.
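The windowed sampling described above can be sketched in a few lines; this is only an illustration (the function name and frame counts are not from the application), with the step size controlling whether segments overlap:

```python
def short_term_segments(num_frames, segment_len, step):
    """Return (start, end) frame indices of short-term segments sampled
    by sliding a fixed-length window along the signal's frame axis."""
    segments = []
    start = 0
    while start + segment_len <= num_frames:
        segments.append((start, start + segment_len))
        start += step
    return segments

# With 20 ms frames, a 1-second segment spans 50 frames. A step equal to
# segment_len gives the non-overlapping division; a smaller step gives
# the continuous/semi-continuous sampling.
```
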
A high-level audio type may also be named a "context type", since it indicates the long-term type of the audio signal and may be regarded as the environment or context of the instantaneous sound events of the content types discussed above. According to the application, the context types may include the most common audio applications, such as movie-like media, music (including song), game, and VoIP (voice over internet protocol).
The meanings of music, game, and VoIP are self-evident. Movie-like media may include movies, TV programs, radio broadcast programs, or any other audio media similar thereto. The main characteristic of movie-like media is a mixture of possible speech, music, and various kinds of background sounds (sound effects).
Note that both the content types and the context types include music (including song). Hereafter in this application, the words "short-term music" and "long-term music" are used to distinguish the two.
For some embodiments of the application, some other architectures of context types are also proposed.
For example, an audio signal may be classified as high-quality audio (such as movie-like media and music CDs) or low-quality audio (such as VoIP, low-bit-rate online streaming audio, and user-generated content), which may be collectively referred to as "audio quality types".
As another example, an audio signal may be classified as VoIP or non-VoIP, which may be regarded as a variation of the above architecture of 4 context types (VoIP, movie-like media, (long-term) music, and game). In connection with the VoIP or non-VoIP context, the audio signal may be classified into VoIP-related audio content types, for example VoIP speech, non-VoIP speech, VoIP noise, and non-VoIP noise. The architecture of VoIP audio content types is especially useful for distinguishing the VoIP and non-VoIP contexts, since the VoIP context is usually one of the most challenging application scenarios for a volume leveler (an audio improving device).
Generally, the context type of an audio signal may be classified with respect to long-term audio segments longer than the short-term audio segments. A long-term audio segment comprises many more frames than a short-term audio segment, and may also comprise multiple short-term audio segments. Generally, a long-term audio segment may have a length on the order of seconds, such as several seconds to tens of seconds, for example 10 seconds.
Similarly, to control the audio improving devices in an adaptive manner, the audio signal may be classified into context types in real time, the context type of the current long-term audio segment representing the context type of the current audio signal. Since the length of a long-term audio segment is relatively long, the long-term audio segments may be sampled continuously/semi-continuously along the timeline of the audio signal, to avoid abrupt changes of the context type and hence abrupt changes of the working parameters of the audio improving devices. That is, the long-term audio segments may be sampled with a window of a predetermined length (the desired long-term audio segment length) moving along the timeline of the audio signal with a step size of one or more frames, or with a step size of one or more short-term segments.
Both the content types and the context types have been described above. In embodiments of the application, the adjusting unit 300 may adjust at least one parameter of the audio improving device based on at least one content type among the various content types and/or at least one context type among the various context types. Therefore, as shown in Fig. 3, in a variation of the embodiment shown in Fig. 1, the audio classifier 200 may comprise either an audio content classifier 202 or an audio context classifier 204, or both.
Above, different audio types based on different criteria (such as the context types) have been mentioned, as well as different audio types based on different hierarchical levels (such as the content types). However, the criteria and hierarchical levels are provided just for convenience of description and are by no means limiting. In other words, in the present application, any two or more of the audio types mentioned above may be identified by the audio classifier 200 simultaneously and considered by the adjusting unit 300 simultaneously, as described hereinafter. That is, all the audio types at different hierarchical levels may be parallel to one another, or at the same level.
1.2 Confidence values of audio types and the architecture of the classifiers
The audio classifier 200 may output hard-decision results, or the adjusting unit 300 may regard the results of the audio classifier 200 as hard decisions. Even with hard decisions, multiple audio types may be assigned to an audio segment. For example, an audio segment may be labeled both "speech" and "short-term music", since it may be a mixed signal of speech and short-term music. The obtained labels may be used directly to steer the audio improving device 400. A simple example is enabling the dialog enhancer 402 when speech is present and turning it off when speech is absent. However, without a careful smoothing scheme (to be discussed later), such a hard-decision approach may introduce unnatural sounds at the transition points from one audio type to another.
To obtain more flexibility and to adjust the parameters of the audio improving devices in a continuous manner, the confidence value of each target audio type may be estimated (soft decision). A confidence value represents the matching level between the audio content to be identified and a target audio type, with a value from 0 to 1.
As stated before, many classification techniques can output confidence values directly. Confidence values may also be calculated with various methods, which can be regarded as part of the classifier. For example, if the audio models are trained with probabilistic modeling techniques such as Gaussian Mixture Models (GMM), the posterior probability may be used to indicate the confidence value, such as:
conf_i = p(c_i | x) = p(x | c_i)·p(c_i) / Σ_{j=1..N} p(x | c_j)·p(c_j)    (1)
where x is an audio segment, c_i is a target audio type, N is the number of target audio types, p(x | c_i) is the likelihood that the audio segment x belongs to the audio type c_i, and p(c_i | x) is the corresponding posterior probability.
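A minimal numeric sketch of this posterior computation follows, assuming the per-type likelihoods p(x|c_i) have already been evaluated by the trained models; the uniform priors are an assumption for illustration only:

```python
def posterior_confidences(likelihoods, priors=None):
    """Turn per-type likelihoods p(x|c_i) into posterior confidences
    p(c_i|x) via Bayes' rule, normalizing over the N target types."""
    n = len(likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # assumed uniform priors p(c_i)
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]
```

With uniform priors the posteriors are simply the likelihoods normalized to sum to 1, which matches the 0-to-1 confidence range required above.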
On the other hand, if the audio models are trained with discriminative methods such as Support Vector Machines (SVM) or adaBoost, only a score (a real value) is obtained from comparison against the model. In these cases, a sigmoid function is usually used to map the obtained score (theoretically from -∞ to ∞) to the desired confidence value (from 0 to 1):
conf = 1 / (1 + exp(A·y + B))    (2)
where y is the output score from SVM or adaBoost, and A and B are two parameters that need to be estimated from a training data set by using some well-known techniques.
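This sigmoid mapping can be sketched as follows. In practice A and B would be fit on a training set (for example by Platt scaling); the default values below are placeholders for illustration:

```python
import math

def score_to_confidence(y, A=-1.0, B=0.0):
    """Map a raw SVM/adaBoost score y in (-inf, inf) to a confidence
    in (0, 1) with a sigmoid. With A < 0, larger scores map to higher
    confidence values."""
    return 1.0 / (1.0 + math.exp(A * y + B))
```
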
For some embodiments of the application, the adjusting unit 300 may use more than two content types and/or more than two context types. Then, the audio content classifier 202 needs to identify more than two content types, and/or the audio context classifier 204 needs to identify more than two context types. In such a case, the audio content classifier 202 or the audio context classifier 204 may be a group of classifiers organized in a certain architecture.
For example, if the adjusting unit 300 needs all four context types (movie-like media, long-term music, game, and VoIP), the audio context classifier 204 may have the following different architectures:
First, the audio context classifier 204 may comprise: 6 one-to-one binary classifiers organized as shown in Fig. 4 (each classifier discriminating one target audio type from another target audio type); 3 one-to-others binary classifiers organized as shown in Fig. 5 (each classifier discriminating one target audio type from the other target audio types); or 4 one-to-others classifiers organized as shown in Fig. 6. There are also other architectures, such as the Decision Directed Acyclic Graph (DDAG) architecture. Note that in Figs. 4 to 6 and the corresponding description below, "movie" is used instead of "movie-like media" for brevity.
Each binary classifier gives a confidence score H(x) as its output (x representing an audio segment). After the output of each binary classifier is obtained, it needs to be mapped to the final confidence value of the identified context type.
In general, suppose the audio signal is to be classified into M context types (M being a positive integer). The traditional one-to-one architecture constructs M(M-1)/2 classifiers, each trained with data from two classes; each one-to-one classifier then casts one vote for its preferred class, and the final result is the class with the most votes among the M(M-1)/2 classifiers. Compared with the traditional one-to-one architecture, the hierarchical architecture in Fig. 4 also needs to construct M(M-1)/2 classifiers, but the test iterations can be shortened to M-1, because at each hierarchical level the segment x is determined to be or not to be in the respective class, and the total number of levels is M-1. The final confidence values for the various context types can be calculated from the binary classification confidences H_k(x) (k = 1, 2, ... 6, indexing the different binary classifiers), for example:
C_MOVIE = (1-H_1(x))·(1-H_3(x))·(1-H_6(x))
C_VOIP = H_1(x)·H_2(x)·H_4(x)
C_MUSIC = H_1(x)·(1-H_2(x))·(1-H_5(x)) + H_3(x)·(1-H_1(x))·(1-H_5(x)) + H_6(x)·(1-H_1(x))·(1-H_3(x))
C_GAME = H_1(x)·H_2(x)·(1-H_4(x)) + H_1(x)·H_5(x)·(1-H_2(x)) + H_3(x)·H_5(x)·(1-H_1(x))
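The hierarchical mapping above translates directly into code. The following sketch simply evaluates the four product formulas, with H indexed 1 to 6 as in the formulas (H[0] unused):

```python
def hierarchical_confidences(H):
    """Combine the six pairwise binary scores H[1]..H[6] into final
    per-context confidences, following the mapping formulas for the
    hierarchical one-to-one architecture (Fig. 4)."""
    movie = (1 - H[1]) * (1 - H[3]) * (1 - H[6])
    voip = H[1] * H[2] * H[4]
    music = (H[1] * (1 - H[2]) * (1 - H[5])
             + H[3] * (1 - H[1]) * (1 - H[5])
             + H[6] * (1 - H[1]) * (1 - H[3]))
    game = (H[1] * H[2] * (1 - H[4])
            + H[1] * H[5] * (1 - H[2])
            + H[3] * H[5] * (1 - H[1]))
    return {"movie": movie, "voip": voip, "music": music, "game": game}
```
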
In the architecture shown in Fig. 5, the mapping functions from the binary classification results H_k(x) to the final confidence values may be defined as in the following example:
C_MOVIE = H_1(x)
C_MUSIC = H_2(x)·(1-H_1(x))
C_VOIP = H_3(x)·(1-H_2(x))·(1-H_1(x))
C_GAME = (1-H_3(x))·(1-H_2(x))·(1-H_1(x))
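A corresponding sketch for this chain of one-to-others classifiers; note that by construction the four confidences always sum to 1, since each factor splits the remaining probability mass:

```python
def chain_confidences(H1, H2, H3):
    """Map the three one-to-others scores of the Fig. 5 architecture
    to final per-context confidence values."""
    return {
        "movie": H1,
        "music": H2 * (1 - H1),
        "voip": H3 * (1 - H2) * (1 - H1),
        "game": (1 - H3) * (1 - H2) * (1 - H1),
    }
```
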
In the architecture shown in Fig. 6, the final confidence values may be equal to the corresponding binary classification results H_k(x); or, if the confidence values of all classes are required to sum to 1, the final confidence values may be simply normalized based on the estimated H_k(x):
C_MOVIE = H_1(x)/(H_1(x)+H_2(x)+H_3(x)+H_4(x))
C_MUSIC = H_2(x)/(H_1(x)+H_2(x)+H_3(x)+H_4(x))
C_VOIP = H_3(x)/(H_1(x)+H_2(x)+H_3(x)+H_4(x))
C_GAME = H_4(x)/(H_1(x)+H_2(x)+H_3(x)+H_4(x))
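The normalization step is a one-liner; a sketch over an arbitrary set of per-class scores:

```python
def normalized_confidences(scores):
    """Normalize per-context one-to-others scores H_k(x) so that the
    final confidence values sum to 1 (Fig. 6 architecture)."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}
```
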
The one or more classes with the largest confidence values may be determined as the finally identified classes.
It should be noted that in the architectures shown in Figs. 4 to 6, the order of the different binary classifiers is not necessarily as shown in the figures; it may be any other order, which can be selected through manual assignment or automatic learning according to the different requirements of various applications.
The description above is for the audio context classifier 204; for the audio content classifier 202 the situation is similar.
Alternatively, either the audio content classifier 202 or the audio context classifier 204 may be implemented as one single classifier identifying all the content types/context types at the same time and giving the respective confidence values simultaneously. There are many existing techniques for doing this.
With confidence values, the output of the audio classifier 200 may be represented as a vector, each dimension representing the confidence value of one target audio type. For example, if the target audio types are, in sequence, speech, short-term music, noise, and background, an example output result may be (0.9, 0.5, 0.0, 0.0), indicating that the classifier is 90% sure the audio content is speech and 50% sure the audio is music. Note that the sum of all dimensions of the output vector is not necessarily 1 (for example, the result from Fig. 6 is not necessarily normalized), meaning that the audio signal may be a mixed signal of speech and short-term music.
In the subsequent Part 6 and Part 7, novel implementations of audio context classification and audio content classification will be discussed in detail.
1.3 Smoothing the confidence values of audio types
Optionally, after each audio segment has been classified into predefined audio types, an additional step is to smooth the classification results along the timeline, to avoid abrupt transitions from one type to another and to make the estimation of the parameters in the audio improving device smoother. For example, if a long excerpt is classified as movie-like media except for a single segment classified as VoIP, the abrupt VoIP decision can be revised to movie-like media by the smoothing.
Therefore, in a variation of the embodiment as shown in Fig. 7, a type smoothing unit 712 is further provided for smoothing, for each audio type, the confidence value of the current audio signal.
A conventional smoothing method is based on a weighted average, for example calculating a weighted sum of the current actual confidence value and the smoothed confidence value of the last time, as follows:
smoothConf(t) = β·smoothConf(t-1) + (1-β)·conf(t)    (3)
where t represents the current time (the current audio segment), t-1 represents the last time (the last audio segment), β is the weight, and conf and smoothConf are the confidence values before and after smoothing, respectively.
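Formula (3) is a standard exponential smoothing step; a direct sketch (the default β is an illustrative choice, not prescribed by the application):

```python
def smooth_confidence(conf_t, smooth_prev, beta=0.9):
    """Formula (3): weighted sum of the current raw confidence conf(t)
    and the previous smoothed value smoothConf(t-1)."""
    return beta * smooth_prev + (1 - beta) * conf_t
```

Larger β makes the smoothed confidence change more slowly; β = 0 disables the smoothing entirely.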
From the point of view of confidence values, the hard-decision results from the classifiers can also be represented with confidence values whose value is either 0 or 1: if a certain target audio type is chosen and assigned to a certain audio segment, the corresponding confidence is 1; otherwise it is 0. Therefore, even if the audio classifier 200 does not provide confidence values but only hard decisions about the audio types, continuous adjustment by the adjusting unit 300 is still possible through the smoothing operation of the type smoothing unit 712.
The smoothing algorithm can be made "asymmetric" by using different smoothing weights for different cases. For example, the weight for calculating the weighted sum may be adaptively changed based on the confidence value of the audio type of the audio signal: the larger the confidence value of the current segment, the larger its weight.
From another point of view, the weight for calculating the weighted sum may be adaptively changed based on the different transition pairs from one audio type to another, especially when the adjustment is based on multiple content types identified by the audio classifier 200, rather than on the presence or absence of a single content type. For example, for a transition from an audio type occurring relatively frequently in a certain context to another audio type occurring less frequently in that context, the confidence value of the latter may be smoothed so that it does not increase too fast, since it may be just an occasional interruption.
Another factor is the change (increase or decrease) trend, including the change rate. Suppose we care more about the delay of the appearance of an audio type (that is, when its confidence value increases); then the smoothing algorithm may be designed as follows:
smoothConf(t) = conf(t), if conf(t) ≥ smoothConf(t-1); smoothConf(t) = β·smoothConf(t-1) + (1-β)·conf(t), otherwise    (4)
The above formula lets the smoothed confidence value respond to the current state quickly when the confidence value increases, and fade away slowly when the confidence value decreases. Variants of the smoothing function can easily be designed in a similar way. For example, formula (4) may be modified so that the weight of conf(t) becomes larger when conf(t) ≥ smoothConf(t-1); in fact, in formula (4) it can be regarded that β = 0 and the weight of conf(t) becomes the largest, i.e. 1.
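In the spirit of formula (4), the asymmetric smoothing can be sketched as a fast-attack, slow-release filter (the default β is again only illustrative):

```python
def smooth_confidence_asym(conf_t, smooth_prev, beta=0.9):
    """Asymmetric smoothing: respond immediately when the confidence
    increases, decay slowly when it decreases."""
    if conf_t >= smooth_prev:
        return conf_t  # increasing branch: beta = 0, fast attack
    return beta * smooth_prev + (1 - beta) * conf_t  # slow release
```
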
From yet another point of view, considering the change trend of a certain audio type is a specific example of considering the different transition pairs of audio types. For example, an increase of the confidence value of type A can be regarded as a transition from non-A to A, and a decrease of the confidence value of type A can be regarded as a transition from A to non-A.
1.4 Parameter adjustment
The adjusting unit 300 is designed to estimate or adjust the proper parameters of the audio improving device 400 based on the results obtained from the audio classifier 200. Different adjusting algorithms can be designed for different audio improving devices, by using either the content types or the context types, or by using both content types and context types for a joint decision. For example, with context type information such as movie-like media and long-term music, the presets mentioned before can be automatically selected and applied to the corresponding content. With the available content type information, the parameters of each audio improving device can be adjusted in a finer manner, as will be introduced in the subsequent parts. The content type information and the context information can also be used jointly in the adjusting unit 300, to balance the long-term and short-term information. A specific adjusting algorithm for a specific audio improving device can be regarded as a separate adjusting unit, or the different adjusting algorithms can collectively be regarded as a unified adjusting unit.
That is, the adjusting unit 300 may be configured to adjust at least one parameter of the audio improving device based on the confidence value of at least one content type and/or the confidence value of at least one context type. For a specific audio improving device, some audio types are informative and some audio types are interfering. Hence, the parameter of the specific audio improving device may be positively correlated with the confidence values of the informative audio types, or negatively correlated with the confidence values of the interfering audio types. Here, "positively correlated" means the parameter increases or decreases, in a linear or non-linear manner, with the increase or decrease of the confidence value of the audio type. "Negatively correlated" means the parameter increases or decreases, in a linear or non-linear manner, with the decrease or increase, respectively, of the confidence value of the audio type.
Here, the decrease and increase of the confidence value are "transferred" directly to the parameter to be adjusted through the positive or negative correlation. Mathematically, such correlation or "transfer" may be embodied as a linear proportion or inverse proportion, an addition or subtraction operation, a multiplication or division operation, or a non-linear function. All these forms of correlation may be referred to as "transfer functions". To determine the increase or decrease of the confidence value, the current confidence value or a mathematical transformation thereof may also be compared with the last confidence value, or with multiple historical confidence values, or mathematical transformations thereof. In the context of the application, the term "compare" means a comparison through a subtraction operation or through a division operation; the increase or decrease can be determined by judging whether the difference is greater than 0 or whether the ratio is greater than 1.
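As one illustration of such a transfer function, a parameter can be driven up by an informative type and down by an interfering type; the linear form, the gain, and the clamping range here are assumptions for the sketch, not prescribed by the application:

```python
def adjust_parameter(base, conf_informative, conf_interfering, gain=1.0):
    """Sketch of a transfer function: the parameter is positively
    correlated with the confidence of an informative audio type and
    negatively correlated with that of an interfering audio type."""
    value = base + gain * conf_informative - gain * conf_interfering
    return min(1.0, max(0.0, value))  # clamp to an assumed valid range
```
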
In a specific implementation, the value of the parameter can be directly associated with the confidence value, or a ratio or difference thereof, through a proper algorithm (such as a transfer function), so that an "external observer" need not know explicitly whether a specific confidence value and/or a specific parameter has increased or decreased. Some specific examples will be given in the subsequent Parts 2 to 5 regarding the specific audio improving devices.
As described in the sections above, for the same audio segment, the classifier 200 may identify multiple audio types with respective confidence values, and the confidence values do not necessarily add up to 1, since the audio segment may comprise multiple components at the same time, such as music and speech and background sounds. In such a situation, the parameters of the audio improving device should be balanced among the different audio types. For example, the adjusting unit 300 may be configured to consider at least some of the multiple audio types by weighting the confidence values of the at least one audio type based on the importance of the at least one audio type. The more important a specific audio type is, the more the parameter is influenced by it.
The weight may also reflect the informative and interfering effects of the audio types. For example, an interfering audio type may be given a negative weight. Some specific examples will be given in the subsequent Parts 2 to 5 regarding the specific audio improving devices.
Please note that in the context of the application, "weight" has a broader meaning than the coefficients in a polynomial. Besides the form of coefficients in a polynomial, it may also take the form of an exponent or a power. When being the coefficients in a polynomial, the weighting coefficients may or may not be normalized. In brief, the weight just represents how much the weighted object influences the parameter to be adjusted.
In some other embodiments, for multiple audio types comprised in the same audio segment, their confidence values may be converted into weights through normalization; then the final parameter may be determined by calculating a sum of the parameter preset values predefined for each audio type, weighted with the confidence-based weights. That is, the adjusting unit 300 may be configured to consider the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
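A small sketch of this confidence-weighted blending of preset parameter values (the type names and preset values are illustrative only):

```python
def blend_presets(confidences, presets):
    """Normalize the confidence values of the audio types present in a
    segment into weights, then blend the per-type preset parameter
    values with those weights."""
    total = sum(confidences.values())
    if total == 0:
        raise ValueError("no active audio type")
    return sum(confidences[t] / total * presets[t] for t in confidences)
```
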
As a specific example of weighting, the adjusting unit may be configured to consider at least one dominant audio type based on the confidence values. An audio type with a low confidence value (lower than a threshold) may be disregarded, which is equivalent to setting the weights of those other audio types whose confidence values are lower than the threshold to zero. Some specific examples will be given in the subsequent Parts 2 to 5 regarding the specific audio improving devices.
Content type and context type can be considered together.In one embodiment, content type and context class
Type can be treated as can having corresponding weight in same rank and its confidence value.In another embodiment,
As shown by its name, " context type " is the context or environment residing for " content type ", therefore can be configured
Adjustment unit 200 with cause depending on audio signal context type and to different context types audio signal in
Hold type and distribute different weights.In general, any audio types may be constructed the context of another audio types, therefore
Adjustment unit 200 may be configured to change the weight of an audio types according to the confidence value of another audio types.
Some specific examples will be provided into the 5th part in the ensuing part 2 for improving device on specific audio.
In the linguistic context of the application, " parameter " has implication more wider than its literal meaning.Except with single value
Parameter, its can also refer to the set of foregoing preset including different parameter, the vector being made up of different parameters or
Pattern (profile).Specifically, following parameter will be discussed into the 5th part in ensuing part 2, but the application is not
It is limited to this:The rank of dialogue enhancing, the threshold value of the frequency band for determining to talk with enhancing, background sound level, surround sound enhancing amount, use
The dynamic gain of initial frequency, volume leveller in surround sound virtual machine or the scope of dynamic gain, represent audio signal
It is that the parameter of the degree of new discernable audio event, balanced rank, balanced mode and spectrum balance are preset.
1.5 Parameter smoothing
In subsection 1.3, smoothing the confidence values of the audio types was discussed as a way of avoiding abrupt changes therein, and consequently abrupt changes in the parameters of the audio improving device. Other approaches are also possible. One approach is to smooth the parameter adjusted based on the audio types, which will be discussed in this subsection; another approach is to configure the audio classifier and/or the adjustment unit to delay the change of the results of the audio classifier, which will be discussed in subsection 1.6.
In one embodiment, the parameter can be further smoothed to avoid quick changes that might introduce audible distortion at transition points, for example:
L̃(t) = τ·L̃(t−1) + (1 − τ)·L(t)
where L̃(t) is the smoothed parameter, L(t) is the non-smoothed parameter, τ is a coefficient representing a time constant, t is the current time, and t−1 is the last time.
That is, as shown in Fig. 8, the audio processing apparatus may comprise a parameter smoothing unit 814 for smoothing a parameter of an audio improving device adjusted by the adjustment unit 300 (such as at least one of the dialog enhancer 402, the surround virtualizer 404, the volume leveler 406 and the equalizer 408), by computing a weighted sum of the parameter value determined by the adjustment unit 300 at the current time and the smoothed parameter value of the last time.
The time constant τ can be a fixed value based on the specific requirements of the application and/or the implementation of the audio improving device 400. It may also change adaptively based on the audio type, and especially based on the type of the transition from one audio type to another, such as from music to speech or from speech to music.
Take the equalizer as an example (further details may be found in Part 5). Equalization is suitable for music content but not for speech content. Hence, to smooth the equalization level, the time constant can be relatively small when the audio signal transitions from music to speech, so that a smaller equalization level is applied to the speech content quickly. On the other hand, the time constant for the transition from speech to music can be relatively large, in order to avoid audible distortion at the transition point.
To estimate the transition type (e.g., from speech to music or from music to speech), the content classification results can be used directly. That is, since the audio content is classified as either music or speech, the transition type is straightforward to obtain. To estimate the transition in a more continuous manner, one may instead rely on the estimated non-smoothed equalization level, rather than directly comparing the hard decisions of the audio types. The general idea is: if the non-smoothed equalization level increases, this indicates a transition from speech to music (or toward more music-like content); otherwise, it is more likely a transition from music to speech (or toward more speech-like content). By distinguishing the different transition types, the time constant can be set accordingly; one example is:
τ(t) = τ1 if L(t) ≥ L̃(t−1), and τ(t) = τ2 otherwise
where τ(t) is the content-dependent time constant varying over time, and τ1 and τ2 are two preset time constant values, generally satisfying τ1 > τ2. Intuitively, the above function yields a relatively slow transition when the equalization level is increasing and a relatively fast transition when the equalization level is decreasing, although the application is not limited thereto. Moreover, the parameter is not limited to the equalization level and may be another parameter. That is, the parameter smoothing unit 814 can be configured so that the weights used for computing the weighted sum change adaptively based on the increasing or decreasing trend of the parameter determined by the adjustment unit 300.
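Under the two-valued rule above, the trend-dependent time constant can be sketched as follows; the τ1 and τ2 values are illustrative assumptions:

```python
# Sketch: pick the smoothing time constant from the trend of the
# non-smoothed parameter: large tau (slow change) when it increases,
# small tau (fast change) when it decreases.
TAU1, TAU2 = 0.95, 0.5  # illustrative preset values, TAU1 > TAU2

def adaptive_smooth(prev_smoothed, current):
    tau = TAU1 if current >= prev_smoothed else TAU2
    return tau * prev_smoothed + (1.0 - tau) * current

# Rising equalization level (speech -> music): small step toward 1.0.
rising = adaptive_smooth(0.0, 1.0)   # 0.05
# Falling equalization level (music -> speech): large step toward 0.0.
falling = adaptive_smooth(1.0, 0.0)  # 0.5
```

The asymmetry means the equalizer backs off quickly on speech but ramps up cautiously on music, matching the behavior described above.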
1.6 Transition of audio types
With reference to Fig. 9 and Fig. 10, another scheme will now be described for avoiding abrupt changes of the audio type, and thus abrupt changes in the parameters of the audio improving device.
As shown in Fig. 9, the audio processing apparatus 100 may further comprise a timer 916 for measuring the duration for which the audio classifier 200 continuously outputs the same new audio type, wherein the adjustment unit 300 may be configured to keep using the current audio type until the length of the duration of the new audio type reaches a threshold.
In other words, an observation period (or hold period) as shown in Fig. 10 is introduced. With the observation period (whose length corresponds to the threshold on the duration), the change of the audio type is further monitored over a continuous stretch of time, to confirm whether the audio type has really changed; only then can the adjustment unit 300 actually use the new audio type.
As shown in Fig. 10, arrow (1) shows the situation where the current state is type A and the result of the audio classifier 200 does not change.
If the current state is type A and the result of the audio classifier 200 becomes type B, the timer 916 starts timing, or, as shown in Fig. 10, the process enters the observation period (arrow (2)), and an initial value of a hold count cnt is set, representing the length of the observation period (equal to the threshold).
Then, if the audio classifier 200 continuously outputs type B, cnt decreases continuously (arrow (3)) until cnt equals 0 (that is, the length of the duration of the new type B reaches the threshold), whereupon the adjustment unit 300 can use the new audio type B (arrow (4)); in other words, only at this point can the audio type be regarded as having really changed to type B.
Otherwise, if the output of the audio classifier 200 changes back to the original type A before cnt becomes 0 (before the length of the duration reaches the threshold), the observation period ends, and the adjustment unit 300 still uses the original type A (arrow (5)).
The change from type B to type A can be handled similarly to the above process.
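The observation-period logic of Fig. 10 can be sketched as a small state machine; the frame-based counting and the class below are illustrative, not part of the claimed apparatus:

```python
# Sketch: delay audio-type switches until the new type has been
# observed for `threshold` consecutive classifier outputs.
class TypeSwitcher:
    def __init__(self, initial_type, threshold):
        self.current = initial_type   # type actually used for adjustment
        self.threshold = threshold
        self.candidate = None         # new type under observation
        self.cnt = threshold          # hold count of the observation period

    def update(self, observed_type):
        if observed_type == self.current:
            # Classifier fell back to the current type: the observation
            # period (if any) ends, arrow (5) in Fig. 10.
            self.candidate = None
            self.cnt = self.threshold
        elif observed_type == self.candidate:
            self.cnt -= 1             # arrow (3): keep counting down
            if self.cnt <= 0:         # duration reached the threshold
                self.current = observed_type   # arrow (4): real switch
                self.candidate = None
                self.cnt = self.threshold
        else:
            # A new type appears: enter a fresh observation period,
            # arrow (2) in Fig. 10.
            self.candidate = observed_type
            self.cnt = self.threshold - 1
        return self.current
```

For example, with a threshold of three frames, a two-frame burst of type B is ignored, while three consecutive B frames trigger the switch.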
In the above process, the threshold (or hold count) can be set based on application requirements. It may be predefined as a fixed value, or it may be set adaptively. In one variation, the threshold is different for different transition pairs from one audio type to another. For example, when changing from type A to type B, the threshold may be a first value, and when changing from type B to type A, the threshold may be a second value.
In another variation, the hold count (threshold) can be negatively correlated with the confidence value of the new audio type. The general idea is: if the confidence indicates that the two types are easily confused (e.g., when the confidence value is only around 0.5), the observation period needs to be long; otherwise, the observation period can be relatively short. Following this guideline, an exemplary hold count can be set by the following formula:
HangCnt = C·|0.5 − Conf| + D
where HangCnt is the hold count or threshold, and C and D are two parameters that can be set based on application requirements; usually C is negative and D is positive.
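A sketch of this formula with illustrative values for C and D (chosen only to satisfy C < 0 < D):

```python
# Sketch: hold count negatively correlated with the confidence of the
# new audio type. C and D are application-tuned; here C = -20, D = 20.
C, D = -20.0, 20.0

def hang_cnt(conf):
    return C * abs(0.5 - conf) + D

ambiguous = hang_cnt(0.5)   # 20.0: long observation period
confident = hang_cnt(1.0)   # 10.0: shorter observation period
```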
Incidentally, the timer 916 above (and hence the transition process described above) has been described as part of the audio processing apparatus, external to the audio classifier 200. In some other embodiments, as will be described in subsection 7.3, the timer 916 can instead be regarded as part of the audio classifier 200.
1.7 Combination of embodiments and application scenarios
All the embodiments discussed above and their variations may be implemented in any combination thereof, and any components mentioned in different parts/embodiments but having the same or similar functions may be implemented as the same component or as separate components.
In particular, when the embodiments and their variations were described above, components with reference signs similar to those of components already described in earlier embodiments or variations were omitted, and only the differing components were described. In fact, these differing components can either be combined with the components of other embodiments or variations, or constitute separate solutions on their own. For example, any two or more of the solutions described with reference to Fig. 1 to Fig. 10 can be combined with each other. As the most complete solution, the audio processing apparatus may comprise both the audio content classifier 202 and the audio context classifier 204, as well as the type smoothing unit 712, the parameter smoothing unit 814 and the timer 916.
As mentioned before, the audio improving devices 400 may comprise the dialog enhancer 402, the surround virtualizer 404, the volume leveler 406 and the equalizer 408. The audio processing apparatus 100 may comprise any one or more of them, together with the adjustment unit 300 adapted thereto. When multiple audio improving devices 400 are involved, the adjustment unit 300 can be regarded as comprising multiple sub-units 300A to 300D (Fig. 15, Fig. 18, Fig. 20 and Fig. 22), each dedicated to a corresponding audio improving device 400, or it can still be regarded as one unified adjustment unit. When dedicated to an audio improving device, the adjustment unit 300, together with the audio classifier 200 and possibly other components, can be regarded as the controller of that specific audio improving device, as will be discussed in detail in the following Parts 2 to 5.
In addition, the audio improving devices 400 are not limited to the examples already mentioned, and may comprise any other audio improving devices.
Moreover, any of the solutions discussed, or any combination thereof, may further be combined with any embodiment described or implied in the other parts of this disclosure. In particular, the embodiments of the audio classifier to be discussed in Part 6 and Part 7 can be used in the audio processing apparatus.
1.8 Audio processing methods
In describing the audio processing apparatus in the embodiments above, some processes and methods have obviously also been disclosed. Hereinafter, a summary of these methods is given without repeating the details already discussed. It shall be noted that, although the methods were disclosed in the course of describing the audio processing apparatus, they do not necessarily adopt the components described, nor are they necessarily performed by those components. For example, the embodiments of the audio processing apparatus may be realized partially or entirely in hardware and/or firmware, while the audio processing methods below may be realized entirely by a computer-executable program, although the methods may also adopt the hardware and/or firmware of the audio processing apparatus.
The methods are described below with reference to Fig. 11 to Fig. 14. Note that, corresponding to the streaming nature of the audio signal, the various operations are repeated when the methods are carried out in practice, and different operations are not necessarily directed to the same audio segment.
In an embodiment as shown in Fig. 11, an audio processing method is provided. First, the audio signal to be processed is classified in real time into at least one audio type (operation 1102). Based on the confidence value of the at least one audio type, at least one parameter for audio improvement can be adjusted continuously (operation 1104). The audio improvement may be dialog enhancement (operation 1106), surround virtualization (operation 1108), volume leveling (operation 1110) and/or equalization (operation 1112). Correspondingly, the at least one parameter may comprise at least one parameter for at least one of dialog enhancement processing, surround virtualization processing, volume leveling processing and equalization processing.
Here, "in real time" means that the audio type (and thus the parameter) changes in real time according to the specific content of the audio signal, and "continuously" means that the adjustment is a continuous adjustment based on the confidence value, rather than an abrupt or discrete adjustment.
The audio types may comprise content types and/or context types. Correspondingly, the adjusting operation 1104 can be configured to adjust the at least one parameter based on the confidence value of at least one content type and the confidence value of at least one context type. The content types may further comprise at least one of short-term music, speech, background sound and noise. The context types may further comprise at least one of long-term music, movie-like media, game and VoIP.
Other context type schemes can also be conceived, for example VoIP-related context types comprising VoIP and non-VoIP, and audio quality types comprising high-quality audio and low-quality audio.
The short-term music may be further divided into various sub-types according to different criteria. Depending on the presence of a dominant source, it may comprise music without a dominant source and music with a dominant source. In addition, the short-term music may comprise at least one genre-based cluster or at least one instrument-based cluster, or at least one music cluster classified based on rhythm, tempo, timbre and/or any other musical attributes.
When both the content type and the context type are identified, the importance of the content type can be determined by the context type in which it resides. That is, depending on the context type of the audio signal, different weights are assigned to the content types in audio signals of different context types. More generally, one audio type may influence another audio type, or one audio type may be a premise of another audio type. Accordingly, the adjusting operation 1104 can be configured to modify the weight of one audio type according to the confidence value of another audio type.
When the audio signal is classified into multiple audio types simultaneously (that is, for the same audio segment), the adjusting operation 1104 may take some or all of the identified audio types into account in adjusting the parameter for improving that audio segment. For example, the adjusting operation 1104 can be configured to weight the confidence values of the at least one audio type based on the importance of the at least one audio type. Alternatively, the adjusting operation 1104 may be configured to consider at least some of the audio types by weighting them based on their confidence values. As a special case, the adjusting operation 1104 can be configured to consider at least one dominant audio type based on the confidence values.
To avoid abrupt changes of the results, smoothing schemes can be introduced.
The adjusted parameter value can be smoothed (operation 1214 in Fig. 12). For example, the parameter value determined by the adjusting operation 1104 at the current time may be replaced by a weighted sum of the parameter value determined by the adjusting operation at the current time and the smoothed parameter value of the last time. Thus, through iterations of the smoothing operation, the parameter value is smoothed on the time line.
The weights for computing the weighted sum can change adaptively based on the audio type of the audio signal, or based on different transition pairs from one audio type to another. Alternatively, the weights for computing the weighted sum can change adaptively based on the increasing or decreasing trend of the parameter value determined by the adjusting operation.
Another smoothing scheme is shown in Fig. 13. That is, the method may further comprise, for each audio type, smoothing the confidence value of the current audio signal by computing a weighted sum of the actual confidence value at the current time and the smoothed confidence value of the last time (operation 1303). Similarly to the parameter smoothing operation 1214, the weights for computing the weighted sum can change adaptively based on the confidence value of the audio type of the audio signal, or based on different transition pairs from one audio type to another.
Yet another smoothing scheme is a buffering mechanism that delays the transition from one audio type to another, even when the output of the audio classifying operation 1102 has changed. That is, the adjusting operation 1104 does not use the new audio type immediately, but waits for the output of the audio classifying operation 1102 to stabilize.
Specifically, the method may comprise measuring the duration for which the classifying operation continuously outputs the same new audio type (operation 1403 in Fig. 14), wherein the adjusting operation 1104 is configured to keep using the current audio type ("N" in operation 14035 and operation 11041) until the length of the duration of the new audio type reaches a threshold ("Y" in operation 14035 and operation 11042). Specifically, when the audio type output by the audio classifying operation 1102 changes with respect to the current audio type used in the audio parameter adjusting operation 1104 ("Y" in operation 14031), the timing starts (operation 14032). If the audio classifying operation 1102 continues outputting the new audio type, that is, if the judgment in operation 14031 remains "Y", the timing continues (operation 14032). When the duration of the new audio type finally reaches the threshold ("Y" in operation 14035), the adjusting operation 1104 uses the new audio type (operation 11042) and the timing is reset (operation 14034), in preparation for the next transition of the audio type. Before the threshold is reached ("N" in operation 14035), the adjusting operation 1104 keeps using the current audio type (operation 11041).
Here, the timing may be realized through a timer mechanism (counting up or counting down). If, after the timing has started but before the threshold is reached, the output of the audio classifying operation 1102 changes back to the current audio type used in the adjusting operation 1104, this shall be regarded as no change with respect to the current audio type used in the adjusting operation 1104 ("N" in operation 14031). However, the current classification result (corresponding to the current audio segment to be classified in the audio signal) has changed with respect to the previous output of the audio classifying operation 1102 (corresponding to the previous audio segment to be classified in the audio signal) ("Y" in operation 14033); therefore, the timing is reset (operation 14034), to be started again at the next change ("Y" in operation 14031). Of course, if the classification result of the audio classifying operation 1102 has changed neither with respect to the current audio type used in the audio parameter adjusting operation 1104 ("N" in operation 14031), nor with respect to the previous classification result ("N" in operation 14033), this indicates that the audio classification is in a stable state and the current audio type continues to be used.
Threshold value used herein above can also be directed to the different conversions from an audio types to another audio types
Pair and it is different because when state is not very stable, may generally be more desirable to audio improve device be in its default conditions without
It is to be in other states.On the other hand, if the confidence value of the new audio types is of a relatively high, it is transformed into new audio
Type is safer.Therefore, the threshold value can be negatively correlated with the confidence value of new audio types.Confidence level is higher, then threshold value is got over
It is low, it is meant that audio types can quickly be transformed into new audio types.
With the embodiment of audio processing equipment similarly, on the one hand, the embodiment of audio-frequency processing method and embodiment party
Any combinations of the modification of formula are all feasible;On the other hand, the modification of the embodiment and embodiment of audio-frequency processing method
Each aspect also can be single solution.Especially, in all audio-frequency processing methods, such as the can be used
The audio frequency classification method discussed in 6 parts and the 7th part.
Part 2: Dialog enhancer controller and controlling methods
An example of an audio improving device is the dialog enhancer (DE), which aims at continuously monitoring the audio during playback, detecting the presence of dialog, and enhancing the dialog to increase its clarity and intelligibility (making the dialog easier to hear and understand), especially for elderly listeners with decreased hearing capability. In addition to detecting whether dialog is present, the frequencies most important for intelligibility are also detected if dialog is present, and are then enhanced accordingly (with dynamic spectral rebalancing). An example of a dialog enhancement method is given in H. Muesch, "Speech Enhancement in Entertainment Audio", published as WO 2008/106036 A2, the entirety of which is incorporated herein by reference.
A common manual configuration is to enable the dialog enhancer for movie-like media content and to disable it for music content, since dialog enhancement may trigger false alarms too frequently on music signals.
With the audio type information available, the level of dialog enhancement and other parameters can be adjusted based on the confidence values of the identified audio types. As a specific example of the audio processing apparatus and methods discussed before, the dialog enhancer controller can use any combination of all the embodiments discussed in Part 1. Specifically, in the case of controlling the dialog enhancer, the audio classifier 200 and the adjustment unit 300 of the audio processing apparatus 100 as shown in Fig. 1 to Fig. 10 can constitute the dialog enhancer controller 1500 as shown in Fig. 15. In this embodiment, since the adjustment unit is dedicated to the dialog enhancer, it can be referred to as 300A. Also, as discussed in the previous part, the audio classifier 200 may comprise at least one of the audio content classifier 202 and the audio context classifier 204, and the dialog enhancer controller 1500 may further comprise at least one of the type smoothing unit 712, the parameter smoothing unit 814 and the timer 916.
Therefore, in this part, the contents already described in the previous part will not be repeated, and only some specific examples thereof are given.
For the dialog enhancer, the parameters to be adjusted include, but are not limited to: the level of dialog enhancement, the background level, and the thresholds for determining the frequency bands to be enhanced. See H. Muesch, "Speech Enhancement in Entertainment Audio", published as WO 2008/106036 A2, the entirety of which is incorporated herein by reference.
2.1 Level of dialog enhancement
Regarding the level of dialog enhancement, the adjustment unit 300A can be configured to positively correlate the dialog enhancement level of the dialog enhancer with the confidence value of speech. Additionally or alternatively, the level can be negatively correlated with the confidence values of the other content types. Accordingly, the dialog enhancement level can be set proportional (linearly or non-linearly) to the speech confidence, so that the dialog enhancement is less effective on non-speech signals such as music and background sounds (sound effects).
For the context types, the adjustment unit 300A may be configured to positively correlate the dialog enhancement level of the dialog enhancer with the confidence value of movie-like media and/or VoIP, and/or to negatively correlate the dialog enhancement level of the dialog enhancer with the confidence value of long-term music and/or game. For example, the dialog enhancement level can be set proportional (linearly or non-linearly) to the confidence value of movie-like media. When the confidence value of movie-like media is 0 (e.g., for music content), the dialog enhancement level is also 0, which is equivalent to disabling the dialog enhancement.
As described in the previous part, the content type and the context type can be considered jointly.
2.2 Thresholds for determining frequency bands to be enhanced
During the operation of the dialog enhancer, there is for each frequency band a threshold (usually an energy or loudness threshold) used to determine whether that band is to be enhanced; that is, the frequency bands above their respective energy/loudness thresholds will be enhanced. To adjust the thresholds, the adjustment unit 300A can be configured to positively correlate the thresholds with the confidence value of short-term music and/or noise and/or background sounds, and/or to negatively correlate the thresholds with the confidence value of speech. For example, if the speech confidence is high (implying a more reliable speech detection), the thresholds can be lowered so that more frequency bands can be enhanced; on the other hand, when the confidence value of music is high, the thresholds can be raised so that fewer frequency bands are enhanced (and there is thus less distortion).
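A sketch of such a confidence-dependent threshold adjustment and band selection; the base threshold, the scale factor and the band energies are illustrative assumptions:

```python
# Sketch: raise the per-band threshold with the music confidence and
# lower it with the speech confidence, then enhance only the bands
# whose energy exceeds the adjusted threshold.
def adjust_threshold(base, conf_music, conf_speech, scale=6.0):
    return base + scale * conf_music - scale * conf_speech

def bands_to_enhance(band_energies, threshold):
    return [i for i, e in enumerate(band_energies) if e > threshold]

energies = [-20.0, -28.0, -40.0]               # per-band energies in dB (made up)
th_speech = adjust_threshold(-30.0, 0.0, 1.0)  # -36 dB: more bands enhanced
th_music = adjust_threshold(-30.0, 1.0, 0.0)   # -24 dB: fewer bands enhanced
```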
2.3 Adjustment of the background level
As shown in Fig. 15, another component involved in the dialog enhancer is the minimum tracking unit 4022, which is used to estimate the background level in the audio signal (the background level is used for SNR (signal-to-noise ratio) estimation and for the band threshold estimation mentioned in subsection 2.2). The background level can also be adjusted based on the confidence values of the audio content types. For example, if the speech confidence is high, the minimum tracking unit can set the background level to the current minimum with more assurance. If the music confidence is high, the background level is set higher than the current minimum; put another way, the background level is set to a weighted average of the current minimum and the energy of the current frame, with a large weight on the current minimum. If the confidence of noise and background is high, the background level can be set much higher than the current minimum; put another way, the background level is set to a weighted average of the current minimum and the energy of the current frame, with a small weight on the current minimum.
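The weighted-average rule above can be sketched as follows; the mapping from the dominant content type to the weight on the current minimum is an illustrative assumption:

```python
# Sketch: background level as a weighted average of the tracked minimum
# and the current frame energy, with the weight on the minimum chosen
# from the dominant content type (weights are hypothetical).
WEIGHT_ON_MIN = {"speech": 1.0, "music": 0.8, "noise": 0.3}

def background_level(current_min, frame_energy, confidences):
    dominant = max(confidences, key=confidences.get)
    w = WEIGHT_ON_MIN[dominant]
    return w * current_min + (1.0 - w) * frame_energy
```

With a tracked minimum of −60 dB and a current frame at −20 dB, a confident speech classification keeps the estimate at −60 dB, while a dominant noise classification lifts it to −32 dB.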
Therefore, the adjustment unit 300A may be configured to apply an adjustment amount to the background level estimated by the minimum tracking unit, wherein the adjustment unit is further configured to positively correlate the adjustment amount with the confidence value of short-term music and/or noise and/or background sounds, and/or to negatively correlate the adjustment amount with the confidence value of speech. In a variation, the adjustment unit 300A can be configured to positively correlate the adjustment amount more with the confidence value of noise and/or background sounds than with that of short-term music.
2.4 Combination of embodiments and application scenarios
Similarly to Part 1, all the embodiments discussed above and their variations may be implemented in any combination thereof, and any components mentioned in different parts/embodiments but having the same or similar functions may be implemented as the same component or as separate components.
For example, any two or more of the solutions described in subsections 2.1 to 2.3 can be combined with each other. And any of these combinations can be further combined with any embodiment described or implied in Part 1 and in the other parts to be described later. In particular, many formulas are actually applicable to every kind of audio improving device or method, but they are not necessarily recited or discussed in every part of this disclosure. In such a case, the respective parts of this disclosure may refer to one another, so that a specific formula discussed in one part can be applied in another part, with the relevant parameters, coefficients, exponents (powers) and weights appropriately adjusted according to the specific requirements of the specific application.
2.5 Dialog enhancer controlling methods
Similarly to Part 1, in describing the dialog enhancer controller in the embodiments above, some processes and methods have obviously also been disclosed. Hereinafter, a summary of these methods is given without repeating the details already discussed.
First, the embodiments of the audio processing method discussed in Part 1 can be used for the dialog enhancer, whose parameters are among the targets to be adjusted by the audio processing method. From this point of view, the audio processing method is also a dialog enhancer controlling method.
In this subsection, only those aspects specific to the control of the dialog enhancer will be discussed. For the general aspects of the controlling methods, reference may be made to Part 1.
According to an embodiment, audio-frequency processing method can also include dialogue enhancing processing, and adjust operation 1104
Including making the rank that dialogue strengthens and film class media and/or VoIP confidence value positive correlation, and/or strengthen dialogue
Rank and long-term music and/or game confidence value it is negatively correlated.That is, dialogue enhancing is mainly for context type
For the audio signal of film class media or VoIP.
More specifically, adjustment operation 1104 can include the rank of dialogue enhancing and the confidence of voice for making dialogue booster
Angle value positive correlation.
The application may also adjust the frequency bands to be enhanced in the dialogue enhancement processing. As shown in Figure 16, a threshold (typically of energy or loudness) may be adjusted based on the confidence value of the identified audio type (operation 1602); this threshold is used to determine whether a respective frequency band is to be enhanced. Then, in the dialogue enhancer, based on the adjusted threshold, the frequency bands above the respective thresholds are selected (operation 1604) and enhanced (operation 1606).
In particular, the adjusting operation 1104 may comprise positively correlating the threshold with the confidence value of short-term music and/or noise and/or background sound, and/or negatively correlating the threshold with the confidence value of speech.
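Operations 1602 to 1606 can be sketched as follows. The specific scaling factors are assumptions; only the direction of each correlation comes from the embodiment:

```python
def adjust_threshold(base_threshold, conf_music=0.0, conf_noise=0.0,
                     conf_bkg=0.0, conf_speech=0.0):
    """Operation 1602 (sketch): raise the band threshold with the short-term
    music / noise / background confidences, lower it with the speech
    confidence. The multiplicative form and the 0.5 factor are illustrative."""
    raise_factor = 1.0 + max(conf_music, conf_noise, conf_bkg)
    lower_factor = 1.0 - 0.5 * conf_speech
    return base_threshold * raise_factor * lower_factor

def bands_to_enhance(band_energies, threshold):
    """Operations 1604/1606: pick the bands whose energy exceeds the threshold."""
    return [i for i, e in enumerate(band_energies) if e > threshold]
```

With a high speech confidence the threshold drops, so more bands qualify for enhancement; with confident music or noise it rises and fewer bands are touched.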
The audio processing method (in particular the dialogue enhancement processing) generally also comprises estimating the background level in the audio signal. This is usually realized by a minimum tracking unit 4022, which is implemented in the dialogue enhancer 402 and is used for SNR estimation or frequency-band threshold estimation. The application may also be used to adjust the background level. In this case, as shown in Figure 17, after the background level is estimated (operation 1702), it is first adjusted based on the confidence values of the audio types (operation 1704), and the adjusted background level is then used for SNR estimation and/or frequency-band threshold estimation (operation 1706). In particular, the adjusting operation 1104 may be configured to apply an adjustment amount to the estimated background level, and may further be configured to positively correlate the adjustment amount with the confidence value of short-term music and/or noise and/or background sound, and/or negatively correlate the adjustment amount with the confidence value of speech.
More specifically, the adjusting operation 1104 may be configured to correlate the adjustment amount more positively with the confidence value of noise and/or background than with that of short-term music.
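One plausible realization of operation 1704, under stated assumptions (the dB scale, the 6 dB ceiling and the 0.5 music weight are invented for illustration; only the relative ordering of the correlations is from the embodiment):

```python
def background_level_adjustment(conf_noise=0.0, conf_bkg=0.0,
                                conf_music=0.0, conf_speech=0.0,
                                max_delta_db=6.0):
    """Illustrative adjustment (in dB) applied to the minimum-tracked
    background level: noise/background weigh more than short-term music,
    and a high speech confidence pulls the adjustment toward zero."""
    up = max(conf_noise, conf_bkg, 0.5 * conf_music)  # music weighted less
    return max_delta_db * up * (1.0 - conf_speech)
```

The adjusted level would then feed the SNR and band-threshold estimation of operation 1706.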
Similar to the embodiments of the audio processing apparatus, on the one hand, any combination of the embodiments of the audio processing method and their variations is feasible; on the other hand, each aspect of the embodiments and their variations may also be a separate solution. In addition, any two or more of the solutions described in this subsection may be combined with each other, and these combinations may further be combined with any embodiment described or implied in Part 1 and in the other parts to be described later.
Part 3: Surround Sound Virtualizer Controller and Control Method
A surround sound virtualizer makes it possible to render a surround sound signal (such as multi-channel 5.1 or 7.1) over the internal loudspeakers of a PC or over headphones. That is, over stereo devices such as built-in laptop loudspeakers or headphones, the surround sound virtualizer creates a virtual surround effect for the user and provides a cinematic experience. A surround sound virtualizer usually employs head-related transfer functions (HRTFs) to simulate, at the ears, the sound waves originating from the various loudspeaker positions associated with the multi-channel audio signal.
Although existing surround sound virtualizers work well on headphones, on built-in loudspeakers they behave differently for different content. Generally, movie-like media content enables the surround sound virtualizer for loudspeakers, whereas music does not, because music may sound too thin.
Because the same parameters of the surround sound virtualizer cannot produce a good sound image for both movie-like media content and music content, the parameters need to be adjusted more accurately based on the content. This can be done with the available audio type information — in particular the music confidence value and the speech confidence value — as well as some other content-type information and context information.
Similar to Part 2, as a specific example of the audio processing apparatus and methods discussed in Part 1, the surround sound virtualizer 404 may use any of the embodiments discussed in Part 1 and any combination of those embodiments. In particular, in the case of controlling the surround sound virtualizer 404, the audio classifier 200 and the adjusting unit 300 of the audio processing apparatus 100 as shown in Figures 1 to 10 may constitute a surround sound virtualizer controller 1800 as shown in Figure 18. In this embodiment, since the adjusting unit is dedicated to the surround sound virtualizer 404, it may be referred to as 300B. Also, similar to Part 2, the audio classifier 200 may comprise at least one of the audio content classifier 202 and the audio context classifier 204, and the surround sound virtualizer controller 1800 may further comprise at least one of the type smoothing unit 712, the parameter smoothing unit 814 and the timer 916.
Therefore, in this part, the contents already described in Part 1 will not be repeated, and only some specific examples thereof are given.
For the surround sound virtualizer, the adjustable parameters include, but are not limited to, the start frequency of the surround sound virtualizer 404 and the surround boost amount.
3.1 Surround Boost Amount
With regard to the surround boost amount, the adjusting unit 300B may be configured to positively correlate the surround boost amount of the surround sound virtualizer 404 with the confidence value of noise and/or background and/or speech, and/or negatively correlate the surround boost amount with the confidence value of short-term music.
Specifically, in order to modify the surround sound virtualizer 404 so that music (the content type) sounds acceptable, an example implementation of the adjusting unit 300B may adjust the surround boost amount based on the short-term music confidence value, for example:
SB ∝ (1 – Conf_music)   (5)
where SB denotes the surround boost amount and Conf_music is the confidence value of short-term music. This helps to weaken the surround boost for music and prevents it from sounding muddy.
Similarly, the speech confidence value may also be utilized, for example:
SB ∝ (1 – Conf_music) · Conf_speech^α   (6)
where Conf_speech is the confidence value of speech and α is a weighting coefficient in exponential form, whose range may be 1 to 2. This formula indicates that the surround boost amount is high only for pure speech (high speech confidence and low music confidence).
Alternatively, only the confidence value of speech may be considered:
SB ∝ Conf_speech   (7)
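Formulas (5) to (7) translate directly into code. The proportionality constant is taken as 1 and α = 1.5 here purely for illustration:

```python
def surround_boost_eq5(conf_music):
    # SB ∝ (1 - Conf_music)                          -- formula (5)
    return 1.0 - conf_music

def surround_boost_eq6(conf_music, conf_speech, alpha=1.5):
    # SB ∝ (1 - Conf_music) * Conf_speech ** alpha   -- formula (6), 1 <= alpha <= 2
    return (1.0 - conf_music) * conf_speech ** alpha

def surround_boost_eq7(conf_speech):
    # SB ∝ Conf_speech                               -- formula (7)
    return conf_speech
```

Formula (6) yields a high boost only when the speech confidence is high and the music confidence is low, exactly as the text describes.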
Various modifications can be designed in a similar manner. In particular, for noise or background sound, formulas similar to formulas (5) to (7) can be constructed. Furthermore, the effects of the four content types may be jointly considered in any combination. In such a case, noise and background sound are ambient sounds and can more safely take a large boost amount; speech, assuming the speaker is usually located in front of the screen, may take a medium boost amount; and music takes a smaller boost amount. Therefore, the adjusting unit 300B may be configured to correlate the surround boost amount more positively with the confidence value of noise and/or background than with that of the content type speech.
Assuming that a desired boost amount (that is, equivalent to a weight) is predefined for each content type, another alternative formula may also be applied:
SB_est = (a_music·Conf_music + a_speech·Conf_speech + a_noise·Conf_noise + a_bkg·Conf_bkg) / (Conf_music + Conf_speech + Conf_noise + Conf_bkg)   (8)
where SB_est is the estimated boost amount, a_ContentType is the expected/predefined boost amount (weight) of a content type, Conf_ContentType is the confidence value of a content type, and bkg denotes background sound. Depending on the circumstances, a_music can (but need not) be set to 0, indicating that the surround sound virtualizer 404 is to be disabled for pure music (the content type).
From another perspective, a_ContentType in formula (8) is the expected/predefined boost amount of a content type, and the quotient of the confidence value of the respective content type divided by the sum of the confidence values of all identified content types can be regarded as the normalized weight of the predefined/expected boost amount of the respective content type. That is, the adjusting unit 300B may be configured to consider at least some of the multiple content types by weighting the predefined boost amounts of the multiple content types based on the confidence values.
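The normalized weighting of formula (8) — which is reused with context types in formula (9) below — can be sketched generically. The predefined amounts in the example call (0.9 for noise/background, 0.5 for speech, 0 for music) are illustrative, except that 0 for music follows the text:

```python
def weighted_boost(predefined, confidences):
    """Formula (8)/(9) pattern: confidence-weighted average of the
    predefined boost amounts; each confidence divided by the sum of all
    confidences acts as a normalized weight."""
    total = sum(confidences.values())
    if total == 0.0:
        return 0.0  # no type identified; no boost (assumption)
    return sum(predefined[t] * c for t, c in confidences.items()) / total

# Illustrative predefined amounts per content type:
AMOUNTS = {"music": 0.0, "speech": 0.5, "noise": 0.9, "bkg": 0.9}
```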
For the context types, the adjusting unit 300B may be configured to positively correlate the surround boost amount of the surround sound virtualizer 404 with the confidence value of movie-like media and/or game, and/or negatively correlate the surround boost amount with the confidence value of long-term music and/or VoIP. Formulas similar to formulas (5) to (8) can then be constructed.
As a special example, the surround sound virtualizer 404 may be enabled for pure movie-like media and/or game, but disabled for music and/or VoIP. Meanwhile, the boost amounts of the surround sound virtualizer 404 may be set differently for movie-like media and game: movie-like media takes a higher boost amount, and game takes a smaller one. Therefore, the adjusting unit 300B may be configured to correlate the surround boost amount more positively with the confidence value of movie-like media than with that of game.
Similar to the content types, the boost amount for the audio signal may also be set to a weighted average over the confidence values of the context types:
SB_est = (a_MOVIE·Conf_MOVIE + a_MUSIC·Conf_MUSIC + a_VOIP·Conf_VOIP + a_GAME·Conf_GAME) / (Conf_MOVIE + Conf_MUSIC + Conf_VOIP + Conf_GAME)   (9)
where SB_est is the estimated boost amount, a_ContextType is the expected/predefined boost amount (weight) of a context type, and Conf_ContextType is the confidence value of a context type. Depending on the circumstances, a_MUSIC and a_VOIP can (but need not) be set to 0, indicating that the surround sound virtualizer 404 is to be disabled for pure music (long-term music) and/or pure VoIP.
Likewise, similar to the content types, a_ContextType in formula (9) is the expected/predefined boost amount of a context type, and the quotient of the confidence value of the respective context type divided by the sum of the confidence values of all identified context types can be regarded as the normalized weight of the predefined/expected boost amount of the respective context type. That is, the adjusting unit 300B may be configured to consider at least some of the multiple context types by weighting the predefined boost amounts of the multiple context types based on the confidence values.
3.2 Start Frequency
Other parameters, such as the start frequency, can also be modified in the surround sound virtualizer. Generally, the high-frequency components of an audio signal are more suitable for spatial rendering. For example, in music, if the bass is spatially rendered with a strong surround effect, it will sound strange. Therefore, for a specific audio signal, the surround sound virtualizer needs to determine a frequency threshold, spatially rendering the components above it while keeping the components below it. This frequency threshold is the start frequency.
According to an embodiment of the present application, the start frequency of the surround sound virtualizer may be increased for music content, so that more bass can be retained for music signals. Therefore, the adjusting unit 300B may be configured to positively correlate the start frequency of the surround sound virtualizer with the confidence value of short-term music.
3.3 Combination of Embodiments and Application Scenarios
Similar to Part 1, all of the embodiments discussed above and their variations may be realized in any combination thereof, and any component mentioned in different parts/embodiments but having the same or similar function may be implemented as the same component or as separate components.
For example, any two or more of the solutions described in subsections 3.1 to 3.2 may be combined with each other, and these combinations may further be combined with any embodiment described or implied in Part 1, Part 2 and the other parts to be described later.
3.4 Surround Sound Virtualizer Control Methods
As in Part 1, in describing the surround sound virtualizer controller in the embodiments above, some processes and methods are obviously also disclosed. A summary of these methods is given below without repeating the details already discussed.
First, the embodiments of the audio processing method discussed in Part 1 may be used for a surround sound virtualizer, whose parameters are among the targets to be adjusted by the audio processing method. From this point of view, the audio processing method is also a surround sound virtualizer control method.
In this subsection, only those aspects specific to the control of the surround sound virtualizer will be discussed. For general aspects of the control method, reference may be made to Part 1.
According to an embodiment, the audio processing method may further comprise surround sound virtualization processing, and the adjusting operation 1104 may be configured to positively correlate the surround boost amount of the surround sound virtualization processing with the confidence value of noise and/or background and/or speech, and/or negatively correlate the surround boost amount with the confidence value of short-term music.
Specifically, the adjusting operation 1104 may be configured to correlate the surround boost amount more positively with the confidence value of noise and/or background than with that of the content type speech.
Alternatively or additionally, the surround boost amount may also be adjusted based on the confidence value of the context. Specifically, the adjusting operation 1104 may be configured to positively correlate the surround boost amount of the surround sound virtualization processing with the confidence value of movie-like media and/or game, and/or negatively correlate the surround boost amount with the confidence value of long-term music and/or VoIP.
More specifically, the adjusting operation 1104 may be configured to correlate the surround boost amount more positively with the confidence value of movie-like media than with that of game.
Another parameter to be adjusted is the start frequency of the surround sound virtualization processing. As shown in Figure 19, the start frequency is first adjusted based on the confidence value of the audio type (operation 1902), and then the audio components above the start frequency are processed by the surround sound virtualization (operation 1904). Specifically, the adjusting operation 1104 may be configured to positively correlate the start frequency of the surround sound virtualization processing with the confidence value of short-term music.
Similar to the embodiments of the audio processing apparatus, on the one hand, any combination of the embodiments of the audio processing method and their variations is feasible; on the other hand, each aspect of the embodiments and their variations may also be a separate solution. In addition, any two or more of the solutions described in this subsection may be combined with each other, and these combinations may further be combined with any embodiment described or implied in the other parts of the present disclosure.
Part 4: Volume Leveler Controller and Control Method
The volume of different audio sources, or of different segments within the same audio source, sometimes varies greatly. This is troublesome because the user then has to adjust the volume frequently. A volume leveler (VL) aims to regulate the volume of the audio content being played back and to keep it consistent over the timeline based on a target loudness value. Example volume levelers are given in "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal" by A. J. Seefeldt et al., published as US 2009/0097676 A1; "Audio Gain Control Using Specific-Loudness-Based Auditory Event Detection" by B. G. Crockett et al., published as WO 2007/127023 A1; and "Audio Processing Using Auditory Scene Analysis and Spectral Skewness" by A. Seefeldt et al., published as WO 2009/011827 A1. The entire contents of these three documents are incorporated herein by reference.
A volume leveler continuously measures the loudness of the audio signal in some manner and then modifies the signal by a gain amount, which is a scaling factor for modifying the loudness of the audio signal and is typically a function of the measured loudness, the desired target loudness and some other factors. Multiple factors need to be considered to estimate a suitable gain, under the constraint of reaching the target loudness while preserving the dynamic range. A volume leveler generally comprises several sub-elements, such as automatic gain control (AGC), auditory event detection and dynamic range control (DRC).
A control signal is usually used in a volume leveler to control the "gain" of the audio signal. For example, the control signal may be an indication of amplitude changes of the audio signal derived by pure signal analysis. The control signal may also be an auditory event indication, derived through psychoacoustic analysis such as auditory scene analysis or specific-loudness-based auditory event detection, indicating whether a new auditory event occurs. Such a control signal is used in the volume leveler for gain control, for example by ensuring that the gain is nearly constant within an auditory event and by confining most of the gain change to the neighborhood of event boundaries, so as to reduce possible audible distortion caused by rapid gain changes in the audio signal.
However, the common methods of deriving the control signal cannot distinguish informative auditory events from non-informative (interfering) ones. Here, an informative auditory event denotes an audio event that contains meaningful information and may be of more interest to the user, such as dialogue and music, whereas a non-informative signal contains no information meaningful to the user, such as the noise in VoIP. As a result, a large gain may also be applied to a non-informative signal, raising it close to the target loudness. This would be unpleasant in some applications. For example, in a VoIP call, the noise occurring in conversation pauses is often boosted to a loud volume after being processed by the volume leveler, which is undesirable for the user.
To solve this problem at least in part, the present application proposes controlling the volume leveler based on the embodiments discussed in Part 1.
Similar to Parts 2 and 3, as a specific example of the audio processing apparatus and methods discussed in Part 1, the volume leveler 406 may use any of the embodiments discussed in Part 1 and any combination of those embodiments. In particular, in the case of controlling the volume leveler 406, the audio classifier 200 and the adjusting unit 300 of the audio processing apparatus 100 as shown in Figures 1 to 10 may constitute a controller 2000 of the volume leveler 406 as shown in Figure 20. In this embodiment, since the adjusting unit is dedicated to the volume leveler 406, it may be referred to as 300C.
That is, based on the disclosure of Part 1, the volume leveler controller 2000 may comprise: an audio classifier 200 for continuously identifying the audio type (such as content type and/or context type) of the audio signal; and an adjusting unit 300C for adjusting the volume leveler in a continuous manner based on the confidence value of the identified audio type. Similarly, the audio classifier 200 may comprise at least one of the audio content classifier 202 and the audio context classifier 204, and the volume leveler controller 2000 may further comprise at least one of the type smoothing unit 712, the parameter smoothing unit 814 and the timer 916.
Therefore, in this part, the contents already described in Part 1 will not be repeated, and only some specific examples thereof are given.
Different parameters of the volume leveler 406 can be adaptively adjusted based on the classification results. For example, the parameters directly related to the dynamic gain or to the range of the dynamic gain can be adjusted by reducing the gain of non-informative signals. A parameter indicating the degree to which the signal is a new perceivable audio event can also be adjusted, which in turn indirectly controls the dynamic gain (the gain will change slowly within an audio event but may change rapidly at the boundary between two audio events). In this application, several embodiments of parameter adjustment, or of volume leveler control mechanisms, are presented.
4.1 Informative and Interfering Content Types
As mentioned above, in relation to the control of the volume leveler, the audio content types may be classified into informative content types and interfering content types, and the adjusting unit 300C may be configured to positively correlate the dynamic gain of the volume leveler with the informative content types of the audio signal, and negatively correlate the dynamic gain of the volume leveler with the interfering content types of the audio signal.
As an example, considering that noise is interfering (non-informative) and that boosting noise to a loud volume is unpleasant, a parameter directly controlling the dynamic gain, or a parameter indicating a new audio event, can be set proportional to a decreasing function of the noise confidence value (Conf_noise), for example:
GainControl ∝ 1 – Conf_noise   (10)
Here, for simplicity, the symbol GainControl is used to denote all parameters related to gain control in the volume leveler, since different implementations of volume levelers may use different parameter names with different underlying meanings. Using the single term GainControl keeps the expression brief without losing generality. In essence, adjusting these parameters amounts to applying a linear or non-linear weight to the original gain. As one example, GainControl may be used directly to scale the gain, so that the gain is small if GainControl is small. As another specific example, as described in "Audio Gain Control Using Specific-Loudness-Based Auditory Event Detection" by B. G. Crockett et al., published as WO 2007/127023 A1 (the entire contents of which are incorporated herein by reference), the gain is indirectly controlled by using GainControl to scale the event control signal. In this case, when GainControl is small, the control of the gain of the volume leveler is modified to prevent the gain from changing significantly over time; when GainControl is large, the control is modified so that the gain of the leveler can change more freely.
With the gain control described in formula (10) (whether scaling the original gain directly or scaling the event control signal), the dynamic gain of the audio signal is related (linearly or non-linearly) to the noise confidence value. If the signal is noise with a high confidence value, the final gain will be small due to the factor (1 – Conf_noise). In this way, the noise signal is prevented from being boosted to an unpleasantly loud volume.
As an example variation of formula (10), if background sounds are also of no interest in the application (such as in VoIP), they can be treated similarly and a small gain applied to them as well. The control function may consider both the confidence value of noise (Conf_noise) and the confidence value of background (Conf_bkg), for example:
GainControl ∝ (1 – Conf_noise) · (1 – Conf_bkg)   (11)
In the above formula, since noise and background sounds are both undesired, GainControl is equally affected by the confidence value of noise and the confidence value of background, and noise and background sounds may be regarded as having the same weight. Depending on the circumstances, they may instead have different weights. For example, different coefficients or different exponents (α and γ) may be applied to the confidence value of noise and the confidence value of background sound (or to their differences from 1). That is, formula (11) may be rewritten as:
GainControl ∝ (1 – Conf_noise)^α · (1 – Conf_bkg)^γ   (12)
or
GainControl ∝ (1 – Conf_noise^α) · (1 – Conf_bkg^γ)   (13)
Alternatively, the adjusting unit 300C may be configured to consider at least one dominant content type based on the confidence values, for example:
GainControl ∝ 1 – max(Conf_noise, Conf_bkg)   (14)
Both formula (11) (and its variations) and formula (14) apply a small gain to noise signals and background sound signals, and retain the original behavior of the volume leveler (with GainControl close to 1) only when both the noise confidence and the background confidence are small, such as in speech signals and music signals.
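Formulas (10), (12) and (14) are direct to implement; the proportionality constant is taken as 1 here:

```python
def gain_control_eq10(conf_noise):
    # GainControl ∝ 1 - Conf_noise                              -- formula (10)
    return 1.0 - conf_noise

def gain_control_eq12(conf_noise, conf_bkg, alpha=1.0, gamma=1.0):
    # GainControl ∝ (1 - Conf_noise)^alpha * (1 - Conf_bkg)^gamma  -- formula (12)
    return (1.0 - conf_noise) ** alpha * (1.0 - conf_bkg) ** gamma

def gain_control_eq14(conf_noise, conf_bkg):
    # GainControl ∝ 1 - max(Conf_noise, Conf_bkg)               -- formula (14)
    return 1.0 - max(conf_noise, conf_bkg)
```

For a confident speech or music signal (both confidences near 0) every variant returns a value near 1, leaving the leveler's default behavior intact.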
The above examples consider the dominant interfering content type. Depending on the circumstances, the adjusting unit 300C may also be configured to consider the dominant informative content type based on the confidence values. More generally, the adjusting unit 300C may be configured to consider at least one dominant content type based on the confidence values, regardless of whether the identified audio types are/comprise informative audio types and/or interfering audio types.
As another example variation of formula (10), assuming that speech signals are the most informative content and require the least modification of the default behavior of the volume leveler, the control function may consider both the noise confidence value (Conf_noise) and the speech confidence value (Conf_speech), for example:
GainControl ∝ 1 – Conf_noise · (1 – Conf_speech)   (15)
With this function, GainControl is small only for signals with high noise confidence and low speech confidence (for example, pure noise), and if the speech confidence is high, GainControl will be close to 1 (thus maintaining the original behavior of the volume leveler). More generally, this can be regarded as modifying the weight of one content type (e.g. Conf_noise) according to the confidence value of at least one other content type (e.g. Conf_speech). In formula (15) above, the confidence of speech modifies the weighting coefficient of the noise confidence (a different kind of weight from those in formulas (12) and (13)). In other words, in formula (10) the coefficient of Conf_noise can be regarded as 1, whereas in formula (15) some other audio type (such as speech, but not limited thereto) influences the importance of the confidence value of noise, so that the weight of Conf_noise is modified by the confidence value of speech. In the context of the present disclosure, the term "weight" should be construed as covering this: it indicates the importance of a value but is not necessarily normalized. Reference may be made to subsection 1.4.
From another perspective, similar to formulas (12) and (13), weights in exponential form may be applied to the confidence values in the above function to represent the priority (or importance) of the different audio signals; for example, formula (15) may be changed into:
GainControl ∝ 1 – Conf_noise^α · (1 – Conf_speech)^γ   (16)
where α and γ are two weights, which can be set smaller if a faster response to the modification of the leveler parameters is desired.
Formulas (10) to (16) can be freely combined to form various control functions that may be suitable for different applications. The confidence values of other audio content types, such as the music confidence value, can also easily be incorporated into the control functions in a similar manner.
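The speech-weighted variants read as follows (again with proportionality constant 1):

```python
def gain_control_eq15(conf_noise, conf_speech):
    # GainControl ∝ 1 - Conf_noise * (1 - Conf_speech)  -- formula (15)
    # The speech confidence modifies the weight of the noise confidence.
    return 1.0 - conf_noise * (1.0 - conf_speech)

def gain_control_eq16(conf_noise, conf_speech, alpha=1.0, gamma=1.0):
    # GainControl ∝ 1 - Conf_noise^alpha * (1 - Conf_speech)^gamma  -- formula (16)
    return 1.0 - conf_noise ** alpha * (1.0 - conf_speech) ** gamma
```

Pure noise (high Conf_noise, low Conf_speech) yields a small GainControl, while a high speech confidence restores the leveler's default behavior even when the noise confidence is high.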
In the case where GainControl is used to adjust a parameter indicating the degree to which the signal is a new perceivable audio event — which then indirectly controls the dynamic gain (the gain changes slowly within an audio event but may change rapidly at the boundary between two audio events) — there can be considered to be a further transfer function between the confidence values of the content types and the final dynamic gain.
4.2 Content Types in Different Contexts
The control functions of formulas (10) to (16) consider the confidence values of audio content types, such as noise, background sound, short-term music and speech, but do not consider the audio context from which the sound originates, such as movie-like media and VoIP. The same audio content type, such as background sound, may need to be treated differently in different audio contexts. Background sounds comprise various sounds, such as car engines, explosions and applause. In VoIP, a background signal is probably meaningless, but in movie-like media it may be important. This indicates the need to identify the audio context of interest and to design different control functions for different audio contexts.
Accordingly, the adjusting unit 300C may be configured to regard a content type of the audio signal as informative or interfering based on the context type of the audio signal. For example, considering the noise confidence value and the background confidence value and distinguishing between the VoIP context and the non-VoIP context, an audio-context-dependent control function may be:
if the audio context is VoIP:
GainControl ∝ 1 – max(Conf_noise, Conf_bkg)
otherwise:   (17)
GainControl ∝ 1 – Conf_noise
That is, in the VoIP context, noise and background sounds are regarded as interfering content types, whereas in the non-VoIP context, background sound is regarded as an informative content type.
As another example, considering the confidence values of speech, noise and background and distinguishing between the VoIP context and the non-VoIP context, an audio-context-dependent control function may be:
if the audio context is VoIP:
GainControl ∝ 1 – max(Conf_noise, Conf_bkg)
otherwise:   (18)
GainControl ∝ 1 – Conf_noise · (1 – Conf_speech)
Here, speech is emphasized as an informative content type. Assuming that in the non-VoIP context music is also important informative content, the second part of formula (18) can be extended to:
GainControl ∝ 1 – Conf_noise · (1 – max(Conf_speech, Conf_music))   (19)
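The context switch of formulas (18) and (19) can be sketched as one function (the string label for the context is an assumption of this sketch):

```python
def gain_control_in_context(context, conf_noise, conf_bkg,
                            conf_speech=0.0, conf_music=0.0):
    """Formulas (18)/(19): in VoIP both noise and background sound are
    interfering; outside VoIP, speech (and, per formula (19), music) is
    treated as informative."""
    if context == "VoIP":
        return 1.0 - max(conf_noise, conf_bkg)                      # (18), VoIP branch
    return 1.0 - conf_noise * (1.0 - max(conf_speech, conf_music))  # (19)
```

In VoIP, confident background sound suppresses the gain; in a non-VoIP context the same background sound leaves the leveler untouched unless it is also noise-like.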
In fact, each of the control functions (10) to (16), or variations thereof, may be applied to a different/corresponding audio context. Therefore, a large number of combinations can be produced to form audio-context-dependent control functions.
Besides the VoIP and non-VoIP contexts distinguished and utilized in formulas (17) and (18), other audio contexts may be utilized in a similar manner, such as movie-like media, long-term music and game, or low-quality audio and high-quality audio.
4.3 Context Types
The context type can also be used directly to control the volume leveller, so that unpleasant sounds (such as noise) are not boosted too much. For example, the VoIP confidence value can be used to steer the volume leveller, making it less sensitive when the VoIP confidence value is high.

Specifically, given the VoIP confidence value Conf_VOIP, the level of the volume leveller can be set proportional to (1 - Conf_VOIP). That is, the volume leveller is almost deactivated for VoIP content (when the VoIP confidence value is high), which is consistent with the traditional manual setting (preset) of disabling the volume leveller for the VoIP context.
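As a minimal sketch (illustrative names, not the patent's implementation), scaling the leveller amount by (1 - Conf_VOIP) looks like:

```python
def leveller_amount(conf_voip, base_amount=1.0):
    """Scale the volume leveller amount by (1 - Conf_VOIP), so the leveller
    is almost deactivated when the VoIP confidence value is high."""
    return base_amount * (1.0 - conf_voip)
```

A confidence of 1 reproduces the traditional preset of disabling the leveller for VoIP, while intermediate confidences give a continuous transition instead of a hard switch.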
Alternatively, different dynamic gain ranges can be set for different contexts of the audio signal. In general, the VL (volume leveller) amount further adjusts the amount of gain applied to the audio signal, and can be viewed as another (non-linear) weight on the gain. In one embodiment, the setting can be:

Table 1

|           | Movie-like media | Long-term music | VoIP             | Game |
| VL amount | High             | Medium          | Off (or minimal) | Low  |
Further, suppose a desired VL amount is predefined for each context type. For example, the VL amount is set to 1 for movie-like media, 0 for VoIP, 0.6 for music, and 0.3 for game, although the application is not limited thereto. In this example, if the range of the dynamic gain for movie-like media is 100%, then the range of the dynamic gain for music is 60%, and so on. If the classification of the audio classifier 200 is based on hard decisions, the range of the dynamic gain can be set directly as in the above example. If the classification of the audio classifier 200 is based on soft decisions, the range of the dynamic gain can be adjusted based on the confidence value of the context type.
Similarly, when the audio classifier 200 identifies multiple context types from the audio signal, the adjusting unit 300C may be configured to adjust the range of the dynamic gain by weighting the confidence values of the multiple context types based on their importance.
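A sketch of the soft-decision case, using the example VL amounts from the text (movie-like media 1, VoIP 0, music 0.6, game 0.3); the dictionary keys and the 10 dB full range are illustrative assumptions:

```python
# Desired VL amounts per context type, taken from the example above.
VL_AMOUNT = {"movie": 1.0, "voip": 0.0, "music": 0.6, "game": 0.3}

def dynamic_gain_range(confidences, full_range_db=10.0):
    """Weight the per-context VL amounts by the classifier's confidence
    values (importance weighting) to get the allowed dynamic-gain range."""
    total = sum(confidences.values())
    if total == 0.0:
        return 0.0  # no confident context: apply no dynamic gain
    amount = sum(VL_AMOUNT[c] * conf for c, conf in confidences.items()) / total
    return amount * full_range_db
```

A hard decision is the special case in which one confidence value is 1 and the rest are 0.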
In general, for context types, functions similar to formulas (10) to (16), with the content types therein replaced by context types, can also be used here to adaptively set a suitable VL amount. In fact, Table 1 reflects the importance of the different context types.
From another perspective, the confidence values can be used to derive the normalized weights discussed in Section 1.4. Assuming a specific amount is predefined for each context type in Table 1, a formula similar to formula (9) can then be applied. Incidentally, a similar solution can also be applied to multiple content types and any other audio types.
4.4 Combination of embodiments and application scenarios

As in Part 1, all of the embodiments above and their variants may be implemented in any combination thereof, and any components mentioned in different parts/embodiments but having the same or similar functions may be implemented as the same component or as separate components. For example, any two or more of the solutions described in Sections 4.1 to 4.3 can be combined with one another. And these combinations can be further combined with any embodiment described or implied in Parts 1 to 3 and in the other parts to be described later.
Figure 21 shows the effect of the volume leveller controller proposed herein, by comparing an original short-term segment (Figure 21(A)), the short-term segment processed by a conventional volume leveller with unchanged parameters (Figure 21(B)), and the short-term segment processed by the volume leveller proposed in this application (Figure 21(C)). As can be seen, in the conventional volume leveller shown in Figure 21(B), the volume of the noise (the latter half of the audio signal) is boosted as well, which is unpleasant. By contrast, in the new volume leveller shown in Figure 21(C), the volume of the effective part of the audio signal is boosted without obviously boosting the volume of the noise, giving the audience a good experience.
4.5 Volume leveller control methods

As in Part 1, in describing the volume leveller controller in the embodiments above, some processes and methods are obviously also disclosed. Hereinafter, a summary of these methods is given without repeating the details already discussed.

First, the embodiments of the audio processing method discussed in Part 1 can be used for a volume leveller, whose parameters are among the targets to be adjusted by the audio processing method. In this respect, the audio processing method is also a volume leveller control method.

In this section, only those aspects specific to the control of the volume leveller are discussed. For general aspects of the control method, reference may be made to Part 1.

According to the application, a volume leveller control method is provided, comprising: identifying the content type of an audio signal in real time; and adjusting the volume leveller in a continuous manner based on the identified content type, by positively correlating the dynamic gain of the volume leveller with informative content types of the audio signal, and negatively correlating the dynamic gain of the volume leveller with interfering content types of the audio signal.
The content types may comprise speech, short-term music, noise and background sound. Generally, noise is regarded as an interfering content type.

When adjusting the dynamic gain of the volume leveller, the dynamic gain may be adjusted directly based on the confidence value of the content type, or through a transfer function of the confidence value of the content type.
As has been described, an audio signal may be classified into multiple audio types simultaneously. When multiple content types are involved, the adjusting operation 1104 may be configured to consider at least some of the multiple audio content types by weighting the confidence values of the multiple content types based on their importance, or by weighting the effects of the multiple content types based on the confidence values. In particular, the adjusting operation 1104 may be configured to consider at least one dominant content type based on the confidence values. When the audio signal comprises both interfering content types and informative content types, the adjusting operation may be configured to consider at least one dominant interfering content type and/or at least one dominant informative content type based on the confidence values.

Different audio types may influence one another. The adjusting operation 1104 may therefore be configured to modify the weight of one content type with the confidence value of at least one other content type.
As described in Part 1, the confidence values of the audio types of the audio signal may be smoothed. For details of the smoothing operation, refer to Part 1.

The method may further comprise identifying the context type of the audio signal, wherein the adjusting operation 1104 may be configured to adjust the range of the dynamic gain based on the confidence value of the context type.
The role of a content type is constrained by the context type in which it resides. Therefore, when both content-type information and context-type information are obtained for the audio signal (for the same audio segment) simultaneously, the content type of the audio signal may be determined to be informative or interfering based on the context type of the audio signal. In addition, depending on the context type of the audio signal, different weights may be assigned to a given content type in audio signals of different context types. From another perspective, different weights (larger or smaller, positive or negative) may be used to reflect the informative or interfering nature of a content type.

The context types of the audio signal may comprise VoIP, movie-like media, long-term music and game. In an audio signal of the VoIP context type, background sound is regarded as an interfering content type, whereas in an audio signal of a non-VoIP context type, background sound and/or speech and/or music are regarded as informative content types. Other context types may comprise high-quality audio or low-quality audio.
As with multiple content types, when the audio signal is simultaneously classified (for the same audio segment) into multiple context types with corresponding confidence values, the adjusting operation 1104 may be configured to consider at least some of the multiple context types by weighting their confidence values based on the importance of the multiple context types, or by weighting the effects of the multiple context types based on the confidence values. In particular, the adjusting operation may be configured to consider at least one dominant context type based on the confidence values.

Finally, the embodiments of the method described in this section may use the audio classification methods to be discussed in Parts 6 and 7; their detailed description is omitted here.
As with the embodiments of the audio processing apparatus, on the one hand, any combination of the embodiments of the audio processing method and their variants is feasible; on the other hand, each aspect of the embodiments of the audio processing method and their variants may be a separate solution. In addition, any two or more of the solutions described in this section may be combined with one another, and these combinations may be further combined with any embodiment described or implied in the other parts of the present disclosure.
Part 5: Equalizer controller and control methods

An equalizer is commonly applied to a music signal to adjust or modify its spectral balance, also known as "tone" or "timbre". A traditional equalizer allows a user to configure the overall profile (curve or shape) of the frequency response (gain) on each individual frequency band, in order to emphasize certain sounds or remove undesired sounds. Popular music players, such as Windows Media Player, commonly provide a graphic equalizer for adjusting the gain at each frequency band, and provide a set of presets for different music genres, such as rock, rap, jazz and folk, so as to obtain the best experience when listening to music of different genres. Once a preset is selected or a profile is set, the same equalization gains are applied to the signal until the profile is modified manually.

By contrast, a dynamic equalizer provides a way to automatically adjust the equalization gains at each frequency band, in order to maintain overall consistency of the spectral balance associated with a desired tone or timbre. This consistency is achieved by continuously monitoring the spectral balance of the audio, comparing it with a desired preset spectral balance, and dynamically adjusting the applied equalization gains so as to transform the original spectral balance of the audio into the desired spectral balance. The desired spectral balance is manually selected or preset before processing.
Both kinds of equalizer share a drawback: the optimal equalization profile, the desired spectral balance, or the related parameters must be selected manually, and they cannot be changed dynamically based on the audio content being played back. Distinguishing audio content types is very important for providing overall high quality for different kinds of audio signals. For example, different musical pieces need different equalization profiles, such as profiles for music of different genres.

In an equalizer system into which various audio signals (not only music) may be input, the equalizer parameters need to be adjusted based on the content type. For example, an equalizer is usually enabled for music signals but disabled for speech signals, because equalization may change the timbre of speech too much and correspondingly make the signal sound unnatural.

To address this problem at least in part, the present application proposes controlling the equalizer based on the embodiments discussed in Part 1.
As in Parts 2 to 4, as a specific example of the audio processing apparatus and methods discussed in Part 1, the equalizer 408 may use all of the embodiments discussed in Part 1 and any combinations of those embodiments. Specifically, in the case of controlling the equalizer 408, the audio classifier 200 and the adjusting unit 300 of the audio processing apparatus 100 shown in Figures 1 to 10 may constitute the controller 2200 of the equalizer 408 as shown in Figure 22. In this embodiment, since the adjusting unit is dedicated to the equalizer 408, it may be referred to as 300D.

That is, based on the disclosure of Part 1, the equalizer controller 2200 may comprise: an audio classifier 200 for continuously identifying the audio type of an audio signal; and an adjusting unit 300D for adjusting the equalizer in a continuous manner based on the confidence value of the identified audio type. Similarly, the audio classifier 200 may comprise at least one of an audio content classifier 202 and an audio context classifier 204, and the equalizer controller 2200 may further comprise at least one of a type smoothing unit 712, a parameter smoothing unit 814 and a timer 916.

Therefore, in this part, the contents already described in Part 1 are not repeated, and only some specific examples thereof are given.
5.1 Control based on content type

In general, for common audio content types such as music, speech, background sound and noise, the equalizer should be set differently for different content types. Similar to the traditional setting, the equalizer can be automatically enabled for music signals but disabled for speech; or, in a more continuous manner, a high equalization level can be set for music signals and a low equalization level for speech signals. In this way, the equalization level of the equalizer can be set automatically for the audio content.

Especially for music, it is observed that the equalizer does not work well on musical pieces with a dominant source, because if an unsuitable equalization is applied, the timbre of the dominant source may change significantly and sound unnatural. In view of this, it is preferable to set a low equalization level on musical pieces with a dominant source, while keeping a high equalization level on musical pieces without a dominant source. With this information, the equalizer can automatically set the equalization level for different music content.
Music can also be classified based on different attributes, such as genre, instrument, and general characteristics including rhythm, tempo and timbre. Just as different equalizer presets are used for different music genres, these music groups/types can each have their own optimal equalization profile or equalizer curve (in a traditional equalizer) or optimal desired spectral balance (in a dynamic equalizer).
As discussed before, the equalizer is usually enabled for music content but disabled for speech, because the equalizer may make dialogs sound bad due to timbre change. One way to achieve this automatically is to relate the equalization level to the content, specifically to the music confidence value and/or the speech confidence value obtained from the audio content classification module. Here, the equalization level can be interpreted as the weight of the applied equalization gains: the higher the level, the stronger the equalization applied. For this example, if the equalization level is 1, the full equalization profile is applied; if the equalization level is 0, all the gains are correspondingly 0 dB and thus no equalization is applied. The equalization level can be represented with different parameters in different implementations of the equalizer algorithm. One exemplary embodiment of this parameter is the equalizer weight as implemented in A. Seefeldt et al., US 2009/0097676 A1, "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal", the entire content of which is incorporated herein by reference.
Various control schemes can be designed to adjust the equalization level. For example, using the audio content type information, the speech confidence value or the music confidence value can be used to set the equalization level, such as:

L_eq ∝ Conf_music    (20)

or

L_eq ∝ 1 - Conf_speech    (21)

where L_eq is the equalization level, and Conf_music and Conf_speech represent the confidence values of music and speech.

That is, the adjusting unit 300D may be configured to positively correlate the equalization level with the confidence value of short-term music, or to negatively correlate the equalization level with the confidence value of speech.
The speech confidence value and the music confidence value can also be used jointly to set the equalization level. The general idea is that the equalization level should be high only when the music confidence value is high and the speech confidence value is low; otherwise the equalization level is low. For example:

L_eq = Conf_music·(1 - Conf_speech^α)    (22)

where the exponent α is applied to the speech confidence value in order to handle the non-zero speech confidence that may often occur in music signals. With this formula, a pure music signal without any speech component is fully equalized (the level equals 1). As stated in Part 1, α can be regarded as a weighting coefficient based on the importance of the content types, and can generally be set to 1 to 2.
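Formula (22) is a one-liner; the sketch below (illustrative names) shows the effect of the exponent α on the speech confidence value:

```python
def eq_level(conf_music, conf_speech, alpha=2.0):
    """Equalization level per formula (22):
    L_eq = Conf_music * (1 - Conf_speech ** alpha).
    alpha in [1, 2] de-emphasizes the small non-zero speech confidence
    that often occurs in music signals."""
    return conf_music * (1.0 - conf_speech ** alpha)
```

A pure music signal (conf_music = 1, conf_speech = 0) gets level 1; with alpha = 2, a residual speech confidence of 0.3 reduces the level by only 9% instead of 30%.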
If a larger weight is to be set on the confidence value of speech, the adjusting unit 300D may be configured to disable the equalizer 408 when the confidence value of the speech content type is greater than a threshold.
In the above description, the music and speech content types are used as examples. Alternatively or additionally, the confidence values of background sound and/or noise may also be considered. Specifically, the adjusting unit 300D may be configured to positively correlate the equalization level with the confidence value of background sound, and/or to negatively correlate the equalization level with the confidence value of noise.

As another embodiment, the confidence values can be used to derive normalized weights as described in Section 1.4. Assuming a desired equalization level is predefined for each content type (for example, 1 for music, 0 for speech, and 0.5 for noise and background sound), a formula similar to formula (8) can be applied.
The equalization level can also be smoothed, to avoid rapid changes that may introduce audible distortion at transition points. The parameter smoothing unit 814 as described in Section 1.5 can be used for this purpose.
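The smoothing itself can be a simple one-pole filter in the spirit of Section 1.5. This is a sketch with an assumed smoothing coefficient, not the exact behavior of the parameter smoothing unit 814:

```python
def smooth_level(prev_smoothed, new_level, alpha=0.9):
    """One-pole smoothing of the equalization level:
    smoothed(t) = alpha * smoothed(t-1) + (1 - alpha) * level(t).
    A larger alpha gives slower, less audible transitions."""
    return alpha * prev_smoothed + (1.0 - alpha) * new_level
```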
5.2 Probability of a dominant source in music

In order to avoid applying a high equalization level to music with a dominant source, the equalization level can also be related to the confidence value Conf_dom indicating whether a musical piece contains a dominant source, such as:

L_eq = 1 - Conf_dom    (23)

In this way, the equalization level is low for musical pieces with a dominant source and high for musical pieces without a dominant source.

Here, although the confidence value of music with a dominant source is described, the confidence value of music without a dominant source can also be used. That is, the adjusting unit 300D may be configured to positively correlate the equalization level with the confidence value of short-term music without a dominant source, and/or to negatively correlate the equalization level with the confidence value of short-term music with a dominant source.
As described in Section 1.1, although speech on the one hand, and music with or without a dominant source on the other hand, are content types on different hierarchical levels, they can be considered in parallel. By jointly considering the dominant-source confidence value together with the speech and music confidence values as described above, the equalization level can be set by combining at least one of formulas (20) to (21) with formula (23). One example merges all three formulas:

L_eq = Conf_music·(1 - Conf_speech)·(1 - Conf_dom)    (24)

To be more general, different weights based on the importance of the content types can also be applied to the different confidence values, for example in the manner of formula (22).
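Formula (24) can be sketched as follows (illustrative names); each factor can only pull the level down:

```python
def eq_level_with_dominant(conf_music, conf_speech, conf_dom):
    """Formula (24): L_eq = Conf_music * (1 - Conf_speech) * (1 - Conf_dom).
    Speech content or a dominant source in the music lowers the level."""
    return conf_music * (1.0 - conf_speech) * (1.0 - conf_dom)
```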
As another example, assuming that Conf_dom is computed only when the audio signal is music, a piecewise function can be set, such as:

L_eq = 1 - Conf_dom,                 if Conf_music > threshold
L_eq = Conf_music·(1 - Conf_speech), otherwise    (25)

That is, if the classification system is fairly certain that the audio is music (the music confidence value is greater than a threshold), the function sets the equalization level based on the dominant-source confidence value; otherwise, the equalization level is set based on the music and speech confidence values. In other words, the adjusting unit 300D may be configured to consider short-term music with/without a dominant source only when the confidence value of short-term music is greater than a threshold. Of course, the first or second half of formula (25) can be modified in the manner of formulas (20) to (24).
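Under the stated assumption that Conf_dom is only available when the signal is music, the piecewise function (25) can be sketched as follows (the threshold value is an illustrative choice):

```python
def eq_level_piecewise(conf_music, conf_speech, conf_dom, threshold=0.9):
    """A sketch of the piecewise function (25)."""
    if conf_music > threshold:
        # Fairly certain it is music: rely on the dominant-source confidence.
        return 1.0 - conf_dom
    # Otherwise fall back to the music/speech confidences, as in formula (22).
    return conf_music * (1.0 - conf_speech)
```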
The same smoothing scheme as described in Section 1.5 can also be applied, and the time constant α can further be set based on the transition type, such as a transition from music with a dominant source to music without a dominant source, or a transition from music without a dominant source to music with a dominant source. For this purpose, a formula similar to formula (4') can also be applied.
5.3 Equalizer presets

Besides adaptively adjusting the equalization level based on the confidence values of the audio content types, appropriate equalization profile presets or desired spectral balance presets can be automatically selected for different audio content, depending on its genre, instrument or other characteristics. Music of the same genre, music containing the same instrument, or music with the same musical attributes can share the same equalization profile preset or desired spectral balance preset.

For generality, the term "music type" is used here to represent a group of music with the same genre, the same instrument or similar musical attributes, and such groups can be regarded as another hierarchical level of the audio content types described in Section 1.1. An appropriate equalization profile, equalization level, and/or desired spectral balance preset can be associated with each music type. The equalization profile is the gain curve applied on the music signal, and it can be any one of the equalizer presets used for different music genres (such as classical, rock, jazz and folk). The desired spectral balance preset represents the desired timbre for each type. Figure 23 shows several examples of the desired spectral balance presets as implemented in Dolby Home Theater technology. Each example describes a desired spectral shape over the full audible frequency range. This shape is continuously compared with the spectral shape of the input audio, and the equalization gains are computed from this comparison so as to transform the spectral shape of the input audio into the preset spectral shape.
For a new musical piece, the closest type can be determined (hard decision), or a confidence value with respect to each music type can be computed (soft decision). Based on this information, the appropriate equalization profile or desired spectral balance preset can be determined for the given musical piece. The simplest way is to assign to it the corresponding profile of the best-matched type, such as:

P_eq = P_c*    (26)

where P_eq is the estimated equalization profile or desired spectral balance preset, and c* is the index of the best-matched music type (the dominant audio type), which can be obtained by selecting the type with the highest confidence value.
Furthermore, there may be more than one music type with a confidence value greater than 0, meaning that the musical piece has attributes more or less similar to those types. For example, a musical piece may feature multiple instruments, or it may have attributes of multiple genres. This inspires another way of estimating the appropriate equalization profile: considering all the types instead of only the single closest type. For example, a weighted sum can be used:

P_eq = Σ_{c=1}^{N} w_c·P_c    (27)

where N is the number of predefined types, P_c is the preset profile associated with each predefined music type (with index c), and w_c is the corresponding weight, the weights being normalized to 1 based on the respective confidence values. In this way, the estimated profile is a mixture of the profiles of the various music types. For example, for a musical piece having both jazz and rock attributes, the estimated profile is a profile in between the two.
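The weighted sum (27) blends whole gain curves band by band. The sketch below assumes each preset is a list of per-band gains in dB (illustrative data; real presets would come from curves like those in Figure 23):

```python
def blend_presets(presets, confidences):
    """Mix per-type equalization presets with weights normalized from the
    confidence values, as in formula (27): P_eq = sum_c w_c * P_c."""
    total = sum(confidences.values())
    if total == 0.0:
        raise ValueError("no confident music type")
    n_bands = len(next(iter(presets.values())))
    mixed = [0.0] * n_bands
    for c, conf in confidences.items():
        w = conf / total  # w_c, normalized so the weights sum to 1
        for b in range(n_bands):
            mixed[b] += w * presets[c][b]
    return mixed

# A piece that is half jazz, half rock gets a curve in between the two.
PRESETS = {"jazz": [2.0, 0.0, -1.0], "rock": [0.0, 2.0, 3.0]}
mid = blend_presets(PRESETS, {"jazz": 0.5, "rock": 0.5})
```

Keeping only the N' highest-confidence entries of `confidences` before the call gives the subset variant of formula (28).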
In some applications, it may be undesirable to involve all the types as in formula (27). Only a subset of the types, namely those most related to the current musical piece, needs to be considered, and formula (27) can be slightly revised to:

P_eq = Σ_{c'=1}^{N'} w_{c'}·P_{c'}    (28)

where N' is the number of types to be considered, and c' is the type index after sorting the types in descending order of their confidence values. By using the subset, the closest types are focused on and the less related types are excluded. In other words, the adjusting unit 300D is further configured to consider at least one dominant audio type based on the confidence values.
In the above description, music types are used as an example. In fact, the scheme is applicable to audio types on any hierarchical level as shown in Section 1.1. Therefore, in general, the adjusting unit 300D may be configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
5.4 Control based on context type

In the previous sections, the discussion focused on various content types. In the further embodiments discussed in this section, context types are considered alternatively or additionally.

Generally, the equalizer is enabled for music but disabled for movie-like media content, because the equalizer may make the dialogs in movie-like media sound bad due to obvious timbre changes. This indicates that the equalization level can be associated with the confidence value of long-term music and/or the confidence value of movie-like media:

L_eq ∝ Conf_MUSIC    (29)

or

L_eq ∝ 1 - Conf_MOVIE    (30)

where L_eq is the equalization level, and Conf_MUSIC and Conf_MOVIE represent the confidence values of long-term music and movie-like media.
That is, the adjusting unit 300D may be configured to positively correlate the equalization level with the confidence value of long-term music, or to negatively correlate the equalization level with the confidence value of movie-like media.

In other words, for a movie-like media signal, the movie-like media confidence value is high (or the music confidence value is low), so the equalization level is low; for a music signal, the movie-like media confidence value is low (or the music confidence value is high), so the equalization level is high.

The solutions shown in formulas (29) and (30) can be modified in the same manner as formulas (22) to (25), and/or formulas (29) and (30) can be combined with any one of the schemes shown in formulas (22) to (25).
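One possible combination of formulas (29) and (30), analogous in form to formula (22), can be sketched as follows (an illustrative sketch, not the only combination intended by the text):

```python
def eq_level_context(conf_music, conf_movie):
    """Context-based equalization level: high only when the long-term
    music confidence is high AND the movie-like media confidence is low."""
    return conf_music * (1.0 - conf_movie)
```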
Additionally or alternatively, the adjusting unit 300D may be configured to negatively correlate the equalization level with the confidence value of game.

As another embodiment, the confidence values can be used to derive normalized weights as described in Section 1.4. Assuming a desired equalization level/profile is predefined for each context type (the equalization profiles are shown in Table 2 below), a formula similar to formula (9) can also be applied.

Table 2:

|                      | Movie-like media | Long-term music | VoIP      | Game      |
| Equalization profile | Profile 1        | Profile 2       | Profile 3 | Profile 4 |

Here, all the gains in some of the profiles can be set to zero, as a way of disabling the equalizer for certain context types (such as movie-like media and game).
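A sketch of the Table 2 mapping with hard decisions; the five-band profiles are invented placeholders, and the all-zero curves show how setting every gain to zero disables the equalizer for a context:

```python
# Hypothetical per-context equalization profiles (per-band gains in dB).
# All-zero profiles effectively disable the equalizer for that context.
PROFILES = {
    "movie-like media": [0.0, 0.0, 0.0, 0.0, 0.0],  # Profile 1: disabled
    "long-term music":  [3.0, 1.0, 0.0, 1.0, 2.0],  # Profile 2
    "voip":             [0.0, 2.0, 3.0, 2.0, 0.0],  # Profile 3
    "game":             [0.0, 0.0, 0.0, 0.0, 0.0],  # Profile 4: disabled
}

def select_profile(confidences):
    """Hard decision: pick the profile of the dominant context type.
    A soft, formula-(9)-style blend of the profiles is equally possible."""
    dominant = max(confidences, key=confidences.get)
    return PROFILES[dominant]
```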
5.5 Combination of embodiments and application scenarios

As in Part 1, all of the embodiments above and their variants may be implemented in any combination thereof, and any components mentioned in different parts/embodiments but having the same or similar functions may be implemented as the same component or as separate components.

For example, any two or more of the solutions described in Sections 5.1 to 5.4 can be combined with one another. And these combinations can be further combined with any embodiment described or implied in Parts 1 to 4 and in the other parts to be described later.
5.6 Equalizer control methods

As in Part 1, in describing the equalizer controller in the embodiments above, some processes and methods are obviously also disclosed. Hereinafter, a summary of these methods is given without repeating the details already discussed.

First, the embodiments of the audio processing method discussed in Part 1 can be used for an equalizer, whose parameters are among the targets to be adjusted by the audio processing method. In this respect, the audio processing method is also an equalizer control method.

In this section, only those aspects specific to the control of the equalizer are discussed. For general aspects of the control method, reference may be made to Part 1.

According to some embodiments, an equalizer control method may comprise: identifying the audio type of an audio signal in real time; and adjusting the equalizer in a continuous manner based on the identified audio type.

As in the other parts of the application, when multiple audio types with respective confidence values are involved, the adjusting operation 1104 may be configured to consider at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on their importance, or by weighting the effects of the multiple audio types based on the confidence values. In particular, the adjusting operation 1104 may be configured to consider at least one dominant audio type based on the confidence values.

As described in Part 1, the adjusted parameter values may be smoothed. Reference may be made to Sections 1.5 and 1.8, and the detailed description is omitted here.
The audio type may be a content type, a context type, or both. When content types are involved, the adjusting operation 1104 may be configured to positively correlate the equalization level with the confidence value of short-term music, and/or to negatively correlate the equalization level with the confidence value of speech. Additionally or alternatively, the adjusting operation may be configured to positively correlate the equalization level with the confidence value of background sound, and/or to negatively correlate the equalization level with the confidence value of noise.

When context types are involved, the adjusting operation 1104 may be configured to positively correlate the equalization level with the confidence value of long-term music, and/or to negatively correlate the equalization level with the confidence value of movie-like media and/or game.

For the content type of short-term music, the adjusting operation 1104 may be configured to positively correlate the equalization level with the confidence value of short-term music without a dominant source, and/or to negatively correlate the equalization level with the confidence value of short-term music with a dominant source. This may be done only when the confidence value of short-term music is greater than a threshold.
Besides the equalization level, other aspects of the equalizer may be adjusted based on the confidence values of the audio types of the audio signal. For example, the adjusting operation 1104 may be configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
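As a rough illustration only, the confidence-weighted combination of per-type equalization levels described above might be sketched as follows. The type names and preset levels below are assumptions for the example, not values from this specification.

```python
# Hypothetical sketch: combine per-content-type equalization levels into a
# single level, using the classifier confidence values as weights.

# Illustrative assumed preset levels: high for short-term music and
# background sound, low for speech and noise.
PRESET_LEVELS = {"music": 1.0, "background": 0.6, "speech": 0.0, "noise": 0.1}

def equalization_level(confidences):
    """Confidence-weighted average of the per-type preset levels."""
    total = sum(confidences.get(t, 0.0) for t in PRESET_LEVELS)
    if total == 0.0:
        return 0.0  # no known type detected: leave equalization off
    return sum(PRESET_LEVELS[t] * confidences.get(t, 0.0)
               for t in PRESET_LEVELS) / total

# A segment dominated by music yields a level close to the music preset:
level = equalization_level({"music": 0.9, "speech": 0.1})   # 0.9
```

Because the result varies continuously with the confidence values, this kind of combination also realizes the continuous adjustment mentioned above, rather than hard switching between presets.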
For specific examples of the audio types, reference may be made to Part 1.
Similar to the embodiments of the audio processing apparatus, on one hand, any combination of the embodiments of the audio processing method and their variations is feasible; on the other hand, every aspect of the embodiments and their variations may be a separate solution. In addition, any two or more solutions described in this subsection may be combined with each other, and these combinations may be further combined with any embodiment described or implied elsewhere in this disclosure.
Part 6: Audio Classifiers and Classification Methods
As stated in Sections 1.1 and 1.2, the audio types discussed in this application, including content types and context types of various hierarchical levels, may be classified or identified with existing classification schemes based on machine-learning methods. In this part and the following part, as mentioned in the previous parts, the present application proposes some novel aspects of classifiers and methods for classifying context types.
6.1 Context classification based on content-type classification
As mentioned in the previous parts, the audio classifier 200 is used to identify the content type of an audio signal and/or the context type of the audio signal. Accordingly, the audio classifier 200 may comprise an audio content classifier 202 and/or an audio context classifier 204. When implemented with prior-art techniques, the audio content classifier 202 and the audio context classifier 204 may be independent of each other, although they may share some features and thus some schemes for extracting the features.
In this part and the following Part 7, according to the novel aspects proposed in this application, the audio context classifier 204 may use the results of the audio content classifier 202. That is, the audio classifier 200 may comprise: an audio content classifier for identifying the content type of an audio signal; and an audio context classifier 204 for identifying the context type of the audio signal based on the results of the audio content classifier 202. In this way, the classification results of the audio content classifier 202 may be used both by the audio context classifier 204 and by the adjusting unit 300 (or the adjusting units 300A to 300D) as described in the parts above. However, although not shown in the drawings, the audio classifier 200 may also comprise two audio content classifiers 202, used respectively by the adjusting unit 300 and by the audio context classifier 204.
In addition, as described in Section 1.2, especially when classifying multiple audio types, the audio content classifier 202 or the audio context classifier 204 may comprise a group of cooperating classifiers, although each may also be implemented as a single classifier.
As described in Section 1.1, a content type is a kind of audio type with respect to short-term audio segments generally having a length on the order of several frames to tens of frames (such as 1 s), while a context type is a kind of audio type with respect to long-term audio segments generally having a length on the order of several seconds to tens of seconds (such as 10 s). Therefore, corresponding to "content type" and "context type", the terms "short-term" and "long-term" are used respectively where necessary. However, as will be described in the following Part 7, although a context type indicates a property of the audio signal on a relatively long time scale, it may also be identified based on features extracted from short-term audio segments.
Now, the structure of the audio content classifier 202 and the audio context classifier 204 is described with reference to Figure 24.
As shown in Figure 24, the audio content classifier 202 may comprise a short-term feature extractor 2022 for extracting short-term features from short-term audio segments each comprising a sequence of audio frames; and a short-term classifier 2024 for classifying a sequence of short-term segments in a long-term audio segment into short-term audio types using the respective short-term features. Both the short-term feature extractor 2022 and the short-term classifier 2024 may be implemented with prior-art techniques, but some improvements for the short-term feature extractor 2022 are also proposed in the following Section 6.3.
The short-term classifier 2024 may be configured to classify each short-term segment of the sequence of short-term segments into at least one of the following short-term audio types (content types) explained in Section 1.1: speech, short-term music, background sound and noise. Each content type may also be further classified into content types of lower hierarchical levels, such as those discussed in Section 1.1, but not limited thereto.
As known in the art, confidence values of the classified audio types may also be obtained by the short-term classifier 2024. In this application, whenever the operation of any classifier is mentioned, it shall be understood that, whether or not explicitly stated, the classifier also obtains confidence values at the same time if necessary. Examples of audio-type classification may be found in L. Lu, H.-J. Zhang and S. Li, "Content-based Audio Classification and Segmentation by Using Support Vector Machines", ACM Multimedia Systems Journal 8(6), pp. 482-492 (March 2003), the entire contents of which are incorporated herein by reference.
On the other hand, as shown in Figure 24, the audio context classifier 204 may comprise a statistics extractor 2042 for calculating, as long-term features, statistics of the results of the short-term classifier with respect to the sequence of short-term segments in the long-term audio segment; and a long-term classifier 2044 for classifying the long-term audio segment into long-term audio types using the long-term features. Similarly, both the statistics extractor 2042 and the long-term classifier 2044 may be implemented with prior-art techniques, but some improvements for the statistics extractor 2042 are also proposed in the following Section 6.2.
The long-term classifier 2044 may be configured to classify the long-term audio segment into at least one of the following long-term audio types (context types) explained in Section 1.1: movie-like media, long-term music, game and VoIP. Alternatively or additionally, the long-term classifier 2044 may be configured to classify the long-term audio segment into VoIP or non-VoIP as explained in Section 1.1. Alternatively or additionally, the long-term classifier 2044 may be configured to classify the long-term audio segment into high-quality audio or low-quality audio as explained in Section 1.1.
In practice, various target audio types may be selected and trained based on the needs of the application/system. For the meaning and selection of the short-term segments and long-term segments (as well as the frames to be discussed in Section 6.3), reference may be made to Section 1.1.
6.2 Extraction of long-term features
As shown in Figure 24, in one embodiment, only the statistics extractor 2042 is used to extract long-term features from the results of the short-term classifier 2024. As the long-term features, at least one of the following data may be calculated by the statistics extractor 2042: the mean and variance of the confidence values of the short-term audio types of the short-term segments in the long-term segment to be classified, the above mean and variance weighted by the importance levels of the short-term segments, the occurrence frequency of each short-term audio type in the long-term segment to be classified, and the frequency of transitions between different short-term audio types.
Figure 25 shows the means of the speech confidence values and of the short-term music confidence values in each short-term segment (of length 1 s). For contrast, the segments are extracted from three different audio contexts: movie-like media (Figure 25(A)), long-term music (Figure 25(B)) and VoIP (Figure 25(C)). It can be observed that, for the movie-like media context, high confidence values are obtained both for the speech type and for the music type, and the two audio types alternate frequently. In contrast, a segment of long-term music gives a stable and high short-term music confidence value and a relatively stable and low speech confidence value, whereas a VoIP segment gives a stable and low short-term music confidence value but a fluctuating speech confidence value due to the pauses during a VoIP conversation.
The variance of the confidence value of each audio type is also a feature for classifying different audio contexts. Figure 26 gives histograms of the variances of the confidence values of speech, short-term music, background and noise in the movie-like media (Figure 26(A)), long-term music (Figure 26(B)) and VoIP (Figure 26(C)) audio contexts (the abscissa is the variance of the confidence values in the data set, and the ordinate is the occurrence count in each bin of the variance values in the data set, which may be normalized to indicate the occurrence probability of each bin of the variance values). For movie-like media, the variances of the confidence values of speech, short-term music and background are all relatively high and widely distributed, indicating that the confidence values of these audio types change strongly; for long-term music, the variances of the confidence values of speech, short-term music and background are all relatively low and narrowly distributed, indicating that the confidence values of these audio types remain stable: the speech confidence value remains constantly low and the music confidence value remains constantly high; for VoIP, the variance of the confidence value of short-term music is relatively low and narrowly distributed, while the variance of the confidence value of speech is relatively widely distributed due to the frequent pauses during a VoIP conversation.
As to the weights for calculating the weighted mean and variance, they are determined based on the importance of each short-term segment. The importance of a short-term segment may be measured by its energy or loudness, which may be estimated with many prior-art techniques.
The occurrence frequency of each short-term audio type in the long-term segment to be classified is the count of occurrences of each audio type into which the short-term segments in the long-term segment are classified, normalized by the length of the long-term segment.
The frequency of transitions between different short-term audio types in the long-term segment to be classified is the count of audio-type changes between adjacent short-term segments in the long-term segment to be classified, normalized by the length of the long-term segment.
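Under these definitions, a brief sketch of both frequencies, normalizing by the number of short-term segments in the long-term segment:

```python
# Sketch: occurrence frequency of each short-term audio type and the
# frequency of transitions between different types, both normalized by the
# number of short-term segments in the long-term segment.
from collections import Counter

def occurrence_frequencies(labels):
    n = len(labels)
    return {t: c / n for t, c in Counter(labels).items()}

def transition_frequency(labels):
    # Count adjacent pairs whose audio types differ.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes / len(labels)

labels = ["speech", "speech", "music", "speech"]
freqs = occurrence_frequencies(labels)   # speech: 0.75, music: 0.25
rate = transition_frequency(labels)      # 2 changes / 4 segments = 0.5
```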
When discussing the means and variances of the confidence values with reference to Figure 25, the occurrence frequency of each short-term audio type and the frequency of transitions between the different short-term audio types were actually also involved. These features are also highly related to the audio context classification. For example, long-term music mainly comprises short-term segments of the music type, so long-term music has a high occurrence frequency of short-term music, while VoIP mainly comprises speech and pauses, so VoIP has a high occurrence frequency of speech or noise. As another example, movie-like media transitions between different short-term audio types more frequently than long-term music or VoIP, so movie-like media generally has a higher transition frequency among short-term music, speech and background; and VoIP generally transitions between speech and noise more frequently than the other types, so VoIP generally has a higher transition frequency between speech and noise.
Generally, it is assumed that the long-term segments have the same length in the same application/system. If this is the case, the occurrence count of each short-term audio type and the count of transitions between different short-term audio types in a long-term segment may be used directly, without normalization. If the length of the long-term segments is variable, the occurrence frequency and the transition frequency as mentioned above shall be used. The claims of this application shall be construed to cover both cases.
Additionally or alternatively, the audio classifier 200 (or the audio context classifier 204) may further comprise a long-term feature extractor 2046 (Figure 27) for extracting further long-term features from the long-term audio segment based on the short-term features of the sequence of short-term segments in the long-term audio segment. In other words, the long-term feature extractor 2046 does not use the classification results of the short-term classifier 2024, but directly uses the short-term features extracted by the short-term feature extractor 2022 to obtain some long-term features to be used by the long-term classifier 2044. The long-term feature extractor 2046 and the statistics extractor 2042 may be used independently or in combination. In other words, the audio classifier 200 may comprise either the long-term feature extractor 2046 or the statistics extractor 2042, or both.
Any features may be extracted by the long-term feature extractor 2046. In this application, it is proposed to calculate, as the long-term features, at least one of the following statistics of the short-term features from the short-term feature extractor 2022: mean, variance, weighted mean, weighted variance, high average, low average, and the ratio between the high average and the low average (contrast).
Mean and variance: the mean and variance of the short-term features extracted from the short-term segments in the long-term segment to be classified;
Weighted mean and weighted variance: the weighted mean and weighted variance of the short-term features extracted from the short-term segments in the long-term segment to be classified. The short-term features are weighted based on the importance of each short-term segment, measured by its energy or loudness as just mentioned;
High average: the average of selected short-term features extracted from the short-term segments in the long-term segment to be classified. A short-term feature is selected when it meets at least one of the following conditions: it is greater than a threshold; or it is within a predetermined proportion of the highest short-term features, for example, the highest 10% of the short-term features;
Low average: the average of selected short-term features extracted from the short-term segments in the long-term segment to be classified. A short-term feature is selected when it meets at least one of the following conditions: it is less than a threshold; or it is within a predetermined proportion of the lowest short-term features, for example, the lowest 10% of the short-term features; and
Contrast: the ratio between the high average and the low average, representing the dynamics of the short-term features in the long-term segment.
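A minimal sketch of the high average, low average and contrast statistics, using the 10% selection rule given above as an example (the proportion is only one of the example conditions):

```python
# Sketch: high average, low average, and their ratio (contrast) over the
# short-term feature values of one long-term segment.

def high_low_contrast(values, ratio=0.1):
    s = sorted(values)
    k = max(1, int(len(s) * ratio))   # at least one value is selected
    low_avg = sum(s[:k]) / k          # mean of the lowest `ratio` of values
    high_avg = sum(s[-k:]) / k        # mean of the highest `ratio` of values
    contrast = high_avg / low_avg if low_avg else float("inf")
    return high_avg, low_avg, contrast

# Ten feature values: top-10% mean is 10.0, bottom-10% mean is 1.0.
hi, lo, c = high_low_contrast([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A large contrast indicates highly dynamic short-term features within the long-term segment, in line with the definition above.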
The short-term feature extractor 2022 may be implemented with prior-art techniques, and thus any features may be extracted. Even so, some improvements for the short-term feature extractor 2022 are proposed in the following Section 6.3.
6.3 Extraction of short-term features
As shown in Figures 24 and 27, the short-term feature extractor 2022 may be configured to extract at least one of the following features directly from each short-term audio segment: rhythmic characteristics, interruptions/mutes, and short-term audio quality features.
The rhythmic characteristics may include rhythm strength, rhythm regularity, rhythm clarity (see L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals", IEEE Transactions on Audio, Speech, and Language Processing, 14(1):5-18, 2006, the entire contents of which are incorporated herein by reference) and 2D subband modulation (M. F. McKinney and J. Breebaart, "Features for audio and music classification", Proc. ISMIR, 2003, the entire contents of which are incorporated herein by reference).
The interruptions/mutes may include speech interruptions, sudden sharp drops, mute lengths, unnatural silence, the mean of unnatural silence, the total energy of unnatural silence, and so on.
The short-term audio quality features are audio quality features with respect to the short-term segments, similar to the audio quality features extracted from audio frames, which will be discussed below.
Alternatively or additionally, as shown in Figure 28, the audio classifier 200 may comprise a frame-level feature extractor 2012 for extracting frame-level features from each frame of the sequence of audio frames comprised in a short-term segment, and the short-term feature extractor 2022 may be configured to calculate the short-term features based on the frame-level features extracted from the sequence of audio frames.
As pre-processing, the input audio signal may be down-mixed into a mono audio signal. The pre-processing is unnecessary if the audio signal is already a mono signal. The signal is then divided into frames with a predefined length (usually 10 to 25 milliseconds). Correspondingly, frame-level features are extracted from each frame.
The frame-level feature extractor 2012 may be configured to extract at least one of the following features: features characterizing the attributes of various short-term audio types, cut-off frequency, static signal-to-noise ratio (static SNR) characteristics, segmental signal-to-noise ratio (segmental SNR) characteristics, basic speech descriptors, and vocal tract characteristics.
The features characterizing the attributes of various short-term audio types (especially speech, short-term music, background sound and noise) may include at least one of the following features: frame energy, subband spectral distribution, spectral flux, Mel-frequency cepstral coefficients (MFCC), bass, residual information, chroma features and zero-crossing rate.
For details of MFCC, reference may be made to L. Lu, H.-J. Zhang and S. Li, "Content-based Audio Classification and Segmentation by Using Support Vector Machines", ACM Multimedia Systems Journal 8(6), pp. 482-492 (March 2003), the entire contents of which are incorporated herein by reference. For details of the chroma features, reference may be made to G. H. Wakefield, "Mathematical representation of joint time Chroma distributions", SPIE, 1999, the entire contents of which are incorporated herein by reference.
The cut-off frequency represents the highest frequency of an audio signal, above which the content energy is close to zero. It is designed to detect band-limited content, which is useful for audio context classification in this application. A cut-off frequency is usually caused by coding, since most encoders discard high-frequency components at low or medium bit rates. For example, the MP3 codec has a cut-off frequency of 16 kHz at 128 kbps; as another example, many popular VoIP codecs have a cut-off frequency of 8 kHz or 16 kHz.
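A rough sketch of detecting the cut-off frequency from a magnitude spectrum follows; the relative-energy threshold is an assumption for the example, as the specification does not fix a particular detector.

```python
# Sketch (assumed threshold): estimate the cut-off frequency as the highest
# frequency bin whose magnitude stays above a small fraction of the peak,
# since the content above the cut-off has energy close to zero.

def estimate_cutoff(spectrum, sample_rate, rel_threshold=0.01):
    """spectrum: magnitudes per bin from 0 Hz to Nyquist, evenly spaced."""
    peak = max(spectrum)
    nyquist = sample_rate / 2.0
    for i in range(len(spectrum) - 1, -1, -1):   # scan from the top down
        if spectrum[i] >= rel_threshold * peak:
            return nyquist * i / (len(spectrum) - 1)
    return 0.0

# A band-limited spectrum: energy only in the lower half of the bins.
spec = [1.0] * 50 + [0.0] * 50
cutoff = estimate_cutoff(spec, 32000)   # close to 8 kHz, as for a VoIP codec
```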
Besides the cut-off frequency, the signal degradation during the audio coding process is also exploited as another characteristic for distinguishing various audio contexts, such as VoIP contexts versus non-VoIP contexts, and high-quality audio contexts versus low-quality audio contexts. The features representing the audio quality may further be extracted at multiple levels, such as the objective speech quality assessment features (see Ludovic Malfait, Jens Berger, and Martin Kastner, "P.563-The ITU-T Standard for Single-Ended Speech Quality Assessment", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 6 (November 2006), the entire contents of which are incorporated herein by reference), to obtain richer features. Examples of the audio quality features include:
a) Static SNR characteristics, including the estimated background noise level, spectral clarity, and so on.
b) Segmental SNR characteristics, including the spectral level deviation, the spectral level range, the relative noise floor, and so on.
c) Basic speech descriptors, including the pitch average, the speech section level variation, the speech level, and so on.
d) Vocal tract characteristics, including robotization, the pitch cross power, and so on.
To derive the short-term features from the frame-level features, the short-term feature extractor 2022 may be configured to calculate statistics of the frame-level features as the short-term features.
Examples of such statistics include the mean and the standard deviation, which capture the rhythmic characteristics for distinguishing various audio types, such as short-term music, speech, background sound and noise. For example, speech usually alternates between voiced and unvoiced sounds at a syllabic rate, whereas music does not, indicating that the variation of the frame-level features of speech is usually larger than that of music.
Another example of the statistics is the weighted average of the frame-level features. For example, for the cut-off frequency, the weighted average of the cut-off frequencies extracted from the audio frames in a short-term segment, with the energy or loudness of each frame as the weight, will be the cut-off frequency of that short-term segment.
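For instance, the energy-weighted average just described can be sketched as follows (function name assumed):

```python
# Sketch: the short-term cut-off frequency as the energy-weighted average of
# the per-frame cut-off frequencies.

def weighted_cutoff(frame_cutoffs, frame_energies):
    total = sum(frame_energies)
    return sum(c * e for c, e in zip(frame_cutoffs, frame_energies)) / total

# Loud frames dominate: a quiet 4 kHz frame barely moves the estimate.
cut = weighted_cutoff([16000.0, 4000.0], [9.0, 1.0])   # 14800.0 Hz
```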
Alternatively or additionally, as shown in Figure 29, the audio classifier 200 may comprise: a frame-level feature extractor 2012 for extracting frame-level features from the audio frames; and a frame-level classifier 2014 for classifying each frame of the sequence of audio frames into frame-level audio types using the respective frame-level features, wherein the short-term feature extractor 2022 may be configured to calculate the short-term features based on the results of the frame-level classifier 2014 with respect to the sequence of audio frames.
In other words, besides the audio content classifier 202 and the audio context classifier 204, the audio classifier 200 may also comprise a frame classifier 201. In such an architecture, the audio content classifier 202 classifies the short-term segments based on the frame-level classification results of the frame classifier 201, and the audio context classifier 204 classifies the long-term segments based on the short-term classification results of the audio content classifier 202.
The frame-level classifier 2014 may be configured to classify each audio frame of the sequence of audio frames into any classes, which may be referred to as "frame-level audio types". In one embodiment, the frame-level audio types may have an architecture similar to that of the content types discussed above, and may also have meanings similar to those of the content types, the only difference being that the frame-level audio types and the content types are classified at different levels of the audio signal, namely the frame level and the short-term segment level respectively. For example, the frame-level classifier 2014 may be configured to classify each frame of the sequence of audio frames into at least one of the following frame-level audio types: speech, music, background sound and noise. On the other hand, the frame-level audio types may also have an architecture partly or entirely different from the architecture of the content types, more suitable for frame-level classification and more suitable for use as short-term features for the short-term classification. For example, the frame-level classifier 2014 may be configured to classify each frame of the sequence of audio frames into at least one of the following frame-level audio types: voiced, unvoiced and pause.
As to how to extract the short-term features from the frame-level classification results, a scheme similar to that described in Section 6.2 may be adopted.
Alternatively, the short-term classifier 2024 may use both the short-term features based on the results of the frame-level classifier 2014 and the short-term features directly based on the frame-level features obtained from the frame-level feature extractor 2012. Therefore, the short-term feature extractor 2022 may be configured to calculate the short-term features based on both the frame-level features extracted from the sequence of audio frames and the results of the frame-level classifier with respect to the sequence of audio frames.
In other words, the frame-level feature extractor 2012 may be configured to calculate both statistics similar to those discussed in Section 6.2 and the short-term features described in connection with Figure 28, including at least one of the following features: features characterizing the attributes of various short-term audio types, cut-off frequency, static signal-to-noise ratio characteristics, segmental signal-to-noise ratio characteristics, basic speech descriptors, and vocal tract characteristics.
To work in real time, in all the embodiments, the short-term feature extractor 2022 may be configured to operate on short-term audio segments formed by a sliding window that slides with a predetermined step on the time dimension of the long-term audio segment. For details of the sliding window for the short-term audio segments, as well as of the audio frames and the sliding window for the long-term audio segments, reference may be made to Section 1.1.
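The sliding-window sampling can be sketched as follows; the window length and step below are illustrative assumptions, and Section 1.1 discusses the actual choices.

```python
# Sketch: sample short-term segments from a long-term segment with a sliding
# window that advances by a predetermined step (all lengths in frames).

def sliding_windows(n_frames, window_len, step):
    """Start/end frame indices of each short-term window."""
    return [(start, start + window_len)
            for start in range(0, n_frames - window_len + 1, step)]

# A 10 s fragment of 1000 frames, 1 s (100-frame) windows, 0.5 s step:
wins = sliding_windows(1000, 100, 50)   # 19 overlapping windows
```

A step smaller than the window length yields overlapping short-term segments, which lowers the latency of updating the classification results.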
6.4 Combination of embodiments and application scenarios
Similar to Part 1, all the embodiments and variations thereof discussed above may be implemented in any combination thereof, and any components mentioned in different parts/embodiments but having the same or similar functions may be implemented as the same or separate components.
For example, any two or more of the solutions described in Sections 6.1 to 6.3 may be combined with each other. And these combinations may be further combined with any embodiment described or implied in Parts 1 to 5 and the other parts to be described later. In particular, the type smoothing unit 712 described in Part 1 may be used in this part as a component of the audio classifier 200, for smoothing the results of the frame classifier 2014, the audio content classifier 202 or the audio context classifier 204. In addition, the timer 916 may also serve as a component of the audio classifier 200, to avoid abrupt changes of the output of the audio classifier 200.
6.5 Audio classification methods
Similar to Part 1, in the course of describing the audio classifiers in the embodiments above, some processes or methods are obviously also disclosed. Hereinafter, a summary of these methods is given without repeating the details already discussed.
As shown in Figure 30, in one embodiment an audio classification method is provided. To identify the long-term audio type (that is, the context type) of a long-term audio segment comprising a sequence of short-term audio segments (overlapping with each other or not), the short-term audio segments are first classified into short-term audio types (operation 3004), that is, content types, and long-term features are obtained by calculating statistics of the results of the classifying operation with respect to the sequence of short-term segments in the long-term audio segment (operation 3006). Then, the long-term classification may be performed using the long-term features (operation 3008). A short-term audio segment may comprise a sequence of audio frames. Of course, to identify the short-term audio type of a short-term segment, short-term features need to be extracted from the short-term segment (operation 3002).
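A toy end-to-end sketch of the method of Figure 30 follows, with stub classifiers standing in for trained models; all function names, the stub logic and the threshold are assumptions for illustration only.

```python
# End-to-end sketch of Figure 30 with stub classifiers (names assumed).

def classify_short_term(segment):
    # Stub content-type classifier: returns (type, confidence).
    return ("music", 0.9) if sum(segment) > 0 else ("speech", 0.8)

def classify_long_term(segments):
    results = [classify_short_term(s) for s in segments]      # operation 3004
    # Long-term feature (operation 3006): occurrence frequency of music.
    music_freq = sum(1 for t, _ in results if t == "music") / len(results)
    # Long-term classification (operation 3008) via a simple threshold.
    return "long-term music" if music_freq > 0.5 else "other"

context = classify_long_term([[1.0], [1.0], [0.0]])   # "long-term music"
```

In practice the stubs would be replaced by trained classifiers, and the long-term stage would use the full set of statistics described above rather than a single frequency.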
The short-term audio types (content types) may include, but are not limited to, speech, short-term music, background sound and noise. The long-term features may include, but are not limited to: the mean and variance of the confidence values of the short-term audio types, the above mean and variance weighted by the importance of the short-term segments, the occurrence frequency of each short-term audio type, and the frequency of transitions between different short-term audio types.
As shown in Figure 31, in a variation, further long-term features may be obtained directly based on the short-term features of the sequence of short-term segments in the long-term audio segment. Such further long-term features may include, but are not limited to, the following statistics of the short-term features: mean, variance, weighted mean, weighted variance, high average, low average, and the ratio between the high average and the low average (contrast).
There are different ways to extract the short-term features. One way is to extract the short-term features directly from the short-term audio segment to be classified. Such features include, but are not limited to: rhythmic characteristics, interruptions/mutes, and short-term audio quality features.
The second way is to extract frame-level features from the audio frames comprised in each short-term segment (operation 3201 in Figure 32), and then calculate the short-term features based on the frame-level features, for example by calculating statistics of the frame-level features as the short-term features. The frame-level features may include, but are not limited to: features characterizing the attributes of various short-term audio types, cut-off frequency, static signal-to-noise ratio characteristics, segmental signal-to-noise ratio characteristics, basic speech descriptors, and vocal tract characteristics. The features characterizing the attributes of various short-term audio types may further include: frame energy, subband spectral distribution, spectral flux, Mel-frequency cepstral coefficients, bass, residual information, chroma features and zero-crossing rate.
The third way is to extract the short-term features in a manner similar to the extraction of the long-term features: after extracting the frame-level features from the audio frames in the short-term segment to be classified (operation 3201), each audio frame is classified into frame-level audio types using the respective frame-level features (operation 32011 in Figure 33); and the short-term features may be extracted by calculating them based on the frame-level audio types (optionally together with the confidence values) (operation 3002). The frame-level audio types may have attributes and an architecture similar to those of the short-term audio types (content types), and may also include speech, music, background sound and noise.
The second way and the third way may be combined, as shown by the dashed arrow in Figure 33.
As discussed in Part 1, both short-term audio segments and long-term audio segments may be sampled with sliding windows. That is, the operation of extracting short-term features (operation 3002) may be carried out on short-term audio segments formed by a window sliding with a predetermined step size along the time dimension of the long-term audio segment, and the operation of extracting long-term features (operation 3107) and the operation of computing statistics of the short-term audio types (operation 3006) may likewise be carried out on long-term audio segments formed by a window sliding with a predetermined step size along the time dimension of the audio signal.
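The sliding-window sampling described above can be sketched as follows; the window length and step size are placeholders, since the patent only requires a predetermined step size along the time dimension.

```python
def sliding_windows(signal, win_len, step):
    """Form (possibly overlapping) analysis windows of length win_len
    that slide along the signal with a predetermined step size."""
    return [signal[i:i + win_len]
            for i in range(0, len(signal) - win_len + 1, step)]
```

The same helper applies at both levels: short-term segments are windows over a long-term segment, and long-term segments are windows over the whole audio signal.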
Similar to the embodiments of the audio processing apparatus, on the one hand, any combination of the embodiments of the audio processing method and their variations is feasible; on the other hand, each aspect of the embodiments of the audio processing method and their variations may also be a standalone solution. In addition, any two or more of the solutions described in this subsection may be combined with one another, and these combinations may be further combined with any embodiment described or implied elsewhere in this disclosure. In particular, as discussed in subsection 6.4, the smoothing scheme and the transition scheme for audio types may form part of the audio classification method discussed here.
Part 7: VoIP classifiers and classification methods
Part 6 proposed a novel audio classifier for classifying an audio signal into audio context types based at least in part on the results of a content-type classifier. In the embodiments discussed in Part 6, long-term features are extracted from long-term segments of several seconds to tens of seconds in length, and the audio context classification may therefore incur a long latency. It is desirable to be able to classify the audio context also in real time or near real time, for example at the level of short-term segments.
7.1 Context classification based on short-term segments
To this end, as shown in Figure 34, an audio classifier 200A is provided, comprising: an audio content classifier 202A for identifying the content type of a short-term segment of the audio signal; and an audio context classifier 204A for identifying the context type of the short-term segment based at least in part on the content type identified by the audio content classifier.
Here, the audio content classifier 202A may adopt the techniques already mentioned in Part 6, but may also use different techniques, as will be discussed in subsection 7.2 below. Moreover, the audio context classifier 204A may use the techniques already mentioned in Part 6, the difference being that the context classifier 204A may directly use the results of the audio content classifier 202A rather than statistics of those results, because the audio context classifier 204A and the audio content classifier 202A classify the same short-term segment. In addition, similar to Part 6, besides the results of the audio content classifier 202A, the audio context classifier 204A may use other features extracted directly from the short-term segment. That is, the audio context classifier 204A may be configured to classify the short-term segment using a machine-learning model that takes, as features, the confidence values of the content types of the short-term segment together with other features extracted from the short-term segment. Regarding the features extracted from the short-term segment, reference may be made to Part 6.
The audio content classifier 202A may simultaneously label the short-term segment with multiple audio types beyond VoIP speech/noise and/or non-VoIP speech/noise (VoIP speech/noise and non-VoIP speech/noise will be discussed in subsection 7.2 below), and each of the multiple audio types may be provided with its own confidence value, as discussed in subsection 1.2. This can achieve better classification accuracy because richer information is captured. For example, the joint information of the confidence values of speech and short-term music indicates to what extent the audio content is likely to be a mixture of speech and background music, enabling it to be distinguished from pure VoIP content.
7.2 Classification using VoIP speech and VoIP noise
This aspect of the application is particularly useful in a VoIP/non-VoIP classification system, which needs to classify the current short-term segment with a short decision latency. For this purpose, as shown in Figure 34, the audio classifier 200A is designed specifically for VoIP/non-VoIP classification. For VoIP/non-VoIP classification, a VoIP speech classifier 2026 and/or a VoIP noise classifier 2028 is developed to generate intermediate results for the final robust VoIP/non-VoIP classification by the audio context classifier 204A.
A short-term VoIP segment comprises either VoIP speech or VoIP noise. It has been observed that classifying a short-term segment of speech as VoIP speech or non-VoIP speech can reach high accuracy, but classifying a short-term segment of noise as VoIP noise or non-VoIP noise cannot. It was therefore concluded that classifying a short-term segment directly as VoIP (comprising VoIP speech and VoIP noise, but without specifically identifying VoIP speech and VoIP noise) or non-VoIP, without considering the difference between speech and noise, would mix the features of the two content types (speech and noise) and thus reduce discriminative power.
It is reasonable for a classifier to achieve higher accuracy in VoIP speech/non-VoIP speech classification than in VoIP noise/non-VoIP noise classification, because speech contains more information than noise, and features such as cutoff frequency are more effective for classifying speech. According to the feature weights obtained from an AdaBoost training process, the top-weighted short-term features for VoIP/non-VoIP speech classification are: the standard deviation of log energy, the cutoff frequency, the standard deviation of rhythm strength, and the standard deviation of spectral flux. The standard deviation of log energy, the standard deviation of rhythm strength, and the standard deviation of spectral flux of VoIP speech are usually higher than those of non-VoIP speech. One possible reason is that, in non-VoIP contexts such as movie-like media or games, many short-term speech segments are mixed with other sounds, such as background music or sound effects, which generally have lower values of the above features. Meanwhile, the cutoff frequency of VoIP speech is usually lower than that of non-VoIP speech, which reflects the relatively low cutoff frequency introduced by many popular VoIP codecs.
Therefore, in one embodiment, the audio content classifier 202A may comprise: a VoIP speech classifier 2026 for classifying the short-term segment into the VoIP speech content type or the non-VoIP speech content type; and the audio context classifier 204A may be configured to classify the short-term segment into the VoIP context type or the non-VoIP context type based on the confidence values of VoIP speech and non-VoIP speech.
In another embodiment, the audio content classifier 202A may further comprise: a VoIP noise classifier 2028 for classifying the short-term segment into the VoIP noise content type or the non-VoIP noise content type; and the audio context classifier 204A may be configured to classify the short-term segment into the VoIP context type or the non-VoIP context type based on the confidence values of VoIP speech, non-VoIP speech, VoIP noise, and non-VoIP noise.
As discussed in Part 6, subsection 1.2, and subsection 7.1, the content types of VoIP speech, non-VoIP speech, VoIP noise, and non-VoIP noise may be identified with prior-art techniques.
Alternatively, the audio content classifier 202A may have a hierarchical structure as shown in Figure 35. That is, the result of a speech/noise classifier 2025 is used to first classify the short-term segment as speech or noise/background.
On the basis of the embodiment using only the VoIP speech classifier 2026: if a short-term segment is determined as speech by the speech/noise classifier 2025 (which is a speech classifier in this case), the VoIP speech classifier 2026 then proceeds to decide whether it is VoIP speech or non-VoIP speech and computes a binary classification result; otherwise, the confidence value of VoIP speech is considered low, or the decision regarding VoIP speech is considered uncertain.
On the basis of the embodiment using only the VoIP noise classifier 2028: if a short-term segment is determined as noise by the speech/noise classifier 2025 (which is a noise (background) classifier in this case), the VoIP noise classifier 2028 then proceeds to decide whether it is VoIP noise or non-VoIP noise and computes a binary classification result; otherwise, the confidence value of VoIP noise is considered low, or the decision regarding VoIP noise is considered uncertain.
Here, because speech is usually the informative content type while noise/background is an interfering content type, even if a short-term segment is not noise, it cannot be determined with certainty in the embodiment of the preceding paragraph that the short-term segment is not of the VoIP context type. By contrast, if a short-term segment is not speech, then in the embodiment using only the VoIP speech classifier 2026 it is probably not of the VoIP context type. Therefore, the embodiment using only the VoIP speech classifier 2026 can generally be implemented independently, whereas the embodiment using only the VoIP noise classifier 2028 may serve as a supplement, cooperating for example with the embodiment using the VoIP speech classifier 2026.
That is, both the VoIP speech classifier 2026 and the VoIP noise classifier 2028 may be used. If a short-term segment is determined as speech by the speech/noise classifier 2025, the VoIP speech classifier 2026 proceeds to decide whether it is VoIP speech or non-VoIP speech and computes a binary classification result. If the short-term segment is determined as noise by the speech/noise classifier 2025, the VoIP noise classifier 2028 proceeds to decide whether it is VoIP noise or non-VoIP noise and computes a binary classification result. Otherwise, the short-term segment may be classified as non-VoIP.
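The hierarchical structure of Figure 35 can be sketched as below. The three classifiers are passed in as placeholder callables, since the patent leaves their implementations open (any prior-art technique or the audio content classifier 202); only the routing logic is shown.

```python
def hierarchical_classify(segment, speech_noise_clf, voip_speech_clf, voip_noise_clf):
    """Route a short-term segment through the two-level hierarchy:
    speech/noise first, then the matching VoIP sub-classifier."""
    kind = speech_noise_clf(segment)  # "speech", "noise", or something else
    if kind == "speech":
        return voip_speech_clf(segment)   # "VoIP speech" / "non-VoIP speech"
    if kind == "noise":
        return voip_noise_clf(segment)    # "VoIP noise" / "non-VoIP noise"
    return "non-VoIP"                     # neither speech nor noise
```

The final fall-through branch mirrors the text: a segment classified as neither speech nor noise is treated as non-VoIP.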
The speech/noise classifier 2025, the VoIP speech classifier 2026, and the VoIP noise classifier 2028 may be implemented with any prior-art technique or with the audio content classifier 202 discussed in Parts 1 through 6.
If the audio content classifier 202A implemented as described above ultimately fails to classify a short-term segment into speech, noise, or background, or fails to classify it into VoIP speech, non-VoIP speech, VoIP noise, or non-VoIP noise, meaning that all the relevant confidence values are low, then the audio content classifier 202A (and the audio context classifier 204A) classifies the short-term segment as non-VoIP.
To classify the short-term segment into the VoIP or non-VoIP context type based on the results of the VoIP speech classifier 2026 and the VoIP noise classifier 2028, the audio context classifier 204A may use the machine-learning-based technique discussed in subsection 7.1, and, as an improvement, may use more features, including short-term features extracted directly from the short-term segment and/or the results of other audio content classifiers for content types other than the VoIP-related content types, as discussed in subsection 7.1.
Besides the machine-learning-based technique described above, an alternative approach to VoIP/non-VoIP classification may exploit domain knowledge and heuristic rules that use the classification results related to VoIP speech and VoIP noise. An example of such a heuristic rule follows.
If the current short-term segment at time t is determined to be VoIP speech or non-VoIP speech, that classification result is directly taken as the VoIP/non-VoIP classification result, because VoIP/non-VoIP speech classification is robust, as discussed above. That is, if the short-term segment is determined to be VoIP speech, it is of the VoIP context type; if it is determined to be non-VoIP speech, it is of the non-VoIP context type.
When the VoIP speech classifier 2026 makes a binary decision on VoIP speech/non-VoIP speech for the speech determined as above by the speech/noise classifier 2025, the confidence values of VoIP speech and non-VoIP speech are likely complementary, i.e. they sum to 1 (with 0 representing 100% "no" and 1 representing 100% "yes"), and the thresholds on the confidence values used for distinguishing VoIP speech and non-VoIP speech may actually represent the same point. If the VoIP speech classifier 2026 is not a binary classifier, the confidence values of VoIP speech and non-VoIP speech may not be complementary, and the thresholds on the confidence values for distinguishing VoIP speech and non-VoIP speech do not necessarily represent the same point.
However, when the confidence value of VoIP speech or non-VoIP speech is close to the threshold and fluctuates around it, the VoIP/non-VoIP classification result may switch too frequently. To avoid such fluctuation, a buffering scheme may be provided: both the threshold for VoIP speech and the threshold for non-VoIP speech may be set larger, so that switching from the current content type to the other content type becomes less easy. For ease of description, the confidence value of non-VoIP speech may be converted into a confidence value of VoIP speech. That is, if the confidence value is high, the short-term segment is considered close to VoIP speech, and if the confidence value is low, the short-term segment is considered close to non-VoIP speech. Although, for the non-binary classifier mentioned above, a high confidence value of non-VoIP speech does not necessarily mean a low confidence value of VoIP speech, this simplification reflects the essence of the solution well, and the appended claims, though described in the language of a binary classifier, shall be construed as covering equivalent solutions for non-binary classifiers.
The buffering scheme is shown in Figure 36. There is a buffer zone between two thresholds Th1 and Th2 (Th1 >= Th2). When the confidence value v(t) of VoIP speech falls in this zone, the context classification does not change, as shown by the left and right arrows in Figure 36. Only when the confidence value v(t) is greater than the larger threshold Th1 will the short-term segment be classified as VoIP (as shown by the lower arrow in Figure 36); and only when the confidence value is not greater than the smaller threshold Th2 will the short-term segment be classified as non-VoIP (as shown by the upper arrow in Figure 36).
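The buffering (hysteresis) scheme of Figure 36 can be sketched in a few lines; the specific threshold values are illustrative placeholders, the patent only requiring Th1 >= Th2.

```python
def classify_with_buffer(v, prev_label, th1=0.6, th2=0.4):
    """Two thresholds Th1 >= Th2 delimit a buffer zone; inside the
    buffer the previous decision is kept, avoiding rapid switching."""
    if v > th1:
        return "VoIP"        # confidence clearly above the upper threshold
    if v <= th2:
        return "non-VoIP"    # confidence clearly at or below the lower threshold
    return prev_label        # in the buffer zone: keep the previous decision
```

A confidence value oscillating between, say, 0.45 and 0.55 would flip the decision on every segment with a single threshold at 0.5, but leaves the output unchanged here.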
If the VoIP noise classifier 2028 is used instead, the situation is similar. To make the solution more robust, the VoIP speech classifier 2026 and the VoIP noise classifier 2028 may be used in combination. The audio context classifier 204A may then be configured to: classify the short-term segment as the VoIP context type if the confidence value of VoIP speech is greater than a first threshold or if the confidence value of VoIP noise is greater than a third threshold; classify the short-term segment as the non-VoIP context type if the confidence value of VoIP speech is not greater than a second threshold, where the second threshold is not greater than the first threshold, or if the confidence value of VoIP noise is not greater than a fourth threshold, where the fourth threshold is not greater than the third threshold; and otherwise classify the short-term segment as the context type of the previous short-term segment.
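The combined four-threshold rule just described can be sketched as follows; the threshold values are assumed placeholders satisfying the stated constraints (second <= first, fourth <= third).

```python
def classify_context(speech_conf, noise_conf, prev,
                     th1=0.7, th2=0.3, th3=0.8, th4=0.2):
    """Four-threshold heuristic combining VoIP speech and VoIP noise
    confidence values; falls back to the previous segment's context."""
    if speech_conf > th1 or noise_conf > th3:
        return "VoIP"
    if speech_conf <= th2 or noise_conf <= th4:
        return "non-VoIP"
    return prev  # both confidences in their buffer zones
```

Note the ordering: a strong VoIP indication from either classifier wins first, then a strong non-VoIP indication, and only if both confidences sit in their buffer zones is the previous context retained.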
Here, the first threshold may equal the second threshold, and the third threshold may equal the fourth threshold, in particular but not exclusively for a binary VoIP speech classifier and a binary VoIP noise classifier. However, because the VoIP noise classification result is usually less robust, it is better for the third and fourth thresholds to be unequal, and both should be far from 0.5 (where 0 represents high confidence of non-VoIP noise and 1 represents high confidence of VoIP noise).
7.3 Smoothing fluctuations
To avoid rapid fluctuation, another solution is to smooth the confidence values determined by the audio content classifier. Thus, as shown in Figure 37, a type smoothing unit 203A may be included in the audio classifier 200A. For the confidence value of any of the four VoIP-related content types discussed above, the smoothing scheme discussed in subsection 1.3 may be used.
Alternatively, similar to subsection 7.2, VoIP speech and non-VoIP speech may be regarded as a pair with complementary confidence values, and VoIP noise and non-VoIP noise may likewise be regarded as a pair with complementary confidence values. In this case, only one member of each pair needs to be smoothed, using the smoothing scheme discussed in subsection 1.3.
Taking the confidence value of VoIP speech as an example, formula (3) may be rewritten as:

v(t) = β·v(t-1) + (1-β)·voipSpeechConf(t)    (3″)

where v(t) is the smoothed VoIP speech confidence value at the current time t, v(t-1) is the smoothed VoIP speech confidence value at the previous time instance, voipSpeechConf(t) is the VoIP speech confidence value at the current time t before smoothing, and β is a weighting coefficient.
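Formula (3″) is a one-pole exponential smoother and can be written directly as code; the value of β here is an illustrative assumption (a larger β gives stronger smoothing).

```python
def smooth_conf(prev_v, conf, beta=0.9):
    """One step of formula (3''): v(t) = beta*v(t-1) + (1-beta)*voipSpeechConf(t)."""
    return beta * prev_v + (1 - beta) * conf
```

Applied over a sequence of segments, a single-segment spike in voipSpeechConf moves the smoothed value only slightly, which is exactly the anti-fluctuation behavior the text describes.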
In a variation, with the speech/noise classifier 2025 described above, if the speech confidence value of the short-term segment is low, the short-term segment cannot be robustly classified as VoIP speech, and voipSpeechConf(t) = v(t-1) may be set directly, without the VoIP speech classifier 2026 actually working. Alternatively, in that situation, voipSpeechConf(t) = 0.5 may be set (or some other value not greater than 0.5, such as 0.4-0.5), representing an uncertain case (here, confidence = 1 represents high confidence of VoIP and confidence = 0 represents high confidence of non-VoIP).
Thus, according to a variation shown in Figure 37, the audio classifier 200A may further include: the speech/noise classifier 2025 for identifying the speech content type of the short-term segment; and the type smoothing unit 203A, which may be configured to, where the confidence value of the speech content type classified by the speech/noise classifier is lower than a fifth threshold, set the pre-smoothing VoIP speech confidence value of the current short-term segment to a predetermined confidence value (such as 0.5, or another value such as 0.4-0.5) or to the smoothed confidence value of the previous short-term segment. In this case the VoIP speech classifier 2026 may or may not work. Alternatively, the confidence value may be set by the VoIP speech classifier 2026, which is equivalent to the solution in which the type smoothing unit 203A sets the confidence value, and the claims shall be construed as covering both cases. In addition, although the language "the confidence value of the speech content type classified by the speech/noise classifier is lower than a fifth threshold" is used here, the scope of protection is not limited thereto and is equivalent to the situation where the short-term segment is classified into a content type other than speech. For the confidence value of VoIP noise, the situation is similar and a detailed description is omitted here.
To avoid rapid fluctuation, another solution is to smooth the confidence values determined by the audio context classifier 204A, for which the smoothing scheme discussed in subsection 1.3 may be used.
To avoid rapid fluctuation, yet another solution is to delay the transition of the context type between VoIP and non-VoIP, using the same scheme as described in subsection 1.6. As described in subsection 1.6, the timer 916 may be external to the audio classifier or internal to it as a part thereof. Therefore, as shown in Figure 38, the audio classifier 200A may further include the timer 916, and the audio classifier may be configured to continue outputting the current context type until the duration of a new context type reaches a sixth threshold (the context type being an example of an audio type). With reference to subsection 1.6, a detailed description is omitted here.
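The timer-based transition delay can be sketched as below. Measuring the duration in counts of consecutive observations, rather than in seconds, is a simplifying assumption; the sixth threshold here is the `hold_time` parameter.

```python
class ContextHold:
    """Keep emitting the current context type until the newly observed
    type has persisted for hold_time consecutive observations."""

    def __init__(self, hold_time):
        self.hold_time = hold_time
        self.current = None    # context type currently being output
        self.candidate = None  # new type waiting for its timer to expire
        self.count = 0

    def update(self, observed):
        if self.current is None:
            self.current = observed
        elif observed == self.current:
            self.candidate, self.count = None, 0   # reset the timer
        elif observed == self.candidate:
            self.count += 1
            if self.count >= self.hold_time:       # duration reached threshold
                self.current = observed
                self.candidate, self.count = None, 0
        else:
            self.candidate, self.count = observed, 1
        return self.current
```

A single spurious segment of the other type therefore never flips the output; only a sustained run does.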
Alternatively or additionally, as another scheme for delaying the transition between VoIP and non-VoIP, the first and/or second thresholds for VoIP/non-VoIP classification described above may differ depending on the context type of the previous short-term segment. That is, when the context type of the new short-term segment differs from that of the previous short-term segment, the first and/or second thresholds become larger; when the context type of the new short-term segment is the same as that of the previous short-term segment, the first and/or second thresholds become smaller. In this way, the context type tends to remain at the current context type, thereby suppressing abrupt fluctuation of the context type to some extent.
7.4 Combination of embodiments and application scenarios
Similar to Part 1, all of the embodiments discussed above and their variations may be implemented in any combination thereof, and any components mentioned in different parts/embodiments that have the same or similar functions may be implemented as the same component or as separate components. For example, any two or more of the solutions described in subsections 7.1 to 7.3 may be combined with one another, and these combinations may be further combined with any embodiment described or implied in Parts 1 through 6. In particular, the embodiments discussed in this part, and any combination thereof, may be combined with the audio processing apparatus/method, or with the volume leveler controller/control method discussed in Part 4.
7.5 VoIP classification methods
Similar to Part 1, in describing the audio classifiers in the embodiments above, some processes and methods are evidently also disclosed. A summary of these methods is given below without repeating details already discussed. In an embodiment shown in Figure 39, an audio classification method comprises: identifying the content type of a short-term segment of an audio signal (operation 4004), and then identifying the context type of the short-term segment based at least in part on the identified content type (operation 4008).
For identifying the context type of the audio signal dynamically and rapidly, the audio classification method in this part is particularly useful for identifying the VoIP context type and the non-VoIP context type. In that case, the short-term segment may first be classified into the VoIP speech content type or the non-VoIP speech content type, and the operation of identifying the context type may be configured to classify the short-term segment into the VoIP context type or the non-VoIP context type based on the confidence values of VoIP speech and non-VoIP speech. Alternatively, the short-term segment may first be classified into the VoIP noise content type or the non-VoIP noise content type, and the operation of identifying the context type may be configured to classify the short-term segment into the VoIP context type or the non-VoIP context type based on the confidence values of VoIP noise and non-VoIP noise.
Speech and noise may be considered jointly. In this case, the operation of identifying the context type may be configured to classify the short-term segment into the VoIP context type or the non-VoIP context type based on the confidence values of VoIP speech, non-VoIP speech, VoIP noise, and non-VoIP noise.
For identifying the context type of the short-term segment, a machine-learning model may be used, taking as features both the confidence values of the content types of the short-term segment and other features extracted from the short-term segment.
The operation of identifying the context type may also be realized based on heuristic rules. When only VoIP speech and non-VoIP speech are involved, the heuristic rule is as follows: if the confidence value of VoIP speech is greater than a first threshold, the short-term segment is classified into the VoIP context type; if the confidence value of VoIP speech is not greater than a second threshold, where the second threshold is not greater than the first threshold, the short-term segment is classified into the non-VoIP context type; otherwise, the short-term segment is classified into the context type of the previous short-term segment.
The heuristic rule for the situation involving only VoIP noise and non-VoIP noise is similar.
When both speech and noise are involved, the heuristic rule is as follows: if the confidence value of VoIP speech is greater than the first threshold, or if the confidence value of VoIP noise is greater than a third threshold, the short-term segment is classified into the VoIP context type; if the confidence value of VoIP speech is not greater than the second threshold, where the second threshold is not greater than the first threshold, or if the confidence value of VoIP noise is not greater than a fourth threshold, where the fourth threshold is not greater than the third threshold, the short-term segment is classified into the non-VoIP context type; otherwise, the short-term segment is classified into the context type of the previous short-term segment.
The smoothing schemes discussed in subsections 1.3 and 1.8 may be used here, and a detailed description is omitted. As a variation of the smoothing scheme described in subsection 1.3, before the smoothing operation 4106, the method may further include identifying the speech content type of the short-term segment (operation 40040 in Figure 40), wherein, where the confidence value of the speech content type is lower than a fifth threshold ("N" in operation 40041), the pre-smoothing VoIP speech confidence value of the current short-term segment is set to a predetermined confidence value or to the smoothed confidence value of the previous short-term segment (operation 40044 in Figure 40). Otherwise, if the operation of identifying the speech content type robustly judges the short-term segment to be speech ("Y" in operation 40041), then before the smoothing operation 4106 the short-term segment is further classified into VoIP speech or non-VoIP speech (operation 40042).
In fact, even without the smoothing scheme, the method may first identify the speech content type and/or the noise content type; when the short-term segment is classified into speech or noise, further classification is performed to classify the short-term segment into one of VoIP speech and non-VoIP speech, or into one of VoIP noise and non-VoIP noise. The operation of identifying the context type then follows.
As mentioned in subsections 1.6 and 1.8, the transition scheme discussed therein may be used as part of the audio classification method described here, and its details are omitted. Briefly, the method may further include measuring the duration during which the operation of identifying the context type continuously outputs the same context type, wherein the audio classification method is configured to continue outputting the current context type until the duration of the new context type reaches a sixth threshold. Similarly, different sixth thresholds may be set for different transitions from one context type to another. Furthermore, the sixth threshold may be negatively correlated with the confidence value of the new context type.
As an improvement of the transition scheme in the audio classification method specific to VoIP/non-VoIP classification, one or more of the first through fourth thresholds for the current short-term segment may be set to differ depending on the context type of the previous short-term segment.
Similar to the embodiments of the audio processing apparatus, on the one hand, any combination of the embodiments of the audio processing method and their variations is feasible; on the other hand, each aspect of the embodiments of the audio processing method and their variations may also be a standalone solution. In addition, any two or more of the solutions described in this subsection may be combined with one another, and these combinations may be further combined with any embodiment described or implied elsewhere in this disclosure. In particular, the audio classification method described here may be used in the audio processing methods described earlier, especially the volume leveler control method.
As described at the beginning of the detailed description of this application, the embodiments of this application may be implemented as hardware or software, or both. Figure 41 is a block diagram showing an example system 4200 for implementing aspects of this application. In Figure 41, a central processing unit (CPU) 4201 performs various processing according to a program stored in a read-only memory (ROM) 4202 or a program loaded from a storage section 4208 into a random access memory (RAM) 4203. Data required when the CPU 4201 performs the various processing are also stored in the RAM 4203 as needed.
The CPU 4201, the ROM 4202, and the RAM 4203 are connected to one another via a bus 4204. An input/output interface 4205 is also connected to the bus 4204. The following components are connected to the input/output interface 4205: an input section 4206 including a keyboard, a mouse, and the like; an output section 4207 including a display such as a cathode-ray tube (CRT) or a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 4208 including a hard disk and the like; and a communication section 4209 including a network interface card such as a LAN card or a modem. The communication section 4209 performs communication processing via a network such as the Internet. As needed, a drive 4210 is also connected to the input/output interface 4205. A removable medium 4211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 4210 as needed, so that a computer program read therefrom is installed into the storage section 4208 as needed.
Where the above components are implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 4211.
Note that the terminology used here is for the purpose of describing embodiments only and is not intended to limit this application. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the term "comprising", when used in this application, specifies the presence of the stated features, integers, operations, steps, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, operations, steps, elements, components, and/or combinations thereof.
The corresponding structures, materials, acts, and equivalents of all means-or-step-plus-function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with the other claimed elements as specifically claimed. The description of this application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to persons of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the application with respect to various embodiments, with various modifications, as are suited to the particular use contemplated.
In view of the above, it can be seen that the following exemplary embodiments (each denoted as an "EE") are described.
Apparatus embodiments:
EE.1. An equalizer controller, comprising:
an audio classifier for continuously identifying the audio type of an audio signal; and
an adjusting unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type.
EE.2. The equalizer controller according to EE 1, wherein the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjusting unit is configured to consider at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
EE.3. The equalizer controller according to EE 1, wherein the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjusting unit is configured to consider at least some of the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
EE.4. The equalizer controller according to EE 3, wherein the adjusting unit is configured to consider at least one dominant audio type based on the confidence values.
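EE.2 to EE.4 describe combining several simultaneously identified audio types by weighting their confidence values. The following is a minimal illustrative sketch, not the patented implementation: the importance weights, type names, and the normalization rule are assumptions chosen for the example.

```python
# Sketch of EE.2-EE.4: weight each audio type's confidence by an assumed
# importance value, normalize, and optionally pick the dominant type.

def combined_confidence(confidences, importance):
    """Weight confidences by importance and normalize to sum to 1."""
    weighted = {t: c * importance.get(t, 1.0) for t, c in confidences.items()}
    total = sum(weighted.values())
    if total == 0:
        return {t: 0.0 for t in weighted}
    return {t: w / total for t, w in weighted.items()}

def dominant_type(confidences):
    """EE.4: the audio type with the highest confidence value."""
    return max(confidences, key=confidences.get)

# Hypothetical classifier output and importance weights.
conf = {"short_term_music": 0.6, "speech": 0.3, "noise": 0.1}
imp = {"short_term_music": 1.0, "speech": 2.0, "noise": 0.5}
weights = combined_confidence(conf, imp)
print(dominant_type(conf))  # -> short_term_music
```

The normalized weights could then scale per-type equalizer settings; the embodiments leave the exact combination rule open.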
EE.5. The equalizer controller according to EE 1, further comprising a parameter smoothing unit for, with respect to a parameter of the equalizer adjusted by the adjusting unit, smoothing the parameter value determined by the adjusting unit at the current time based on a past parameter value.
EE.6. The equalizer controller according to EE 5, wherein the parameter smoothing unit is configured to determine the current smoothed parameter value by calculating a weighted sum of the parameter value determined by the adjusting unit at the current time and the last smoothed parameter value.
EE.7. The equalizer controller according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on the audio type of the audio signal.
EE.8. The equalizer controller according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on different transition pairs from one audio type to another audio type.
EE.9. The equalizer controller according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on an increasing or decreasing trend of the parameter value determined by the adjusting unit.
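The smoothing of EE.5 to EE.9 is a one-pole recursion: the new smoothed value is a weighted sum of the currently determined value and the previous smoothed value. The sketch below is illustrative only; the specific alpha values and the rise/fall rule are assumptions standing in for the adaptive weighting the embodiments describe.

```python
# Sketch of EE.6 and EE.9: weighted-sum smoothing of an equalizer
# parameter, with the weight adapted to the parameter's trend so the
# value rises quickly but decays slowly (hypothetical alpha choices).

def smooth(current, last_smoothed, alpha):
    """Weighted sum of the current estimate and the last smoothed value."""
    return alpha * current + (1.0 - alpha) * last_smoothed

def adaptive_alpha(current, last_smoothed, alpha_up=0.9, alpha_down=0.2):
    """EE.9: react fast on an increasing trend, slowly on a decreasing one."""
    return alpha_up if current > last_smoothed else alpha_down

smoothed = 0.0
for raw in [1.0, 1.0, 0.0, 0.0]:  # raw values from the adjusting unit
    smoothed = smooth(raw, smoothed, adaptive_alpha(raw, smoothed))
print(round(smoothed, 4))  # -> 0.6336
```

Per EE.7 and EE.8, the same alpha could instead be looked up per audio type or per transition pair (for example, a slower alpha when switching from music to speech).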
EE.10. The equalizer controller according to any one of EE 1 to EE 9, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
EE.11. The equalizer controller according to any one of EE 1 to EE 9, wherein
the audio classifier comprises an audio context classifier for identifying the context type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
EE.12. The equalizer controller according to any one of EE 1 to EE 9, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
EE.13. The equalizer controller according to EE 10 or EE 11, wherein the adjusting unit is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
EE.14. The equalizer controller according to EE 13, wherein the adjusting unit is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
EE.15. The equalizer controller according to any one of EE 1 to EE 9, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
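EE.10 to EE.15 only require that the equalization level correlate positively with some confidence values and negatively with others. One possible combination rule, given purely as an assumption for illustration, is to scale the level up with the music confidence and down with the speech confidence:

```python
# Sketch of the positive/negative correlations in EE.10: the equalization
# level grows with the confidence of short-term music and shrinks with the
# confidence of speech. The product rule below is a hypothetical choice;
# any monotone combination would satisfy the embodiment.

def equalization_level(conf_music, conf_speech):
    level = conf_music * (1.0 - conf_speech)
    return max(0.0, min(1.0, level))  # clamp to [0, 1]

print(equalization_level(0.9, 0.1))  # confident music: high level
print(equalization_level(0.2, 0.8))  # speech dominates: low level
```

The same pattern transfers to the other pairs (long-term music vs. movie-like media/game, background sound vs. noise) by substituting the corresponding confidence values.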
EE.16. The equalizer controller according to any one of EE 1 to EE 9, wherein the adjusting unit is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
EE.17. The equalizer controller according to EE 16, wherein the audio classifier comprises an audio content classifier for classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
EE.18. The equalizer controller according to EE 17, wherein the short-term music includes at least one music type.
EE.19. The equalizer controller according to EE 18, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
EE.20. The equalizer controller according to EE 16, wherein the audio classifier comprises an audio context classifier for classifying the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
EE.21. An audio reproducing system, comprising the equalizer controller according to any one of EE 1 to EE 20.
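The per-type assignment of EE.16 to EE.20 can be pictured as a lookup table from audio type to preset. All preset values and profile names below are hypothetical placeholders, not values disclosed by the application:

```python
# Sketch of EE.16-EE.20: a preset (equalization level and profile) per
# short-term content type or long-term context type. Values are invented
# for illustration only.

PRESETS = {
    # short-term content types (EE.17)
    "short_term_music": {"level": 1.0, "profile": "music"},
    "speech":           {"level": 0.2, "profile": "flat"},
    "background_sound": {"level": 0.6, "profile": "flat"},
    "noise":            {"level": 0.0, "profile": "flat"},
    # long-term context types (EE.20)
    "movie_like_media": {"level": 0.3, "profile": "cinema"},
    "long_term_music":  {"level": 1.0, "profile": "music"},
    "voip":             {"level": 0.1, "profile": "flat"},
    "game":             {"level": 0.3, "profile": "cinema"},
}

def preset_for(audio_type):
    """Fall back to a neutral preset for unrecognized types."""
    return PRESETS.get(audio_type, {"level": 0.5, "profile": "flat"})

print(preset_for("speech")["level"])  # -> 0.2
```

Together with the confidence weighting of EE.2, the presets of several candidate types could be interpolated rather than switched hard.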
Method embodiments:
EE.1. An equalizer control method, comprising:
identifying the audio type of an audio signal in real time; and
adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type.
EE.2. The equalizer control method according to EE 1, wherein the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to consider at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
EE.3. The equalizer control method according to EE 1, wherein the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to consider at least some of the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
EE.4. The equalizer control method according to EE 3, wherein the adjusting operation is configured to consider at least one dominant audio type based on the confidence values.
EE.5. The equalizer control method according to EE 1, further comprising, with respect to a parameter of the equalizer adjusted by the adjusting operation, smoothing the parameter value determined by the adjusting operation at the current time based on a past parameter value.
EE.6. The equalizer control method according to EE 5, wherein the smoothing operation is configured to determine the current smoothed parameter value by calculating a weighted sum of the parameter value determined by the adjusting operation at the current time and the last smoothed parameter value.
EE.7. The equalizer control method according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on the audio type of the audio signal.
EE.8. The equalizer control method according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on different transition pairs from one audio type to another audio type.
EE.9. The equalizer control method according to EE 6, wherein the weights for calculating the weighted sum adaptively change based on an increasing or decreasing trend of the parameter value determined by the adjusting operation.
EE.10. The equalizer control method according to any one of EE 1 to EE 9, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
EE.11. The equalizer control method according to any one of EE 1 to EE 9, wherein
the operation of identifying the audio type includes identifying the context type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
EE.12. The equalizer control method according to any one of EE 1 to EE 9, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
EE.13. The equalizer control method according to EE 10 or EE 11, wherein the adjusting operation is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
EE.14. The equalizer control method according to EE 13, wherein the adjusting operation is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
EE.15. The equalizer control method according to any one of EE 1 to EE 9, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
EE.16. The equalizer control method according to any one of EE 1 to EE 9, wherein the adjusting operation is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
EE.17. The equalizer control method according to EE 16, wherein the operation of identifying the audio type includes classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
EE.18. The equalizer control method according to EE 17, wherein the short-term music includes at least one music type.
EE.19. The equalizer control method according to EE 18, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
EE.20. The equalizer control method according to EE 16, wherein the operation of identifying the audio type classifies the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
EE.21. A computer-readable medium having computer program instructions recorded thereon, the instructions, when executed by a processor, enabling the processor to perform an equalizer control method, the equalizer control method comprising:
identifying the audio type of an audio signal in real time; and
adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type.
Claims (69)
1. An equalizer controller, comprising:
an audio classifier for continuously identifying the audio type of an audio signal; and
an adjusting unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein
the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjusting unit is configured to consider at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
2. The equalizer controller according to claim 1, further comprising a parameter smoothing unit for, with respect to a parameter of the equalizer adjusted by the adjusting unit, smoothing the parameter value determined by the adjusting unit at the current time based on a past parameter value.
3. The equalizer controller according to claim 2, wherein the parameter smoothing unit is configured to determine the current smoothed parameter value by calculating a weighted sum of the parameter value determined by the adjusting unit at the current time and the last smoothed parameter value.
4. The equalizer controller according to claim 3, wherein the weights for calculating the weighted sum adaptively change based on the audio type of the audio signal.
5. The equalizer controller according to claim 3, wherein the weights for calculating the weighted sum adaptively change based on different transition pairs from one audio type to another audio type.
6. The equalizer controller according to claim 3, wherein the weights for calculating the weighted sum adaptively change based on an increasing or decreasing trend of the parameter value determined by the adjusting unit.
7. The equalizer controller according to any one of claim 1 to claim 6, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
8. The equalizer controller according to any one of claim 1 to claim 6, wherein
the audio classifier comprises an audio context classifier for identifying the context type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
9. The equalizer controller according to any one of claim 1 to claim 6, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
10. The equalizer controller according to claim 7, wherein the adjusting unit is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
11. The equalizer controller according to claim 8, wherein the adjusting unit is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
12. The equalizer controller according to claim 10, wherein the adjusting unit is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
13. The equalizer controller according to claim 11, wherein the adjusting unit is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
14. The equalizer controller according to any one of claim 1 to claim 6, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
15. The equalizer controller according to any one of claim 1 to claim 6, wherein the adjusting unit is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
16. The equalizer controller according to claim 15, wherein the audio classifier comprises an audio content classifier for classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
17. The equalizer controller according to claim 16, wherein the short-term music includes at least one music type.
18. The equalizer controller according to claim 17, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
19. The equalizer controller according to claim 15, wherein the audio classifier comprises an audio context classifier for classifying the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
20. An equalizer controller, comprising:
an audio classifier for continuously identifying the audio type of an audio signal; and
an adjusting unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein
the audio classifier is configured to classify the audio signal into multiple audio types with respective confidence values, and the adjusting unit is configured to consider at least some of the multiple audio types by weighting the effects of the multiple audio types based on the confidence values.
21. The equalizer controller according to claim 20, wherein the adjusting unit is configured to consider at least one dominant audio type based on the confidence values.
22. The equalizer controller according to any one of claim 20 to claim 21, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
23. The equalizer controller according to any one of claim 20 to claim 21, wherein
the audio classifier comprises an audio context classifier for identifying the context type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
24. The equalizer controller according to any one of claim 20 to claim 21, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
25. The equalizer controller according to claim 22, wherein the adjusting unit is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
26. The equalizer controller according to claim 23, wherein the adjusting unit is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
27. The equalizer controller according to claim 25, wherein the adjusting unit is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
28. The equalizer controller according to claim 26, wherein the adjusting unit is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
29. The equalizer controller according to any one of claim 20 to claim 21, wherein
the audio classifier comprises an audio content classifier for identifying the content type of the audio signal; and
the adjusting unit is configured to positively correlate the equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
30. The equalizer controller according to any one of claim 20 to claim 21, wherein the adjusting unit is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
31. The equalizer controller according to claim 30, wherein the audio classifier comprises an audio content classifier for classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
32. The equalizer controller according to claim 31, wherein the short-term music includes at least one music type.
33. The equalizer controller according to claim 32, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
34. The equalizer controller according to claim 30, wherein the audio classifier comprises an audio context classifier for classifying the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
35. An audio reproducing system, comprising the equalizer controller according to any one of claim 1 to claim 34.
36. An equalizer control method, comprising:
identifying the audio type of an audio signal in real time; and
adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type, wherein
the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to consider at least some of the multiple audio types by weighting the confidence values of the multiple audio types based on the importance of the multiple audio types.
37. The equalizer control method according to claim 36, further comprising, with respect to a parameter of the equalizer adjusted by the adjusting operation, smoothing the parameter value determined by the adjusting operation at the current time based on a past parameter value.
38. The equalizer control method according to claim 37, wherein the smoothing operation is configured to determine the current smoothed parameter value by calculating a weighted sum of the parameter value determined by the adjusting operation at the current time and the last smoothed parameter value.
39. The equalizer control method according to claim 38, wherein the weights for calculating the weighted sum adaptively change based on the audio type of the audio signal.
40. The equalizer control method according to claim 38, wherein the weights for calculating the weighted sum adaptively change based on different transition pairs from one audio type to another audio type.
41. The equalizer control method according to claim 38, wherein the weights for calculating the weighted sum adaptively change based on an increasing or decreasing trend of the parameter value determined by the adjusting operation.
42. The equalizer control method according to any one of claim 36 to claim 41, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
43. The equalizer control method according to any one of claim 36 to claim 41, wherein
the operation of identifying the audio type includes identifying the context type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
44. The equalizer control method according to any one of claim 36 to claim 41, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
45. The equalizer control method according to claim 42, wherein the adjusting operation is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
46. The equalizer control method according to claim 43, wherein the adjusting operation is configured to positively correlate the equalization level with the confidence value of the short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of the short-term music with dominant sources.
47. The equalizer control method according to claim 45, wherein the adjusting operation is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
48. The equalizer control method according to claim 46, wherein the adjusting operation is configured to consider the short-term music without/with dominant sources only when the confidence value of the short-term music is greater than a threshold.
49. The equalizer control method according to any one of claim 36 to claim 41, wherein
the operation of identifying the audio type includes identifying the content type of the audio signal; and
the adjusting operation is configured to positively correlate the equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
50. The equalizer control method according to any one of claim 36 to claim 41, wherein the adjusting operation is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
51. The equalizer control method according to claim 50, wherein the operation of identifying the audio type includes classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
52. The equalizer control method according to claim 51, wherein the short-term music includes at least one music type.
53. The equalizer control method according to claim 52, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
54. The equalizer control method according to claim 50, wherein the operation of identifying the audio type classifies the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
55. An equalizer control method, comprising: identifying an audio type of an audio signal in real time; and adjusting an equalizer in a continuous manner based on a confidence value of the identified audio type, wherein the audio signal is classified into multiple audio types with respective confidence values, and the adjusting operation is configured to consider at least some of the multiple audio types by weighting the influence of the multiple audio types based on their confidence values.
56. The equalizer control method according to claim 55, wherein the adjusting operation is configured to consider at least one dominant audio type based on the confidence values.
57. The equalizer control method according to any one of claims 55 to 56, wherein the operation of identifying the audio type comprises identifying a content type of the audio signal; and the adjusting operation is configured to positively correlate an equalization level with the confidence value of short-term music, and/or negatively correlate the equalization level with the confidence value of speech.
58. The equalizer control method according to any one of claims 55 to 56, wherein the operation of identifying the audio type comprises identifying a context type of the audio signal; and the adjusting operation is configured to positively correlate an equalization level with the confidence value of long-term music, and/or negatively correlate the equalization level with the confidence value of movie-like media and/or game.
59. The equalizer control method according to any one of claims 55 to 56, wherein the operation of identifying the audio type comprises identifying a content type of the audio signal; and the adjusting operation is configured to positively correlate an equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
60. The equalizer control method according to claim 57, wherein the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
61. The equalizer control method according to claim 58, wherein the adjusting operation is configured to positively correlate the equalization level with the confidence value of short-term music without dominant sources, and/or negatively correlate the equalization level with the confidence value of short-term music with dominant sources.
62. The equalizer control method according to claim 60, wherein the adjusting operation is configured to consider the short-term music without/with dominant sources only when the confidence value of short-term music is greater than a threshold.
63. The equalizer control method according to claim 61, wherein the adjusting operation is configured to consider the short-term music without/with dominant sources only when the confidence value of short-term music is greater than a threshold.
64. The equalizer control method according to any one of claims 55 to 56, wherein the operation of identifying the audio type comprises identifying a content type of the audio signal; and the adjusting operation is configured to positively correlate an equalization level with the confidence value of background sound, and/or negatively correlate the equalization level with the confidence value of noise.
65. The equalizer control method according to any one of claims 55 to 56, wherein the adjusting operation is configured to assign an equalization level and/or an equalization profile and/or a spectral balance preset to each audio type.
66. The equalizer control method according to claim 65, wherein the operation of identifying the audio type comprises classifying the audio signal into short-term content types, the short-term content types including at least one of short-term music, speech, background sound and noise.
67. The equalizer control method according to claim 66, wherein the short-term music includes at least one music type.
68. The equalizer control method according to claim 67, wherein the at least one music type includes a genre-based type, and/or an instrument-based type, and/or a music type classified based on the rhythm, tempo, timbre and/or any other musical attribute of the music.
69. The equalizer control method according to claim 65, wherein the operation of identifying the audio type comprises classifying the audio signal into long-term context types, the long-term context types including at least one of movie-like media, long-term music, VoIP and game.
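The adjustment recited in claims 55 through 64 (weighting the equalizer by per-type classification confidence, correlating the equalization level positively with music confidence and negatively with speech confidence, gating the dominant-source distinction on a threshold, and adjusting in a continuous manner) can be sketched as follows. This is an illustrative sketch only, not the patented implementation; every function name, key name, weight and threshold below is a hypothetical assumption made for the example.

```python
# Illustrative sketch (not the patented implementation) of a
# confidence-driven equalization level. All names, weights and
# thresholds are hypothetical assumptions.

def equalization_level(conf, dominant_threshold=0.5):
    """Map per-type confidence values in [0, 1] to an EQ level in [0, 1].

    `conf` is a dict of classifier confidences, e.g. keyed by
    "short_term_music", "speech", "background_sound", "noise".
    """
    music = conf.get("short_term_music", 0.0)
    speech = conf.get("speech", 0.0)
    background = conf.get("background_sound", 0.0)
    noise = conf.get("noise", 0.0)

    # Positive correlation with music confidence, negative with speech
    # (claim 57); background sound raises and noise lowers the level
    # (claim 64). The 0.3 weight is an arbitrary illustrative choice.
    level = music * (1.0 - speech)
    level += 0.3 * background * (1.0 - noise)

    # Claims 60-63: distinguish music without/with dominant sources,
    # but only when the overall music confidence exceeds a threshold.
    if music > dominant_threshold:
        no_dom = conf.get("music_without_dominant_source", 0.0)
        with_dom = conf.get("music_with_dominant_source", 0.0)
        level *= max(0.0, min(1.0, 1.0 + 0.5 * (no_dom - with_dom)))

    return max(0.0, min(1.0, level))


def smooth_level(previous, target, alpha=0.9):
    """One-pole smoothing so the equalizer is adjusted 'in a continuous
    manner' (claim 55) rather than jumping between discrete settings."""
    return alpha * previous + (1.0 - alpha) * target
```

A caller would recompute `equalization_level` on each short-term classification frame and pass the result through `smooth_level` before applying it to the equalizer, so the applied level drifts gradually as the confidence values change.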
Priority Applications (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310100401.XA CN104079247B (en) | 2013-03-26 | 2013-03-26 | Equalizer controller and controlling method and audio reproducing system |
ES14724216.8T ES2630398T3 (en) | 2013-03-26 | 2014-03-17 | Control device and equalizer control method |
JP2016505490A JP6053984B2 (en) | 2013-03-26 | 2014-03-17 | Equalizer controller and control method |
EP14724216.8A EP2979359B1 (en) | 2013-03-26 | 2014-03-17 | Equalizer controller and controlling method |
PCT/US2014/030663 WO2014160548A1 (en) | 2013-03-26 | 2014-03-17 | Equalizer controller and controlling method |
US14/780,485 US9621124B2 (en) | 2013-03-26 | 2014-03-17 | Equalizer controller and controlling method |
EP17164545.0A EP3232567B1 (en) | 2013-03-26 | 2014-03-17 | Equalizer controller and controlling method |
JP2016230947A JP6325640B2 (en) | 2013-03-26 | 2016-11-29 | Equalizer controller and control method |
US15/433,486 US10044337B2 (en) | 2013-03-26 | 2017-02-15 | Equalizer controller and controlling method |
HK18103383.9A HK1244110A1 (en) | 2013-03-26 | 2018-03-12 | Equalizer controller and controlling method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310100401.XA CN104079247B (en) | 2013-03-26 | 2013-03-26 | Equalizer controller and controlling method and audio reproducing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104079247A CN104079247A (en) | 2014-10-01 |
CN104079247B true CN104079247B (en) | 2018-02-09 |
Family
ID=51600326
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310100401.XA Active CN104079247B (en) | 2013-03-26 | 2013-03-26 | Equalizer controller and controlling method and audio reproducing system |
Country Status (7)
Country | Link |
---|---|
US (2) | US9621124B2 (en) |
EP (2) | EP2979359B1 (en) |
JP (2) | JP6053984B2 (en) |
CN (1) | CN104079247B (en) |
ES (1) | ES2630398T3 (en) |
HK (1) | HK1244110A1 (en) |
WO (1) | WO2014160548A1 (en) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9380383B2 (en) | 2013-09-06 | 2016-06-28 | Gracenote, Inc. | Modifying playback of content using pre-processed profile information |
US9716958B2 (en) * | 2013-10-09 | 2017-07-25 | Voyetra Turtle Beach, Inc. | Method and system for surround sound processing in a headset |
US9792952B1 (en) * | 2014-10-31 | 2017-10-17 | Kill the Cann, LLC | Automated television program editing |
US9729118B2 (en) | 2015-07-24 | 2017-08-08 | Sonos, Inc. | Loudness matching |
US9949057B2 (en) | 2015-09-08 | 2018-04-17 | Apple Inc. | Stereo and filter control for multi-speaker device |
CN108141684B (en) * | 2015-10-09 | 2021-09-24 | 索尼公司 | Sound output apparatus, sound generation method, and recording medium |
CN105263086A (en) * | 2015-10-27 | 2016-01-20 | 小米科技有限责任公司 | Adjustment method of equalizer, device and intelligent speaker |
WO2017079334A1 (en) | 2015-11-03 | 2017-05-11 | Dolby Laboratories Licensing Corporation | Content-adaptive surround sound virtualization |
EP3465681A1 (en) * | 2016-05-26 | 2019-04-10 | Telefonaktiebolaget LM Ericsson (PUBL) | Method and apparatus for voice or sound activity detection for spatial audio |
CN106601268B (en) * | 2016-12-26 | 2020-11-27 | 腾讯音乐娱乐(深圳)有限公司 | Multimedia data processing method and device |
US9860644B1 (en) | 2017-04-05 | 2018-01-02 | Sonos, Inc. | Limiter for bass enhancement |
US20210294845A1 (en) | 2017-04-28 | 2021-09-23 | Hewlett-Packard Development Company, L.P. | Audio classification with machine learning model using audio duration |
US11386913B2 (en) | 2017-08-01 | 2022-07-12 | Dolby Laboratories Licensing Corporation | Audio object classification based on location metadata |
CN107526568A (en) | 2017-08-18 | 2017-12-29 | 广东欧珀移动通信有限公司 | volume adjusting method, device, terminal device and storage medium |
JP6812381B2 (en) * | 2018-02-08 | 2021-01-13 | 日本電信電話株式会社 | Voice recognition accuracy deterioration factor estimation device, voice recognition accuracy deterioration factor estimation method, program |
EP3785453B1 (en) * | 2018-04-27 | 2022-11-16 | Dolby Laboratories Licensing Corporation | Blind detection of binauralized stereo content |
US11929091B2 (en) | 2018-04-27 | 2024-03-12 | Dolby Laboratories Licensing Corporation | Blind detection of binauralized stereo content |
CN110610702B (en) * | 2018-06-15 | 2022-06-24 | 惠州迪芬尼声学科技股份有限公司 | Method for sound control equalizer by natural language and computer readable storage medium |
US10991379B2 (en) * | 2018-06-22 | 2021-04-27 | Babblelabs Llc | Data driven audio enhancement |
US11430463B2 (en) * | 2018-07-12 | 2022-08-30 | Dolby Laboratories Licensing Corporation | Dynamic EQ |
CN109273010B (en) * | 2018-08-21 | 2020-08-11 | 深圳市声扬科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
US10855241B2 (en) | 2018-11-29 | 2020-12-01 | Sony Corporation | Adjusting an equalizer based on audio characteristics |
WO2020247892A1 (en) * | 2019-06-07 | 2020-12-10 | Dts, Inc. | System and method for adaptive sound equalization in personal hearing devices |
JP7258228B2 (en) * | 2019-08-27 | 2023-04-14 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Enhancing Dialogs with Adaptive Smoothing |
US10798484B1 (en) | 2019-11-26 | 2020-10-06 | Gracenote, Inc. | Methods and apparatus for audio equalization based on variant selection |
US11481628B2 (en) | 2019-11-26 | 2022-10-25 | Gracenote, Inc. | Methods and apparatus for audio equalization based on variant selection |
KR20210086086A (en) * | 2019-12-31 | 2021-07-08 | 삼성전자주식회사 | Equalizer for equalization of music signals and methods for the same |
EP3889958A1 (en) * | 2020-03-31 | 2021-10-06 | Moodagent A/S | Dynamic audio playback equalization using semantic features |
CN113763972A (en) * | 2020-06-02 | 2021-12-07 | 中国移动通信集团终端有限公司 | Method, device and equipment for adjusting audio parameters and computer storage medium |
JP2023539121A (en) * | 2020-08-18 | 2023-09-13 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Audio content identification |
CN112967732B (en) * | 2021-02-25 | 2023-10-03 | 北京百度网讯科技有限公司 | Method, apparatus, device and computer readable storage medium for adjusting equalizer |
CN115691543A (en) * | 2021-07-28 | 2023-02-03 | 哈曼国际工业有限公司 | Adaptive equalization method and system for acoustic system |
TWI781714B (en) * | 2021-08-05 | 2022-10-21 | 晶豪科技股份有限公司 | Method for equalizing input signal to generate equalizer output signal and parametric equalizer |
CN115334349B (en) * | 2022-07-15 | 2024-01-02 | 北京达佳互联信息技术有限公司 | Audio processing method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101569092A (en) * | 2006-12-21 | 2009-10-28 | 皇家飞利浦电子股份有限公司 | System for processing audio data |
CN102195581A (en) * | 2010-03-18 | 2011-09-21 | 承景科技股份有限公司 | Method for adjusting volume of digital audio signal |
GB2491002A (en) * | 2011-05-17 | 2012-11-21 | Fender Musical Instr Corp | Consumer audio system and method using adaptive intelligence to distinguish information content of audio signals and to control signal processing function |
CN102982804A (en) * | 2011-09-02 | 2013-03-20 | 杜比实验室特许公司 | Method and system of voice frequency classification |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07106883A (en) | 1993-10-01 | 1995-04-21 | Matsushita Electric Ind Co Ltd | Digital sound volume adjustment device and digital mixing device |
US5666430A (en) * | 1995-01-09 | 1997-09-09 | Matsushita Electric Corporation Of America | Method and apparatus for leveling audio output |
JPH08250944A (en) | 1995-03-13 | 1996-09-27 | Nippon Telegr & Teleph Corp <Ntt> | Automatic sound volume control method and device executing this method |
JPH1117472A (en) | 1997-06-20 | 1999-01-22 | Fujitsu General Ltd | Sound device |
US20050091066A1 (en) * | 2003-10-28 | 2005-04-28 | Manoj Singhal | Classification of speech and music using zero crossing |
JP4013906B2 (en) | 2004-02-16 | 2007-11-28 | ヤマハ株式会社 | Volume control device |
GB2413745A (en) * | 2004-04-30 | 2005-11-02 | Axeon Ltd | Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system |
US20050251273A1 (en) * | 2004-05-05 | 2005-11-10 | Motorola, Inc. | Dynamic audio control circuit and method |
US8199933B2 (en) | 2004-10-26 | 2012-06-12 | Dolby Laboratories Licensing Corporation | Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal |
WO2006056910A1 (en) | 2004-11-23 | 2006-06-01 | Koninklijke Philips Electronics N.V. | A device and a method to process audio data, a computer program element and computer-readable medium |
JP2006171458A (en) | 2004-12-16 | 2006-06-29 | Sharp Corp | Tone quality controller, content display device, program, and recording medium |
CN101099196A (en) * | 2005-01-04 | 2008-01-02 | 皇家飞利浦电子股份有限公司 | An apparatus for and a method of processing reproducible data |
US7464029B2 (en) * | 2005-07-22 | 2008-12-09 | Qualcomm Incorporated | Robust separation of speech signals in a noisy environment |
JP2007208407A (en) | 2006-01-31 | 2007-08-16 | Toshiba Corp | Information processing apparatus and sound control method thereof |
CU23572A1 (en) | 2006-03-31 | 2010-09-30 | Ct Ingenieria Genetica Biotech | PHARMACEUTICAL COMPOSITION INCLUDING PROTEIN NMB0938 |
CN102684628B (en) | 2006-04-27 | 2014-11-26 | 杜比实验室特许公司 | Method for modifying parameters of audio dynamic processor and device executing the method |
KR100832360B1 (en) * | 2006-09-25 | 2008-05-26 | 삼성전자주식회사 | Method for controlling equalizer in digital media player and system thereof |
BRPI0807703B1 (en) | 2007-02-26 | 2020-09-24 | Dolby Laboratories Licensing Corporation | METHOD FOR IMPROVING SPEECH IN ENTERTAINMENT AUDIO AND COMPUTER-READABLE NON-TRANSITIONAL MEDIA |
ES2377719T3 (en) | 2007-07-13 | 2012-03-30 | Dolby Laboratories Licensing Corporation | Audio processing using an analysis of auditory scenes and spectral obliqueness. |
JP2010016483A (en) * | 2008-07-01 | 2010-01-21 | Victor Co Of Japan Ltd | Sound signal correction apparatus |
JP5321263B2 (en) * | 2009-06-12 | 2013-10-23 | ソニー株式会社 | Signal processing apparatus and signal processing method |
US20100319015A1 (en) * | 2009-06-15 | 2010-12-16 | Richard Anthony Remington | Method and system for removing advertising content from television or radio content |
JP5695896B2 (en) * | 2010-12-22 | 2015-04-08 | 株式会社東芝 | SOUND QUALITY CONTROL DEVICE, SOUND QUALITY CONTROL METHOD, AND SOUND QUALITY CONTROL PROGRAM |
JP5426608B2 (en) | 2011-05-31 | 2014-02-26 | 東京瓦斯株式会社 | Abnormality detection apparatus and abnormality detection method |
US9401153B2 (en) * | 2012-10-15 | 2016-07-26 | Digimarc Corporation | Multi-mode audio recognition and auxiliary data encoding and decoding |
US9305559B2 (en) * | 2012-10-15 | 2016-04-05 | Digimarc Corporation | Audio watermark encoding with reversing polarity and pairwise embedding |
EP2936485B1 (en) * | 2012-12-21 | 2017-01-04 | Dolby Laboratories Licensing Corporation | Object clustering for rendering object-based audio content based on perceptual criteria |
US9344815B2 (en) * | 2013-02-11 | 2016-05-17 | Symphonic Audio Technologies Corp. | Method for augmenting hearing |
2013
- 2013-03-26 CN CN201310100401.XA patent/CN104079247B/en active Active

2014
- 2014-03-17 US US14/780,485 patent/US9621124B2/en active Active
- 2014-03-17 EP EP14724216.8A patent/EP2979359B1/en active Active
- 2014-03-17 WO PCT/US2014/030663 patent/WO2014160548A1/en active Application Filing
- 2014-03-17 ES ES14724216.8T patent/ES2630398T3/en active Active
- 2014-03-17 EP EP17164545.0A patent/EP3232567B1/en active Active
- 2014-03-17 JP JP2016505490A patent/JP6053984B2/en active Active

2016
- 2016-11-29 JP JP2016230947A patent/JP6325640B2/en active Active

2017
- 2017-02-15 US US15/433,486 patent/US10044337B2/en active Active

2018
- 2018-03-12 HK HK18103383.9A patent/HK1244110A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
US10044337B2 (en) | 2018-08-07 |
EP3232567A1 (en) | 2017-10-18 |
JP6053984B2 (en) | 2016-12-27 |
JP2016519493A (en) | 2016-06-30 |
JP6325640B2 (en) | 2018-05-16 |
CN104079247A (en) | 2014-10-01 |
EP2979359B1 (en) | 2017-05-03 |
US9621124B2 (en) | 2017-04-11 |
HK1244110A1 (en) | 2018-07-27 |
EP3232567B1 (en) | 2019-10-23 |
US20160056787A1 (en) | 2016-02-25 |
WO2014160548A1 (en) | 2014-10-02 |
EP2979359A1 (en) | 2016-02-03 |
ES2630398T3 (en) | 2017-08-21 |
JP2017073811A (en) | 2017-04-13 |
US20170230024A1 (en) | 2017-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104079247B (en) | Equalizer controller and controlling method and audio reproducing system | |
CN104080024B (en) | Volume leveler controller and controlling method and audio classifiers | |
US10803879B2 (en) | Apparatuses and methods for audio classifying and processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||