CN107993671A - Sound processing method, device and electronic equipment - Google Patents
- Publication number
- CN107993671A (application no. CN201711258117.XA)
- Authority
- CN
- China
- Prior art keywords
- sound
- signal
- source
- initial
- adaptive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Disclosed are a sound processing method, a sound processing device, an electronic device and a computer-readable storage medium. The method includes: determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; selecting pre-processing filter coefficients based on the sound pre-processing direction; performing pre-processing filtering on the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal-source signal and an initial noise-source signal; determining adaptive filter coefficients; and performing adaptive filtering on the initial signal-source signal and the initial noise-source signal using the adaptive filter coefficients to obtain an enhanced signal-source signal. The signal-source signal can thus be enhanced, improving sound quality.
Description
Technical field
This application relates to the field of sound processing, and more specifically to a sound processing method, a sound processing device, an electronic device and a computer-readable storage medium.
Background art
With the spread of electronic devices, more and more of them offer voice control to make them more convenient to operate. For example, electronic devices such as smartphones and in-vehicle units provide a voice control function that lets the user control the device by voice to perform corresponding functions. The device therefore needs to recognize the user's speech, infer the user's actual intention, and control the corresponding functional component to perform the function the user wants.
However, whether a smartphone is used at home or an in-vehicle unit is used in a car, speech recognition is easily disturbed by the external environment; external noise in particular has a large impact on recognition.
Existing sound processing methods therefore suffer from poor sound quality and low recognition rates.
Summary of the invention
The present application is proposed to solve the above technical problem. Embodiments of the application provide a sound processing method, a sound processing device, an electronic device and a computer-readable storage medium that can improve sound quality and thereby improve the speech recognition rate.
According to one aspect of the application, a sound processing method is provided, including: determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; selecting pre-processing filter coefficients based on the sound pre-processing direction; performing pre-processing filtering on the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal-source signal and an initial noise-source signal; determining adaptive filter coefficients; and performing adaptive filtering on the initial signal-source signal and the initial noise-source signal using the adaptive filter coefficients to obtain an enhanced signal-source signal.
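The summary does not state what form the pre-processing filter coefficients take. A minimal sketch, assuming a uniform linear array, a far-field source, and a frequency-domain delay-and-sum beamformer: a bank of weights is precomputed for a grid of angles (the S220-style selection picks the entry nearest the pre-processing direction), the beam output plays the role of the initial signal-source signal, and the residual after projecting out the look-direction component plays the role of a noise reference. The helper names, the 343 m/s speed of sound and the projection-style noise reference are illustrative assumptions, not the patent's design.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def steering_weights(num_mics, spacing, angle_deg, freqs, c=SPEED_OF_SOUND):
    """Delay-and-sum weights, shape (num_freqs, num_mics), for a uniform linear array.

    Hypothetical helper. Angles use a 0-180 degree convention (90 = broadside);
    mic m sits at x = m * spacing, so a far-field wave from angle_deg reaches it
    m * spacing * cos(angle) / c seconds earlier than mic 0.
    """
    positions = np.arange(num_mics) * spacing
    lead = positions * np.cos(np.deg2rad(angle_deg)) / c
    steering = np.exp(2j * np.pi * np.outer(freqs, lead))  # unit-modulus entries
    return steering / num_mics


def select_weights(bank, direction_deg):
    """Selection sketch: pick the precomputed weights whose angle is nearest."""
    nearest = min(bank, key=lambda angle: abs(angle - direction_deg))
    return bank[nearest]


def prefilter(frame_fft, weights):
    """Filtering sketch for one STFT frame of shape (num_freqs, num_mics).

    Returns (signal_beam, noise_ref): the delay-and-sum output per frequency bin,
    and the multichannel residual with the look-direction component removed.
    """
    num_mics = weights.shape[1]
    signal_beam = np.sum(np.conj(weights) * frame_fft, axis=1)   # w^H x per bin
    steering = weights * num_mics                                # a = M * w
    noise_ref = frame_fft - signal_beam[:, None] * steering      # (I - a a^H / M) x
    return signal_beam, noise_ref
```

A component arriving exactly from the selected direction passes through the beam unchanged and vanishes from the noise reference, while sound from other directions leaks into the reference. How the multichannel reference would be reduced to the single initial noise-source signal is not shown.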
According to another aspect of the application, a sound processing device is provided, including: a sound pre-processing direction determining unit, configured to determine a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; a pre-processing filter coefficient selecting unit, configured to select pre-processing filter coefficients based on the sound pre-processing direction; a pre-processing filtering unit, configured to perform pre-processing filtering on the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal-source signal and an initial noise-source signal; an adaptive filter coefficient determining unit, configured to determine adaptive filter coefficients; and an adaptive filtering unit, configured to perform adaptive filtering on the initial signal-source signal and the initial noise-source signal using the adaptive filter coefficients to obtain an enhanced signal-source signal.
According to yet another aspect of the application, an electronic device is provided, including: a processor; and a memory storing computer program instructions which, when run by the processor, cause the processor to perform the sound processing method described above.
According to a further aspect of the application, a computer-readable storage medium is provided, on which computer program instructions are stored which, when run by a processor, cause the processor to perform the sound processing method described above.
Compared with the prior art, the sound processing method, sound processing device, electronic device and computer-readable storage medium according to embodiments of the application can determine a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; select pre-processing filter coefficients based on the sound pre-processing direction; perform pre-processing filtering on the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal-source signal and an initial noise-source signal; determine adaptive filter coefficients; and perform adaptive filtering on the initial signal-source signal and the initial noise-source signal using the adaptive filter coefficients to obtain an enhanced signal-source signal. The signal-source signal can therefore be enhanced based on the sound pre-processing direction, improving sound quality and, in turn, the accuracy of speech recognition.
Brief description of the drawings
The above and other objects, features and advantages of the application will become more apparent from the following more detailed description of its embodiments in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments, form part of the specification, and serve together with the embodiments to explain the application; they do not limit it. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a schematic diagram of an application scenario of the sound processing method according to an embodiment of the application;
Fig. 2 is a flowchart of the sound processing method according to an embodiment of the application;
Fig. 3 is a flowchart of determining the sound pre-processing direction in the sound processing method according to an embodiment of the application;
Fig. 4 is a flowchart of determining at least one of a sound enhancement direction and a sound suppression direction in the sound processing method according to an embodiment of the application;
Fig. 5 is a flowchart of updating the adaptive filter coefficients in the sound processing method according to an embodiment of the application;
Fig. 6 is a flowchart of detecting the user's mouth movement in the sound processing method according to an embodiment of the application;
Fig. 7 is a block diagram of the sound processing device according to an embodiment of the application;
Fig. 8 is a block diagram of the electronic device according to an embodiment of the application.
Detailed description of the embodiments
Example embodiments of the application are described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the application, and it should be understood that the application is not limited by the example embodiments described herein.
Overview of the application
As noted above, in both home and in-vehicle environments, speech recognition is easily disturbed by the external environment, and external noise in particular has a large impact on it. For example, external noise may come from directional sound-producing objects such as other people or a television set.
An existing solution improves speech quality by means of speech enhancement, which in turn raises the speech recognition rate. Speech enhancement can be divided into single-channel and multi-channel speech enhancement.
Single-channel speech enhancement algorithms have difficulty handling directional, non-stationary interference, such as interference from a television set. Moreover, single-channel noise reduction tends to degrade sound quality while reducing noise, and this loss of quality in turn lowers the speech recognition rate.
Multi-channel speech enhancement generally uses beamforming and blind source separation. Beamforming mostly requires the direction information to be known in advance; when it relies on speech signal processing alone and the interference energy is large, for example when the signal-to-interference energy ratio is below 0 dB, sound source localization is very inaccurate. Blind source separation faces the problem of channel selection, which is still hard to do accurately in 0 dB scenarios; moreover, when the number of interfering sources exceeds the number of channels, blind source separation also struggles to give good results.
To address these problems, the basic idea of the application is a sound processing method, sound processing device, electronic device and computer-readable storage medium that determine a sound pre-processing direction from multiple sound signals collected by a microphone array and an image signal collected by a camera, and filter the multiple sound signals based on that direction to obtain an enhanced signal-source signal. The signal of the signal source of interest can thus be enhanced, improving sound quality and, in turn, recognition accuracy.
Those skilled in the art will understand that the sound processing method according to the embodiments can be applied to processing sound from various sound-producing objects in various environments, including the home and in-vehicle environments described above; the embodiments are not limited in this respect.
Having described the basic principle of the application, various non-limiting embodiments are introduced below with reference to the drawings.
Exemplary system
Fig. 1 is a schematic diagram of an application scenario of the sound processing method according to an embodiment of the application.
As shown in Fig. 1, the application scenario includes a processing device 110 and one or more sound sources, for example a first sound source 120 and a second sound source 130.
The processing device 110 may be any kind of electronic device that receives sound and performs speech recognition. It includes a sound collector 111, such as a microphone array, and an image collector 112, such as a camera.
For example, in a home environment the processing device 110 may be a smartphone that receives the user's voice input and performs the corresponding function. Alternatively, the processing device 110 may be an in-vehicle unit. Besides sound signals from a signal source of interest (for example, user speech), the processing device 110 may also receive other types of sound signals, for example signals from noise sources that are not of interest.
The sound collector 111 collects audio signals from sound sources such as signal sources and noise sources, and may be a microphone array. A microphone array is a system composed of a certain number of microphones that samples and processes the spatial characteristics of a sound field; it may include multiple microphones MIC1 to MICn whose pickup areas are not exactly the same, where n is a natural number greater than or equal to 2. Depending on the relative positions of the microphones, microphone arrays can be divided into: linear arrays, whose element centers lie on one straight line; planar arrays, whose element centers are distributed on a plane; and spatial arrays, whose element centers are distributed in three-dimensional space.
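The array geometry determines how the arrival time of a far-field wavefront differs from one microphone to the next, which is what direction estimation exploits. A minimal sketch, assuming free-field plane-wave propagation at 343 m/s and microphone coordinates in metres; the function name and angle conventions are illustrative, and the same formula covers linear, planar and spatial arrays.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def far_field_delays(mic_positions, azimuth_deg, elevation_deg=0.0, c=SPEED_OF_SOUND):
    """Relative arrival delays (seconds) of a far-field plane wave at each microphone.

    mic_positions: (n, 3) microphone coordinates in metres (any array geometry).
    Azimuth is measured in the x-y plane from the +x axis, elevation from that plane.
    Delays are relative to the coordinate origin; a negative value means the
    wavefront reaches that microphone before it reaches the origin.
    """
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    # Unit vector pointing from the array towards the (distant) source
    toward_source = np.array([np.cos(el) * np.cos(az),
                              np.cos(el) * np.sin(az),
                              np.sin(el)])
    positions = np.asarray(mic_positions, dtype=float)
    return -(positions @ toward_source) / c
```

For a linear array laid along the x axis, a source at azimuth θ and its mirror image at -θ produce identical delays, which is why linear arrays are usually described over a 0 to 180 degree range.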
The image collector 112 captures image signals of the monitored scene and may include one or more cameras. The image data collected by the camera may be a continuous sequence of image frames (i.e., a video stream) or a discrete sequence of image frames (i.e., a set of images sampled at predetermined sampling time points). The camera may be, for example, a monocular, binocular or multi-ocular camera, and may capture grayscale images or color images carrying color information. Of course, any other type of camera known in the art or developed in the future may be applied to the application, and the way images are captured is not particularly limited as long as grayscale or color information of the input image can be obtained. To reduce the amount of computation in subsequent operations, in one embodiment the color image may be converted to grayscale before analysis and processing. In another embodiment, to retain more information, the color image may be analyzed and processed directly.
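Converting a three-channel color frame to grayscale cuts the data per pixel to one third before detection. A minimal sketch with NumPy, using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); the patent does not say which conversion it uses, so these weights are an assumption.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) luma image.

    The BT.601 weights are an assumed choice; any weighted channel sum
    that keeps the information the detector needs would also do.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    weights = np.array([0.299, 0.587, 0.114])  # R, G, B; they sum to 1.0
    return rgb @ weights                       # weighted sum over the last axis
```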
The first sound source 120 and the second sound source 130 may be any kind of sound source, including signal sources that emit signal components of interest and noise sources that emit noise components to be eliminated. A sound source may be animate or inanimate: animate sound sources include humans and animals, while inanimate ones include robots, television sets, speakers and the like.
It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principles of the application, and the embodiments are not limited to it. On the contrary, the embodiments may be applied to any applicable scenario; for example, there may be more or fewer sound sources.
Exemplary method
Fig. 2 is a flowchart of the sound processing method according to an embodiment of the application.
As shown in Fig. 2, the sound processing method according to an embodiment of the application includes: S210, determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; S220, selecting pre-processing filter coefficients based on the sound pre-processing direction; S230, performing pre-processing filtering on the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal-source signal and an initial noise-source signal; S240, determining adaptive filter coefficients; and S250, performing adaptive filtering on the initial signal-source signal and the initial noise-source signal using the adaptive filter coefficients to obtain an enhanced signal-source signal.
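This part of the description does not specify how the adaptive filter coefficients of S240 are computed or updated (Fig. 5 covers the update). A common choice for this kind of structure, assumed here rather than taken from the patent, is a normalized LMS (NLMS) noise canceller: the initial noise-source signal is filtered to estimate the noise that leaked into the initial signal-source signal, the estimate is subtracted, and the coefficients are updated sample by sample. The function name, tap count and step size are illustrative.

```python
import numpy as np


def nlms_cancel(primary, reference, num_taps=16, mu=0.05, eps=1e-8):
    """Adaptive noise cancellation with a normalized LMS filter (illustrative).

    primary:   initial signal-source signal (target plus leaked noise)
    reference: initial noise-source signal (correlated with the leaked noise)
    Returns (enhanced, weights): the error signal, used as the enhanced
    signal-source estimate, and the final filter coefficients.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(num_taps)
    taps = np.zeros(num_taps)            # recent reference samples, newest first
    enhanced = np.empty_like(primary)
    for n in range(len(primary)):
        taps = np.roll(taps, 1)
        taps[0] = reference[n]
        noise_estimate = weights @ taps
        error = primary[n] - noise_estimate
        enhanced[n] = error
        # Normalized update: step size divided by the power in the tap window
        weights += (mu / (eps + taps @ taps)) * error * taps
    return enhanced, weights
```

A smaller `mu` converges more slowly but leaves less residual error once the filter has settled.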
In one example, in the sound processing method according to an embodiment of the application, step S210 of determining the sound pre-processing direction according to the multiple sound signals collected by the microphone array and the image signal collected by the camera may include: determining audio source directions according to the multiple sound signals; determining image source directions according to the image signal; and determining, based on the audio source directions and the image source directions, at least one of a sound enhancement direction and a sound suppression direction as the sound pre-processing direction.
In step S210, for example, sound signals from the first sound source 120 and the second sound source 130 may be collected by the sound collector 111 (e.g., a microphone array), and an image signal may be collected by the image collector 112 (e.g., a camera). The sound collector 111 collects the sound of the current environment, which includes, for example, the sound signal of interest (e.g., user speech) and corresponding interference signals (e.g., from a television set or radio). The sound collector 111 includes, but is not limited to, an analog microphone array and a corresponding analog-to-digital converter. The sound pre-processing direction can then be determined based on the collected sound signals and image signal.
How the sound pre-processing direction is determined based on the collected sound signals and image signal is described in detail below with reference to Fig. 3.
Fig. 3 is a flowchart of determining the sound pre-processing direction in the sound processing method according to an embodiment of the application.
As shown in Fig. 3, in the sound processing method according to an embodiment of the application, step S210 may include: S310, collecting the multiple sound signals; S320, performing sound source localization on the multiple sound signals to determine the audio source directions; S330, collecting the image signal; S340, recognizing the image signal; S350, performing an image-based direction decision to determine the image source directions; and S360, determining at least one of a sound enhancement direction and a sound suppression direction based on the audio source directions and the image source directions, as the sound pre-processing direction.
In step S310, the sound collector 111 collects multiple sound signals, including, for example, the sound signal of interest (e.g., user speech) and corresponding interference signals (e.g., from a television set or radio).
In step S320, sound source localization is performed on the sound signals collected by the sound collector 111. The localization is not limited to a single sound source; multiple sources can be localized, yielding the direction of each.
For example, multi-source localization may use the MUSIC (MUltiple SIgnal Classification) algorithm. The algorithm first computes the covariance matrix of the received multichannel signals and performs an eigenvalue decomposition on it, then sorts the eigenvalues by magnitude to find the eigenvectors corresponding to noise, and finally forms a spatial spectrum from the noise eigenvectors and the steering vectors of candidate directions prepared in advance; the directions of the spectral peaks are the source directions. Sound source localization finally outputs M audio source directions, denoted d1(i), 0 < i ≤ M.
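The MUSIC steps above can be sketched for the simplest case: a narrowband signal model, a uniform linear array with half-wavelength spacing, and a 0 to 180 degree search grid. These are assumptions for illustration; for broadband speech the computation is usually repeated per STFT frequency bin and the spectra combined, which is not shown. The function name and parameters are hypothetical.

```python
import numpy as np


def music_doa(snapshots, num_sources, spacing_wavelengths=0.5, grid_deg=None):
    """Estimate source angles (degrees, 90 = broadside) with narrowband MUSIC.

    snapshots: complex array (num_mics, num_snapshots) from a uniform linear array.
    Returns (sorted angle estimates, spatial spectrum over grid_deg).
    """
    if grid_deg is None:
        grid_deg = np.arange(0.0, 180.5, 0.5)
    num_mics, num_snapshots = snapshots.shape
    # 1. Sample covariance matrix of the multichannel signal
    cov = snapshots @ snapshots.conj().T / num_snapshots
    # 2. Eigendecomposition; eigh returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(cov)
    # 3. Noise subspace: eigenvectors of the (num_mics - num_sources) smallest eigenvalues
    noise_subspace = eigvecs[:, : num_mics - num_sources]
    # 4. Steering vectors for every candidate direction on the grid
    mic_index = np.arange(num_mics)[:, None]
    steering = np.exp(2j * np.pi * spacing_wavelengths * mic_index
                      * np.cos(np.deg2rad(grid_deg))[None, :])
    # 5. Spatial spectrum 1 / ||E_n^H a(theta)||^2, which peaks at source directions
    projection = noise_subspace.conj().T @ steering
    spectrum = 1.0 / np.sum(np.abs(projection) ** 2, axis=0)
    # 6. Keep the num_sources highest local maxima as the estimates d1(i)
    peaks = [i for i in range(1, len(grid_deg) - 1)
             if spectrum[i] >= spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    peaks = sorted(peaks, key=lambda i: spectrum[i], reverse=True)[:num_sources]
    return np.sort(grid_deg[peaks]), spectrum
```

The number of sources must be known or estimated beforehand (for example from the eigenvalue spread), since it fixes the size of the noise subspace.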
In step S330, an image is collected by the image collector 112.
In step S340, the collected image is recognized to output potential sound sources, such as a user, a television set or a radio. Assuming there are N potential sound sources, they are denoted s(j), 0 < j ≤ N.
In step S350, an image-based direction decision is performed to determine the direction of each potential sound source recognized in the image; for example, the user (face) is at 90 degrees and the television set at 45 degrees. Finally, N image-based potential source angles are output, denoted d2(j), 0 < j ≤ N.
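This section does not say how an image position is converted into an angle. A minimal sketch, assuming an ideal pinhole camera with a known horizontal field of view, mounted at the array with its optical axis along the array broadside (90 degrees in the convention used above), and with image x increasing toward the array's 0 degree side; all three are assumptions, and the function name is hypothetical.

```python
import math


def pixel_to_azimuth(x_pixel, image_width, hfov_deg):
    """Map the horizontal pixel coordinate of a detection (e.g. a face box centre)
    to an azimuth in degrees, 90 = straight ahead.

    Assumes an ideal pinhole camera whose optical axis points along the array
    broadside and whose image x axis increases toward the 0 degree side.
    """
    cx = (image_width - 1) / 2.0                                      # principal point
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    offset_deg = math.degrees(math.atan((x_pixel - cx) / focal_px))   # angle off axis
    return 90.0 - offset_deg
```

Applied to each recognized source s(j), this gives angles in the same convention as the audio directions d1(i), so the two sets can be compared directly in step S360.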
Although steps S310 and S320 are described above before steps S330 to S350, in practice steps S330 to S350 may be performed first, followed by steps S310 and S320, or the two groups of steps may be performed in parallel.
In step S360, at least one of a sound enhancement direction and a sound suppression direction is determined from the image-based direction information and the sound source localization result, as the sound pre-processing direction. That is, in the sound processing method according to embodiments of the application, only the signal of the signal source may be enhanced, only the signal of the noise source may be suppressed, or the signal of the signal source may be enhanced while the signal of the noise source is suppressed.
The following describes specifically how step S360 determines at least one of the sound enhancement direction and the sound suppression direction, as the sound pre-processing direction, based on the audio source directions and the image source directions.
In one example, in the sound processing method according to an embodiment of the application, step S360 may include: determining whether the image source directions include at least one image signal-source direction associated with a signal source; and, in response to determining that they do, determining the at least one image signal-source direction as the sound enhancement direction.
Step S360 may further include determining the audio source directions other than the sound enhancement direction as sound suppression directions.
For example, assume the scenario contains three sound sources. In step S320, sound-signal-based localization finds the first source at 0 degrees, the second at 50 degrees and the third at 100 degrees, so the audio source directions are 0, 50 and 100 degrees. Further assume that in step S350, image-based localization finds the user at 0 degrees, the television set at 45 degrees and the radio at 120 degrees, so the image source directions are 0, 45 and 120 degrees. It is then determined whether the image source directions include at least one image signal-source direction associated with a signal source (e.g., the user). Since they include the direction associated with the user, namely 0 degrees, 0 degrees can be taken directly as the sound enhancement direction. The audio source directions other than the enhancement direction, namely 50 and 100 degrees, are then determined as sound suppression directions.
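The rule in this example can be sketched as follows. The label set that marks a signal source and the 10 degree tolerance used to decide whether an audio direction coincides with an enhancement direction are assumptions (in the example above the two match exactly at 0 degrees), and the function name is hypothetical.

```python
def split_directions(audio_dirs, image_sources, signal_labels=("user",), tol_deg=10.0):
    """Return (enhance, suppress) direction lists in degrees.

    image_sources: (label, angle_deg) pairs from image recognition (S340/S350).
    Image directions whose label marks a signal source become enhancement
    directions; audio directions not within tol_deg of any of them become
    suppression directions.
    """
    enhance = [angle for label, angle in image_sources if label in signal_labels]
    suppress = [d for d in audio_dirs
                if all(abs(d - e) > tol_deg for e in enhance)]
    return enhance, suppress


# The worked example from the text
enhance, suppress = split_directions([0, 50, 100], [("user", 0), ("tv", 45), ("radio", 120)])
print(enhance, suppress)  # prints [0] [50, 100]
```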
In an alternative example, in the sound processing method according to an embodiment of the application, step S360 may include: determining whether the image source directions include at least one image signal-source direction associated with a signal source; and, in response to determining that they do, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the audio source directions and the at least one image signal-source direction.
This alternative example is described below with reference to Fig. 4.
Fig. 4 is a flowchart of determining at least one of the sound enhancement direction and the sound suppression direction in the sound processing method according to an embodiment of the application.
As shown in Fig. 4, in this alternative example step S360 may include: S361, determining whether the image source directions include an image signal-source direction associated with a signal source; if not, proceeding to S362, and if so, proceeding to S363; S362, determining that there is no sound enhancement direction; S363, determining whether the image source directions include multiple image signal-source directions associated with signal sources; if not, proceeding to S364, and if so, proceeding to S365; S364, determining the single image signal-source direction as the sound enhancement direction; and S365, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the audio source directions and the image signal-source directions.
For example, when the signal source is a user, the first check is whether the potential sound sources detected from the image signal include a face. If they include no face, there is currently no sound enhancement direction. If they include a face, the next check is whether they include more than one. If there is only one face, the angle of that face is output as the sound enhancement direction. If there are several faces, the final sound enhancement direction is determined from two sets of angles together: the sound source angles localized from the multiple sound signals and the sound source angles localized from the image signal.
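The decision flow of Fig. 4 and the face-counting example above can be sketched in Python as follows. This is an illustrative sketch, not the claimed implementation. The function names are hypothetical, angles are assumed to be in degrees, and face directions are assumed to be already extracted from the image signal. The joint step S365 is simplified to the minimum-sine-difference pairing and returns the midpoint of the best pair.

```python
import math
from typing import Optional, Sequence


def _sine_gap(a_deg: float, b_deg: float) -> float:
    """|sin(a) - sin(b)| for two angles given in degrees."""
    return abs(math.sin(math.radians(a_deg)) - math.sin(math.radians(b_deg)))


def enhancement_direction(audio_dirs: Sequence[float],
                          face_dirs: Sequence[float]) -> Optional[float]:
    """Return the sound enhancement direction in degrees, or None if there is none."""
    # S361 -> S362: no face (signal source) in the image, so no enhancement direction.
    if not face_dirs:
        return None
    # S363 -> S364: exactly one face, so its angle is the enhancement direction.
    if len(face_dirs) == 1:
        return float(face_dirs[0])
    # S365 (simplified): pair every audio direction with every face direction,
    # keep the closest pair, and return the midpoint of that pair.
    audio_deg, face_deg = min(
        ((a, f) for a in audio_dirs for f in face_dirs),
        key=lambda pair: _sine_gap(*pair),
    )
    return (audio_deg + face_deg) / 2.0


print(enhancement_direction([0.0, 50.0, 100.0], []))           # None
print(enhancement_direction([0.0, 50.0, 100.0], [30.0]))       # 30.0
print(enhancement_direction([0.0, 50.0, 100.0], [0.0, 90.0]))  # 0.0
```

Returning the midpoint is only one of the options the text allows. The candidate audio angle or the candidate face angle could be returned instead.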
In one example, in the sound processing method according to an embodiment of the present application, jointly determining at least one of the sound enhancement direction and the sound suppression direction may include three steps:

- determine a first difference between each sound-based source direction and each of the at least one image signal source direction;
- find the candidate sound-based source direction and candidate image-based source direction whose first difference is smallest;
- determine the sound enhancement direction from that candidate pair.
For example, determining the sound enhancement direction from the candidate pair may mean taking one of the following as the sound enhancement direction: the candidate sound-based source direction, the candidate image-based source direction, or the midpoint between the two.
Specifically, as described above, assume that sound-signal-based sound source localization in step S320 finds M sound source directions, denoted $d_1(i)$, $0 < i \le M$. Further assume that image-signal-based sound source localization in step S350 finds N sound source directions, denoted $d_2(j)$, $0 < j \le N$. Of these, $N_f$ are signal sources (for example, faces), whose directions are denoted $d_f(j)$, $0 < j \le N_f \le N$. The first difference $c_1(i,j)$ between $d_1$ and $d_f$ is then computed, for example as

$$c_1(i,j) = \left|\sin d_1(i) - \sin d_f(j)\right|.$$

The pair $(i, j)$ that minimizes $c_1(i,j)$ identifies the closest pair of directions $d_1(i)$ and $d_f(j)$. The enhancement direction can then be either of the two, the angle midway between them, or an angle obtained with some weighting coefficient. Those skilled in the art will understand that the difference is not limited to this measure. It may also be computed from the angular interval between the two directions: the smaller the interval, the smaller the difference and the more similar the two directions.
For example, assume the application scene contains three sound sources. Sound-signal-based localization in step S320 places the first sound source at 0°, the second at 50°, and the third at 100°. The sound-based source directions are therefore 0°, 50°, and 100°, denoted $d_1(1)$, $d_1(2)$, and $d_1(3)$.

Further assume that image-signal-based localization in step S350 places user 1 at 0°, a television at 45°, user 2 at 90°, and a radio at 120°. The image-based source directions are therefore 0°, 45°, 90°, and 120°. User 1 and user 2 (signal source 1 and signal source 2) are two signal sources, with directions denoted $d_f(1)$ and $d_f(2)$, so the first-difference test is needed.

Evaluating the first difference gives $c_1(1,1) = 0$, which is the minimum, so 0° can be taken directly as the sound enhancement direction.
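The first-difference computation and the worked example above can be sketched as follows. The helper names and the `mode`/`weight` parameters are hypothetical. The sketch assumes angles in degrees and uses the sine-based measure $c_1(i,j)$ from the text. Indices in the code are 0-based, so `i = j = 0` corresponds to $d_1(1)$ and $d_f(1)$.

```python
import math
from typing import Sequence, Tuple


def first_difference(a_deg: float, f_deg: float) -> float:
    """c1(i, j) = |sin(d1(i)) - sin(df(j))| with angles in degrees."""
    return abs(math.sin(math.radians(a_deg)) - math.sin(math.radians(f_deg)))


def candidate_pair(d1: Sequence[float], df: Sequence[float]) -> Tuple[int, int, float]:
    """Return (i, j, c1) for the audio/face pair with the smallest first difference."""
    return min(
        ((i, j, first_difference(a, f)) for i, a in enumerate(d1) for j, f in enumerate(df)),
        key=lambda t: t[2],
    )


def enhancement_from_candidates(audio_deg: float, image_deg: float,
                                mode: str = "midpoint", weight: float = 0.5) -> float:
    """Combine the candidate pair into one sound enhancement direction."""
    if mode == "audio":
        return audio_deg
    if mode == "image":
        return image_deg
    if mode == "midpoint":
        return (audio_deg + image_deg) / 2.0
    if mode == "weighted":  # hypothetical weight applied to the audio estimate
        return weight * audio_deg + (1.0 - weight) * image_deg
    raise ValueError(f"unknown mode: {mode}")


d1 = [0.0, 50.0, 100.0]  # sound-based source directions from step S320
df = [0.0, 90.0]         # face directions (user 1, user 2) from step S350
i, j, c1 = candidate_pair(d1, df)
print(i, j, c1)                                    # 0 0 0.0
print(enhancement_from_candidates(d1[i], df[j]))   # 0.0
```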
In addition, in one example, in the sound processing method according to an embodiment of the present application, jointly determining the sound enhancement and suppression directions may further include three steps:

- determine a second difference between a sound-based source direction other than the sound enhancement direction and an image signal source direction other than the sound enhancement direction;
- determine whether the second difference is less than a predetermined similarity threshold;
- if it is, take the corresponding sound-based source direction (other than the sound enhancement direction) as the sound suppression direction.
Specifically, as described above, assume that sound-signal-based localization in step S320 finds M sound source directions $d_1(i)$, $0 < i \le M$. Besides the sound enhancement direction, these include $N_{R1}$ other directions, denoted $d_{r1}(i)$, $0 < i \le N_{R1} \le M$. Likewise, image-signal-based localization in step S350 finds N sound source directions $d_2(j)$, $0 < j \le N$. Besides the sound enhancement direction, these include $N_{R2}$ other directions, denoted $d_{r2}(j)$, $0 < j \le N_{R2} \le N$.

The suppression direction, which may be a single direction or several, is determined jointly from $d_{r1}(i)$ and $d_{r2}(j)$. The second difference $c_2(i,j)$ between $d_{r1}(i)$ and $d_{r2}(j)$ is computed, for example as

$$c_2(i,j) = \left|\sin d_{r1}(i) - \sin d_{r2}(j)\right|.$$

When $c_2(i,j)$ is less than a certain threshold, the corresponding $d_{r1}(i)$ is a suppression direction. Again, those skilled in the art will understand that the difference is not limited to this measure. It may also be computed from the angular interval between the two directions: the smaller the interval, the smaller the difference and the more similar the two directions.
Taking the same example, the sound-based source directions are 0°, 50°, and 100°, and the sound enhancement direction is 0°. Removing 0° leaves 50° and 100°, denoted $d_{r1}(1)$ and $d_{r1}(2)$. The image-based source directions are 0°, 45°, 90°, and 120°. Removing 0° leaves 45°, 90°, and 120°, denoted $d_{r2}(1)$, $d_{r2}(2)$, and $d_{r2}(3)$.

Assume the threshold is 10° and the difference is measured as the angular interval. Then $c_2(1,1) = 5^\circ$, which is below the 10° threshold, so 50° can, for example, be taken as the sound suppression direction. By contrast, 100° is 10° from 90° and 20° from 120°, so it does not fall below the threshold. The present application is not limited to this choice: 45°, or 47.5°, may also be taken as the sound suppression direction.
Referring back to Fig. 2, in step S220, preprocessing filter coefficients are selected based on the sound preprocessing direction.
In one example, in the sound processing method according to an embodiment of the present application, step S220 may include:

- designing, in advance, enhancement filter coefficients and suppression filter coefficients for different angles;
- selecting the enhancement filter coefficients that match the sound enhancement direction and the suppression filter coefficients that match the sound suppression direction.
Specifically, the enhancement and suppression filter coefficients for different angles can be designed in advance according to the geometry of the microphone array system, for example by the least squares method. Once precomputed, the coefficients can be stored on a storage medium and read at system initialization, or embedded in the program in advance. At run time, the coefficients matching the current sound enhancement direction and sound suppression direction are selected.
Therefore, in one example, in the sound processing method according to an embodiment of the present application, the enhancement and suppression filter coefficients for different angles are designed in advance based on the geometry of the microphone array.
In step S230, the multiple sound signals are filtered with the preprocessing filter coefficients to obtain an initial signal source signal and an initial noise source signal.
In one example, in the sound processing method according to an embodiment of the present application, step S230 may include applying enhancement filtering and suppression filtering to the multiple sound signals. Enhancement filtering uses the enhancement filter coefficients and yields the initial signal source signal. Suppression filtering uses the suppression filter coefficients and yields the initial noise source signal.
Specifically, enhancement filtering of the multiple sound signals yields an initial signal source signal (for example, the desired speech signal). This signal mainly contains the signal source signal plus a small amount of the noise source signal. Suppression filtering yields an initial noise source signal (for example, the noise signal obtained by suppression filtering). This signal mainly contains the noise source signal plus a small amount of the signal source signal.
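Enhancement filtering and suppression filtering can both be viewed as filter-and-sum operations over the microphone channels. The sketch below assumes time-domain FIR coefficients, one filter per microphone. It uses a toy two-microphone case in which a sum filter keeps an in-phase target and a difference filter cancels it. The names and toy coefficients are illustrative and are not taken from the application.

```python
import numpy as np


def filter_and_sum(signals, coeffs):
    """y[n] = sum_m sum_k coeffs[m][k] * signals[m][n - k] (one causal FIR per mic)."""
    signals = np.asarray(signals, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    num_samples = signals.shape[1]
    out = np.zeros(num_samples)
    for channel, taps in zip(signals, coeffs):
        out += np.convolve(channel, taps)[:num_samples]  # keep the causal part
    return out


def preprocess(signals, enh_coeffs, sup_coeffs):
    """Return (d, x): the initial signal source signal and the initial noise source signal."""
    d = filter_and_sum(signals, enh_coeffs)  # enhancement filtering: target kept
    x = filter_and_sum(signals, sup_coeffs)  # suppression filtering: target cancelled
    return d, x


# Toy case: the target reaches both mics in phase; the noise lags one sample on mic 2.
rng = np.random.default_rng(0)
n = np.arange(1000)
target = np.sin(2 * np.pi * 0.01 * n)
noise = rng.standard_normal(1001)
mics = np.vstack([target + 0.3 * noise[1:], target + 0.3 * noise[:-1]])
d, x = preprocess(mics, enh_coeffs=[[0.5], [0.5]], sup_coeffs=[[0.5], [-0.5]])
print(np.allclose(x, 0.15 * (noise[1:] - noise[:-1])))  # True: no target left in x
```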
Next, in step S240, the adaptive filter coefficients are determined.
In the sound processing method according to an embodiment of the present application, the adaptive filter coefficients have initial values, and the subsequent operations can be performed directly with these initial adaptive filter coefficients.
Alternatively, to keep the adaptive filtering accurate during real-time sound processing, the initial adaptive filter coefficients may first be updated.
That is, in one example, in the sound processing method according to an embodiment of the present application, determining the adaptive filter coefficients includes:

- obtaining initial adaptive filter coefficients;
- updating them according to the initial signal source signal and the initial noise source signal.
Specifically, for example, the initial adaptive filter coefficients can be updated according to Formula 1:

$$W(n+1) = W(n) + \mu\, e(n)\, X(n) \qquad \text{(Formula 1)}$$

where:

- $W(n)$ is the initial adaptive filter coefficient vector;
- $W(n+1)$ is the updated coefficient vector;
- $\mu$ is a constant (the step size);
- $e(n)$ is the residual signal;
- $X(n)$ is the initial noise source signal, taken as the vector of its most recent samples.

The residual signal $e(n)$ can be expressed by Formula 2:

$$e(n) = d(n) - X^{T}(n)\, W(n) \qquad \text{(Formula 2)}$$

where $d(n)$ is the initial signal source signal.
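Formulas 1 and 2 are the standard least-mean-squares (LMS) update. A minimal sketch follows, assuming $X(n)$ holds the L most recent samples of the initial noise source signal. The function name, tap count, and step size are hypothetical choices. In this noise-canceller arrangement, the residual $e(n)$ also serves as the enhanced output.

```python
import numpy as np


def lms_cancel(d, x, num_taps=8, mu=0.01, w0=None):
    """LMS noise cancellation following Formulas 1 and 2.

    d: initial signal source signal d(n); x: initial noise source signal (reference).
    Returns the residual e(n) for every sample and the final coefficients W.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.zeros(num_taps) if w0 is None else np.array(w0, dtype=float)
    taps = np.zeros(num_taps)  # X(n) = [x(n), x(n-1), ..., x(n-L+1)]
    e = np.empty_like(d)
    for n in range(len(d)):
        taps = np.roll(taps, 1)
        taps[0] = x[n]
        e[n] = d[n] - taps @ w      # Formula 2: e(n) = d(n) - X^T(n) W(n)
        w = w + mu * e[n] * taps    # Formula 1: W(n+1) = W(n) + mu e(n) X(n)
    return e, w


# Simulated check: noise leaks into d(n) through a short hypothetical path.
rng = np.random.default_rng(1)
N = 20000
x = rng.standard_normal(N)                           # noise reference
leak = np.array([0.6, -0.3, 0.1])                    # hypothetical leakage path
s = 0.1 * np.sin(2 * np.pi * 0.005 * np.arange(N))  # desired component
d = s + np.convolve(x, leak)[:N]
e, w = lms_cancel(d, x, num_taps=8, mu=0.01)
print(np.round(w[:3], 2))  # should be close to the leakage taps [0.6, -0.3, 0.1]
```

A fixed $\mu$ matches Formula 1 as written. A normalized (NLMS) step size is a common variant when the reference level varies, but the text does not specify it.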
Preferably, to better capture the characteristics of the noise source signal, the adaptive filter coefficients are updated in any of three situations: when there is no signal source signal, when the signal source signal is weak, or when the noise source signal is strong. Updating under these conditions lets the coefficients better match the noise source signal.
Thus, for example, when the signal source is a user, the initial adaptive filter coefficients can be updated based on three inputs: the initial signal source signal, the initial noise source signal, and whether the user is speaking. Whether the user is speaking can be determined, for example, by mouth movement detection.
The process of updating the initial adaptive filter coefficients is described below with reference to Fig. 5, taking a user as the signal source.
Fig. 5 is a flowchart of updating the adaptive filter coefficients in the sound processing method according to an embodiment of the present application.
As shown in Fig. 5, in the sound processing method according to an embodiment of the present application, updating the adaptive filter coefficients includes the following steps:

- S510: determine, based on the sound-based and image-based source directions, whether a sound enhancement direction exists. If so, proceed to S520; otherwise, proceed to S550.
- S520: since a sound enhancement direction exists, perform mouth movement detection on the user.
- S530: determine whether mouth movement of the user is detected. If so, proceed to S540; otherwise, proceed to S550.
- S540: since mouth movement is detected, determine whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold. If so, proceed to S550; otherwise, do not update.
- S550: update the adaptive filter coefficients.
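The branching of Fig. 5 can be written as a small predicate. In this sketch the ratio is assumed to be a linear energy ratio computed over a frame; the application does not specify how the ratio is measured. The helper names and threshold values are hypothetical.

```python
import numpy as np


def power_ratio(d_frame, x_frame, eps=1e-12):
    """Energy ratio of an initial signal source frame to an initial noise source frame."""
    d_frame = np.asarray(d_frame, dtype=float)
    x_frame = np.asarray(x_frame, dtype=float)
    return float(np.sum(d_frame ** 2) / (np.sum(x_frame ** 2) + eps))


def should_update(has_enhancement_direction, mouth_moving, ratio, snr_threshold):
    """Decision flow of Fig. 5; True means proceed to S550 and update W."""
    if not has_enhancement_direction:  # S510: no enhancement direction -> S550
        return True
    if not mouth_moving:               # S520/S530: user not speaking -> S550
        return True
    return ratio < snr_threshold       # S540: update only while the target is weak


print(should_update(True, True, power_ratio([2.0, 2.0], [1.0, 1.0]), 2.0))  # False
```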
In step S510, whether a sound enhancement direction exists is determined based on the sound-based and image-based source directions. This result can be obtained, for example, from step S360 above.
Therefore, in one example, in the sound processing method according to the present application, the initial adaptive filter coefficients are updated (according to the initial signal source signal and the initial noise source signal) when the sound-based and image-based source directions indicate that there is no sound enhancement direction.
In step S520, mouth movement detection is performed on the user.
For example, whether the user's mouth is moving can be detected from the image signal captured by the camera.
Fig. 6 is a flowchart of mouth movement detection for the user in the sound processing method according to an embodiment of the present application.
As shown in Fig. 6, mouth movement detection in the sound processing method according to an embodiment of the present application includes:

- S610: once the image-based sound source is determined to be a user, capture multiple frames of image information in the direction of the user's face;
- S620: detect, from those frames, whether there is mouth movement.
In step S610, multiple frames are used because mouth movement is difficult to identify from a single frame. Video or multiple frames over a period of time can be recorded in the sound enhancement direction. That is, frames of the user's face direction are captured in real time or quasi-real time.
Then, in step S620, mouth movement is detected from the multiple frames. For example, each pair of adjacent frames is matched. If the mouth position shows no significant difference, there is probably no mouth movement; otherwise, there may be mouth movement. If the user's mouth is moving, the user is probably speaking.
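One way to implement the adjacent-frame comparison is sketched below. It assumes that a face landmark detector already provides the pixel coordinates of a few mouth landmarks per frame; that detector is not shown and is not specified in the application. The function name and the 2-pixel threshold are hypothetical.

```python
import numpy as np


def mouth_moving(mouth_points_per_frame, threshold_px=2.0):
    """Return True if any mouth landmark moves more than threshold_px between two
    consecutive frames. Each frame is a (K, 2) array of landmark pixel coordinates."""
    frames = [np.asarray(f, dtype=float) for f in mouth_points_per_frame]
    for prev, cur in zip(frames, frames[1:]):
        displacement = np.linalg.norm(cur - prev, axis=1)  # per-landmark motion in pixels
        if displacement.max() > threshold_px:
            return True
    return False


closed = [[10, 20], [30, 20], [20, 24]]  # left corner, right corner, lower lip
opened = [[10, 20], [30, 20], [20, 32]]  # lower lip dropped by 8 px
print(mouth_moving([closed] * 5))                      # False
print(mouth_moving([closed, opened, closed, opened]))  # True
```

Raw landmark displacement also responds to head motion. A practical variant might subtract each frame's mouth centroid before comparing frames.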
Returning to Fig. 5, in step S530, it is determined whether mouth movement of the user is detected. If no mouth movement is detected, the initial adaptive filter coefficients are updated.
Therefore, in one example, in the sound processing method according to an embodiment of the present application, updating the initial adaptive filter coefficients includes:

- when the signal source is a user, capturing multiple frames of image information in the direction of the user's face;
- detecting, from those frames, whether there is mouth movement;
- if there is no mouth movement, updating the initial adaptive filter coefficients.
In step S540, if mouth movement of the user is detected, whether to update the adaptive filter coefficients is decided from the initial signal source signal and the initial noise source signal. As described above, the coefficients are updated when the initial signal source signal is weak or the initial noise source signal is strong.
That is, in one example, in the sound processing method according to an embodiment of the present application, the mouth movement detection is followed by two further steps:

- if there is mouth movement, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold;
- if the ratio is below the threshold, updating the initial adaptive filter coefficients.
In step S550, the adaptive filter coefficients are updated. The update is the same as described above with reference to Formulas 1 and 2 and is not repeated here.
Finally, in step S250, the initial signal source signal and the initial noise source signal are adaptively filtered with the adaptive filter coefficients to obtain an enhanced signal source signal.
That is, step S250 further filters the initial signal source signal adaptively, using the initial noise source signal. This removes the small amount of noise source signal contained in the initial signal source signal and yields the enhanced signal source signal.
In one example, in the sound processing method according to an embodiment of the present application, step S250 includes adaptively filtering the initial signal source signal with the adaptive filter coefficients, using the initial noise source signal as the reference signal, to obtain the enhanced signal source signal.
It should also be noted that, in the sound processing method according to an embodiment of the present application, the adaptive filter coefficients may be updated before, after, or at the same time as the adaptive filtering that produces the enhanced signal source signal.
That is, in one example, in the sound processing method according to an embodiment of the present application, the method further includes, after the adaptive filtering: updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
Alternatively, in one example, in the sound processing method according to an embodiment of the present application, the method further includes, at the same time as the adaptive filtering: updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
As can be seen from the above, the sound processing method according to an embodiment of the present application proceeds as follows:

- determine a sound preprocessing direction from the multiple sound signals captured by the microphone array and the image signal captured by the camera;
- select preprocessing filter coefficients based on that direction;
- filter the multiple sound signals with those coefficients to obtain an initial signal source signal and an initial noise source signal;
- determine adaptive filter coefficients;
- adaptively filter the initial signal source signal and the initial noise source signal to obtain an enhanced signal source signal.

Because the signal source signal is enhanced along the sound preprocessing direction, the method offers a long detection distance, good noise robustness, and improved speech recognition accuracy.
Specifically, in the embodiments of the present application:

- the microphone array captures sound signals, from which the azimuths of multiple sounds are obtained;
- the camera captures an image signal, which is used to detect potential sound-producing objects (for example, a television, a person, a radio, or a loudspeaker) and to record the image azimuths of faces;
- the desired speech enhancement direction and noise suppression direction are obtained from the image azimuth information and the sound azimuth information;
- pre-designed enhancement and suppression filter coefficients are selected according to the enhancement and suppression directions;
- the sound signals are filtered with these coefficients to obtain the desired speech signal and the suppressed-noise signal;
- adaptive filtering is applied to the suppressed-noise signal and the desired speech signal;
- the adaptive filter coefficients are updated according to the image signal and the sound signals.
In this way, combining sound and image recognition brings three benefits:

- the azimuth of the signal source (for example, a person) can be located more accurately;
- filters designed for the desired directions suppress directional interference more effectively;
- mouth movement detection allows better filter updates.

Because image and speech are combined, sound source localization remains effective even when the signal-to-interference energy ratio is below 0 dB.
Exemplary Apparatus
Fig. 7 is a block diagram of the sound processing apparatus according to an embodiment of the present application.
As shown in Fig. 7, the sound processing apparatus 700 according to an embodiment of the present application includes the following units:

- a sound preprocessing direction determining unit 710, configured to determine a sound preprocessing direction from the multiple sound signals captured by the microphone array and the image signal captured by the camera;
- a preprocessing filter coefficient selecting unit 720, configured to select preprocessing filter coefficients based on the direction determined by unit 710;
- a preprocessing filter unit 730, configured to filter the multiple sound signals with the coefficients selected by unit 720, obtaining an initial signal source signal and an initial noise source signal;
- an adaptive filter coefficient determining unit 740, configured to determine adaptive filter coefficients;
- an adaptive filter unit 750, configured to adaptively filter the two signals from unit 730 with the coefficients from unit 740, obtaining an enhanced signal source signal.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determining unit 710 is configured to:

- determine sound-based source directions from the multiple sound signals;
- determine image-based source directions from the image signal;
- determine, from both sets of directions, at least one of a sound enhancement direction and a sound suppression direction as the sound preprocessing direction.
In one example, in the above sound processing apparatus 700, unit 710 determines the sound enhancement and suppression directions as follows:

- it determines whether the image-based source directions include at least one image signal source direction associated with a signal source;
- if they do, it takes the at least one image signal source direction as the sound enhancement direction.
In one example, in the above sound processing apparatus 700, unit 710 further takes the sound-based source directions other than the sound enhancement direction as the sound suppression direction.
In one example, in the above sound processing apparatus 700, unit 710 determines the sound enhancement and suppression directions as follows:

- it determines whether the image-based source directions include at least one image signal source direction associated with a signal source;
- if they do, it jointly determines at least one of the sound enhancement direction and the sound suppression direction from the sound-based source directions and the at least one image signal source direction.
In one example, in the above sound processing apparatus 700, unit 710 performs the joint determination as follows:

- it computes a first difference between each sound-based source direction and each of the at least one image signal source direction;
- it finds the candidate sound-based source direction and candidate image-based source direction whose first difference is smallest;
- it determines the sound enhancement direction from that candidate pair.
In one example, in the above sound processing apparatus 700, unit 710 takes one of the following as the sound enhancement direction: the candidate sound-based source direction, the candidate image-based source direction, or the midpoint between the two.
In one example, in the above sound processing apparatus 700, the joint determination by unit 710 further includes:

- computing a second difference between a sound-based source direction other than the sound enhancement direction and an image signal source direction other than the sound enhancement direction;
- determining whether the second difference is less than a predetermined similarity threshold;
- if it is, taking the corresponding sound-based source direction (other than the sound enhancement direction) as the sound suppression direction.
In one example, in the above sound processing apparatus 700, the preprocessing filter coefficient selecting unit 720 is configured to:

- design, in advance, enhancement and suppression filter coefficients for different angles;
- select the enhancement filter coefficients that match the sound enhancement direction and the suppression filter coefficients that match the sound suppression direction.
In one example, in the above sound processing apparatus 700, unit 720 designs the enhancement and suppression filter coefficients for different angles in advance, based on the geometry of the microphone array.
In one example, in the above sound processing apparatus 700, the preprocessing filter unit 730 is configured to apply enhancement filtering and suppression filtering to the multiple sound signals. Enhancement filtering uses the enhancement filter coefficients and yields the initial signal source signal. Suppression filtering uses the suppression filter coefficients and yields the initial noise source signal.
In one example, in the above sound processing apparatus 700, the adaptive filter coefficient determining unit 740 is configured to:

- obtain initial adaptive filter coefficients;
- update them according to the initial signal source signal and the initial noise source signal.
In one example, in the above sound processing apparatus 700, unit 740 updates the initial adaptive filter coefficients when the sound-based and image-based source directions indicate that there is no sound enhancement direction.
In one example, in the above sound processing apparatus 700, unit 740 updates the initial adaptive filter coefficients as follows:

- when the signal source is a user, it captures multiple frames of image information in the direction of the user's face;
- it detects, from those frames, whether there is mouth movement;
- if there is no mouth movement, it updates the initial adaptive filter coefficients.
In one example, in the above sound processing apparatus 700, unit 740 updates the initial adaptive filter coefficients as follows:

- if there is mouth movement, it determines whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold;
- if the ratio is below the threshold, it updates the initial adaptive filter coefficients.
In one example, in the above sound processing apparatus 700, the adaptive filtering unit 750 is configured to use the initial noise source signal as a reference signal and to perform adaptive filtering on the initial signal source signal with the adaptive filtering coefficient, to obtain the enhanced signal source signal.
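This structure matches a classic adaptive noise canceller: the noise source signal is the reference, and the signal source signal is the input to be cleaned. The patent does not name the adaptation rule. The sketch below assumes a normalized least-mean-squares (NLMS) update, a common choice for this structure. `nlms_cancel`, the tap count and the step size `mu` are illustrative assumptions. The filter estimates, from recent reference samples, the noise that has leaked into the primary channel and subtracts it. The residual is the enhanced output.

```python
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Adaptive noise canceller with an NLMS coefficient update (assumed rule).

    primary:   initial signal source signal (target plus leaked noise).
    reference: initial noise source signal used as the reference.
    Returns (enhanced_signal, final_coefficients).
    """
    w = [0.0] * taps      # adaptive filtering coefficients
    buf = [0.0] * taps    # most recent reference samples, newest first
    enhanced = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        noise_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - noise_estimate                  # enhanced output sample
        norm = eps + sum(bi * bi for bi in buf)
        # NLMS: step along the error gradient, normalised by reference power.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        enhanced.append(e)
    return enhanced, w


# Demo: the primary channel holds only noise leaked through a 2-tap path.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
primary = [0.8 * noise[n] + (0.3 * noise[n - 1] if n > 0 else 0.0)
           for n in range(len(noise))]
enhanced, w = nlms_cancel(primary, noise)
```

In the demo, the primary channel carries no target speech, so the first two coefficients converge towards the leak path (0.8, 0.3) and the residual approaches zero. With a real talker present, the residual would instead approximate the talker's speech.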
In one example, in the above sound processing apparatus 700, the adaptive filtering coefficient determination unit 740 updates the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal after the adaptive filtering unit 750 has performed adaptive filtering on these two signals using the adaptive filtering coefficient.
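The sketch below makes the "filter first, then update" ordering concrete by processing the signals in blocks. Each block is first filtered with the coefficients frozen at their current values. Only then are the coefficients updated from that same block, so block k is filtered with what was learned up to block k-1. The NLMS update, the block length and all names are assumptions for illustration. A per-sample loop, in which filtering and updating happen in the same step, corresponds more closely to the simultaneous variant.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filter_then_update(primary, reference, block=64, taps=4, mu=0.1, eps=1e-8):
    """Block-wise noise canceller: filter each block first, then adapt.

    Returns (enhanced_signal, final_coefficients). The NLMS update rule is an
    assumption; the patent only fixes the ordering of filtering and updating.
    """
    w = [0.0] * taps
    history = [0.0] * taps          # reference history carried across blocks
    enhanced = []
    for start in range(0, len(primary), block):
        d_blk = primary[start:start + block]
        x_blk = reference[start:start + block]
        # Build the tap-delay vectors (newest sample first) for this block.
        vectors = []
        h = history
        for x in x_blk:
            h = [x] + h[:-1]
            vectors.append(h)
        history = h
        # Step 1: filter the whole block with the coefficients frozen.
        enhanced.extend(d - _dot(w, v) for d, v in zip(d_blk, vectors))
        # Step 2: only afterwards, adapt the coefficients on the same block.
        for d, v in zip(d_blk, vectors):
            e = d - _dot(w, v)
            w = [wi + mu * e * vi / (eps + _dot(v, v)) for wi, vi in zip(w, v)]
    return enhanced, w


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
primary = [0.5 * noise[n] - (0.2 * noise[n - 1] if n > 0 else 0.0)
           for n in range(len(noise))]
enhanced, w = filter_then_update(primary, noise)
```

Because the coefficients start at zero and are only updated after filtering, the first block passes through unchanged. Later blocks benefit from the learned noise path.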
In addition, in one example, in the above sound processing apparatus 700, the adaptive filtering coefficient determination unit 740 updates the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal at the same time as the adaptive filtering unit 750 performs adaptive filtering on these two signals using the adaptive filtering coefficient.
Here, those skilled in the art will understand that the other details of the sound processing apparatus according to the embodiments of the present application are identical to the corresponding details of the sound processing method described above. To avoid redundancy, they are not repeated here.
As described above, the sound processing apparatus 700 according to the embodiments of the present application may be integrated into the processing equipment 110, or may be a stand-alone device independent of the processing equipment 110.
In one example, the sound processing apparatus 700 according to the embodiments of the present application may be integrated into the processing equipment 110 as a software module and/or a hardware module. For example, the sound processing apparatus 700 may be a software module in the operating system of the processing equipment 110, or an application program developed for the processing equipment 110. Of course, the sound processing apparatus 700 may equally be one of the many hardware modules of the processing equipment 110.
Alternatively, in another example, the sound processing apparatus 700 and the processing equipment 110 may be separate devices. In that case, the sound processing apparatus 700 may be connected to the processing equipment 110 through a wired and/or wireless network and exchange information with it in an agreed data format.
Example electronic device
Next, an electronic device according to an embodiment of the present application is described with reference to Fig. 8. The electronic device may be the processing equipment 110 shown in Fig. 1. Alternatively, it may be a stand-alone device independent of the processing equipment 110 that communicates with it to receive the collected input signals.
Fig. 8 illustrates a block diagram of an electronic device according to an embodiment of the present application.
As shown in Fig. 8, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability. It may control the other components of the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.

- The volatile memory may include, for example, random access memory (RAM) and/or cache memory.
- The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory.

One or more computer program instructions may be stored on the computer-readable storage medium. The processor 11 may run these program instructions to implement the sound processing methods of the embodiments of the present application described above and/or other desired functions. The computer-readable storage medium may also store various contents such as sound signals, image content, and filtering coefficients.
In one example, the electronic device 10 may further include an input device 13 and an output device 14. These components are interconnected through a bus system and/or another form of connection mechanism (not shown).
For example, when the electronic device is the processing equipment 110, the input device 13 may be the above-mentioned microphone array, for capturing the sound signals of sound sources, or a camera, for capturing image signals. When the electronic device is a stand-alone device, the input device 13 may be a communication network connector for receiving the collected input signals from the processing equipment 110. The input device 13 may also include, for example, a keyboard and a mouse.
The output device 14 may output various information to the outside, including determined distance information, direction information, and the like. The output device 14 may include, for example, a display, a loudspeaker, a printer, and a communication network together with the remote output devices connected to it.
For simplicity, Fig. 8 shows only some of the components of the electronic device 10 that relate to the present application, and omits components such as buses and input/output interfaces. Depending on the specific application, the electronic device 10 may also include any other appropriate components.
Illustrative computer program product and computer-readable storage medium
In addition to the above methods and devices, an embodiment of the present application may also be a computer program product containing computer program instructions. When run by a processor, these instructions cause the processor to perform the steps of the sound processing method according to the various embodiments of the present application described in the "Exemplary methods" section of this specification.
The computer program product may contain program code, written in any combination of one or more programming languages, for performing the operations of the embodiments of the present application. Suitable languages include object-oriented languages such as Java and C++, and conventional procedural languages such as the "C" language or similar. The program code may execute:

- entirely on the user's computing device;
- partly on the user's device, as a stand-alone software package;
- partly on the user's computing device and partly on a remote computing device; or
- entirely on a remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium storing computer program instructions. When run by a processor, these instructions cause the processor to perform the steps of the sound processing method according to the various embodiments of the present application described in the "Exemplary methods" section of this specification.
The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of readable storage media (a non-exhaustive list) include:

- an electrical connection with one or more wires;
- a portable disk or a hard disk;
- random access memory (RAM), read-only memory (ROM), or erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber or a portable compact disc read-only memory (CD-ROM);
- an optical storage device or a magnetic storage device;
- any suitable combination of the above.
The basic principles of the present application have been described above in connection with specific embodiments. It should be noted, however, that the merits, advantages and effects mentioned in the present application are merely examples, not limitations. They must not be regarded as prerequisites for every embodiment of the present application. Likewise, the specific details disclosed above serve only as examples and to aid understanding. They do not restrict the present application to being implemented with those specific details.
The block diagrams of components, apparatuses, devices, and systems in the present application are only illustrative examples. They are not intended to require or imply that these elements must be connected, arranged, or configured as shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner.

The following terms are used with these meanings, and each may be used interchangeably with its expansion:

- "include", "comprise", "have" and similar words are open-ended and mean "including but not limited to";
- "or" and "and" mean "and/or", unless the context clearly indicates otherwise;
- "such as" means "such as, but not limited to".
It should also be noted that, in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art. The general principles defined herein may be applied to other aspects without departing from the scope of the present application. Therefore, the present application is not intended to be limited to the aspects shown herein. It is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been presented for purposes of illustration and description. It is not intended to limit the embodiments of the present application to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.
Claims (20)
1. A sound processing method, comprising:
determining a sound preprocessing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera;
selecting a preprocessing filtering coefficient based on the sound preprocessing direction;
performing preprocessing filtering on the multiple sound signals using the preprocessing filtering coefficient, to obtain an initial signal source signal and an initial noise source signal;
determining an adaptive filtering coefficient; and
performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filtering coefficient, to obtain an enhanced signal source signal.
2. The sound processing method of claim 1, wherein determining the sound preprocessing direction according to the multiple sound signals collected by the microphone array and the image signal collected by the camera comprises:
determining an audio sound source direction according to the multiple sound signals;
determining an image sound source direction according to the image signal; and
determining, based on the audio sound source direction and the image sound source direction, at least one of a sound enhancement direction and a sound suppression direction as the sound preprocessing direction.
3. The sound processing method of claim 2, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the image sound source direction comprises:
determining whether the image sound source direction includes at least one image signal source direction associated with a signal source; and
in response to determining that the image sound source direction includes at least one image signal source direction associated with a signal source, determining the at least one image signal source direction as the sound enhancement direction.
4. The sound processing method of claim 3, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the image sound source direction further comprises:
determining the directions in the audio sound source direction other than the sound enhancement direction as the sound suppression direction.
5. The sound processing method of claim 2, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the image sound source direction comprises:
determining whether the image sound source direction includes at least one image signal source direction associated with a signal source; and
in response to determining that the image sound source direction includes at least one image signal source direction associated with a signal source, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the at least one image signal source direction.
6. The sound processing method of claim 5, wherein jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the at least one image signal source direction comprises:
determining a first difference between the audio sound source direction and the at least one image signal source direction;
in response to the first difference being minimized, determining a candidate audio sound source direction and a candidate image sound source direction corresponding to the minimized first difference; and
determining the sound enhancement direction based on the candidate audio sound source direction and the candidate image sound source direction.
7. The sound processing method of claim 6, wherein determining the sound enhancement direction based on the candidate audio sound source direction and the candidate image sound source direction comprises:
taking the candidate audio sound source direction, the candidate image sound source direction, or the intermediate value between the candidate audio sound source direction and the candidate image sound source direction as the sound enhancement direction.
8. The sound processing method of claim 6, wherein jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the audio sound source direction and the at least one image signal source direction further comprises:
determining a second difference between the directions in the audio sound source direction other than the sound enhancement direction and the directions in the at least one image signal source direction other than the sound enhancement direction;
determining whether the second difference is less than a predetermined similarity threshold; and
in response to the second difference being less than the predetermined similarity threshold, determining, as the sound suppression direction, the direction in the audio sound source direction (other than the sound enhancement direction) that corresponds to the second difference.
9. The sound processing method of claim 2, wherein selecting the preprocessing filtering coefficient based on the sound preprocessing direction comprises:
pre-designing enhancement filtering coefficients and suppression filtering coefficients corresponding to different angles; and
selecting, respectively, the enhancement filtering coefficient corresponding to the sound enhancement direction and the suppression filtering coefficient corresponding to the sound suppression direction.
10. The sound processing method of claim 9, wherein pre-designing the enhancement filtering coefficients and the suppression filtering coefficients corresponding to different angles comprises:
pre-designing the enhancement filtering coefficients and the suppression filtering coefficients corresponding to different angles based on the geometry of the microphone array.
11. The sound processing method of claim 9, wherein performing preprocessing filtering on the multiple sound signals using the preprocessing filtering coefficient to obtain the initial signal source signal and the initial noise source signal comprises:
performing enhancement filtering and suppression filtering on the multiple sound signals using, respectively, the enhancement filtering coefficient and the suppression filtering coefficient, to obtain the initial signal source signal and the initial noise source signal.
12. The sound processing method of claim 2, wherein determining the adaptive filtering coefficient comprises:
obtaining an initial adaptive filtering coefficient; and
updating the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal.
13. The sound processing method of claim 12, wherein updating the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal comprises:
updating the initial adaptive filtering coefficient in response to determining, based on the audio sound source direction and the image sound source direction, that there is no sound enhancement direction.
14. The sound processing method of claim 12, wherein updating the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal comprises:
in response to the signal source being a user, capturing multiple frames of image information corresponding to the face orientation of the user;
detecting, based on the multiple frames of image information, whether there is mouth movement; and
in response to there being no mouth movement, updating the initial adaptive filtering coefficient.
15. The sound processing method of claim 14, further comprising:
in response to there being mouth movement, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold; and
in response to determining that the ratio of the initial signal source signal to the initial noise source signal is less than the predetermined signal-to-noise ratio threshold, updating the initial adaptive filtering coefficient.
16. The sound processing method of claim 1, wherein performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filtering coefficient comprises:
using the initial noise source signal as a reference signal, performing adaptive filtering on the initial signal source signal using the adaptive filtering coefficient, to obtain the enhanced signal source signal.
17. The sound processing method of claim 1, further comprising, after performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filtering coefficient:
updating the initial adaptive filtering coefficient according to the initial signal source signal and the initial noise source signal.
18. A sound processing apparatus, comprising:
a sound preprocessing direction determination unit, configured to determine a sound preprocessing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera;
a preprocessing filtering coefficient selection unit, configured to select a preprocessing filtering coefficient based on the sound preprocessing direction;
a preprocessing filtering unit, configured to perform preprocessing filtering on the multiple sound signals using the preprocessing filtering coefficient, to obtain an initial signal source signal and an initial noise source signal;
an adaptive filtering coefficient determination unit, configured to determine an adaptive filtering coefficient; and
an adaptive filtering unit, configured to perform adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filtering coefficient, to obtain an enhanced signal source signal.
19. An electronic device, comprising:
a processor; and
a memory storing computer program instructions that, when run by the processor, cause the processor to perform the sound processing method of any one of claims 1-17.
20. A computer-readable storage medium storing computer program instructions that, when run by a processor, cause the processor to perform the sound processing method of any one of claims 1-17.
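Claims 5 to 8 describe how the audio and image directions are fused. The sketch below is one possible reading of those claims, with these assumptions:

- directions are azimuths in degrees;
- the "difference" is the wrapped angular distance;
- the "intermediate value" of claim 7 is the midpoint of the candidate pair (the claim equally allows picking either candidate).

The function names and the 15-degree similarity threshold used in the example are hypothetical.

```python
def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def midpoint(a: float, b: float) -> float:
    """Midpoint of two azimuths along the shorter arc, in [0, 360)."""
    signed = ((b - a + 180.0) % 360.0) - 180.0  # signed shortest offset a -> b
    return (a + signed / 2.0) % 360.0


def fuse_directions(audio_dirs, image_dirs, similarity_threshold):
    """Return (enhancement_direction, suppression_directions).

    audio_dirs: directions estimated from the microphone signals.
    image_dirs: directions of signal sources (e.g. people) found in the image.
    """
    if not audio_dirs or not image_dirs:
        return None, []
    # Claim 6: the (audio, image) pair with the smallest first difference.
    a_best, i_best = min(
        ((a, i) for a in audio_dirs for i in image_dirs),
        key=lambda pair: angular_diff(pair[0], pair[1]),
    )
    # Claim 7 (one of the allowed options): midpoint of the candidate pair.
    enhance = midpoint(a_best, i_best)
    # Claim 8: remaining audio directions that closely match a remaining image
    # direction are treated as interfering talkers and suppressed.
    rest_audio = [a for a in audio_dirs if a != a_best]
    rest_image = [i for i in image_dirs if i != i_best]
    suppress = [
        a for a in rest_audio
        if any(angular_diff(a, i) < similarity_threshold for i in rest_image)
    ]
    return enhance, suppress
```

For example, with audio directions 30, 95 and 200 degrees, image directions 100 and 210 degrees, and a 15-degree threshold, the pair (95, 100) is closest. The enhancement direction is therefore 97.5 degrees. The 200-degree audio source matches the 210-degree person within the threshold, so it becomes a suppression direction.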
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711258117.XA CN107993671A (en) | 2017-12-04 | 2017-12-04 | Sound processing method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107993671A (en) | 2018-05-04 |
Family
ID=62035358
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711258117.XA Pending CN107993671A (en) | 2017-12-04 | 2017-12-04 | Sound processing method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107993671A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106328156A (en) * | 2016-08-22 | 2017-01-11 | 华南理工大学 | Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information |
CN106228993A (en) * | 2016-09-29 | 2016-12-14 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus eliminating noise and electronic equipment |
CN106782584A (en) * | 2016-12-28 | 2017-05-31 | 北京地平线信息技术有限公司 | Audio signal processing apparatus, method and electronic equipment |
CN106653041A (en) * | 2017-01-17 | 2017-05-10 | 北京地平线信息技术有限公司 | Audio signal processing equipment and method as well as electronic equipment |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10798483B2 (en) | 2018-05-30 | 2020-10-06 | Beijing Xiaomi Mobile Software Co., Ltd. | Audio signal processing method and device, electronic equipment and storage medium |
CN108766457A (en) * | 2018-05-30 | 2018-11-06 | 北京小米移动软件有限公司 | Acoustic signal processing method, device, electronic equipment and storage medium |
CN108920640B (en) * | 2018-07-02 | 2020-12-22 | 北京百度网讯科技有限公司 | Context obtaining method and device based on voice interaction |
CN108920640A (en) * | 2018-07-02 | 2018-11-30 | 北京百度网讯科技有限公司 | Context acquisition methods and equipment based on interactive voice |
CN113056925A (en) * | 2018-08-06 | 2021-06-29 | 阿里巴巴集团控股有限公司 | Method and device for detecting sound source position |
CN108806711A (en) * | 2018-08-07 | 2018-11-13 | 吴思 | A kind of extracting method and device |
CN110858943A (en) * | 2018-08-24 | 2020-03-03 | 纬创资通股份有限公司 | Sound reception processing device and sound reception processing method thereof |
US11842745B2 (en) | 2018-08-27 | 2023-12-12 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for purifying voice using depth information |
WO2020043007A1 (en) * | 2018-08-27 | 2020-03-05 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for purifying voice using depth information |
CN109147813A (en) * | 2018-09-21 | 2019-01-04 | 神思电子技术股份有限公司 | A kind of service robot noise-reduction method based on audio-visual location technology |
CN110503970A (en) * | 2018-11-23 | 2019-11-26 | 腾讯科技(深圳)有限公司 | A kind of audio data processing method, device and storage medium |
CN113574597B (en) * | 2018-12-21 | 2024-04-12 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for source separation using estimation and control of sound quality |
CN113574597A (en) * | 2018-12-21 | 2021-10-29 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for source separation using estimation and control of sound quality |
CN111435598A (en) * | 2019-01-15 | 2020-07-21 | 北京地平线机器人技术研发有限公司 | Voice signal processing method and device, computer readable medium and electronic equipment |
US11817112B2 (en) | 2019-01-15 | 2023-11-14 | Beijing Horizon Robotics Technology Research And Development Co., Ltd. | Method, device, computer readable storage medium and electronic apparatus for speech signal processing |
CN111435598B (en) * | 2019-01-15 | 2023-08-18 | 北京地平线机器人技术研发有限公司 | Voice signal processing method, device, computer readable medium and electronic equipment |
CN111629301B (en) * | 2019-02-27 | 2021-12-31 | 北京地平线机器人技术研发有限公司 | Method and device for controlling multiple loudspeakers to play audio and electronic equipment |
WO2020173156A1 (en) * | 2019-02-27 | 2020-09-03 | 北京地平线机器人技术研发有限公司 | Method, device and electronic device for controlling audio playback of multiple loudspeakers |
US11856379B2 (en) | 2019-02-27 | 2023-12-26 | Beijing Horizon Robotics Technology Research And Development Co., Ltd. | Method, device and electronic device for controlling audio playback of multiple loudspeakers |
CN111629301A (en) * | 2019-02-27 | 2020-09-04 | 北京地平线机器人技术研发有限公司 | Method and device for controlling multiple loudspeakers to play audio and electronic equipment |
US11664042B2 (en) | 2019-03-06 | 2023-05-30 | Plantronics, Inc. | Voice signal enhancement for head-worn audio devices |
CN113544775A (en) * | 2019-03-06 | 2021-10-22 | 缤特力股份有限公司 | Audio signal enhancement for head-mounted audio devices |
CN111863005B (en) * | 2019-04-28 | 2024-09-27 | 北京地平线机器人技术研发有限公司 | Sound signal acquisition method and device, storage medium and electronic equipment |
CN111863005A (en) * | 2019-04-28 | 2020-10-30 | 北京地平线机器人技术研发有限公司 | Sound signal acquisition method and device, storage medium and electronic equipment |
CN112216295A (en) * | 2019-06-25 | 2021-01-12 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN112216295B (en) * | 2019-06-25 | 2024-04-26 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN112509571A (en) * | 2019-08-27 | 2021-03-16 | 富士通个人电脑株式会社 | Information processing apparatus and recording medium |
WO2021078116A1 (en) * | 2019-10-21 | 2021-04-29 | 维沃移动通信有限公司 | Video processing method and electronic device |
CN111402912A (en) * | 2020-02-18 | 2020-07-10 | 云知声智能科技股份有限公司 | Voice signal noise reduction method and device |
CN114420144A (en) * | 2020-10-09 | 2022-04-29 | 雅马哈株式会社 | Audio signal processing method and audio signal processing device |
Similar Documents
Publication | Title |
---|---|
CN107993671A (en) | Sound processing method, device and electronic equipment |
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device |
CN107240395B (en) | Acoustic model training method and device, computer equipment and storage medium |
DE112017003563B4 (en) | METHOD AND SYSTEM OF AUTOMATIC LANGUAGE RECOGNITION USING POSTERIORI TRUST POINT NUMBERS |
EP3480820B1 (en) | Electronic device and method for processing audio signals |
CN107799126B (en) | Voice endpoint detection method and device based on supervised machine learning |
CN108172213B (en) | Surge audio identification method, surge audio identification device, surge audio identification equipment and computer readable medium |
CN112183166B (en) | Method and device for determining training samples and electronic equipment |
EP3444809B1 (en) | Personalized speech recognition method and system |
US12046237B2 (en) | Speech interaction method and apparatus, computer readable storage medium and electronic device |
CN110473568B (en) | Scene recognition method and device, storage medium and electronic equipment |
CN112183107B (en) | Audio processing method and device |
CN112885328B (en) | Text data processing method and device |
CN110837758B (en) | Keyword input method and device and electronic equipment |
CN109947971B (en) | Image retrieval method, image retrieval device, electronic equipment and storage medium |
CN113516990A (en) | Voice enhancement method, method for training neural network and related equipment |
CN111833554A (en) | Ticket selling machine, ticket selling machine system, ticket selling method and ticket selling device |
CN112216307A (en) | Speech emotion recognition method and device |
CN111581470A (en) | Multi-modal fusion learning analysis method and system for dialog system context matching |
US20190348062A1 (en) | System and method for encoding data using time shift in an audio/image recognition integrated circuit solution |
CN112002346A (en) | Gender and age identification method, device, equipment and storage medium based on voice |
EP2503545A1 (en) | Arrangement and method relating to audio recognition |
CN111400463B (en) | Dialogue response method, device, equipment and medium |
CN110992971A (en) | Method for determining voice enhancement direction, electronic equipment and storage medium |
Paleček | Experimenting with lipreading for large vocabulary continuous speech recognition |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180504 |