CN107993671A - Sound processing method, device and electronic equipment - Google Patents

Publication number
CN107993671A
Authority
CN
China
Prior art keywords
sound
signal
source
initial
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711258117.XA
Other languages
Chinese (zh)
Inventor
朱长宝
陈本东
李育国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Horizon Robotics Technology Co Ltd
Original Assignee
Nanjing Horizon Robotics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Horizon Robotics Technology Co Ltd
Priority to CN201711258117.XA
Publication of CN107993671A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A sound processing method, a sound processing device, electronic equipment and a computer-readable storage medium are disclosed. The method includes: determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; selecting pre-processing filter coefficients based on the sound pre-processing direction; pre-filtering the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal source signal and an initial noise source signal; determining adaptive filter coefficients; and adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal. The signal source signal can thus be enhanced, which improves the sound quality.

Description

Sound processing method, device and electronic equipment
Technical field
This application relates to the field of sound processing, and more specifically to a sound processing method, a sound processing device, electronic equipment and a computer-readable storage medium.
Background
With the spread of electronic devices, more and more of them offer voice control to make operation more convenient. For example, electronic devices such as smartphones and in-vehicle devices provide a voice control function that lets the user direct the device by voice to perform a corresponding function. The device therefore needs to recognize the user's speech in order to learn the user's true intention and control the corresponding functional unit to perform the function the user wants. However, whether a smartphone is used in a home environment or an in-vehicle device is used in a vehicle environment, speech recognition is easily disturbed by the external environment; external noise in particular has a large impact on speech recognition.
Existing sound processing methods therefore suffer from poor sound quality and a low recognition rate.
Summary of the invention
The present application is proposed to solve the above technical problem. Embodiments of the application provide a sound processing method, a sound processing device, electronic equipment and a computer-readable storage medium that can improve sound quality and thereby improve the sound recognition rate.
According to one aspect of the application, a sound processing method is provided, including: determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; selecting pre-processing filter coefficients based on the sound pre-processing direction; pre-filtering the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal source signal and an initial noise source signal; determining adaptive filter coefficients; and adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal.
According to another aspect of the application, a sound processing device is provided, including: a sound pre-processing direction determination unit, configured to determine a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; a pre-processing filter coefficient selection unit, configured to select pre-processing filter coefficients based on the sound pre-processing direction; a pre-processing filter unit, configured to pre-filter the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal source signal and an initial noise source signal; an adaptive filter coefficient determination unit, configured to determine adaptive filter coefficients; and an adaptive filter unit, configured to adaptively filter the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal.
According to yet another aspect of the application, electronic equipment is provided, including: a processor; and a memory storing computer program instructions that, when run by the processor, cause the processor to perform the sound processing method described above.
According to a further aspect of the application, a computer-readable storage medium is provided, storing computer program instructions that, when run by a processor, cause the processor to perform the sound processing method described above.
Compared with the prior art, the sound processing method, sound processing device, electronic equipment and computer-readable storage medium according to the embodiments of the present application can determine a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; select pre-processing filter coefficients based on the sound pre-processing direction; pre-filter the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal source signal and an initial noise source signal; determine adaptive filter coefficients; and adaptively filter the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal. The signal source signal can therefore be enhanced based on the sound pre-processing direction, which improves the sound quality and in turn the accuracy of sound recognition.
Brief description of the drawings
The above and other objects, features and advantages of the application will become more apparent from the more detailed description of its embodiments in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments and form part of the specification; together with the embodiments they serve to explain the application and do not limit it. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 illustrates the schematic diagram of the application scenarios of the sound processing method according to the embodiment of the present application;
Fig. 2 illustrates the flow chart of the sound processing method according to the embodiment of the present application;
Fig. 3 illustrates a flow chart of determining the sound pre-processing direction in the sound processing method according to the embodiment of the present application;
Fig. 4 illustrates a flow chart of determining at least one of the sound enhancement direction and the sound suppression direction in the sound processing method according to the embodiment of the present application;
Fig. 5 illustrates a flow chart of updating the adaptive filter coefficients in the sound processing method according to the embodiment of the present application;
Fig. 6 illustrates a flow chart of detecting the user's mouth movement in the sound processing method according to the embodiment of the present application;
Fig. 7 illustrates the block diagram of the sound processing apparatus according to the embodiment of the present application;
Fig. 8 illustrates the block diagram of the electronic equipment according to the embodiment of the present application.
Detailed description
Example embodiments of the application are described in detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application, and it should be understood that the application is not limited by the example embodiments described here.
Overview of the application
As described above, in both home and vehicle environments speech recognition is easily disturbed by the external environment, and external noise in particular has a large impact on it. For example, external noise may come from directional sound-producing objects such as people or a television set.
Existing technical solutions improve speech quality by means of speech enhancement and thereby improve the speech recognition rate. Speech enhancement can be divided into single-channel and multi-channel speech enhancement.
Single-channel speech enhancement algorithms have difficulty handling directional, non-stationary interference such as that from a television set. Moreover, single-channel noise reduction causes some loss of sound quality while it reduces noise, and that loss in turn lowers the speech recognition rate.
Multi-channel speech enhancement generally uses beamforming and blind source separation. Beamforming mostly requires the direction information to be known in advance; when it relies on speech signal processing alone and the interference energy is high, for example when the signal-to-interference energy ratio is below 0 dB, sound source localization is very inaccurate. Blind source separation also faces a channel selection problem: an accurate selection remains difficult in the 0 dB scenario, and when the number of interfering sound sources exceeds the number of channels, blind source separation is also hard to solve well.
In view of these problems, the basic concept of the application is to propose a sound processing method, a sound processing device, electronic equipment and a computer-readable storage medium that determine a sound pre-processing direction from multiple sound signals collected by a microphone array and an image signal collected by a camera, and that filter the multiple sound signals based on the sound pre-processing direction to obtain an enhanced signal source signal. The signal of the signal source of interest can therefore be enhanced, which improves the sound quality and in turn the accuracy of sound recognition.
Those skilled in the art will understand that the sound processing method according to the embodiment of the present application can be applied to sound processing of various sound-producing objects in various environments, including the home and vehicle environments mentioned above; the embodiments of the application do not limit this in any way.
Having described the basic principle of the application, various non-limiting embodiments are introduced in detail below with reference to the drawings.
Exemplary system
Fig. 1 illustrates the schematic diagram of the application scenarios of the sound processing method according to the embodiment of the present application.
As shown in Fig. 1, the application scenario for the sound processing method includes a processing device 110 and one or more sound sources, for example a first sound source 120 and a second sound source 130.
The processing device 110 can be any type of electronic device that receives sound and performs sound recognition. It includes a sound collection device 111, such as a microphone array, and an image acquisition device 112, such as a camera.
For example, in a home environment the processing device 110 can be a smartphone that receives the user's voice input and performs the corresponding function. Alternatively, the processing device 110 can be an in-vehicle device. Besides receiving sound signals (for example, user speech) from a signal source of interest, the processing device 110 can also receive other types of sound signals, for example signals from noise sources that are not of interest.
The sound collection device 111 can be used to collect audio signals from sound sources such as signal sources or noise sources, and it can be a microphone array. For example, the microphone array can be a system composed of a certain number of microphones that samples and processes the spatial characteristics of the sound field; it can include multiple microphones MIC1 to MICn whose pickup areas are not exactly the same, where n is a natural number greater than or equal to 2. Depending on the relative positions of the microphones, microphone arrays can be divided into: linear arrays, whose element centers lie on one straight line; planar arrays, whose element centers are distributed in one plane; and spatial arrays, whose element centers are distributed in three-dimensional space.
The image acquisition device 112 can be used to capture image signals of the monitored scene and can include one or more cameras. For example, the image data collected by the camera can be a continuous image frame sequence (that is, a video stream) or a discrete image frame sequence (that is, a set of images sampled at predetermined sampling times). The camera can be, for example, a monocular, binocular or multi-view camera, and it can capture either grayscale images or color images carrying color information. Of course, any other type of camera known in the art or developed in the future can be applied to the application; the application places no particular limit on how the image is captured, as long as grayscale or color information of the input image can be obtained. In one embodiment, to reduce the computation in subsequent operations, the color image can be converted to grayscale before analysis and processing. In another embodiment, to retain more information, the color image can be analyzed and processed directly.
The first sound source 120 and the second sound source 130 can be any type of sound source, and can include a signal source that emits the signal component of interest and a noise source that emits a noise component to be eliminated. For example, a sound source can be animate or inanimate. Animate sound sources can include humans and animals; inanimate sound sources can include robots, television sets, speakers, and so on.
It should be noted that the above application scenario is shown only to aid understanding of the spirit and principle of the application, and the embodiments of the application are not limited to it. On the contrary, the embodiments can be applied to any applicable scenario. For example, there can be more or fewer sound sources.
Illustrative methods
Fig. 2 illustrates the flow chart of the sound processing method according to the embodiment of the present application.
As shown in Fig. 2, the sound processing method according to the embodiment of the present application includes: S210, determining a sound pre-processing direction according to multiple sound signals collected by a microphone array and an image signal collected by a camera; S220, selecting pre-processing filter coefficients based on the sound pre-processing direction; S230, pre-filtering the multiple sound signals using the pre-processing filter coefficients to obtain an initial signal source signal and an initial noise source signal; S240, determining adaptive filter coefficients; and S250, adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal.
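As a rough illustration of step S250 only, the sketch below cancels noise from the initial signal source signal by using the initial noise source signal as a reference. The text above does not name the adaptive algorithm, so the least-mean-squares (LMS) update, the function name `lms_enhance`, the tap count, the step size and the toy data are all assumptions made for illustration.

```python
import math
import random


def lms_enhance(initial_signal, initial_noise, taps=4, mu=0.01):
    """Subtract an adaptively filtered noise reference from the signal path.

    Assumption: plain LMS. The patent text at this point only says that
    adaptive filter coefficients are determined and applied.
    """
    w = [0.0] * taps                 # adaptive filter coefficients
    history = [0.0] * taps           # most recent noise-reference samples
    enhanced = []
    for s, n in zip(initial_signal, initial_noise):
        history = [n] + history[:-1]
        noise_estimate = sum(wk * xk for wk, xk in zip(w, history))
        e = s - noise_estimate       # enhanced sample, also the error signal
        w = [wk + mu * e * xk for wk, xk in zip(w, history)]
        enhanced.append(e)
    return enhanced, w


# Toy data: a sine "speech" component plus noise leaking in with gain 0.8.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 0.01 * t) for t in range(4000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
mixed = [s + 0.8 * n for s, n in zip(speech, noise)]
enhanced, coeffs = lms_enhance(mixed, noise)
```

In this toy setup the first coefficient should drift toward the leak gain, so the residual noise in `enhanced` becomes much smaller than in `mixed` once the filter has adapted.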
In one example, in the sound processing method according to the embodiment of the present application, determining the sound pre-processing direction according to the multiple sound signals collected by the microphone array and the image signal collected by the camera (S210) can include: determining sound-based sound source directions according to the multiple sound signals; determining image-based sound source directions according to the image signal; and determining, based on the sound-based and image-based sound source directions, at least one of a sound enhancement direction and a sound suppression direction as the sound pre-processing direction.
In step S210, for example, sound signals from the first sound source 120 and the second sound source 130 can be collected by the sound collection device 111 (for example, a microphone array), and an image signal can be collected by the image acquisition device 112 (for example, a camera). Here the sound collection device 111 collects the sound of the current environment, which includes, for example, the sound signal of interest (for example, user speech) and interfering signals (for example, from a television set or radio). The sound collection device 111 includes, but is not limited to, an analog microphone array with corresponding analog-to-digital converters. The sound pre-processing direction can then be determined based on the collected sound signals and the collected image signal.
How the sound pre-processing direction is determined from the collected sound signals and the collected image signal is described in detail below with reference to Fig. 3.
Fig. 3 illustrates the flow chart that sound pretreatment direction is determined in the sound processing method according to the embodiment of the present application.
As shown in Fig. 3, in the sound processing method according to the embodiment of the present application, determining the sound pre-processing direction according to the multiple sound signals collected by the microphone array and the image signal collected by the camera (S210) can include: S310, collecting the multiple sound signals; S320, performing sound source localization on the multiple sound signals to determine sound-based sound source directions; S330, collecting the image signal; S340, recognizing the image signal; S350, performing an image-based direction decision to determine image-based sound source directions; and S360, determining, based on the sound-based and image-based sound source directions, at least one of a sound enhancement direction and a sound suppression direction as the sound pre-processing direction.
In step S310, multiple sound signals are collected by the sound collection device 111; they include, for example, the sound signal of interest (for example, user speech) and interfering signals (for example, from a television set or radio).
In step S320, sound source localization is performed on the multiple sound signals collected by the sound collection device 111. The localization is not limited to a single sound source; multiple sound sources can also be localized, which yields the directions of multiple sound sources.
For example, multi-source localization can use the MUSIC (Multiple Signal Classification) algorithm. The algorithm first computes the covariance matrix of the received multi-channel signals and performs an eigenvalue decomposition of it, then sorts the eigenvalues by magnitude and finds the eigenvectors that correspond to noise, and finally forms a spatial spectrum from the noise eigenvectors and the steering vectors of different directions prepared in advance; the directions at which the spectrum peaks are the sound source directions. For example, sound source localization finally outputs M sound source directions based on the sound signals, denoted d1(i), 0 < i ≤ M.
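The MUSIC steps just described can be sketched as follows. This is a minimal narrowband illustration, not the patent's implementation: it assumes NumPy, a uniform linear array with half-wavelength spacing, complex-valued snapshots and a known number of sources, and the function names `steering_vector` and `music_directions` are hypothetical.

```python
import numpy as np


def steering_vector(angle_deg, n_mics, spacing_wavelengths=0.5):
    """Narrowband steering vector of a uniform linear array (assumed geometry)."""
    k = np.arange(n_mics)
    phase = 2.0 * np.pi * spacing_wavelengths * k * np.sin(np.deg2rad(angle_deg))
    return np.exp(-1j * phase)


def music_directions(snapshots, n_sources, scan_deg=np.arange(-90.0, 90.5, 0.5)):
    """Return n_sources direction estimates (degrees) from (n_mics, T) snapshots."""
    n_mics, n_snap = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snap          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    noise_subspace = eigvecs[:, : n_mics - n_sources]      # smallest eigenvalues
    spectrum = np.empty(len(scan_deg))
    for idx, angle in enumerate(scan_deg):
        a = steering_vector(angle, n_mics)
        proj = noise_subspace.conj().T @ a
        spectrum[idx] = 1.0 / np.real(np.vdot(proj, proj))
    # Local maxima of the spatial spectrum, strongest first.
    peaks = [i for i in range(1, len(scan_deg) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(float(scan_deg[i]) for i in peaks[:n_sources])


# Toy scene: two uncorrelated sources at -30 and 40 degrees, 8 microphones.
rng = np.random.default_rng(0)
true_deg = [-30.0, 40.0]
A = np.stack([steering_vector(d, 8) for d in true_deg], axis=1)
S = rng.standard_normal((2, 400)) + 1j * rng.standard_normal((2, 400))
N = 0.1 * (rng.standard_normal((8, 400)) + 1j * rng.standard_normal((8, 400)))
estimated = music_directions(A @ S + N, n_sources=2)
```

The returned list plays the role of d1(i). A real microphone array would need a wideband variant and an estimate of the source count, neither of which is covered here.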
In step S330, an image is collected by the image acquisition device 112.
In step S340, the collected image is recognized to output potential sound sources, such as a user, a television set or a radio. Suppose the number of potential sound sources is N and the individual sound sources are denoted s(j), where 0 < j ≤ N.
In step S350, an image-based direction decision is performed to determine the direction of each potential sound source recognized in the image; for example, the user (face) is at 90 degrees and the television set is at 45 degrees. N potential sound source angles based on the image signal are finally output, denoted d2(j), 0 < j ≤ N.
Although steps S310 and S320 are described first and steps S330 to S350 afterwards, in practice steps S330 to S350 can also be performed before steps S310 and S320, or the two groups of steps can be performed in parallel.
In step S360, at least one of the sound enhancement direction and the sound suppression direction is determined as the sound pre-processing direction according to the image-based direction decision and the result of sound source localization. That is, in the sound processing method according to the embodiment of the present application, it is possible to only enhance the signal of the signal source, to only suppress the signal of the noise source, or to enhance the signal of the signal source and suppress the signal of the noise source at the same time.
How at least one of the sound enhancement direction and the sound suppression direction is determined as the sound pre-processing direction based on the sound-based and image-based sound source directions (S360) is described in detail below.
In one example, in the sound processing method according to the embodiment of the present application, determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based and image-based sound source directions (S360) can include: determining whether the image-based sound source directions include at least one image signal source direction associated with a signal source; and, in response to determining that they do, determining the at least one image signal source direction as the sound enhancement direction.
In addition, step S360 can further include: determining the directions among the sound-based sound source directions other than the sound enhancement direction as the sound suppression direction.
For example, suppose the application scenario includes three sound sources, and sound source localization based on the sound signals in step S320 finds the first sound source at 0 degrees, the second at 50 degrees and the third at 100 degrees; the sound-based sound source directions are then 0, 50 and 100 degrees. Suppose further that localization based on the image signal in step S350 finds the user at 0 degrees, the television set at 45 degrees and the radio at 120 degrees; the image-based sound source directions are then 0, 45 and 120 degrees. It is then determined whether the image-based sound source directions include at least one image signal source direction associated with a signal source (for example, the user). Because they include the image signal source direction associated with the user, namely 0 degrees, 0 degrees can be determined directly as the sound enhancement direction. The directions among the sound-based sound source directions other than the sound enhancement direction, namely 50 and 100 degrees, can then be determined as the sound suppression directions.
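The worked example above can be reproduced with a small sketch. The function name `select_directions`, the label strings and the exact-match comparison of angles are illustrative assumptions; the behavior when no signal source is found in the image is also an assumption, since the text does not cover that case here.

```python
def select_directions(sound_dirs, image_sources, signal_labels=("user",)):
    """Pick enhancement directions from the image, suppression from the sound.

    sound_dirs: directions (degrees) from microphone-array localization.
    image_sources: (label, direction) pairs from image recognition.
    """
    enhance = [d for label, d in image_sources if label in signal_labels]
    if not enhance:
        # Assumption: with no signal source in the image, return nothing.
        return [], []
    suppress = [d for d in sound_dirs if d not in enhance]
    return enhance, suppress


enhance, suppress = select_directions(
    sound_dirs=[0, 50, 100],
    image_sources=[("user", 0), ("television", 45), ("radio", 120)],
)
# enhance == [0], suppress == [50, 100]
```

Exact equality between angles mirrors the numbers in the example only; estimates from two different sensors would normally be compared with some tolerance.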
In an alternative example, in the sound processing method according to the embodiment of the present application, determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based and image-based sound source directions (S360) can include: determining whether the image-based sound source directions include at least one image signal source direction associated with a signal source; and, in response to determining that they do, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source directions and the at least one image signal source direction.
This alternative example is described in detail below with reference to Fig. 4.
Fig. 4 illustrates a flow chart of determining at least one of the sound enhancement direction and the sound suppression direction in the sound processing method according to the embodiment of the present application.
As shown in Fig. 4, in the alternative example, determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based and image-based sound source directions (S360) can include: S361, determining whether the image-based sound source directions include an image signal source direction associated with a signal source; if not, proceeding to S362, and if so, proceeding to S363; S362, determining that there is no sound enhancement direction; S363, determining whether the image-based sound source directions include multiple image signal source directions associated with signal sources; if not, proceeding to S364, and if so, proceeding to S365; S364, determining the single image signal source direction as the sound enhancement direction; and S365, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source directions and the at least one image signal source direction.
For example, when the signal source is a user, it can be judged whether the potential sound sources currently detected from the image signal include a face. If no face is included, there is currently no sound source enhancement direction. If a face is included, it is further judged whether the potential sound sources include multiple faces: if only one face is included, the angle of that face is output as the sound source enhancement direction; if multiple faces are included, the final sound source enhancement direction is output according to the sound source angles localized from the multiple sound signals and the sound source angles localized from the image signal.
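The face-count branching described above can be written as a short sketch. The function name `enhancement_direction` and the `joint_decision` callback are hypothetical; the callback stands in for the joint sound-and-image decision, and the built-in `min` is used below only as a placeholder for it.

```python
def enhancement_direction(face_dirs, joint_decision):
    """Follow the Fig. 4 branches for face directions found in the image.

    joint_decision is a caller-supplied (hypothetical) callback that combines
    sound-based and image-based directions when several faces are present.
    """
    if not face_dirs:                 # S362: no signal source in the image
        return None
    if len(face_dirs) == 1:           # S364: a single face decides directly
        return face_dirs[0]
    return joint_decision(face_dirs)  # S365: several faces, decide jointly


no_face = enhancement_direction([], min)         # None
one_face = enhancement_direction([90], min)      # 90
two_faces = enhancement_direction([0, 90], min)  # decided by the callback
```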
In one example, in the sound processing method according to the embodiment of the present application, based on the sound Sounnd source direction Determine that the sound enhancing direction and the sound suppress in direction extremely with least one image signal source direction joint Few one can include:Determine the first otherness of the sound Sounnd source direction and at least one image signal source direction; It is minimized in response to first otherness, determines candidate sound sound source corresponding with first otherness being minimized Direction and candidate image Sounnd source direction;It is and true based on the candidate sound Sounnd source direction and the candidate image Sounnd source direction The fixed sound enhancing direction.
For example, the sound enhancing side is determined based on the candidate sound Sounnd source direction and the candidate image Sounnd source direction To can include:By the candidate sound Sounnd source direction, the candidate image Sounnd source direction or the candidate sound sound source side Strengthen direction as the sound to the intermediate value with the candidate image Sounnd source direction.
Specifically, as described above, it is assumed that the auditory localization based on voice signal in step s 320, determines that there are M A Sounnd source direction, is denoted as d1 (i), and 0<i≤M.And, it is assumed that the auditory localization based on picture signal in step S350, determines Go out there are N number of Sounnd source direction, be denoted as d2 (j), 0<J≤N, including Nf signal source (for example, face), signal source direction note For df (j), 0<j≤Nf≤N.Calculate the first otherness c of df and d11(i, j), such as c can be expressed as1(i, j)=| sin (d1(i))-sin(df(j))|.As the first otherness c1When (i, j) is minimized, corresponding d1 (i) and df (j) most connect Closely, then enhancing direction can select the two one or calculate the angle among the two angle or the angle based on certain weight coefficient Degree is as enhancing direction.Certainly, it will be understood by those skilled in the art that the calculating of otherness is not limited only to the above method, also may be used With according to both angle interval calculations, i.e. interval is nearer, represents that difference is smaller, both are more similar.
For example, assume that the application scenario contains three sound sources. The sound source localization based on the sound signals in step S320 places the first sound source at 0 degrees, the second at 50 degrees, and the third at 100 degrees; that is, the acoustic sound source directions are 0, 50, and 100 degrees, denoted d1(1), d1(2), and d1(3), respectively. Assume further that the sound source localization based on the image signal in step S350 places user 1 at 0 degrees, a television at 45 degrees, user 2 at 90 degrees, and a radio at 120 degrees; that is, the image sound source directions are 0, 45, 90, and 120 degrees. User 1 and user 2 are the two signal sources (signal source 1 and signal source 2), with directions denoted df(1) and df(2), and the first difference must be evaluated for them. The evaluation shows that c1(i = 1, j = 1) = 0 is the minimum, so 0 degrees can be directly determined as the sound enhancement direction.
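By way of illustration only, the first-difference evaluation in the example above might be sketched in Python as follows. The function names `first_difference` and `pick_enhancement_direction` and the `weight` parameter are hypothetical and do not appear in the application; the sketch assumes that angles are given in degrees and uses the sine-based expression for c1(i, j).

```python
import math


def first_difference(sound_deg, face_deg):
    """c1 = |sin(d1) - sin(df)| for two angles given in degrees."""
    return abs(math.sin(math.radians(sound_deg)) - math.sin(math.radians(face_deg)))


def pick_enhancement_direction(sound_dirs, face_dirs, weight=0.5):
    """Find the (d1, df) pair with the smallest first difference.

    Returns (enhancement_angle, d1, df). `weight` is an assumed weighting:
    1.0 keeps the acoustic direction, 0.0 keeps the face direction,
    0.5 gives the angle midway between the two.
    """
    _, d1, df = min(
        ((first_difference(a, f), a, f) for a in sound_dirs for f in face_dirs),
        key=lambda item: item[0],
    )
    return weight * d1 + (1.0 - weight) * df, d1, df


# Directions from the example: sound sources at 0, 50, 100 degrees,
# faces (user 1 and user 2) at 0 and 90 degrees.
sound_dirs = [0.0, 50.0, 100.0]
face_dirs = [0.0, 90.0]
enh, d1, df = pick_enhancement_direction(sound_dirs, face_dirs)
```

Here the pair d1(1) = 0 degrees and df(1) = 0 degrees gives a first difference of 0, so the sketch selects 0 degrees as the enhancement direction, in agreement with the example.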
In addition, in one example, in the sound processing method according to the embodiment of the present application, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the at least one image signal source direction may further include: determining a second difference between the directions in the acoustic sound source direction other than the sound enhancement direction and the directions in the at least one image signal source direction other than the sound enhancement direction; determining whether the second difference is less than a predetermined similarity threshold; and in response to the second difference being less than the predetermined similarity threshold, determining the direction in the acoustic sound source direction, other than the sound enhancement direction, that corresponds to the second difference as the sound suppression direction.
Specifically, as described above, assume that the sound source localization based on the sound signals in step S320 determines that there are M sound source directions, denoted d1(i), 0 < i ≤ M, of which NR1 sound source directions remain apart from the sound enhancement direction, denoted dr1(i), where 0 < i ≤ NR1 ≤ M. Assume further that the sound source localization based on the image signal in step S350 determines that there are N sound source directions, denoted d2(j), 0 < j ≤ N, of which NR2 sound source directions remain apart from the sound enhancement direction, denoted dr2(j), where 0 < j ≤ NR2 ≤ N. The suppression direction is determined jointly from dr1(i) and dr2(j); there may be one suppression direction or several. A second difference c2(i, j) between dr1(i) and dr2(j) is calculated, which may be expressed, for example, as c2(i, j) = |sin(dr1(i)) - sin(dr2(j))|. When c2(i, j) is less than a certain threshold, the corresponding dr1(i) is a suppression direction. Of course, those skilled in the art will understand that the difference calculation is not limited to the above method; it may also be calculated from the angular interval between the two, where a smaller interval indicates a smaller difference and greater similarity.
Continuing the example above, the acoustic sound source directions are 0, 50, and 100 degrees, and the sound enhancement direction is 0 degrees. Removing 0 degrees leaves 50 and 100 degrees, denoted dr1(1) and dr1(2), respectively. Likewise, the image sound source directions are 0, 45, 90, and 120 degrees; removing 0 degrees leaves 45, 90, and 120 degrees, denoted dr2(1), dr2(2), and dr2(3), respectively. Assume the threshold is 10 degrees. Evaluating the second difference as an angular interval gives c2(i = 1, j = 1) = 5 degrees, which is below the 10-degree threshold, so 50 degrees can, for example, be determined as the sound suppression direction. Of course, the application is not limited to this; for example, 45 degrees or 47.5 degrees may instead be determined as the sound suppression direction.
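By way of illustration only, the second-difference test in this example might be sketched as follows. The function name `suppression_directions` and its parameters are hypothetical; the sketch assumes that the difference is measured as an angular interval in degrees, which matches the value of 5 degrees obtained in the example, rather than the sine-based expression.

```python
def suppression_directions(rest_sound_dirs, rest_image_dirs, threshold_deg=10.0):
    """Return every remaining acoustic direction dr1(i) that lies within
    `threshold_deg` of at least one remaining image direction dr2(j)."""
    selected = []
    for dr1 in rest_sound_dirs:
        # Second difference taken as the angular interval |dr1 - dr2|.
        if any(abs(dr1 - dr2) < threshold_deg for dr2 in rest_image_dirs):
            selected.append(dr1)
    return selected


# Remaining directions after removing the 0-degree enhancement direction.
supp = suppression_directions([50.0, 100.0], [45.0, 90.0, 120.0])
```

With a 10-degree threshold only 50 degrees qualifies (5 degrees from the television at 45 degrees); 100 degrees is exactly 10 degrees from the nearest image direction and is therefore not selected under the strict comparison assumed here.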
Referring back to Fig. 2, in step S220, preprocessing filter coefficients are selected based on the sound preprocessing direction.
In one example, in the sound processing method according to the embodiment of the present application, selecting the preprocessing filter coefficients based on the sound preprocessing direction (S220) may include: designing in advance enhancement filter coefficients and suppression filter coefficients corresponding to different angles; and selecting the enhancement filter coefficients corresponding to the sound enhancement direction and the suppression filter coefficients corresponding to the sound suppression direction, respectively.
Specifically, enhancement filter coefficients and suppression filter coefficients for different angles may be designed in advance according to the geometry of the microphone array, for example using the least squares method. Once precomputed, the enhancement filter coefficients and suppression filter coefficients may be stored in a corresponding storage medium and read out at system initialization, or stored in the program in advance. The corresponding enhancement filter coefficients and suppression filter coefficients may then be selected according to the sound enhancement direction and the sound suppression direction.
Therefore, in one example, in the sound processing method according to the embodiment of the present application, designing in advance the enhancement filter coefficients and suppression filter coefficients corresponding to different angles includes: designing them in advance based on the geometry of the microphone array.
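By way of illustration only, the selection of precomputed coefficients by angle might be sketched as a table lookup. The names `build_table` and `select_coefficients`, the 10-degree design grid, the four microphones, and the all-zero placeholder taps are all assumptions made for this sketch; the actual coefficients would come from an offline design (for example, the least squares design mentioned above), which is not reproduced here.

```python
def build_table(angles_deg, num_mics=4, num_taps=3):
    """Placeholder table mapping each design angle to per-microphone FIR taps.

    The zeros stand in for coefficients produced by an offline design.
    """
    return {a: [[0.0] * num_taps for _ in range(num_mics)] for a in angles_deg}


def select_coefficients(table, direction_deg):
    """Return (design_angle, taps) for the design angle nearest to direction_deg."""
    angle = min(table, key=lambda a: abs(a - direction_deg))
    return angle, table[angle]


# Assumed design grid: one coefficient set every 10 degrees from 0 to 180.
design_angles = range(0, 181, 10)
enhance_table = build_table(design_angles)
suppress_table = build_table(design_angles)

enh_angle, enh_taps = select_coefficients(enhance_table, 0.0)
sup_angle, sup_taps = select_coefficients(suppress_table, 47.5)
```

In this sketch a suppression direction of 47.5 degrees maps to the 50-degree design angle. A finer grid or interpolation between neighboring designs would be alternative choices.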
In step S230, preprocessing filtering is performed on the plurality of sound signals using the preprocessing filter coefficients to obtain an initial signal source signal and an initial noise source signal.
In one example, in the sound processing method according to the embodiment of the present application, performing preprocessing filtering on the plurality of sound signals using the preprocessing filter coefficients to obtain the initial signal source signal and the initial noise source signal (S230) may include: performing enhancement filtering and suppression filtering on the plurality of sound signals using the enhancement filter coefficients and the suppression filter coefficients, respectively, to obtain the initial signal source signal and the initial noise source signal.
Specifically, enhancement filtering of the plurality of sound signals with the enhancement filter coefficients yields an initial signal source signal (for example, the desired speech signal) that consists mainly of the signal source signal with a small amount of the noise source signal. Likewise, suppression filtering of the plurality of sound signals with the suppression filter coefficients yields an initial noise source signal (for example, the suppressed-noise signal) that consists mainly of the noise source signal with a small amount of the signal source signal.
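By way of illustration only, the two preprocessing filters might be applied as filter-and-sum operations over the microphone channels. The application does not specify the filter structure, so the per-channel FIR form, the function names `fir` and `filter_and_sum`, and the two-microphone toy signals below are assumptions chosen so that the result can be checked by hand.

```python
def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def filter_and_sum(channels, taps_per_channel):
    """Filter each microphone channel with its own taps, then sum the outputs."""
    outputs = [fir(x, h) for x, h in zip(channels, taps_per_channel)]
    return [sum(samples) for samples in zip(*outputs)]


# Toy scene: the signal source reaches both microphones identically,
# while the noise source arrives with opposite sign at the second one.
source = [1.0, 2.0, 3.0]
noise = [0.5, -0.5, 0.25]
mic1 = [s + v for s, v in zip(source, noise)]
mic2 = [s - v for s, v in zip(source, noise)]

enhance_taps = [[0.5], [0.5]]    # averaging keeps the source, cancels the noise
suppress_taps = [[0.5], [-0.5]]  # differencing cancels the source, keeps the noise

initial_signal = filter_and_sum([mic1, mic2], enhance_taps)
initial_noise = filter_and_sum([mic1, mic2], suppress_taps)
```

In this idealized case the separation is exact. With real signals each output would still contain a small residue of the other source, which is what the subsequent adaptive filtering is intended to remove.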
Next, in step S240, adaptive filter coefficients are determined.
In the sound processing method according to the embodiment of the present application, the adaptive filter coefficients have initial values, and subsequent operations may be performed directly with the initial adaptive filter coefficients.
Alternatively, to ensure the accuracy of the adaptive filtering, the initial adaptive filter coefficients may first be updated during real-time processing of the sound.
That is, in one example, in the sound processing method according to the embodiment of the present application, determining the adaptive filter coefficients includes: obtaining initial adaptive filter coefficients; and updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
Specifically, for example, the initial adaptive filter coefficients may be updated according to Formula 1 below:
$$W(n+1) = W(n) + \mu\, e(n)\, X(n) \qquad \text{(Formula 1)}$$
where W(n) is the initial adaptive filter coefficient, W(n+1) is the updated adaptive filter coefficient, μ is a constant, e(n) is the residual signal, and X(n) is the initial noise source signal.
In addition, the residual signal e(n) can be expressed by Formula 2 below:
$$e(n) = d(n) - X^{T}(n)\, W(n) \qquad \text{(Formula 2)}$$
where d(n) is the initial signal source signal.
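By way of illustration only, one update step according to Formulas 1 and 2 might be written as follows. The function name `lms_step` and the representation of W(n) and X(n) as plain Python lists of equal length are assumptions of this sketch.

```python
def lms_step(w, x_vec, d, mu):
    """One coefficient update following Formulas 1 and 2.

    w     -- current coefficients W(n)
    x_vec -- initial noise source samples X(n), same length as w
    d     -- current initial signal source sample d(n)
    mu    -- step-size constant
    Returns (W(n+1), e(n)).
    """
    # Formula 2: e(n) = d(n) - X^T(n) W(n)
    e = d - sum(wi * xi for wi, xi in zip(w, x_vec))
    # Formula 1: W(n+1) = W(n) + mu * e(n) * X(n)
    w_next = [wi + mu * e * xi for wi, xi in zip(w, x_vec)]
    return w_next, e


w_next, e = lms_step([0.0, 0.0], [1.0, 2.0], d=1.0, mu=0.1)
```

Starting from zero coefficients, the residual equals d(n) = 1.0 and the coefficients move by μ·e(n)·X(n), giving approximately [0.1, 0.2].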
Preferably, in order to better capture the characteristics of the noise source signal, the adaptive filter coefficients may be updated when there is no signal source signal, when the signal source signal is weak, or when the noise source signal is strong, so that the coefficients better match the characteristics of the noise source signal.
Thus, for example, when the signal source is a user, the initial adaptive filter coefficients may be updated according to the initial signal source signal, the initial noise source signal, and whether the user is speaking (determined, for example, by mouth-movement detection).
In the following, the process of updating the initial adaptive filter coefficients is described with reference to Fig. 5, taking the case where the signal source is a user as an example.
Fig. 5 illustrates a flowchart of the adaptive filter coefficient update in the sound processing method according to the embodiment of the present application.
As shown in Fig. 5, in the sound processing method according to the embodiment of the present application, updating the adaptive filter coefficients includes: S510, determining whether a sound enhancement direction exists based on the acoustic sound source direction and the image sound source direction; if so, proceeding to step S520, otherwise proceeding to step S550; S520, in response to a sound enhancement direction existing, performing mouth-movement detection on the user; S530, determining whether mouth movement of the user is detected; if so, proceeding to step S540, otherwise proceeding to step S550; S540, in response to detecting mouth movement of the user, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold; if so, proceeding to step S550, and if not, performing no update; S550, updating the adaptive filter coefficients.
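By way of illustration only, the decision flow of steps S510 to S550 might be condensed into a single predicate. The function name `should_update` and its parameters are hypothetical. The application refers only to "the ratio" of the two signals; measuring that ratio on signal powers is an assumption of this sketch.

```python
def should_update(has_enhancement_direction, mouth_moving,
                  signal_power, noise_power, snr_threshold):
    """Return True when the adaptive filter coefficients should be updated."""
    # S510: no enhancement direction, so go straight to the update (S550).
    if not has_enhancement_direction:
        return True
    # S520/S530: no mouth movement detected, so update (S550).
    if not mouth_moving:
        return True
    # S540: mouth is moving; update only if the signal-to-noise ratio is
    # below the threshold. Written without a division so that a zero
    # noise power cannot raise an error.
    return signal_power < snr_threshold * noise_power


decisions = [
    should_update(False, False, 1.0, 1.0, 0.5),  # no enhancement direction
    should_update(True, False, 1.0, 1.0, 0.5),   # user present but silent
    should_update(True, True, 1.0, 4.0, 0.5),    # speaking, ratio 0.25 < 0.5
    should_update(True, True, 4.0, 1.0, 0.5),    # speaking, ratio 4.0 >= 0.5
]
```

Only the last case, in which the user is speaking and the signal dominates the noise, leaves the coefficients unchanged.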
In step S510, whether a sound enhancement direction exists is determined based on the acoustic sound source direction and the image sound source direction. Whether a sound enhancement direction exists can be obtained, for example, from step S360 above.
Therefore, in one example, in the sound processing method according to the present application, updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal includes: in response to determining, based on the acoustic sound source direction and the image sound source direction, that no sound enhancement direction exists, updating the initial adaptive filter coefficients.
In step S520, mouth-movement detection is performed on the user.
For example, whether the user's mouth is moving can be detected from the image signal captured by the camera.
Fig. 6 illustrates a flowchart of the mouth-movement detection of the user in the sound processing method according to the embodiment of the present application.
As shown in Fig. 6, the mouth-movement detection of the user in the sound processing method of the embodiment of the present application includes: S610, in response to determining that the image sound source is a user, acquiring multi-frame image information corresponding to the direction of the user's face; S620, detecting whether mouth movement exists based on the multi-frame image information.
In step S610, because it is difficult to recognize mouth movement from a single frame, video information or multi-frame image information over a period of time may be recorded in the sound enhancement direction; that is, multi-frame image information corresponding to the direction of the user's face is acquired in real time or near real time.
Then, in step S620, whether mouth movement exists is detected based on the multi-frame image information. For example, every two adjacent frames of the multi-frame images are compared; if the mouth region shows no obvious difference, there is probably no mouth movement; otherwise, there is probably mouth movement. If the user's mouth is moving, the user may be speaking.
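By way of illustration only, the adjacent-frame comparison of step S620 might be sketched as a mean absolute pixel difference over the mouth region. The application does not specify the matching measure, so the measure itself, the function names `mean_abs_diff` and `mouth_moving`, the threshold value, and the representation of each cropped mouth region as a 2-D list of grayscale values are all assumptions.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale crops."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def mouth_moving(mouth_frames, threshold=10.0):
    """Report movement if any pair of adjacent frames differs noticeably."""
    return any(
        mean_abs_diff(prev, curr) > threshold
        for prev, curr in zip(mouth_frames, mouth_frames[1:])
    )


closed = [[100, 100], [100, 100]]
opened = [[100, 100], [20, 20]]  # lower half darkens when the mouth opens

still = mouth_moving([closed, closed, closed])
talking = mouth_moving([closed, opened, closed])
```

A practical detector would first locate the mouth region, for example from facial landmarks, and would need a threshold tuned to the camera and lighting; both are outside the scope of this sketch.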
Returning to Fig. 5, in step S530, it is determined whether mouth movement of the user is detected. If no mouth movement is detected, the initial adaptive filter coefficients are updated.
Therefore, in one example, in the sound processing method according to the embodiment of the present application, updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal includes: in response to the signal source being a user, acquiring multi-frame image information corresponding to the direction of the user's face; detecting whether mouth movement exists based on the multi-frame image information; and in response to there being no mouth movement, updating the initial adaptive filter coefficients.
In step S540, in response to detecting mouth movement of the user, whether to update the adaptive filter coefficients is determined based on the initial signal source signal and the initial noise source signal. As described above, the adaptive filter coefficients are updated when the initial signal source signal is small or the initial noise source signal is large.
That is, in one example, in the sound processing method according to the embodiment of the present application, after detecting whether mouth movement exists based on the multi-frame image information, the method further includes: in response to mouth movement existing, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold; and in response to determining that the ratio of the initial signal source signal to the initial noise source signal is less than the predetermined signal-to-noise ratio threshold, updating the initial adaptive filter coefficients.
In step S550, the adaptive filter coefficients are updated. The update process is the same as that described above with reference to Formulas 1 and 2 and is not repeated here.
Finally, in step S250, adaptive filtering is performed on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal.
That is, in step S250, the initial signal source signal may be further adaptively filtered based on the initial noise source signal, so as to remove the small amount of noise source signal contained in the initial signal source signal and thereby obtain the enhanced signal source signal.
In one example, in the sound processing method according to the embodiment of the present application, performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients (S250) includes: using the initial noise source signal as a reference signal, adaptively filtering the initial signal source signal with the adaptive filter coefficients to obtain the enhanced signal source signal.
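By way of illustration only, applying already-determined coefficients with the initial noise source signal as the reference might look as follows. The function name `adaptive_filter`, the tapped-delay-line form of the reference vector, and the toy signals are assumptions; the coefficients are held fixed here so that the filtering step can be shown separately from the coefficient update.

```python
def adaptive_filter(d, x, w):
    """Subtract the filtered noise reference from the initial signal source signal.

    d -- initial signal source signal (speech plus leaked noise)
    x -- initial noise source signal, used as the reference
    w -- adaptive filter coefficients, one per tap
    """
    enhanced = []
    for n in range(len(d)):
        # Reference vector [x(n), x(n-1), ...]; zeros before the signal starts.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(len(w))]
        noise_estimate = sum(wi * xi for wi, xi in zip(w, x_vec))
        enhanced.append(d[n] - noise_estimate)
    return enhanced


speech = [0.25, 0.5, 0.75, 1.0]
reference = [1.0, -1.0, 1.0, -1.0]
# Assumed leakage path: 0.5 * x(n) + 0.25 * x(n - 1) added to the speech.
primary = [
    speech[n] + 0.5 * reference[n] + (0.25 * reference[n - 1] if n > 0 else 0.0)
    for n in range(len(speech))
]

enhanced = adaptive_filter(primary, reference, w=[0.5, 0.25])
```

When the coefficients match the leakage path, as they do in this toy case, the output reproduces the speech samples; in practice the coefficients only approximate that path and some residual noise remains.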
In addition, it should be noted that, in the sound processing method according to the embodiment of the present application, when the adaptive filter coefficients are updated, the update may be performed before, after, or simultaneously with the adaptive filtering of the initial signal source signal and the initial noise source signal that yields the enhanced signal source signal.
That is, in one example, in the sound processing method according to the embodiment of the present application, after performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, the method further includes: updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
Alternatively, in one example, in the sound processing method according to the embodiment of the present application, while performing adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, the method further includes: updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
As can be seen from the above, with the sound processing method according to the embodiment of the present application, a sound preprocessing direction can be determined from the plurality of sound signals collected by the microphone array and the image signal captured by the camera; preprocessing filter coefficients are selected based on the sound preprocessing direction; preprocessing filtering is performed on the plurality of sound signals using the preprocessing filter coefficients to obtain an initial signal source signal and an initial noise source signal; adaptive filter coefficients are determined; and adaptive filtering is performed on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients to obtain an enhanced signal source signal. Therefore, the signal source signal can be enhanced based on the sound preprocessing direction, which provides the advantages of a long detection distance, good noise immunity, and improved speech recognition accuracy.
Specifically, in the embodiments of the present application, sound signals may be obtained by the microphone array to derive the orientation information of a plurality of sounds; an image signal may be obtained by the camera to detect potential sound-emitting objects (for example, a television, a person, a radio, or a loudspeaker) and to record the image orientation information of any face; a desired speech enhancement direction and a noise suppression direction are obtained from the image orientation information and the sound orientation information; a set of pre-designed enhancement filter coefficients and suppression filter coefficients is selected according to the enhancement direction and the suppression direction; the sound signals are filtered with the enhancement filter coefficients and the suppression filter coefficients to obtain a desired speech signal and a suppressed-noise signal; adaptive filtering is performed on the suppressed-noise signal and the desired speech signal; and the adaptive filter coefficients are updated according to the image signal and the sound signals.
In this way, by combining sound and image recognition, the orientation of the signal source (for example, a person) can be located more accurately; the filters designed for the desired directions can better suppress directional interference; and mouth-movement detection allows the filter to be updated more reliably. Because image and speech are combined, sound source localization remains effective even when the energy ratio of the signal to the interference is below 0 dB.
Exemplary apparatus
Fig. 7 illustrates the block diagram of the sound processing apparatus according to the embodiment of the present application.
As shown in Fig. 7, the sound processing apparatus 700 according to the embodiment of the present application includes: a sound preprocessing direction determination unit 710, configured to determine a sound preprocessing direction according to the plurality of sound signals collected by the microphone array and the image signal captured by the camera; a preprocessing filter coefficient selection unit 720, configured to select preprocessing filter coefficients based on the sound preprocessing direction determined by the sound preprocessing direction determination unit 710; a preprocessing filter unit 730, configured to perform preprocessing filtering on the plurality of sound signals using the preprocessing filter coefficients selected by the preprocessing filter coefficient selection unit 720, to obtain an initial signal source signal and an initial noise source signal; an adaptive filter coefficient determination unit 740, configured to determine adaptive filter coefficients; and an adaptive filter unit 750, configured to perform adaptive filtering on the initial signal source signal and the initial noise source signal obtained by the preprocessing filter unit 730, using the adaptive filter coefficients obtained by the adaptive filter coefficient determination unit 740, to obtain an enhanced signal source signal.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 is configured to: determine an acoustic sound source direction according to the plurality of sound signals; determine an image sound source direction according to the image signal; and determine at least one of a sound enhancement direction and a sound suppression direction based on the acoustic sound source direction and the image sound source direction, as the sound preprocessing direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the image sound source direction includes: determining whether the image sound source direction includes at least one image signal source direction associated with a signal source; and in response to determining that the image sound source direction includes at least one image signal source direction associated with a signal source, determining the at least one image signal source direction as the sound enhancement direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the image sound source direction further includes: determining the directions in the acoustic sound source direction other than the sound enhancement direction as the sound suppression direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the image sound source direction includes: determining whether the image sound source direction includes at least one image signal source direction associated with a signal source; and in response to determining that the image sound source direction includes at least one image signal source direction associated with a signal source, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the at least one image signal source direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the at least one image signal source direction includes: determining a first difference between the acoustic sound source direction and the at least one image signal source direction; in response to the first difference taking its minimum value, determining the candidate acoustic sound source direction and the candidate image sound source direction that correspond to the minimized first difference; and determining the sound enhancement direction based on the candidate acoustic sound source direction and the candidate image sound source direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 determining the sound enhancement direction based on the candidate acoustic sound source direction and the candidate image sound source direction includes: taking the candidate acoustic sound source direction, the candidate image sound source direction, or an intermediate value between the two as the sound enhancement direction.
In one example, in the above sound processing apparatus 700, the sound preprocessing direction determination unit 710 jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the acoustic sound source direction and the at least one image signal source direction further includes: determining a second difference between the directions in the acoustic sound source direction other than the sound enhancement direction and the directions in the at least one image signal source direction other than the sound enhancement direction; determining whether the second difference is less than a predetermined similarity threshold; and in response to the second difference being less than the predetermined similarity threshold, determining the direction in the acoustic sound source direction, other than the sound enhancement direction, that corresponds to the second difference as the sound suppression direction.
In one example, in the above sound processing apparatus 700, the preprocessing filter coefficient selection unit 720 is configured to: design in advance enhancement filter coefficients and suppression filter coefficients corresponding to different angles; and select the enhancement filter coefficients corresponding to the sound enhancement direction and the suppression filter coefficients corresponding to the sound suppression direction, respectively.
In one example, in the above sound processing apparatus 700, the preprocessing filter coefficient selection unit 720 designing in advance the enhancement filter coefficients and suppression filter coefficients corresponding to different angles includes: designing them in advance based on the geometry of the microphone array.
In one example, in the above sound processing apparatus 700, the preprocessing filter unit 730 is configured to: perform enhancement filtering and suppression filtering on the plurality of sound signals using the enhancement filter coefficients and the suppression filter coefficients, respectively, to obtain the initial signal source signal and the initial noise source signal.
In one example, in the above sound processing apparatus 700, the adaptive filter coefficient determination unit 740 is configured to: obtain initial adaptive filter coefficients; and update the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
In one example, in the above sound processing apparatus 700, the adaptive filter coefficient determination unit 740 updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal includes: in response to determining, based on the acoustic sound source direction and the image sound source direction, that no sound enhancement direction exists, updating the initial adaptive filter coefficients.
In one example, in the above sound processing apparatus 700, the adaptive filter coefficient determination unit 740 updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal includes: in response to the signal source being a user, acquiring multi-frame image information corresponding to the direction of the user's face; detecting whether mouth movement exists based on the multi-frame image information; and in response to there being no mouth movement, updating the initial adaptive filter coefficients.
In one example, in the above sound processing apparatus 700, the adaptive filter coefficient determination unit 740 updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal includes: in response to mouth movement existing, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold; and in response to determining that the ratio of the initial signal source signal to the initial noise source signal is less than the predetermined signal-to-noise ratio threshold, updating the initial adaptive filter coefficients.
In one example, in the above sound processing apparatus 700, the adaptive filter unit 750 is configured to: using the initial noise source signal as a reference signal, adaptively filter the initial signal source signal with the adaptive filter coefficients to obtain the enhanced signal source signal.
In one example, in the above sound processing apparatus 700, after the adaptive filter unit 750 performs adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, the adaptive filter coefficient determination unit 740 updates the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
In addition, in one example, in the above sound processing apparatus 700, while the adaptive filter unit 750 performs adaptive filtering on the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, the adaptive filter coefficient determination unit 740 updates the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
Here, those skilled in the art will understand that other details of the sound processing apparatus according to the embodiment of the present application are the same as the corresponding details of the sound processing method according to the embodiment of the present application described above, and are not repeated here to avoid redundancy.
As described above, the sound processing apparatus 700 according to the embodiment of the present application may be integrated into the processing equipment 110, or may be a stand-alone device independent of the processing equipment 110.
In one example, according to the sound processing apparatus 700 of the embodiment of the present application can be used as software module and/ Or hardware module and be integrated into the processing equipment 110.For example, the sound processing apparatus 700 can be the processing equipment 110 A software module in operating system, or can be directed to the application program that the processing equipment 110 is developed;When So, which equally can be one of numerous hardware modules of the processing equipment 110.
Alternatively, in another example, the sound processing apparatus 700 and the processing equipment 110 can also be discrete set It is standby, and the sound processing apparatus 700 can be connected to the processing equipment 110 by wired and or wireless network, and according to The data format of agreement transmits interactive information.
Example electronic device
In the following, it is described with reference to Figure 8 the electronic equipment according to the embodiment of the present application.The electronic equipment can be such as Fig. 1 institutes The processing equipment 110 or the stand-alone device independent with it shown, the stand-alone device can communicate with the processing equipment 110, with Receive from it collected input signal.
Fig. 8 illustrates the block diagram of the electronic equipment according to the embodiment of the present application.
As shown in figure 8, electronic equipment 10 includes one or more processors 11 and memory 12.
The processor 11 may be a central processing unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control the other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may run the program instructions to implement the sound localization method of the embodiments of the present application described above and/or other desired functions. Various contents such as sound signals, image content, and filter coefficients may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include an input device 13 and an output device 14, and these components are interconnected through a bus system and/or another form of connection mechanism (not shown).
For example, when the electronic device is the processing device 110, the input device 13 may be the microphone array described above, for capturing the sound signals of a sound source, or a camera, for capturing image signals. When the electronic device is a stand-alone device, the input device 13 may be a communication network connector for receiving the collected input signals from the processing device 110.
In addition, the input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 14 may include, for example, a display, a loudspeaker, a printer, and a communication network together with the remote output devices connected to it.
Of course, for simplicity, Fig. 8 shows only some of the components of the electronic device 10 that are relevant to the present application, and omits components such as buses and input/output interfaces. In addition, the electronic device 10 may include any other appropriate components depending on the specific application.
Exemplary computer program product and computer-readable storage medium
In addition to the above methods and devices, an embodiment of the present application may also be a computer program product comprising computer program instructions which, when run by a processor, cause the processor to perform the steps of the sound processing method according to the various embodiments of the present application described in the "Exemplary methods" section of this specification.
The computer program product may carry program code for performing the operations of the embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium having computer program instructions stored thereon which, when run by a processor, cause the processor to perform the steps of the sound processing method according to the various embodiments of the present application described in the "Exemplary methods" section of this specification.
The computer-readable storage medium may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The basic principles of the present application have been described above in connection with specific embodiments. However, it should be noted that the advantages, benefits, effects, and the like mentioned in the present application are merely examples and not limitations, and they must not be regarded as prerequisites for every embodiment of the present application. In addition, the specific details disclosed above are provided only for the purpose of example and ease of understanding, not limitation; they do not restrict the present application to being implemented with those specific details.
The block diagrams of the components, apparatuses, devices, and systems involved in the present application are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "include", "comprise", and "have" are open-ended terms that mean "including but not limited to" and may be used interchangeably with that phrase. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The phrase "such as" as used herein means "such as, but not limited to" and may be used interchangeably with it.
It should also be noted that, in the apparatuses, devices, and methods of the present application, each component or each step can be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present application. Therefore, the present application is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been presented for the purposes of illustration and description. Furthermore, this description is not intended to restrict the embodiments of the present application to the forms disclosed herein. Although a number of exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (20)

1. A sound processing method, comprising:
determining a sound preprocessing direction according to a plurality of sound signals collected by a microphone array and an image signal collected by a camera;
selecting preprocessing filter coefficients based on the sound preprocessing direction;
performing preprocessing filtering on the plurality of sound signals using the preprocessing filter coefficients, to obtain an initial signal source signal and an initial noise source signal;
determining adaptive filter coefficients; and
adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, to obtain an enhanced signal source signal.
2. The sound processing method of claim 1, wherein determining the sound preprocessing direction according to the plurality of sound signals collected by the microphone array and the image signal collected by the camera comprises:
determining a sound-based sound source direction according to the plurality of sound signals;
determining an image-based sound source direction according to the image signal; and
determining at least one of a sound enhancement direction and a sound suppression direction, as the sound preprocessing direction, based on the sound-based sound source direction and the image-based sound source direction.
3. The sound processing method of claim 2, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the image-based sound source direction comprises:
determining whether the image-based sound source direction includes at least one image-based signal source direction associated with a signal source; and
in response to determining that the image-based sound source direction includes at least one image-based signal source direction associated with a signal source, determining the at least one image-based signal source direction as the sound enhancement direction.
4. The sound processing method of claim 3, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the image-based sound source direction further comprises:
determining the directions in the sound-based sound source direction other than the sound enhancement direction as the sound suppression direction.
5. The sound processing method of claim 2, wherein determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the image-based sound source direction comprises:
determining whether the image-based sound source direction includes at least one image-based signal source direction associated with a signal source; and
in response to determining that the image-based sound source direction includes at least one image-based signal source direction associated with a signal source, jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the at least one image-based signal source direction.
6. The sound processing method of claim 5, wherein jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the at least one image-based signal source direction comprises:
determining a first difference between the sound-based sound source direction and the at least one image-based signal source direction;
in response to the first difference being minimized, determining a candidate sound-based sound source direction and a candidate image-based sound source direction corresponding to the minimized first difference; and
determining the sound enhancement direction based on the candidate sound-based sound source direction and the candidate image-based sound source direction.
7. The sound processing method of claim 6, wherein determining the sound enhancement direction based on the candidate sound-based sound source direction and the candidate image-based sound source direction comprises:
taking the candidate sound-based sound source direction, the candidate image-based sound source direction, or an intermediate value between the candidate sound-based sound source direction and the candidate image-based sound source direction as the sound enhancement direction.
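The joint determination in claims 6 and 7 can be sketched as a minimum-difference search over direction pairs. This is a hedged illustration only: representing directions as azimuth angles in degrees, measuring the difference along the shorter arc, and using the midpoint as the "intermediate value" are assumptions, and the function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def choose_enhancement_direction(sound_dirs, image_dirs, mode="middle"):
    """Pick the (sound, image) pair with the smallest difference, then return
    the sound candidate, the image candidate, or their midpoint."""
    cand_sound, cand_image = min(
        ((s, i) for s in sound_dirs for i in image_dirs),
        key=lambda pair: angular_difference(pair[0], pair[1]),
    )
    if mode == "sound":
        return cand_sound
    if mode == "image":
        return cand_image
    # Midpoint along the shorter arc, so 350 and 10 degrees average to 0.
    delta = (cand_image - cand_sound + 180.0) % 360.0 - 180.0
    return (cand_sound + delta / 2.0) % 360.0
```

For example, with sound-based directions of 30, 120, and 200 degrees and image-based directions of 36 and 300 degrees, the closest pair is (30, 36), and the midpoint mode returns 33 degrees.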
8. The sound processing method of claim 6, wherein jointly determining at least one of the sound enhancement direction and the sound suppression direction based on the sound-based sound source direction and the at least one image-based signal source direction further comprises:
determining a second difference between the directions in the sound-based sound source direction other than the sound enhancement direction and the directions in the at least one image-based signal source direction other than the sound enhancement direction;
determining whether the second difference is smaller than a predetermined similarity threshold; and
in response to the second difference being smaller than the predetermined similarity threshold, determining the direction in the sound-based sound source direction, other than the sound enhancement direction, that corresponds to the second difference as the sound suppression direction.
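The threshold test of claim 8 can be sketched as follows. The sketch assumes that directions are azimuth angles in degrees, that the enhancement direction has already been removed from both input lists, and that the "second difference" for a sound-based direction is its distance to the nearest remaining image-based direction; the function names and the 10-degree default threshold are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def choose_suppression_directions(remaining_sound, remaining_image,
                                  threshold_deg=10.0):
    """Keep each remaining sound-based direction that lies within the
    similarity threshold of some remaining image-based direction."""
    suppression = []
    for s in remaining_sound:
        second_diff = min(
            (angular_difference(s, i) for i in remaining_image),
            default=None,  # no image-based directions left to compare with
        )
        if second_diff is not None and second_diff < threshold_deg:
            suppression.append(s)
    return suppression
```

With remaining sound-based directions of 120 and 200 degrees and remaining image-based directions of 118 and 300 degrees, only 120 degrees passes the threshold and is returned as a suppression direction.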
9. The sound processing method of claim 2, wherein selecting the preprocessing filter coefficients based on the sound preprocessing direction comprises:
pre-designing enhancement filter coefficients and suppression filter coefficients corresponding to different angles; and
selecting, respectively, the enhancement filter coefficients corresponding to the sound enhancement direction and the suppression filter coefficients corresponding to the sound suppression direction.
10. The sound processing method of claim 9, wherein pre-designing the enhancement filter coefficients and suppression filter coefficients corresponding to different angles comprises:
pre-designing, based on the array geometry of the microphone array, the enhancement filter coefficients and suppression filter coefficients corresponding to different angles.
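The pre-design and selection steps of claims 9 and 10 amount to building a lookup table offline and indexing it at run time. The sketch below assumes a 10-degree angle grid and nearest-angle selection, neither of which is stated in the source; `design_enhance` and `design_suppress` are hypothetical placeholders for geometry-dependent design routines.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def build_coefficient_table(design_enhance, design_suppress, step_deg=10):
    """Pre-design one (enhancement, suppression) coefficient pair per grid angle."""
    return {angle: (design_enhance(angle), design_suppress(angle))
            for angle in range(0, 360, step_deg)}


def select_coefficients(table, enhance_dir_deg, suppress_dir_deg):
    """Pick the pre-designed coefficients whose grid angle is closest to each direction."""
    def nearest(direction):
        return min(table, key=lambda angle: angular_difference(angle, direction))
    enhance = table[nearest(enhance_dir_deg)][0]
    suppress = table[nearest(suppress_dir_deg)][1]
    return enhance, suppress


# Placeholder designs that only tag the angle; a real design would depend on
# the microphone array geometry.
table = build_coefficient_table(lambda a: ("enhance", a), lambda a: ("suppress", a))
selected = select_coefficients(table, 33.0, 118.0)
```

Here `selected` holds the enhancement entry designed for 30 degrees and the suppression entry designed for 120 degrees, the grid angles nearest to the requested directions.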
11. The sound processing method of claim 9, wherein performing preprocessing filtering on the plurality of sound signals using the preprocessing filter coefficients, to obtain the initial signal source signal and the initial noise source signal, comprises:
performing enhancement filtering and suppression filtering on the plurality of sound signals using the enhancement filter coefficients and the suppression filter coefficients, respectively, to obtain the initial signal source signal and the initial noise source signal.
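A common way to apply per-angle coefficients to a microphone array is a filter-and-sum structure, sketched below. The filter-and-sum form, the per-channel FIR representation of the coefficients, and all function names are assumptions; the sketch simply returns the two filter outputs in the order in which the claim lists the resulting signals.

```python
def fir(signal, taps):
    """Causal FIR filter: out[n] = sum_k taps[k] * signal[n - k]."""
    return [
        sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def filter_and_sum(channels, coefficients):
    """Filter each microphone channel with its own taps and sum the results."""
    filtered = [fir(ch, taps) for ch, taps in zip(channels, coefficients)]
    return [sum(samples) for samples in zip(*filtered)]


def preprocess(channels, enhance_coeffs, suppress_coeffs):
    """Run enhancement filtering and suppression filtering on the same input."""
    return (filter_and_sum(channels, enhance_coeffs),
            filter_and_sum(channels, suppress_coeffs))
```

As a toy case, two identical channels with enhancement taps `[[0.5], [0.5]]` reproduce the input, while suppression taps `[[0.5], [-0.5]]` cancel it.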
12. The sound processing method of claim 2, wherein determining the adaptive filter coefficients comprises:
obtaining initial adaptive filter coefficients; and
updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
13. The sound processing method of claim 12, wherein updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal comprises:
in response to determining, based on the sound-based sound source direction and the image-based sound source direction, that there is no sound enhancement direction, updating the initial adaptive filter coefficients.
14. The sound processing method of claim 12, wherein updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal comprises:
in response to the signal source being a user, collecting a plurality of pieces of image information corresponding to the face direction of the user;
detecting, based on the plurality of pieces of image information, whether there is mouth movement; and
in response to there being no mouth movement, updating the initial adaptive filter coefficients.
15. The sound processing method of claim 14, further comprising:
in response to there being mouth movement, determining whether the ratio of the initial signal source signal to the initial noise source signal is less than a predetermined signal-to-noise ratio threshold; and
in response to determining that the ratio of the initial signal source signal to the initial noise source signal is less than the predetermined signal-to-noise ratio threshold, updating the initial adaptive filter coefficients.
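The update conditions of claims 13 to 15 can be read as a gate that decides when the coefficients may adapt. The sketch below folds them into one hypothetical helper purely for illustration; claims 13 and 14 are separate dependent claims, so combining them this way is an assumption, as are the use of signal powers for the ratio and the default threshold value.

```python
def should_update_coefficients(has_enhancement_direction, mouth_moving,
                               signal_power, noise_power, snr_threshold=2.0):
    """Return True when the adaptive filter coefficients may be updated."""
    if not has_enhancement_direction:
        return True   # claim 13: no sound enhancement direction was found
    if not mouth_moving:
        return True   # claim 14: the user is not speaking
    if noise_power <= 0.0:
        return False  # the ratio is unbounded, so it cannot be below the threshold
    # Claim 15: mouth movement is present, so update only at a low ratio.
    return signal_power / noise_power < snr_threshold
```

The intent of such a gate is to adapt mainly when the target is silent or weak, so that the filter learns the noise path rather than cancelling the desired speech.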
16. The sound processing method of claim 1, wherein adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients comprises:
taking the initial noise source signal as a reference signal, and adaptively filtering the initial signal source signal using the adaptive filter coefficients, to obtain the enhanced signal source signal.
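With the noise source signal as the reference, the filtering step of claim 16 is often written as subtracting a filtered copy of the reference from the primary signal. The sketch below shows that form for a fixed set of coefficients; the FIR structure and the function name are assumptions, since the source does not specify the filter structure.

```python
def cancel_with_reference(initial_signal, initial_noise, coeffs):
    """Subtract the FIR-filtered reference from the primary signal:
    e[n] = s[n] - sum_k coeffs[k] * r[n - k]."""
    enhanced = []
    for n in range(len(initial_signal)):
        noise_estimate = sum(
            w * initial_noise[n - k] for k, w in enumerate(coeffs) if n - k >= 0
        )
        enhanced.append(initial_signal[n] - noise_estimate)
    return enhanced
```

For instance, a primary signal of `[1.0, 2.0, 3.0]` with a constant reference of `2.0` and a single coefficient of `0.5` yields `[0.0, 1.0, 2.0]`.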
17. The sound processing method of claim 1, further comprising, after adaptively filtering the initial signal source signal and the initial noise source signal using the adaptive filter coefficients:
updating the initial adaptive filter coefficients according to the initial signal source signal and the initial noise source signal.
18. A sound processing apparatus, comprising:
a sound preprocessing direction determination unit, configured to determine a sound preprocessing direction according to a plurality of sound signals collected by a microphone array and an image signal collected by a camera;
a preprocessing filter coefficient selection unit, configured to select preprocessing filter coefficients based on the sound preprocessing direction;
a preprocessing filtering unit, configured to perform preprocessing filtering on the plurality of sound signals using the preprocessing filter coefficients, to obtain an initial signal source signal and an initial noise source signal;
an adaptive filter coefficient determination unit, configured to determine adaptive filter coefficients; and
an adaptive filtering unit, configured to adaptively filter the initial signal source signal and the initial noise source signal using the adaptive filter coefficients, to obtain an enhanced signal source signal.
19. An electronic device, comprising:
a processor; and
a memory storing computer program instructions which, when run by the processor, cause the processor to perform the sound processing method of any one of claims 1-17.
20. A computer-readable storage medium having computer program instructions stored thereon which, when run by a processor, cause the processor to perform the sound processing method of any one of claims 1-17.
CN201711258117.XA 2017-12-04 2017-12-04 Sound processing method, device and electronic equipment Pending CN107993671A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711258117.XA CN107993671A (en) 2017-12-04 2017-12-04 Sound processing method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN107993671A true CN107993671A (en) 2018-05-04

Family

ID=62035358

Country Status (1)

Country Link
CN (1) CN107993671A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228993A (en) * 2016-09-29 2016-12-14 北京奇艺世纪科技有限公司 A kind of method and apparatus eliminating noise and electronic equipment
CN106328156A (en) * 2016-08-22 2017-01-11 华南理工大学 Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information
CN106653041A (en) * 2017-01-17 2017-05-10 北京地平线信息技术有限公司 Audio signal processing equipment and method as well as electronic equipment
CN106782584A (en) * 2016-12-28 2017-05-31 北京地平线信息技术有限公司 Audio signal processing apparatus, method and electronic equipment

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10798483B2 (en) 2018-05-30 2020-10-06 Beijing Xiaomi Mobile Software Co., Ltd. Audio signal processing method and device, electronic equipment and storage medium
CN108766457A (en) * 2018-05-30 2018-11-06 北京小米移动软件有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
CN108920640B (en) * 2018-07-02 2020-12-22 北京百度网讯科技有限公司 Context obtaining method and device based on voice interaction
CN108920640A (en) * 2018-07-02 2018-11-30 北京百度网讯科技有限公司 Context acquisition methods and equipment based on interactive voice
CN113056925A (en) * 2018-08-06 2021-06-29 阿里巴巴集团控股有限公司 Method and device for detecting sound source position
CN108806711A (en) * 2018-08-07 2018-11-13 吴思 A kind of extracting method and device
CN110858943A (en) * 2018-08-24 2020-03-03 纬创资通股份有限公司 Sound reception processing device and sound reception processing method thereof
US11842745B2 (en) 2018-08-27 2023-12-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for purifying voice using depth information
WO2020043007A1 (en) * 2018-08-27 2020-03-05 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for purifying voice using depth information
CN109147813A (en) * 2018-09-21 2019-01-04 神思电子技术股份有限公司 A kind of service robot noise-reduction method based on audio-visual location technology
CN110503970A (en) * 2018-11-23 2019-11-26 腾讯科技(深圳)有限公司 A kind of audio data processing method, device and storage medium
CN113574597B (en) * 2018-12-21 2024-04-12 弗劳恩霍夫应用研究促进协会 Apparatus and method for source separation using estimation and control of sound quality
CN113574597A (en) * 2018-12-21 2021-10-29 弗劳恩霍夫应用研究促进协会 Apparatus and method for source separation using estimation and control of sound quality
CN111435598A (en) * 2019-01-15 2020-07-21 北京地平线机器人技术研发有限公司 Voice signal processing method and device, computer readable medium and electronic equipment
US11817112B2 (en) 2019-01-15 2023-11-14 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method, device, computer readable storage medium and electronic apparatus for speech signal processing
CN111435598B (en) * 2019-01-15 2023-08-18 北京地平线机器人技术研发有限公司 Voice signal processing method, device, computer readable medium and electronic equipment
CN111629301B (en) * 2019-02-27 2021-12-31 北京地平线机器人技术研发有限公司 Method and device for controlling multiple loudspeakers to play audio and electronic equipment
WO2020173156A1 (en) * 2019-02-27 2020-09-03 北京地平线机器人技术研发有限公司 Method, device and electronic device for controlling audio playback of multiple loudspeakers
US11856379B2 (en) 2019-02-27 2023-12-26 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method, device and electronic device for controlling audio playback of multiple loudspeakers
CN111629301A (en) * 2019-02-27 2020-09-04 北京地平线机器人技术研发有限公司 Method and device for controlling multiple loudspeakers to play audio and electronic equipment
US11664042B2 (en) 2019-03-06 2023-05-30 Plantronics, Inc. Voice signal enhancement for head-worn audio devices
CN113544775A (en) * 2019-03-06 2021-10-22 缤特力股份有限公司 Audio signal enhancement for head-mounted audio devices
CN111863005B (en) * 2019-04-28 2024-09-27 北京地平线机器人技术研发有限公司 Sound signal acquisition method and device, storage medium and electronic equipment
CN111863005A (en) * 2019-04-28 2020-10-30 北京地平线机器人技术研发有限公司 Sound signal acquisition method and device, storage medium and electronic equipment
CN112216295A (en) * 2019-06-25 2021-01-12 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN112216295B (en) * 2019-06-25 2024-04-26 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN112509571A (en) * 2019-08-27 2021-03-16 富士通个人电脑株式会社 Information processing apparatus and recording medium
WO2021078116A1 (en) * 2019-10-21 2021-04-29 维沃移动通信有限公司 Video processing method and electronic device
CN111402912A (en) * 2020-02-18 2020-07-10 云知声智能科技股份有限公司 Voice signal noise reduction method and device
CN114420144A (en) * 2020-10-09 2022-04-29 雅马哈株式会社 Audio signal processing method and audio signal processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504